Argos

From GMOD
Revision as of 18:43, 25 January 2007 by 165.124.152.78

Description

Argos, a.k.a. FlyBase-NG, a.k.a. biodb, is designed to provide automatic
replication, installation and updates of genome and organism databases
and information servers, including FlyBase and euGenes. It should not
be too difficult to add other organism or genome services to this
replication structure.

Its main value is a collection of pre-tested, implemented common
database and information-service tools needed by organism database
systems, which can be automatically distributed to and updated on any
computer.

The replication includes scripts, configurations, data, and Unix binaries
for all needed programs except Perl, Java and rsync. Rsync is
used as the primary distribution program.

This server distribution system is still in development. It is
intended for automated updates of mirrored servers. Uses of an
automated server distribution system include local load distribution
(the Apache mod_backhand module is included for this), world-wide
mirror sites for rapid local access, and institution or company mirror
servers for local projects and data mining. An automatable mirroring
system differs from providing software and data downloads by FTP in
that its packages of data and software are kept up to date without
human intervention. Package management systems such as RPM and pacman
are well-developed tools, but they don't quite meet the needs of this
bio-database distribution.

The basic system structure is:


  common/
    java/ ; perl/ -- language packages
    servers/ -- major programs (blast, dbms, internet servers)
    systems/ -- operating system binaries of programs, packages
  docs/      -- general documents
  logs/      -- server logs
  template/  -- template information system structure
  flybase/   -- implemented genome information system structures
  eugenes/
  daphnia/

This design segregates common infrastructure from project-specific
parts. Projects may contain any needed software along with data, web
docs, database files, etc. A symbolic link folder named common in each
project gives access to the shared common/ software structure.
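A minimal shell sketch of this layout, assuming the per-project link is named `common` and points at the shared `common/` tree by a relative path. The function name `make_argos_layout` is hypothetical; this is not how installng.pl builds the tree, only an illustration of the structure.

```shell
#!/bin/sh
# Hypothetical sketch: build an Argos-style root with one shared common/ tree
# and a "common" symbolic link inside each project folder.
make_argos_layout() {
  root=$1
  shift
  # Shared infrastructure and site-wide folders from the tree above
  mkdir -p "$root/common/java" "$root/common/perl" \
           "$root/common/servers" "$root/common/systems" \
           "$root/docs" "$root/logs" || return 1
  for project in "$@"; do
    mkdir -p "$root/$project" || return 1
    # Relative link, so the whole root can be moved without breaking projects
    if [ ! -e "$root/$project/common" ] && [ ! -L "$root/$project/common" ]; then
      ln -s ../common "$root/$project/common" || return 1
    fi
  done
}

# Usage: lay out a throwaway root with the project folders named in the tree
demo_root=$(mktemp -d)
make_argos_layout "$demo_root" template flybase eugenes daphnia
ls -l "$demo_root/flybase"
```

A relative link keeps each project valid if the root folder is relocated; an absolute link would tie every project to one install path.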

Packages can be installed and updated individually, so each site can
choose which packages it runs. The installer can update infrastructure
software from different source sites and uses rsync as its primary
distribution and update tool (FTP, HTTP and others are possible, but
rsync has the file-system-aware updating methods needed).
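A per-package rsync update could look roughly like this sketch. The package path `common/perl`, the function name and the `ARGOS_DRY_RUN` switch are hypothetical. `-a` (archive: keep permissions, links and times), `-z` (compress) and `--delete` (remove files dropped upstream) are standard rsync options, and the trailing slashes make rsync sync a folder's contents rather than nest a copy of the folder.

```shell
#!/bin/sh
# Hypothetical sketch: mirror one package folder from an rsync source into the
# local install root. Set ARGOS_DRY_RUN=1 to print the command instead.
update_package() {
  src=$1   # rsync source, e.g. rsync://flybase.net/biodb
  pkg=$2   # package path under the source (hypothetical layout)
  root=$3  # local install root, e.g. /usr/local/biodb
  # Trailing slashes: copy the folder's contents, not the folder into itself
  set -- rsync -az --delete "$src/$pkg/" "$root/$pkg/"
  if [ "${ARGOS_DRY_RUN:-0}" = 1 ]; then
    echo "$@"
  else
    mkdir -p "$root/$pkg" && "$@"
  fi
}

# Usage: show what would run for one package without touching the network
ARGOS_DRY_RUN=1 update_package rsync://flybase.net/biodb common/perl /usr/local/biodb
```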

An evaluation of RPM, pacman, cluster backup/mirror tools and grid
packaging tools found none quite right, so a 'quick hack' Perl
installation program was built instead.

Developer notes
The current developers are Don Gilbert, Nihar Sheth and Victor
Strelets, for FlyBase-NG and euGenes use. We hope others will try
Argos and join us in using and developing it. Email us at
argos@eugenes.org or flybase-ng@flybase.net

The project's CVS repository at cvs.gmod.sourceforge.net:/cvsroot/gmod/argos/
holds only installation and configuration files: CVS is not designed for
storing and distributing bulk data, program binaries, and the many
package installations included in Argos repositories.

Instead, the configurations in argos/install/packages.conf point to
source servers for fetching ready-to-use packages, similar to the
distribution system Globus.org uses for grid computing packages that
are distributed to multiple grid node computers.

Each implemented service is also presumed to maintain its software and
documents separately from Argos, just as the open-source software
collected into the Argos commons is maintained separately but installed
for use with Argos.

GMOD developers can add new package sets to the
argos/install/packages.conf which point to rsync servers for
the packages.
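The real packages.conf format is not described on this page, so the following is only a hypothetical sketch of the idea: it assumes one `<package> <rsync-source>` pair per line with `#` comments, and looks up where a named package should be fetched from. The function name and the example.org source are placeholders.

```shell
#!/bin/sh
# Hypothetical sketch: look up the rsync source for a package in a
# packages.conf-style file. The real argos/install/packages.conf format may
# differ; this assumes "<package> <rsync-source>" per line, '#' for comments.
lookup_package_source() {
  conf=$1
  want=$2
  while read -r name src rest; do
    case $name in
      ''|'#'*) continue ;;   # skip blank and comment lines
    esac
    if [ "$name" = "$want" ]; then
      echo "$src"
      return 0
    fi
  done < "$conf"
  return 1                   # package not listed
}

# Usage with a throwaway config file (entries are illustrative only)
conf=$(mktemp)
printf '%s\n' \
  '# package     source' \
  'common-perl   rsync://flybase.net/biodb/common/perl' \
  'eugenes       rsync://example.org/eugenes' > "$conf"
lookup_package_source "$conf" eugenes
```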

Demo & Screenshots
Genome information systems running in Argos are at
http://flybase.net/flybase-ng/


A slide set outlines Argos/FlyBase-NG here: flybase-ng-may03 (ppt, html).


These are overviews of FlyBase's server system structures:
last generation and next generation.

Requirements
A current Unix computer with several free gigabytes of disk space,
depending on which system packages are to be installed. The following
software needs to be preinstalled on the system, and commonly already is:

  • Perl
  • Java
  • rsync

Argos includes all other packages needed for its operation, drawn from
common open-source software tools and packages used for bioinformatics
databases and information systems.
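Perl, Java and rsync are the only programs Argos does not ship (see the Description). A quick check that they are on the PATH might look like this sketch; `command -v` is the POSIX way to locate a command, the function name is hypothetical, and no versions are checked.

```shell
#!/bin/sh
# Sketch: report whether each required tool is on the PATH.
# Returns non-zero if any tool is missing.
check_prereqs() {
  status=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage: the three programs Argos expects to find already installed
check_prereqs perl java rsync || echo "install the missing tools first" >&2
```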

Argos replicates updates to compiled programs for the operating systems
below, so no human-attended compiling or installation is needed. Unix
systems that have binary package support are:

  • Apple MacOSX (v10.2 build)
  • Intel Linux (kernel 2.4 build)
  • Sun Solaris (v8 build)
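Whether prebuilt binaries apply to a machine can be estimated from `uname -s`, which reports `Darwin` on Mac OS X, `Linux` on Linux and `SunOS` on Solaris. This hypothetical sketch maps those names to short labels (the labels are not Argos's own folder names) and checks only the OS name, not CPU type or OS version.

```shell
#!/bin/sh
# Hypothetical sketch: map the kernel name from `uname -s` to one of the three
# platforms listed above. Only the OS name is checked, not CPU or OS version.
argos_platform() {
  case ${1:-$(uname -s)} in
    Darwin) echo macosx ;;
    Linux)  echo linux ;;
    SunOS)  echo solaris ;;
    *)      echo unsupported; return 1 ;;
  esac
}

# Usage: check the current machine
argos_platform || echo "no prebuilt Argos binaries for $(uname -s)" >&2
```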

In this alpha 0.3 (June 2003) release,
installation of the common Argos packages uses ~200 MB of disk.
Installation of a full FlyBase service uses ~2.5 GB of disk.
Installation of a full euGenes service uses ~4 GB of disk.
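Free space can be checked before installing. This sketch (function names hypothetical) reads the POSIX `df -Pk` output and compares free space with a target in megabytes. The 2700 MB target simply adds the ~200 MB common and ~2.5 GB FlyBase figures above, which may overlap or may have changed since this release.

```shell
#!/bin/sh
# Sketch: check free space (in MB) on the file system holding a folder.
# `df -P` gives POSIX one-line-per-filesystem output; with -k, column 4 is
# the available space in 1024-byte blocks.
free_mb() {
  df -Pk "$1" 2>/dev/null | awk 'NR == 2 { print int($4 / 1024) }'
}

has_free_mb() {
  avail=$(free_mb "$1")
  [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}

# Usage: common packages (~200 MB) plus a full FlyBase service (~2.5 GB),
# added together as a conservative estimate
if has_free_mb /usr/local 2700; then
  echo "enough space for FlyBase"
else
  echo "less than 2700 MB free (or folder missing)"
fi
```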

Documentation

Quick start:
Fetch argos/install/installng.pl and run it from a command line
('perl installng.pl').

Summary of steps to install an Argos server system

  1. Fetch the install script from a command line with

    rsync rsync://flybase.net/biodb/install/installng.pl .
    (or use the web link on this page)
  2. Run perl installng.pl

    to print a help summary.
  3. Run perl installng.pl -root=/usr/local/biodb -install

    to create the root folder and fetch the installation package
    (the -root= location is your choice; change the paths in the steps below to match)
  4. Edit /usr/local/biodb/install/install.conf.local

    to set the configuration: change the package set, paths and ports
    as desired in this file.
  5. Run /usr/local/biodb/install/installng.pl -install

    to add the full set of packages. Packages selected from
    packages.conf will be copied from their servers.
  6. Run /usr/local/biodb/install/run-apache

    to start the servers.
  7. Run /usr/local/biodb/install/installng.pl -update

    to update the server periodically.
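Step 7 can be automated with cron. The wrapper below is a hypothetical sketch, not part of Argos: it runs `installng.pl -update` directly (as steps 5 and 7 do), appends the output to a log in the root's logs/ folder, and uses `mkdir` as an atomic lock so a slow update is not overlapped by the next scheduled run. The script name, log name and schedule are assumptions.

```shell
#!/bin/sh
# Hypothetical cron wrapper for periodic Argos updates (step 7), e.g. saved as
# argos-update.sh and scheduled with a crontab line such as:
#   30 3 * * * /usr/local/biodb/argos-update.sh /usr/local/biodb
argos_update() {
  root=${1:-/usr/local/biodb}
  lock="$root/logs/update.lock"
  log="$root/logs/update.log"
  mkdir -p "$root/logs" || return 1
  # mkdir either creates the lock or fails, so two runs cannot both proceed;
  # a lock left behind by a killed run must be removed by hand
  if ! mkdir "$lock" 2>/dev/null; then
    echo "update already running ($lock exists)" >&2
    return 1
  fi
  echo "=== update started $(date)" >> "$log"
  "$root/install/installng.pl" -update >> "$log" 2>&1
  rc=$?
  echo "=== update finished $(date), exit status $rc" >> "$log"
  rmdir "$lock"
  return "$rc"
}

# Run only when a root folder is given on the command line
if [ "$#" -gt 0 ]; then
  argos_update "$1"
fi
```

Cron runs jobs with a minimal environment, so the crontab may need a PATH that includes perl, java and rsync.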

Downloads
  • Argos-based servers:
    http://flybase.net/flybase-ng/
    for FlyBase Next Generation,
    euGenes genome database,
    and other services in development.
  • Main package distribution: rsync://flybase.net/biodb,
    with project-specific packages distributed from other servers,
    as specified in argos/install/packages.conf