Workshop Laboratory 1

Navigating Genome Databases on the World Wide Web


The number of genome databases available on the World Wide Web has grown explosively in the last few years. These databases are not simply sequence repositories; they hold a variety of data types, including genetic mapping and literature reference data. Many sites also provide administrative services, such as gene name registries and colleague databases, to help the community researching a specific genome.

These databases generally share one primary goal: to store and curate large amounts of widely varied data while making that data easy and free to access and analyze. To this end, many genome databases use a database program called ACEDB, which runs mainly on Unix-based computers. This system is powerful, but constructing database queries with it is not particularly easy. Genome database curators have therefore designed web-based interfaces to ACEDB, and to their other tools and data, that are much easier to navigate. These interfaces also strive to make the data more intuitive to interpret by providing, for example, graphical representations of mapping data and hyperlinks to other data types within the database.

Each research community, however, has different ideas of how this common goal should or can be accomplished. Genome databases may therefore employ many different tools to access, annotate, and search genome data. Most offer some type of text-based, Boolean query form on their web pages, but many also support easy interactive browsing of their data using Java applets or other programs embedded in their pages (see CyanoBase for an excellent example).
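As a rough illustration of how a text-based Boolean query form filters records, here is a minimal Python sketch. The Record type, the sample entries, and the search function are all hypothetical; they do not reflect any particular database's implementation or the ACEDB query language.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """A hypothetical, simplified database entry."""
    name: str
    text: str


def matches(record, all_of=(), any_of=(), none_of=()):
    """Return True if the record text satisfies the AND / OR / NOT terms."""
    words = set(record.text.lower().split())

    def has(term):
        return term.lower() in words

    if not all(has(t) for t in all_of):             # AND: every term present
        return False
    if any_of and not any(has(t) for t in any_of):  # OR: at least one present
        return False
    if any(has(t) for t in none_of):                # NOT: no excluded term
        return False
    return True


def search(records, **terms):
    """Return the names of all records that satisfy the query terms."""
    return [r.name for r in records if matches(r, **terms)]


# Invented sample entries, for illustration only.
RECORDS = [
    Record("entry1", "actin related protein cytoskeleton"),
    Record("entry2", "homeobox transcription factor development"),
    Record("entry3", "homeobox protein cytoskeleton"),
]

# "homeobox AND NOT cytoskeleton"
print(search(RECORDS, all_of=["homeobox"], none_of=["cytoskeleton"]))
```

A real query form would typically also handle phrases, wildcards, and field-specific searches; this sketch only matches whole words.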

In this exercise, we want you to explore several of these databases and become familiar with the various ways data queries can be generated. We have provided some suggested routes for exploration in each database, but feel free to go where you want and ask the questions you are interested in.


Saccharomyces Genome Database (SGD) http://genome-www.stanford.edu/Saccharomyces

Curated by the SGD staff in the Department of Genetics of the Stanford University School of Medicine. The site includes a database of the molecular biology of the yeast Saccharomyces cerevisiae (SacchDB) as well as the S. cerevisiae Gene Name Registry and graphical chromosome maps.

  1. Select "Gene/Sequence Resources" from the main page.

  2. Under "Enter a Name" type ac* and click the "Submit Form" button.

  3. Click on the link to ACT4 from the list of available sequence names.

    Note that ACT4 is a non-standard gene name for ARP3, which is why the information shown is for ARP3.

  4. Click the link "Gene Details" under the Biology/Literature heading.

    Now a number of types of data about ARP3 are presented. These include genetic map locus information, physical map location, DNA open reading frame information, and information on the ARP3 protein.

    Try several hyperlinks to see what information is available.

  5. Return to the Gene/Sequence Resources page for Locus ARP3 (use the back button if necessary). Under Sequence Retrieval, select several links for the different types of sequence information available. Compare sequence file formats.

  6. Return to the main SGD page. Select the "Maps" link on the left side of the page and click on "Physical and Genetic Maps".

    Look at the explanation of how maps are presented in this database. Other databases may present their maps differently, and their explanations may not be as easy to find.

  7. Select chromosome I to view the map.

    A clickable image of the combined physical and genetic maps is shown. Note that ORFs on different strands are indicated. ORFs that have been both physically and genetically mapped are distinguished from those that have only been mapped physically.

  8. At the top of the map on the right side of the blue bar, click on the gene cdc24.

    Information on this locus, similar to what we found for ARP3, is now presented. Select one of the links under the "Reference" heading. The page displayed has a summary of the selected paper as well as links to its PubMed entry and to other related papers.

  9. Return to the cdc24 gene information page. Click on Mapping_data for one of the cdc24 loci. Follow links as they interest you.

The cross-linking and integration of such a wide variety of data types is the power of these database systems and interfaces.
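The wildcard search in step 2 and the alias resolution seen in step 3 can be sketched in a few lines of Python. This is only an illustration: the name list is a small made-up sample, case-insensitive matching is an assumption, and the only alias taken from the exercise is ACT4 for ARP3. It does not describe how SGD itself implements its search.

```python
from fnmatch import fnmatchcase

# Alias -> standard name. Only ACT4 -> ARP3 comes from the exercise above.
ALIASES = {"ACT4": "ARP3"}

# Small illustrative sample of names; not a real database listing.
NAMES = ["ACT1", "ACT4", "ARP3", "CDC24"]


def find_names(pattern, names=NAMES):
    """Return names matching a wildcard pattern such as 'ac*'.

    Matching is made case-insensitive by upper-casing both sides
    (an assumption for this sketch).
    """
    pat = pattern.upper()
    return sorted(n for n in names if fnmatchcase(n.upper(), pat))


def standard_name(name):
    """Map a non-standard name to its standard name, if an alias is known."""
    key = name.upper()
    return ALIASES.get(key, key)


print(find_names("ac*"))
print(standard_name("ACT4"))
```

Resolving aliases before the lookup is what lets a search for ACT4 land on the ARP3 entry.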

There are several more tutorials available from the "Help" link on the left of the main SGD page. Once in SGD Help, click on "Hot Tips" and explore the tutorials.

Mouse Genome Database (MGD) http://www.informatics.jax.org

Curated by the Jackson Laboratory, Bar Harbor ME. This database focuses on the genetics and biology of the laboratory mouse. It holds a large variety of data, including strain and polymorphism data.

This database also allows for some basic comparisons of the mouse and human genomes:

  1. Select the "Mammalian Homology and Comparative Maps" link.

  2. Click on "Oxford Grid".

    An Oxford grid compares two species by showing, for each pair of chromosomes, the number of homologous genes that have been mapped to both. For more information on Oxford grids, select the link in the heading "Retrieve an Oxford Grid".

  3. From the "Retrieve an Oxford Grid" page, select "mouse" under "Species - columns" and "human" under "Species - rows" and click "Retrieve".

    The table displayed has human chromosomes along the vertical axis and mouse chromosomes along the horizontal axis. To view a more detailed comparison for a specific chromosome, click one of the blue numbers along the mouse axis.

    The resulting page contains several sections of the mouse chromosome with genetic markers listed. Selecting a marker will provide links to more specific comparative data.
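To make the idea of an Oxford grid concrete, the following Python sketch counts homologous genes per chromosome pair and prints them as a table, with human chromosomes as rows and mouse chromosomes as columns. The gene names and chromosome assignments are invented for illustration, and the functions are hypothetical; MGD's own implementation is not shown in this exercise.

```python
from collections import Counter

# Hypothetical homology records: (gene, mouse chromosome, human chromosome).
HOMOLOGIES = [
    ("geneA", "1", "2"),
    ("geneB", "1", "2"),
    ("geneC", "1", "6"),
    ("geneD", "X", "X"),
]


def oxford_grid(homologies):
    """Count homologous genes for each (human row, mouse column) cell."""
    return Counter((human, mouse) for _gene, mouse, human in homologies)


def format_grid(grid):
    """Render the grid as text: human rows down, mouse columns across."""
    rows = sorted({human for human, _mouse in grid})
    cols = sorted({mouse for _human, mouse in grid})
    lines = ["human/mouse " + " ".join(f"{c:>3}" for c in cols)]
    for r in rows:
        # A Counter returns 0 for chromosome pairs with no shared genes.
        cells = " ".join(f"{grid[(r, c)]:>3}" for c in cols)
        lines.append(f"{r:>11} {cells}")
    return "\n".join(lines)


grid = oxford_grid(HOMOLOGIES)
print(format_grid(grid))
```

Cells with high counts point to chromosome pairs that share many mapped homologs, which is what the detailed chromosome view lets you inspect.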

Feel free to play with other species comparisons and explore the other options available from the main MGD page.

FlyBase http://flybase.bio.indiana.edu

This site at Indiana University is a mirror of the database curated at Harvard, and maintains a rich variety of data on Drosophila melanogaster. Drosophila has been a model system for studying the homeobox class of genes involved in organismal development.

  1. On the main FlyBase page, search the database under "genes" for "homeobox" using "All text".

  2. Click on the link for ANTC. ANTC is involved in the development of the antennae.

    The FlyBase Gene Report page has various links to sequence information and mapping data. Try viewing the "Graphic map" of the region surrounding this locus.

The remaining databases provide similar methods for accessing genome data. Visit these sites and explore.

C. elegans Genome Database (ACeDB) http://probe.nalusda.gov:8300/cgi-bin/browse/acedb

This database is maintained by the USDA and is the original genome database to use what became the ACEDB software. ACeDB (with a small "e") stands for "A C. elegans DataBase" while ACEDB (with the capital "E") refers to the software. This interface to the ACEDB data is more rudimentary, and complex searches require greater knowledge of the ACEDB data structure. It does, however, provide significant guidance in learning this structure.

C. elegans is also a model organism for studying homeobox domains. Try querying the database for information on them.

GrainGenes Database http://wheat.pw.usda.gov

GrainGenes is a compilation of molecular and phenotypic information on wheat, barley, oats, rye, and sugarcane. The project is supported by the USDA/NAL Plant Genome Research Program. Its interface is the same as that of the C. elegans database.

Methanococcus jannaschii Genome Database (MJDB) http://www.tigr.org/tdb/mdb/mjdb/mjdb.html

Curated at The Institute for Genomic Research (TIGR). This database is primarily a sequence repository with a few search tools. No genetic mapping data is included. Methanococcus jannaschii is an archaeon rather than a bacterium, and its genome was one of the first to be reported as fully sequenced.

CGSC http://cgsc.biology.yale.edu/cgsc.html

The E. coli Genetic Stock Center curates this site and provides access to genotype, strain, gene name, linkage map, and gene product information.

Much genome comparison is being done in microorganisms, and a large portion of it is directed at finding homologous genes involved in a variety of metabolic pathways. Try searching for "enolase" in both MJDB and CGSC.

Genome Database (GDB) http://gdbwww.gdb.org

The Genome Database is curated by the GDB organization, and this site is hosted by The Johns Hopkins School of Medicine. This database holds the genomic mapping data from the Human Genome Initiative, but the US DOE recently discontinued funding for the project. If you have time, you should visit this database and look at the resource that is being lost.




Developed and Maintained by Mark S. Whitsitt
Last Updated: Saturday, June 06, 1998 12:29 PM