Workshop Laboratory 1

Navigating Genome Databases on the World Wide Web

The number of genome databases available on the World Wide Web has grown explosively in the last few years. These databases are not simply sequence repositories; they hold a variety of data types, including genetic mapping and literature reference data. Many sites also provide administrative functions that help members of the community researching a specific genome, such as gene name registries and colleague databases. The primary goal is generally shared by all: to store and curate large amounts of a wide variety of data while making access to and analysis of these data easy and freely available.

To this end, most genome databases use a database program called ACEDB, which is run mainly on Unix-based computers. This system is powerful, but constructing database queries in it is not particularly easy. Genome database curators have therefore designed web-based interfaces to ACEDB, and to the other tools and data, that are much easier to navigate. These interfaces also strive to make interpretation of the data more intuitive by providing, for example, graphical representations of mapping data and hyperlinks to other data types within the database.

Each research community, however, has different ideas of how this common goal should or can be accomplished. Genome databases may therefore employ many different tools to access, annotate, and search genome data. Most offer some type of text-based Boolean query form on their web pages, but many also provide easy interactive browsing of their data using Java applets or other programs embedded in their pages (see CyanoBase for an excellent example).

In this exercise, we want you to explore several of these databases and become familiar with the various ways data queries can be generated.
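To make the idea of a text-based Boolean query concrete before you start, here is a minimal Python sketch of how AND, OR, and NOT terms combine when filtering records. It is an illustration only and does not reproduce any of these databases' actual query forms or the ACEDB query language. The `Record` type, the `search` function, and every entry in `RECORDS` are hypothetical and invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """A hypothetical, simplified database entry (not a real schema)."""
    name: str
    organism: str
    description: str


def search(records, all_of=(), any_of=(), none_of=()):
    """Return records matching a simple Boolean keyword query.

    all_of  : every term must appear (AND)
    any_of  : at least one term must appear (OR); ignored if empty
    none_of : no term may appear (NOT)
    Matching is a case-insensitive substring test over all text fields.
    """
    hits = []
    for rec in records:
        text = f"{rec.name} {rec.organism} {rec.description}".lower()

        def has(term):
            return term.lower() in text

        if (all(has(t) for t in all_of)
                and (not any_of or any(has(t) for t in any_of))
                and not any(has(t) for t in none_of)):
            hits.append(rec)
    return hits


# Made-up entries for illustration; they are not taken from any database.
RECORDS = [
    Record("gene-001", "yeast", "enolase, glycolysis"),
    Record("gene-002", "mouse", "enolase, glycolysis"),
    Record("gene-003", "fly", "homeobox transcription factor"),
    Record("gene-004", "yeast", "homeobox-like protein"),
]

if __name__ == "__main__":
    # "enolase AND NOT mouse"
    for rec in search(RECORDS, all_of=["enolase"], none_of=["mouse"]):
        print(rec.name)  # prints: gene-001
```

As you work through the sites below, notice which of these three operators each query form exposes and whether it matches whole words or substrings, since that changes what a search such as "enolase" returns.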
We have provided some suggested routes for exploration in each database, but feel free to go where you want and ask the questions you are interested in.

Saccharomyces Genome Database (SGD)
http://genome-www.stanford.edu/Saccharomyces

Curated by the SGD staff in the Department of Genetics of the Stanford University School of Medicine. This site includes a database of the molecular biology of the yeast Saccharomyces cerevisiae (SacchDB) as well as the S. cerevisiae Gene Name Registry and graphical chromosome maps.
The cross-linking and integration of such a wide variety of data types is the power of these database systems and interfaces. There are several more tutorials available from the "Help" link on the left of the main SGD page. Once in SGD Help, click on "Hot Tips" and explore the tutorials.

Mouse Genome Database (MGD)
http://www.informatics.jax.org

Curated by the Jackson Laboratory, Bar Harbor, ME. This database focuses on the genetics and biology of the laboratory mouse. It includes a large variety of data including strain and polymorphism data. This database also allows for some basic comparisons of the mouse and human genomes:
Feel free to play with other species comparisons and explore the other options available from the main MGD page.

Flybase
http://flybase.bio.indiana.edu

This site at Indiana University is a mirror of the database curated at Harvard, and maintains a rich variety of data on Drosophila melanogaster. Drosophila has been a model system for studying the homeobox class of genes involved in organismal development.
The remaining databases provide similar methods for accessing genome data. Visit these sites and explore.

C. elegans Genome Database (ACeDB)
http://probe.nalusda.gov:8300/cgi-bin/browse/acedb

This database is maintained by the USDA and was the first genome database to use what became the ACEDB software. ACeDB (with a small "e") stands for "A C. elegans DataBase" while ACEDB (with the capital "E") refers to the software. This interface to the ACEDB data is more rudimentary, and complex searches require more knowledge of the ACEDB data structure. It does, however, provide significant guidance in learning this structure. C. elegans is also a model organism for studying homeobox domains. Try querying for information on these.

GrainGenes Database
http://wheat.pw.usda.gov

GrainGenes is a compilation of molecular and phenotypic information on wheat, barley, oats, rye, and sugarcane. The project is supported by the USDA/NAL Plant Genome Research Program. This database interface is the same as for the C. elegans database.

Methanococcus jannaschii Genome Database (MJDB)
http://www.tigr.org/tdb/mdb/mjdb/mjdb.html

Curated at The Institute for Genomic Research (TIGR). This database is primarily a sequence repository with a few search tools. No genetic mapping data are included. This archaeon was one of the first organisms reported to have a fully sequenced genome.

CGSC
http://cgsc.biology.yale.edu/cgsc.html

The E. coli Genetic Stock Center curates this site and provides access to genotype, strain, gene name, linkage map, and gene product information. Much comparison of genomes is being done in bacteria, and a large portion of it is directed at finding homologous genes involved in a variety of metabolic pathways. Try searching for "enolase" in both MJDB and CGSC.

Genome Database (GDB)
http://gdbwww.gdb.org

The Genome Database is curated by the GDB organization, and this site is hosted by The Johns Hopkins School of Medicine.
This database has housed the genomic mapping data from the Human Genome Initiative, but the US DOE recently discontinued funding for the project. If you have time, visit this database and look at the resource that is being lost.
Developed and Maintained by Mark S. Whitsitt
Last Updated: Saturday, June 06, 1998 12:29 PM