Bioinformatics

STEM Cohort

Databases and data resources

New databases appear almost daily, so a comprehensive list is not feasible here; Wikipedia's ever-growing list of biological databases covers many more. Most of the databases below host publicly available data. If you have any questions about accessing the data, contact me.

	The 1000 Genomes Project is a large public catalog of human variation and genotype data. The project sequenced genomes of people across the globe in an effort to build a comprehensive understanding of human genomic variation.
	The goal of ENCODE (ENCyclopedia Of DNA Elements) is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control the cells and circumstances in which a gene is active.
	The Kyoto Encyclopedia of Genes and Genomes is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies.
	Reactome is a free, open-source, curated and peer-reviewed pathway database. The goal is to provide intuitive bioinformatics tools for the visualization, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modeling, systems biology and education.
	The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. Drawing from multiple data sources, the resource includes information on over 5,000 proteomes (the set of all proteins produced by an organism) as well as information about which proteins are involved in human disease.

	The Cancer Genome Atlas includes comprehensive, multi-dimensional maps of the key genomic changes in 33 types of cancer to improve the prevention, diagnosis, and treatment of cancer.
	The Centers for Disease Control and Prevention host the Integrative Database System with frequently updated information about human health, including a weekly summary of national mortality due to infectious disease.
	The Comparative Toxicogenomics Database is a publicly available database of information about chemical-gene/protein interactions and chemical-disease and gene-disease relationships.

	Biodiversity Information Standards (TDWG), formerly the Taxonomic Databases Working Group, develops standards for storing and sharing data about organisms. This group is responsible for the development of Darwin Core, an extension of Dublin Core for sharing information on biological diversity.
	Observational records of millions of organisms are accessible via the Global Biodiversity Information Facility (GBIF). Similar data for marine organisms is available at the Ocean Biogeographic Information System.
	The Encyclopedia of Life gathers, generates, and shares data on all species known to science. This portal provides descriptions, classifications, pictures, and geographical information about life on Earth.
	The Tree of Life Web Project is a community-contributed resource with pictures, text, and other information for organisms, living or extinct. Connections between Tree of Life web pages follow phylogenetic branching patterns between groups of organisms, so visitors can learn about evolutionary history as well as the characteristics of individual groups.
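
Many of these resources can also be queried programmatically. As a rough illustration for the biodiversity resources above, the sketch below builds a search URL for GBIF's public occurrence API and pulls a few Darwin Core terms out of one record. The helper names (build_occurrence_url, summarize_record) and the sample record are hypothetical and exist only for this example; the code makes no network request, and the exact fields returned for a real record may differ.

```python
from urllib.parse import urlencode

# Base URL of GBIF's occurrence search endpoint (API v1).
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"

# A few Darwin Core terms that commonly appear in occurrence records.
DARWIN_CORE_TERMS = (
    "scientificName",
    "decimalLatitude",
    "decimalLongitude",
    "eventDate",
)


def build_occurrence_url(scientific_name, limit=5):
    """Return an occurrence-search URL for one scientific name."""
    params = {"scientificName": scientific_name, "limit": limit}
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


def summarize_record(record):
    """Keep only the selected Darwin Core terms; missing terms become None."""
    return {term: record.get(term) for term in DARWIN_CORE_TERMS}


# Hypothetical record, shaped like one entry in a search response.
# The values are made up for illustration and are not real GBIF data.
sample_record = {
    "scientificName": "Puma concolor",
    "decimalLatitude": 35.0,
    "decimalLongitude": -106.0,
    "basisOfRecord": "HUMAN_OBSERVATION",
}

url = build_occurrence_url("Puma concolor", limit=5)
summary = summarize_record(sample_record)
print(url)
print(summary)
```

To retrieve live records, the URL could be opened with urllib.request.urlopen and the response decoded with json.load. Keeping the URL builder separate from the record summary makes each piece easy to check without a network connection.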
