Data Science Specialist

Jeffrey Oliver
Office of Digital Innovation & Stewardship
University Libraries
University of Arizona
Tucson, AZ 85721

Databases and data resources

The number of biological databases grows constantly, so a comprehensive list is not feasible here; Wikipedia's list of biological databases covers many more. Most of the databases below host publicly available data. If you have any questions about accessing the data, contact me.

1K Genomes Project The 1000 Genomes Project is a large public catalog of human variation and genotype data. The project sequenced genomes of people across the globe in an effort to build a comprehensive understanding of human genomic variation.
ENCODE The goal of ENCODE (ENCyclopedia Of DNA Elements) is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control cells and circumstances in which a gene is active.
KEGG The Kyoto Encyclopedia of Genes and Genomes is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism and the ecosystem, from molecular-level information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput experimental technologies.
Reactome Reactome is a free, open-source, curated and peer-reviewed pathway database. The goal is to provide intuitive bioinformatics tools for the visualization, interpretation and analysis of pathway knowledge to support basic research, genome analysis, modeling, systems biology and education.
UniProt The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data. Drawing from multiple data sources, the resource includes information on over 5,000 proteomes (the set of all proteins produced by an organism) as well as information about which proteins are involved in human disease.
The Cancer Genome Atlas The Cancer Genome Atlas includes comprehensive, multi-dimensional maps of the key genomic changes in 33 types of cancer to improve the prevention, diagnosis, and treatment of cancer.
CDC Interactive Database Systems The Centers for Disease Control and Prevention hosts the Interactive Database Systems, which provide frequently updated information about human health, including a weekly summary of national mortality due to infectious disease.
Comparative Toxicogenomics Database The Comparative Toxicogenomics Database is a publicly available database of information about chemical-gene/protein interactions and chemical-disease and gene-disease relationships.
BISON The U.S. Geological Survey developed Biodiversity Information Serving Our Nation, an integrated and permanent resource for biological occurrence data from the United States. Over a quarter-billion records can be accessed through the web portal or via the BISON API.
Biodiversity Information Standards Biodiversity Information Standards (TDWG), formerly the Taxonomic Databases Working Group, develops standards for storing and sharing data about organisms. This group is responsible for the development of Darwin Core, an extension of Dublin Core for sharing information on biological diversity.
GBIF Observational records of millions of organisms are accessible via the Global Biodiversity Information Facility (GBIF). Similar data for marine organisms is available at the Ocean Biogeographic Information System.
Encyclopedia of Life The Encyclopedia of Life gathers, generates, and shares data on all species known to science. This portal provides descriptions, classifications, pictures, and geographical information about life on earth.
Tree of Life Web Project The Tree of Life Web Project is a community-contributed resource with pictures, text, and other information for organisms, living or extinct. Connections between Tree of Life web pages follow phylogenetic branching patterns between groups of organisms, so visitors can learn about evolutionary history as well as the characteristics of individual groups.
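
Several of the resources above, GBIF among them, can be queried programmatically as well as through a web portal. The sketch below shows one way to build a request to GBIF's public occurrence search endpoint and pull a few fields out of the JSON it returns. The helper names (build_occurrence_url, summarize_occurrences, fetch_occurrences) are hypothetical and introduced only for illustration, and the example species is arbitrary. Check the GBIF API documentation for the full set of parameters and fields before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# GBIF occurrence search endpoint (version 1 of the public REST API).
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def build_occurrence_url(scientific_name, country=None, limit=20):
    """Return a search URL for occurrence records of one taxon.

    Hypothetical helper: `country` is an ISO 3166-1 alpha-2 code such
    as "US", and `limit` caps the number of records per page.
    """
    params = {"scientificName": scientific_name, "limit": limit}
    if country is not None:
        params["country"] = country
    return GBIF_OCCURRENCE_SEARCH + "?" + urlencode(params)


def summarize_occurrences(payload):
    """Extract (name, latitude, longitude) tuples from a parsed response.

    Records without coordinates yield None for the missing values.
    """
    rows = []
    for record in payload.get("results", []):
        rows.append(
            (
                record.get("scientificName"),
                record.get("decimalLatitude"),
                record.get("decimalLongitude"),
            )
        )
    return rows


def fetch_occurrences(scientific_name, country=None, limit=20):
    """Download one page of results (requires network access)."""
    url = build_occurrence_url(scientific_name, country, limit)
    with urlopen(url, timeout=30) as response:
        return json.load(response)
```

Calling summarize_occurrences(fetch_occurrences("Danaus plexippus", country="US", limit=5)) would return up to five records for the monarch butterfly. Larger downloads are better handled by paging through results or by using GBIF's download service rather than repeated searches.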