Simply put, bioinformatics is the application of computer science to the analysis of biological data. It is an interdisciplinary field that also draws on mathematics and statistics to make sense of biological datasets that are too large or too complex to be analyzed by conventional methods. The term "bioinformatics" was coined in 1970 to describe the dynamics of information in biological systems (Hogeweg 2011); today it is applied to the acquisition and analysis of data on biomolecules, especially DNA, RNA, and proteins.
The field has grown rapidly in recent years, driven by the growth in available data and by the computational resources needed, and now available, to analyze it. As a case in point, the first human genome sequence took approximately 13 years to complete, at a cost somewhere between $500 million and $3 billion USD (National Human Genome Research Institute 2016, Wetterstrand 2016). By 2015, a complete human genome could be sequenced in a matter of days for approximately $1,500.
The field of bioinformatics is still evolving, and by some definitions it now includes large-scale studies of biodiversity, evolutionary biology, and public health. While definitions vary, most scientists would agree that bioinformatics has been instrumental in extracting insight from data across a variety of fields in this era of big data.
Here's a sampling of what we have learned from bioinformatics:
- The communities of bacteria that live in our digestive system, also known as our "gut microbiota," are critically important to our well-being, and disturbances to these communities can have profound effects on human health (Clemente et al. 2012).
- Dogs were most likely domesticated by hunter-gatherers, before humans started forming large agricultural communities. This discovery was made possible by whole-genome sequencing and comparison of domestic dogs (Canis lupus familiaris) and wolves (Canis lupus) (Freedman et al. 2014, Skoglund et al. 2015).
- Several genes contribute to Parkinson's disease, reflecting the complex nature of the disease. Through meta-analysis of several genome-wide association studies, Nalls et al. (2014) showed that multiple genes of small effect, rather than a few genes of large effect, contribute to the risk of Parkinson's disease.
- As part of its mission to identify emerging infectious diseases, the CDC discovered the Bourbon virus through next-generation sequencing (Kosoy et al. 2014). Applying bioinformatic analyses to the viral genome sequence, scientists showed that this virus was most closely related to Dhori and Batken viruses, which had never been recorded in the Western Hemisphere.
References:
- Clemente JC, Ursell LK, Parfrey LW, Knight R (2012) The Impact of the Gut Microbiota on Human Health: An Integrative View. Cell 148(6):1258-1270. doi:10.1016/j.cell.2012.01.035
- Freedman AH et al. (2014) Genome Sequencing Highlights the Dynamic Early History of Dogs. PLoS Genetics 10(1): e1004016. doi:10.1371/journal.pgen.1004016
- Hogeweg P (2011) The Roots of Bioinformatics in Theoretical Biology. PLoS Computational Biology 7(3): e1002021. doi:10.1371/journal.pcbi.1002021
- Kosoy OI, Lambert AJ, Hawkinson DJ, Pastula DM, Goldsmith CS, Hunt DC, Staples JE (2014) Novel Thogotovirus species associated with febrile illness and death, United States, 2014. Emerging Infectious Diseases 21(5):760-764. doi:10.3201/eid2105.150150
- Nalls MA et al. (2014) Large-scale meta-analysis of genome-wide association data identifies six new risk loci for Parkinson's disease. Nature Genetics 46:989-993. doi:10.1038/ng.3043
- National Human Genome Research Institute (2016) The Cost of Sequencing a Human Genome. URL: www.genome.gov/27565109/the-cost-of-sequencing-a-human-genome/. Accessed 2016-06-23.
- Skoglund P, Ersmark E, Palkopoulou E, Dalén L (2015) Ancient Wolf Genome Reveals an Early Divergence of Domestic Dog Ancestors and Admixture into High-Latitude Breeds. Current Biology 25(11):1515-1519. doi:10.1016/j.cub.2015.04.019
- Wetterstrand KA (2016) DNA Sequencing Costs: Data from the NHGRI Genome Sequencing Program (GSP). URL: www.genome.gov/sequencingcostsdata. Accessed 2016-06-23.