Bioinformatics

Data Science Specialist

Jeffrey Oliver
he/him/his
Contact:
Research Engagement
University Libraries
University of Arizona
Tucson, AZ 85721
520-626-9215

Want to learn more?

Here are some links to resources for learning how to use some of the more common bioinformatics analysis tools.

Bioconductor learning resources: Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data. This community-driven project has several resources for learning how to use the tools in the Bioconductor R packages, including example workflows, hands-on courses (including an introduction to R, the language of Bioconductor), and videos on the Bioconductor YouTube channel.
CyVerse learning resources: The CyVerse Learning Center has information about how to use the tools available at CyVerse (formerly the iPlant Collaborative), from creating an account to using APIs. The CyVerse wiki also has a growing list of tutorials for performing analyses on the CyVerse platform.
Galaxy learning resources: The Galaxy Project bioinformatics pipeline has several videos with screencasts of how Galaxy can be used. The Galaxy 101 project provides a hands-on tutorial for getting started with the resource. And finally, several community-contributed learning resources are also available.
NCBI YouTube Channel: NCBI has created several informational and training videos and made them available on the National Library of Medicine YouTube channel. These videos range from introductions of resources that are only a few minutes long to in-depth, multi-video series on using NCBI resources for next-generation sequencing analyses.
Trinity learning resources: Trinity is a software suite for de novo transcriptome assembly. On the Trinity Wiki you can find a series of screencasts showing how Trinity is used. There is also an RNA-Seq Workshop, which provides hands-on experience using the Trinity analysis tools.

The bioinformatics literature is vast, but here are a few useful overviews and some "classics" in the field:

  • Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. Journal of Molecular Biology 215(3):403-410. doi:10.1016/S0022-2836(05)80360-2. BLAST (basic local alignment search tool) may be the most widely used algorithm in bioinformatics and remains a standard in genomics research today.
  • Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, et al. (2000) Gene Ontology: tool for the unification of biology. Nature Genetics 25:25-29. doi:10.1038/75556. The concept of Gene Ontology, a formalized means of categorizing gene functions, locations, and the biological processes they are involved in, has proved immensely important for investigating biological pathways.
  • Dayhoff MO, Ledley RS (1962) Comprotein: a computer program to aid primary protein structure determination. In Proceedings of the Fall Joint Computer Conference, 1962, 262-274. Santa Monica, CA: American Federation of Information Processing Societies, 1962. doi:10.1145/1461518.1461546. A description of a computer program developed to determine protein structure, in what many consider to be the first bioinformatics publication.
  • Eisen MB, Spellman PT, Brown PO, Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences USA 95:14863-14868. PMID: 9843981. This description of clustering analyses introduces the intuitive, and now standard, red/green presentation of differential gene expression.
  • Noble WS (2009) A quick guide to organizing computational biology projects. PLoS Computational Biology 5(7): e1000424. doi:10.1371/journal.pcbi.1000424. This guide shows a logical organizational structure with suggestions for managing diverse assets including computer code, data, and documents.
  • Searls DB (2010) The roots of bioinformatics. PLoS Computational Biology 6(6): e1000809. doi:10.1371/journal.pcbi.1000809. In this retrospective, Searls provides an explanation of the scientific legacy leading to the bioinformatics synthesis.
  • Shade A, Teal TK (2015) Computing workflows for biologists: A roadmap. PLoS Biology 13(11): e1002303. doi:10.1371/journal.pbio.1002303. This paper covers some best practices in bioinformatics and is a must-read for students and faculty alike.
  • Check out bioinformatics publications from the past week
  • And you can always search for bioinformatics resources at the library!

A key skill in bioinformatics is the ability to interpret and write computer code for custom analyses. There are many resources for learning how to write computer code, including online instruction, Software and Data Carpentry workshops, and University of Arizona courses. If you are just starting out and want to learn how to write computer code for bioinformatic analyses, here are a few places to start:

Learn R
  • To get started in R, check out Swirl, which provides an interactive interface to the R language.
  • Software Carpentry has two introductory lessons for learning R (Lesson 1, Lesson 2), both designed for novice audiences.
Learn Python
Learn Linux
  • Familiarity with the command-line interface of the Linux shell is a very important skill in bioinformatics, and you can get started with a novice lesson from the Software Carpentry Foundation.
  • There is also an interactive learning resource at learnshell.org.
  • And be sure to check out this YouTube video from NCBI on basic Linux commands.