Data Science Specialist

Jeffrey Oliver
Online Tools

A number of online analytical resources let researchers use high-performance computing clusters to analyze large-scale datasets. The resources listed here are just a few of the most commonly used tools in the field of bioinformatics.

CyVerse
CyVerse provides a national cyberinfrastructure for life science research, as well as training for scientists in using such high-performance computing resources. Formerly the iPlant Collaborative, CyVerse provides a platform for data storage, bioinformatics tools, image analyses, and cloud services. CyVerse also allows programmatic access through multiple APIs. The CyVerse Learning Center provides valuable information for getting started with CyVerse resources.

Galaxy
Galaxy is a platform for performing data-intensive biomedical research. This open source project can be used either on the free public server or on your own instance. With thousands of tools available, Galaxy allows you to build robust, reproducible bioinformatic pipelines. You can also use one of the hundreds of published workflows to analyze your own data. A number of Galaxy tutorials are available, providing examples of how to use Galaxy.

Gene Expression Omnibus
GEO2R can be used to perform differential gene expression analyses on datasets that are hosted at NCBI's Gene Expression Omnibus. The tool allows you to define how samples are compared and offers flexibility in significance calculations and output formats. Best of all, the calculations are performed in real time by a dynamically generated R script, which can be copied and used on your own data!

NCBI Entrez APIs
The E-utilities API provides programmatic access to several NCBI databases, including PubMed, Genome, Nucleotide, and dbSNP; this API allows you to perform queries without using a web browser. EDirect is a library of Perl scripts that offers UNIX command-line utilities for automating E-utilities queries and processing results.

AmiGO
The Gene Ontology Consortium has produced a suite of tools available through the AmiGO site. In addition to navigating Gene Ontology annotations, AmiGO offers enrichment analysis tools and an SQL interface for querying the GO database.

CIPRES
Providing cyberinfrastructure for evolution research, CIPRES is a public resource for the estimation of large phylogenetic trees. The high-performance inference tools include RAxML and MrBayes. A number of sequence alignment tools, including MAFFT, MUSCLE, and ClustalW, are also available. Online tutorials show how to use the data management and analytical resources available at CIPRES.
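
As a rough illustration of the programmatic access described in the NCBI Entrez APIs entry above, the following Python sketch builds an E-utilities ESearch query and extracts record IDs from a JSON response. The function names, the search term, and the sample response are hypothetical and chosen only for illustration; the sketch assumes the public ESearch endpoint and its `db`, `term`, `retmax`, and `retmode` parameters, and it is not the only way to query these databases.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, retmax=20):
    """Return an ESearch URL that requests JSON output (hypothetical helper)."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def parse_esearch_ids(payload):
    """Extract the list of record IDs from an ESearch JSON response string."""
    return json.loads(payload)["esearchresult"]["idlist"]


def search_ids(db, term, retmax=20):
    """Run the query over the network and return matching record IDs."""
    with urlopen(build_esearch_url(db, term, retmax), timeout=30) as response:
        return parse_esearch_ids(response.read().decode("utf-8"))


# Offline demonstration: build a URL and parse a made-up response,
# so nothing here contacts NCBI unless search_ids() is called.
url = build_esearch_url("pubmed", "phylogenetics[Title]", retmax=5)
print(url)

sample_response = '{"esearchresult": {"count": "2", "idlist": ["111", "222"]}}'
print(parse_esearch_ids(sample_response))
```

Calling `search_ids("pubmed", "phylogenetics[Title]")` would send the request to NCBI. For anything beyond occasional queries, check NCBI's usage guidelines on request rates and API keys before automating.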