

The core computation of the approach presented here is the translated alignment of microbiome sequences against the NCBI-nr database (Benson, Karsch-Mizrachi, Lipman, Ostell, & Wheeler, 2005). In early microbiome studies (Huson et al., 2007; Poinar et al., 2006; Venter et al., 2004), BLASTX (Altschul et al., 1997) was used to align small numbers of reads (on the order of hundreds of thousands) against a small database (in 2007, NCBI-nr contained approximately 2 million sequences). For subsequent studies involving hundreds of millions of reads (Mackelprang et al., 2011; Qin et al., 2010), BLASTX was run at super-computer centers. Our lab developed DIAMOND (Buchfink, Xie, & Huson, 2015b) to replace BLASTX in such analyses, providing a 20,000-fold speedup over BLASTX on short sequencing reads, while maintaining sufficient sensitivity. Analysis of short read microbiome samples usually involves determining the highest scoring alignments of each read to a set of reference sequences, followed by assignment to taxonomic and functional bins, using heuristics such as the naïve LCA (lowest common ancestor) approach for taxonomic binning (Huson et al., 2007) and the best hit approach for functional assignment (Huson, Mitra, Weber, Ruscheweyh, & Schuster, 2011). DIAMOND is used as the main alignment engine in a number of analysis pipelines (Franzosa et al., 2018; Huson et al., 2016; Zhu et al., 2017). Microbiome studies can be performed using either short read technology (Bentley, 2006; see, e.g., Willmann et al., 2015) or long read sequencing technology (Jain, Olsen, Paten, & Akeson, 2016; Rhoads & Au, 2015; see, e.g., Arumugam et al., 2019). Long reads or assembled contigs require modified algorithms during alignment and binning, and both DIAMOND and MEGAN provide long read modes to operate on long (erroneous) sequences (Arumugam et al., 2019; Huson et al., 2018).
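The naïve LCA heuristic for taxonomic binning mentioned above can be illustrated with a minimal Python sketch. It is not the implementation used in MEGAN: the toy taxonomy, the `parent` mapping, and the function names `path_to_root` and `naive_lca` are hypothetical, and the sketch assumes that the set of taxa hit by a read's highest scoring alignments has already been determined, so any score-based filtering of alignments is omitted.

```python
from typing import Dict, List, Optional


def path_to_root(taxon: str, parent: Dict[str, str]) -> List[str]:
    """Return the lineage of `taxon`, ordered from the root down to the taxon."""
    path = [taxon]
    while taxon in parent:  # the root is the only node without a parent entry
        taxon = parent[taxon]
        path.append(taxon)
    return path[::-1]


def naive_lca(taxa: List[str], parent: Dict[str, str]) -> Optional[str]:
    """Return the lowest taxon shared by the lineages of all `taxa`.

    Returns None if `taxa` is empty or the lineages share no node.
    """
    if not taxa:
        return None
    paths = [path_to_root(t, parent) for t in taxa]
    lca = None
    # Walk down all lineages in parallel until they first disagree.
    for nodes in zip(*paths):
        if all(n == nodes[0] for n in nodes):
            lca = nodes[0]
        else:
            break
    return lca


# Hypothetical toy taxonomy (child -> parent), for illustration only.
parent = {
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Firmicutes": "Bacteria",
    "Escherichia coli": "Proteobacteria",
    "Salmonella enterica": "Proteobacteria",
    "Bacillus subtilis": "Firmicutes",
}

# A read whose best alignments hit two Proteobacteria is placed on that phylum.
print(naive_lca(["Escherichia coli", "Salmonella enterica"], parent))
# A read with hits in two different phyla moves up to their common ancestor.
print(naive_lca(["Escherichia coli", "Bacillus subtilis"], parent))
```

In this sketch, a read whose alignments all point to a single species stays on that species, whereas a read with hits across distant taxa is assigned to a higher rank, which is why the heuristic is conservative for conserved proteins.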
Why align against protein sequences? While analysis of microbiome sequences using DNA alignment against genomic references is feasible, there are a number of issues with this approach. First, genomic reference databases currently cover only a small fraction of the diversity present in the environment (Wu et al., 2009). Second, the high level of redundancy of genomic sequences causes performance issues when query sequences display very large numbers of equally good alignments. Translated alignment ameliorates these issues to a degree because protein sequences are much more conserved than genomic sequences. Finally, a proper biological understanding of processes within a given microbiome requires detailed knowledge of the proteins present and their alignments to reference sequences of known function (Willmann et al., 2015).

One main approach to taxonomic and functional binning of microbiome shotgun sequences is based on protein homology (Glass, Wilkening, Wilke, Antonopoulos, & Meyer, 2010; Huson, Auch, Qi, & Schuster, 2007). In this approach, the sequences are first aligned against a reference database of protein sequences of known taxonomic and functional identity, and then the resulting alignments are used to assign the sequences to taxonomic and functional bins.

The Gemological Institute of America (GIA) color scale for white or colorless diamonds ranges from grades D to Z. Any stone within that range falls within the “normal color range.” The most highly valued diamonds have no color. Thus, the more color a stone has (yellow or brown), the lower the grade.

Yellow or brown diamonds that make it past the Z grade, however, instantly go up in price. Such diamonds have enough color to be considered “fancy,” along with pink, green, and blue diamonds. (Once these diamonds have enough saturation to show these colors, they are automatically considered fancy.) As with other diamond grading scales, diamond value goes up exponentially with each increase in grade. (Conversely, value goes down exponentially as the color grade decreases.) Whereas diamonds of any carat size hold value and find use, diamonds between L and V color grades appear less often in jewelry.
