Clustering of Pyrosequence Data: Methods to Produce Accurate Estimates of Species Richness

Flynn, Jullien

The combination of pyrosequencing with metabarcoding can be a rapid and effective approach for the early detection of invasive species. However, the technical complexity of the data can lead to taxon misidentification and to inaccurate estimates of species richness. To identify invasive species properly from such data, one of the most important steps is clustering sequences by similarity into operational taxonomic units (OTUs), which distinguish between species in the sequenced sample. Critical factors include the choice of clustering algorithm and the treatment of alignment gaps in the sequence identity calculation. Here we assembled an artificial community of 69 known zooplankton species and tested three algorithms (UCLUST, UPARSE and mothur), each under three alternative identity calculations that treat the presence and length of sequence gaps differently when assigning OTUs. Depending on the algorithm and identity calculation used, we recovered a number of OTUs close to the number of species in the artificial community (from 51 to 68 OTUs using UCLUST). Through this work, we hope to identify the most appropriate clustering algorithm and parameters for accurately describing species richness and identifying taxa in natural samples.
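To illustrate why the treatment of gaps matters, the following is a minimal sketch of three ways to score the identity of one aligned sequence pair. The abstract does not define the three calculations that were tested, so the modes here (skip gapped columns, penalize every gapped column, penalize each gap run once) are assumptions chosen for illustration, and the function name `identity` and the example sequences are hypothetical.

```python
def identity(a: str, b: str, gap_mode: str = "each") -> float:
    """Fraction of identical columns in a pairwise alignment.

    gap_mode (illustrative assumptions, not the study's definitions):
      "ignore" - columns containing a gap are skipped entirely
      "each"   - every gapped column counts as one mismatch
      "one"    - a run of consecutive gapped columns counts as one mismatch
    """
    if gap_mode not in ("ignore", "each", "one"):
        raise ValueError(f"unknown gap_mode: {gap_mode}")
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")

    matches = 0
    compared = 0
    in_gap_run = False
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # a column that is a gap in both sequences is uninformative
        if x == "-" or y == "-":
            if gap_mode == "each":
                compared += 1
            elif gap_mode == "one" and not in_gap_run:
                compared += 1  # charge the whole run only once
            in_gap_run = True
            continue
        in_gap_run = False
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


# Hypothetical aligned pair: one two-column gap and one substitution.
seq_a = "ACGT--ACGT"
seq_b = "ACGTTTACGA"
scores = {mode: identity(seq_a, seq_b, mode) for mode in ("ignore", "each", "one")}
```

The same pair scores differently under each mode, so at a fixed identity threshold two sequences may fall into the same OTU under one calculation and into separate OTUs under another, which in turn changes the OTU count.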

144th Annual Meeting of the American Fisheries Society

August 17 - 21, 2014

P-63