77-26 What You Don't Know Can Hurt You: Bias in Genotyping Uncertainty

Sewall F. Young, Molecular Genetics Laboratory, Washington Department of Fish and Wildlife, Olympia, WA
Scott Blankenship, Washington Department of Fish and Wildlife, Olympia, WA
Kenneth I. Warheit, Science Division, Washington Department of Fish and Wildlife, Olympia, WA
Genetic variation in natural and cultured populations is commonly used to make inferences about population structure and history, ecological relationships, adaptive responses to environmental variation, and patterns of distribution and human exploitation. Declining costs of data collection, widely available software packages for genetic data analysis, and increased genetic literacy among biologists have put the power of population genetics applications into the fisheries managers’ toolbox. But all quantitative analyses, genetic and non-genetic, rely on representative data, so it is important to understand potential sources of bias in data collection. In the case of genotype data collection, we show that exclusion of ambiguous data points in genotyping cluster plots is non-random and can introduce bias into allele and genotype frequency estimates. At a diallelic locus in a population sample where both alleles occur at a frequency of 0.5, exclusion of a subset of heterozygotes will not change allele frequency estimates, but it will decrease the observed heterozygosity and can lead to erroneous inferences about random mating. If allele frequencies are not equal in the population, then exclusion of a subset of heterozygotes will cause an underestimate of the minor allele frequency, which can undermine applications of allele frequency data. We explore the potential magnitude of the bias and suggest strategies for minimizing its effects on data analyses. Our recommendations include careful screening of genotyping assays to eliminate those that produce weak clustering, using laboratory processes that improve genotyping signals, and designing genotyping studies with an excess of markers to allow post-hoc elimination of loci with substantial missing data.
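
The direction of the bias described above can be illustrated with a short calculation. The sketch below is not the authors' analysis. It assumes a diallelic locus in Hardy-Weinberg proportions with true minor allele frequency q, and it assumes that a fixed fraction d of heterozygotes is excluded while all homozygotes are retained. The function name `biased_estimates` and both parameters are hypothetical and chosen only for illustration.

```python
def biased_estimates(q, d):
    """Return (estimated minor allele frequency, observed heterozygosity)
    after excluding a fraction d of heterozygotes.

    Assumptions (illustrative, not taken from the abstract):
      - diallelic locus in Hardy-Weinberg proportions before exclusion
      - q is the true minor allele frequency, 0 < q <= 0.5
      - only heterozygotes are excluded, at rate d, 0 <= d < 1
    """
    p = 1.0 - q
    hom_major = p * p                  # expected frequency of major homozygotes
    het = 2.0 * p * q * (1.0 - d)      # heterozygotes that survive exclusion
    hom_minor = q * q                  # expected frequency of minor homozygotes
    total = hom_major + het + hom_minor  # fraction of samples retained

    # Each retained heterozygote carries one copy of the minor allele,
    # each minor homozygote carries two.
    q_hat = (hom_minor + 0.5 * het) / total
    h_obs = het / total
    return q_hat, h_obs


# Compare the true and estimated values for a few hypothetical scenarios.
for q in (0.5, 0.3, 0.1):
    for d in (0.0, 0.1, 0.3):
        q_hat, h_obs = biased_estimates(q, d)
        h_exp = 2.0 * q * (1.0 - q)
        print(f"q={q:.2f} d={d:.2f}  q_hat={q_hat:.4f}  "
              f"H_obs={h_obs:.4f}  H_exp={h_exp:.4f}")
```

Under these assumptions the estimate stays at 0.5 when q = 0.5 regardless of d, while observed heterozygosity falls below its expected value. For q below 0.5, the estimated minor allele frequency falls below q whenever d is greater than zero.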