70-8 Selecting the Best Variables from Otolith Tracers to Maximize Classification for Linear Datasets
Developing a classification model that accurately identifies the provenance of individuals is central to understanding the dynamics of any population. Otolith tracers, such as trace element chemistry, stable isotope composition, and otolith microstructure, each describe unique aspects of a habitat. Within these tracers there is a suite of variables that can be used to separate groups; for example, more than seven trace element variables are commonly used in classification studies. Few guidelines exist for selecting the variables that maximize classification when constructing a model. Variables are frequently chosen because they are easily obtained or because other investigators have used them widely. More recently, variables have been selected because their mean concentrations differ among areas, as determined by analysis of variance, or through machine learning algorithms. These methods do not address the information conveyed by each variable, and incorporating variables that add no habitat information can reduce classification accuracy. We propose a parametric method for statistically selecting the variables that are important for separating groups in linear data sets. Using Rao's additional-information criterion, we determined the change in statistical distance that results from adding a new variable to the model. We classified juvenile spotted seatrout (Cynoscion nebulosus) to nursery seagrass habitats in Chesapeake Bay using linear discriminant function analysis (LDFA). Using trace element chemistry, stable isotopes (δ13C, δ18O), and growth parameters, we found that not all variables from these tracers convey information important for delineating habitats. Based on Rao's criterion, we obtained maximum classification using three of 12 variables: barium (Ba), δ13C, and yttrium (Y) conveyed sufficient information to classify fish with over 75% accuracy.
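A minimal sketch of Rao's test of additional information for the two-group case, assuming the standard textbook form rather than the exact implementation used in this study. With p variables already in the model, q candidate variables, group sizes n1 and n2, and c² = n1·n2/(n1 + n2), the statistic F = ((n1 + n2 − p − q − 1)/q) · c²(D²(p+q) − D²(p)) / (n1 + n2 − 2 + c²·D²(p)) is compared with an F distribution on q and n1 + n2 − p − q − 1 degrees of freedom, where D² is the sample Mahalanobis distance between group means. The function names are hypothetical, and the code assumes NumPy and SciPy with fish as rows and tracer variables as columns.

```python
import numpy as np
from scipy import stats


def pooled_covariance(x1, x2):
    """Pooled within-group covariance of two samples (rows = fish, columns = variables)."""
    n1, n2 = len(x1), len(x2)
    s1 = np.cov(x1, rowvar=False)
    s2 = np.cov(x2, rowvar=False)
    return ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)


def mahalanobis_d2(x1, x2, cols):
    """Sample Mahalanobis D^2 between the two group means, using only columns `cols`."""
    cols = list(cols)
    a, b = x1[:, cols], x2[:, cols]
    diff = a.mean(axis=0) - b.mean(axis=0)
    s = np.atleast_2d(pooled_covariance(a, b))  # 1 x 1 when a single variable is used
    return float(diff @ np.linalg.solve(s, diff))


def rao_additional_information(x1, x2, base, extra):
    """Hypothetical helper: do the `extra` columns add discriminating information beyond `base`?

    Returns (D2_p, D2_{p+q}, F, p-value) for the two-group test of additional information.
    """
    n1, n2 = len(x1), len(x2)
    p, q = len(base), len(extra)
    c2 = n1 * n2 / (n1 + n2)
    # D^2 with no variables in the model is zero by definition
    d2_p = mahalanobis_d2(x1, x2, base) if p > 0 else 0.0
    d2_pq = mahalanobis_d2(x1, x2, list(base) + list(extra))
    df2 = n1 + n2 - p - q - 1
    f = (df2 / q) * c2 * (d2_pq - d2_p) / (n1 + n2 - 2 + c2 * d2_p)
    return d2_p, d2_pq, f, float(stats.f.sf(f, q, df2))
```

For example, `rao_additional_information(x_site_a, x_site_b, base=[0, 1], extra=[2])` (with hypothetical arrays `x_site_a` and `x_site_b`) asks whether the third column increases the separation of two habitats once the first two columns are already in the model; a small p-value indicates that the increase in D² is larger than expected by chance.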
This method directly assesses the information conveyed by each variable and identifies those that better predict natal habitat. It offers a simple yet powerful approach to selecting variables for classification that is ideally suited to LDFA.
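Because several nursery habitats are compared rather than two, one way to turn the criterion into a selection procedure is a forward stepwise search with the partial Wilks' lambda F-to-enter. This is a swapped-in standard technique, not necessarily the authors' procedure; for two groups and one variable added at a time, it reduces algebraically to Rao's test above with q = 1. The function names and the 0.05 stopping threshold (`alpha`) are illustrative assumptions, and the sketch assumes NumPy and SciPy, a float array `x` (fish by variables), and a NumPy array `groups` of habitat labels.

```python
import numpy as np
from scipy import stats


def wilks_lambda(x, groups, cols):
    """Wilks' lambda, det(W) / det(T), for columns `cols`; 1.0 for an empty set."""
    if len(cols) == 0:
        return 1.0
    z = x[:, cols]
    zc = z - z.mean(axis=0)
    t = zc.T @ zc  # total SSCP matrix
    w = np.zeros_like(t)
    for g in np.unique(groups):
        zg = z[groups == g]
        cg = zg - zg.mean(axis=0)
        w += cg.T @ cg  # pooled within-group SSCP matrix
    # log-determinants are more stable than det() when many variables are included
    _, logdet_w = np.linalg.slogdet(w)
    _, logdet_t = np.linalg.slogdet(t)
    return float(np.exp(logdet_w - logdet_t))


def forward_select(x, groups, alpha=0.05):
    """Hypothetical forward stepwise selection using the partial-Wilks F-to-enter."""
    n, k = x.shape
    n_groups = len(np.unique(groups))
    selected, history = [], []
    # stop when all variables are in or the error degrees of freedom run out
    while len(selected) < k and n - n_groups - len(selected) > 0:
        p = len(selected)
        df1, df2 = n_groups - 1, n - n_groups - p
        lam_p = wilks_lambda(x, groups, selected)
        best = None
        for j in range(k):
            if j in selected:
                continue
            # partial lambda < 1 when variable j improves separation given those selected
            lam_partial = wilks_lambda(x, groups, selected + [j]) / lam_p
            f = (df2 / df1) * (1.0 - lam_partial) / lam_partial
            if best is None or f > best[1]:
                best = (j, f, float(stats.f.sf(f, df1, df2)))
        j, f, pval = best
        if pval > alpha:  # no remaining candidate adds significant information
            break
        selected.append(j)
        history.append((j, f, pval))
    return selected, history
```

With tracer variables as the columns of `x`, `forward_select(x, groups)` returns variable indices in order of entry together with each step's F and p-value. The chosen subset could then be passed to an LDFA and its accuracy checked by cross-validation. This sketch is not claimed to reproduce the subset reported above.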