T-122-12
Data Validation and Sharing in a Large Research Program

Curt Seeliger, c/o USEPA, Raytheon Information Services, Corvallis, OR
Philip Kaufmann, Western Ecology Division, United States Environmental Protection Agency, Corvallis, OR
Each year, the US EPA’s National Aquatic Resource Surveys (NARS) collect data from more than 1,000 sites, rotating every 5 years through lakes, wadeable streams, boatable rivers, wetlands, and near-coastal estuaries. Physical habitat data from lakes and flowing waters constitute a substantial portion of the data generated by NARS. Thousands of pieces of information collected at each site go into raw data files that must be verified and validated before usable metrics describing habitat structure and condition can be calculated. The data must also be stored securely and made available to internal and external users with shifting and competing priorities. Appropriate data handling practices are critical to supporting the national ecological condition assessments based on these data, and to facilitating use of the data by other federal agencies, states, tribes, and academic researchers. Determining best data handling practices is an ongoing effort for NARS. In this presentation, we focus on the well-understood NARS data validation process and the still-evolving NARS data sharing process. Both processes rely on version control at different scales to enhance the flexibility and repeatability of analyses while minimizing cost and inefficiency.