Context
These additional data quality checks can be run on MERMAID projects with fish belt data to test and visualize differences among observers, detect outliers, and explore whether shark observations represent a large proportion of the total biomass.
In this example I export public data (i.e., from projects that have a fish belt data sharing policy of "public"). I used two different projects: one with multiple observers (for the observer comparisons) and one with shark observations (to illustrate the shark analysis section). However, the underlying code can be applied to any project for which you can access observation-level data. For example, the full resource (see link above) includes code for applying these data checks to any project of which you are a member, regardless of its data sharing policy.
Observer comparisons
Data checks for observers include tables and plots comparing biomass and taxonomic identification among observers. We include statistical comparisons among observers, but any significant differences should be interpreted with caution because observers may survey different locations at different times. Observer differences could therefore reflect spatial or temporal variation in fish communities rather than true observer effects.
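The resource's exact tables and tests are not reproduced here. As a minimal Python sketch, assume transect-level records and observation records that carry hypothetical `observer`, `biomass_kgha`, and `taxon` fields. The comparison groups biomass by observer, summarizes it, counts distinct taxa per observer, and computes a Kruskal-Wallis H statistic. The choice of Kruskal-Wallis is an assumption for illustration, not necessarily the test the resource uses. It is a rank-based test that does not require normally distributed biomass.

```python
from collections import defaultdict
from statistics import mean, median


def group_by_observer(records, value_key):
    """Collect one value per record into lists keyed by observer."""
    groups = defaultdict(list)
    for r in records:
        groups[r["observer"]].append(r[value_key])
    return dict(groups)


def summarize_biomass(groups):
    """Per-observer sample size, mean and median biomass (kg/ha)."""
    return {
        obs: {"n": len(v), "mean": mean(v), "median": median(v)}
        for obs, v in groups.items()
    }


def taxa_per_observer(observations):
    """Number of distinct taxa each observer recorded."""
    taxa = defaultdict(set)
    for o in observations:
        taxa[o["observer"]].add(o["taxon"])
    return {obs: len(t) for obs, t in taxa.items()}


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H across observer groups (no tie correction)."""
    values = [x for xs in groups.values() for x in xs]
    labels = [g for g, xs in groups.items() for _ in xs]
    ranks = average_ranks(values)
    n = len(values)
    rank_sums = defaultdict(float)
    for g, r in zip(labels, ranks):
        rank_sums[g] += r
    return 12 / (n * (n + 1)) * sum(
        rank_sums[g] ** 2 / len(xs) for g, xs in groups.items()
    ) - 3 * (n + 1)
```

To obtain a p-value, H is compared against a chi-square distribution with k - 1 degrees of freedom, where k is the number of observers. `scipy.stats.kruskal` does this and also applies a tie correction, which this sketch omits. A significant H still carries the spatial and temporal caveat above.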
For this example I have anonymized the observer names, but the section of the code that performs the anonymization can be removed to show the actual names.
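Here is a minimal sketch of the anonymization step, assuming observation records are Python dicts with a hypothetical `observer` field (the function name is also hypothetical). Names are replaced with sequential labels in alphabetical order, so the mapping is reproducible between runs. The original records are not modified, and skipping this call leaves the real names in place.

```python
def anonymize_observers(records, key="observer"):
    """Replace observer names with 'Observer 1', 'Observer 2', ... labels.

    Returns new records plus the name -> label mapping (keep it private).
    """
    names = sorted({r[key] for r in records})
    mapping = {name: f"Observer {i}" for i, name in enumerate(names, start=1)}
    # Build copies so the caller's records keep the real names
    anonymized = [{**r, key: mapping[r[key]]} for r in records]
    return anonymized, mapping
```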
Outlier detection
The data checks include identifying sites that have unusually high biomass in the project and visualizing the spread of fish biomass and fish abundance across sites.
Sites identified as statistical outliers warrant further investigation. However, these should also be interpreted with caution as differences may represent true ecological variation (e.g., protected areas, unique habitat features) rather than data errors.
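This is a minimal sketch of the two outlier rules reported in the data summary: biomass above the 90th percentile, and biomass outside the IQR fences. It assumes a hypothetical dict of site-level biomass in kg/ha. Two further choices are assumptions rather than details of the resource. The first is the conventional multiplier of 1.5 times the IQR. The second is computing quantiles with linear interpolation (`method="inclusive"`, equivalent to R's default `quantile()` type 7).

```python
from statistics import quantiles


def biomass_outliers(site_biomass, k=1.5):
    """Flag sites by the IQR rule and by exceeding the 90th percentile.

    site_biomass: dict of site name -> biomass (kg/ha); needs at least 2 sites.
    """
    values = list(site_biomass.values())
    # Linear interpolation between order statistics (R type 7)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    p90 = quantiles(values, n=10, method="inclusive")[-1]
    iqr_flags = sorted(s for s, b in site_biomass.items() if b < lower or b > upper)
    high_flags = sorted(s for s, b in site_biomass.items() if b > p90)
    return {"iqr": iqr_flags, "above_p90": high_flags,
            "fences": (lower, upper), "p90": p90}
```

A flagged site is a prompt to check the underlying transects, not a verdict. The caveat about genuine ecological variation still applies.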
Shark observations
Because shark observations can represent a large proportion of biomass, I have included visualizations comparing shark biomass to total fish biomass at sites with shark observations. A further shark-specific data check compares the estimated weight of each shark observed against the maximum published weight for its species (from FishBase). Any individual shark observation with an estimated weight greater than the published maximum may warrant further investigation. However, maximum published weights are not available for all shark species and are based on limited data for others, so these flags should also be interpreted with caution.
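Below is a minimal sketch of both shark checks. It assumes hypothetical record fields: `site`, `is_shark`, and `biomass_kgha` for observations, and `species` and `weight_kg` for individual sharks. It also assumes a hypothetical dict of maximum published weights in kg, standing in for values looked up on FishBase. Summing observation-level biomass within a site is a simplification that assumes comparable transect areas. Species with no published maximum are returned separately rather than silently passed, reflecting the caveat above.

```python
from collections import defaultdict


def shark_share_by_site(observations):
    """Fraction of summed biomass contributed by sharks, for sites with sharks."""
    total = defaultdict(float)
    shark = defaultdict(float)
    for o in observations:
        total[o["site"]] += o["biomass_kgha"]
        if o["is_shark"]:
            shark[o["site"]] += o["biomass_kgha"]
    # Only sites with at least one shark record are reported
    return {site: shark[site] / total[site] for site in shark if total[site] > 0}


def flag_shark_weights(sharks, max_weight_kg):
    """Split individual sharks into flagged (above published max) and unchecked."""
    flagged, unchecked = [], []
    for s in sharks:
        ref = max_weight_kg.get(s["species"])
        if ref is None:
            unchecked.append(s)  # no published maximum to compare against
        elif s["weight_kg"] > ref:
            flagged.append(s)
    return flagged, unchecked
```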
Data summary
In addition to the individual data checks, the code presents overall summaries of the project data, such as the following:
Sample Coverage
Total sample events: 47
Total transects: 176
Total observations: 2187
Unique sites: 36
Unique observers: 3
Taxonomic Diversity
Fish families: 21
Fish genera: 41
Fish species: 79
Biomass Statistics (kg/ha)
Mean: 168
Median: 113.6
SD: 199.1
Range: 19.6 - 1217.1
Data Quality Flags
High biomass outliers (>90th percentile): 5
Statistical outliers (IQR method): 12
Sites with shark observations: 10
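Counts and statistics like those above could be produced with a sketch like the following. It assumes hypothetical observation records with `sample_event`, `transect`, `site`, `observer`, `family`, `genus`, and `species` fields, where transect IDs are unique across the project. It also takes a separate list of biomass values in kg/ha. The level at which biomass is summarized (transect, sample event, or site) is not stated above, so pass whichever level the summary should describe. Observations not identified to a given rank, such as genus-only records, are left out of that rank's count. `statistics.stdev` is the sample standard deviation (n - 1 denominator), matching R's `sd()`.

```python
from statistics import mean, median, stdev


def project_summary(observations, biomass_values):
    """Counts of sampling units and taxa, plus biomass statistics (kg/ha)."""

    def n_unique(field):
        # Skip missing values, e.g. species on genus-level identifications
        return len({o[field] for o in observations if o.get(field)})

    return {
        "sample_events": n_unique("sample_event"),
        "transects": n_unique("transect"),
        "observations": len(observations),
        "sites": n_unique("site"),
        "observers": n_unique("observer"),
        "families": n_unique("family"),
        "genera": n_unique("genus"),
        "species": n_unique("species"),
        "biomass_mean": round(mean(biomass_values), 1),
        "biomass_median": round(median(biomass_values), 1),
        "biomass_sd": round(stdev(biomass_values), 1),  # sample SD (n - 1)
        "biomass_range": (min(biomass_values), max(biomass_values)),
    }
```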