Fishbelt Data Checks

Published 14 Jan 2026, Updated 18 Feb 2026
By Iain R. Caldwell, Lead Data Analyst, MERMAID
Fishbelt
Quarto
R
Bar Plot
Scatterplot
Histogram

Context

These additional data quality checks can be run on MERMAID projects with fish belt data to test and visualize differences among observers, detect outliers, and explore whether shark observations represent a large proportion of the total biomass.

In this example I export public data (i.e., from projects that have a fish belt data sharing policy of "public"). I have used two different projects in this case: one with multiple observers (for observer comparisons) and one with shark observations (to illustrate the shark analysis section). However, the underlying code can be applied to any project for which you can access observation-level data. For example, the full resource (see link above) includes code that applies these data checks to any project of which you are a member, regardless of its data sharing policy.
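Observation-level records matter because they let biomass be recomputed and regrouped however a check needs. Below is a minimal sketch of that arithmetic. It is written in Python, while the resource itself uses R. The field names are hypothetical, and it assumes lengths in cm, weights in grams, and a single fixed belt length and width in meters for every transect. It applies the standard length-weight relationship W = a·L^b and divides by the surveyed area to get kg/ha. Where an export already includes a biomass column, this step is unnecessary.

```python
from collections import defaultdict


def fish_weight_g(a, b, length_cm):
    """Length-weight relationship W = a * L**b (grams, with L in cm)."""
    return a * length_cm ** b


def transect_biomass_kgha(observations, transect_len_m, width_m):
    """Sum fish weight per transect and scale it to kg/ha.

    `observations` is a list of dicts with hypothetical keys
    'transect', 'a', 'b', 'size_cm' and 'count'. One fixed belt
    length and width is assumed for every transect.
    """
    area_ha = transect_len_m * width_m / 10_000  # m^2 -> ha
    grams = defaultdict(float)
    for obs in observations:
        weight = fish_weight_g(obs["a"], obs["b"], obs["size_cm"])
        grams[obs["transect"]] += weight * obs["count"]
    # g -> kg, then divide by the surveyed area in hectares
    return {t: g / 1000 / area_ha for t, g in grams.items()}
```

As an illustration with made-up parameters, five 10 cm fish with a = 0.01 and b = 3 weigh 10 g each. On a 50 m × 5 m belt (0.025 ha), that is 0.05 kg / 0.025 ha = 2 kg/ha.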

Observer comparisons

Data checks for observers include tables and plots to compare biomass and taxonomic identification among observers. We include statistical comparisons among observers, but any significant differences should be interpreted with caution, as observers may survey in different locations and at different times. Therefore, observer differences could reflect spatial or temporal variation in fish communities rather than true observer effects.
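The text does not name the statistical test it uses. As one plausible option, offered here as an assumption, the sketch below pairs a per-observer summary table with a tie-corrected Kruskal-Wallis H statistic. This rank-based test does not assume normally distributed biomass. The input is a hypothetical mapping of observer to a list of transect biomass values (kg/ha), and only the Python standard library is used.

```python
from statistics import mean, median


def average_ranks(values):
    """Rank values 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def observer_summary(biomass_by_observer):
    """Per-observer (n transects, mean, median) of biomass in kg/ha."""
    return {
        obs: (len(vals), mean(vals), median(vals))
        for obs, vals in biomass_by_observer.items()
    }


def kruskal_wallis_h(biomass_by_observer):
    """Tie-corrected Kruskal-Wallis H and degrees of freedom (k - 1).

    Undefined (division by zero) if every pooled value is identical.
    """
    groups = list(biomass_by_observer.values())
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, start = 0.0, 0
    for g in groups:
        r_sum = sum(ranks[start:start + len(g)])
        rank_term += r_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over tied values
    tie_sizes = {}
    for x in pooled:
        tie_sizes[x] = tie_sizes.get(x, 0) + 1
    ties = sum(t ** 3 - t for t in tie_sizes.values())
    return h / (1 - ties / (n ** 3 - n)), len(groups) - 1
```

With three observers (df = 2), an H above about 5.99 corresponds to p < 0.05. For example, the groups [1, 2, 3], [4, 5, 6] and [7, 8, 9] give H = 7.2.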

For this example, I have anonymized the observer names, but the section of the code that performs the anonymization can be removed to show the actual names.
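Below is a minimal sketch of one way to anonymize, assuming the observer names are available as a list, and written in Python rather than the resource's R. Each distinct name maps to a stable label such as "Observer 1". Sorting the names first keeps the labels reproducible between runs. The resource's own anonymization step may work differently.

```python
def anonymize_observers(names):
    """Map each distinct observer name to 'Observer 1', 'Observer 2', ...

    Names are sorted first so the same project always gets the same labels.
    Returns the relabelled list and the name -> label lookup.
    """
    lookup = {
        name: f"Observer {i}"
        for i, name in enumerate(sorted(set(names)), start=1)
    }
    return [lookup[n] for n in names], lookup
```

The returned lookup maps labels back to real names, so it should be kept out of any shared output.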

Biomass by observer.
Taxonomic identification by observer.
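The caption does not define the identification metric. One plausible comparison, sketched here in Python as an assumption, is the share of each observer's records that were resolved to species, to genus only, or to family only. The records are hypothetical (observer, rank) pairs, where rank is the finest level the fish was identified to.

```python
from collections import Counter, defaultdict


def id_resolution_by_observer(records):
    """Return {observer: {rank: proportion of that observer's records}}."""
    counts = defaultdict(Counter)
    for observer, rank in records:
        counts[observer][rank] += 1
    return {
        obs: {rank: n / sum(c.values()) for rank, n in c.items()}
        for obs, c in counts.items()
    }
```

An observer with a much higher genus-only or family-only share than colleagues may be worth following up, subject to the same spatial and temporal caveats as the biomass comparison.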

Outlier detection

The data checks include identifying sites that have unusually high biomass in the project and visualizing the spread of fish biomass and fish abundance across sites.

Sites identified as statistical outliers warrant further investigation. However, these should also be interpreted with caution as differences may represent true ecological variation (e.g., protected areas, unique habitat features) rather than data errors.

Biomass and abundance by site, with red points indicating sites with unusually high biomass (z-score > 2) that fall above the red threshold line.
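Below is a minimal Python sketch of the z-score flag described in the caption. It assumes a hypothetical dict of site to biomass (kg/ha) and uses the sample standard deviation. The resource may define the SD or the site-level aggregation differently.

```python
from statistics import mean, stdev


def high_biomass_sites(biomass_by_site, threshold=2.0):
    """Return {site: z} for sites whose biomass z-score exceeds threshold."""
    values = list(biomass_by_site.values())
    mu, sd = mean(values), stdev(values)  # sample SD (n - 1 denominator)
    z = {site: (b - mu) / sd for site, b in biomass_by_site.items()}
    return {site: score for site, score in z.items() if score > threshold}
```

With the sample SD, no z-score can exceed (n - 1)/√n. This means that with five or fewer sites, no site can pass a threshold of 2, however extreme it is.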

Shark observations

Since shark observations can represent a large proportion of biomass, I have included visualizations of shark biomass compared to total fish biomass for sites with shark observations. A further shark-specific check compares the estimated weight of each shark observed against the maximum published weight for its species (from FishBase). Any individual shark flagged as heavier than the published maximum may warrant further investigation. However, maximum published weights are not available for all shark species and are based on sparse data for others, so this check should also be interpreted with caution.
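The shark-share calculation reduces to a per-site ratio. Below is a minimal Python sketch, assuming per-site shark biomass and total fish biomass (kg/ha, with sharks included in the total) have already been summed into hypothetical dicts. Only sites with shark biomass are reported.

```python
def shark_share_by_site(shark_kgha, total_kgha):
    """Proportion of each site's total fish biomass contributed by sharks.

    Only sites with recorded shark biomass (and a positive total) are returned.
    """
    return {
        site: shark / total_kgha[site]
        for site, shark in shark_kgha.items()
        if shark > 0 and total_kgha.get(site, 0) > 0
    }
```

Sorting the result in descending order puts the sites where sharks dominate the total at the top.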

Shark biomass vs. total fish biomass by site.
Weights of shark observations compared to maximum published weights by species.
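Below is a Python sketch of the individual-weight check. Each observation is a hypothetical (species, estimated weight in kg) pair, with the weight estimated from the recorded length. The maximum weights would come from FishBase, and the values used in any example are placeholders. A species with no published maximum is reported as "no reference" rather than silently passed, which reflects the sparse reference data the text warns about.

```python
def flag_shark_weights(observations, max_weight_kg):
    """Classify each shark observation against its species' published maximum.

    observations: list of (species, estimated_weight_kg) tuples.
    max_weight_kg: {species: max published weight in kg}; may be incomplete.
    """
    results = []
    for species, weight in observations:
        max_w = max_weight_kg.get(species)
        if max_w is None:
            status = "no reference"  # no published maximum to compare with
        elif weight > max_w:
            status = "exceeds max"
        else:
            status = "ok"
        results.append((species, weight, status))
    return results
```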

Data summary

In addition to the individual data checks, the code also presents overall summaries of the project data, such as the following:

Sample Coverage

  • Total sample events: 47

  • Total transects: 176

  • Total observations: 2187

  • Unique sites: 36

  • Unique observers: 3
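Coverage figures like these are distinct counts over the observation table. Below is a minimal Python sketch with hypothetical field names. It assumes transect numbers restart within each sample event, so a transect is identified by its (sample event, transect number) pair.

```python
def sample_coverage(rows):
    """Distinct-count summary over observation rows (dicts, hypothetical keys)."""
    return {
        "sample_events": len({r["sample_event"] for r in rows}),
        # Transect numbers repeat across events, so count (event, number) pairs
        "transects": len({(r["sample_event"], r["transect_number"]) for r in rows}),
        "observations": len(rows),
        "sites": len({r["site"] for r in rows}),
        "observers": len({r["observer"] for r in rows}),
    }
```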

Taxonomic Diversity

  • Fish families: 21

  • Fish genera: 41

  • Fish species: 79
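Counting taxa takes a little care, because records identified only to genus or family should not inflate the species tally. Below is a Python sketch that assumes each row has hypothetical 'family', 'genus' and 'species' fields. The 'species' field holds the full binomial, and a field is None where identification stopped at a coarser rank.

```python
def taxonomic_diversity(rows):
    """Count distinct families, genera and species, ignoring missing ranks."""
    def distinct(key):
        return len({r[key] for r in rows if r.get(key) is not None})

    return {
        "families": distinct("family"),
        "genera": distinct("genus"),
        "species": distinct("species"),  # full binomials, so epithets can't collide
    }
```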

Biomass Statistics (kg/ha)

  • Mean: 168

  • Median: 113.6

  • SD: 199.1

  • Range: 19.6 - 1217.1
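The summary does not state whether these statistics are computed over sites, sample events or transects. Whichever unit is used, the calculation is standard. Below is a Python sketch that rounds to one decimal place and uses the n - 1 standard deviation, which matches R's sd().

```python
from statistics import mean, median, stdev


def biomass_stats(values_kgha):
    """Mean, median, sample SD and range of biomass values (kg/ha), 1 d.p."""
    return {
        "mean": round(mean(values_kgha), 1),
        "median": round(median(values_kgha), 1),
        "sd": round(stdev(values_kgha), 1),  # n - 1 denominator, like R's sd()
        "range": (round(min(values_kgha), 1), round(max(values_kgha), 1)),
    }
```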

Data Quality Flags

  • High biomass outliers (>90th percentile): 5

  • Statistical outliers (IQR method): 12

  • Sites with shark observations: 10
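The two outlier counts use different rules, and the text does not give their exact settings. Below is a Python sketch of both. It assumes the common 1.5 × IQR fences (Tukey's rule), counting values beyond either fence. It also uses `statistics.quantiles(..., method="inclusive")`, whose interpolation matches R's default quantile type 7. Treat the constants and the quantile convention as assumptions.

```python
from statistics import quantiles


def above_percentile(values, pct=90):
    """Count values strictly above the given percentile (type-7 interpolation)."""
    cut = quantiles(values, n=100, method="inclusive")[pct - 1]
    return sum(v > cut for v in values)


def iqr_outliers(values, k=1.5):
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return sum(v < q1 - k * iqr or v > q3 + k * iqr for v in values)
```

Because the IQR rule also flags unusually low values, its count need not be a superset of the percentile count, so the two flags are best read side by side.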