Peskas is a software platform designed for the comprehensive management of small-scale fisheries data, from data collection to data visualisation and publication. It uses an automated digital infrastructure to process and visualise fisheries data, supporting data-driven decisions for sustainable fishery management. The following subsections describe Peskas's architecture and functionalities in detail.
2.1 Software architecture
Peskas is composed of two core R packages: peskas.timor.pipeline [9], dedicated to the data workflow, and peskas.timor.portal [10], intended for visualisation. The R language was selected for its statistical computation and graphical capabilities, which suit data analysis tasks. Docker containerisation is a cornerstone of the architecture, ensuring platform independence and streamlined deployment across diverse environments. Version control and automated workflows are managed through GitHub and GitHub Actions, supporting continuous integration and deployment practices, while Google Cloud Platform is used for data storage. For interactive data visualisation, a Shiny web application based on a Bootstrap 5 UI kit (Tabler) and running on Google Cloud Run provides a dynamic, multilingual interface, whereas Rmarkdown is used to generate detailed reports integrating both analysis and outputs. Data dissemination is carried out through the Dataverse API, enabling open access to the data sets curated by Peskas.
The architecture of Peskas was designed to minimise costs and to be adaptable and scalable, allowing for flexibility and ease of maintenance. From data collection and preprocessing to analysis, validation, and dissemination, each step runs within a Docker environment, ensuring consistency and reproducibility. GitHub Actions automates the entire pipeline, enabling daily updates to the data and visualisations provided by the portal. Figure illustrates the architecture of the Peskas platform, in which all components work together seamlessly to provide reliable, scalable, and insightful analytics for fisheries management.
2.2 Software functionalities
The Peskas platform is engineered to deliver a comprehensive array of functionalities that streamline the collection, processing, analysis, and presentation of fisheries data. This multifaceted approach not only enhances the utility of the platform for various stakeholders but also ensures the integrity and accessibility of the data, as the following subsections describe. Peskas consists of six core modules, each dedicated to a particular step of the data flow and composed in turn of a series of functions:
- Data Collection: KoboToolbox surveys and continuous, solar-powered GPS vessel trackers collect and transmit data in near real-time, alongside fishery metadata, for a thorough data-gathering process.
- Pre-processing: Data formatting, shaping, and standardisation to prepare the raw data for analysis.
- Validation: Outlier detection and error identification, including an alert system to maintain data quality.
- Analytics: Modelling of fisheries indicators, nutritional characterisation, and data mining to extract valuable insights.
- Data export: Automated dissemination of processed and analysed fisheries data to ensure accessibility and comprehension, including restructuring data for dashboard integration and open publication.
- Visualisation: Tools for data reporting and sharing insights through a comprehensive dashboard.
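The sequential flow through these modules can be sketched in a few lines of R. The functions below are toy placeholders invented for illustration, not the actual peskas.timor.pipeline API, and the visualisation step is omitted for brevity:

```r
# Toy sketch of the module flow; placeholder functions only, not the real
# peskas.timor.pipeline API (visualisation omitted for brevity).
collect    <- function()  data.frame(trip = 1:3, catch_kg = c(12, 15, 400))
preprocess <- function(d) transform(d, catch_kg = as.numeric(catch_kg))
validate   <- function(d) d[d$catch_kg < 100, ]   # drop a gross outlier
analyse    <- function(d) c(mean_catch = mean(d$catch_kg))
export     <- function(x) {
  f <- tempfile(fileext = ".csv")
  write.csv(as.data.frame(t(x)), f, row.names = FALSE)
  f
}

result <- analyse(validate(preprocess(collect())))
csv    <- export(result)
result  # mean catch per trip over the validated records (13.5 kg)
```

Each real module is considerably richer, but the shape is the same: every step consumes the previous step's output, so the whole chain can be rerun daily by GitHub Actions.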
2.2.1 Data Collection and Integration
A key functionality of Peskas is its capability to automate the retrieval of data from a wide range of sources. Peskas leverages APIs such as the one provided by KoboToolbox to automate the retrieval of survey data. While KoboToolbox is a key tool for field data collection in diverse environments, Peskas is designed to work with any similar XForms-based data collection platform. For tracking vessel movements, Peskas integrates with Pelagic Data Systems (PDS), which offers a vessel tracking system (hardware) and a data-as-a-service solution for monitoring fishing vessels, yielding high-resolution data on vessel movements. However, Peskas's architecture can accommodate data from any tracking system that provides compatible data formats, ensuring flexibility in sourcing vessel movement data. The automated integration of tracking data into Peskas enriches the dataset with high-resolution geolocation and movement information, enabling the calculation of fishing effort per boat and extrapolation across municipal and national fleets. Beyond these specific integrations, Peskas currently uses Airtable as the preferred platform for its metadata registry, including static tables with vessel, catch, and regional information used in the subsequent data processing steps. This approach ensures that Peskas remains a versatile tool for data collection and integration, capable of working with a broad spectrum of data sources and types.
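Survey retrieval of this kind typically targets KoboToolbox's public v2 REST endpoint. A minimal sketch, assuming that endpoint pattern; the helper name and asset UID below are illustrative, not part of peskas.timor.pipeline:

```r
# Build the KoboToolbox v2 data-export URL for a survey asset. The endpoint
# pattern follows KoboToolbox's public REST API; kobo_data_url() and the
# asset UID are illustrative placeholders.
kobo_data_url <- function(asset_uid,
                          server = "https://kf.kobotoolbox.org") {
  sprintf("%s/api/v2/assets/%s/data/?format=json", server, asset_uid)
}

url <- kobo_data_url("aBcDeF123")
# In a real run this URL would be fetched with an authenticated GET, e.g.
# httr::GET(url, httr::add_headers(Authorization = paste("Token", token)))
url
```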
2.2.2 Preprocessing
Once the data is collected, Peskas initiates a preprocessing phase in which the raw data undergoes cleaning, normalisation, and transformation. This phase standardises data formats, ensuring uniformity, and converts data into tidy formats. It also involves a preliminary quality check for inconsistencies and missing values. One of the challenges addressed during this phase is the integration of data collected via three distinct survey versions of KoboToolbox. These versions, each with its own structure, must be merged to achieve a unified data framework. After the KoboToolbox data are merged, the pipeline triggers two pivotal data-mining functions: get_weight and, subsequently, calculate_nutrients. These functions extract information on the weight of each catch and its nutrient composition, respectively, pulling data from FishBase [11], a comprehensive external resource, through its API. This integration allows for the dynamic inclusion of length-weight relationships and nutrient concentration values based on the most up-to-date information. Moreover, this stage involves alignment with the established aquatic foods ontology [12], whereby variables are not only renamed for clarity but also mapped to a recognised, controlled vocabulary where applicable.
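The length-weight relationships retrieved from FishBase take the standard form W = a·L^b. A minimal sketch of a get_weight-style conversion; the coefficient values below are invented for illustration and are not FishBase data:

```r
# Length-weight conversion W = a * L^b, the form of FishBase length-weight
# relationships. The a/b coefficients here are illustrative, not real values.
lw_coefs <- data.frame(
  taxon = c("Selar crumenophthalmus", "Katsuwonus pelamis"),
  a     = c(0.02, 0.01),
  b     = c(3.0, 3.1)
)

estimate_weight_g <- function(taxon, length_cm, coefs = lw_coefs) {
  i <- match(taxon, coefs$taxon)   # look up species coefficients
  coefs$a[i] * length_cm^coefs$b[i]
}

estimate_weight_g("Selar crumenophthalmus", 20)  # 0.02 * 20^3 = 160 g
```

In the pipeline the coefficients are fetched dynamically through the FishBase API, so estimates track the most recent published relationships.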
2.2.3 Validation
In the validation module of Peskas, both catch and vessel movement data undergo a rigorous process to ensure their accuracy and consistency for subsequent analyses. For catch data, the process involves the examination of outliers and anomalous values, which, when identified, are excluded from further analysis. The validation procedure is organised through a systematic labelling process, wherein each data entry is assigned a specific code that reflects a particular validation status. For instance, data entries without outliers are tagged with the code "0," indicating "no alerts." Conversely, a code "5" signals that the "Trip duration is too long," and a code "7" denotes that the "Recorded length is too large for the catch type," among others. In total, 21 distinct alert flags have been established, each addressing a specific and critical dimension of the data to facilitate a comprehensive quality assessment. The structure of the data alerts was decided through dialogue with local stakeholders and fisheries experts to ensure they were context-specific.
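The labelling logic amounts to mapping each record to the first alert code whose condition it triggers. A minimal sketch using the codes 0, 5, and 7 described above; the threshold values are illustrative assumptions, not the pipeline's actual parameters:

```r
# Assign a validation alert code per trip. Codes 0, 5 and 7 follow the
# scheme described in the text; the thresholds are illustrative only.
flag_trip <- function(duration_h, length_cm, max_length_cm) {
  if (duration_h > 72)           return(5L)  # "Trip duration is too long"
  if (length_cm > max_length_cm) return(7L)  # "Recorded length is too large"
  0L                                         # "no alerts"
}

trips <- data.frame(duration_h    = c(10, 100, 8),
                    length_cm     = c(30, 25, 250),
                    max_length_cm = c(120, 120, 120))
trips$alert <- mapply(flag_trip, trips$duration_h, trips$length_cm,
                      trips$max_length_cm)
trips$alert  # 0 5 7
```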
Validation employs both univariate and multivariate methods to detect outliers and assess data precision. Univariate outlier detection is conducted using the median absolute deviation (MAD) method, implemented in the “univOutl” package [13]. Multivariate approaches are employed to verify the accuracy of specific variables, such as catch weight, where outliers are identified using thresholds based on Cook's Distance of the residuals between catch weight and catch value.
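Both screening steps can be reproduced with base R. A minimal sketch, noting that the pipeline itself uses the univOutl package for the MAD step; the data, the z-score cutoff of 3.5, and the 4/n Cook's Distance cutoff are common rules of thumb used here for illustration:

```r
# Univariate MAD screening and multivariate Cook's Distance screening,
# sketched with base R. Data and cutoffs are illustrative.
x <- c(10, 12, 11, 13, 95)                   # one clear univariate outlier
z <- abs(x - median(x)) / mad(x, constant = 1.4826)
mad_outlier <- z > 3.5                       # common robust z-score cutoff

catch_kg  <- c(5, 10, 15, 20, 80)
value_usd <- c(10, 20, 30, 40, 30)           # last record: price-weight mismatch
fit <- lm(value_usd ~ catch_kg)
cooks_outlier <- cooks.distance(fit) > 4 / length(catch_kg)  # 4/n rule of thumb

which(mad_outlier)    # flags record 5
which(cooks_outlier)  # flags record 5
```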
To refine outlier detection accuracy and fine-tune detection parameters, entries flagged as potential outliers undergo manual scrutiny on a specialised validation platform. This internal tool, a Google Sheet integrated with KoboToolbox via Google Apps Script, streamlines the review process. This setup facilitates a more efficient and effective manual validation step, especially from the stakeholder's perspective, ensuring that outlier detection is both precise and adaptable to the nuances of the data. The final product of this validation module is a dataframe that has been meticulously cleaned and validated, ensuring its readiness for further analysis and processing in the modelling and metric extraction phases.
Vessel movement data validation focuses on the unique challenges associated with global position data. Issues such as undetected trips, merged trips, or split trips, as well as potential delays or losses of information due to poor network coverage, necessitate a tailored validation approach. The validate_pds_data function is instrumental in addressing the complexities associated with GPS tracking data, by evaluating each vessel trip for its duration and the distance covered. This evaluation utilises specified parameters to assess the durations and distances, along with the distance between the start and end points of a fishing trip. Importantly, it also incorporates quality metrics to refine data quality. Among these metrics, “outlier limits” identify and exclude data points that markedly deviate from expected patterns, such as anomalously high speeds, indicating potential inaccuracies or anomalies in the data. Similarly, “signal trace dispersion” measures the consistency and reliability of GPS signal locations over time, where a high dispersion level could suggest issues like poor GPS signal quality or errors in data transmission.
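The core of these checks is computing trip duration, track distance, and start-to-end displacement from GPS fixes. A minimal sketch of a validate_pds_data-style check; the function names, coordinates, and thresholds below are illustrative assumptions, not the pipeline's actual parameters:

```r
# Sketch of trip-level GPS checks: duration, distance covered, and
# start-to-end displacement. Names and thresholds are illustrative.
haversine_km <- function(lat1, lon1, lat2, lon2) {
  r <- pi / 180  # degrees to radians
  a <- sin((lat2 - lat1) * r / 2)^2 +
       cos(lat1 * r) * cos(lat2 * r) * sin((lon2 - lon1) * r / 2)^2
  2 * 6371 * asin(sqrt(a))
}

check_trip <- function(lat, lon, t_hours,
                       max_duration_h = 72, max_distance_km = 200) {
  n    <- length(lat)
  dist <- sum(haversine_km(lat[-n], lon[-n], lat[-1], lon[-1]))
  list(duration_ok  = diff(range(t_hours)) <= max_duration_h,
       distance_ok  = dist <= max_distance_km,
       returns_home = haversine_km(lat[1], lon[1], lat[n], lon[n]) < 1)
}

res <- check_trip(lat = c(-8.55, -8.60, -8.55),
                  lon = c(125.57, 125.60, 125.57),
                  t_hours = c(0, 4, 8))
res  # all three checks pass for this short round trip
```

Trips failing such checks would be the candidates for the merged-trip, split-trip, and signal-quality diagnostics described above.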
2.2.4 Analytics
In the analytics module of Peskas, following data validation, the focus shifts towards quantifying fisheries indicators. This module estimates the average catch per trip and the average revenue per trip across municipalities, as well as the number of landings per fisher per month, derived from a generalised linear mixed model. At the heart of this module, the estimate_fishery_indicators function orchestrates the workflow, beginning with the ingestion of trip data, which is then enriched with metadata on registered boats. The number of registered boats is particularly essential as it underpins the calculation of total catch and revenue within each region. Once the average catch per trip is determined, and the frequency and duration of trips each fisher undertakes in a month are established, the number of registered boats allows these averages to be extrapolated to a regional scale. Notably, catch is estimated not only in aggregate but also across different fish groups and important species. This approach adds a critical layer of granularity; by dissecting the catch data into specific taxa, the module enhances the understanding of species-specific trends and their implications for fisheries management.
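The extrapolation step reduces to a simple product once the per-trip averages are in hand. A toy illustration with invented numbers; the real estimate_fishery_indicators workflow additionally models trip frequency with a generalised linear mixed model rather than using a fixed value:

```r
# Toy regional extrapolation: mean catch per trip x trips per boat per
# month x registered boats. All numbers are invented for illustration.
mean_catch_kg_trip <- 14.2   # from validated landings
trips_boat_month   <- 12     # in practice, modelled per-fisher landings
registered_boats   <- 350    # from the metadata registry

municipal_catch_t <- mean_catch_kg_trip * trips_boat_month *
                     registered_boats / 1000
municipal_catch_t  # 59.64 tonnes per month
```

Repeating the same calculation per fish group or species yields the taxon-level estimates described above.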
2.2.5 Data export
The export module in Peskas disseminates the processed and analysed fisheries data to a wider audience, ensuring both accessibility and usability. This module restructures the data to fulfil two objectives: first, to align with the dashboard framework for visualisation and user interaction, and second, to prepare the data for open publication. For open publication, both raw and aggregated (national and municipal) data are converted from RDS to CSV format. This transformation caters to the needs of a general audience, facilitating broader access and understanding. An informative README document is automatically generated to accompany the data, offering detailed descriptions of the data content and fields. Data is uploaded automatically every month to the Harvard Dataverse portal under the CC BY-NC-SA 4.0 licence through the Dataverse API. This automated process ensures that the latest fisheries data are consistently made available to researchers, policymakers, and the public.
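The RDS-to-CSV conversion and README generation can be sketched in a few lines of base R. The function name and README content below are illustrative, not the module's actual implementation:

```r
# Sketch of the export step: read an RDS, write a CSV plus an
# auto-generated README listing the fields. Names are illustrative.
export_dataset <- function(rds_path, out_dir) {
  d <- readRDS(rds_path)
  dir.create(out_dir, showWarnings = FALSE)
  write.csv(d, file.path(out_dir, "data.csv"), row.names = FALSE)
  writeLines(c("# Peskas export",
               sprintf("Fields: %s", paste(names(d), collapse = ", "))),
             file.path(out_dir, "README.md"))
  invisible(file.path(out_dir, c("data.csv", "README.md")))
}

rds <- tempfile(fileext = ".rds")
saveRDS(data.frame(region = "Dili", catch_kg = 42), rds)
files <- export_dataset(rds, tempfile("export"))
file.exists(files)  # TRUE TRUE
```

In the pipeline, the resulting files are then pushed to the Harvard Dataverse through its API on the monthly schedule.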
2.2.6 Visualisation and Reporting
To make the insights accessible and actionable, Peskas provides an advanced visualisation capability through an interactive Shiny dashboard. This dashboard serves as the primary interface for users to explore, analyse, and interpret the data in near real time. It is complemented by the capability to generate detailed reports using Rmarkdown, which allows for the dynamic incorporation of analysis results, including charts and tables, into comprehensive documents.
2.3 User Interface
The Peskas portal dashboard (https://timor.peskas.org/) is a robust web application hosted on Google Cloud Run, ensuring scalability and reliability. It leverages a suite of R packages to deliver a dynamic, interactive user experience, particularly for visualising fisheries data. The dashboard is updated daily with fresh data from peskas.timor.pipeline, orchestrated through GitHub Actions, ensuring that the displayed information is current and actionable. Developed using the Shiny framework, the portal incorporates advanced visualisation tools such as “kepler.gl”, a powerful JavaScript library for geospatial data analysis, which provides stakeholders with a visual understanding of fishing activity distributions across Timor-Leste.
The user interface is designed to be intuitive and accessible, featuring a multilingual option that currently includes English, Portuguese, and Tetum. This feature is crucial for engaging local communities in sustainable fisheries management by allowing them to access and analyse data in their preferred language. Data interaction is facilitated by various R packages integrated into the portal, such as “reactable” for interactive tables [14] and “apexcharter” for responsive charting solutions [15]. These tools enable users to drill down into specifics such as catch volumes, species distribution, and fishing effort, with the flexibility to customise views and download data according to their needs. The portal's backend is supported by Google services, with authentication managed via “googleAuthR” [16] and data storage provided through “googleCloudStorageR” [17], ensuring secure and scalable cloud storage options.