Study area
This study was conducted in Ifanadiana, a rural district located in the Vatovavy region of southeastern Madagascar (See Supplementary Fig. 1, Additional File 1). The district covers an area of approximately 3,970 km², receives an annual precipitation of 1,200 mm, has a humid tropical climate and a population of approximately 200,000 inhabitants. The district consists of 15 communes (administrative units with an average of 13,000 inhabitants) and 195 fokontany (local administrative units consisting of several villages with an average of about 1,000 inhabitants). Ifanadiana is characterized by its geographical diversity, including mountains, valleys with water sources, rivers and dense, very humid vegetation, ideal for agriculture. The district is predominantly rural, with an economy dominated by agriculture, primarily rice cultivation. Rice cultivation practices in the district vary according to the geographical context, resulting in a diversity of cultivation seasons. Some farmers plant rice once a year, while others plant rice twice a year. This variation is influenced by irrigation availability and access to water, critical factors for understanding how seasonal differences contribute to the diversity in the spatio-temporal dynamics of rice field flooding across the district.
Ifanadiana district is malaria endemic, with an estimated annual incidence of approximately 500 cases per 1,000 inhabitants after accounting for underreporting and prevalence rates as high as 80% during the malaria season [21]. The district exhibits seasonal transmission dynamics linked to climatic patterns, with a low transmission season between June and November and a high transmission season between December and May. Previous studies have shown that rice fields favor malaria transmission, in combination with other climatic and socioeconomic factors in the district [22]. Therefore, we aimed to understand the flooding dynamics of all rice fields in the district in order to contribute to precise and targeted malaria control efforts.
Identification of rice field flooding thresholds using field data and Sentinel-1 images
Monitoring the spatio-temporal dynamics of rice field flooding is based on three fundamental steps: field data collection, exploitation of SAR satellite data, and determination of optimal thresholds to classify different water classes (Fig. 1). First, field observations were conducted to obtain temporal and spatial representativity of rice field flooding dynamics (See Supplementary Fig. 1, Additional File 1). A temporal sampling campaign was conducted monthly on 7 rice fields between February 2021 and February 2022 (Table 1). A spatial sampling campaign was conducted in the dry season (October 17–21, 2022) and the rainy season (February 27-March 3, 2023), covering 54 rice fields in the district (Table 1). These visits, conducted by a team of 5 people, aimed to collect representative samples of three categories of flooding dynamics ("open water", "vegetated water", and "no water"), and to collect information on rice plant height, water depth, rice field characteristics, observation landmarks and their orientation, and observed water class types. We used the NextGIS Mobile (version 2.6.49) to geolocate each rice field and store information and photos directly in the attributes of the geographic features [23]. These data allowed us to manually classify pixels in each rice field that corresponded to the three water class categories to use to train the classification model.
Second, SAR satellite data were obtained on these field-monitored rice fields. This technology is particularly suitable for monitoring the spatio-temporal dynamics of flooding in rice fields due to its ability to penetrate clouds and operate day and night [24, 25]. Among the different polarizations used by SARs, the VH (Vertical transmit and Horizontal receive) polarization of Sentinel-1 imagery is particularly relevant for detecting the presence of water [26]. We used Sentinel-1 SAR data at the ARD (Analysis Ready Data) level available on the Google Earth Engine platform [27]. These data have already undergone the major preprocessing steps: orbital corrections for precise localization, radiometric corrections to convert raw values into usable backscatter coefficients and remove noise effects, and terrain geocoding to project images onto a cartographic grid that accounts for topography and accurately represents ground geometry.
Third, using data collected in the previous two phases, optimal thresholds of VH polarization values were determined to distinguish between the three categories of flooding: "open water", "vegetated water", and "no water". Our approach began with data preparation, which included georeferencing and vectorizing observed plots as shapefiles, where each plot corresponded to an observation point containing a water class based on field observations (Fig. 1A). We then processed Sentinel-1 imagery acquired on the date closest to the field visits (range of 0–14 days difference), identifying pixels corresponding to each plot and extracting VH polarization values for each field-collected water class. To identify the best discriminative thresholds, we used two different ROC curves: one to discriminate "open water" from "vegetated water", and another to discriminate "vegetated water" from "no water". To ensure robustness and generalizability of our thresholds, we spatially stratified the data, using data around the EAST-WEST axis as training data (77%) and those from the NORTH-SOUTH axis (23%) as validation data, allowing us to calculate the sensitivity and specificity for each threshold (See Supplementary Fig. 2, Additional File 1).
Table 1
Number of field observations used to train the classification model.
| Number of rice fields | No. Observation points (photos) | No. Polygons |
Temporal sample | | | |
Monthly (February 2021- February 2022) | 7 | 63 | 89 |
Spatial sample | | | |
Dry season (October 17 2022- October 21 2022) | 54 | 203 | 222 |
Wet season (27 February 2023–02 March 2023) | 54 | 181 | 204 |
Estimation of individual rice field flooding dynamics in Ifanadiana district
We applied the classification models and thresholds previously obtained to SAR imagery corresponding to all rice fields within the district from January 2016 to December 2022 (Fig. 1B). To do so, we first identified all of the rice fields, buildings, and administrative boundaries in the district via a participatory mapping exercise on the OpenStreetMap (OSM) platform in 2019. Full details of this mapping exercise are reported in [28]. We downloaded this data using the QGIS QuickOSM extension [29] [August 23 2023], obtaining the boundaries of 17,321 rice fields and over 100,000 buildings. We also obtained the administrative boundaries of the district's 15 communes and 195 fokontany [30], as well as precipitation data for the duration of the study from Africa Rainfall Climate 2 [31], which was used to calculate the daily precipitation average for the district.
We used Sentinel-1 images collected every 12 days from January 01 2016 - December 23 2022 at the same ARD processing as described above. For each Sentinel-1 image, we filtered the pixels located fully-within the rice fields mapped on OSM and classified them into the same three water categories by applying the optimal thresholds determined by the classification model. Finally, we calculated the areas and proportions of coverage of each water class in each rice field for each acquisition date of the available Sentinel-1 images.
We applied smoothing to the resulting time series to reduce the influence of stochastic noise and erroneous data points. For each rice field, we compared four smoothing techniques: cubic-spline, polynomial, basis-spline, and natural spline. For cubic-spline, basis-spline, and natural-spline techniques, we limited the degrees of freedom to 4 per year, to allow the smooths to fit the seasonality of the data without overfitting to short-term trends. For the polynomial technique, we reduced the degrees of freedom to 2 per year due to convergence issues at higher values. For each rice field, we chose the smoothing technique that resulted in the lowest residual sum of squares via leave-one-out cross validation, and used the resulting model to create a weekly time series for that rice field (See Supplementary Table 1, Additional File 1). We applied the smoothing method to the vegetated water class and the sum of both flooded water classes (vegetated + open water), referred to as total flooded area, separately.
Individual rice field characterization and estimation of aggregate indicators
We developed an algorithm to create summary statistics regarding the agricultural season from each rice field’s smoothed time series of flooded areas (Fig. 1C). First, we performed numerical derivation on each time series to identify local minima and maxima values of the area of each class using a central derivation across a time difference of one day. We then assessed whether each local minima or maxima represented the beginning, peak, or end of an agricultural season. The beginning of a season corresponded to a local minimum that was followed by a maximum that occurred at a flooding value at least 10% of the maximum flooding value of that rice field higher than the focal minimum. The peak of a season corresponded to a local maximum that occurred at a flooding value of at least 10% of the maximum flooding value of that rice field higher than the nearest minimum on either side of the focal maximum. Maxima that did not meet this threshold were defined as intra-season maxima. The end of a season corresponded to a minimum that had a maximum between it and the following minimum that occurred at a flooding value of at least 10% of the maximum flooding value. If the threshold was not met, the period between the two minima was classified as inactive. From these identified beginnings, peaks, and ends of seasons, we then estimated the following statistics for each agricultural season and rice field: start date, end date, minimum flooding value, maximum flooding value, date of maximum flooding, and whether the season was inactive (no flooding). These statistics were then summarized to rice-field level statistics over the seven year time period.
In order to provide useful information to operational programs that implement LSM interventions, individual rice field data need to be aggregated in a way that can help with design, site prioritization, and follow-up. First, to visualize spatial trends in these indicators, we obtained the average of each statistic per fokontany (smallest administrative unit) to be displayed in corresponding maps. Second, we applied a semi-supervised time-series classification algorithm to categorize rice field flooding dynamics into a limited set of homogeneous groups. For this, we manually classified as inactive those rice fields that remained less than 20% flooded during the study period or remained inactive for more than 50% the study period. We then classified the remaining rice fields (N = 14,693) by applying an algorithmic methodology based on dynamic time warping (DTW) and partitioning around the median (PAM) to the time series [32, 33]. The approach consists of two distinct steps: calculating a metric of distance between all pairs of smoothed rice field time series (in Section 3) using DTW, and clustering of the smoothed time series using the PAM algorithm applied to the DTW distance matrix. The determination of the optimal number of clusters was based on the ratio between the inter- and intra-group variance of the rice fields (See Supplementary Fig. 3, Additional File 1), using the index quality measure of Calinski-Harabasz. This resulted in a seven-category classification for the rice field dataset, to which we gave descriptive names based on their average time-series: inactive before mid-year 2018 and low flood coverage, low flood coverage, low and moderate flood coverage, moderate flood coverage, moderate flood coverage with two seasons per year, high flood coverage, and very high flood coverage. For this classification, we used the R software, version 4.3, and the packages "parallelDist", version 0.2.6 [34] and "parallelpam", version 1.4 [35]. Finally, determining the population at risk from a particular larval habitat is an important factor to consider for site prioritization of LSM interventions. Here, we estimated the number of buildings within a 500m radius of each rice field as a proxy for population at risk from that rice field, using OSM spatial data on rice fields and households. From this, rice fields were classified into two categories: those with less than 50 surrounding buildings and with more than 50 surrounding buildings.
Integration of rice field monitoring indicators into an e-health tool
To demonstrate how rice field monitoring can be used for operational purposes during LSM implementation, we integrated all the data and results obtained, including the location and temporal dynamics of rice fields, into a web application developed with the “R Shiny” platform (Fig. 1D). The application allows for the visualization of key information at the district, fokontany, and individual rice field levels. This approach aims to facilitate the use and interpretation of results, while ensuring interoperability of information to support programmatic teams involved in mosquito larvae surveillance and control. We performed spatial junction techniques with administrative boundaries to create complete geographic profiles for each fokontany and rice field. To implement the application, we used PostgreSQL for the spatial database management system (DBMS) with the “PostGIS” extension, as well as the R packages “leaflet”, version 2.2.2 [36] for cartographic visualization and “Rpostgres”, version 1.4.7 [37] and “rpostgis”, version 1.5.1 [37] for connection to the DBMS and manipulation of geographic data.