Survey design
We conducted a structured household survey to collect data about the characteristics and the performance of small-scale carp aquaculture systems in Bangladesh. The household survey consisted of 10 modules that collectively elicited comprehensive information about the aquaculture production systems in each household, and the demographic and socioeconomic profile of households. Each household that participated in this survey answered all ten modules.
The specific modules were:
- Module A: Household identifiers and demographic profile
- Module B: Land ownership
- Module C: Aquaculture production
- Module D: Agricultural and livestock production
- Module E: Income and expenditures
- Module F: Credit access and group membership
- Module G: Food security
- Module H: Multi-dimensional poverty
- Module I: Vulnerability to shocks
- Module J: Perceived impacts of the COVID-19 pandemic
Modules A and B contained basic information about the location and demographic characteristics of the respondent and their household (Module A), as well as about land tenure and size (Module B). As the main focus of the survey was the elicitation of the characteristics and performance of carp production systems, the respondents were the household members mainly responsible for pond operation who also had a good understanding of other livelihood activities within the household. This was practically always either the household head or the spouse.
Module C was by far the most extensive module in terms of the number of questions. It is the core of the survey, as it elicited comprehensive information about aquaculture production, commercialisation and knowledge. Notably, for every pond operated by the household, this module captured the types of fish produced, sold (raw or processed), gifted and self-consumed, as well as the generated income. The module also elicited information about all the materials (e.g. feed, seed) and production methods (e.g. BMPs) used in the ponds, as well as other related information such as aquaculture labour, assets, expenditures and knowledge.
Module D captured information about other on-farm livelihood activities related to agriculture and livestock production. It elicited the production, consumption and sales of all major crops/vegetables and livestock, the generated income, as well as relevant input costs (e.g. agrochemicals, labour).
Module E elicited information about non-farm income and expenditure (on-farm income/expenditure items were captured in Modules C-D). Module F captured information about the participation of households in different producers’ groups, as well as their access to credit and aquaculture/agriculture knowledge, training and extension.
Modules G and H contained the base questions for calculating two standardised metrics of food security and poverty, respectively. Module G contained the base questions to develop the Food Consumption Score (FCS), a measure of dietary diversity28. Module H contained the base questions to calculate the multi-dimensional poverty index (MPI), a composite measure of poverty29.
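As a minimal sketch of how the FCS could be derived from Module G responses, the Python snippet below applies the standard World Food Programme food-group weights to the number of days (0-7) on which each food group was consumed during the previous week. The food-group labels and function names are hypothetical and do not correspond to the variable identifiers in the dataset. The 21/35 cut-offs are the commonly used default thresholds and are an assumption here; some analyses use 28/42 where oil and sugar are consumed almost daily.

```python
# Standard WFP food-group weights; the group labels are hypothetical names.
FCS_WEIGHTS = {
    "staples": 2.0,
    "pulses": 3.0,
    "vegetables": 1.0,
    "fruits": 1.0,
    "meat_fish_eggs": 4.0,
    "dairy": 4.0,
    "sugar": 0.5,
    "oil": 0.5,
    "condiments": 0.0,
}


def food_consumption_score(days_consumed):
    """Return the FCS from a mapping of food group -> days eaten in the last 7 days."""
    score = 0.0
    for group, weight in FCS_WEIGHTS.items():
        days = days_consumed.get(group, 0)  # groups not reported count as 0 days
        if not 0 <= days <= 7:
            raise ValueError(f"{group}: days must be between 0 and 7, got {days}")
        score += weight * days
    return score


def fcs_category(score, poor_max=21.0, borderline_max=35.0):
    """Classify a score using the assumed 21/35 cut-offs."""
    if score <= poor_max:
        return "poor"
    if score <= borderline_max:
        return "borderline"
    return "acceptable"


# Invented example household, for illustration only.
household = {"staples": 7, "pulses": 2, "vegetables": 5, "meat_fish_eggs": 3, "oil": 7}
score = food_consumption_score(household)
print(score, fcs_category(score))
```

The thresholds are exposed as parameters so that a different cut-off convention can be applied without changing the scoring function.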
Modules I and J captured the vulnerability of households to climatic shocks and the COVID-19 pandemic. Module I elicited: (a) the exposure of the households to different climatic hazards such as floods, droughts, or storms, (b) the effect of such hazards on aquaculture output, livelihoods and food security, and (c) the coping mechanisms or adaptive strategies implemented by households to mitigate the effects of such hazards. Module J elicited similar information about the disruption of aquaculture activities during the COVID-19 pandemic. Perceptions were mostly elicited using appropriate Likert scales, tailored to the type of disruption and the measured outcome.
All modules overwhelmingly contained closed-ended questions, where either the respondents had to provide actual numerical values or the enumerators had to assign pre-existing codes depending on the respondent's answers. Questions about all productive activities (in Modules C-D) and off-farm income and expenditures (in Module E) referred to the previous aquaculture production season, and entailed a 12-month recollection period.
Only a small minority of the closed-ended questions were perception-based. For these questions the respondents had to, for example, rate their satisfaction with their fish production (in Module C), or their vulnerability to climatic hazards and the COVID-19 pandemic (in Modules I-J). We ensured that these perception questions were framed appropriately to avoid gender-differentiated responses (e.g. effect for entire household rather than individual respondents). Finally, there were very few open-ended questions, which were used as an opportunity to ask respondents to justify some of their answers to previous questions if they so wished.
Data collection
The survey was translated into Bengali and digitised on tablets for in-person data collection using well-trained enumerators (refer to “Technical validation” for a deeper explanation of these processes). The digitisation and data collection were undertaken by Development Research Initiative (dRI), in close cooperation with the research team. dRI is a local consultancy with extensive experience in survey collection from rural households in Bangladesh (including aquaculture producers).
Enumerators and supervisors were trained under the active supervision of the research team over 3 days (13–15 November 2021). Subsequently, the enumerators conducted pilot surveys for 2 days (16–17 November 2021) with aquaculture producers in some of the selected study areas to become further acquainted with the protocol. This piloting was also essential for pinpointing questions with problematic framing (e.g. in terms of clarity or sensitivity) and ensuring that the ranges used for coding were sensible (see “Survey design”). Where needed, the final survey was revised slightly to reflect the insights gathered from the piloting. This was done through iterative discussions between the enumerators, supervisors and the research team.
The full household survey was conducted between 18 November 2021 and 28 February 2022. Each survey lasted 60–120 min depending on the answers of the respondents. Surveys tended to be longer for households that operated multiple ponds.
Multiple quality assurance mechanisms were implemented during the digitisation of the survey, the training of the enumerators/supervisors and the implementation of the survey to ensure high and consistent data quality (refer to “Mitigation of non-sampling errors” for a comprehensive explanation of these quality assurance mechanisms).
Ethical approval
The Ethics Review Committee of the University of Tokyo confirmed that the survey did not require ethical review. This was in accordance with the Government of Japan research ethics guidelines (“Ethical guidelines for Life Science and Medical Research Involving Human Subjects”) and “The University of Tokyo’s Research Ethics Regulations”.
Data Records
The dataset is accessible via the Harvard Dataset repository30. The archived material includes the dataset file (xlsx-file) and the household questionnaire file (docx-file).
The dataset file (xlsx-file) contains the collected data for all variables. The data is included in two tabs (Sheet 1: Questions 1-13; Sheet 2: Questions 14-46). For each variable in the dataset file (xlsx-file) there is an alphanumerical identifier (in Line 2) and the full wording of the related question/answer (in Line 1). The household questionnaire file (docx-file) contains the full survey in English and Bengali. It contains the complete script of each question, which shows how the question was framed, and the ranges for categorical variables. Each variable in the household questionnaire file (docx-file) also has an alphanumerical identifier.
The alphanumerical identifiers are identical in the dataset file (xlsx-file) and the household questionnaire file (docx-file), which enables linking the data of each variable to the actual question. Names of household members and other contact details are withheld to ensure anonymity. All other data is provided in raw format without any manipulation.
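The two-line header layout described above can be illustrated with a minimal Python sketch that separates one tab into a codebook (identifier to question wording) and a list of per-household records keyed by identifier. The function name, the miniature sheet, and the identifiers `A1` and `C1` are invented for illustration and are not taken from the dataset.

```python
def split_sheet(rows):
    """Split one tab into a codebook and per-household records.

    `rows` is a list of lists as it might be read from the xlsx file:
    rows[0] holds the question wording (Line 1), rows[1] holds the
    alphanumerical identifiers (Line 2), and rows[2:] hold household data.
    """
    wording, identifiers, *data = rows
    codebook = dict(zip(identifiers, wording))
    records = [dict(zip(identifiers, row)) for row in data]
    return codebook, records


# Hypothetical miniature sheet; identifiers and values are invented.
sheet = [
    ["Household ID", "Number of ponds operated"],
    ["A1", "C1"],
    [101, 2],
    [102, 1],
]
codebook, records = split_sheet(sheet)
print(codebook["C1"])    # question wording behind identifier C1
print(records[0]["C1"])  # value recorded for the first household
```

In practice the rows could be obtained with a spreadsheet library, for example by iterating over an openpyxl worksheet with `iter_rows(values_only=True)`. The same identifiers can then be looked up in the questionnaire file to recover the framing and the codes of each question.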
Technical Validation
Sources of errors
Different types of sampling and non-sampling errors can compromise data quality in large-scale farm-level surveys such as the one outlined in this Data Descriptor31,32. Such errors can both introduce uncertainties into subsequent analyses and cause erroneous assessments33. Ultimately, in the context of this study, they can compromise the robust identification of technology adoption and performance at the farm level and the design of development interventions seeking to improve farm performance and sustainability.
Sampling errors generally arise from the unrepresentativeness of the sample. During the initial stages of research design we identified that the main sources of sampling errors in our study would mostly be associated with the selection of: (a) study areas with unusual, unrepresentative, and/or collectively homogenous carp production systems, and (b) aquaculture producers in each study area with unrepresentative carp production systems compared to the prevailing systems in their respective study area.
Considering that the aim of this survey is to characterise the different carp production systems in Bangladesh, the selection of unrepresentative study areas [point (a) in previous paragraph] could occur by (i) selecting study areas with low/no carp production, (ii) selecting as study areas only a few localities within the country (e.g. selecting upazilas only in some divisions or districts for convenience reasons), and/or (iii) selecting study areas that collectively fail to capture the diversity of carp production systems in the country. Such study site selection decisions could individually or collectively lead to a skewed understanding of the national landscape of carp production systems. Unrepresentative producer selection in the study areas [point (b) in previous paragraph] could occur through biases in respondent identification and/or lack of proper randomisation during respondent selection in each study area. We expected that such errors were likely to occur due to the lack of comprehensive, consistent and spatially-explicit datasets about carp aquaculture in Bangladesh, as mentioned throughout this paper.
Non-sampling errors refer to all other errors that can arise during data collection, which can cause the data to differ from their true values34. During the initial stages of research design, we identified that the main sources of non-sampling errors would mostly be associated with non-response errors and measurement errors31. Such errors typically occur if (a) data collection deviates from the established study protocol, and (b) responses are biased or fail to capture/report properly the required information (e.g. recollection errors)34. We expected that such non-sampling errors could occur for two reasons. The first was the large enumerator team necessary for data collection, given the study’s extensive geographical scope (entire Bangladesh) and large sample size (>4,500 surveys), both of which were adopted to avoid the possible sampling errors mentioned above. The second was the large amount of information captured in the survey and the long recollection periods for some of the key questions.
Being conscious of these risks, from the outset of the study we implemented various quality assurance procedures both to identify and select the sites and respondents properly, and to ensure consistent data collection quality for the entire sample. We describe these quality assurance procedures in the two following sub-sections.
Mitigation of sampling errors
We actively sought to reduce sampling errors through the development and implementation of a robust sampling strategy for selecting both the study areas (upazilas) and the respondent households in each study area. Both relied on multi-stage processes that tried to eliminate, to the extent possible, uncertainties and biases in study site and respondent selection.
For study area identification and selection, we followed a multi-stage process consisting of expert consultations, GIS-based aquaculture suitability analysis and triangulation (to the extent possible) with secondary data compiled by the Department of Fisheries of the Bangladesh Ministry of Fisheries and Livestock22 (refer to “Site identification and selection”). In particular, by superimposing the upazilas identified through the expert consultation (Figure 1a) on the findings of the aquaculture suitability analysis, we ensured that none of the selected upazilas fell completely in areas with low aquaculture suitability (Figure 1b). This multi-stage approach allowed us to avoid selecting study areas with unrepresentative carp production, and particularly areas that individually lack significant carp production or areas that collectively fail to capture the diversity of carp production systems in the country (see previous section). By relying on multiple data sources and a large number of experts we reduced to the extent possible: (a) biases that could emerge if we relied on the perspectives of just a few experts, and (b) inaccuracies due to the incompleteness of some datasets.
For respondent identification and selection, we also followed a multi-stage process that relied on randomised respondent selection from carp producer lists developed by the research team (refer to “Sample identification and selection”). The underlying aim was to effectively randomise sample identification and selection in order to minimise (to the extent possible) biases and errors introduced by non-randomised sampling methods (e.g. convenience sampling) or by using outdated producer lists. In particular, to ensure a relatively large and balanced pool of respondents, we randomly selected approximately 84 respondents in each study upazila, from more extensive lists of 130-150 producers that were developed by the research team for this project. It is important to note here that we did not identify every carp producer in the study upazilas. Conducting a complete census of all carp producers in the study areas was outside the scope of the study and would have been prohibitively resource intensive. However, we must acknowledge that despite our best efforts there is some likelihood that randomisation might not have been conducted perfectly in all upazilas, possibly introducing some selection biases. Unfortunately, this cannot be formally tested considering the lack of comprehensive and cohesive lists of carp producers, as explained throughout this paper. Nevertheless, we believe that the final sample offers a good representation of the carp aquaculture sector in Bangladesh when taking into consideration the insights gained from the expert workshops24 and the findings of other aquaculture studies in the country23,27,35.
Finally, to achieve high response rates we conducted the interviews in-person, in the local language (Bengali), and at the respondents’ residence. We opted to conduct interviews in familiar and comfortable settings rather than common areas (e.g. community halls) both to enable respondents to consult their files to assist recollection and to reduce reluctance to answer questions deemed sensitive (e.g. income, food security). We did not record response rate statistics, but according to the daily feedback from the enumerators the survey achieved very high response rates in all regions. Based on this feedback, the main reasons for declining to participate in the survey were (a) the substantial amount of time required to complete the full survey (approx. 60-120 min), and (b) possible health concerns as the data collection occurred during the COVID-19 pandemic (Nov 2021 - Feb 2022). The small number of surveys that were not fully completed were excluded from further analysis and from the database provided in this Data Descriptor.
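The random draw of approximately 84 respondents from a list of 130-150 producers can be sketched in Python as follows. This is an illustration only: the text does not state which randomisation procedure or software was used, so simple random sampling without replacement is assumed, and the function name, producer IDs and seed are hypothetical.

```python
import random


def select_respondents(producer_list, sample_size=84, seed=None):
    """Draw a simple random sample without replacement from one upazila's list."""
    if sample_size > len(producer_list):
        raise ValueError("sample size exceeds the number of listed producers")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(producer_list, sample_size)


# Hypothetical list of 140 producer IDs for one upazila.
producers = [f"P{i:03d}" for i in range(1, 141)]
selected = select_respondents(producers, sample_size=84, seed=2021)
print(len(selected), len(set(selected)))
```

Recording the seed alongside each list would allow the selection to be reproduced and audited later.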
Mitigation of non-sampling errors
To reduce non-sampling errors, we developed and adhered to a robust and comprehensive protocol for the design and implementation of the household survey. Specific actions included:
- (a) select experienced research and data collection teams;
- (b) design and digitise the questionnaire carefully to capture accurately the characteristics and performance of carp producers, and assist the recollection of the respondents;
- (c) train enumerators to adhere properly to the study protocol;
- (d) revise the questionnaire iteratively prior to implementation through the insights generated during training and piloting;
- (e) check all collected surveys daily for completeness and quality, and provide timely and constant feedback to enumerators.
Regarding (a), the interdisciplinary research team that designed the survey (i.e. the authoring team of this Data Descriptor) had long experience designing and implementing many similar structured surveys in the context of small-scale aquaculture and agriculture systems in the global South16,32,36,37 (including in Bangladesh18,38,39). The data collection was sub-contracted to dRI, a local consultancy with extensive experience administering surveys to rural households in Bangladesh (including aquaculture producers).
Regarding (b), in terms of design the initial draft of the survey was developed by the research team, using insights from the stakeholder workshops and in situ observations of small-scale aquaculture systems in many of the study areas. This gave the research team a good prior understanding of the unique characteristics of carp aquaculture systems and study sites, as well as the broader dynamics of the aquaculture sector in the country. This was critical for designing the actual questions, selecting an optimal question order, identifying the possible codes for categorical questions, and identifying the most appropriate units of measurement and their conversions (e.g. local vs. standardised units of measurement). In terms of digitisation, the survey was coded by dRI using SurveyCTO. The digitisation of the questionnaire was conducted under the close supervision of the research team. In particular, prior to digitisation, the research team explained every single module, the question loops, the units of measurement, and the expected ranges for key questions. The digitisation process paid particular attention to the skip logic, the format of response options depending on the variable (e.g. only numerical values, only alphabetical values, or alphanumeric values) and the ranges of variables.
This careful survey design and digitisation was important not only for preventing errors in capturing the data, but also for assisting the recollection of participants. Proper recollection was deemed particularly important for the key variables that had a 12-month recall period (e.g. aquaculture production, fish consumption, income and expenditure streams). Studies have suggested that due to such long recollection periods respondents may be unable to report such variables accurately, likely underestimating them16,40,41. In terms of design, one of the steps taken to improve recollection was to integrate recollection-sensitive questions such as fish production, fish sales and fish self-consumption in the same loop in Module C for each individual fish species in each operational pond, rather than divide them between different modules such as Module E (income and expenditures) and Module G (food security). In other words, for each fish species in each operational pond the questions for income generation and self-consumption were asked immediately after the fish production questions. In terms of digitisation, particular attention was paid to the coding of these loops to help enumerators identify discrepancies if the quantities for production, sales, and self-consumption did not add up.
Regarding (c), the research team initially trained the dRI supervisors in English through several online meetings during the first half of November 2021. During these training sessions the research team explained the survey protocol in depth to the dRI supervisors so that they understood its aims, overall structure, individual components and critical parts. Subsequently the dRI supervisors trained the enumerators in Bengali, with the assistance of research team members who were present online to clarify any unclear parts or answer any questions. Beyond explaining how to administer the survey, this training sought to ensure that enumerators could critically identify possibly erroneous answers, could perform internal consistency checks on-the-spot (e.g. by checking combinations of certain questions), and could act accordingly when identifying inconsistencies (e.g. by repeating certain questions). Special attention was paid to questions with long recollection periods and to loops of questions, for which enumerators were instructed to spend extra time to ensure that the quantities balanced. The overall enumerator training lasted for 3 days (13-15 November 2021), and ensured that the data collection team (i.e. enumerators and supervisors) could adhere to the survey protocol.
Regarding (d), the training sessions outlined above provided the first opportunity to identify problematic questions/formulations and fine-tune the format, translation and digitisation of some questions. Following the end of the training, the questionnaire was pre-tested for 2 days (16-17 November 2021) to understand how it performed under real conditions (e.g. time requirement, ability of respondents to comprehend/answer all questions, willingness to answer sensitive questions), as well as to train the local enumerators under real conditions. Based on this pre-testing, we made further slight changes to the formulation, translation, ranges and/or digitisation of some questions to reflect the on-the-ground realities. These changes were made after reaching consensus among the research team, the supervisors and the enumerators. The official data collection started only once all these issues were resolved.
Regarding (e), at the end of each data collection day, dRI uploaded all of the surveys collected during the day so that they could be accessed by the research team. To ensure the quality of the collected data, members of the research team checked all surveys collected that day during the same evening to identify incomplete surveys and/or erroneously collected information. Feedback about possible errors was conveyed daily to the supervisors, with brief re-training of individual enumerators requested if substantial errors were found. Questionnaires that did not meet data quality requirements were discarded at the end of each day.
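The kind of balance check that such loops make possible can be sketched in Python as below. The exact rule coded in the digitised questionnaire is not described in the text, so the rule shown (production compared against the sum of sold, self-consumed and gifted quantities, with a tolerance) is an assumption, as are the field names, the kilogram unit and the 1 kg tolerance.

```python
def balance_gap(produced_kg, sold_kg, consumed_kg, gifted_kg=0.0):
    """Return production minus the sum of its reported uses (kg)."""
    return produced_kg - (sold_kg + consumed_kg + gifted_kg)


def flag_imbalances(entries, tolerance_kg=1.0):
    """Return (pond, species, gap) for entries whose gap exceeds the tolerance."""
    flagged = []
    for entry in entries:
        gap = balance_gap(
            entry["produced_kg"],
            entry["sold_kg"],
            entry["consumed_kg"],
            entry.get("gifted_kg", 0.0),  # gifting is optional in this sketch
        )
        if abs(gap) > tolerance_kg:
            flagged.append((entry["pond"], entry["species"], gap))
    return flagged


# Hypothetical loop entries for one household (values invented).
entries = [
    {"pond": 1, "species": "rohu", "produced_kg": 200, "sold_kg": 150,
     "consumed_kg": 40, "gifted_kg": 10},
    {"pond": 1, "species": "catla", "produced_kg": 120, "sold_kg": 100,
     "consumed_kg": 50},
]
print(flag_imbalances(entries))
```

A flagged entry is not necessarily an error, since unsold stock or losses could explain a gap. It is a prompt for the enumerator to repeat or clarify the relevant questions.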
Usage Notes
Considering that (small-scale) carp aquaculture is a particularly important aquatic food system in the global South (especially in Asia)10, the dataset described here can be used for various types of studies in Bangladesh and beyond.
First, the dataset can be used to explore the heterogeneity and differentiation of carp aquaculture production systems in Bangladesh, in terms of their comparative characteristics and performance. Studies have pointed to the increasing necessity of considering heterogeneity between small-scale aquatic food producers in order to prevent the design and implementation of policies and practical solutions that do not meet the needs or capacities of different producers19. In particular, the dataset described here contains comprehensive information that allows for the identification of different production models in terms of fish species, inputs, management practices, intensification levels, and market orientation, among others. This information is found predominantly in Module C. In terms of performance, the dataset can be used to assess different indicators of rural livelihoods (e.g. on-farm and off-farm income), poverty (e.g. multi-dimensional poverty), food security (e.g. dietary diversity, months of hunger), and environmental performance. Regarding the latter, the dataset contains detailed information about aquaculture inputs, including individual types of feed and fertilisers, which could be used to estimate nutrient use efficiency or environmental impacts such as acidification, eutrophication, and ecotoxicological impacts through a life cycle assessment. Information about income is mainly found in Modules C-E, about poverty in Module H, about food security in Module G, and about environmental performance in Module C. Such studies can focus solely on Bangladesh or on multiple countries if combined with similar datasets from other countries.
Second, and related to the previous point, the dataset can be used to understand in more depth which factors affect the performance of these aquatic food systems. Such factors can be very diverse, spanning multiple domains: technical (e.g. adoption of BMPs), demographic (e.g. household structure), capacity (e.g. education, access to extension services), geography/infrastructure (e.g. access to infrastructure), economic (e.g. access to credit, asset base) or institutional (e.g. participation in producer groups). This information can be found in different parts of the dataset: technical (mainly Module C), demographic (mainly Module A), capacity (mainly Modules A and F), geography/infrastructure (mainly Modules A and C), economic (mainly Modules C, E, F), institutional (mainly Module F). Similar to the previous point, such studies can focus solely on Bangladesh or be used in multi-country assessments if combined with similar datasets from other countries.
Third, considering its wide geographical coverage of the entire country, the dataset can provide a snapshot of the current state of the carp aquaculture sector in Bangladesh. Considering that the sector is very dynamic and is changing fast, this information can be used to develop a national benchmark of this critical aquatic food system. This can offer a robust evidence base both to design future interventions and to assess the performance of future interventions in the study areas. Although the actual data will be applicable only in studies focusing solely on Bangladesh, the findings of these studies can have wider implications considering the global importance of small-scale carp aquaculture, especially in Asian countries10.
Fourth, food system transformation and food system vulnerability to climate change and livelihood shocks are becoming the focus of large-scale integrated modelling studies42–44. However, it is often challenging to populate the aquatic food system components of such models due to the generally lower availability of high-quality spatially-explicit datasets of inland aquaculture, compared to land-based agricultural systems and fisheries. With inland aquaculture expected to account for the bulk of future fish production expansion capacity compared to marine aquaculture and fisheries12, it becomes increasingly necessary to develop high-quality data for such systems. The dataset presented here contains diverse variables that can help populate such model components, depending on the focus of the modelling exercise. Of particular relevance could be variables related to the production characteristics (Module C), impacts (Modules E, G, and H) and vulnerability to climate change and livelihood shocks (Modules I-J).