We demonstrate how to leverage diverse data sources, satellite imagery and modelling techniques to provide nuanced insights into the spatial dispersion of refugees. We first explain why we chose the Cameroon setting, then enumerate the data sources explored and their global availability, and finally present our modelling framework and its potential for replicability.
Location: the refugee context in Cameroon
Cameroon is confronting a multi-faceted humanitarian and protection crisis caused by conflict, inter-communal violence and the impacts of climate change. Refugee locations roughly follow a regional pattern: refugees from Nigeria who fled Boko Haram violence are located in the Far North region, and refugees from the Central African Republic (CAR) in the East region (see Fig. 1). Some refugees from CAR have been displaced for over a decade, and the political and security situation in CAR has not improved sufficiently to warrant their return (16). UNHCR monitors seven refugee sites along the border with CAR (see Fig. 1). Nigerians began seeking refuge in Cameroon in 2012. In response, Cameroon established a camp in Minawao to accommodate up to 20,000 refugees, but the camp was close to exceeding its capacity by 2014 (16). Minawao is the only refugee site in Cameroon officially designated as a camp. By the end of 2015, violence along the Cameroon-Nigeria border had displaced more than 90,000 Cameroonians and refugees who had settled in these areas, and pressure at the border has persisted since (16). At the end of March 2023, Cameroon hosted over 480,000 refugees and asylum seekers, including roughly 349,000 refugees from the Central African Republic and 128,000 from Nigeria, as well as more than one million internally displaced people (17).
Data sources
Data are used at three stages of the high-resolution refugee mapping pipeline: first, to define the target variable, the refugee population, from administrative records; second, to geolocate the refugee population using a settlement map and a gridded population map; and third, to model its spatial variation using multiple spatial covariates.
Refugee population
The data were retrieved in April 2023 from the proGres version 4 registration database, UNHCR's corporate registration, identity and case management tool, which holds the individual data of forcibly displaced and stateless persons. First rolled out in 2018, it is built on Microsoft customer relationship management (CRM) software and is the backbone of multiple interoperable tools and applications, ranging from rapid offline entry of refugee information when refugees seek assistance to biometric checks during food distributions. The information recorded covers not only basic demographic characteristics but also specific needs and events such as resettlement acceptance or voluntary departure, across all countries where UNHCR operates. Considered the most up-to-date list of refugees in a country, the proGres database is regularly used as a sampling frame for surveys and to provide statistics on the current size of the refugee population. Aggregated statistics from the database are openly available on the UNHCR website (3) and through an R package (18), but access to individual records requires special authorisation. In countries where the government also undertakes refugee registration, the UNHCR database may contain only a fraction of the refugees; this is not the case in Cameroon, where it can be considered a refugee census. Of the 471 386 refugees recorded in Cameroon, 337 223 were reported as living outside UNHCR-monitored refugee sites. The registration database provided refugee counts aggregated at admin 3 level and partial information on the precise locations of refugees, based on free-text input.
Gazetteer
To link the manually entered free-text location information in the proGres registration database with spatial coordinates, we retrieved 37 531 named point locations from OpenStreetMap in June 2023 (19), that is, point features tagged with the key “place” (all values), using the QuickOSM plug-in for QGIS (20).
Covariates
We gathered 20 covariates to inform the spatial allocation of refugees inside Cameroon:
- Covariates linked to the available infrastructure are derived from OpenStreetMap (19): distance to health facilities, education facilities, local roads, major roads, marketplaces, places of worship and road intersections; and from the NASA Visible Infrared Imaging Radiometer Suite (VIIRS): mean nightlight intensity in 2022 (21)
- Covariates linked to insecurity are derived from the Armed Conflict Location & Event Data Project (ACLED) (22): distance to conflict locations in 2019, 2020, 2021 and 2022; and from the locations of UNHCR sites (19): distance to monitored refugee sites
- A covariate linked to population density is derived from the WorldPop bottom-up gridded population dataset: the sum of population counts within a 1 km window
- Covariates linked to settlement morphology are derived from the Ecopia building footprints (23): the mean and coefficient of variation of the area and perimeter of the buildings contained in each grid cell, the date of the satellite imagery used, and the urban/rural classification of the buildings
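As an illustration of the settlement-morphology covariates, the sketch below computes, for each grid cell, the mean and coefficient of variation (standard deviation divided by mean) of building area and perimeter. It is a minimal NumPy sketch, not the processing chain actually used: it assumes each building has already been assigned to a grid-cell index (the hypothetical `cell_ids` input) and uses the population standard deviation (ddof = 0). Cells without buildings receive NaN.

```python
import numpy as np


def building_morphology_covariates(cell_ids, areas, perimeters, n_cells):
    """Per-cell mean and coefficient of variation (CV = std / mean) of
    building area and perimeter. `cell_ids[i]` is the grid-cell index of
    building i (hypothetical input layout). Cells without buildings get NaN."""
    cell_ids = np.asarray(cell_ids)
    counts = np.bincount(cell_ids, minlength=n_cells).astype(float)
    covariates = {}
    for name, values in (("area", areas), ("perimeter", perimeters)):
        values = np.asarray(values, dtype=float)
        sums = np.bincount(cell_ids, weights=values, minlength=n_cells)
        sq_sums = np.bincount(cell_ids, weights=values ** 2, minlength=n_cells)
        with np.errstate(invalid="ignore", divide="ignore"):
            mean = sums / counts
            # population variance E[x^2] - E[x]^2, clipped at 0 against rounding
            var = np.maximum(sq_sums / counts - mean ** 2, 0.0)
            cv = np.sqrt(var) / mean
        covariates[f"{name}_mean"] = mean
        covariates[f"{name}_cv"] = cv
    return covariates


# Usage: three buildings falling in cells 0, 0 and 1 of a three-cell grid
cov = building_morphology_covariates([0, 0, 1], [10.0, 30.0, 50.0],
                                     [12.0, 20.0, 28.0], n_cells=3)
```

Using `np.bincount` with weights keeps the aggregation vectorised, which matters when millions of footprints are assigned to a national 100 m grid.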
Settlement map
To map the refugee population inside monitored sites, we assessed the accuracy of five settlement maps extracted from satellite imagery by various institutions. They differ in format (building polygons or gridded settled area), spatial resolution and temporal resolution, as summarised in Table 1.
Base high-resolution population
As the base layer for the refugee population living outside monitored refugee sites, we used a 100 m gridded population layer produced through a collaboration between the Cameroon National Statistics Office and the WorldPop Research Group, based on household surveys from 2021 and 2022 and building footprints from 2021. It was estimated with a Bayesian hierarchical geostatistical model.
Table 1
Attributes of the settlement maps assessed for refugee mapping inside sites
| Name | Source | Date selected | Type | Spatial resolution | Coverage | Temporal resolution | Resource |
|---|---|---|---|---|---|---|---|
| World Settlement Footprint | German Aerospace Center (DLR) | 2019 | Raster | 10 m | Global | 4 years | Link |
| Global Human Settlement Layer | Joint Research Centre, European Commission | 2020 | Raster | 100 m | Global | ~5 years | Link |
| Microsoft Building Footprints | Microsoft | 2022 | Polygon | n/a (vector) | Global | Ad hoc | Link |
| Google Building Footprints (v1) | Google | 2021 | Polygon | n/a (vector) | Africa, South Asia and Latin America | Ad hoc | Link |
| Ecopia Building Footprints | Ecopia AI and Maxar Technologies | 2021 | Polygon | n/a (vector) | Africa | Ad hoc | Link |
Mapping refugee distributions with high spatial resolution
The goal is to provide a workflow for mapping refugees at high spatial resolution across Cameroon. We have access to reliable refugee totals for the eight monitored sites and for the 360 subdivisions (arrondissements, admin 3 level). To gain finer spatial insight, we disaggregate those totals into a fine-resolution grid following the gridded population format, which consists of estimating population counts over a complete partition of the country into small grid cells of equal size (see Leyk (6) for a review of the concept). As summarised in Fig. 2, we developed two methods to disaggregate refugee totals into grid cells, depending on whether the recorded refugee location was inside or outside the UNHCR-led sites.
Mapping refugees inside UNHCR-led refugee sites
Mapping refugees inside UNHCR-led refugee sites consists of spatially disaggregating the refugee total observed for each site into all the grid cells covered by the site. To do so, we first checked the site boundaries against different satellite imagery (Google, ESRI and Bing basemaps, visualised in April 2023) and updated them where they had changed recently. We assessed the different settlement maps and selected the most accurate in terms of the number of buildings delineated and the extent of settlement mapped. We manually digitised structures that were visible on satellite imagery but missing from the satellite-imagery-derived building footprint products. We overlaid a 100 m grid on the sites and computed the number of structure footprints in each grid cell. We then disaggregated the total number of refugees registered in each site into the 100 m grid cells using the number of structures as weights, thus adopting a deterministic dasymetric allocation of refugees (see Fig. 2).
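The deterministic dasymetric step can be sketched as follows. This is a minimal illustration, assuming the structure counts per 100 m cell have already been computed and that each cell carries the identifier of the site it falls in; the site names and totals in the usage example are hypothetical.

```python
import numpy as np


def allocate_by_structures(cell_site, structure_counts, site_totals):
    """Deterministic dasymetric allocation: each site's registered total is
    split across its grid cells in proportion to the cell's structure count.
    Cells belonging to no listed site keep a value of 0."""
    cell_site = np.asarray(cell_site)
    counts = np.asarray(structure_counts, dtype=float)
    refugees = np.zeros_like(counts)
    for site, total in site_totals.items():
        in_site = cell_site == site
        n_structures = counts[in_site].sum()
        if n_structures == 0:
            raise ValueError(f"site {site!r} has no mapped structures")
        refugees[in_site] = total * counts[in_site] / n_structures
    return refugees


# Usage: two hypothetical sites, "A" (3 cells) and "B" (2 cells)
cell_site = ["A", "A", "A", "B", "B"]
structures = [10, 20, 30, 5, 15]
refugees = allocate_by_structures(cell_site, structures, {"A": 1200, "B": 400})
```

Because the weights are normalised within each site, the cell values always sum back to the registered site total; the estimates are fractional, and any rounding to whole persons would be a separate choice.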
Mapping refugees outside UNHCR-led refugee sites
Mapping refugees outside sites consists of disaggregating the refugee counts observed at admin 3 level, as provided by the proGres database, into grid cells. This approach has two primary steps: (1) geolocating the records from the administrative register and (2) combining the geolocation with other spatial covariates to model the spatial allocation of the entire refugee population.
To geolocate the records from the registration database, we cleaned the free-text field describing refugee location by removing Arabic characters, converting upper case to lower case, removing trailing spaces, and replacing internal whitespace and apostrophes with hyphens. We then matched the cleaned refugee locations with the OpenStreetMap point locations, which underwent the same cleaning process. Next, we converted the resulting refugee point layer into a continuous spatial indicator by creating a buffer around each refugee point, sized according to the refugee population assigned to that location, and computed a distance layer from every grid cell to the nearest buffered refugee location, which avoids imposing a strict, subjective spatial cut-off. The geolocated points cannot be used directly because (1) the point location provided by OpenStreetMap may not correspond exactly to the location recorded in the registry, and (2) doing so would discard the refugees who could not be located with OpenStreetMap.
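The geolocation step can be sketched as below. This is a hypothetical illustration rather than the authors' code: the Arabic Unicode ranges, the collapsing of consecutive spaces or apostrophes into a single hyphen, the buffer radius rule (radius proportional to the square root of the refugee count, so that buffer area scales with population, with a made-up constant `metres_per_sqrt_refugee`) and the toy place names and projected coordinates are all assumptions. Distances are Euclidean in a projected coordinate system, and cells inside a buffer get a distance of zero.

```python
import re

import numpy as np

# Arabic-script Unicode blocks (assumed set of ranges to strip)
ARABIC = re.compile(r"[\u0600-\u06FF\u0750-\u077F\u08A0-\u08FF\uFB50-\uFDFF\uFE70-\uFEFF]")


def clean_place_name(text):
    """Normalise a free-text place name (same rules for registry and OSM)."""
    text = ARABIC.sub("", text)           # remove Arabic characters
    text = text.lower().strip()           # lower case, drop leading/trailing spaces
    return re.sub(r"[\s'’]+", "-", text)  # internal whitespace/apostrophes -> hyphen


def match_to_gazetteer(records, gazetteer):
    """records: (raw_name, n_refugees) pairs; gazetteer: {osm_name: (x, y)}.
    Returns matched (x, y, n) tuples and the number of unmatched refugees.
    If two OSM names clean to the same key, the last one wins (assumption)."""
    lookup = {clean_place_name(name): xy for name, xy in gazetteer.items()}
    matched, unmatched = [], 0
    for name, n in records:
        xy = lookup.get(clean_place_name(name))
        if xy is None:
            unmatched += n
        else:
            matched.append((xy[0], xy[1], n))
    return matched, unmatched


def distance_to_buffers(xs, ys, points, metres_per_sqrt_refugee=10.0):
    """Distance (m) from each cell centre to the nearest buffer edge.
    Buffer radius = metres_per_sqrt_refugee * sqrt(n) (hypothetical rule)."""
    d = np.full(len(xs), np.inf)
    for px, py, n in points:
        radius = metres_per_sqrt_refugee * np.sqrt(n)
        gap = np.hypot(xs - px, ys - py) - radius
        d = np.minimum(d, np.maximum(gap, 0.0))  # 0 inside a buffer
    return d


# Usage with toy names and projected coordinates (hypothetical)
gazetteer = {"Nya Kolo": (0.0, 0.0), "Ba'ngu": (5000.0, 0.0)}
records = [("  NYA   kolo ", 120), ("BA’NGU", 30), ("unknown place", 7)]
matched, unmatched = match_to_gazetteer(records, gazetteer)
xs = np.array([0.0, 1000.0, 3000.0])
ys = np.zeros(3)
dist = distance_to_buffers(xs, ys, matched)
```

The unmatched refugees (7 in the toy example) are not lost: they remain in the admin 3 totals that the model later disaggregates, which is the second reason given above for not using the points directly.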
To address the second step, modelling the refugee count at grid cell level, we adapted the procedure developed by Stevens (4) and widely applied by the WorldPop research group (5) for disaggregating total population counts into grid cells:
- We select gridded covariates related to refugee locations (see Data sources section), in addition to the layer geolocating refugees with OpenStreetMap described in the previous paragraph.
- We fit a random forest model at admin 3 level relating the selected gridded covariates to the log of the total number of refugees divided by the total number of inhabitants, that is, the log number of refugees per inhabitant.
- Using the fitted model, we predict the number of refugees per inhabitant for every grid cell in Cameroon located outside UNHCR-led refugee sites.
- We multiply the estimated number of refugees per inhabitant by the estimated number of inhabitants to obtain the estimated number of refugees in every grid cell.
- To ensure that the grid-cell estimates sum exactly to the refugee totals reported at admin 3 level, we calibrate the number of refugees per grid cell against those reported totals.
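The five steps above can be sketched with scikit-learn's `RandomForestRegressor`. This is a simplified illustration under several assumptions not stated in the text: cells inside UNHCR-led sites are assumed to have been removed beforehand, admin 3 covariates are taken as the mean of the cell values, a small offset `EPS` keeps the log finite for units with zero refugees, and calibration is a proportional rescaling within each admin 3 unit (falling back to base-population weights if the model predicts zero everywhere in a unit). The synthetic inputs are random placeholders, not project data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

EPS = 1e-6  # assumed offset so admin units with zero refugees have a finite log


def disaggregate_refugees(cell_X, cell_pop, cell_admin, admin_refugees, seed=0):
    """cell_X: (n_cells, n_covariates) covariates; cell_pop: base population
    per cell; cell_admin: admin 3 index (0..n_admin-1) of each cell, every unit
    having at least one cell; admin_refugees: reported refugee totals."""
    admin_refugees = np.asarray(admin_refugees, dtype=float)
    n_admin = len(admin_refugees)
    n_cells_per_admin = np.bincount(cell_admin, minlength=n_admin)

    # 1. Aggregate covariates (mean) and population (sum) to admin 3 level
    admin_X = np.column_stack([
        np.bincount(cell_admin, weights=cell_X[:, j], minlength=n_admin)
        for j in range(cell_X.shape[1])
    ]) / n_cells_per_admin[:, None]
    admin_pop = np.bincount(cell_admin, weights=cell_pop, minlength=n_admin)

    # 2. Fit a random forest on log(refugees per inhabitant)
    y = np.log(admin_refugees / admin_pop + EPS)
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(admin_X, y)

    # 3. Predict per cell and back-transform to refugees per inhabitant
    rate = np.clip(np.exp(model.predict(cell_X)) - EPS, 0.0, None)

    # 4. Refugees per cell = rate x base population
    est = rate * cell_pop

    # 5. Calibrate: rescale within each admin unit to match its reported total;
    #    fall back to population weights where the model predicts zero everywhere
    est_sum = np.bincount(cell_admin, weights=est, minlength=n_admin)
    est = np.where((est_sum == 0)[cell_admin], cell_pop, est)
    est_sum = np.bincount(cell_admin, weights=est, minlength=n_admin)
    scale = np.divide(admin_refugees, est_sum,
                      out=np.zeros(n_admin), where=est_sum > 0)
    return est * scale[cell_admin]


# Usage with synthetic placeholder data: 10 admin units of 20 cells each
rng = np.random.default_rng(42)
cell_admin = np.repeat(np.arange(10), 20)
cell_X = rng.normal(size=(200, 3))
cell_pop = rng.uniform(50, 500, size=200)
admin_refugees = rng.integers(0, 2000, size=10).astype(float)
refugees = disaggregate_refugees(cell_X, cell_pop, cell_admin, admin_refugees)
```

The model output only shapes the distribution within each unit; after step 5 the admin 3 sums match the registered totals exactly, so errors in the random forest affect where refugees are placed, not how many there are.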