Study area
The study was conducted in Ethiopia, which is located in northeastern Africa. It is bordered by Sudan and South Sudan to the west, Eritrea and Djibouti to the northeast, Somalia to the east and southeast, and Kenya to the south. Ethiopia lies between 3° N and 15° N latitude and 33° E and 48° E longitude.
The country covers approximately 1,127,000 km². The Ethiopian landmass consists of a large, elevated plateau bisected by the Rift Valley into the northwestern and southeastern highlands, each with associated lowlands. The contrast in relief is remarkable: elevation ranges from −155 m at Asal Lake in the Afar depression (the lowest point in Africa) to 4,620 m above sea level at the peak of Mount Ras Dashen in the Semen Mountains [15]. There are nine regional states and two city administrations.
Study design
A population-based cross-sectional study was used to examine HIV seropositivity, explore the spatial distribution of HIV, and identify factors associated with HIV seropositivity in Ethiopia.
Data source
The data for this study were taken from the 2016 Ethiopia Demographic and Health Survey (EDHS). The 2016 EDHS is the fourth comprehensive, nationally representative survey conducted in Ethiopia as part of the worldwide Demographic and Health Surveys (DHS) project. The data were downloaded from the DHS website after permission was granted.
Dependent variable: HIV seropositivity (positive, negative)
Interviewers collected capillary blood by finger prick from women aged 15-49 and men aged 15-59 who consented to HIV testing. The protocol for blood specimen collection and analysis was based on the anonymous unlinked protocol developed for the DHS Program. If a respondent consented to HIV testing, five blood spots from the finger prick were collected on a filter paper card. A unique barcode label was affixed to the filter paper card, a duplicate label was attached to the biomarker questionnaire, and a third copy of the same barcode was affixed to the Dried Blood Spot Transmittal Sheet to track the blood samples from the field to the laboratory. Blood samples were dried overnight and packaged for storage the following morning. Samples were periodically collected from the field and transported to the laboratory at the Ethiopian Public Health Institute (EPHI) in Addis Ababa. Upon arrival at EPHI, each blood sample was logged into the CSPro HIV Test Tracking System database, given a laboratory number, and stored at -20°C until tested [16].
Sample size and sampling procedures
A total of 25,774 individuals (13,295 women and 12,479 men) from 13,043 households were included in the analysis. All sampling procedures, data collection, and data quality control were done by the DHS team [16].
Data processing and analysis
Data for 25,774 individuals were extracted from the 2016 EDHS and analyzed using Stata 14. Descriptive statistics were computed as frequencies and proportions.
Spatial statistical analysis
Statistically significant clusters were defined as geographic areas where the prevalence of the disease is disproportionately higher or lower than in neighboring areas. Tests for global clustering detect the existence of at least one cluster but not the specific location of the cluster(s) [reference].
Mapping clusters
Clusters were mapped using Local Indicators of Spatial Association (LISA) analysis, conducted in GeoDa. LISA measures local spatial autocorrelation, the degree to which features are clustered or dispersed, and can be used as a method for cluster analysis. In cluster analysis, objects in the same group (cluster) are more similar to each other than to objects in other groups [17, 18].
Local Moran’s I was used to map clusters of disease prevalence and to identify significant clusters. The resulting map has four classes. The clusters are the high-high locations (areas with high prevalence whose neighbors also have high prevalence) and the low-low locations (areas with low prevalence whose neighbors also have low prevalence); both indicate positive spatial autocorrelation. The remaining two classes, high-low and low-high, are spatial outliers [reference/s].
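The four-class scheme can be illustrated with a minimal sketch of Local Moran's I. This is not the GeoDa implementation: the function name, the five-area toy prevalence values, and the chain-shaped neighbor structure are all hypothetical, row-standardized binary weights are assumed, and significance testing is omitted.

```python
from statistics import mean

def local_morans_i(values, neighbors):
    """Return (I_i, class) per area, assuming row-standardized weights.

    values:    prevalence per area (hypothetical toy data below).
    neighbors: dict mapping an area index to the indices of its neighbors.
    """
    n = len(values)
    m = mean(values)
    z = [v - m for v in values]          # deviations from the overall mean
    m2 = sum(d * d for d in z) / n       # second moment, scales I_i
    results = []
    for i in range(n):
        nbrs = neighbors[i]
        lag = sum(z[j] for j in nbrs) / len(nbrs)  # mean deviation of neighbors
        i_local = z[i] * lag / m2
        if z[i] > 0 and lag > 0:
            label = "high-high"          # cluster of high values
        elif z[i] < 0 and lag < 0:
            label = "low-low"            # cluster of low values
        elif z[i] > 0:
            label = "high-low"           # high outlier among low neighbors
        else:
            label = "low-high"           # low outlier among high neighbors
        results.append((i_local, label))
    return results

# Hypothetical prevalence (%) for five areas arranged in a line.
rates = [8.0, 7.0, 2.0, 1.0, 1.0]
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
lisa = local_morans_i(rates, chain)
print([label for _, label in lisa])
```

Positive values of I_i correspond to the high-high and low-low classes, and negative values to the two outlier classes.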
Hot spot analysis
Hot spot analysis was done using GeoDa. The local G* statistic was used to identify and map clusters of areas with high prevalence rates (hot spots) and areas with low prevalence rates (cold spots) [17, 18].
The statistical significance of each local statistic was assessed with a permutation test using 999 random permutations, which builds a reference distribution for the statistic under spatial randomness; increasing the number of permutations gives a more precise pseudo p-value [17, 18].
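As a rough illustration of the local G* statistic and of a 999-permutation test, the following sketch computes the standardized Getis-Ord Gi* with binary weights and a conditional-permutation pseudo p-value. It is not GeoDa's code: the function names, toy prevalence values, and neighbor structure are hypothetical, and a five-area example is far too small for meaningful inference.

```python
import math
import random

def getis_ord_gi_star(values, neighbors):
    """Standardized Gi* per area, assuming binary weights (focal area included)."""
    n = len(values)
    xbar = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - xbar ** 2)
    scores = []
    for i in range(n):
        window = set(neighbors[i]) | {i}     # G* counts the focal area itself
        w_sum = len(window)                  # binary weights: sum(w) == sum(w^2)
        num = sum(values[j] for j in window) - xbar * w_sum
        den = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        scores.append(num / den)
    return scores

def pseudo_p_value(values, neighbors, i, permutations=999, seed=0):
    """Conditional permutation: keep area i fixed and shuffle all other values."""
    rng = random.Random(seed)
    observed = getis_ord_gi_star(values, neighbors)[i]
    others = values[:i] + values[i + 1:]
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(others)
        shuffled = others[:i] + [values[i]] + others[i:]
        simulated = getis_ord_gi_star(shuffled, neighbors)[i]
        if abs(simulated) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed arrangement as one of the permutations.
    return (extreme + 1) / (permutations + 1)

# Hypothetical prevalence (%) for five areas arranged in a line.
rates = [8, 7, 2, 1, 1]
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
gi = getis_ord_gi_star(rates, chain)
p0 = pseudo_p_value(rates, chain, 0)
```

Positive scores mark candidate hot spots and negative scores candidate cold spots. With 999 permutations the smallest attainable pseudo p-value is 1/1000.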
Interpolation
Interpolation was conducted in QGIS. It is based on the assumption that spatially distributed objects are spatially correlated, i.e., things that are close together tend to have similar characteristics. Values in unsampled areas were predicted from values in sampled areas using inverse distance weighting (IDW) [19-21].
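The idea behind IDW can be sketched in a few lines: the prediction at an unsampled location is a weighted average of sampled values, with weights proportional to the inverse of distance raised to a power. The function name, the power of 2, and the sample points are assumptions made for illustration, not the QGIS settings used in the study.

```python
import math

def idw(known_points, target, power=2.0):
    """Predict a value at `target` from (x, y, value) samples by IDW.

    power=2.0 is an assumed default; larger powers give nearby
    samples more influence.
    """
    num = 0.0
    den = 0.0
    for x, y, value in known_points:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0:
            return value              # target coincides with a sampled point
        w = 1.0 / d ** power          # closer samples get larger weights
        num += w * value
        den += w
    return num / den

# Hypothetical sampled prevalence (%) at two locations on a line.
samples = [(0.0, 0.0, 2.0), (2.0, 0.0, 6.0)]
midpoint = idw(samples, (1.0, 0.0))   # equidistant: simple average of 2 and 6
near_left = idw(samples, (0.5, 0.0))  # pulled toward the nearer sample
```

In this toy case the midpoint estimate is the plain average of the two samples, while the point closer to the left sample is pulled toward its value.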
Kulldorff’s scan statistic
Kulldorff’s spatial scan statistic is a method for detecting and evaluating statistically significant spatial clusters of a specific disease. Final confirmatory spatial analysis was done using SaTScan and QGIS. SaTScan identifies the specific locations where significantly higher or lower rates are found. Its output presents hot spot areas as circular windows, indicating that rates inside the windows are higher than expected compared with areas outside the windows [22-25].
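To make the comparison of inside and outside rates concrete, the sketch below computes the log-likelihood ratio that a scan statistic assigns to one candidate window. It assumes a Bernoulli model (cases versus non-cases) and scans for high rates only; the text does not state which probability model was used. The function names and counts are hypothetical. SaTScan itself evaluates many circular windows and assesses the maximum ratio by Monte Carlo simulation, which this sketch omits.

```python
import math

def _xlogy(x, y):
    """x * log(y), with the convention that 0 * log(0) equals 0."""
    return 0.0 if x == 0 else x * math.log(y)

def bernoulli_llr(c, n, C, N):
    """Log-likelihood ratio for one window under an assumed Bernoulli model.

    c, n: cases and persons inside the window.
    C, N: total cases and total persons in the study area.
    Returns 0.0 unless the rate inside exceeds the rate outside.
    """
    inside = c / n
    outside = (C - c) / (N - n)
    if inside <= outside:
        return 0.0
    # Log-likelihood with separate rates inside and outside the window.
    l1 = (_xlogy(c, inside) + _xlogy(n - c, 1 - inside)
          + _xlogy(C - c, outside)
          + _xlogy((N - n) - (C - c), 1 - outside))
    # Log-likelihood with a single common rate everywhere.
    l0 = _xlogy(C, C / N) + _xlogy(N - C, 1 - C / N)
    return l1 - l0

# Hypothetical counts: 100 cases among 5,000 persons overall.
excess = bernoulli_llr(30, 500, 100, 5000)      # 6% inside vs about 1.6% outside
no_excess = bernoulli_llr(10, 500, 100, 5000)   # 2% inside and outside
```

The window with the largest ratio is the most likely cluster; a window whose rate matches the rest of the study area scores zero.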
Multilevel logistic regression
Multilevel analysis was considered appropriate to account for the hierarchical nature of the DHS data and to estimate individual-level as well as household-level effects on the outcome variable [26-28].
Model building
Model building and analysis were done using Stata 14. Four models containing variables of interest were fitted for HIV seropositivity among adults. The first model (M0) was an empty model fitted without independent variables to test random variability using the intra-household correlation (IHHC) [29]. The second model (M1) included all lower-level (individual-level) factors, the third model (M2) included all higher-level (household-level) factors, and the fourth model (M3) included both lower- and higher-level factors. The model to report was selected using the Akaike Information Criterion (AIC).
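The two quantities used in model building can be sketched as follows. The intra-household correlation is computed here with the common latent-variable formulation for random-intercept logistic models, in which the individual-level residual variance is fixed at π²/3; the text does not state which formulation was used, so this is an assumption. The function names are illustrative, and the log-likelihoods and parameter counts are made-up numbers, not results from this study.

```python
import math

def intra_household_correlation(var_household):
    """Share of latent-scale variance attributable to households.

    Assumes a random-intercept logistic model, where the level-1
    residual variance is fixed at pi^2 / 3.
    """
    return var_household / (var_household + math.pi ** 2 / 3)

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical (log-likelihood, number of parameters) per model;
# these numbers are for illustration only.
fits = {
    "M0": (-2150.0, 2),
    "M1": (-2010.0, 12),
    "M2": (-2100.0, 7),
    "M3": (-1985.0, 17),
}
scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
best = min(scores, key=scores.get)   # the lowest AIC is preferred
```

A correlation near zero in the empty model would suggest little household-level clustering, in which case a single-level model might suffice.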