Study area and study period
The study was conducted in public health facilities of Harari Regional State, Ethiopia, from July 1 to 15, 2020. Located 518 km east of Addis Ababa, Harari Region is one of the ten regional states in Ethiopia, with an estimated area of 311.25 km². Based on the 2007 national census conducted by the Central Statistical Agency of Ethiopia (CSA), Harari Region has a total population of 183,415 and comprises 9 districts (6 urban and 3 rural) and 36 kebeles (the smallest administrative units in Ethiopia) (31). There were seven hospitals in the Harari Region, of which one was owned by the Harari Regional Health Bureau while the rest were owned by other governmental and private organizations. Among these, two hospitals were governmental public health facilities. There were also 8 public health centers, 32 health posts, 10 not-for-profit private clinics, and 15 for-profit private clinics in the Harari Region.
Study design
A facility-based cross-sectional study design was employed.
Study population
The study population comprised all departments implementing the routine health management information system (HMIS) in all public health facilities of Harari Regional State.
Sample size determination and sampling procedure
The sample size of the study was determined by using the single population proportion formula:
$$n = \frac{(Z_{\alpha/2})^2 \, P(1 - P)}{d^2}$$
where $n$ = sample size; $Z_{\alpha/2}$ = the standard normal value corresponding to a significance level ($\alpha$) of 0.05 = 1.96; $P$ = magnitude of the data quality of the routine health information system among departments in public health facilities of Dire Dawa (75.3%) (14); and $d$ = degree of precision = 0.05.
Because the total number of departments ($N$ = 245) was less than 10,000, the correction formula was applied to the initial sample size ($n$ = 314), giving $n_f = n / (1 + n/N) = 314 / (1 + 314/245) \approx 138$. However, since the number of departments implementing health information systems was manageable, a census of all 245 departments in all 42 public health facilities (8 health centers, 32 health posts, and 2 hospitals) was conducted.
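The correction step can be checked with a short Python sketch. The function name is hypothetical, and rounding up to the next whole department is an assumption; the values 314 and 245 are taken from the text above.

```python
import math


def finite_population_correction(n: int, population: int) -> int:
    """Corrected sample size n_f = n / (1 + n / N).

    Rounding up to a whole unit is an assumption of this sketch.
    """
    if n <= 0 or population <= 0:
        raise ValueError("n and population must be positive")
    return math.ceil(n / (1 + n / population))


# Initial sample size (314) and number of departments (245) as reported.
n_f = finite_population_correction(314, 245)
print(n_f)
```

With these inputs the unrounded value is about 137.6, which agrees with the corrected sample size of 138 reported above.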
Data collection instrument
The questionnaire was adapted from the Performance of Routine Information System Management (PRISM) assessment tool version 3.1 (32) and used with minor modifications to collect quantitative data. It comprised four sections. The first section covered the socio-demographic characteristics of the department heads, such as age, educational status, work experience, professional category, salary, residence, and others. The second and third sections included items assessing the technical, organizational, and behavioral factors associated with the quality of routine health information system data. The fourth section was an observation checklist that guided the observations, interviews, and document reviews used to collect data on data quality from every department through its head or representative.
Data collection procedures
Twelve health professionals who had basic data management training and prior experience of data collection were assigned as data collectors, and four health professionals who were members of the HIS monitoring team were assigned as supervisors. Before the data collection, two days of training were provided on the purpose of the study, data collection procedures, and ethical issues, emphasizing the safety of the participants and data quality.
The data collectors visited all the health facilities, explained the aim of the study, assured the confidentiality of the data, and obtained written consent from each facility head and participant. They then completed the checklist through observation and interview and distributed the questionnaire to the department heads, who read and completed the remaining sections themselves.
Study variables
Dependent variable
Data quality was the dependent variable of the study.
Independent variables
The independent variables included:
Organizational variables: training, feedback, supervision, computer, internet, reward, engagement in HIS activities, performance review meeting, and data use;
Technical variables: presence of standard indicators, report formats, and a trained person able to fill in the formats; and
Behavioral variables: motivation, attitude, data manipulation for competition, negligence, sense of responsibility, knowledge, and data quality checking skills.
Operational definitions
Good quality data: Data that met the criteria for all three quality dimensions: accuracy ≥80%, completeness ≥85%, and timeliness ≥85% (27, 33).
Poor quality data: Data that failed at least one of the three criteria (accuracy <80%, or completeness <85%, or timeliness <85%).
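The classification rule in these two definitions can be sketched in Python as follows. The function and constant names are hypothetical, and the inputs are assumed to be percentages already computed for a department.

```python
# Thresholds (in percent) taken from the operational definitions.
ACCURACY_MIN = 80.0
COMPLETENESS_MIN = 85.0
TIMELINESS_MIN = 85.0


def classify_data_quality(accuracy: float, completeness: float,
                          timeliness: float) -> str:
    """Return "good" only if all three dimensions meet their thresholds."""
    meets_all = (
        accuracy >= ACCURACY_MIN
        and completeness >= COMPLETENESS_MIN
        and timeliness >= TIMELINESS_MIN
    )
    return "good" if meets_all else "poor"


print(classify_data_quality(82.0, 90.0, 100.0))  # all criteria met
print(classify_data_quality(82.0, 90.0, 50.0))   # timeliness below 85%
```

A department is classified as poor as soon as one dimension falls below its threshold, regardless of the other two.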
Completeness: The extent to which the expected data elements were filled in on the report format and on the source documents. Data completeness was calculated as the average of register (source document) content completeness and report content completeness. The data were considered complete if the average was ≥85% (33).
Register content completeness: Checked by taking the last 15 cases from the department's register for the selected month/quarter and measured by dividing the number of completely recorded cases by the total number of cases checked. If fewer than 15 cases/entries were recorded in the register, all available cases were considered.
Report content completeness: At the department level, report content completeness was measured by dividing the number of data elements reported in the report format by the total number of data elements the department was expected to report (32). For departments that did not keep a copy of their report, the copy was obtained from the HMIS unit.
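The completeness calculation described above can be illustrated with a minimal Python sketch. All function names and the example values are hypothetical; the sketch assumes register entries are listed in chronological order and flagged as completely recorded or not.

```python
from typing import Sequence


def register_completeness(cases: Sequence[bool], max_cases: int = 15) -> float:
    """Percent of completely recorded cases among the last `max_cases` entries.

    `cases` holds one flag per register entry (True = completely recorded).
    If fewer than `max_cases` entries exist, all of them are used.
    """
    checked = list(cases)[-max_cases:]
    if not checked:
        raise ValueError("register has no entries to check")
    return 100 * sum(checked) / len(checked)


def report_completeness(reported: int, expected: int) -> float:
    """Percent of expected data elements that appear in the report."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return 100 * reported / expected


def data_completeness(register_pct: float, report_pct: float) -> float:
    """Average of register and report content completeness."""
    return (register_pct + report_pct) / 2


# Illustrative values only, not study data.
register_pct = register_completeness([True] * 12 + [False] * 3)
report_pct = report_completeness(reported=45, expected=50)
overall = data_completeness(register_pct, report_pct)
print(register_pct, report_pct, overall, overall >= 85)
```

In this illustrative case the register and report values average exactly 85%, so the department would just meet the completeness criterion.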
Data accuracy: Measured by recounting already reported data elements/indicators from the source document/register and comparing the recounted values with those reported in the report format. Data elements/indicators for which the verification factor (the recounted value from the source document divided by the value reported in the HMIS report) fell between 0.9 and 1.1 were regarded as accurate (having a normal verification factor). The department's data accuracy was determined as the number of accurate data elements/indicators divided by the total number of data elements checked. The department's data were considered accurate if this proportion was ≥80% (27).
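A minimal Python sketch of the verification factor and the resulting accuracy percentage is given below. The function names and example counts are hypothetical, the 0.9 to 1.1 range is treated as inclusive, and the sketch assumes every reported value is greater than zero.

```python
from typing import Sequence, Tuple


def verification_factor(recounted: float, reported: float) -> float:
    """Recounted value from the source document divided by the reported value."""
    if reported <= 0:
        # Handling of zero reported values is not specified in the text.
        raise ValueError("reported value must be positive in this sketch")
    return recounted / reported


def accuracy_percent(pairs: Sequence[Tuple[float, float]]) -> float:
    """Percent of (recounted, reported) pairs with a factor in [0.9, 1.1]."""
    if not pairs:
        raise ValueError("no data elements to check")
    accurate = sum(
        1 for recounted, reported in pairs
        if 0.9 <= verification_factor(recounted, reported) <= 1.1
    )
    return 100 * accurate / len(pairs)


# Illustrative (recounted, reported) values only, not study data.
elements = [(100, 100), (95, 100), (120, 100), (46, 50), (30, 40)]
pct = accuracy_percent(elements)
print(pct, pct >= 80)
```

Here three of the five elements have a normal verification factor, so the department would fall below the 80% accuracy threshold.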
Timeliness: Assessed as report submission within the accepted time period, by observing the reporting date on the reporting forms of two randomly selected monthly reports. Departments at the health posts were expected to report from the 20th to the 22nd of the month, and departments at the health centers and hospitals were expected to report to the next level from the 20th to the 24th. The department's data were considered timely if the average was ≥85% (33).
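The timeliness check can be sketched in Python as below. The names are hypothetical, and two points are assumptions of the sketch rather than statements from the text: the reporting windows are treated as inclusive day-of-month ranges, and a report dated before the 20th is counted as outside the window. The sketch also assumes each date already belongs to the correct reporting month.

```python
from datetime import date
from typing import Sequence

# Inclusive day-of-month reporting windows by facility type.
REPORTING_WINDOWS = {
    "health_post": (20, 22),
    "health_center": (20, 24),
    "hospital": (20, 24),
}


def is_on_time(report_date: date, facility_type: str) -> bool:
    """True if the reporting day falls inside the facility's window."""
    first_day, last_day = REPORTING_WINDOWS[facility_type]
    return first_day <= report_date.day <= last_day


def timeliness_percent(report_dates: Sequence[date], facility_type: str) -> float:
    """Percent of checked reports submitted inside the window."""
    if not report_dates:
        raise ValueError("no reports to check")
    on_time = sum(is_on_time(d, facility_type) for d in report_dates)
    return 100 * on_time / len(report_dates)


# Two illustrative monthly reports, not study data.
sampled = [date(2020, 5, 21), date(2020, 6, 23)]
print(timeliness_percent(sampled, "health_post"))
print(timeliness_percent(sampled, "health_center"))
```

Because only two reports are checked per department, the percentage can only be 0, 50, or 100, so both reports must be on time to reach the 85% threshold.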
Knowledge on HIS: Knowledge of the rationale of routine HIS data, measured by using three knowledge-related open-ended questions with a total raw score of 7, for which the answers were coded according to the themes in the PRISM assessment user guide (32). The 50% mean score was used as the cut-off to classify knowledge as good or poor.
Data quality control
The questionnaire was pre-tested on 12 departments in health facilities outside the Harari Region to identify any ambiguity and to assess the consistency and acceptability of the questionnaire, as well as the time needed to complete it. The necessary modifications were made before the actual data collection.
The quality of data was monitored frequently both in the field and during data entry. In the field, this was done through close supervision of the data collectors. All completed questionnaires were examined for completeness and consistency during data collection. Incomplete or unclearly filled questionnaires were returned to the study participants immediately.
Data processing and analysis
Data were entered using Epi Data and exported to SPSS software version 25 for data recording, cleaning, and statistical analysis. Descriptive statistics using frequencies, percentages, tables, and figures were used to describe the departments in the public health facilities, and the overall data quality was categorized as poor or good. Bivariate logistic regression analysis was done to identify variables that were candidates for multivariate analysis. All variables that had an association on bivariate analysis at a liberal P-value of <0.25 were considered for inclusion in the multivariate analysis. Afterwards, multivariate analysis was done to control for the confounding effect of other variables and to identify independent predictors of routine health data quality in the health facilities. The magnitude and direction of the relationship between the variables were expressed as odds ratios (OR) with 95% CI, and a P-value of <0.05 was used to declare statistical significance. Model fitness was checked by using the Hosmer-Lemeshow test at a P-value of >0.05, and a multicollinearity check was also carried out.
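To illustrate how an odds ratio and its 95% CI are obtained for a single binary predictor, the following Python sketch computes a crude odds ratio with a Wald confidence interval from a 2x2 table. For one binary predictor this coincides with the estimate from a bivariate logistic regression. The function name and the counts are hypothetical and are not study data; the analysis itself was carried out in SPSS.

```python
import math
from typing import Tuple


def crude_odds_ratio(a: int, b: int, c: int, d: int,
                     z: float = 1.96) -> Tuple[float, float, float]:
    """Crude odds ratio with a Wald confidence interval.

    a: factor present, good data quality    b: factor present, poor data quality
    c: factor absent, good data quality     d: factor absent, poor data quality
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only.
odds_ratio, lower, upper = crude_odds_ratio(40, 10, 20, 30)
print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that excludes 1 corresponds to a P-value below 0.05 for that predictor; adjusted odds ratios from the multivariate model require fitting the full regression and are not reproduced by this sketch.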