For the BHIS 2013, the province of Luxembourg financed an oversampling of 600 individuals. Consequently, the total net sample size to be obtained was set at 10,600 individuals. For the BHIS 2001, several provinces financed an oversampling, which resulted in a required net sample of 12,050 participating individuals. As the BHIS is a household survey, household reference persons were selected from the NR. A multistage stratified sampling design was used, combining several sampling techniques: stratification, clustering, systematic sampling and simple random sampling. Municipalities served as Primary Sampling Units (PSUs) and were selected through systematic sampling with a selection probability proportional to their size. In each selected municipality, one or more groups of 50 individuals had to be interviewed throughout the calendar year, divided over four quarters (that is, around 12–13 individuals per group per quarter). Households, the Secondary Sampling Units (SSUs), were selected by systematic sampling. The number of households selected for one group corresponded to 50 individuals. Finally, at most four individuals, the Tertiary Sampling Units (TSUs), were selected for interview within each household: by default, the reference person and his/her partner (if any), and two (or three) remaining household members chosen by random selection (29).
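The selection of municipalities with probability proportional to size can be illustrated with a minimal sketch. It is not the BHIS sampling program: the function name `pps_systematic`, the municipality sizes and the number of draws are hypothetical, and stratification is left out. In this sketch, a municipality larger than the sampling interval can be hit more than once, which would correspond to more than one group of 50 individuals.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, n_draws, rng):
    """Systematic sampling with probability proportional to size.

    Returns one unit index per draw; a large unit may appear several times.
    """
    total = sum(sizes)
    step = total / n_draws                 # sampling interval
    start = rng.random() * step            # random start in [0, step)
    upper = list(accumulate(sizes))        # cumulative upper bound of each unit
    hits, idx = [], 0
    for k in range(n_draws):
        point = start + k * step
        # advance to the unit whose cumulative range contains the point
        while idx < len(upper) - 1 and upper[idx] <= point:
            idx += 1
        hits.append(idx)
    return hits


# Hypothetical population sizes of three municipalities
sizes = [100, 300, 600]
hits = pps_systematic(sizes, n_draws=4, rng=random.Random(0))
```

With these illustrative sizes the sampling interval is 250, so the largest municipality (600 inhabitants) is always selected at least twice.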
Field substitution was applied at the level of the SSUs: for each household, three substitute households matched on statistical sector, age group of the household’s reference person (the administrative contact of the household) and household size were selected in order to create clusters of four households. Once the clusters were created, the order of the households was randomly scrambled to identify the initial household to be contacted and the potential first, second and third substitute households. For every cluster, an unmatched substitute cluster was identified, containing the potential fourth to seventh substitute households. Households within a substitute cluster also had similar characteristics to one another, but they were not matched to the characteristics of the households in the original cluster. In the BHIS 2013, cluster substitution was unconditional (once a cluster was exhausted, the first household of the substitute cluster was activated), whereas in the BHIS 2001 cluster substitution was conditional on the number of participating individuals in the initial cluster of households.
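The construction of matched clusters and the random ordering within them can be sketched as follows. This is a simplified illustration, not the procedure actually used: it assumes exact matching on the three characteristics, ignores households that do not fill a complete cluster, and leaves out the unmatched substitute cluster. The function name `build_clusters`, the field names and the example households are all hypothetical.

```python
import random
from collections import defaultdict


def build_clusters(households, rng, cluster_size=4):
    """Group households with identical matching keys into clusters and
    shuffle each cluster to fix the contact order."""
    strata = defaultdict(list)
    for hh in households:
        key = (hh["sector"], hh["age_group"], hh["size"])
        strata[key].append(hh["id"])

    clusters = []
    for ids in strata.values():
        # only complete clusters are formed; leftover households are ignored
        for i in range(0, len(ids) - cluster_size + 1, cluster_size):
            members = ids[i:i + cluster_size]
            rng.shuffle(members)  # position 0 = initial, 1-3 = substitutes
            clusters.append(members)
    return clusters


# Hypothetical households: eight share one profile, three share another
households = [
    {"id": n, "sector": "A01", "age_group": "35-44", "size": 2}
    for n in range(8)
] + [
    {"id": n, "sector": "B07", "age_group": "65+", "size": 1}
    for n in range(8, 11)
]
clusters = build_clusters(households, random.Random(1))
```

In this example the eight households with the first profile yield two clusters of four, while the three households with the second profile cannot form a complete cluster.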
The fieldwork of the BHIS was spread over a whole calendar year, and samples were drawn each quarter, about 6 weeks before the start of the quarter. An online verification of the vital status of the reference person was performed a few days before the start of data collection. Reference persons with a change of vital status (e.g. died, moved abroad) and their corresponding households were removed from the sample. The order of the remaining households within the clusters was adapted accordingly. Consequently, a very limited number of clusters contained fewer than four households. For every selected household and every selected household member, a unique identifier was created. An algorithm was developed to ensure the conversion from this identifier to the corresponding number in the NR; it was entrusted to a Trusted Third Party (TTP), Statistics Belgium (Statbel).
Para-data obtained during the data collection of the BHIS were used to assess the practice of substitution throughout the fieldwork phase. For every activated household (that is, every household that was effectively invited to participate in the BHIS), interviewers had to document the date, hour and mode (by telephone or at the doorstep) of every contact attempt. At least five contact attempts, of which at least one at the doorstep, had to be made before a household could be labelled as non-contactable. If a household was not contactable or refused to participate, a substitute household was activated by the central administration of the survey. The same contact procedure was used for substitute households as for initially activated households. These para-data enabled a strict follow-up of the fieldwork phase and made it possible to assess whether the activation of a substitute household was justified.
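The rule for labelling a household as non-contactable can be expressed as a simple check on the recorded contact attempts. The sketch below only illustrates the rule stated above; the record layout, the function name `may_label_non_contactable` and the example attempts are hypothetical.

```python
def may_label_non_contactable(attempts, min_attempts=5, min_doorstep=1):
    """True if the recorded attempts satisfy the minimum contact effort:
    at least five attempts, of which at least one at the doorstep."""
    doorstep = sum(1 for a in attempts if a["mode"] == "doorstep")
    return len(attempts) >= min_attempts and doorstep >= min_doorstep


# Hypothetical para-data for two households
phone_only = [
    {"date": "2013-02-04", "hour": h, "mode": "telephone"}
    for h in (10, 13, 17, 19, 20)
]
mixed = phone_only[:4] + [
    {"date": "2013-02-06", "hour": 18, "mode": "doorstep"}
]

phone_only_ok = may_label_non_contactable(phone_only)
mixed_ok = may_label_non_contactable(mixed)
```

Five telephone attempts alone do not satisfy the rule, whereas four telephone attempts followed by one doorstep visit do.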
Data on the educational level of the household’s reference person were derived from the Administrative Census 2011 and the Census 2001. The Census 2001 (officially entitled the “General Socio-Economic Survey 2001”) was the last ‘traditional’ census based on an exhaustive postal survey among households. Participation in the census was compulsory and resulted in a participation rate at household level of 96.5%. Questions on the educational level had to be completed for every household member aged 15 years and older.
The Census 2011 was based on linked administrative databases and covered, among other things, data on the educational level (highest diploma) provided by the Belgian communities, which are responsible for the organization of education. With regard to the highest level of education, the Census 2011 was an update of the data collected in the Census 2001: for those who obtained a (registered) diploma in the period 2001–2011 that was higher than the one declared during the Census 2001, the higher educational level was adopted.
A first assessment of data completeness revealed relatively high levels of item missingness for educational level in the Census databases (e.g. for 16.1% of all reference persons sampled for the BHIS 2013, information on the educational level was missing in the Census 2011). Since complete data are a prerequisite for assessing the substitution process, regression-based multiple imputation (m = 5) was applied, assuming missingness at random (MAR). The variables included in the imputation model were gender, age group and household size. An analysis of cluster homogeneity in terms of educational level showed a very high level of within-cluster heterogeneity: in only 13.2% of all clusters did the reference persons of the four households have an identical educational level.
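The homogeneity measure reported above, the share of clusters in which all reference persons have the same educational level, can be computed as in the following sketch. The function name `share_homogeneous` and the example data are hypothetical and do not reproduce the BHIS figures.

```python
def share_homogeneous(levels_by_cluster):
    """Fraction of clusters in which every reference person has the same
    educational level."""
    if not levels_by_cluster:
        raise ValueError("at least one cluster is required")
    same = sum(1 for levels in levels_by_cluster if len(set(levels)) == 1)
    return same / len(levels_by_cluster)


# Hypothetical educational levels of the four reference persons per cluster
education_by_cluster = [
    ["low", "low", "low", "low"],
    ["low", "middle", "high", "middle"],
    ["high", "high", "middle", "high"],
    ["middle", "middle", "middle", "middle"],
]
share = share_homogeneous(education_by_cluster)
```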
After permission was received from the Belgian Privacy Commission, the BHIS 2013 sample data were linked one-to-one with the BHIS 2013 para-data and with the data on educational level derived from the Census 2011. The BHIS 2001 sample data were linked in the same way with the BHIS 2001 para-data and the data of the Census 2001. The reference person’s highest achieved educational level, stored in the Census databases in 6 categories according to the International Standard Classification of Education (ISCED) (32), was regrouped into three categories: low educational level (no diploma, primary education (ISCED 1) and lower secondary education (ISCED 2)), middle educational level (higher secondary education (ISCED 3) and post-secondary non-higher education (ISCED 4)) and high educational level (bachelor and master (ISCED 5) and doctorate (ISCED 6)).
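The regrouping into three educational levels amounts to a simple lookup. In the sketch below, the function name `regroup_isced` is hypothetical, and the code 0 for "no diploma" is an assumption made for illustration only; the coding actually used in the Census databases is not given in the text.

```python
def regroup_isced(level):
    """Map an ISCED code to the three-category educational level.

    Assumption: 0 stands for "no diploma" (hypothetical coding).
    """
    if level in (0, 1, 2):
        return "low"
    if level in (3, 4):
        return "middle"
    if level in (5, 6):
        return "high"
    raise ValueError(f"unknown ISCED code: {level!r}")


regrouped = [regroup_isced(code) for code in range(7)]
```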
All households effectively invited to participate in the BHIS 2013 and the BHIS 2001, respectively, were re-ordered in terms of the original clusters, that is, as initially selected households, first substitutes, second substitutes, etc. For each substitution wave, the response rate according to the educational level of the household’s reference person was calculated. Given their limited number of cases, the fourth to seventh substitutes were grouped. Differences in response rates were assessed by the Delta method using the TEST statement in SAS PROC MIANALYZE. A sensitivity analysis restricted to households for which the educational level of the reference person was known was applied to test the robustness of the results against departures from the MAR assumption.
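The two computational steps of this analysis, response rates per substitution wave and educational level, and the combination of estimates across the imputed data sets, can be sketched as follows. The analysis itself was run in SAS; this Python sketch is only an illustration. The wave coding (0 for the initial household, 1 to 3 for the first to third substitutes, 4 to 7 for later substitutes), the function names and the example values are hypothetical. The pooling step follows Rubin's rules, the standard way of combining multiply imputed estimates; the Delta-method test of differences is not reproduced here.

```python
from collections import defaultdict
from statistics import mean, variance


def response_rates(records):
    """Response rate per (wave, education) cell.

    Each record is (wave, education, participated); waves 4 to 7 are
    grouped into a single category.
    """
    counts = defaultdict(lambda: [0, 0])   # cell -> [participants, invited]
    for wave, education, participated in records:
        cell = ("4-7" if wave >= 4 else str(wave), education)
        counts[cell][0] += int(participated)
        counts[cell][1] += 1
    return {cell: part / inv for cell, (part, inv) in counts.items()}


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and variances with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)                # pooled point estimate
    within = mean(variances)               # average within-imputation variance
    between = variance(estimates)          # between-imputation variance (m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical records from one imputed data set
records = [
    (0, "low", True), (0, "low", False), (0, "high", True),
    (1, "low", True), (4, "high", False), (6, "high", True),
]
rates = response_rates(records)

# Hypothetical estimates of one response rate from three imputed data sets
pooled_rate, pooled_var = pool_rubin([0.5, 0.6, 0.7], [0.01, 0.01, 0.01])
```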