2.1. Data
In this study, we used data from the 2015 wave of CHARLS. CHARLS provides high-quality micro-data representative of households and individuals aged 45 and above in China, and is designed to support analysis of population aging in China and to promote interdisciplinary research on aging. The CHARLS national baseline survey was conducted in 2011 and covered 150 county-level units, 450 village-level units, and approximately 17,000 individuals in about 10,000 households. Respondents are followed up every two to three years, and the data are made available to the academic community one year after each survey wave [38].
Our study focused on middle-aged people. Following the WHO age classification, we defined middle-aged people as those aged 45~59 years [39].
2.2. Health Status Measurement
SAH is an indicator used to reflect health status. The CHARLS questionnaire contains two SAH questions with different response scales, and SAH is asked twice: once at the beginning of the survey and once at the end of the health section. At the beginning of the interview, respondents may not yet be familiar with the survey; after answering many health-related questions, they are likely to have a clearer understanding of their own health. We therefore considered the second response more reliable and used it in our study. In the CHARLS questionnaire, SAH is measured by the following two questions:
Question 1: How would you rate your health status? Would you say your health is very good, good, fair, poor or very poor?
1. Very good
2. Good
3. Fair
4. Poor
5. Very poor
Question 2: Next I have some questions about your health. Would you say your health is excellent, very good, good, fair, or poor?
1. Excellent
2. Very good
3. Good
4. Fair
5. Poor
We coded SAH as a binary (0-1) variable: Excellent, Very good, and Good were coded as 1, and Fair, Poor, and Very poor were coded as 0.
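A minimal sketch of this coding rule, assuming the responses have already been converted to the English labels listed in the two questions (the label strings and the function name `code_sah` are illustrative, not part of the CHARLS files). Because Very good, Good, Fair, and Poor appear in both scales, one lookup serves both questions.

```python
# Illustrative SAH coding; assumes responses are already mapped to these labels.
HEALTHY = {"Excellent", "Very good", "Good"}
UNHEALTHY = {"Fair", "Poor", "Very poor"}

def code_sah(response: str) -> int:
    """Return 1 for Excellent/Very good/Good and 0 for Fair/Poor/Very poor."""
    label = response.strip()
    if label in HEALTHY:
        return 1
    if label in UNHEALTHY:
        return 0
    # Anything else (missing, refused, unmapped code) is rejected explicitly.
    raise ValueError(f"unexpected SAH response: {response!r}")

print([code_sah(r) for r in ["Very good", "Fair", "Excellent", "Very poor"]])
# prints [1, 0, 1, 0]
```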
For chronic disease, the CHARLS questionnaire asked respondents whether they had been diagnosed by a doctor with any of a list of chronic diseases, such as hypertension, diabetes, and asthma. We coded having a chronic disease as 1 and having no chronic disease as 0.
Depression was measured in the CHARLS questionnaire with the 10-item Center for Epidemiologic Studies Depression Scale (CES-D10) [40]. The scale has 10 items: 8 negatively worded and 2 positively worded. Each item is rated on a 4-point scale according to how often the respondent experienced the feeling or behavior in the past week: less than 1 day ("rarely or not at all"), 1~2 days ("not too much"), 3~4 days ("sometimes or half the time"), and 5~7 days ("most of the time"). Negatively worded items are scored 1, 2, 3, and 4 points for these categories, respectively, while the positively worded items are reverse-scored (4, 3, 2, and 1 points, respectively). Total CES-D10 scores therefore range from 10 to 40, with higher scores indicating more depressive symptoms. A score of 10~20 points indicated no depression and 21~40 points indicated depression [41]. We coded depression as 1 and no depression as 0.
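The scoring rule above can be sketched as follows. This is an illustrative implementation, not the authors' code: it assumes each item response is already coded 1~4 by the day-frequency categories, and the default positions of the two positively worded items (0-based indices 4 and 7, taken here as "hopeful about the future" and "happy") are an assumption to check against the questionnaire's item order.

```python
# Illustrative CES-D10 scorer (not the authors' code).
# Assumption: each response is coded 1..4 by frequency in the past week:
#   1 = rarely or not at all (<1 day), 2 = not too much (1~2 days),
#   3 = sometimes or half the time (3~4 days), 4 = most of the time (5~7 days).
# Assumption: the positively worded items sit at 0-based positions 4 and 7.
POSITIVE_ITEMS = (4, 7)

def cesd10_score(responses, positive_items=POSITIVE_ITEMS):
    """Return the CES-D10 total (range 10~40), reverse-scoring positive items."""
    if len(responses) != 10:
        raise ValueError("CES-D10 needs exactly 10 item responses")
    total = 0
    for idx, r in enumerate(responses):
        if r not in (1, 2, 3, 4):
            raise ValueError(f"item {idx}: response {r!r} is not in 1..4")
        # Negative items keep their code; positive items map 1->4, 2->3, 3->2, 4->1.
        total += (5 - r) if idx in positive_items else r
    return total

def depression_indicator(score, cutoff=21):
    """1 = depression (score 21~40), 0 = no depression (score 10~20)."""
    return 1 if score >= cutoff else 0

# Usage: every item answered "rarely or not at all".
score = cesd10_score([1] * 10)  # 8 negative items x 1 + 2 positive items x 4 = 16
print(score, depression_indicator(score))  # prints 16 0
```

Reverse-scoring the positive items is what makes the minimum 10 and the maximum 40: the least depressed pattern scores 1 on every negative item and 1 on each reversed positive item.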
2.3. Variables
Building on previous studies [42~44], we considered socioeconomic status, demographic characteristics, and living environment when selecting the core explanatory variable and the control variables.
1. Independent Variables
The core explanatory variable of this study is labor participation. In the CHARLS questionnaire, respondents were first asked whether they had engaged in agricultural work for more than 10 days in the past year. Respondents who had not were asked whether they had worked at least one hour in the past week. Work here includes employment and self-employment (for example, working as a civil servant or enterprise employee, running one's own business, or helping in a family business without pay), but excludes housework and volunteer work. Respondents who reported neither farming nor other work were asked whether they had a job from which they were temporarily absent because of leave, sick leave, or on-the-job training, and then whether they expected to return to that job within 6 months. Those who expected to return to their original job were also classified as currently having a job. Labor participation was therefore derived from the following four questions in the CHARLS questionnaire:
Question 1: Did you engage in agricultural work (including farming, forestry, fishing, and husbandry for your own family or others) for more than 10 days in the past year?
1. Yes
2. No
Question 2: Did you work for at least one hour last week? We consider any of the following activities to be work: earning a wage, running your own business, unpaid work in a family business, etc. Work does not include doing your own housework or unpaid activities such as volunteer work.
1. Yes
2. No
Question 3: Do you have a job but are temporarily laid off, on sick or other leave, or in on-the-job training?
1. Yes
2. No
Question 4: Do you expect to go back to this job at a definite time in the future or within 6 months?
1. Yes
2. No
Following the International Labour Organization (2013) [45], we classified respondents as not participating in labor if all three of the following conditions held: (1) they had not engaged in agricultural production and operation activities for more than 10 days in the past year; (2) they had done other work for less than one hour in the past week; and (3) if they had temporarily stopped working for any reason, there was no definite expectation that they would return to their original job within six months. All other respondents were classified as participating in labor.
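The classification rule can be written as a small function. The argument names are hypothetical, and the skip pattern (each later question is asked only when earlier answers have not already established work) is taken from the description of the questionnaire above; questions that were not asked are passed as `None`.

```python
from typing import Optional

def labor_participation(q1_farm_over_10_days: bool,
                        q2_worked_1_hour: Optional[bool] = None,
                        q3_temporarily_absent: Optional[bool] = None,
                        q4_expects_return: Optional[bool] = None) -> int:
    """Return 1 (participates in labor) or 0 (does not).

    Hypothetical argument names for Questions 1-4; None = question not asked.
    """
    # Question 1: more than 10 days of agricultural work in the past year.
    if q1_farm_over_10_days:
        return 1
    # Question 2: at least one hour of (non-housework, non-volunteer) work last week.
    if q2_worked_1_hour:
        return 1
    # Questions 3-4: temporarily away from a job and expecting to return
    # at a definite time or within 6 months.
    if q3_temporarily_absent and q4_expects_return:
        return 1
    # Otherwise all three non-participation conditions hold.
    return 0

print(labor_participation(False, False, True, True))   # prints 1
print(labor_participation(False, False, True, False))  # prints 0
```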
2. Control Variables
Previous studies have shown that economic resources have a positive effect on health [46]; for example, a certain level of economic resources is needed to join fitness clubs or to obtain adjunct treatment for quitting smoking and drinking [47]. Based on previous research, our study selected individual demographic characteristics (gender, age, marital status, and education), social activities, drinking, smoking, health before the age of 15, and family property as control variables. The definitions and measurement of the specific variables are shown in Table 1.
2.4. Propensity Score Matching Method
Labor participation among middle-aged people may itself be affected by health, by unobserved individual characteristics, and by other sources of endogeneity. Because regression models estimated on cross-sectional data may therefore be biased by confounding variables, we used Propensity Score Matching (PSM) to match respondents who participated in labor with those who did not, in order to reduce the influence of these biases and confounders. To estimate the relationship between a factor (an exposure or intervention, hereinafter the treatment factor) and a health outcome, a control group must be constructed for comparison. Our aim is to control for the interference of non-treatment factors and isolate the effect of the treatment factor, measured as the Average Treatment effect on the Treated (ATT). ATT is calculated as follows:

$$ATT = E(h_{i1} - h_{i0} \mid D_i = 1) = E(h_{i1} \mid D_i = 1) - E(h_{i0} \mid D_i = 1)$$

where $h_{i1}$ and $h_{i0}$ denote the health status of individual $i$ with and without labor participation, respectively, and $D_i = 1$ indicates that individual $i$ participated in labor (the treatment group). Because $h_{i0}$ is not observed for participants, it is estimated from matched non-participants. The logit model is generally used to estimate the probability that a respondent enters the treatment group, that is, the probability of labor participation among middle-aged people. The propensity score estimation model is as follows:

$$P(X_i) = \Pr(D_i = 1 \mid X_i) = \frac{\exp(X_i'\beta)}{1 + \exp(X_i'\beta)}$$

where $X_i$ is the vector of control variables for individual $i$ and $\beta$ is the corresponding coefficient vector.
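As a self-contained illustration of the propensity score model, the sketch below fits the logit by plain batch gradient ascent on the mean log-likelihood and returns each respondent's estimated probability of labor participation. In practice a statistics package would be used; this stand-in, its function names, and its learning-rate and iteration settings are assumptions for illustration only.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function exp(z) / (1 + exp(z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logit(X, d, lr=0.1, n_iter=5000):
    """Fit P(D=1|X) = sigmoid(b0 + X.b) by batch gradient ascent.

    X: list of covariate rows; d: list of 0/1 treatment indicators.
    Returns [b0, b1, ..., bk] with b0 the intercept.
    """
    k = len(X[0])
    n = len(X)
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, di in zip(X, d):
            p = sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            resid = di - p  # gradient of the log-likelihood is sum(resid * x)
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, beta):
    """Estimated P(X_i) for every row of X."""
    return [sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            for xi in X]

# Toy data: one binary covariate, participation rate 1/3 at x=0 and 2/3 at x=1.
X = [[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]]
d = [0, 1, 0, 1, 1, 0]
beta = fit_logit(X, d)
print([round(p, 3) for p in propensity_scores(X, beta)])
```

With a single binary covariate, the maximum-likelihood scores equal the observed participation rate within each covariate group, so the fitted scores approach 1/3 and 2/3 here.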
PSM methods include nearest neighbor matching, radius matching, kernel matching, local linear regression matching, and spline matching. In our study, we used three methods: radius matching, caliper matching, and kernel matching, focusing mainly on nearest neighbor matching within a caliper. Caliper matching sets a radius r in advance and matches each treated individual to the control individual with the closest propensity score, provided the difference does not exceed r. The smaller the radius, the fewer matches are made.
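The sketch below illustrates 1:1 nearest neighbor matching within a caliper and the resulting ATT estimate. It is not the authors' implementation: matching with replacement is an assumption (the text does not say), and the function name and toy data are hypothetical. The two calls show the trade-off described above: shrinking the caliper drops the more distant match.

```python
# Illustrative 1:1 nearest-neighbour matching within a caliper (not the
# authors' code). Assumption: matching is with replacement, i.e. one control
# may serve as the match for several treated units.
def att_nn_caliper(ps, d, y, caliper=0.05):
    """Estimate ATT = mean(y_treated - y_matched_control) over matched treated units.

    ps: propensity scores; d: 1 = labor participation (treated), 0 = control;
    y: binary health outcome. Treated units whose nearest control lies farther
    than `caliper` away are discarded. Returns (att, number_of_matched_treated).
    """
    controls = [(p, yi) for p, di, yi in zip(ps, d, y) if di == 0]
    if not controls:
        raise ValueError("no control units to match against")
    diffs = []
    for p, di, yi in zip(ps, d, y):
        if di != 1:
            continue
        # Closest control by absolute propensity-score distance.
        cp, cy = min(controls, key=lambda c: abs(c[0] - p))
        if abs(cp - p) <= caliper:
            diffs.append(yi - cy)
    if not diffs:
        raise ValueError("no treated unit has a control within the caliper")
    return sum(diffs) / len(diffs), len(diffs)

ps = [0.30, 0.52, 0.72, 0.31, 0.50, 0.90]
d = [1, 1, 1, 0, 0, 0]
y = [1, 1, 0, 0, 1, 1]
print(att_nn_caliper(ps, d, y, caliper=0.05))   # prints (0.5, 2)
print(att_nn_caliper(ps, d, y, caliper=0.015))  # prints (1.0, 1)
```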
2.5. Logistic Regression
Logistic regression models a binary dependent variable (or, in its multinomial form, a categorical one). In our study, each dependent variable is a 0-1 binary variable. To explore the effect of labor participation on the health of middle-aged people, we followed the health regression equation proposed by Wagstaff (2003) and specified the logit model as follows:

$$\ln\left(\frac{P(h_i = 1)}{1 - P(h_i = 1)}\right) = \beta_0 + \beta_1\,\mathit{Work}_i + \gamma' x_i$$

where $h_i$ is a binary variable representing the health status of individual $i$, measured separately by SAH, chronic disease, and depression. $\mathit{Work}_i$ denotes labor participation, the core explanatory variable in this study, and $\beta_1$ is its coefficient. $x_i$ is the vector of control variables, including gender, age, and educational status, and $\gamma$ is the corresponding coefficient vector.
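Because the model is linear in the log-odds, the coefficient on labor participation is most easily read as an odds ratio, $\exp(\beta_1)$. The sketch below shows this with hypothetical coefficient values (not estimates from this study) and checks that the ratio of the implied odds for $\mathit{Work}_i = 1$ and $\mathit{Work}_i = 0$ equals $\exp(\beta_1)$ regardless of the control values.

```python
import math

# Illustrative only: the coefficient values below are hypothetical, not
# estimates from this study.
def predicted_prob(beta0, beta_work, gammas, work, x):
    """P(h_i = 1) implied by ln(P/(1-P)) = beta0 + beta_work*Work_i + gamma'x_i."""
    z = beta0 + beta_work * work + sum(g * xi for g, xi in zip(gammas, x))
    return 1.0 / (1.0 + math.exp(-z))

def odds_ratio(beta_work):
    """Factor by which the odds of h_i = 1 change when Work_i goes from 0 to 1."""
    return math.exp(beta_work)

beta0, beta_work, gammas = -0.5, 0.4, [0.02]  # hypothetical values
x = [50.0]                                    # e.g. a single control, age 50
p1 = predicted_prob(beta0, beta_work, gammas, 1, x)
p0 = predicted_prob(beta0, beta_work, gammas, 0, x)
print(round(odds_ratio(beta_work), 3))  # prints 1.492
```

The change in probability itself (p1 minus p0) depends on the control values, which is why the odds ratio is the usual summary of a logit coefficient.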
2.6. Statistical Analysis
We defined middle-aged people who did not participate in labor as the control group (labor participation = 0) and those who participated in labor as the treatment group (labor participation = 1), and then matched the two groups to improve their comparability. Descriptive statistics of the variables are shown in Table 2. Of the middle-aged respondents, 80.84% participated in work and 19.16% did not. In the control group, 4.40% reported that they were healthy, compared with 24.49% in the treatment group.