Hypotheses development
In this paper, we summarize four language features of borrowers: expression redundancy, objectivity, short-sentence preference, and punctuation control. Expression redundancy measures whether the borrower is wordy and verbose when conveying information. Objectivity measures how detailed and objective the borrower is when disclosing personal information. Short-sentence preference measures the borrower's preference for short sentences. Punctuation control captures the borrower's attention to and control over punctuation. The specific definitions and calculations are explained in a later section. Here, we draw on previous research to form hypotheses about how these features influence funding success and default rates.
In their study of intuitive prediction, Kahneman and Tversky (1973) concluded that, when making predictions and judgments under uncertainty, people rely on a limited number of heuristics, notably representativeness. As a result, investors tend to read selectively, looking for key information or words that are representative of the borrower's personality in order to assess the borrower. Therefore, if the borrower's expression in the loan title and loan description is too verbose, or the borrower prefers long sentences that may bury the key information, the lender's funding decision will be disturbed to a certain extent. Previous studies have also included basic information about the readability of the loan description (i.e., average word and sentence length) as control variables (Pope & Sydnor, 2011).
In addition, the use of punctuation in the text reflects the borrower's control and cognitive ability. Excessive use of punctuation makes the loan description informal and reduces the readability of the text (Chen & Huang, 2018). We therefore suppose that even a small detail, namely whether the loan description ends with a punctuation mark, also has a certain impact on the readability of the loan request description.
Research shows that lenders prefer borrowers who disclose their personal information proactively (Jiang, Wang, & Chen, 2018). For instance, some borrowers use their real names or telephone numbers as their account names and disclose detailed profession and income information by quoting figures in their loan request descriptions, even though the platforms do not require this. It can thus be inferred that the objectivity of the borrower's expression is positively associated with funding success. Therefore, we propose Hypothesis 1 as follows:
H1a: Redundancy degree in loan requests is negatively associated with funding success.
H1b: Short-sentence preference degree in loan requests is negatively associated with funding success.
H1c: Objectivity degree in loan requests is positively associated with funding success.
H1d: Punctuation control degree in loan requests is positively associated with funding success.
The loan success rate measures the lender's feedback on the borrower's information disclosure, while the repayment probability reflects the borrower's financial status and credit situation, which is often closely related to personal characteristics. Previous studies have shown that personality and personal identity content play an important role in predicting loan default (Herzenstein et al., 2011; Larrimore et al., 2011). The larger the amount of key information in the loan description, the less likely the borrower is to default (Yu, 2017). Under the definition used in this paper, high expression redundancy means that a text of the same length contains less key information. A borrower's self-presentation serves largely to maintain his or her image in the eyes of others, which leaves room for self-whitewashing and intentional inducement (Baumeister, 1982). To a certain extent, objectivity may also be a tool that borrowers use to project an objective image. We speculate that borrowers who engage in such whitewashing are often less honest than their descriptions suggest. The borrower's preference for sentence length is mainly related to the lender's reading experience. Generally speaking, borrowers who prefer short sentences are more colloquial in their language. They may be less rigorous and have less self-control than borrowers who prefer long sentences, so we predict that their default probability is higher. As mentioned above, punctuation control in loan requests reflects the borrower's self-control. We speculate that borrowers with strong self-control constrain their own behavior more strictly and have a higher repayment probability. Therefore, we propose Hypothesis 2 as follows:
H2a: Redundancy degree in loan requests is negatively associated with default probability.
H2b: Objectivity degree in loan requests is positively associated with default probability.
H2c: Short-sentence preference degree in loan requests is positively associated with default probability.
H2d: Punctuation control degree in loan requests is negatively associated with default probability.
Method
Data source and sample selection
The data used in this study are obtained from Renrendai (renrendai.com), one of the largest peer-to-peer lending platforms in China. This study draws on all loan requests created on Renrendai between June 1, 2016 and September 30, 2018, from which 13,485 loan requests are selected as our research sample by random sampling. We then manually extract the other required texts from the Renrendai web pages according to the loan number. We preprocess the collected data as follows: first, we exclude requests with institutional guarantee, field certification, or credit assurance certification; second, we exclude loan titles, blank loan descriptions, and texts without any language features (for example, "42378" or "reserved zzxnejek,"). A total of 3,567 records remain, and 2,720 valid records are finally retained after abnormal data are eliminated.
Variables explanation
The four language features of borrowers used in this paper (expression redundancy, objectivity, short-sentence preference, and punctuation control) are defined as follows:
Redundancy: expression redundancy measures whether the borrower's language expression is wordy and verbose. It is calculated by dividing the number of words in the loan description by the number of key information items. Higher redundancy means that the borrower conveys less key information in more words; that is, the more superfluous wording the loan description contains, the higher the redundancy. The number of key information items is obtained by manually marking each description. Key information includes personal identity, position, residence, income, loan purpose (unclear purposes such as daily consumption and turnover are not counted), car and real estate ownership, repayment source, etc. Some studies have shown that lenders do have a certain investment bias regarding the borrower's loan purpose and other such information (Zhuang, Zhou, & Fu, 2015).
For example, suppose the loan application text is "The loan is mainly used for starting a business. I have real estate without mortgage and I have a good credit with no overdue. The source of repayment is my wage income, which is about 5000 yuan per month, so there is no pressure for repayment."
Here "starting a business", "have real estate without mortgage", "no overdue" and "The source of repayment is my wage income, which is about 5000 yuan per month" are the four pieces of key information.
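As an illustration of this calculation, the following minimal Python sketch computes redundancy for the example text. It assumes that words are counted by splitting the English rendering of the text on whitespace; the paper does not state the tokenization rule, so this is an assumption, and the names `word_count` and `redundancy` are hypothetical. The number of key information items is supplied by hand because it is marked manually.

```python
def word_count(text: str) -> int:
    """Count words by whitespace splitting (an assumed tokenization rule)."""
    return len(text.split())


def redundancy(text: str, n_key_items: int) -> float:
    """Words in the loan description divided by the number of key items."""
    if n_key_items <= 0:
        raise ValueError("n_key_items must be positive")
    return word_count(text) / n_key_items


EXAMPLE = (
    "The loan is mainly used for starting a business. "
    "I have real estate without mortgage and I have a good credit with no overdue. "
    "The source of repayment is my wage income, which is about 5000 yuan per month, "
    "so there is no pressure for repayment."
)

# Four key information items, marked manually as in the example.
example_redundancy = redundancy(EXAMPLE, 4)
```

Under these assumptions the example contains 46 words and four key information items, which gives a redundancy of 11.5.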
Objectivity: an indicator of whether the borrower actively discloses personal information and whether the information disclosed is detailed and objective. Objectivity degree is calculated as the number of figures quoted in the loan description, which is also marked manually for each description. The results of Larrimore et al. (2011) show that the use of quantitative words likely to be related to one's financial situation was positively associated with funding success, and such use was considered an indicator of trust. We reason that the number of figures quoted can, to some extent, affect the lender's judgment of whether the borrower is trustworthy and hence the funding decision.
For example, suppose the loan application text is "I have twice borrowed money from Renrendai, with a good repayment record. At present, I have built 4 houses, all of which have applied for real estate certificates. I want to decorate one set for my own use. But due to the shortage of funds, I take this platform to raise funds." Here the number of figures quoted is 2.
Short-sentence preference: an indicator of the borrower's preference for short sentences. Short-sentence preference degree is calculated by dividing the number of punctuation marks by the number of words in the loan description. The larger the value, the more the borrower prefers short sentences. Using short sentences can often increase the readability of the text.
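The ratio can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the set of characters treated as punctuation (ASCII punctuation plus a few common Chinese marks) and the whitespace word count are our assumptions, since the paper does not list them, and the function names are hypothetical.

```python
import string

# Assumed punctuation set: ASCII punctuation plus common Chinese marks.
PUNCTUATION = set(string.punctuation) | set("，。！？；：、…“”‘’（）《》")


def punctuation_count(text: str) -> int:
    """Count characters that belong to the assumed punctuation set."""
    return sum(1 for ch in text if ch in PUNCTUATION)


def short_sentence_preference(text: str) -> float:
    """Number of punctuation marks divided by number of words."""
    n_words = len(text.split())  # assumed whitespace tokenization
    if n_words == 0:
        raise ValueError("text contains no words")
    return punctuation_count(text) / n_words


SAMPLE = (
    "The loan is mainly used for starting a business. "
    "I have real estate without mortgage and I have a good credit with no overdue. "
    "The source of repayment is my wage income, which is about 5000 yuan per month, "
    "so there is no pressure for repayment."
)

sample_preference = short_sentence_preference(SAMPLE)
```

For this sample text the sketch counts 5 punctuation marks and 46 words, so the ratio is 5/46, or about 0.109.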
Punctuation control: an indicator of the borrower's attention to punctuation. Here, the criterion is whether the loan description ends with a punctuation mark. Although this is only a small detail, it reflects, to a certain extent, how closely the borrower follows the conventions of written language. A borrower who remembers to end the loan description with punctuation also makes the description more formal. Such a borrower may have the personality trait of seeing things through from beginning to end. We believe that this trait may help predict the borrower's performance after successfully obtaining the loan. Other variables are shown in Table 1.
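The punctuation control indicator can be coded as a binary variable, for instance as in the minimal Python sketch below. The punctuation set, the decision to ignore trailing whitespace, and the function name `punctuation_control` are assumptions made for illustration and are not specified in the paper.

```python
import string

# Assumed set of characters that count as punctuation.
END_MARKS = set(string.punctuation) | set("，。！？；：、…”’）》")


def punctuation_control(text: str) -> int:
    """Return 1 if the description ends with a punctuation mark, else 0."""
    stripped = text.rstrip()  # assumption: trailing whitespace is ignored
    return int(bool(stripped) and stripped[-1] in END_MARKS)


ends_with_mark = punctuation_control("The loan is used for decoration.")
ends_without_mark = punctuation_control("The loan is used for decoration")
```

Here the first description is coded 1 and the second is coded 0.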
Model Construction
In our study, we employ a logistic regression model to study the influence of the borrower's language features (redundancy, objectivity, short-sentence preference, and punctuation control) on funding probability and default probability in the P2P lending market. Our empirical model is as follows:
$$\operatorname{Logit}(Y_i) = \beta_0 + \beta_1 Red_i + \beta_2 Obj_i + \beta_3 Sen_i + \beta_4 Pun_i + \gamma X_i + \varepsilon_i$$
where the dependent variable Yi is a binary variable equal to 1 if the borrower's loan request is successfully funded (or, in the default model, if the borrower defaults after receiving funding) and 0 otherwise. The main explanatory variables are the borrower's language features extracted in this paper: Red (expression redundancy), Obj (objectivity), Sen (short-sentence preference), and Pun (punctuation control). Xi is a vector of control variables, including the amount of borrowing, interest rate, credit score of the borrower, age, education level, income, marital status, working experience, etc., and ε is the random disturbance term.
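To make the estimation step concrete, the sketch below fits a logistic regression with the four language features by plain gradient ascent on the log-likelihood. It is only an illustration under stated assumptions: the paper does not say which estimator or software is used, the eight observations are synthetic values invented for this example (with Red rescaled to a small range), the control vector is omitted, and all function names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(beta, row):
    """P(Y = 1) for one observation; beta[0] is the intercept."""
    return sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))


def log_likelihood(beta, X, y):
    """Bernoulli log-likelihood of the data under coefficients beta."""
    total = 0.0
    for row, yi in zip(X, y):
        p = min(max(predict_proba(beta, row), 1e-12), 1.0 - 1e-12)
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total


def fit_logit(X, y, lr=0.1, n_iter=2000):
    """Fit coefficients by gradient ascent (a simple stand-in estimator)."""
    beta = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for row, yi in zip(X, y):
            err = yi - predict_proba(beta, row)
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Synthetic illustration data, NOT from the paper. Columns: Red, Obj, Sen, Pun.
X = [
    [1.2, 2, 0.10, 1],
    [0.8, 1, 0.12, 1],
    [2.5, 0, 0.05, 0],
    [1.0, 3, 0.08, 1],
    [3.0, 0, 0.15, 0],
    [1.5, 1, 0.09, 0],
    [2.2, 0, 0.11, 1],
    [0.9, 2, 0.07, 0],
]
y = [1, 1, 0, 1, 0, 0, 1, 1]  # 1 = funded (synthetic labels)

beta_hat = fit_logit(X, y)  # [intercept, Red, Obj, Sen, Pun]
```

In an actual analysis a statistics package would normally be used instead, since it also reports standard errors and significance tests for each coefficient.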
Descriptive statistics
We first compute descriptive statistics for the variables; the results are shown in Table 2. Our sample includes 2,720 loan requests, of which 2,474 were successfully funded and the remaining 246 were not funded. Among the requests that were successfully funded, 1,331 loans defaulted and 1,143 were repaid on time. In addition, the statistics give a rough picture of the demographic characteristics of the online loan market. Male borrowers are more active in the online loan market (the mean value of gender is 0.8739), and borrowers are mainly between 20 and 40 years old. Married people account for a large proportion (the mean value is 0.6235). On average, borrowers have a college education or above, which is higher than the social average. The average working experience of the borrowers is between 1 and 3 years, but their income level is generally low (the mean value of income is 2.9353).
Table 3 lists the correlation coefficients among the four key explanatory variables used in our analysis. The data show that the four key explanatory variables are correlated, but the correlations are relatively low. This suggests that multicollinearity is not a concern, so the subsequent regression analysis can be carried out.
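A pairwise correlation check of this kind can be sketched as follows. The sketch assumes Pearson correlation, which the paper does not state explicitly, and the feature values are synthetic numbers invented for illustration; they are not the data behind Table 3. The function name `pearson` is hypothetical.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (sd_x * sd_y)


# Synthetic illustration data, NOT the values underlying Table 3.
features = {
    "Red": [1.2, 0.8, 2.5, 1.0, 3.0, 1.5, 2.2, 0.9],
    "Obj": [2, 1, 0, 3, 0, 1, 0, 2],
    "Sen": [0.10, 0.12, 0.05, 0.08, 0.15, 0.09, 0.11, 0.07],
    "Pun": [1, 1, 0, 1, 0, 0, 1, 0],
}

# One coefficient per unordered pair of explanatory variables.
correlations = {
    (a, b): pearson(features[a], features[b])
    for a, b in combinations(features, 2)
}
```

Large absolute values in such a table would signal that two features carry overlapping information, which is the multicollinearity concern discussed above.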