As a monitor corpus, the Malaysian Diabetes Corpus (MyDC) is expected to grow in size and dimensions over time, depending on available resources. Currently, the corpus is drawn from a single popular and established Malaysian English newspaper, and this may limit the representativeness of the study findings. Further, the corpus is relatively small. However, its small size is not considered a major concern because MyDC is a specialised corpus and is not meant to serve as a reference corpus such as the British National Corpus.
The average readability score calculated in this study was 49.6. To put this in perspective, the value may be compared with the oft-cited readability values of Reader’s Digest (65) and the New York Times (52). This gives a general indication that the newspaper articles within the corpus may be too difficult for the average reader. Research has also shown that newspaper articles have different readability values depending on content type. Flaounas et al. (2013) described a massive-scale readability analysis of English newspaper articles from 99 countries and ranked ‘Sports’ and ‘Arts’ at the top, with scores of 54 and 48, respectively. They listed 16 content types but, unfortunately, ‘Health’ is not among them. They did, however, report an average score of 42 across all content types. The mean score for the dataset in this study is higher (indicating better readability) than the average for the materials used in their study. This can be viewed positively, but with caution, as the readability is still below what is desirable if the objective is to create awareness about diabetes among the public.
With regard to the pattern of the articles’ readability over the years captured in this study, the range of 45-56 is considered consistent, suggesting that the articles fall in the difficult-to-read category. The variation itself is too small to warrant further discussion, given the generally low readability of the materials. Furthermore, no phenomena or events were recorded during the years covered that might account for the slight variations. However, the results are slightly higher than the average reported by Flaounas et al. (2013) and, at least for 2016, the average score (55.93) is better than the best average (54 for ‘Sports’) found in their study.
Readability calculations are mathematical approximations that evaluate the ‘readability’ of a piece of text quantitatively. As a strictly quantitative method, readability scoring yields usable information when covering large numbers of articles but is very limited in its ability to drill down into the features of individual articles. The third research question is meant to enrich, in a more qualitative manner, our understanding of what makes a text more readable to the layperson.
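To illustrate what such a quantitative approximation looks like, the following is a minimal sketch of the standard Flesch Reading Ease formula, 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word). It assumes that the FKRE scores reported here are on this scale. The function names, the regular-expression tokenisation, and the vowel-group syllable counter are illustrative assumptions and not the tool used in this study; real readability software uses more careful syllable and sentence detection, so scores will differ somewhat.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (an assumption): count vowel groups,
    then discount a silent final 'e' (but not a final 'le')."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    # Naive sentence split on terminal punctuation (an assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    plain = "The cat sat on the mat."
    dense = ("Glycaemic variability necessitates individualised "
             "pharmacological intervention.")
    print(round(flesch_reading_ease(plain), 3))
    print(round(flesch_reading_ease(dense), 3))
```

Because the formula depends only on sentence length and syllable counts, two texts of equal length can score very differently, which is why the score says little about why a particular article is easy or hard to read.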
The best-ranked articles seem to share similar textual features. They were written in a conversational manner, with heavy use of simple active sentences. This helps with readability scores, as active sentences normally use words with fewer syllables and are well known to be simpler than their passive counterparts. The problems that heavy use of passive sentence constructions poses for readability are well known and were strongly criticised by John O’Hayre (1966). He came up with the Lensear Write Formula, which factors in the use of active versus passive sentence constructions when evaluating readability. Although he meant it for government employees at the time, his words are still insightful for medical and public health practitioners today: “We know we can’t write simple, straightforward English without a lot of effort, so we automatically fall back on our technical jargon where we feel safest; this kind of writing is easiest for us to do” (O’Hayre, 1966, p. 27).
Also notable is the fact that two of the three subjects are not medical or healthcare personnel. This suggests that the layperson may be better placed to convey information to the public. Medical experts may find it difficult at times to ‘speak plainly’, as their ‘normal’ language may already include terms or styles common to their professions (i.e., medical jargon). This is not a new problem. For example, in the field of e-learning, an instructional designer normally sits between the subject matter expert (SME) and the learner. The instructional designer’s task is to redesign and reformat the knowledge from the SME into something easier and more ‘consumable’ for the learner. Writers for the mass media, as well as medical personnel communicating with the public via mass media, should be aware that efforts must be made to simplify the language as much as possible.
The article ‘Worst 1’ has an FKRE score of only 25, making it the least readable in the dataset. It is a report of a scientific study, so this finding is not surprising. However, the newspaper is meant to be read by the public, which means that low readability is not desirable. Smith, Buchanan and McDonald’s (2017) suggestion that medical publications should include a lay summary for each article could be of immense value in such cases. Additionally, research reports that are of significance to the public would do well to have a ‘press release’ version that incorporates readability enhancements, making them easier for news outlets to publish.