The World Health Organization's International Classification of Diseases, Clinical Modification (ICD-CM) code system has simplified the extensive information contained in medical records and improved the identification of diseases, with an impact on quality-of-care assessment, allocation of resources, and evaluation of management patterns and disease outcomes. The need for greater specificity after the ninth revision (ICD-9), whose concepts were too broad, was addressed with the establishment of the tenth revision in 2014. 10 Despite the above, determining the extent of hematological diseases based solely on hospital discharge codes could be problematic.
Our report highlights the potential limitations of ICD coding for use in research studies. Remarkably, 11% of the patients had concurrent diagnosis codes for PV and SE. Although the percentage of misdiagnosis in this study may appear modest, these chronic illnesses carry a high cost in health expenditure. An analysis by Mehta et al revealed an age-adjusted prevalence of polycythemia vera of 56.5 per 100 000 and reported that, in 2010, the annual overall health care cost reached $14 903, outlining a considerable increase in comorbidities in this population. 11 However, as their authors describe, these results may be subject to the same limitations and biases as this study. Erroneous adjudication of diagnostic codes may affect interventions that use data from public health registries, as well as surveillance and disease control at a population level. It has been reported that a correct diagnosis after thorough data analysis could lead to a substantial improvement in health costs. 12
The numerous variables involved, the high number of unexplained cases reported, as shown in a cross-sectional study from NHANES 2007-2008, 13 and the misidentification of erythrocytosis seen in some cases 14 make data collection complex. Misidentification of these cases may be consistent with the low reported rate of polycythemia cases at present. Larger-scale studies are needed to further identify the impact on health care, as the outcomes of the above erythrocytosis research are hitherto unknown.
The challenge of distinguishing between etiologies of erythrocytosis (PV versus SE) could be resolved once JAK2 somatic mutation genotyping is performed, as it remains a decisive diagnostic tool. 8 In our cohort, the prevalence of the JAK2 V617F mutation was estimated to be 1.5% among cases of SE. Interestingly, we found that 3.7% of the SE patients had other clonal abnormalities of germline origin in the JAK2 gene. Even though the significance of these mutations and the risk of developing PV or leukemia phenotypes are unknown, tools that predict functional effects, such as the DANN score, could establish a probable risk of pathogenicity for some of these mutations. 15,16 With the development of more sensitive molecular diagnostic techniques, the prevalence of clonal hematopoiesis abnormalities seems to have increased in recent years, which explains (at least partially) the differences seen across cohorts, including ours. 17
While the number of patients without erythrocytosis was small in our study, the fact that bias can be introduced systematically when cases are identified through disease code-based registries calls into question the validity of studies that rely on disease codes alone to identify cases.
This observation has been highlighted in previous publications, which increased awareness of potential sources of error and recommended that code users better evaluate the applicability and limitations of codes for their study of a particular disease or medical condition. 18
In our cohort, on the other hand, the code described as IE was assigned when any etiology of erythrocytosis, whether primary or secondary, had been excluded and no further explanation was found for the presence of this phenomenon. 19 Consistent with the literature, aside from IE cases, the most commonly identified etiology of SE in this study was hypoxia, and 24 patients (54.54%) had SE caused by obstructive sleep apnea (OSA).
The main limitations of our study relate to its retrospective design and its selection of cases referred to a cancer center. Moreover, our data analysis was restricted to patients who had JAK2 mutation molecular diagnostic studies. Furthermore, since several patients were evaluated as “second opinions” and their follow-up visits were carried out in various departments and medical centers, the etiology of erythrocytosis could not be established in approximately one-third of our cases. Other limitations include the heterogeneity of the IE test battery used to evaluate secondary etiologies and the absence of follow-up data in several cases.
The use of ICD data capture modalities is critical to accurately identify specific populations of interest for retrospective research; however, it carries limitations associated with possibly misdiagnosed cases. Even though the majority of the patients in our study were correctly classified, individual medical chart review may be necessary, until a better classification method is available, to decrease the possibility of introducing bias in such studies, especially in cases with conflicting ICD diagnoses. Furthermore, clarifying the relationship of the codes with the true clinical diagnosis, as well as identifying and systematically managing dual codes in healthcare software systems to decrease case misidentification, should be a priority to improve public health and clinical studies. Research based exclusively on ICD codes could have a potential impact on public health and patient care, and its limitations must be weighed when research findings are conveyed.
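As an illustration of how dual codes might be flagged for chart review, the following is a minimal sketch rather than the method used in this study. It assumes ICD-10-CM code D45 for PV and D75.1 for SE, and a hypothetical input of (patient identifier, code) pairs; the function name and data layout are illustrative only.

```python
from collections import defaultdict

# Assumed ICD-10-CM code prefixes; the study does not list the codes it used.
PV_CODES = ("D45",)     # polycythemia vera
SE_CODES = ("D75.1",)   # secondary polycythemia (erythrocytosis)


def flag_dual_coded(records):
    """Return sorted patient ids that carry both a PV and an SE code.

    records: iterable of (patient_id, icd_code) pairs (hypothetical layout).
    """
    labels_by_patient = defaultdict(set)
    for patient_id, code in records:
        normalized = code.strip().upper()
        if normalized.startswith(PV_CODES):
            labels_by_patient[patient_id].add("PV")
        if normalized.startswith(SE_CODES):
            labels_by_patient[patient_id].add("SE")
    # Patients with conflicting codes are candidates for manual chart review.
    return sorted(
        pid for pid, labels in labels_by_patient.items()
        if labels == {"PV", "SE"}
    )
```

A list produced this way would only identify candidates; the clinical diagnosis would still need to be confirmed by individual chart review.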