To make the diagnostic process more effective, an ideal scoring system should be an accurate tool that speeds up and improves decision-making while reducing the need for complementary imaging studies [22].
The aim of this study was to validate the effectiveness of the most commonly used CPRs and to develop a new streamlined and efficient scoring system.
In this sense, the most efficient of the CPRs evaluated was the Alvarado score, a finding confirmed in multiple previous studies [3, 17]. This score enables risk stratification in patients with RIFP by quantifying eight variables. The other CPRs showed lower diagnostic efficiency as the number of variables evaluated increased.
The newly developed CPR (the HMC score) includes six variables: anorexia, abdominal pain of less than 48 hours' evolution, migratory pain to the RIF, WBCC > 8,275 leukocytes/µL, NTF > 75%, and axillary temperature between 37 °C and 39 °C. The score performs well as a predictor of AA, with an area under the ROC curve of 0.81 (p < 0.001) and better diagnostic performance than the other scales (Fig. 3).
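As a minimal sketch of how the six items listed above could be checked for a single patient, the following Python snippet encodes each one as a yes/no test. The `Patient` class and its field names are hypothetical, and treating the temperature range as inclusive is an assumption. Because the points assigned to each item are not given in this section, the sketch only reports which items are present and does not compute a score.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record layout; field names are illustrative only.
    anorexia: bool
    pain_duration_h: float     # hours since onset of abdominal pain
    migratory_pain_rif: bool   # pain migrating to the right iliac fossa
    wbcc_per_ul: float         # white blood cell count, leukocytes/µL
    neutrophil_pct: float      # neutrophil fraction (NTF), percent
    axillary_temp_c: float     # axillary temperature, °C


def hmc_items(p: Patient) -> dict:
    """Flag which of the six HMC items are present.

    No weighting is applied, and the direction in which each item
    shifts the risk of AA is deliberately not encoded here.
    """
    return {
        "anorexia": p.anorexia,
        "pain_under_48h": p.pain_duration_h < 48,
        "migratory_pain_rif": p.migratory_pain_rif,
        "wbcc_over_8275": p.wbcc_per_ul > 8275,
        "ntf_over_75": p.neutrophil_pct > 75,
        # Assumption: the 37-39 °C range is treated as inclusive.
        "temp_37_to_39": 37.0 <= p.axillary_temp_c <= 39.0,
    }


# Made-up patient, for illustration only.
example = Patient(True, 20, True, 12000, 82.0, 36.6)
flags = hmc_items(example)
```

Keeping the item checks separate from any weighting makes it straightforward to plug in the published point values once they are available.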
It is composed of three symptoms and three clinical data categories, which are easily identifiable by the patient and the evaluator, respectively. The HMC score has the advantage of being simpler (with fewer items) than the previous ones (Alvarado, RIPASA, AIR, and AAS). It eliminates subjective data, such as the degree of guarding/rebound tenderness on abdominal examination (AIR and AAS), as well as data that are not always recorded in the patient’s medical records.
This score established a cut-off point for the leukocyte count. Although individual or combined analytical tests have been shown to have limited specific value for predicting AA, their simultaneous negativity practically rules out the diagnosis of AA [23]. In a prospective study of 1,032 patients, Lau [24] concluded that simultaneous elevation of the WBCC and the percentage of neutrophils increased the diagnostic specificity for AA. In another study, Atema [25] found that a WBC count > 20,000 combined with symptoms lasting more than 48 hours was associated with a positive predictive value of 100%.
Among patients with AA, the reported sensitivity and specificity of leukocyte counts were 60%-87% and 53%-100%, respectively [26], with different leukocyte cut-off points: 11,000 leukocytes/µL in the study of Bilic [27] and 10,400 leukocytes/µL in that of Narci [28]. Our leukocyte cut-off point was 8,275 leukocytes/µL, which increased the sensitivity of the test; combining it with NTF (> 75%) also increased the specificity. The percentage of neutrophils is by itself considered the best diagnostic marker for AA and is also related to its severity [25].
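The trade-off described above, in which a lower leukocyte cut-off raises sensitivity and the additional NTF criterion restores specificity, can be illustrated with a short Python sketch. The eight cases below are invented for illustration and are not data from this study; the function name `sens_spec` is likewise hypothetical.

```python
def sens_spec(predicted, actual):
    """Return (sensitivity, specificity) for paired boolean lists."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    return tp / (tp + fn), tn / (tn + fp)


# Made-up cases (NOT study data): (WBCC in leukocytes/µL, NTF in %, has AA)
cases = [
    (9000, 80, True),
    (14000, 85, True),
    (10500, 78, True),
    (8000, 70, True),
    (9500, 60, False),
    (7000, 55, False),
    (12000, 72, False),
    (8500, 77, False),
]
actual = [c[2] for c in cases]

# Higher cut-off taken from the literature quoted in the text.
high_cutoff = sens_spec([c[0] > 11000 for c in cases], actual)
# Lower cut-off used by the HMC score.
low_cutoff = sens_spec([c[0] > 8275 for c in cases], actual)
# Lower cut-off combined with the neutrophil criterion.
combined = sens_spec([c[0] > 8275 and c[1] > 75 for c in cases], actual)

print(high_cutoff, low_cutoff, combined)
```

On this toy data the lower cut-off triples sensitivity at the cost of specificity, and requiring NTF > 75% as well recovers the specificity without losing sensitivity. Real cohorts will not behave this cleanly.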
Another aspect introduced by the HMC scale concerns body temperature. Fever is one of the variables present in most of the RIFP diagnostic scales (Alvarado, RIPASA, and AIR). However, many authors believe that the predictive value of fever for AA is limited [29, 30]. Andersson [31], in a study of 496 patients, demonstrated that a temperature > 37.7 °C had a sensitivity and specificity of 70% and 65%, respectively, for the diagnosis of AA. In a later study, Andersson found that the mean temperature in nonsurgical abdominal pathology was 37.7 °C, and that only its persistence in serial physical evaluations would indicate the presence of complicated AA [32]. Therefore, temperature is of limited use as an independent variable [3, 29]. In our scale, an axillary temperature between 37 °C and 39 °C was associated with a lower risk of AA. For that reason, and in agreement with these authors, our data support the idea that temperature is not a good predictor of AA. Its presence in patients evaluated for RIFP should alert clinicians to the possible existence of other intra-abdominal pathologies, such as acute gastroenteritis or pelvic inflammatory disease.
In addition, it is well established that the diagnostic approach to RIFP is conditioned by certain characteristics of the patient, such as age and sex [2, 4]. In the global cohort of female patients, we found that the HMC scale presented an AUC = 0.84 [0.77–0.90] (p < 0.001), which was higher than the AUC of the other CPRs. The difference was even more marked in the group of women between the ages of 15 and 64, with an AUC of 0.86 [0.78–0.93] (p < 0.001). The diagnostic approach in women of childbearing age is particularly difficult because gynecological symptoms overlap with those of AA itself, which increases the rate of NA due to diagnostic errors [33]. It has been postulated that CPR scores fail to properly evaluate this subgroup of patients because the scores cannot adequately exclude the presence of gynecological pathologies. In fact, a diagnostic scale has been developed for the management of acute abdominal pain in women of reproductive age [34].
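For readers less familiar with the AUC values quoted above, the statistic can be read as the probability that a randomly chosen patient with AA receives a higher score than a randomly chosen patient without AA, with ties counted as one half. The Python sketch below computes it by direct pairwise comparison. The scores are invented for illustration, and this is not necessarily the procedure used by the statistical software behind the reported results.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve by pairwise comparison.

    Equals P(positive score > negative score) + 0.5 * P(tie).
    """
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Made-up scores (NOT study data).
aa_scores = [45, 41, 38, 30]   # patients with confirmed AA
na_scores = [25, 30, 20, 42]   # patients without AA

example_auc = auc(aa_scores, na_scores)
print(example_auc)
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.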
When we applied the HMC score to women between 15 and 64 years old, we obtained a very high degree of success in the diagnosis of AA: of the 44 patients in this age subgroup with an HMC score ≥ 41, only one had a diagnosis recorded as NA, which improves on the data provided by other authors [35]. However, female patients with a score ≤ 25 had the highest rate of NA (20 out of 44). These results are consistent with other studies that also showed high rates of NA in women of childbearing age [29, 36] and support the early implementation of imaging tests in these patients [37].
Another group of patients with specific characteristics is the pediatric group. In this subgroup, the diagnosis of AA is a challenge both because of nonsurgical pathologies that resemble appendicitis and because of the difficulties of history taking and physical examination in these patients [14]. The rate of diagnostic errors increases as age decreases, and children under three years of age have up to 5 times more risk of complicated AA [38]. Although we cannot provide data on patients under five years of age, our results show that NA was more frequent in pubescent girls between 10 and 14 years old (60% in our cohort), a result similar to that found by Güller in a retrospective study of 7452 cases [39].
The HMC scale was shown to be an acceptable predictor of AA in pediatric patients, with an AUC = 0.74 (0.59–0.90; p = 0.019), a result not achieved when applying the other scales. All patients with a high score on this scale had a diagnosis of AA, so the use of ultrasound could have been avoided in them, a conclusion similar to that derived from the study of Blitman, in which the Alvarado score was applied [14]. On the other hand, authors such as Fleischman [40] showed that low scores on the appendicitis scales in children had good sensitivity for ruling out AA and could therefore safely spare diagnostic imaging tests and avoid unnecessary radiation risks.
Consequently, we believe that imaging tests improve diagnostic accuracy and avoid errors and delays in definitive treatment, and that they should be performed in the workup of doubtful diagnoses (intermediate scores), followed by CT scan when needed, a strategy supported by other authors [41, 42].
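The banding implied by this strategy can be sketched in a few lines of Python. The cut-offs of 25 and 41 are the ones quoted earlier in this section for women aged 15 to 64; applying the same values to every subgroup, as well as the function name and band labels, is an assumption made only for illustration.

```python
def stratify(score, low_cutoff=25, high_cutoff=41):
    """Assign an HMC score to a risk band.

    Assumption: cut-offs of <= 25 (low) and >= 41 (high), as quoted
    for women aged 15-64, are reused here for illustration only.
    """
    if score <= low_cutoff:
        return "low"
    if score >= high_cutoff:
        return "high"
    return "intermediate"


# Made-up scores (NOT study data).
bands = [stratify(s) for s in (18, 25, 33, 41, 50)]
print(bands)
```

In this reading, the intermediate band is the one in which the doubtful diagnoses discussed above would fall.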
Finally, in elderly patients, the AA rate is approximately 10%, although with the aging of the population, these figures are increasing [43]. Comorbidities, the insidious onset of the disease, and delayed diagnosis with a high rate of perforation make AA a pathology with high morbidity and mortality rates in elderly patients [44]. The diagnostic scales for AA were designed with a young population, so their effectiveness in an elderly age group is not well documented [45]. For these reasons, and like other authors [44], we recommend the early use of imaging tests in these patients, especially in the presence of inconclusive clinical data.
In our study, 11.1% of the patients were within this age group, with only 3 cases of NA. None of the other CPRs tested reached statistical significance in discriminating between AA and NA in this group. The HMC scale, however, was statistically significant, with the best AUC for elderly patients of all the scores (0.86), suggesting that it is also a good predictive model for these patients. Nevertheless, this sample appears too small to allow suitable comparisons with other published data.
The major weaknesses of this study are its retrospective nature, which increases the potential for bias, and its single-center design. Among its strengths, all patients were treated by a small number of surgeons with an adequate level of uniformity in criteria, and the clinical data were complete in more than 95% of the cases. The score developed still requires validation, which is currently being carried out in our center.