Systemic autoimmune disease (SAD) is an umbrella term for autoimmune diseases that can affect all body systems and organs. SADs are typically divided into major categories, such as systemic lupus erythematosus (SLE), rheumatoid arthritis (RA), and Sjögren’s syndrome (SS). However, patients with the same SAD present with considerably varying manifestations. For example, a malar rash, pleural effusion, or even septic shock can be the first manifestation of SLE [1]. Moreover, severity and prognosis vary considerably among patients with the same SAD [2]. Some patients meet sufficient criteria and receive a clear diagnosis of SAD, whereas others fail to meet the criteria required for diagnosis [3]. This high complexity of SADs has hindered their precise management.
To provide more precise treatments for SADs, detailed identification of heterogeneous SAD subgroups is a key first step toward developing tailored healthcare strategies. Regrouping or reclassification of various well-known diseases has attracted considerable interest. Diabetes, among the most prevalent diseases, is classified into four categories, rather than only the type 1 and type 2 of the traditional classification [4]. Myocardial infarction, another highly prevalent disease, is classified into five types based on pathological, clinical, and prognostic differences, and treatment strategies vary among these five types [5]. Updating a classification scheme enhances its clinical usefulness and helps establish an accurate diagnosis [6]. Because of the complex nature of SADs, the current classification scheme for these diseases does not meet clinical needs.
Redefining SADs is an ongoing task in the rheumatology community. European and American committees established new classification criteria for SLE in 2019 [7]. In addition to expert consensus, antinuclear antibody (ANA) testing and data-driven methods were employed to refine the definition of SLE [7]. More generally, in disease diagnosis (e.g., for autoimmune diseases and cancers), relying solely on individual biomarkers often proves inadequate. To counter this limitation, biomarker panels consisting of multiple markers have shown substantial potential for enhancing diagnostic accuracy [8]. However, because the numerical patterns within such panels are intricate and implicit, machine learning and artificial intelligence technologies become pivotal for deciphering disease-specific patterns, thereby further improving diagnostic precision [9, 10]. Building on this approach, combining numerous objective measurements (e.g., biomarker testing) with data-driven analytical methods has been advocated as a more adequate way to redefine SADs. Using data-driven analytical methods such as unsupervised clustering, studies have demonstrated that the heterogeneity of autoimmune diseases can be efficiently deconstructed to obtain clinically meaningful insights [11–17]. One study used clinical features to identify distinct groups of patients with anti-Ku syndrome who developed different types of severe comorbidities [16]. Another study classified patients with particular SADs by performing cluster analysis on principal components (PCs) derived from a multiple correspondence analysis (MCA). Although differences between clusters were observed in the positive rates of autoantibodies and the frequencies of SADs, the association between clusters and clinical characteristics remains unclear [17].
This study investigated heterogeneity in SADs. It aimed to build an algorithm that distinguishes disease subgroups by applying a joint dimension reduction and clustering method to immunomarkers, and it evaluated the heterogeneity of the resulting subgroups based on clinical manifestations.
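For illustration only, the sequential approach mentioned above (cluster analysis on principal components derived from an MCA) can be sketched in Python. This is a minimal sketch under stated assumptions and is not the method of this study, which combines dimension reduction and clustering rather than running one after the other. The marker data are simulated. The function names `mca_coordinates` and `kmeans`, the positivity rates, the group sizes, and the choice of two components and two clusters are all hypothetical.

```python
import numpy as np


def mca_coordinates(X, n_components=2):
    """Row principal coordinates from an MCA of an integer-coded categorical matrix."""
    # One-hot indicator matrix: one column per (variable, observed category) pair.
    blocks = []
    for j in range(X.shape[1]):
        cats = np.unique(X[:, j])
        blocks.append((X[:, [j]] == cats[None, :]).astype(float))
    Z = np.hstack(blocks)

    # Correspondence analysis of the indicator matrix.
    P = Z / Z.sum()
    r = P.sum(axis=1)  # row masses
    c = P.sum(axis=0)  # column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    return (U[:, :n_components] * s[:n_components]) / np.sqrt(r)[:, None]


def kmeans(Y, k, n_init=10, n_iter=100, seed=0):
    """Plain Lloyd's k-means with random restarts; returns the lowest-inertia labels."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _restart in range(n_init):
        centers = Y[rng.choice(len(Y), size=k, replace=False)]
        for _step in range(n_iter):
            dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            # Keep the old center if a cluster happens to be empty.
            new_centers = np.array([
                Y[labels == g].mean(axis=0) if np.any(labels == g) else centers[g]
                for g in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        inertia = dist.min(axis=1).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


# Simulated (hypothetical) data: 6 binary autoantibody markers for two latent
# groups of 50 patients each, with opposite positivity profiles.
rng = np.random.default_rng(42)
probs_a = np.array([0.9, 0.9, 0.9, 0.1, 0.1, 0.1])
probs_b = 1.0 - probs_a
X = np.vstack([
    rng.random((50, 6)) < probs_a,
    rng.random((50, 6)) < probs_b,
]).astype(int)

coords = mca_coordinates(X, n_components=2)  # step 1: dimension reduction
labels = kmeans(coords, k=2)                 # step 2: clustering on the PCs
print("cluster sizes:", np.bincount(labels, minlength=2))
```

Because the two steps are optimized separately, the retained components are not guaranteed to be the ones that best separate the clusters, which is one commonly cited motivation for methods that perform both steps jointly.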