Study design and data source
Figure 1 illustrates the workflow. The study included patients who presented to the First Affiliated Hospital of Xi’an Jiaotong University (Xi’an, China) with acute symptomatic VTE between June 1, 2013 and June 1, 2018. A sample size of 1,460 patients was determined to be required for the study. Approximately 30% of patients were referred from inpatient or emergency departments to the peripheral vascular department; about 70% came from the peripheral vascular department. Each patient was examined at baseline according to a standardized protocol, following recommended international standards. Diagnostic procedures, including pulmonary angiography or CTPA and compression venous ultrasonography, were carried out at our institution's dedicated diagnostic unit. Angiography was performed after obtaining written informed consent from the patients. Exclusion criteria were: 1) recurrent pulmonary embolism, 2) incomplete clinical data, 3) contraindication to CTPA/angiography, 4) refusal to complete diagnostic testing. This research was approved by the Ethics Committee of the First Affiliated Hospital of Xi’an Jiaotong University (Approval No. XJTU1AF2018LSK-144).
Predictor variables
As a first step, we searched the PubMed and Web of Science databases without language or time restrictions to retrieve relevant studies. The prediction factors were mainly derived from the 2019 ESC Guidelines for the diagnosis and management of acute pulmonary embolism1 and a systematic review and meta-analysis designed to identify risk factors for VTE in hospitalized medical patients10. To maximize safety and model usability, we chose clinically relevant predictors that are readily available in hospitals, especially primary hospitals. Because some biochemical tests are not routinely available, we did not consider biomarkers endorsed by guidelines (D-dimer or pro-BNP). Age, a continuous variable, was dichotomized using a pre-specified cut-off (>65 years vs. <65 years) derived from the literature9. The meta-analysis found low-certainty evidence of an association between CVC use and the risk of any VTE10; because this risk factor was uncommon in our sample population, we did not include central venous catheter (CVC) use. Because there is probably an association between elevated heart rate (>100 beats per minute) and the risk of any VTE, we selected both tachycardia (>100 beats per minute) and heart rate (as a continuous variable).
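As an illustration only, the dichotomization of age and heart rate described above can be sketched in Python. The function and field names are hypothetical, and the handling of a patient aged exactly 65 years is an assumption (coded here as not above the cut-off), since the text specifies only >65 and <65 years.

```python
AGE_CUTOFF_YEARS = 65          # cut-off stated in the text
TACHYCARDIA_CUTOFF_BPM = 100   # cut-off stated in the text


def dichotomize(value, cutoff):
    """Return 1 if value is strictly above the cut-off, otherwise 0."""
    return int(value > cutoff)


def encode_patient(age_years, heart_rate_bpm):
    """Build the binary and continuous predictors for one patient.

    Assumption: age == 65 is coded 0; the source does not say how
    this boundary case was handled.
    """
    return {
        "age_over_65": dichotomize(age_years, AGE_CUTOFF_YEARS),
        "tachycardia": dichotomize(heart_rate_bpm, TACHYCARDIA_CUTOFF_BPM),
        "heart_rate_bpm": heart_rate_bpm,  # kept as a continuous variable
    }
```

Keeping heart rate in both forms mirrors the text, which lists tachycardia and continuous heart rate as separate candidate predictors.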
From the medical records and hospital chart review, trained study doctors recorded all clinical and electrocardiogram data on a standard form, including demographic factors and clinical diagnoses. At the time of diagnosis, all eligible patients were assessed by a trained clinician to determine the presence or absence of signs and symptoms related to VTE, recorded as dichotomous variables (yes/no); these included dyspnea, hemoptysis, chest pain, syncope, and swelling and pain in the lower limbs, among others. Clinicians took care to identify potential factors associated with APE and to exclude pre-existing conditions whose clinical manifestations resemble those of pulmonary embolism.
Based on the literature and research reports, we screened more than 10 electrocardiogram (ECG) signs associated with APE11–15. ECGs obtained within the first 24 hours of hospital admission were included in the study. Acute cor pulmonale was deemed present if we identified at least one of the following: 1) SⅠQⅢTⅢ, 2) T-wave inversion in the right precordial leads, 3) S1S2S3, 4) pseudo-infarction, 5) transient right bundle branch block. Any of these signs that had appeared in the past was excluded.
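The "at least one of five signs" rule above can be expressed as a small Python sketch. This is illustrative only: the dictionary keys are hypothetical names, and the way previously documented signs are represented (a second dictionary of prior findings) is an assumption, not the study's actual data format.

```python
# Hypothetical keys for the five ECG signs listed in the text.
ACUTE_COR_PULMONALE_SIGNS = (
    "s1q3t3",
    "t_wave_inversion_right_precordial",
    "s1s2s3",
    "pseudo_infarction",
    "transient_rbbb",
)


def acute_cor_pulmonale(current_ecg, prior_ecg=None):
    """Return True if at least one listed sign is present on the current
    ECG and had not already appeared in the past."""
    prior_ecg = prior_ecg or {}
    return any(
        current_ecg.get(sign, False) and not prior_ecg.get(sign, False)
        for sign in ACUTE_COR_PULMONALE_SIGNS
    )
```

For example, a patient whose only finding is S1S2S3 that was already documented before admission would not be classified as having acute cor pulmonale under this rule.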
Outcome variables
The primary outcomes of this study were as follows: 1) derivation and validation of an easy-to-use predictive model for acute pulmonary embolism, and 2) introduction of a reasonable pipeline for disease risk prediction and factor analysis. All patients underwent a complete diagnostic examination to establish a definite diagnosis. Pulmonary embolism was diagnosed by pulmonary angiography or CTPA, and deep vein thrombosis was diagnosed by compression venous ultrasonography.
Derivation and validation of the models
The initial cohort comprised 1,582 symptomatic VTE patients. Thirty-six patients were excluded because of incomplete data and 86 because they had acute pulmonary embolism only; hence, 1,460 patients (DVT + APE vs. DVT, 773:687) were included in this study. We then randomly assigned patients to a training set (n = 1,095) and a testing set (n = 365) in a 3:1 ratio. The training set was used to generate the prediction model, and the testing set was used to evaluate its predictive performance. First, we performed univariate analysis to select predictor variables significantly associated with APE diagnosis, using a cutoff of p < 0.05. To avoid overfitting, LASSO regression analysis was then used to screen these APE diagnosis-related variables. Next, all APE diagnosis-related predictor variables were entered into a multivariate analysis using logistic regression to identify independent predictors. Ultimately, sixteen APE diagnosis-related predictors were retained as candidates for the prediction model. The area under the receiver operating characteristic curve (AUC) was used to evaluate the diagnostic performance of the model. In addition to the AUC, the Brier score and calibration curves were used to evaluate the concordance between predicted and observed diagnostic outcomes in the training and testing sets. The distribution of patients across predicted risk levels, the number of censored patients, and a heatmap of the APE diagnosis-related predictors were displayed. A nomogram based on the independent risk factors identified by multivariate logistic regression was established to predict the probability of APE in patients with DVT.
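To make the cohort arithmetic and the evaluation metrics concrete, the following is a minimal Python sketch of a random 3:1 split, the AUC (computed through its rank-based Mann-Whitney interpretation), and the Brier score. It is not the study's code, which was written in R; the function names and the random seed are hypothetical, and the study's actual randomization procedure is not described in the text.

```python
import random


def split_3_to_1(ids, seed=0):
    """Randomly split ids into (training, testing) in a 3:1 ratio."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_test = len(shuffled) // 4          # one quarter goes to testing
    return shuffled[n_test:], shuffled[:n_test]


def auc(y_true, y_prob):
    """AUC as the probability that a randomly chosen positive case gets a
    higher predicted probability than a randomly chosen negative case
    (ties count as one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Cohort arithmetic from the text: 1,582 screened, 36 + 86 excluded.
n_included = 1582 - 36 - 86
train_ids, test_ids = split_3_to_1(range(n_included), seed=2018)  # seed is arbitrary
```

With the counts reported above, `n_included` is 1,460 and the split yields 1,095 training and 365 testing patients. A lower Brier score indicates better agreement between predicted probabilities and observed outcomes, while an AUC of 0.5 corresponds to chance-level discrimination.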
Handling of missing data
Apart from age and gender, all variables had very few missing data. Records with missing values were removed, and only complete data were analyzed.
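A complete-case analysis of this kind can be sketched as follows. The sketch assumes that removal was done at the record level (a patient with any missing predictor is dropped) and that missing values are stored as `None`; both are assumptions, and the field names are hypothetical.

```python
def complete_cases(records, required_fields):
    """Keep only records in which every required field is present.

    Assumption: a missing value is represented by None or by an absent key.
    """
    return [
        record
        for record in records
        if all(record.get(field) is not None for field in required_fields)
    ]


# Hypothetical example records, not study data.
example_records = [
    {"age": 70, "heart_rate_bpm": 110, "dyspnea": 1},
    {"age": 58, "heart_rate_bpm": None, "dyspnea": 0},
    {"age": 66, "dyspnea": 1},
]
kept = complete_cases(example_records, ["age", "heart_rate_bpm", "dyspnea"])
```

In this toy example only the first record is kept, because the second has a missing heart rate and the third lacks the field entirely.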
Statistical analysis
Statistical analysis was performed in R (version 4.1). A p value < 0.05 was regarded as statistically significant.