Aim
To determine which data features contained in electronic health records (EHRs) are associated with risk of lung cancer in current, ever and former smokers.
The systematic review will focus on identifying data features in EHRs that are associated with incidence of lung cancer. Where appropriate, data on measures of effect will be extracted.
Eligibility Criteria
Studies eligible for inclusion must meet the following criteria:
1. Study designs
As the review aims to identify data features associated with risk of lung cancer, observational studies such as cohort, case-control, case series, cross-sectional and prospective designs will be included. Systematic reviews are also eligible for inclusion.
2. Participants
Participants included in studies will be current, ever and former smokers, as this group is most at risk of developing lung cancer. Studies are ineligible if they include participants aged 18 years or younger and/or do not report a measure of effect (e.g. risk ratio (RR) with 95% confidence interval (CI)). Data on participants used to model risk must come from electronic health records.
3. Interventions
Studies examining interventions will not be included.
4. Comparators
The comparator group for current/ever/former smokers will be non-smokers or those not diagnosed with lung cancer. Case-control studies may use people who do not develop lung cancer as the comparator, or may compare non-smokers with current or ever smokers.
5. Outcomes
Eligible studies must report an estimate of lung cancer risk as the main outcome, e.g. the risk of lung cancer for an individual who is a current smoker and has asthma. The primary outcome will be presented as a measure of effect (i.e. as the RR, hazard ratio (HR), odds ratio (OR), incidence rate ratio (ICR) or standardized incidence ratio (SIR)) for each risk factor and data feature, with the 95% CI.
6. Setting
Studies from any type of setting will be included.
7. Language
Studies from any country will be included, provided they are published in English.
Search strategy
The following databases will be searched for relevant studies: Cochrane Library, MEDLINE (Ovid), Scopus and Web of Science. The same search string will be used throughout, adapted to each database. The search strategy developed for MEDLINE is given in Additional file 2.
Websites for EHR and administrative databases will be searched for bibliographic lists (e.g., Clinical Practice Research Datalink, www.cprd.com). Furthermore, relevant grey literature will be examined through Open Grey (http://www.opengrey.eu/).
The bibliographies of studies eligible for inclusion will then be searched for additional relevant studies. Content experts may also be contacted for information about other potential ongoing or unpublished studies.
Selection Process
Once the searches in the databases listed above have been completed, two reviewers will independently screen the studies by title and abstract. Studies that appear to meet the inclusion criteria will then have their full texts assessed for eligibility, and the reviewers will determine whether each study can be included in the review. If the reviewers cannot agree on a study, a third reviewer will be brought in to determine whether the article is eligible for inclusion.
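The decision rule described above can be sketched in a few lines of Python. This is only an illustration of the logic; the function name `screening_decision` and its arguments are hypothetical and are not part of the protocol or of any screening software.

```python
from typing import Optional


def screening_decision(reviewer_a: bool, reviewer_b: bool,
                       reviewer_c: Optional[bool] = None) -> bool:
    """Return True if a study is included after full-text assessment.

    reviewer_a / reviewer_b: independent include (True) or exclude (False) votes.
    reviewer_c: vote of the third reviewer, consulted only on disagreement.
    """
    if reviewer_a == reviewer_b:
        # Both reviewers agree, so their shared decision stands.
        return reviewer_a
    if reviewer_c is None:
        # Disagreement cannot be resolved without the third reviewer.
        raise ValueError("Reviewers disagree; a third reviewer's decision is needed")
    return reviewer_c
```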
Data Collection
Two authors will extract information from the studies separately, then compare and discuss results. Forms for the data collection will be piloted on two studies. Covidence will be used for the collection of data [32]. Extracted data will include any demographic or socioeconomic descriptive information. The measures of effect for risk factors and their 95% CI, the methodology of studies, number of participants (and controls for case-control studies), and length of follow-up (for cohort studies) will also be extracted.
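As an illustration of the items listed above, the following Python sketch shows one possible shape for an extraction record and a helper that flags fields on which two reviewers differ. It is not the Covidence form; the class names, field names and the `discrepancies` helper are all assumptions made for the example.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class EffectEstimate:
    risk_factor: str   # data feature or risk factor the estimate refers to
    measure: str       # e.g. "RR", "HR", "OR"
    estimate: float
    ci_lower: float    # lower bound of the 95% CI
    ci_upper: float    # upper bound of the 95% CI


@dataclass
class ExtractionRecord:
    study_id: str
    design: str                              # e.g. "cohort", "case-control"
    n_participants: int
    n_controls: Optional[int] = None         # case-control studies only
    follow_up_years: Optional[float] = None  # cohort studies only
    demographics: dict = field(default_factory=dict)
    effects: list = field(default_factory=list)  # list of EffectEstimate


def discrepancies(a: ExtractionRecord, b: ExtractionRecord) -> list:
    """Return the names of fields where two reviewers' extractions differ."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]
```

Fields returned by `discrepancies` would be the ones the two authors discuss when comparing their extractions.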
Outcomes
The primary outcome is the incidence of lung cancer associated with each risk factor, presented as a measure of effect (e.g., RR, HR, OR, ICR or SIR) with the 95% CI.
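For orientation, the sketch below shows how an RR or OR and its 95% CI relate to a 2x2 table, using the standard large-sample (log-scale) formulas. The review will extract estimates as reported by the included studies rather than recompute them, so this is purely illustrative; the function names and cell labels are assumptions.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% CI


def risk_ratio(a: int, b: int, c: int, d: int):
    """RR and 95% CI from a 2x2 table.

    a: exposed with lung cancer      b: exposed without lung cancer
    c: unexposed with lung cancer    d: unexposed without lung cancer
    """
    rr = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - Z_95 * se_log)
    upper = math.exp(math.log(rr) + Z_95 * se_log)
    return rr, lower, upper


def odds_ratio(a: int, b: int, c: int, d: int):
    """OR and 95% CI from the same 2x2 table (Woolf's method)."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - Z_95 * se_log)
    upper = math.exp(math.log(or_) + Z_95 * se_log)
    return or_, lower, upper
```

Both functions assume that no cell of the table is zero; a continuity correction would be needed otherwise.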
Risk of bias
For case-control and cohort studies the Newcastle-Ottawa Scale (NOS) will be used to assess risk of bias [33]. The AXIS tool will be used to assess bias in cross-sectional studies [33]. For other studies the CASP checklists will be used [34]. These will be piloted on a select number of papers initially. Records will be kept of decisions made during data extraction.
Two reviewers will complete the risk of bias forms for studies which have met the inclusion criteria after the full-text articles have been examined. Information collected on risk of bias will be synthesised and tabulated. The results will provide information about the quality of evidence for risk factors, which will be examined in the discussion.
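The allocation of appraisal tools to study designs can be summarised as a simple lookup. The sketch below is illustrative only; the function name `risk_of_bias_tool` and the design labels are assumptions chosen for the example.

```python
def risk_of_bias_tool(design: str) -> str:
    """Return the appraisal tool planned for a given study design."""
    design = design.strip().lower()
    if design in {"cohort", "case-control"}:
        return "NOS"   # Newcastle-Ottawa Scale
    if design == "cross-sectional":
        return "AXIS"
    return "CASP"      # all other designs use the CASP checklists
```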
Data Synthesis
A narrative synthesis will be carried out once the data extraction forms have been completed by the reviewers. A summary of included studies will provide information on the authors, study design, number of study participants, how the studies have recorded smoking behaviour (i.e., smoking status, pack years, duration of smoking etc.), and the measures of effect for the pre-existing data features identified in studies. Some clinical heterogeneity between studies is expected, so limitations of the studies will be recorded, extracted and discussed in the paper. Funnel plots will be used to investigate potential publication bias.
Additionally, if the literature supports a statistical combination of results, sensitivity analysis will be performed to assess the included studies in terms of comparable quantitative information.
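A funnel plot places each study's effect estimate on the horizontal axis (usually on the log scale) against its standard error on the vertical axis. As a minimal sketch, assuming studies report a ratio measure with a 95% CI, the coordinates can be recovered as follows; the function name `funnel_point` is hypothetical, and the approach assumes the CI is symmetric on the log scale.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% CI


def funnel_point(estimate: float, ci_lower: float, ci_upper: float):
    """Return (log effect, standard error) for one study.

    The standard error is recovered from the width of the 95% CI on the
    log scale: SE = (ln(upper) - ln(lower)) / (2 * 1.96).
    """
    log_effect = math.log(estimate)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * Z_95)
    return log_effect, se
```

Plotting these points for all studies reporting the same risk factor and measure, with the standard error axis inverted, gives the usual funnel shape; marked asymmetry may suggest publication bias.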