Information about the ethnicity and socioeconomic status of participants in clinical research is needed for the interpretation, generalisability and pooling of data, as well as to inform discussion around health inequalities. The relevance of ethnicity and socioeconomic status to health and biomedical research is well established but has been emphasised by the COVID-19 pandemic, during which specific ethnic groups and poorer individuals have been disproportionately affected1. The causal pathways driving health disparities are complex and multifactorial; however, under-reporting of participant characteristics has been identified as a potential contributory factor2-4.
The International Committee of Medical Journal Editors (ICMJE) recommendations5 and some journal instructions to authors promote inclusion of these data6,7. Previous studies have found that reporting is frequently incomplete, with limited progress made over the last three decades8-13. Recent years have seen an increased focus on ethnicity and socioeconomic status in medicine; however, there is a lack of research into whether this has resulted in better reporting.
To evaluate the current situation, we assessed the frequency of reporting of ethnicity (or ‘race’) and socioeconomic status indicators in a sample of research articles published in high-impact general medical journals in spring 2021.
We identified the 10 highest-ranked journals in the Google Scholar ‘Health and Medical (general)’ category up to April 2021. At the time of data collection these were The New England Journal of Medicine (NEJM), The Lancet, the Journal of the American Medical Association (JAMA), Proceedings of the National Academy of Sciences of the United States of America (PNAS), Nature Medicine, Public Library of Science One (PLOS One), The British Medical Journal (BMJ), Cochrane, Cell Metabolism, and Science Translational Medicine. PNAS and PLOS One cover a wide range of subject areas; therefore, the subsections ‘Biological Sciences, Medical Science’ and ‘Clinical Medicine’ were used respectively. From each of these 10 journals, we selected the 10 most recent journal articles that reported participant-level data. Laboratory studies using human-derived tissues or cells were included if donor information was provided. Journal reporting guidance and requirements were also assessed by evaluating author guidelines and websites, and by contacting the respective editorial/publishing teams. Data were collected on which participant-level characteristics were reported and how. Data collection and analysis were conducted by SCB, KEJP, SMA and PW. All papers were reviewed independently by at least two researchers.
Ethnicity and race are related yet different constructs, and arguably the latter term should be abandoned14. However, given the frequent lack of standardisation in the literature and that the terms are in practice often used interchangeably, we accepted the use of either term. Similarly, because socioeconomic status indicators are reported using various, often inconsistent methods, we assessed both direct measures, such as the Index of Multiple Deprivation, and measures from which socioeconomic status could be inferred, such as educational attainment and job role. Our focus was on whether, rather than how, such measures were reported.
A total of 650 publications were assessed to identify 100 meeting the inclusion criteria (Figure 1). Of the 100 research articles included, 35 reported ethnicity (or race) and 13 reported socioeconomic status. By contrast, 99 reported age and 97 reported sex or gender (Table 1). Among the 65 articles not reporting ethnicity, only 3 (5%) highlighted this as a limitation, and among the 87 not reporting socioeconomic status, only 6 (7%) highlighted that these data were missing. The median number of articles reporting ethnicity per journal was 2.5/10 (range 0/10 (PLOS One) to 9/10 (JAMA)). Only 2 journals explicitly requested reporting of participant ethnicity (or race), and 1 requested socioeconomic status. The types of research included were interventional studies (n=30), cohort studies (n=35), case-control studies (n=3), systematic reviews and meta-analyses (n=16), epidemiological studies and surveys (n=3), and other (n=13). Twenty of the 100 were laboratory studies (either observational or involving interventional manipulation of samples) using human samples, of which 4 reported the ethnicity of sample donors (none of the remaining 16 mentioned this omission as a limitation) and none reported socioeconomic status.
Among the 24 papers describing clinical trials, 12 (50%) reported ethnicity, and none of the remaining 12 highlighted the absence of these data as a limitation. Three trials (12.5%) reported an indicator of socioeconomic status, and one of the 21 not reporting socioeconomic status highlighted this absence as a limitation.
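The denominators behind the percentages above can be reconstructed from the reported counts. The following is a minimal illustrative sketch in Python, not part of the original analysis; the function and variable names are assumptions introduced here, and only the counts themselves come from the text.

```python
# Illustrative check of the reported proportions (names are hypothetical;
# only the counts are taken from the results text).

def pct(numerator: int, denominator: int) -> float:
    """Return a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)

TOTAL_ARTICLES = 100
reported_ethnicity = 35
reported_ses = 13

# Articles NOT reporting each characteristic form the denominators
# for the "highlighted as a limitation" percentages.
not_reporting_ethnicity = TOTAL_ARTICLES - reported_ethnicity
not_reporting_ses = TOTAL_ARTICLES - reported_ses

ethnicity_limitation_pct = pct(3, not_reporting_ethnicity)
ses_limitation_pct = pct(6, not_reporting_ses)

# Clinical trial subset: convert the reported percentages back to counts.
TRIALS = 24
trials_reporting_ethnicity = round(TRIALS * 50 / 100)
trials_reporting_ses = round(TRIALS * 12.5 / 100)
trials_not_reporting_ses = TRIALS - trials_reporting_ses

# Study-type counts should account for every included article.
study_types = {
    "interventional": 30,
    "cohort": 35,
    "case-control": 3,
    "systematic review / meta-analysis": 16,
    "epidemiological / survey": 3,
    "other": 13,
}

if __name__ == "__main__":
    print(f"Ethnicity omission flagged: {ethnicity_limitation_pct}%")
    print(f"SES omission flagged: {ses_limitation_pct}%")
    print(f"Trials not reporting SES: {trials_not_reporting_ses}")
    print(f"Study types total: {sum(study_types.values())}")
```

Under these assumptions the unrounded values (3 of 65 and 6 of 87) are consistent with the 5% and 7% reported, and the trial subset yields the 21 trials not reporting socioeconomic status.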
Of note, two of the research articles included in our sample identified ethnicity as being relevant to their research topic, yet did not provide relevant data on their study participants or highlight the lack of these data as a limitation of their study: ‘in the case of DNA-based mutation testing, poor sensitivity in detecting mutations in infants from ethnic and racial minority groups’15, and ‘peripheral oxygen saturation can substantially differ from the SaO2 under certain conditions and may be less accurate in Black patients than in White patients.’16
Figure 1: CONSORT diagram of included/excluded articles (see end of manuscript)
Table 1: Reporting of race or ethnicity and socioeconomic status (see end of manuscript)
The majority of research published in high-impact medical journals does not include data on the ethnicity and socioeconomic status of participants, and this omission is rarely acknowledged as a limitation. This finding echoes related historical research,8-13 but its persistence is of concern and is surprising given current awareness of such issues17,18.
These findings have important implications for the interpretation and application of research findings, both within academia and beyond, with the ongoing omission no longer justifiable as simple oversight. As highlighted by Baker et al.19 in relation to data relating to LGBTQI+ communities, but equally relevant here, ‘Data are fundamentally political: decisions about which data are collected and which are overlooked both reflect and shape policy and program priorities.’
Our results could have multiple contributory factors. For some research, including secondary data analyses, ethnicity and socioeconomic status data may not have been available to the researchers, but given the lack of explanation, it remains unclear whether these data were unavailable, or available but not included in publications. The low level of reporting in controlled clinical trials suggests issues beyond unavailability of data, as in these studies such data would be simple to collect. Additionally, given that other research successfully reports these data, these omissions remain unexplained.
The higher frequency of reporting of ethnicity compared with socioeconomic status may indicate differences in the perceived relevance of these variables. This would be in keeping with journal author guidelines and ICMJE recommendations that encourage the inclusion of relevant demographic variables to ensure representative samples5, which more often explicitly mention race and/or ethnicity than socioeconomic status. The relevance of these factors may not have been apparent to authors and editorial teams; however, the ICMJE Recommendations for the Conduct, Reporting, Editing, and Publication of Scholarly Work in Medical Journals5 state: ‘Because the relevance of such variables as age, sex, or ethnicity is not always known at the time of study design, researchers should aim for inclusion of representative populations into all study types and at a minimum provide descriptive data for these and other relevant demographic variables.’ Of note, not all of the journals in our sample state that they follow the ICMJE recommendations20. However, whether or not a journal states that it follows this guidance has no bearing on the relevance of these data or the importance of reporting them. Additionally, Maduka et al.21 found no difference in the frequency of reporting race and ethnicity between journals stating they follow ICMJE recommendations and those that do not, in a sample of surgical research publications from 2019.
Certain considerations require highlighting. Firstly, different approaches to selecting research papers may alter findings. Secondly, we identified high-impact journals using the Google Scholar h5-index but acknowledge that various other equally valid methods exist. Thirdly, our analysis focused on whether ethnicity and/or race was reported, but we acknowledge that these are not synonymous terms. Beyond whether these variables are reported, how they are reported is also an important area for discussion and research. The widespread omissions identified by this research suggest a structural problem. Indeed, we the authors have published research which would have met the inclusion criteria and failed to report these specific characteristics. Our intention is to highlight an issue and suggest approaches to address it.
Given that inadequate reporting persists despite research highlighting the issue, author guidelines and ICMJE recommendations, and the current socio-political climate, there is a clear need for more explicit requirements that are adhered to in practice. This is likely best achieved if steps are integrated into each stage of the research process, from protocol to publication. For example, Fain et al.22 compared reporting of race and ethnicity on ClinicalTrials.gov before and after the requirement to report these data (if collected) was introduced, finding that this was associated with an increase in reporting from 42% to 92%. Similar explicit requirements could be adopted in EQUATOR guidelines23 and research ethics applications. In our sample, JAMA had the most explicit guidance for reporting race and ethnicity, and this variable was reported in 9/10 of the articles we reviewed. Of note, from 2022 the New England Journal of Medicine will require authors of research articles to provide data on the representativeness of the sample, including race or ethnic group24, though it is unclear whether socioeconomic status indicators will also be required.
The reporting of ethnicity and socioeconomic status in high-impact medical research remains poor, despite a consensus on its importance. Omission of these participant characteristics limits the interpretation, generalisability, and pooling of data, which are required to facilitate informed discussion around health inequalities. Guidance and encouragement have so far proven insufficient to change practice in this area. Standardised, explicit, minimum standards are required.