Leading on from this, literature searches for systematic reviews and primary research were conducted through the following databases: Cochrane Library, MEDLINE, EMBASE and PubMed. The search criteria are shown in supplementary tables 1 and 2, and the inclusion and exclusion criteria in supplementary tables 3 and 4. Consequently, systematic reviews and primary research were selected for analysis.
Table 1
Population, Intervention, Comparator and Outcome (PICO) framework used to generate review question
PICO element | Description |
Population | Children with Graves’ disease undergoing first-line treatment |
Intervention | Radioactive iodine |
Comparator | Antithyroid drugs |
Outcome | Remission (number whose thyroid returns to normal function) |
Guideline
The search of the NICE database and RCP (Royal College of Physicians) yielded thirty-two documents, of which NG145 - Thyroid Disease: Assessment and Management was selected [4]. It is a detailed generalised guideline with a vast amount of information regarding the diagnosis and treatment of thyroid disorders and disease.
Guideline Appraisal
Four independent assessors used the AGREE-II tool to critically appraise the NICE guideline, four being the optimal number of assessors [13]. AGREE-II was used because it is routinely utilised for appraising guidelines [14]. The scores attributed to each of the six domains of AGREE-II are detailed in the appendices.
Scope and Purpose
The scope is clearly defined [15]. There is a clear population to which the guidance can be applied, and a defined setting of NHS-funded healthcare providers. Outcomes were concisely expressed and well written. There is also discussion of available treatments and associated comparators. A range of conditions is explored across different age groups, and certain populations, e.g. neonates, are explicitly excluded. Additionally, there is advice on who the guideline is intended to be used by.
Stakeholder involvement
There were varying levels of stakeholder involvement, from the multidisciplinary team to wider engagement [16]. Each had their name, discipline, location and institution listed [17]. The process is robust: the team produces the guidance, which is then opened for stakeholder comment.
Rigour of development
This guideline was rigorously developed. A comprehensive search strategy was employed, and the relevant databases were searched using specific MeSH (Medical Subject Headings) terms and timeframes. Only papers written in English were reviewed, and these were cross-checked by reading the reference lists of highly relevant papers. Predefined inclusion and exclusion criteria were stipulated, and studies were assessed methodologically using the appropriate tools: GRADE (+ GRADE CERQual), QUIPS and CASP [18]. Recommendations were developed in an orderly fashion by the committee from the available evidence. The committee reports having considered the benefits, harms and cost of each course of action, and states that the guideline will be updated if needed.
Clarity of presentation
Presentation was succinct and unambiguous. Relevant interventions are explored and targeted towards the populations in question. Caveats are used to describe contraindications in situations where management may differ from normal. Separately, recommendations are summarised in one section and grouped with key further questions and reviews.
Applicability
There was a brief mention of barriers and concerns raised by stakeholders; however, it was concluded that access to services was not significantly more difficult for any particular group. A generic auditing tool and implementation support are provided by NICE itself. Summaries and the NICE pathway offer simplified versions of the guideline [19]. Health economists contributed to the development by carrying out a cost-consequence analysis [20], and the model calculations were systematically checked by two specialists.
Editorial Independence
In assessing editorial independence, it was apparent that NICE funded the National Guideline Centre (NGC) to produce this guideline [18]. Moreover, it was hosted by the Royal College of Physicians (RCP). It is unlikely that their views will have influenced the guideline as they are separate independent bodies. Positively, the interests of all committee members were publicly declared. In conclusion, the four independent authors regarded this guideline as a distinguished piece of work with only minor flaws and would therefore recommend it.
Systematic Review Search
The systematic review search was carried out independently by three separate authors (AA, BKSS and AS) using the electronic databases EMBASE, MEDLINE, PubMed, and the Cochrane Library.
‘Paediatric Graves’ Disease’ was used as a MeSH term, yielding a total of 22 systematic reviews, as shown in Fig. 2. Filters included ‘systematic review’ and a publication date after November 2019, when the relevant guideline (NG145) was published. Eligibility of the two systematic reviews selected was determined by collaborative analysis and comparison of abstracts between AA, BKSS and AS to assess whether the studies fulfilled the predefined inclusion and exclusion criteria (Table 5). The two systematic reviews chosen each examine one of the possible treatments used for PGD (ATD and RAI). Both included reviews explored efficacy and adverse events.
GD is rarer in children and adolescents than in adults, so reviews containing only randomised trials conducted solely within the paediatric population are limited in number. This, along with the obvious ethical issues involved in paediatric treatment, meant we included a combination of randomised and non-randomised studies, with the majority being observational. This enabled us to look at the duration of treatment, dosage differences and side effects in our outcome measure. The systematic reviews were appraised using the AMSTAR-2 tool, with the review search authors (AA, BKSS, AS) working alongside authors RO and ASS [21]. Each pair’s findings were presented to the remaining authors to discuss any disparities and to draw conclusions regarding any biases that may have affected the credibility of results.
Inclusion Criteria & Exclusion Criteria
We chose an age range of 0–20 years to yield the greatest number of studies, although the majority of reviews selected involved populations younger than eighteen years of age. Reviews with varying methods of administering treatments were also included to increase the number of studies available for analysis.
To define a less developed country we used the list of nations that are part of the G20 summit [22]. An exception was made if Graves’ disease was relatively more prevalent in the region or if the research was conducted in a tertiary centre, resulting in publication in a European or American journal. The above points also apply to the inclusion and exclusion criteria for the selection of primary studies.
Primary Study Search
A search was undertaken to identify pertinent primary literature published after August 2020, as data published after this was not included in the systematic reviews. Authors AA, BKSS and AS conducted searches via the databases MEDLINE, EMBASE, Cochrane Library and PubMed.
A combination of MeSH terms was used in order to procure a large number of studies. This yielded thirty-four studies after duplicates from the combinations of search terms were removed and the titles were screened. Further studies were then removed based on our inclusion and exclusion criteria, resulting in the selection of four studies. Figure 3 below details the search strategy. In addition, we prioritised studies that evaluated both efficacy and adverse events.
As shown, four retrospective cohort studies were chosen. This study design allowed dosage variations to be investigated, rather than just unexposed versus exposed participants. Each study was independently appraised using the CASP framework by the five authors (AA, BKSS, AS, ASS, RO) [23]. The views of the five authors on the primary literature were then discussed collectively and assembled into a single appraisal. The authors agreed on the inclusion of the selected studies and on which provided the strongest evidence.
Review Findings
Our searches yielded one guideline, two systematic reviews and four primary studies (all cohort studies) which met our strict predefined inclusion criteria, shown in supplementary Figs. 1 & 2.
Systematic Review Results
A summary of the results of both systematic reviews is shown below in Table 2.
Table 2
Results of systematic reviews
Study Reference | Number of Study Participants | Cohort Studies included | RCTs included | Overall Findings |
Lutterman et al (24) 2021 | 1,283 | 23 | 0 | Treating patients with 11–15 MBq iodine-131 per gram of thyroid tissue is an effective treatment option when aiming to achieve hypothyroidism. Efficacy seems to increase with dosage and activity of RAI. Short-term and long-term side effects are a rare occurrence in radioactive iodine treatment. |
JM van Lieshout et al (25) 2021 | 3,057 | 24 | 5 | • Intention-to-treat analysis (ITTA) showed an overall remission rate of 28.8% in methimazole-treated patients, rising to 75% as treatment duration increases to 9 years • Occurrence of adverse effects: 17.6% • Occurrence of major side effects: 1.1% |
Systematic Review Appraisal
No systematic reviews meeting our inclusion and exclusion criteria directly compared ATDs and RAI in terms of efficacy and AEs. Therefore, we have included two systematic reviews, each assessing a single treatment. AMSTAR-2 gave the included reviews ratings of critically low and low, largely due to the absence of RCTs, highlighting the need for higher quality evidence in this field. The scores can also be attributed to the fact that both reviews largely evaluated retrospective cohort studies, which provide lower-quality evidence than prospective studies. A strength is that both reviews state they are the most extensive and up-to-date reviews in this field, which supports the reliability of their findings. Van Lieshout et al. was rated as the higher-quality review as it accounted for the impact of heterogeneity and included the larger number of participants [25].
Both systematic reviews detailed a comprehensive set of inclusion/exclusion criteria. Study selection and data extraction were performed in duplicate, minimising the possibility of bias. Lutterman et al. included studies published in languages other than English, allowing for a more expansive literature search [24], whereas Van Lieshout et al. excluded studies not published in English [25].
Lutterman et al. and Van Lieshout et al. both excluded studies at a high/moderate risk of bias via use of the CASP checklist [24, 25]. Both reviews included summary tables detailing study characteristics. However, neither accounted for the impact of confounding factors, such as age and ethnicity. Confounding factors could influence remission rates, affecting the generalisability of review findings and therefore reducing the validity of the results.
In assessing the efficacy of the treatment options, only Lutterman et al. incorporated studies that assessed efficacy based on different treatment outcomes, namely euthyroidism and hypothyroidism [24]. This prevented merging of the results, so a pooled estimate of effect could not be determined. In contrast, the Van Lieshout et al. review standardised data by recalculating remission rates using ITTA to overcome heterogeneity and hence provided a pooled estimate of effect [25].
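As an illustration of why a common intention-to-treat denominator allows remission rates from different studies to be combined, the sketch below recalculates each rate with every patient who started treatment in the denominator and then pools the studies by summing events and participants. The study counts are hypothetical, the function names are our own, and this simple pooling is an assumption made for illustration; it is not necessarily the method used by Van Lieshout et al. [25].

```python
from typing import List, Tuple


def itt_remission_rate(remitted: int, started: int) -> float:
    """Remission rate using everyone who started treatment as the denominator."""
    if started <= 0 or not 0 <= remitted <= started:
        raise ValueError("need 0 <= remitted <= started and started > 0")
    return remitted / started


def pooled_itt_rate(studies: List[Tuple[int, int]]) -> float:
    """Crude pooled rate: total remissions divided by total patients started.

    Each study is a (remitted, started) pair. This is an illustrative
    assumption, not a weighted meta-analytic model.
    """
    total_remitted = sum(remitted for remitted, _ in studies)
    total_started = sum(started for _, started in studies)
    return itt_remission_rate(total_remitted, total_started)


# Hypothetical (remitted, started) counts for three studies.
hypothetical_studies = [(30, 100), (12, 50), (45, 150)]

per_study = [itt_remission_rate(r, n) for r, n in hypothetical_studies]
pooled = pooled_itt_rate(hypothetical_studies)
```

Because every study contributes the same kind of denominator, the per-study rates are directly comparable and the pooled figure has a clear interpretation; rates defined on different end points cannot be merged in this way.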
Study | Advantages | Disadvantages | AMSTAR-2 rating |
Lutterman et al (24) 2021 | • Most in-depth systematic review of the effectiveness and occurrence of adverse effects in PGD to date • Study selection was performed independently and blinded to reduce reporting bias • Data extraction was performed in duplicate • Justification of exclusion criteria provided • Use of the Critical Appraisal Skills Programme (CASP) checklist in critical appraisal of cohort studies | • No randomised controlled trials included • Unable to determine effect of confounders due to lack of bias assessment tool • No meta-analysis of results included and therefore unable to determine heterogeneity through statistical analysis • Impact of heterogeneity on the results is unknown • End points for efficacy were not consistently defined, so it is difficult to compare included studies | Critically Low |
JM van Lieshout et al (25) 2021 | • Largest review, with the highest number of participants (n = 3,057) • Data were standardised by recalculating remission rates using ITTA to overcome heterogeneity • Only studies assessed to be at a low risk of bias (determined via use of the CASP checklist) were included | • 82.6% of included study participants were female, meaning results may not be generalisable to a typical hospital setting, where the ratio of males to females may differ • Confounding factors, such as study participant characteristics, were not included for all cohort studies, affecting the generalisability of the systematic review results | Low |
Table 3
Appraisal of systematic reviews
Primary Study Results
A summary of the results of the four primary studies is shown below in Table 4.
Study Reference | Study Design | Study participants | Type of Therapy | Results for effectiveness | Results for adverse events |
Namwongprom et al (26) 2021 | Retrospective cohort | 32 | Radioiodine: 24hr I-131 | Hypothyroidism achieved 3–6 months after treatment in 65.6% of participants after single dose I-131 | NA |
H Lee et al (27) 2021 | Retrospective cohort | 99 | Methimazole (MMI) | Free thyroxine levels returned to normal after a mean time of: • 5.64 weeks for an initial dose of < 0.4 mg/kg/day • 8.61 weeks for an initial dose of 0.4–0.7 mg/kg/day • 7.98 weeks for an initial dose of > 0.7 mg/kg/day | • Liver dysfunction: P value = 0.034 • Neutropenia: P value = 0.015 |
Song et al (28) 2021 | Retrospective cohort | 195 | Methimazole (MMI) or Propylthiouracil (PTU) | More than six months of euthyroid status after terminating ATD treatment was defined as achieving remission. Cumulative remission rates: • Within 1 year of starting ATD = 3.3% • Within 3 years of starting ATD = 19.6% • Within 5 years of starting ATD = 34.1% • Within 7 years of starting ATD = 43.5% • Within 10 years of starting ATD = 50.6% | Total adverse events = 13.3% Most common: • Rash = 5.6% • Abnormal CBC = 2.6% (neutropenia) • Abnormal LFTs = 2.1% (liver dysfunction) |
Mizokami et al (29) 2020 | Retrospective cohort | 111 | Radioiodine: I-131 | Outcomes of thyroid levels: • Overt hypothyroidism = 91% • Subclinical hypothyroidism = 2% • Euthyroidism = 5% • Subclinical hyperthyroidism = 2% | Adverse events reported included: • Thyroid cysts = 4.27% • Iso- or hypo-echoic solid nodule(s) = 7.69% • 17.5% of patients followed up for 10 years or more developed newly detected solid thyroid nodules |
Table 4
Result of primary studies
Primary Study Appraisal
Baseline Characteristics
Namwongprom et al., Lee et al. and Song et al. included similar baseline characteristics such as age, gender and family history of thyroid disease, with each of the studies including further characteristics [26–28]. Mizokami et al., on the other hand, do not report baseline characteristics [29]. This introduces the possibility of confounding to a greater extent than in the other studies, as multiple factors are not accounted for in the statistical analysis.
The Namwongprom et al., Lee et al. and Song et al. studies present continuous data as mean ± SD and categorical data as percentages [26–28]. At baseline, only Lee et al. separated subjects into different arms (based on the severity of their Graves’ disease and the resultant ATD dose received) [27]. P-values were therefore given to indicate statistical differences in baseline characteristics between groups; this was not possible in the other primary studies. Namwongprom et al. stated no statistically significant differences in baseline characteristics between participants, contributing to the external validity of this study [26].
The absence of information on ethnicity is a key point that prevents true generalisability to the general population, as Ehrhart et al. suggested in a 2018 article [30]. Ethnicity is not reported in any study, and as all were conducted in East Asian countries, it may be problematic to assume these results will hold true in the UK, further weakening the strength of the evidence for application in the NICE guideline.
Study Design
None of the studies were of a particularly large scale; Namwongprom et al. included the lowest number of subjects (32) and Song et al. the largest (195). A larger sample size would lower sampling variability and contribute to more precise estimates of treatment effect.
All studies included follow-up tests and examinations to assess both the efficacy and safety of each treatment. Namwongprom et al. assessed thyroid levels in patients at 6 months, whereas Song et al. had a mean duration of follow-up of 5.9 ± 3.8 years, testing various biochemical markers such as thyroid function tests and complete blood count. The follow-up periods in Lee et al. and Mizokami et al. were 2 years and a median of 95 months respectively. Such extended follow-up periods increase the reliability and validity of the results. However, follow-up in Mizokami et al. ranged from a minimum of 4 months to a maximum of 226 months. The major advantages of longer follow-up periods are that they are more likely to detect any decline in the efficacy of a treatment, revealing possible relapses, and to detect adverse events that may not present until a later date.
Outcomes
Namwongprom et al., Lee et al. and Song et al. measured the time taken for free T4 levels to normalise, with Song et al. also looking for the presence of goitre and ophthalmopathy. Mizokami et al. used ultrasound imaging to determine thyroid status using the volumetric ellipsoid method.
A major limitation seen in many retrospective cohort studies is a high loss to follow-up, which can significantly reduce the validity of the results if it exceeds 30%. Mizokami et al. and Song et al. reported a loss to follow-up of less than 30%, suggesting the validity of their results will not be affected. A drawback of the Namwongprom et al. study is that loss to follow-up is not reported, meaning this study could be prone to bias. Interestingly, Lee et al. removed patients lost to follow-up from their statistical analysis, utilising a so-called ‘per-protocol’ approach. This approach can be problematic as it leads to the possibility of attrition bias, reducing the external validity of the study.
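The 30% threshold applied in this appraisal can be expressed as a simple check. The sketch below applies it to the loss to follow-up percentages reported for Song et al. and Mizokami et al. (Table 5); the function names and the enrolled/completed example counts are hypothetical and serve only to show the calculation.

```python
LOSS_THRESHOLD = 0.30  # loss to follow-up above this is treated as a threat to validity


def loss_to_follow_up(enrolled: int, completed: int) -> float:
    """Fraction of enrolled participants who did not complete follow-up."""
    if enrolled <= 0 or not 0 <= completed <= enrolled:
        raise ValueError("need 0 <= completed <= enrolled and enrolled > 0")
    return (enrolled - completed) / enrolled


def validity_at_risk(loss_fraction: float, threshold: float = LOSS_THRESHOLD) -> bool:
    """True when loss to follow-up exceeds the threshold."""
    return loss_fraction > threshold


# Percentages reported in the appraisal table, expressed as fractions.
reported_loss = {"Song et al.": 0.283, "Mizokami et al.": 0.256}
flags = {study: validity_at_risk(loss) for study, loss in reported_loss.items()}

# Hypothetical study: 100 enrolled, 65 completed follow-up.
hypothetical_loss = loss_to_follow_up(enrolled=100, completed=65)
hypothetical_flag = validity_at_risk(hypothetical_loss)
```

A study that does not report loss to follow-up cannot be passed through this check at all, which is why the omission is itself treated as a potential source of bias.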
Study Reference | Advantages | Disadvantages |
Namwongprom et al (26) 2021 | • There are no significant differences in the baseline characteristics of patients recruited (excluding 24-hour I-131 uptake and RI status), meaning the effects of confounding factors are minimised • Appropriate comparison of data using statistical analysis tests (Sample T Test and Kruskal-Wallis H test) • Clearly defined outcome | • Small sample size of 32 patients • All patients were recruited from a single institute and are therefore likely to be susceptible to selection bias • Study design is retrospective and is thus susceptible to bias • No statistical significance or confidence intervals are reported for any outcomes • All cases had previously taken ATD, which may affect the outcome of radioactive iodine therapy |
H Lee et al (27) 2021 | • The efficacy and adverse effects of a wide variety of ATD dosages were explored • The relationship between dose of ATD and frequency of adverse events is reported along with statistical significance • Clearly defined outcome • Continuous variable differences were compared using post hoc Tukey tests | • Small sample size of 99 patients • Non-randomised allocation of patients, introducing the effect of confounding factors • Cases were obtained from only one hospital, which may not be representative of the whole population, introducing selection bias |
Song et al (28) 2021 | • The largest sample size (195) of all primary studies included • Clearly defined exclusion criteria • A lengthy mean treatment duration of 4.7 ± 3.4 years • Clear definition of remission • Use of Cox regression model to adjust for confounding at analysis stage | • Over the study period, 28.3% of patients were lost to follow-up • Predictors of remission could not be recognised due to inadequate sample size |
Mizokami et al (29) 2020 | • Median follow-up period of 95 months • Statistical analyses performed using the Mann-Whitney U test or Pearson’s chi-square test • Accurate assessment of thyroid volumes with the use of ultrasonography | • Small sample size of 117 patients, with concurrent loss to follow-up of 25.6% of patients • Only a maximal limit of 13.5 mCi of I-131 can be administered to outpatients in Japan, which means activity of I-131 uptake in patients with large goitres is difficult to interpret • Variations of TSH levels are likely to have affected thyroid volumes • Limited knowledge of the effect of confounding variables due to the absence of a baseline characteristics table |
Table 5
Appraisal of primary studies