From our search, we identified several registries with published outcomes as summarised in Table 3. Most of these registries included both patient and investigator-derived data.
Designing a registry for rare liver diseases
The design process can be divided into three stages: the theoretical, technical and maintenance phases (Table 4).
Aims and objectives
The theoretical phase begins by identifying the unmet needs the registry is expected to address. The goal of the registry should be defined early to inform the design process and outcome measures. A well-designed registry should be able to translate a clinical/academic question into measurable exposures and outcomes. Setting aims and objectives should take into account existing registries to avoid duplication. Many registries may have more than one purpose or rationale(21). For example, the main aim of the EuroWilson registry was to assess the feasibility of conducting randomised controlled trials for the treatment of Wilson’s disease, while the UK-PBC registry aims to identify PBC patients including those not responding to ursodeoxycholic acid (UDCA), elucidate the molecular mechanisms that govern non-response and strengthen relationships between clinicians, the NHS, patients and industry. As registries develop, refinement of objectives could be considered to cover newly identified knowledge-gaps.
The commonest aims and objectives used in registry design for rare liver diseases are listed in Table 5.
Define target population & observation period
The definition of the target population will determine which patients are eligible for inclusion into the registry. Very extensive or strict inclusion/exclusion criteria risk excluding relevant patients. Broader criteria are generally preferred, as they allow registry entries to be redefined to identify previously unrecognised unmet needs and can accommodate changes in diagnostic criteria that follow improved understanding of the studied disease.
Part of this process involves consideration of whether data should be collected from deceased and paediatric patients, as this will influence the governance approvals required. Existing epidemiological data, e.g. incidence and prevalence, will help estimate the expected number of cases for the registry and hence guide decisions around planning, costing, IT infrastructure and workforce. If the registry is designed as part of a clinical trial, inclusion and exclusion criteria should be stated in the protocol. For example, D’Agnolo et al used a cut-off of 20 liver cysts for inclusion into their polycystic liver disease registry study, whilst the UK-AIH registry study excludes all patients who have HIV(22).
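The estimate of expected case numbers from incidence and prevalence can be sketched as simple arithmetic. The following is a minimal illustration only: the function name, the catchment population and the rates are hypothetical and are not taken from any published source.

```python
def expected_cases(population: int, rate_per_100k: float) -> float:
    """Expected number of cases for a rate expressed per 100,000 people."""
    if population < 0 or rate_per_100k < 0:
        raise ValueError("population and rate must be non-negative")
    return population * rate_per_100k / 100_000


# Hypothetical inputs, for illustration only (not literature values).
catchment_population = 5_000_000
prevalence_per_100k = 30.0       # existing cases per 100,000 population
incidence_per_100k_year = 2.0    # new cases per 100,000 population per year

# Prevalence drives the initial registration workload,
# incidence drives the ongoing yearly workload.
prevalent = expected_cases(catchment_population, prevalence_per_100k)
incident = expected_cases(catchment_population, incidence_per_100k_year)

print(f"Expected prevalent cases at launch: {prevalent:.0f}")
print(f"Expected new cases per year: {incident:.0f}")
```

Such figures are upper bounds on recruitment, since ascertainment and consent rates will reduce the number of patients actually registered.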
The duration of observation should also be considered at this stage, as it can influence the design, especially if the registry is set up as a clinical study. For example, a registry that captures cross-sectional data on patients who received a liver transplant will be inherently different from a longitudinal registry study exploring the natural history and future outcomes of female patients who developed acute fatty liver of pregnancy.
Information, Research & Clinical governance
The design protocols for disease registries should include sections describing both the lawful basis of data collection for the registry as well as the process of accessing and extracting data from the database. Different countries have different bases for collecting data. For example, in England, the National Disease Registration Service collects rare disease patient data without consent under Section 251 of the NHS Act 2006 and the authority of the Health Service (Control of Patient Information) Regulations 2002 and in compliance with the General Data Protection Regulation (GDPR). Any data input/output should be governed by the principles that apply to all other health and social care research of the area(s) where the registry will be active (Table 4-theoretical phase). Any governance issues regarding the data sharing across countries should be identified early and discussed with the health research authority of each country. The scope of consent from patients should also be included in the registry protocol and laid out in detail in the research application if one is required. Consent should be such that it covers future relevant linkage opportunities or is set up as a reconsented model. Specific approaches to data including pseudonymisation/anonymisation of records should be considered to protect patient identity, especially when extensive demographic data are not required to achieve the endpoints of the registry.
We recognise that this is particularly important for inherited rare diseases where there are data concerns relating to more than one member of the family. Oversight/steering committees, stakeholders and registry sponsors/funders, external experts and team-members with the relevant skillsets can all assist with adherence to governance policies and registry protocol.
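One possible approach to the pseudonymisation mentioned above is keyed hashing of the patient identifier, so that the same patient always receives the same pseudonym while the identifier itself is not stored in the research dataset. The sketch below is illustrative only: the field names, the key handling and the choice of HMAC-SHA256 are assumptions, not a description of how any named registry operates.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for an identifier using HMAC-SHA256."""
    # Normalise formatting so "000 000 0000" and "0000000000" match.
    normalised = identifier.replace(" ", "").upper()
    digest = hmac.new(secret_key, normalised.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def to_research_record(record: dict, secret_key: bytes,
                       id_field: str = "patient_identifier",
                       drop_fields: tuple = ("name", "address", "date_of_birth")) -> dict:
    """Copy a record, replacing the identifier with a pseudonym and
    dropping direct identifiers (hypothetical field names)."""
    research = {k: v for k, v in record.items()
                if k != id_field and k not in drop_fields}
    research["pseudonym"] = pseudonymise(record[id_field], secret_key)
    return research


# The key must be stored separately from the research data; whoever holds
# it can re-link pseudonyms, so the output is pseudonymised, not anonymised.
example_key = b"replace-with-a-securely-stored-key"
source = {"patient_identifier": "000 000 0000", "name": "Example Patient",
          "diagnosis": "PBC"}
print(to_research_record(source, example_key))
```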
Sponsorship and funding
The sustainability and efficient running of a registry are reliant on sufficient funding and sponsorship. It is therefore important to pilot a smaller-scale feasibility registry with fewer reporting sites. Registry funding can be sourced from various bodies including government organisations, non-profit disease foundations, patient groups, charitable foundations, private funds from philanthropists, industry and professional societies(18).
The European Commission receives regular applications for funding support for rare diseases and its third Health Programme (covering the period from 2014 to 2020) offers support for setting up rare disease registries as part of its operating framework(23). It is desirable for applications to consider collaborative efforts and ways of maintaining these, and also to align with the recommendations of the High Level Pharmaceutical Forum for better access to orphan medicines(24). The European Association for the Study of the Liver (EASL) also provides registry funding for liver diseases through its EASL registry data collection grant scheme. To date, it has awarded grants for the development of several orphan hepatic disease registries(25). Other organisations which also accept applications for registry funding include the UK’s Medical Research Council (MRC) and the UK National Institute for Health Research (NIHR).
Establish the registry team
The workforce required for designing, running and maintaining the registry should be defined early on based on the objectives, size and funding of the registry. A multidisciplinary team approach is key to successful implementation and ongoing success of any registry.
The chief/principal investigator should have continuous oversight of the process and work closely with project management to set realistic and achievable targets. Project managers with financial experience as well as leadership and organisational skills are important for liaising with funding bodies and sponsors. Legal and information-governance expertise is a core requirement, and a strong grounding in epidemiology, medical statistics and population-based studies is also extremely important. Registry operators with particular roles in data liaison should be considered in order to ensure the effective negotiation of data from data providers. Data entry should be undertaken by team members familiar with the core and desired datasets (see below section on data management system) to guarantee that minimum standards for quality assurance are met.
Identify stakeholders and set up wider collaboration
A registry may have one or more stakeholders, which are people or organisations who have an interest in the research question the registry is trying to address. Stakeholders can be either primary or secondary(18). A primary stakeholder is responsible for the logistics of setting up the registry, while a secondary stakeholder is identified as the party who will benefit from the data and the answers to the clinical endpoints of the registry. The commonest stakeholders include clinicians, researchers, academic institutions, patients, the public, community leaders, policy makers, professional societies, regulatory agencies and industry partners.
The importance of collaborations in the field of rare diseases, where data are scarce and fragmented, has already been highlighted. For example, in the EU, the OrphanXchange project was set up by Orphanet to promote collaboration between academia and industry and was funded by the EU’s FP6 (6th Framework Programme). The establishment of registries sits at the core of the project. Examples of successful large-scale registry collaborations on rare liver diseases include the international PSC registry, the registry on Alpha1-antitrypsin deficiency, the European registry for liver disease in pregnancy and the European repository of patients with IgG4-related disease, all of which have been funded through the registry programme of the EASL(25).
Moreover, the EU Committee of Experts on Rare Diseases (EUCERD) was set up and ran between 2010 and 2013 with the purpose of encouraging the exchange of relevant experience, policies and practices in rare diseases among member states. By the end of its tenure, the committee had promoted cooperation across European countries as well as with other countries with an interest in rare diseases, such as groups from Japan. Conclusions from this work include recognition that many data repositories are academic and that many rare diseases have more than one registry whilst others have none.
This has facilitated the drafting of a consensus across six domains, namely international operability, sources of data, collection of data, good practices, use of data for regulatory purposes, and sustainability (Table 6)(26).
Registry design, data management and data quality
Recently, there have been significant efforts by the European Commission to address challenges in setting up registries for rare diseases, especially around technical and regulatory matters including clinical and research governance. The EPIRARE (European Platform for Rare Disease Registries) project was funded to serve exactly this purpose when it was set up in 2013. More specifically, its four objectives are to:
- Define the needs of the EU registries and databases on rare diseases
- Identify key issues to prepare a legal basis
- Agree on a Common data set and elaborate procedures for quality control
- Agree on the Registry and Platform Scope, Governance and long-term sustainability
Although EPIRARE has laid the foundations for rare disease registry design, a comprehensive blueprint is not yet available.
Defining unmet needs
Timely definition of unmet needs and clarity of objectives and endpoints of the registry will guide the choice of data to be classified as mandatory or core, i.e. the minimum dataset. These are the data that will address the critical questions which the registry is setting out to answer. Additional variables, i.e. desired or non-core data, can be included, provided they also align with the overall objectives of the registry.
The importance of avoiding duplication of effort and developing wider collaborations has already been discussed.
For example, the European Liver Transplant Registry (ELTR) has been collecting prospective transplant data on patients with polycystic liver disease (PLD) including demographics, symptoms, disease complications, laboratory results, prior therapy, liver transplant complications, and patient/graft outcomes(27). However, this registry was not designed to collect retrospective long-term data in order to address research questions around the natural history of PLD, quality of life, disease prognostication and patient risk-stratification. Drenth et al have set up an international registry of patients with PLD to fill exactly this gap not covered by ELTR (22, 28).
Data management system
Data management systems (DMS) serving registries for rare diseases must be dynamic, integrative, extendable, customisable and intuitive in order to serve the designed purpose and objectives. The choice of DMS depends on expertise and available funding. Ideally, the DMS should be able to derive data automatically from electronic patient records as soon as a patient is registered into the database. However, whilst this may be a desirable functionality for local registries, it may not be feasible for regional, national and international registries for many reasons, including heterogeneous data coding, the multitude of patient workflow products, differences in ethics committee (or Institutional Review Board, the equivalent in North America) standards across regions/borders, differences in local security protocols and the lack of electronic patient records. Therefore, the most resource-efficient way of achieving a common and shared data exchange could be a web-based model which can provide various database access levels. For example, the European Network for the Study of Cholangiocarcinoma (ENS-CCA) has achieved this by using an established platform called REDCap (Research Electronic Data Capture) to bring together data from 33 groups from 12 European countries, while other consortia such as UK-AIH use bespoke software solutions.
The role of a common shared platform that allows remote access and remote data entry for a rare liver disease is fundamental, as it can shape the quality and integrity of data and format data in a standardised fashion. Once data fields are defined (e.g., mandatory vs. desired, and content), data validation should be introduced at various checkpoints, e.g. alphabetic vs. alphanumeric fields. Data validation is useful in ensuring that variables follow the expected format and in prompting users to input missing data. Examples of errors that validation can catch include extra decimal points and differences in laboratory units (µmol/L vs. mg/dL). As part of the registry’s maintenance processes, regular data cleaning should also be undertaken for problems that might not be addressed by validation, such as logical inconsistencies. Data validation/integrity can be further improved through a multi-source approach to registration. This means that data should be collected from various independent sources when possible. An example is the UK’s National Congenital Anomaly and Rare Disease Registration Service (NCARDRS), which collates, validates and registers data from various sources at local, regional and international level at various stages of the patients’ journey. This approach enables NCARDRS to achieve the highest possible ascertainment and completeness of cases in the population(29).
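The validation and cleaning steps described above could take roughly the following form. This is a minimal sketch under stated assumptions: the field names, the mandatory field list and the plausibility range are hypothetical, and serum bilirubin is used only as an example of a result that may arrive in either unit (1 mg/dL of bilirubin is approximately 17.1 µmol/L).

```python
from datetime import date

# Approximate conversion for bilirubin only; other analytes need other factors.
BILIRUBIN_MG_DL_TO_UMOL_L = 17.1

# Hypothetical mandatory (core) fields for this illustration.
MANDATORY_FIELDS = ("patient_id", "diagnosis_date",
                    "bilirubin_value", "bilirubin_unit")


def normalise_bilirubin(value: float, unit: str) -> float:
    """Convert a bilirubin result to µmol/L, rejecting unknown units."""
    unit = unit.strip().lower()
    if unit in {"umol/l", "µmol/l"}:
        return value
    if unit == "mg/dl":
        return round(value * BILIRUBIN_MG_DL_TO_UMOL_L, 1)
    raise ValueError(f"unrecognised bilirubin unit: {unit!r}")


def validate_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record passed."""
    problems = [f"missing mandatory field: {field}"
                for field in MANDATORY_FIELDS
                if record.get(field) in (None, "")]
    if problems:
        return problems  # prompt the user before running further checks

    try:
        bilirubin = normalise_bilirubin(record["bilirubin_value"],
                                        record["bilirubin_unit"])
    except ValueError as exc:
        problems.append(str(exc))
    else:
        # Hypothetical plausibility range, intended to catch misplaced
        # decimal points or unit mix-ups rather than to define normality.
        if not 0 < bilirubin <= 1000:
            problems.append(f"implausible bilirubin: {bilirubin} µmol/L")

    # Logical-consistency check of the kind left to data cleaning.
    death_date = record.get("death_date")
    if death_date is not None and death_date < record["diagnosis_date"]:
        problems.append("death_date precedes diagnosis_date")
    return problems
```

A record such as `{"patient_id": "P1", "diagnosis_date": date(2020, 1, 1), "bilirubin_value": 1.2, "bilirubin_unit": "mg/dL"}` would pass, whereas the same record with a `death_date` in 2019 would be flagged for review.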
In addition to having team members who are appropriately trained in data entry, the use of a data dictionary is paramount in order to define each variable collected. This will clearly define the data terminology, e.g. clarifying whether “negative” means test negative or test not done. Quick and effective data management can also be facilitated by having an intuitive and user-friendly DMS, as data quality tends to improve the more straightforward the interface is. Whilst the use of case report forms (CRFs or eCRFs) has been necessary to standardise the data collected from each patient, this is becoming less relevant as data are mined from large healthcare data sets and electronic patient records (EPRs). Data liaison resource is required to ensure that data submitted to the registry are mapped appropriately to the variables so that the incoming data meet the standardisation criteria.
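A data dictionary entry of the kind described above can be sketched as a small structure that fixes the permitted values for each variable, so that “negative” and “not done” cannot be confused. The variable name, description and value set below are hypothetical examples and not part of any published registry dataset.

```python
from dataclasses import dataclass
from enum import Enum


class AssayResult(Enum):
    """Controlled vocabulary for a qualitative test result."""
    POSITIVE = "positive"
    NEGATIVE = "negative"    # test performed, result negative
    NOT_DONE = "not_done"    # test not performed
    UNKNOWN = "unknown"      # no information available


@dataclass(frozen=True)
class VariableDefinition:
    name: str
    description: str
    allowed_values: frozenset
    mandatory: bool


# Hypothetical dictionary with a single illustrative variable.
DATA_DICTIONARY = {
    "autoantibody_status": VariableDefinition(
        name="autoantibody_status",
        description="Result of the autoantibody test at diagnosis",
        allowed_values=frozenset(r.value for r in AssayResult),
        mandatory=True,
    ),
}


def check_value(variable: str, value) -> bool:
    """True if the value is acceptable for the named variable."""
    definition = DATA_DICTIONARY[variable]
    if value is None:
        return not definition.mandatory
    return value in definition.allowed_values


print(check_value("autoantibody_status", "not_done"))
print(check_value("autoantibody_status", "neg"))
```

Free-text abbreviations such as "neg" are rejected, which forces the person entering data to choose explicitly between a negative result and a test that was not performed.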
Disease registries for rare diseases should be expandable and customisable to allow data linkage from different sources such as primary care, by providing options for integration with their databases(30). This is particularly important as many patients with rare liver diseases remain in primary care undiagnosed for a long time before they are referred for hepatology advice(31).
Data linkage with primary/tertiary care will allow a better understanding of the natural history of the rare liver disease being studied and will aid refinement of locoregional policy and referral pathways. One of the biggest challenges in achieving seamless data linkage is heterogeneous coding, with different classifications including Read codes (versions 2 and 3), Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT), Online Mendelian Inheritance in Man (OMIM), ICD-10 and Orpha numbers. Routinely collected health data are usually coded in ICD-10, which is not granular enough to identify cases. Notably, many rare diseases do not have an ICD-10 code. The universal introduction of SNOMED CT will bring a different problem: with over 1,000,000 codes, one cannot be sure that a comprehensive list of codes has been established so that cases can be retrieved from hospital data. It is not unusual to find rare diseases being misclassified in generalist registries. For example, our observational data identify several issues regarding rare liver diseases such as PBC, which is often coded as secondary biliary cholangitis, and different codes are used between primary and secondary care, i.e. Read vs. ICD-10 (31). As part of the European strategy, the EU has recommended that member states should ensure correct and traceable coding of rare diseases using the International Classification of Diseases (ICD) in European health information systems. The EU is working closely with the WHO to ensure ICD-10 code revisions for rare liver diseases and future incorporation of all rare diseases into ICD-11 (32). Therefore, as data systems migrate towards shared coding schemes, e.g. SNOMED CT, automatic data mapping could be executed by the registry’s DMS, acknowledging the issue of mapping from a less granular to a more granular coding system.
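Automatic mapping of incoming codes could be sketched as a lookup against a curated crosswalk, with ambiguous and unknown codes routed to manual review. Everything below is illustrative: the registry concept names are hypothetical, the "LOCAL" coding system is a placeholder, and the crosswalk entries would need to be checked against current releases of each classification before any real use.

```python
# Crosswalk from (coding system, code) to registry concepts.
# A less granular source code may correspond to several registry concepts.
CROSSWALK = {
    ("ICD-10", "K74.3"): {"primary_biliary_cholangitis"},
    ("ICD-10", "Q44.6"): {"polycystic_liver_disease",
                          "other_cystic_liver_disease"},
    ("LOCAL", "PLACEHOLDER-001"): {"polycystic_liver_disease"},
}


def map_code(system: str, code: str) -> tuple:
    """Return (status, concepts) for an incoming coded diagnosis.

    status is "mapped" for a single match, "ambiguous" when the source
    code is less granular than the registry concepts, and "unmapped"
    when the code is absent from the crosswalk.
    """
    concepts = CROSSWALK.get((system, code.strip().upper()), set())
    if not concepts:
        return "unmapped", set()
    if len(concepts) == 1:
        return "mapped", set(concepts)
    return "ambiguous", set(concepts)


for system, code in [("ICD-10", "K74.3"), ("ICD-10", "Q44.6"),
                     ("ICD-10", "Z99.9")]:
    status, concepts = map_code(system, code)
    print(system, code, status, sorted(concepts))
```

Only "mapped" entries would be accepted automatically in this design; "ambiguous" and "unmapped" entries would be queued for the data liaison team, which is one way of handling the less granular to more granular mapping problem noted above.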
As we move towards the era of complete online data integration, it is particularly important to design registries for rare liver diseases with full online capabilities, including direct patient activation. This will allow the collection of patient-reported outcome measures (PROMs), which have been considered in the context of rare diseases by several authors, as well as quantitative data such as alcohol consumption and smoking history(33).
Some commercially available DMS platforms such as Patients Know Best (PKB) and Evergreen Life allow patient-controlled medical records. We are not aware of these being considered for the management of rare liver disease.
Electronic surveys and questionnaires could feed directly into the DMS, and patient-entered data could automatically update the data fields in the registry. One example is the PBC-40 questionnaire, which has been studied and validated in various settings and languages for PBC(34, 35). Delivering the questionnaire electronically to all PBC patients in the registry could facilitate earlier symptom management and help gauge response to pharmacological therapies. In their landmark study, Carbone et al also successfully collected self-reported data from patients with PBC utilising the PBC-40 questionnaire, the Epworth Sleepiness Scale, the Orthostatic Grading Scale, the Hospital Anxiety and Depression Scale, and the pruritus visual analogue scale. Patient self-reported data were cross-validated and found to be highly accurate(33). In a similar fashion, the DMS platform can be used to engage with relevant patient groups and charities.
Moreover, data from the DMS can help the development of decision support tools and provide reminders for clinical decision-making, e.g. six-monthly ultrasound and alpha-fetoprotein (AFP) testing for hepatocellular carcinoma (HCC) surveillance.
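A reminder of this kind could be derived from registry data along the following lines. This is a sketch only, assuming a surveillance interval of roughly six months (182 days); the field names and the flag marking patients under HCC surveillance are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

# Assumed interval of roughly six months; local protocols may differ.
SURVEILLANCE_INTERVAL = timedelta(days=182)


def surveillance_due(last_done: Optional[date], today: date) -> bool:
    """True if no scan is recorded or the last one is older than the interval."""
    if last_done is None:
        return True
    return today - last_done >= SURVEILLANCE_INTERVAL


def patients_needing_reminder(patients: list, today: date) -> list:
    """Return pseudonyms of surveillance patients whose scan is due."""
    return [p["pseudonym"] for p in patients
            if p.get("in_hcc_surveillance")
            and surveillance_due(p.get("last_ultrasound"), today)]


# Hypothetical registry extract.
cohort = [
    {"pseudonym": "a1", "in_hcc_surveillance": True,
     "last_ultrasound": date(2024, 1, 1)},
    {"pseudonym": "b2", "in_hcc_surveillance": True,
     "last_ultrasound": date(2024, 6, 1)},
    {"pseudonym": "c3", "in_hcc_surveillance": False,
     "last_ultrasound": None},
    {"pseudonym": "d4", "in_hcc_surveillance": True,
     "last_ultrasound": None},
]
print(patients_needing_reminder(cohort, today=date(2024, 8, 1)))
```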
Finally, the DMS should integrate a data export mechanism for getting data out of the registry for research, audit and other purposes. The most popular formats include Excel workbooks, comma-separated text files, dBase files, XML data sources and ODBC data sources. The DMS should allow for options to export particular data fields rather than the registry as a whole. Applications for data from the registry should be made to the registry’s project manager and should be discussed with the registry team. Some teams have established a data access review committee to review applications. Applications should be accompanied by a proposal outlining the scope of data use and, where relevant, should be supported by applications to the relevant ethics committee (or Institutional Review Board).
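Exporting selected fields rather than the whole registry can be illustrated with a comma-separated export restricted to an approved field list. The sketch below uses only the Python standard library; the record structure and field names are hypothetical.

```python
import csv
import io


def export_fields(records: list, approved_fields: list, out) -> None:
    """Write only the approved fields of each record as CSV.

    Fields not in approved_fields are silently dropped, and fields
    missing from a record are written as empty cells.
    """
    writer = csv.DictWriter(out, fieldnames=approved_fields,
                            extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow(record)


# Hypothetical registry rows; "postcode" is not approved for this request.
rows = [
    {"pseudonym": "ab12", "diagnosis": "PBC", "postcode": "XX1 1XX"},
    {"pseudonym": "cd34", "diagnosis": "PLD", "postcode": "YY2 2YY"},
]
buffer = io.StringIO()
export_fields(rows, ["pseudonym", "diagnosis"], buffer)
print(buffer.getvalue())
```

In practice the approved field list would come from the data access decision for the specific application, so that each export matches the scope of use that was agreed.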
Quality control
The importance of standardisation, careful definition of data fields and field validation, user-friendliness of the DMS and intuitiveness of data input has already been discussed above. We have also highlighted the usefulness of data cleaning as part of the registry’s ongoing maintenance. Before the launch of a registry, particularly when a large volume of cases is anticipated, existing data sources (e.g. data from a hospital’s EPR) should be validated using data liaison resource, with their validity confirmed by linkage to another data source. For example, a pilot run is recommended where 2-3 independent researchers input data for the same patients into the registry and areas of discrepancy are allowed to surface so that they can be resolved. A double-entry of 5-10% of all patients is considered to be an acceptable quality standard(36). The use of quality committees to audit registry records can also be considered, especially for smaller cohorts such as rare liver diseases, but this option might be resource-heavy for larger registries for commoner conditions such as diabetes. Regular internal and external monitoring and auditing processes are required to ensure adherence to quality standards and information governance protocols. Data quality is also improved when the data are used for research and audit purposes. For this reason, it is vital to have feedback systems in place for data users.
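The double-entry check described above can be sketched in two steps: selecting a random sample of patients for re-entry and comparing the two entries field by field. The sampling fraction, seed and field names below are illustrative assumptions.

```python
import random


def select_for_double_entry(patient_ids: list, fraction: float,
                            seed: int = 0) -> list:
    """Pick a reproducible random sample of patients for double entry."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(len(patient_ids) * fraction))
    return random.Random(seed).sample(sorted(patient_ids), k)


def compare_entries(entry_a: dict, entry_b: dict) -> dict:
    """Return {field: (value_a, value_b)} for every field that differs."""
    fields = sorted(set(entry_a) | set(entry_b))
    return {f: (entry_a.get(f), entry_b.get(f))
            for f in fields if entry_a.get(f) != entry_b.get(f)}


def discrepancy_rate(pairs: list) -> float:
    """Proportion of fields that differ across all double-entered pairs."""
    total = sum(len(set(a) | set(b)) for a, b in pairs)
    mismatched = sum(len(compare_entries(a, b)) for a, b in pairs)
    return mismatched / total if total else 0.0


# Two researchers enter the same (hypothetical) patient independently.
first = {"diagnosis": "PBC", "bilirubin_umol_l": 21, "on_udca": True}
second = {"diagnosis": "PBC", "bilirubin_umol_l": 12, "on_udca": True}
print(compare_entries(first, second))
print(discrepancy_rate([(first, second)]))
```

Fields with a high discrepancy rate are candidates for a clearer data dictionary definition or additional validation rules.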
Ensuring registry sustainability
Sustainability plans for the registry need to be defined in the early stages of the project design. These plans should be made with the close involvement of project managers, investigators, steering committee and various stakeholders. Sustainability plans will need to involve funders and sponsors as the value of grants and support will dictate the size of the registry and the timescale for data acquisition. Procedures around patient registration, consenting and participant retention (and loss to follow-up) will need to be outlined in the registry protocol. Exit strategies from the registry in the event of funding running out should be clearly discussed. Such strategies may include deletion of the data or discussion with research authorities for ways to contribute the mined data into other existing datasets of similar purpose.
It is not uncommon for incentives, including financial, to be considered for patients. The burden on participants will need to be outlined and weighed up against the benefits, and careful pilot testing of this should be undertaken prior to recruitment.
Whilst some aspects of participation in a registry may be burdensome for patients, benefits may include access to patient forums hosted by the DMS, access to clinical trial information as well as access to useful educational programmes and tools. Many registries may be designed in such a way that consent is not required prior to data access. From our own experience, we have been able to set up such registries; however, we have provided our patients with the option of opting out if they did not wish for their data to be included. Ideally, and if patient numbers are manageable, patients should give consent for their data to be included in a clinical registry. If the DMS is integrated with patient-controlled medical records, then enrolled participants can have access to results of their investigations and interact directly with their clinicians. The importance of data collected from PROMs on quality of life, clinical outcomes, social function and emotional status has been mentioned earlier. Whilst delivery of data from participants to the DMS can be laborious, involving paper forms and multiple clinic visits, the same process can become simple and fast in the digital era if participants are given cross-platform access for data entry or data are extracted from existing patient records. The quality of participant engagement can be improved in some registries by having information or the DMS interface in various languages or culturally appropriate formats.
Feedback
Participant and team-member feedback is very important for the sustainability of registries for rare liver diseases. Engagement is encouraged where the team members reach out to participants with updates, information and newsletters and participants are also given a platform to express their opinions and concerns about the running of the registry. Telephone helplines, online patient forums and feedback to appropriate patient charities can be used to engage those on the registry or those considering registering or opting out. All sources of patient communication should be reviewed regularly by the project team and steering committee in order to improve services and participant experience. Measurable outcomes should be presented regularly at loco-regional, national and international meetings. This engagement will highlight the progressive work of the registry, can improve morale and could impact positively on patient engagement and retention.