2.1 Data sources:
2.1.1 NHS Prescription and practices location datasets:
Monthly national-level prescribing datasets for the years 2015 to 2018 were downloaded from NHS Digital[29]. CCG names, codes and geographic boundaries, together with the geographical locations of GP practices, were downloaded from NHS Digital[30] and the Office for National Statistics data portal[31] and used under the terms of the Open Government Licence. The NHS prescription datasets contain dispensed prescriptions from general practitioners and other non-medical prescribers (such as nurses, pharmacists, optometrists, chiropodists and potentially radiographers) but do not cover private prescriptions. Each dataset has more than 10000 rows, with each row representing a prescription and containing the dispensed practice code, British National Formulary (BNF) code, BNF name, number of items prescribed, net ingredient cost, actual cost, quantity, and period, as in Table 1.
As we are interested in calculating the total quantity of an individual active pharmaceutical ingredient (API), the whole dataset needs to be evaluated for each API. The BNF code and BNF name columns support this normalisation, as they carry the information about the API. The BNF code is a 15-digit code unique to each formulation, dose and product combination. Together with the quantity prescribed, we use it to identify the amount of each API dispensed with each prescription.
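The calculation described above can be illustrated with a minimal sketch in base R. The data frame, its column names and its values are hypothetical and are not taken from the NHS datasets; the sketch only assumes that the strength per unit is already known for each row.

```r
# Hypothetical prescription rows; column names and values are illustrative.
rx <- data.frame(
  bnf_name    = c("Amoxicillin 500mg capsules", "Amoxicillin 500mg capsules"),
  quantity    = c(21, 15),     # number of capsules dispensed
  strength_mg = c(500, 500),   # amount of API per capsule, in mg
  stringsAsFactors = FALSE
)

# Mass of API per row = units dispensed x strength per unit.
rx$api_mg <- rx$quantity * rx$strength_mg

# Total amoxicillin across the rows, converted from mg to g.
total_g <- sum(rx$api_mg) / 1000
print(total_g)
```

In this sketch the strength is typed in by hand; in the workflow described here it comes from the BNF code mapping.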
2.1.2 BNF / SNOMED Mapping Data
The BNF codes in the 2015 to 2018 annual NHS prescription datasets use the legacy Master Data Replacement (MDR) drug database[29]. We map these BNF codes to individual APIs with the BNF/SNOMED mapping data (Table 2) provided by the NHS Business Services Authority (BSA) in June 2018[24]. The BNF/SNOMED mapping data link the legacy MDR drug database to the Dictionary of Medicines and Devices (dm+d)[32]. Once each BNF code in the dataset has been mapped to its APIs, we can evaluate the quantities prescribed in the NHS prescription dataset.
2.1.3 Data aggregation
The first step in quantifying an individual API from the NHS prescription dataset is to map each ‘VMPP/AMPP SNOMED’ code in the BNF/SNOMED mapping dataset to an individual chemical substance. To achieve this, each ‘VMPP/AMPP SNOMED’ code was matched to an Actual Medicinal Product (AMP), and codes without a matching AMP were matched to a Virtual Medicinal Product (VMP), with the help of the dm+d files. In some cases, the ‘VMPP/AMPP SNOMED’ code was matched to a Virtual Medicinal Product Pack (VMPP) or an Actual Medicinal Product Pack (AMPP). Once the ‘VMPP/AMPP SNOMED’ codes were mapped to a VMP/AMP or VMPP/AMPP, the latter was used to match each code to its individual chemical substance. This mapping also identifies the medicinal form (e.g., tablet, capsule, solution for injection) and the strength with its unit of measurement (e.g., mg, ng, µg).
For example, for amoxicillin 500mg capsules our method found 50 unique ‘VMPP/AMPP SNOMED’ codes, which match 1 unique BNF code, 4 VMPPs and 28 AMPs with 52 AMPPs. For this example, we identified the API as amoxicillin (as amoxicillin trihydrate), the medicinal form as capsules, the strength as 500 mg and four different pack levels (15, 21, 28 and 100 capsules), from 28 different manufacturers.
Our method also distinguishes the individual APIs in formulations containing more than one API. For example, for co-amoxiclav 500mg/125mg tablets our method found 24 unique ‘VMPP/AMPP SNOMED’ codes, which match 1 unique BNF code, 2 VMPPs and 23 AMPs with 24 AMPPs. For this example, we identified the APIs as amoxicillin (as amoxicillin trihydrate) and clavulanic acid (as potassium clavulanate), with strengths of 500 mg and 125 mg respectively, the medicinal form as tablet, and two different pack levels (21 tablets and 100 tablets), from 26 different manufacturers.
In the second step, the BNF code in the dataset generated above was matched to the BNF code in the NHS prescription dataset. This identifies the individual API(s), along with the medicinal form, strength and unit of measurement, for each row in the NHS prescription dataset.
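A minimal sketch of this matching step in base R is given below. It is not the PrAna implementation: the table layouts, column names and BNF codes are placeholders chosen for illustration. The mapping table holds one row per BNF code and API, so a combination product such as co-amoxiclav contributes one row per ingredient after the join.

```r
# Placeholder BNF codes and values, for illustration only.
prescriptions <- data.frame(
  practice = c("P001", "P001", "P002"),
  bnf_code = c("CODE_AMOX_CAP", "CODE_COAMOX_TAB", "CODE_UNMAPPED"),
  quantity = c(21, 21, 10),
  stringsAsFactors = FALSE
)

# One row per BNF code and API, so a combination product has several rows.
bnf_to_api <- data.frame(
  bnf_code = c("CODE_AMOX_CAP", "CODE_COAMOX_TAB", "CODE_COAMOX_TAB"),
  api      = c("amoxicillin", "amoxicillin", "clavulanic acid"),
  strength = c(500, 500, 125),
  unit     = c("mg", "mg", "mg"),
  form     = c("capsule", "tablet", "tablet"),
  stringsAsFactors = FALSE
)

# Left join: keep every prescription row, even when no mapping exists.
mapped <- merge(prescriptions, bnf_to_api, by = "bnf_code", all.x = TRUE)

# Rows without an API are collected for inspection instead of being dropped.
unmatched <- mapped[is.na(mapped$api), ]
print(mapped)
```

Keeping unmatched rows visible is a design choice of this sketch; it makes gaps in the mapping easy to audit.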
After the API and its strength were identified, the dataset was grouped by individual API, by GP practice code and ultimately by postcode, and the total prescription quantity of each API was calculated for each GP practice code and postcode. We used ‘Quantity’ from the NHS prescription dataset to measure the total prescription quantity. Postcodes and geographical coordinates were linked to the dataset by matching the GP practice code against the datasets downloaded from NHS Digital and the Office for National Statistics data portal. The whole process is summarised in Figure 1. The full methodology for the matching process is available in the online technical documentation[33].
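The grouping described above can be sketched in base R as follows. The practice codes, postcodes, column names and quantities are hypothetical, and the sketch uses merge() and aggregate() as stand-ins for whatever the package does internally.

```r
# Hypothetical mapped prescription rows (one row per API and prescription).
rows <- data.frame(
  api      = c("amoxicillin", "amoxicillin", "amoxicillin", "clavulanic acid"),
  practice = c("P001", "P001", "P002", "P002"),
  quantity = c(21, 15, 28, 21),
  stringsAsFactors = FALSE
)

# Hypothetical practice lookup; here both practices share one postcode.
practices <- data.frame(
  practice = c("P001", "P002"),
  postcode = c("ZZ1 1AA", "ZZ1 1AA"),
  stringsAsFactors = FALSE
)

# Attach the postcode to each row through the practice code.
located <- merge(rows, practices, by = "practice")

# Total quantity per API and practice, then per API and postcode.
by_practice <- aggregate(quantity ~ api + practice, data = located, FUN = sum)
by_postcode <- aggregate(quantity ~ api + postcode, data = located, FUN = sum)
print(by_postcode)
```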
2.2 PrAna – an R package implementation
PrAna is an R package providing a comprehensive workflow that covers data preparation, data conversion, data visualisation and the export/download of the generated plots and data, as outlined in Table 3. The package is available from https://github.com/PrAnaViz/PrAna under an MIT license, and we are in the process of submitting it to the Comprehensive R Archive Network (CRAN). Deploying PrAna requires R version 3.5 or greater and a small number of package dependencies; detailed instructions can be found in the online documentation[33].
2.2.1 PrAna Installation
The workflow relies on several functions wrapped in the PrAna package. We recommend using RStudio to install PrAna, so that directories stay specific to a single `Rproj`. Since the code is published on GitHub, it can be installed using `devtools`. With `devtools` installed, you can download and install the latest version of PrAna from the R console with `devtools::install_github("PrAnaViz/PrAna")`.
2.2.2 Data Preparation
We strongly suggest setting the destination folder as the working directory using the setwd() function. The csv2dat() function combines several monthly NHS prescription dataset files, downloaded from NHS Digital[29], into a single data table. For example, to combine all the monthly prescription datasets downloaded to the "C:/Datasets/Prescription Datasets/2018/PDPI" location into the destination folder "C:/Datasets/2018", the user sets the destination folder as the working directory with setwd() and then runs csv2dat() from the R console.
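The idea of combining monthly files into one table can be sketched in base R as follows. This is not the implementation or the call signature of csv2dat(); it is a stand-in that assumes all monthly files share the same columns, and it writes two small hypothetical files to a temporary folder so that it runs on its own.

```r
# Create two small stand-in monthly files in a temporary folder.
src <- file.path(tempdir(), "monthly_demo")
dir.create(src, showWarnings = FALSE)
write.csv(data.frame(PRACTICE = "P001", QUANTITY = 21, PERIOD = 201801),
          file.path(src, "month_01.csv"), row.names = FALSE)
write.csv(data.frame(PRACTICE = "P002", QUANTITY = 28, PERIOD = 201802),
          file.path(src, "month_02.csv"), row.names = FALSE)

# List every CSV in the folder, read each one, and stack the rows.
files    <- list.files(src, pattern = "\\.csv$", full.names = TRUE)
monthly  <- lapply(files, read.csv, stringsAsFactors = FALSE)
combined <- do.call(rbind, monthly)
print(combined)
```

For files of the size described in Section 2.1.1, a faster reader than read.csv() would likely be preferable; base R is used here only to keep the sketch free of dependencies.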
The importdmd() function imports the extracted dm+d files[32] and returns multiple data objects, including a data table which maps each BNF code to its corresponding API(s), strength and medicinal form. We recommend reading the documentation of importdmd() to learn more about the data objects it generates[33].
## Read the extracted dm+d files
2.2.3 Data normalisation and conversion
The final step in the data conversion is to use practice_wise() to generate the prescription dataset mapped to the individual API, prescription quantity, medicinal form, and strength for the defined GP practice(s). The practice_wise() function requires the following six parameters:
- The combined NHS prescription dataset, generated using csv2dat()
- A character vector containing the GP practices
- A data table containing BNF codes mapped to individual APIs, strength and medicinal form
- A file of units of measurement with their multiplication factors
- A file of the different medicinal forms with their corresponding codes
- A file of the different APIs with their corresponding codes
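The role of the multiplication factors can be illustrated with a short base R sketch. The factor values, unit labels, column names and rows below are assumptions made for illustration and do not reproduce the file shipped with PrAna; the sketch simply converts every strength to milligrams before multiplying by the quantity.

```r
# Assumed multiplication factors that convert each unit to milligrams.
unit_factors <- c(g = 1000, mg = 1, microgram = 0.001, ng = 0.000001)

# Hypothetical rows with strengths expressed in different units.
doses <- data.frame(
  api      = c("amoxicillin", "levothyroxine", "example_api"),
  strength = c(500, 50, 2),
  unit     = c("mg", "microgram", "g"),
  quantity = c(21, 28, 10),
  stringsAsFactors = FALSE
)

# Look up the factor for each row by unit name and normalise to mg.
doses$strength_mg <- doses$strength * unname(unit_factors[doses$unit])

# Total mass of API per row in a single common unit.
doses$total_mg <- doses$strength_mg * doses$quantity
print(doses)
```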
2.2.4 Database service
Because the processed datasets are large, we recommend uploading them to a local or remote database service, for example MySQL, and linking it to PrAnaViz before running the tool. More information on linking databases to PrAnaViz is given in our online technical documentation.
2.3 PrAnaViz – an interactive data analysis tool
We developed an interactive tool, PrAnaViz, to visualise, explore and export different spatiotemporal prescription trends, making the GP-specific prescription data generated using the PrAna package available for wider use. Created using R/Shiny, PrAnaViz uses a web-based interactive dashboard layout that most users are familiar with from common websites and web-based tools.
To launch PrAnaViz run the following command in your R Console:
> library(PrAna)
> PrAna::runShiny("PrAnaViz")
The PrAna::runShiny("PrAnaViz") function opens the PrAnaViz tool, which allows you to explore different spatiotemporal and long-term prescription trends with the sample dataset.
The basis of PrAnaViz functionality is that any user can explore prescribing trends broken down by chemical substance, and can explore and visualise the variation in prescribing at region, postcode and individual GP practice level, where both the overall trends and the relative contribution from each API and medicinal form can be seen. PrAnaViz contains two dashboard tabs: (1) a targeted API approach, in which a user can input their target(s), i.e., API(s) of interest, and calculate, visualise and explore the total prescribed quantity of the targeted API(s) in a selected region; and (2) a non-targeted API approach, in which the user can select an individual API and visualise its total prescribed quantity at resolutions down to individual postcodes.
Users can input API target(s) of interest as a comma-separated values (.csv) file for the targeted API approach, and can input connection strings to connect their databases to PrAnaViz. Detailed instructions on the input options are available in the supporting information and the online tutorial documentation for PrAnaViz[33]. Users can export the graphs as publication-ready Encapsulated PostScript (.eps), Portable Network Graphics (.png) or Portable Document Format (.pdf) files, and the corresponding datasets as .csv files, according to selection criteria defined by the user, to carry out their own analyses. Figures 2, 3 and 4 in this article were generated using PrAnaViz.
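The targeted API approach can be approximated outside the dashboard with a few lines of base R. The sketch below is not PrAnaViz code: the targets file layout (a single column named api), the processed table and its values are all hypothetical, and the actual input format is the one described in the PrAnaViz documentation[33].

```r
# Write a stand-in targets file; a user's own file would replace this.
targets_file <- tempfile(fileext = ".csv")
write.csv(data.frame(api = c("amoxicillin", "clavulanic acid")),
          targets_file, row.names = FALSE)
targets <- read.csv(targets_file, stringsAsFactors = FALSE)$api

# Hypothetical processed dataset with one row per API and postcode.
processed <- data.frame(
  api      = c("amoxicillin", "metformin", "clavulanic acid", "amoxicillin"),
  postcode = c("ZZ1 1AA", "ZZ1 1AA", "ZZ2 2BB", "ZZ2 2BB"),
  quantity = c(36, 56, 21, 28),
  stringsAsFactors = FALSE
)

# Keep only the targeted APIs (case-insensitive match), then total them.
selected <- processed[tolower(processed$api) %in% tolower(targets), ]
totals   <- aggregate(quantity ~ api, data = selected, FUN = sum)
print(totals)
```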