2.1. Problems and Background
Pooling data, an initial step in meta-analysis, usually requires the mean and the standard deviation (SD) of each study. However, many publications report only the median, sample size, range, and/or interquartile range (IQR) (1, 14). This poses a challenge to researchers, who may not be familiar with the underlying mathematics needed to derive the mean and the variance (or SD). Hence, the rationale behind the software is to make this estimation easier for non-statisticians.
Based on the data usually available in scientific articles, and on the accuracy of each method across different sample sizes and ranges, the toolbox includes two methods, devised by Hozo et al. (2005) and Bland (2015). The former takes as input the median, the sample size, and the minimum and maximum of the data. It applies a set of elementary inequalities to infer the mean and SD, and it is applicable to any distribution of data. The resulting formula for the mean is as follows [Formula 1]:
Formula 1. Mean calculation by the Hozo et al. method.
*If the sample size (n) exceeds 25, the median itself is the best estimator.
The estimate of the mean depends on the sample size. When n (the sample size) is less than 25, the mean is approximated by Formula 1, where “a” is the smallest value (minimum), “b” is the largest value (maximum), and “m” is the median. When n exceeds 25, the median itself has been found to be the best estimate of the mean [Formula 1] (1).
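The rule described above can be sketched in Python, the language the tool is written in. The expression used here, (a + 2m + b) / 4, is the estimator commonly cited from Hozo et al. (2005); the formula itself is not reproduced in this text, so it should be checked against the original paper. The function name `hozo_mean` and the handling of n = 25 (treated with the formula rather than the median) are assumptions made for illustration and are not taken from the DMT source code.

```python
def hozo_mean(a, m, b, n):
    """Estimate the mean from minimum a, median m, maximum b and sample size n.

    Assumption: the small-sample estimator is (a + 2m + b) / 4, as commonly
    cited from Hozo et al. (2005). For n > 25 the median is returned as is.
    """
    if not a <= m <= b:
        raise ValueError("expected a <= m <= b")
    if n > 25:
        # Large sample: the median itself is used as the estimate.
        return float(m)
    # Small sample: weighted combination of minimum, median and maximum.
    return (a + 2 * m + b) / 4


print(hozo_mean(1, 3, 9, 10))  # small sample: formula branch
print(hozo_mean(1, 3, 9, 30))  # large sample: median branch
```

With a skewed sample such as a = 1, m = 3, b = 9, the small-sample branch pulls the estimate toward the longer tail, whereas the large-sample branch simply returns the median.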
For the SD, three formulae are defined according to n (sample size). They use the same parameters “a”, “b” and “m”. The formulae for determining the variance (or SD) are shown below [Formula 2]:
Formula 2. Standard deviation calculation by the Hozo et al. method.
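As an illustration of the three sample-size cases, the sketch below uses the thresholds and expressions commonly cited from Hozo et al. (2005): the variance formula ((a - 2m + b)^2 / 4 + (b - a)^2) / 12 for n up to 15, range / 4 for n between 16 and 70, and range / 6 for larger samples. These thresholds are not stated in this text and are therefore assumptions to be verified against the original paper; the function name `hozo_sd` is hypothetical and does not come from the DMT source code.

```python
import math


def hozo_sd(a, m, b, n):
    """Estimate the SD from minimum a, median m, maximum b and sample size n.

    Assumption: thresholds (15 and 70) and expressions follow the form
    commonly cited from Hozo et al. (2005).
    """
    if not a <= m <= b:
        raise ValueError("expected a <= m <= b")
    if n <= 15:
        # Very small sample: variance from minimum, median and maximum.
        variance = ((a - 2 * m + b) ** 2 / 4 + (b - a) ** 2) / 12
        return math.sqrt(variance)
    if n <= 70:
        # Moderate sample: a quarter of the range.
        return (b - a) / 4
    # Large sample: a sixth of the range.
    return (b - a) / 6


for n in (10, 30, 100):
    print(n, round(hozo_sd(1, 3, 9, n), 4))
```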
The latter method, given by Bland (2015), requires the first and third quartiles in addition to the median, sample size, minimum, and maximum as input. It is an extension of the method proposed by Hozo et al. Like the original method, Bland (2015) applies inequalities to each observation, but here the inequalities are built from the minimum, the maximum, and the three quartiles. The resulting formula is as follows [Formula 3]:
Formula 3. Mean calculation by the Bland method.
The variables required are “a”, “q1”, “m”, “q3”, and “b”, which are the minimum, first quartile, median, third quartile, and maximum, respectively. The SD is estimated from the same parameters using the formula below [Formula 4]:
Formula 4. Standard deviation calculation by the Bland method.
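A minimal Python sketch of the two Bland estimators follows. It uses the simplified, sample-size-free form in which Bland's method is usually quoted: mean = (a + 2q1 + 2m + 2q3 + b) / 8, and variance = (a^2 + 2q1^2 + 2m^2 + 2q3^2 + b^2) / 16 + (a*q1 + q1*m + m*q3 + q3*b) / 8 - (a + 2q1 + 2m + 2q3 + b)^2 / 64. Bland (2015) also gives a form that depends on n, and this text does not state which one DMT implements, so the choice here is an assumption. The function names `bland_mean` and `bland_sd` are hypothetical.

```python
import math


def _check_order(a, q1, m, q3, b):
    # The five summary values must be in non-decreasing order.
    if not a <= q1 <= m <= q3 <= b:
        raise ValueError("expected a <= q1 <= m <= q3 <= b")


def bland_mean(a, q1, m, q3, b):
    """Estimate the mean from the minimum, three quartiles and maximum."""
    _check_order(a, q1, m, q3, b)
    return (a + 2 * q1 + 2 * m + 2 * q3 + b) / 8


def bland_sd(a, q1, m, q3, b):
    """Estimate the SD from the same five values (simplified form)."""
    _check_order(a, q1, m, q3, b)
    squares = (a**2 + 2 * q1**2 + 2 * m**2 + 2 * q3**2 + b**2) / 16
    cross = (a * q1 + q1 * m + m * q3 + q3 * b) / 8
    mean_sq = (a + 2 * q1 + 2 * m + 2 * q3 + b) ** 2 / 64
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(squares + cross - mean_sq, 0.0))


print(bland_mean(1, 2, 3, 4, 5))
print(round(bland_sd(1, 2, 3, 4, 5), 4))
```

For the symmetric example a = 1, q1 = 2, m = 3, q3 = 4, b = 5, the estimated mean coincides with the median, as expected.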
2.2. Software Framework
2.2.1 Software Architecture
Deep Meta Tool (DMT) is designed to support data preparation for meta-analysis. It provides a quick, user-friendly platform for calculating the mean and SD, which are essential for conducting a meta-analysis, from the median and the range and/or IQR, combining two different statistical methods.
The tool is written in Python (3.8.3). The base window and GUI features are provided by the Tkinter module. The pandas, os, and math modules are used for the tool's internal processing.
We used the PyInstaller module to convert the Python file into a Windows executable. The current version of the tool is supported on Windows and Linux operating systems. We welcome user feedback to improve the tool in future versions, and we intend to make it available for additional operating systems, including Android.
2.2.2 Software Functionalities
DMT is free, open-source software. After installation, the following files are found in the installation directory:
- Deep Meta Tool Version 1.0 folder, which has all the system files.
- An application file to launch the software. (Make a desktop shortcut for ease.)
- Sample_Data.csv file, which illustrates the input data format required for the Median Calculator section.
- A User Manual.pdf.
- All other files are system files required to run the software.
As described in the previous sections, the software presents both methods in its user interface (Fig. 1). The first option uses the Hozo et al. (2005) method, and the data format required as input is Median (IQR). The interface is straightforward: it asks for the inputs required by the formulae, such as the IQR minimum and maximum and the median, and it provides clearly labelled “Calculate Mean” and “Calculate SD” buttons to generate the output.
Data collection and processing prior to meta-analysis typically involve several datasets. To accommodate this, the software includes a “Store Value” option that lets the user store the calculated mean and SD. When “Generate Output” is clicked, a CSV file (which can be opened in Excel) compiling all computed and stored values, named "Hozo_Method_Output.csv" by default, is saved automatically in the same directory as the software. Note that no input field may be left empty, and the user must know the sample size so that the corresponding formula is applied.
For the Bland method, the software adds another input category, Raw Data. DMT extracts the parameters required to compute the mean and SD from the raw data and enters them into the software automatically. The extracted parameters are saved as "Bland_Method_Parameters.txt" in the Deep Meta Tool Version 1.0 folder. The Bland method likewise supports storing values and generating an output file, saved by default as "Bland_Method_Output.csv". It has been observed that the larger the sample size, the more accurate the estimates of the mean and SD. The raw data must follow the format shown in Table 1 and must be named MEDIAN_DATA.
The application has an “About author” section that displays a QR code linking to the author’s website, which can be used for contact, bug reports, or feedback. The Help toolbar provides a detailed guide to using the software.
| MEDIAN_DATA |
|-------------|
| 2.85 |
| 2.85 |
| 2.98 |
| 3.04 |
| 3.1 |
| 3.1 |
| 3.9 |
| 3.3 |
| 3.54 |

Table 1. Sample data format for the Bland method.
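To illustrate how the Bland parameters could be extracted from raw data in this format, the sketch below parses a single-column CSV and computes the minimum, the three quartiles, the maximum, and the sample size. It uses only the Python standard library rather than pandas so that it runs without extra dependencies. It assumes that MEDIAN_DATA is the column header, as Table 1 suggests, and it uses the "inclusive" quartile convention of `statistics.quantiles`; the quartile definition DMT applies is not stated here and may differ. The helper names `read_median_data` and `five_number_summary` are hypothetical.

```python
import csv
import io
import statistics

# The values from Table 1, as they might appear in a CSV file.
SAMPLE_CSV = "MEDIAN_DATA\n2.85\n2.85\n2.98\n3.04\n3.1\n3.1\n3.9\n3.3\n3.54\n"


def read_median_data(csv_text):
    """Return the MEDIAN_DATA column as a list of floats, skipping blanks."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = []
    for row in reader:
        cell = (row.get("MEDIAN_DATA") or "").strip()
        if cell:
            values.append(float(cell))
    return values


def five_number_summary(values):
    """Compute the inputs of the Bland method from raw observations."""
    if len(values) < 2:
        raise ValueError("at least two observations are required")
    # Assumption: quartiles use the 'inclusive' interpolation convention.
    q1, m, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"a": min(values), "q1": q1, "m": m, "q3": q3,
            "b": max(values), "n": len(values)}


summary = five_number_summary(read_median_data(SAMPLE_CSV))
print(summary)
```

The input does not need to be sorted, since the quartile computation orders the values internally.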