Categorization scheme definition
A categorization scheme has been defined in collaboration with expert radiologists. It includes the location, the type of sequence, the type of weighting, and information about the use of different suppressions. The information contained in the categorization gives an overview of the content of a series and covers all preconditions necessary for the post-processing and feature extraction analyses.
To comply with the scheme defined in Fig. 1, three different ML models have been developed (a weighting classifier, a chemical shift detector, and a junk detector), together with a final algorithm that merges the model outputs with raw values from the DICOM headers.
This scheme is based on two main criteria. First, it is preferable to extract information from a DICOM tag, which carries 100% confidence, rather than from a model that follows a probabilistic approach. Second, ML models should have a single objective and be as simple as possible. For instance, it is preferable to employ two separate models that individually classify weighting and junk rather than a single model that classifies both simultaneously, which would double the number of target classes.
The DICOM headers used in the schema of Fig. 1 to obtain the non-ML tags are “image orientation (patient)” (0020, 0037), “scanning sequence” (0018, 0020), “spectrally selected suppression” (0018, 9025) and “scan options” (0018, 0022).
It is relevant to note that one of the possible values that the “scanning sequence” DICOM tag can adopt, namely “research mode”, does not provide useful information for the MR series categorization. Therefore, these cases are treated as empty values in this DICOM tag, forcing the experts to manually label the scanning sequence field when this occurs.
In the “surnames” section of the schema, a combination of two different DICOM tags is applied to detect possible suppression in a series. This is due to the different patterns of empty values observed in the data for the two relevant tags (“Spectrally selected suppression”, in all cases, and “Scan Options”, when it adopts the “FS” value). By combining them, it is possible to identify the suppression in a larger set of cases.
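The tag-combination rule described above can be sketched as follows. This is an illustrative reading of the rule rather than the project's actual code: the helper name `has_fat_suppression` and the dict-based header access are our assumptions.

```python
def has_fat_suppression(headers: dict) -> bool:
    """Detect fat suppression from either of the two DICOM tags."""
    # (0018, 9025) Spectrally Selected Suppression: checked in all cases
    spectral = str(headers.get("SpectrallySelectedSuppression") or "").upper()
    # (0018, 0022) Scan Options: relevant when it adopts the "FS" value;
    # it may be a single string or a multi-valued list
    options = headers.get("ScanOptions") or []
    if isinstance(options, str):
        options = [options]
    return "FAT" in spectral or any(str(o).upper() == "FS" for o in options)
```

Either tag alone would miss the cases covered by the other, which is why the rule checks both.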
Data
Data from the European H2020 PRIMAGE project were used for this paper [18]. The PRIMAGE project collects retrospective information from multiple clinical partners on Neuroblastoma and Diffuse Intrinsic Pontine Glioma (DIPG) in childhood. While DIPG images are specific to the brain, Neuroblastoma can appear in a wide variety of regions, including the abdominal, thoracic, and pelvic regions. The study images were not only acquired from different anatomical locations but also on scanners from a wide variety of MR manufacturers. The distribution of manufacturers is shown in Fig. 2.
In the PRIMAGE project, all DICOM metadata are extracted during the ingestion phase and stored in a semi-structured MongoDB database [19][20]. At the time of developing the categorization tool, 25,596 MR series were present in the platform, of which 4,666 series had been manually annotated by radiologists for model development and 1,286 for final system evaluation. The remaining series were used to evaluate the final solution's performance at inference. Figure 3 shows the complete diagram of the MR series distribution.
Annotations
In order to annotate the MR series, the following filters were used as exclusion criteria before the manual labeling by radiologists began. On the one hand, the DICOM Image Type attribute (0008,0008), when adopting the "Derived" or "Secondary" values, was used to filter out post-processed images. On the other hand, the terms “ADC”, “preprocessed”, “registered”, and “map” were filtered from the Series Description attribute (0008,103E) with a pattern-matching approach. In both cases, these series were labeled as “Derived” (as shown in Fig. 3).
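The exclusion filter can be sketched as below. This is a minimal illustration assuming dict-style access to the two attributes; the helper name `is_derived` is ours.

```python
# Terms matched against the Series Description attribute (0008, 103E)
DERIVED_TERMS = ("adc", "preprocessed", "registered", "map")

def is_derived(image_type, series_description) -> bool:
    """Flag post-processed series to label as "Derived" before manual annotation."""
    # (0008, 0008) Image Type is multi-valued, e.g. ["ORIGINAL", "PRIMARY"]
    values = {str(v).upper() for v in (image_type or [])}
    if values & {"DERIVED", "SECONDARY"}:
        return True
    # Simple case-insensitive pattern matching on the series description
    desc = str(series_description or "").lower()
    return any(term in desc for term in DERIVED_TERMS)
```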
Table 1
Sample size by MR series category in dataset.
Weighting | Number of manually labeled samples |
T2W | 1326 |
T1W | 1256 |
DCE (Dynamic Contrast Enhanced) | 496 |
JUNK (localizer, calibration, others) | 484 |
DWI (Diffusion weighted) | 480 |
CHS (Chemical Shift) | 236 |
STIR (Short Tau Inversion Recovery) | 236 |
FLAIR (Fluid attenuated inversion recovery) | 113 |
T2*W | 36 |
PDW (Proton Density Weighted) | 2 |
SW (Susceptibility weighted) | 1 |
Two experienced radiologists have performed semi-automatic labeling of the 4,666 DICOM MR series, supervising a traditional pattern-matching algorithm that classifies the weighting type. Table 1 displays the frequency of the manually labeled samples used to create the ML models.
The annotations are used to develop three different ML models (weighting classifier, chemical shift detector, and junk detector), but not all the data in Table 1 are used for each of them, as each particular model requires a specific set of annotations.
The MR series selected to develop a junk model capable of identifying useless data in the database includes the “JUNK” category as a positive class and the rest of the data are annotated as a negative class. The “JUNK” category not only has localizers and calibrations but also screenshots and images that are not useful for any desirable purpose (e.g., post-processing for the extraction of imaging biomarkers). The same occurs with the chemical shift model where the data corresponding to the “CHS” category is marked as a positive class and the rest as a negative class. The latter ML model will be used to enrich the categorization in the surname part of the scheme.
The MR series used for the development of the weighting model include several of the manually annotated classes (T2W, T1W, DCE, DWI, STIR, FLAIR, T2*W). The "PDW" and "SW" categories presented an insufficient number of samples to train the models, so they had to be discarded from the weighting dataset. In addition, the "T2*W" category also contained too few samples to train a robust model for this class. Therefore, it was decided to merge this category into the "T2W" label and to perform an a posteriori classification in a declarative way, by checking whether the DICOM tag “scanning sequence” adopts the gradient echo (GR) value. If so, the "T2W" label is replaced by "T2*W".
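The a posteriori rule for recovering the "T2*W" class can be sketched declaratively as follows; the function name is our own.

```python
def refine_t2_label(predicted_label, scanning_sequence):
    """Relabel "T2W" predictions as "T2*W" when Scanning Sequence contains "GR"."""
    # (0018, 0020) Scanning Sequence may be single- or multi-valued
    seq = (scanning_sequence if isinstance(scanning_sequence, (list, tuple))
           else [scanning_sequence])
    if predicted_label == "T2W" and any(str(v).upper() == "GR" for v in seq):
        return "T2*W"
    return predicted_label
```

This keeps the gradient-echo distinction out of the ML model, in line with the criterion of preferring deterministic DICOM information over probabilistic outputs.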
Models
3.4.1 Input features
The features used in the ML models developed in this study are a combination of those used in related works [15], [16] and others proposed by experienced radiologists. Only features automatically generated by the machines are included, provided they have a high percentage of occurrence in the data used (less than 15% missing values across all series). In addition to the DICOM metadata indicated above, two additional features have been included as a result of a feature engineering process. They correspond to the number of images in a series (“Images”) and to the number of series with a Euclidean distance equal to 0 in the same MR study (“Euclidean_similarities”). The list of input features is shown in Table 2.
Table 2
Input features for ML model development.
Categorical Features | Numerical Features |
(0018, 0021) Sequence Variant | (0018, 0081) Echo Time |
(0018, 0022) Scan Options | (0018, 0082) Inversion Time |
(0018, 0020) Scanning Sequence | (0018, 0091) Echo Train Length |
(0018, 0023) MR Acquisition Type | (0028, 0030) Pixel Spacing |
(0008, 0008) Image Type | (0018, 0080) Repetition Time |
(0028, 0004) Photometric interpretation | (0018, 0050) Slice Thickness |
(0018, 9025) Spectrally Selected Suppression | (0020, 0105) Number of temporal positions |
(0020, 0037) Image Orientation Patient | (0018, 1314) Flip Angle |
| (0018, 0095) Pixel Bandwidth |
| (0018, 0086) Number of Echoes, (Optionally studied (0019, 10A9)) |
| Images (number of images in a series) |
| Euclidean_similarities (number of series with Euclidean distance equal to 0 in the same study) |
A dynamic MR series constitutes the union of different individual series into one with a specific objective. For example, a DCE series is a combination of multiple T1W series acquired over time, with the purpose of observing the effect of contrast as a function of time. It is easy for a model to detect a dynamic series from the number of images it contains when it is stored in combined form but, as shown in Fig. 4, when stored in split form it is very difficult to detect whether a T1W series belongs to a DCE combination or is a standalone T1W acquisition.
The “Euclidean_similarities” feature can detect composite series that are stored in a split way within the same study. Composite series usually keep the same acquisition parameters as their internal series except one or two of them, such as contrast or time. For this reason, a selection of variables has been made to calculate the Euclidean distance among the different series. This method provides the number of identical series within the same study. The characteristics used for the calculation of this Euclidean distance are “Scanning Sequence”, “Orientation Patient”, "Echo Time", "Flip Angle", "Repetition Time", "Pixel Bandwidth", "Number of Images", "Number of temporal positions" and "Slice Thickness".
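The feature can be sketched as below, restricted for brevity to the numeric parameters listed above (the categorical tags would be matched for equality in the same spirit); series are assumed to be represented as dicts of acquisition parameters.

```python
import math

# Numeric acquisition parameters compared across series of the same study
NUMERIC_KEYS = ("EchoTime", "FlipAngle", "RepetitionTime", "PixelBandwidth",
                "NumberOfImages", "NumberOfTemporalPositions", "SliceThickness")

def euclidean_similarities(study_series):
    """For each series, count the other series in the study at Euclidean distance 0."""
    vectors = [[float(s.get(k, 0) or 0) for k in NUMERIC_KEYS]
               for s in study_series]
    return [
        sum(1 for j, vj in enumerate(vectors) if i != j and math.dist(vi, vj) == 0.0)
        for i, vi in enumerate(vectors)
    ]
```

A split DCE study would yield several series with a non-zero count, while a standalone T1W acquisition would yield 0.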
Other DICOM tags have been studied and discarded due to complex problems in the system but may be desirable in future work. One tag to consider is the diffusion b-value (0018, 9087) of the first and last image in the series. This information could be very helpful to alternatively classify the DWI series.
3.4.2 Transformations
As commonly done when training ML algorithms, several feature transformations have been applied to maximize the learning in trained models. The transformations per feature can be seen in Table 3.
Table 3
Transformations of input features. Abbreviations and definitions: SK (segmented k-space), SS (steady state), SP (spoiled), OSP (oversampling phase), PER (phase encode reordering), RG (respiratory gating), CG (cardiac gating), PPG (peripheral pulse gating), FC (flow compensation), PFF (partial Fourier - frequency), PFP (partial Fourier - phase), SP (spatial presaturation), FS (fat suppression), SE (spin echo), GR (gradient echo), IR (inversion recovery), EP (echo planar), RM (research mode), RGB (red, green, blue), FAT (fat suppression).
DICOM Features | Preprocessing |
(0018, 0021) Sequence Variant | One hot encoding for values SK, SS, SP, OSP. |
(0018, 0022) Scan Options | One hot encoding for values PER, RG, CG, PPG, FC, PFF, PFP, SP, FS. |
(0018, 0020) Scanning Sequence | One hot encoding for values SE, GR, IR, EP, RM. |
(0018, 0023) MR Acquisition Type | Categorical label encoded. |
(0008, 0008) Image Type | One hot encoding for values DERIVED, SECONDARY, LOCALIZER, ADC, SCREENSAVE. |
(0028, 0004) Photometric interpretation | Binary value depending on whether it contains RGB.
(0018, 9025) Spectrally Selected Suppression | Binary value depending on whether it contains FAT.
(0020, 0037) Image Orientation Patient | Transformation from numeric vectors to categories Sagittal, Axial, Coronal. One hot encoded transformation. |
(0018, 0081) Echo Time | Round the two numeric values to the first decimal. Label encoded transformation. |
(0018, 0082) Inversion Time | Replace empty values with 100000000, simulating infinity.
(0018, 0091) Echo Train Length | - |
(0028, 0030) Pixel Spacing | Categorical label encoded. |
(0018, 0080) Repetition Time | - |
(0018, 0050) Slice Thickness | - |
Number of images in series (custom feature) | - |
(0018, 1314) Flip Angle | Replace empty values with 100000000, simulating infinity.
(0018, 0095) Pixel Bandwidth | Replace empty values with 100000000, simulating infinity.
(0018, 0086) Number of Echoes | Replace empty values with 100000000, simulating infinity.
(0020, 0105) Number of temporal positions | Replace empty values with 100000000, simulating infinity.
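The orientation transformation in Table 3 reduces the six direction cosines of Image Orientation (Patient) (0020, 0037) to a plane category. One common convention, shown here as an assumption since the exact rule is not detailed above, takes the dominant component of the slice normal (the cross product of the row and column cosines): x-dominant means sagittal, y-dominant coronal, z-dominant axial.

```python
def orientation_to_plane(iop):
    """Map the 6-value Image Orientation (Patient) vector to a plane label."""
    r, c = iop[:3], iop[3:6]           # row and column direction cosines
    normal = (                         # slice normal = r x c
        r[1] * c[2] - r[2] * c[1],
        r[2] * c[0] - r[0] * c[2],
        r[0] * c[1] - r[1] * c[0],
    )
    axis = max(range(3), key=lambda i: abs(normal[i]))
    return ("Sagittal", "Coronal", "Axial")[axis]
```

The resulting category is then one-hot encoded as described in the table.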
Different transformations are applied to the target variable depending on the model developed. In the case of the weighting model, the different categories are transformed with a label encoder. For the junk classifier, a binary encoding is applied (where a value of “1” corresponds to the “JUNK” category and the rest of the categories adopt a value of “0”). The same strategy is followed in the chemical shift model.
3.4.4 Feature selection
A total of 43 transformed features have been generated after the application of feature engineering with all the transformations listed in Section 3.4.2 Transformations. To select the most relevant features and reduce as much as possible the number of inputs to be used in the models, the Boruta feature selection algorithm [21] has been applied. As it is a supervised algorithm, it has been applied to each model independently (weighting, junk, and chemical shift).
The Boruta algorithm has been chosen over other alternatives because it is a feature selection algorithm that does not require input parameters, and thus avoids the user having to decide on the cut-off threshold separating the features to keep from the ones to remove. Boruta's only requirement is the choice of the ML algorithm to be used internally for selection. Typically, a RandomForest is used, due to its robustness to data with anomalies or noise and its ability to detect non-linear relationships between features [22]. In addition, before applying the Boruta algorithm with the RandomForest, any missing values have been imputed with the K-Nearest Neighbours algorithm, since a RandomForest does not allow missing values in its input.
Boruta classifies the features into three groups according to the level of certainty about keeping them in a predictor. Only the features confirmed for rejection have been filtered out, thus keeping both the features confirmed as relevant and those for which there is no certainty either way. This has resulted in a total of 26 features selected for each model.
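The imputation step that precedes Boruta can be sketched with scikit-learn's KNNImputer (the Boruta selection itself is available as `BorutaPy` in the `boruta` package, which would then consume the imputed matrix together with a RandomForest); the function name below is ours.

```python
import numpy as np
from sklearn.impute import KNNImputer

def impute_features(X, n_neighbors=5):
    """Fill NaNs using the mean value of the k nearest complete samples."""
    imputer = KNNImputer(n_neighbors=n_neighbors)
    return imputer.fit_transform(np.asarray(X, dtype=float))
```

This pre-processing is only needed for the selection stage; as discussed in the next subsection, the final classifier consumes the raw features directly.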
3.4.5 Algorithms
An intrinsic characteristic of the metadata contained in the DICOM headers is the absence of some particular tags and labels due to the great heterogeneity of the standard, especially if the series come from different hospitals and machines. It is common to encounter missing DICOM tags when working in environments with heterogeneous sources.
The most common classification algorithms cannot deal with missing or empty tags and/or values. For this reason, they are often combined with value imputation algorithms capable of detecting patterns. Adding an imputation model increases the complexity of the overall system and the inference latency per sample. In big data environments, inference times can be critical, even more so when dealing with medical images, which are costly to process. For this reason, the Catboost algorithm [23] has been proposed, as its ability to deal with missing data, its fast inference times on both CPU and GPU, and its good performance in non-linear problems make it a perfect fit for the desired requirements.
This decision was made after iteratively optimizing both a RandomForest combined with a KNN imputer and a Catboost classifier [24]. The low variability of the scores and the high performance of the obtained results were the main reasons for choosing this feature selection and model combination, resulting in a simple, stable, robust, and explainable tool.