Comparison of transcriptomic and phenomic profiles for the prediction of drug mechanism

doi:10.21203/rs.3.rs-3460430/v1

This work is licensed under a CC BY 4.0 License

Version 1

Transcriptomic and phenomic profiling assays measure cellular responses to drug perturbations and provide unbiased information on the mechanisms of action (MOAs) of drugs. However, few studies have compared the information content derived from these assays. This study compared transcriptomic and phenomic features in terms of diversity and MOA prediction. From publicly available L1000 and Cell Painting datasets, transcriptomic and phenomic features for 274 compounds annotated with 30 MOAs were prepared for analysis. Feature extraction with the tSNE and Isomap algorithms showed that the compound distribution based on transcriptomic features was more dispersed than that based on phenomic features. Pairwise comparison of features across compounds showed highly correlated clusters in the phenomic feature heatmap. To explore the predictive potential for the MOA of compounds, machine learning models were trained on transcriptomic and/or phenomic features. The XGBoost and Extra Tree models overfitted the training set, whereas the KNN and AdaBoost models yielded relatively low performance. Notably, glucocorticoid receptor agonist was the MOA class with the highest predictability based on transcriptomic and/or phenomic features. In conclusion, L1000 features were more diverse than Cell Painting features. Machine learning analysis suggested new pairs of similar compounds and predicted certain MOA classes more accurately than others.

Biological sciences/Computational biology and bioinformatics

Biological sciences/Systems biology

phenomics

transcriptomics

informatics

machine learning

mode of action

The identification of a compound’s mechanism of action (MOA) is a significant challenge for drug discovery¹. Traditional target-based or phenotypic cell-based screening strategies often fail to reveal the pharmacodynamics of a compound in an unbiased manner, highlighting the need for further downstream characterization. Profiling assays provide high-dimensional information on DNA, RNA, proteins, metabolites, microbiota, and spatial organization, enabling systematic evaluation of the effects of drug treatment. Gene expression and cell morphology are widely used profiles for assessing human cancer cells. Gene expression at the RNA level reflects cellular responses to drug treatment and the effects of multiple genetic variants². Morphological features, such as cell size, shape, and texture, define the phenotypic profile³. Both transcriptomic and phenomic profiling can generate sufficient information to identify the MOA of a compound, delineate off-target effects, and predict cytotoxicity^{2, 4–6}. However, few comprehensive studies have compared these two assays in terms of feature diversity or MOA prediction.

The L1000 platform, a bead-based hybridization assay, enables the measurement of non-abundant transcripts at a low cost while providing a high degree of cross-platform similarity with RNA sequencing⁷. The platform measures the expression of 978 landmark transcripts, enabling the computational inference of the expression levels of 81% of non-measured transcripts. It was adopted in the Connectivity Map, which couples large-scale perturbations induced by small molecules and genetic modulations with gene expression readouts, thereby facilitating the identification of the MOA of small molecules and aiding the development of new therapeutic hypotheses^{7, 8}. For example, screening the L1000 profiles of perturbagens for inhibition of carnitine O-octanoyltransferase expression led to the identification of niclosamide as a potential drug candidate for vascular calcification⁹. The L1000 profiles of perturbagens have also been used to construct machine learning¹⁰ and deep learning¹¹ models for MOA prediction.

The Cell Painting assay uses images from five fluorescent channels of cultured cells treated with six dyes to label eight cellular compartments. Software then extracts thousands of morphological readouts from these images^{12, 13}. This morphological profiling can be envisioned as a morphological map that reflects underlying gene functions¹⁴. The morphological features of compounds can be matched with gene expression, and this phenotypic screening has proven instrumental for the identification of hit compounds that target YAP1¹⁵. Multiple datasets containing Cell Painting profiles of small molecules are deposited in public repositories, such as The Cell Image Library and Image Data Resource. In addition to Cell Painting, various phenomic profiling platforms have demonstrated potential usefulness for quantifying NETosis¹⁶, profiling compounds to distinguish MOAs¹⁷, and identifying candidates for antibiotics¹⁸ and PROTACs¹⁹.

While profiling assays provide broad-scale information, devoid of human bias, for predicting the MOA of a drug, several questions need to be addressed prior to selecting traditional omics analyses. These include how informative or redundant each profiling assay is for depicting a compound’s perturbation, whether compounds associated with a specific MOA yield more discernible signals than compounds with other MOAs, and whether integrated profiles complement each other enough to offset the concern of redundancy across features^{5, 20}. To address these questions, this study compared transcriptomic and phenomic profiles in terms of both feature and compound diversity, which can help reveal biological changes more effectively. Furthermore, we investigated the transcriptomic and phenomic features that were useful for constructing machine learning models for predicting a compound’s MOA. Performance was assessed for the entire set of MOAs and for individual MOA classes. To achieve these objectives, we used the features of compounds that are common to both the L1000 and Cell Painting datasets, annotated with their respective MOAs. These features were analyzed using unsupervised machine learning with the tSNE and Isomap algorithms and supervised machine learning with the XGBoost, Extra Tree, KNN, and AdaBoost models.

Data collection, MOA annotation and data pre-processing

To analyze the transcriptomic and phenomic profiles of small molecules, we used publicly available datasets generated under relevant cell culture conditions and extracted the features of compounds that were common to both datasets. Specifically, the “L1000 gene expression profiling assay–DOS small molecule perturbagens (LDG-1191: LDS-1194)” dataset in the LINCS Data portal contains profiles of U2OS cells perturbed by 21,603 compounds at a concentration of 10 µM for 48 h. The corresponding phenomic data were extracted from the GIGA database (id: 100200), which contains Cell Painting profiles of U2OS cells treated with 30,337 compounds at a concentration of 5 µM for 6 h²¹. A total of 14,449 compounds were shared between the two profiling assays, of which 13,554 were uncharacterized compounds labeled with BRD codes and 895 were characterized compounds labeled with trivial names.

To use the transcriptomic and phenomic profiles for machine learning-based MOA prediction, the 895 characterized compounds were annotated with their respective MOAs based on the PubChem, DrugBank, and OpenTarget databases. When a compound was associated with multiple MOAs, the MOA deemed clinically relevant or representative of experimental use was selected as the adopted annotation. To mitigate imbalance in the number of compounds per MOA, only MOAs with ≥ 5 compounds were retained. As a result, 274 compounds, each annotated with 1 of 30 distinct MOAs, were selected for this study (Supplementary Table 1). The MOA annotation was independently reviewed by two pharmacists.
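The class-size filter described above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the compound identifiers and the toy annotation table are hypothetical, and only the threshold of 5 compounds per MOA comes from the text.

```python
from collections import Counter

MIN_COMPOUNDS_PER_MOA = 5  # threshold stated in the text

# Hypothetical toy annotations: compound identifier -> single curated MOA label.
annotations = {}
for i in range(6):
    annotations[f"cpd_gr_{i}"] = "Glucocorticoid receptor agonist"
for i in range(5):
    annotations[f"cpd_cox_{i}"] = "Cyclooxygenase inhibitor"
for i in range(2):
    annotations[f"cpd_egfr_{i}"] = "EGFR inhibitor"


def filter_by_moa_size(annotations, min_count=MIN_COMPOUNDS_PER_MOA):
    """Keep only compounds whose MOA is represented by at least `min_count` compounds."""
    counts = Counter(annotations.values())
    return {cpd: moa for cpd, moa in annotations.items() if counts[moa] >= min_count}


kept = filter_by_moa_size(annotations)
kept_moas = sorted(set(kept.values()))
print(len(kept), kept_moas)
```

In this toy table the two EGFR inhibitors are dropped because their class has fewer than five members, while the other two classes are kept in full.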

The L1000 profiles contain 22,268 gene expression features per compound, which had already been normalized by z-scoring per compound and collapsed across replicates by the moderated z-score procedure⁷. The Cell Painting profiles contain 1,727 phenomic features per compound, but the values had not been normalized or collapsed across replicates. To obtain comparable feature numbers across the two datasets, we discarded the inferred gene features within L1000, resulting in a set of 1,764 features that corresponded to landmark genes for further analysis. In addition, the features obtained from the Cell Painting dataset were normalized by z-scoring per compound and collapsed by the moderated z-score transformation, following the L1000 procedures. In summary, the normalized and replicate-collapsed features from 1,764 transcriptomic traits and 1,727 phenomic traits across 274 compounds were used for this analysis.
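To make the normalization and replicate-collapsing steps concrete, the following sketch shows one simplified reading of them. It is an assumption-laden illustration, not the L1000 pipeline itself: the median/MAD form of the z-score, the Spearman-based replicate weights, the 0.01 floor on correlations, and all function names are choices made here for illustration, and the data are synthetic.

```python
import numpy as np


def robust_zscore(plate):
    """Z-score each feature (column) across wells using the median and MAD (assumed form)."""
    med = np.median(plate, axis=0, keepdims=True)
    mad = np.median(np.abs(plate - med), axis=0, keepdims=True)
    return (plate - med) / (1.4826 * mad + 1e-9)  # small constant avoids division by zero


def replicate_weights(reps, min_corr=0.01):
    """Weight each replicate (row) by its summed Spearman correlation with the other replicates."""
    ranks = np.argsort(np.argsort(reps, axis=1), axis=1).astype(float)  # ranks, assuming no ties
    corr = np.clip(np.corrcoef(ranks), min_corr, None)  # floor is an assumption of this sketch
    np.fill_diagonal(corr, 0.0)  # a replicate does not vote for itself
    weights = corr.sum(axis=1)
    return weights / weights.sum()


def moderated_zscore(reps):
    """Collapse replicates into one profile as a correlation-weighted average."""
    return replicate_weights(reps) @ reps


rng = np.random.default_rng(0)
n_wells, n_features = 20, 50
plate = rng.normal(size=(n_wells, n_features))  # synthetic plate of wells x features
signal = rng.normal(size=n_features)
plate[0] = signal + 0.1 * rng.normal(size=n_features)  # concordant replicate
plate[1] = signal + 0.1 * rng.normal(size=n_features)  # concordant replicate
# plate[2] stays unrelated noise and plays the role of a discordant replicate

z = robust_zscore(plate)
weights = replicate_weights(z[:3])
profile = moderated_zscore(z[:3])
print(np.round(weights, 3))
```

The discordant replicate receives the smallest weight, so it contributes less to the collapsed profile than it would in a plain average.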

Assessing the diversity of transcriptomic and phenomic profiles based on dimension reduction and similarity metrics

The presence of diverse features can be advantageous for determining the cell status and the MOA of compounds. We explored the diversity of transcriptomic and phenomic features using unsupervised dimension reduction and visualization. We used two nonlinear feature extraction algorithms, tSNE and Isomap; the former preserves local neighborhood structure, whereas the latter preserves geodesic distances and thus the overall structure of the data points²². With either tSNE or Isomap, the compound distribution based on transcriptomic features was more dispersed than that based on phenomic features (Fig. 1). Isomap exhibited a slightly more distinct pattern than tSNE. The distribution of data based on merged (transcriptomic + phenomic) features was intermediate to that of data based on individual features.
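A minimal sketch of the two embeddings, using the scikit-learn implementations on synthetic data, is given here for orientation. The matrix size, the perplexity, the number of neighbors, and the PCA initialization are assumptions of this example and are not taken from the study.

```python
import numpy as np
from sklearn.manifold import TSNE, Isomap

rng = np.random.default_rng(0)
# Synthetic stand-in for a compounds x features matrix: three shifted groups of 20 compounds.
X = np.vstack([rng.normal(loc=shift, size=(20, 100)) for shift in (0.0, 1.0, 2.0)])

# tSNE: emphasizes local neighborhoods (perplexity must be smaller than the number of samples).
tsne_xy = TSNE(n_components=2, perplexity=15, init="pca", random_state=0).fit_transform(X)

# Isomap: approximates geodesic distances on a k-nearest-neighbor graph.
iso_xy = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

print(tsne_xy.shape, iso_xy.shape)
```

Each row of `tsne_xy` and `iso_xy` is the two-dimensional position of one compound, which can be passed directly to a scatter plot. Because the axes of a tSNE embedding have no absolute scale, dispersion is best compared visually or within one embedding rather than across methods.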

As phenomic features demonstrated lower diversity than transcriptomic features for distinguishing compounds, we examined the redundancy of features by computing pairwise cosine and Spearman similarities between features across all compounds. As anticipated, phenomic features were highly correlated with each other in specific clusters (Fig. 2). Conversely, transcriptomic features were less strongly correlated than phenomic features, with only rare correlations between transcriptomic and phenomic features.
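The feature-redundancy comparison can be illustrated with a small NumPy sketch. The data are synthetic and the summary statistic (mean absolute off-diagonal cosine similarity) is an assumption chosen here for illustration; the study itself reports heatmaps. A Spearman version would apply the same computation to rank-transformed, centered columns.

```python
import numpy as np


def cosine_similarity_columns(X):
    """Cosine similarity between every pair of feature columns of X (compounds x features)."""
    norms = np.linalg.norm(X, axis=0, keepdims=True)
    unit = X / np.where(norms == 0, 1.0, norms)  # guard against all-zero columns
    return unit.T @ unit


def mean_abs_offdiag(S):
    """Average absolute similarity over all distinct feature pairs."""
    mask = ~np.eye(S.shape[0], dtype=bool)
    return float(np.abs(S[mask]).mean())


rng = np.random.default_rng(1)
n_compounds = 200
# 30 mutually independent features.
independent = rng.normal(size=(n_compounds, 30))
# 30 features built from only 3 latent factors plus noise, i.e. three redundant clusters.
latent = rng.normal(size=(n_compounds, 3))
redundant = np.repeat(latent, 10, axis=1) + 0.2 * rng.normal(size=(n_compounds, 30))

score_independent = mean_abs_offdiag(cosine_similarity_columns(independent))
score_redundant = mean_abs_offdiag(cosine_similarity_columns(redundant))
print(round(score_independent, 3), round(score_redundant, 3))
```

The redundant block yields a clearly higher score, mirroring the clustered high-correlation pattern described for the phenomic features.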

Assessing compound similarity based on transcriptomic and phenomic profiles

We assessed whether the transcriptomic and phenomic features would confer unique properties to compounds. Pairwise similarity between compounds was determined by cosine correlation analysis of transcriptomic and/or phenomic features, and the resulting distribution of correlation scores across compound pairs is presented as a histogram (Fig. 3). Most compound pairs had a similarity score of −0.3 to 0.1 based on transcriptomic features, indicating that these features convey unique compound perturbations (Fig. 3A). In contrast, the distribution of compound similarity based on phenomic features was shifted toward higher values compared with that based on transcriptomic features (Fig. 3B). In most cases, the similarity score based on phenomic features ranged between 0.3 and 0.7, with a mean value of 0.54. The distribution of compound similarity based on merged features was intermediate between the distributions based on individual features, with a mean value of 0.21 (Fig. 3C).
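The compound-level similarity distribution can be sketched as follows. The matrices are synthetic: one has independent profiles and the other shares a common component across compounds, which is only a hypothetical way to produce a right-shifted distribution. The bin count and value range are likewise assumptions of this example.

```python
import numpy as np


def pairwise_cosine(X):
    """Cosine similarity for every distinct pair of rows (compounds) of X."""
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = unit @ unit.T
    upper = np.triu_indices(X.shape[0], k=1)  # each pair once, diagonal excluded
    return S[upper]


rng = np.random.default_rng(2)
n_compounds, n_features = 50, 300
diverse = rng.normal(size=(n_compounds, n_features))  # independent profiles
shared = rng.normal(size=n_features)  # component common to all compounds
correlated = shared + rng.normal(size=(n_compounds, n_features))

scores_diverse = pairwise_cosine(diverse)
scores_correlated = pairwise_cosine(correlated)

# Histogram counts of the kind that would be plotted for each feature type.
counts, edges = np.histogram(scores_correlated, bins=20, range=(-1, 1))
print(round(float(scores_diverse.mean()), 3), round(float(scores_correlated.mean()), 3))
```

Profiles that share a common component give a mean similarity well above zero, whereas independent profiles center near zero.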

We examined whether the similarity score of the compound pairs targeting the same MOA would be higher than that of all the pairs. The mean pairwise similarity scores between compound pairs based on merged features according to the 30 MOAs are presented in Fig. 3D. However, the resulting data demonstrated a high level of variance, which makes it challenging to derive conclusions. In addition, the compound similarity matrix based on unbiased feature comparison suggested potential correlations between compound pairs. For example, doxylamine was highly correlated with LFM-A12, mepivacaine, indoprofen, and cetirizine (Table 1).

Table 1

Representative correlative compound pairs based on merged features
Compound 1	MOA of compound 1	Compound 2	MOA of compound 2	Cosine correlation score*	Evidence
Doxylamine	Histamine receptor antagonist	LFM-A12	EGFR inhibitor	0.771	References 25, 26
Doxylamine	Histamine receptor antagonist	Mepivacaine	Sodium channel alpha blocker	0.767	Reference 24
Doxylamine	Histamine receptor antagonist	Indoprofen	Cyclooxygenase inhibitor	0.766	-
Doxylamine	Histamine receptor antagonist	Cetirizine	Histamine receptor antagonist	0.600	The same MOA
*The mean cosine correlation score of doxylamine with all compounds was 0.155.

Predicting the MOA of compounds based on transcriptomic and phenomic profiles

Finally, we investigated whether supervised machine learning models could predict the MOA of compounds based on the aforementioned data types. The data were split into training and validation sets at a ratio of 70:30. The models were trained using the KNN, AdaBoost, XGBoost, and Extra Tree algorithms, which are commonly used with biomedical datasets. Analysis of model accuracy showed that training with transcriptomic and phenomic features using the XGBoost or Extra Tree algorithms led to overfitting of the training set and low performance in the validation set (Fig. 4A). Training with combined transcriptomic and phenomic features using the KNN and AdaBoost algorithms did not cause overfitting. The KNN algorithm showed accuracies of 0.376 and 0.069, and the AdaBoost algorithm showed accuracies of 0.163 and 0.115, in the training and validation sets, respectively. In the KNN model, the accuracy of the model trained on merged features was higher than that of models trained on features from a single profile (validation set = 0.023, 0.036, and 0.069 for transcriptomic, phenomic, and merged features, respectively). In the AdaBoost model, however, training on merged features produced performance comparable to training on transcriptomic features (validation set = 0.115, 0.069, and 0.115 for transcriptomic, phenomic, and merged features, respectively).
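The comparison of training and validation accuracy can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' pipeline: the data are synthetic, XGBoost is left out to keep the example dependent on scikit-learn only, and the stratified split and the choice to fit the scaler on the training set alone are decisions made for this sketch.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, ExtraTreesClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(3)
n_classes, per_class, n_features = 6, 10, 500  # few samples, many features
y = np.repeat(np.arange(n_classes), per_class)
X = rng.normal(size=(y.size, n_features))
X[:, :5] += 0.5 * y[:, None]  # weak class signal in five features only

# 70:30 split; stratification is an assumption of this sketch.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Rescale to [0, 1] using training-set statistics only, to avoid information leakage.
scaler = MinMaxScaler().fit(X_train)
X_train, X_val = scaler.transform(X_train), scaler.transform(X_val)

models = {
    "KNN": KNeighborsClassifier(),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "ExtraTrees": ExtraTreesClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # (training accuracy, validation accuracy); a large gap signals overfitting
    results[name] = (model.score(X_train, y_train), model.score(X_val, y_val))

for name, (train_acc, val_acc) in results.items():
    print(f"{name}: train={train_acc:.3f} validation={val_acc:.3f}")
```

With fully grown trees and no bootstrapping, the Extra Trees model memorizes the training set, which is the pattern described as overfitting in the text.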

Because model accuracy can decrease in the presence of uninformative or redundant features, we next reduced the number of features and re-evaluated model performance. As the XGBoost and ExtraTree algorithms provide feature importance rankings, we extracted the top 10% of features by importance and used them to train the KNN model, which had shown the highest training-set accuracy without overfitting (Fig. 4B). In the validation set, the accuracies of the models trained with the top 10% of features selected by the XGBoost algorithm were 0.035, 0.081, and 0.081 for transcriptomic, phenomic, and merged features, respectively, whereas those trained with the features selected by the ExtraTree algorithm were 0.058, 0.046, and 0.035 for transcriptomic, phenomic, and merged features, respectively (Fig. 4B). These results indicate that these models performed similarly to those trained on all features.
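The importance-based selection step can be sketched as follows. The example ranks features with scikit-learn's `ExtraTreesClassifier` on synthetic data and retrains a KNN classifier on the top 10%. The helper name `top_fraction_indices`, the data, the number of trees, and the choice to rank features on the training set only are assumptions of this sketch.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.neighbors import KNeighborsClassifier


def top_fraction_indices(importances, fraction=0.10):
    """Indices of the highest-importance features, keeping the given fraction."""
    k = max(1, int(round(fraction * importances.size)))
    return np.argsort(importances)[::-1][:k]


rng = np.random.default_rng(4)
n_classes, per_class, n_features = 4, 15, 200
y = np.repeat(np.arange(n_classes), per_class)
X = rng.normal(size=(y.size, n_features))
X[:, :10] += 1.0 * y[:, None]  # class signal confined to the first ten features

# Simple 70:30 split by random permutation.
order = rng.permutation(y.size)
train_idx, val_idx = order[:42], order[42:]
X_train, y_train = X[train_idx], y[train_idx]
X_val, y_val = X[val_idx], y[val_idx]

# Rank features on the training set only, then keep the top 10%.
ranker = ExtraTreesClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
selected = top_fraction_indices(ranker.feature_importances_)

# Retrain KNN on the reduced feature set and score it on held-out compounds.
knn = KNeighborsClassifier().fit(X_train[:, selected], y_train)
val_accuracy = knn.score(X_val[:, selected], y_val)
print(selected.size, round(val_accuracy, 3))
```

Ranking features with the validation compounds excluded keeps the reported validation accuracy free of selection bias, which matters when the sample size is as small as it is here.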

Next, we examined whether certain MOAs could be better predicted by the selected top 10% of transcriptomic, phenomic, and merged features, based on precision, recall, and F1-score (Table 2). Transcriptomic features predicted the MOAs of adrenergic receptor agonist and glucocorticoid receptor agonist with an F1-score of 0.40 each in the validation set. The MOA of glucocorticoid receptor agonist was also predicted by phenomic features and merged features, with F1-scores of 0.44 and 0.33 in the validation set, respectively.

Table 2

The list of MOAs best predicted by KNN model trained with features of the top 10% of importance
Features	MOA	Training precision	Training recall	Training F1-score	Validation precision	Validation recall	Validation F1-score
Top 10% transcriptomic	Adrenergic receptor agonist	0.400	0.667	0.500	0.500	0.333	0.400
Top 10% transcriptomic	Glucocorticoid receptor agonist	0.667	0.400	0.500	1.000	0.250	0.400
Top 10% transcriptomic	Adrenergic receptor antagonist	0.600	1.000	0.750	0.250	0.500	0.333
Top 10% transcriptomic	Cyclooxygenase inhibitor	0.539	0.539	0.539	0.167	0.200	0.182
Top 10% phenomic	Glucocorticoid receptor agonist	0.800	0.800	0.800	0.400	0.500	0.444
Top 10% phenomic	Cyclooxygenase inhibitor	0.300	0.462	0.364	0.200	0.200	0.200
Top 10% transcriptomic + phenomic	Glucocorticoid receptor agonist	1.000	0.400	0.570	0.500	0.250	0.333
Top 10% transcriptomic + phenomic	Cyclooxygenase inhibitor	0.429	0.692	0.529	0.077	0.200	0.111

Despite the increasing use of omics analysis, few studies have compared features from different profiling modalities in terms of variability, redundancy, and MOA dependency. Analyses of transcriptomic and phenomic datasets can enhance our understanding of the strengths and limitations of profiling assays for evaluating the MOAs of compounds⁶. To this end, our study examined the diversity of transcriptomic and phenomic profiles by feature extraction and similarity-metric analysis. Furthermore, using the L1000 and Cell Painting datasets, we analyzed how the performance of machine learning for MOA prediction depends on the feature type and on the MOA of the compounds.

Our analysis revealed that transcriptomic features have higher diversity than phenomic features. Data visualization by tSNE or Isomap algorithm showed that the compounds were the most dispersed by transcriptomic feature extraction, whereas phenomic feature extraction led to the formation of aggregated clusters of multiple compounds. Furthermore, the heatmap of pairwise feature correlations showed an overall pattern of higher scores in phenomic features than in transcriptomic features. These results indicate that Cell Painting contains more redundant measurements than L1000. Feature redundancy in Cell Painting was also observed in another study in which only 1,020 features were selected for analysis¹. Currently, phenomic profiling lacks representative platforms, standardized computational pipelines, and comprehensive publicly available datasets. Experimental and computational methods, such as robust analysis of the spatial and functional changes in cell status and the use of packages compatible with images from various platforms, will enhance the usefulness of the morphology datasets^{3, 23}.

As certain MOAs may alter cell morphology with relatively few changes in gene expression and vice versa, we examined whether compounds associated with the same MOA have more highly correlated features than random compound pairs (Fig. 3D). Despite the variation in the correlation scores of compounds, we observed a tendency for phenomic features to capture the MOAs of adrenergic receptor antagonists and glucocorticoid receptor agonists and for transcriptomic features to capture the MOA of beta-adrenergic receptor agonists. Meanwhile, compound similarity analysis identified new highly correlated compound pairs. Experimental validation of correlated compound pairs is challenging, but previous studies have provided supporting evidence for some of them. Doxylamine was highly correlated with cetirizine, which is also a histamine receptor antagonist. Doxylamine was also correlated with mepivacaine and LFM-A12 based on the similarity in merged features (Table 1). Both doxylamine and mepivacaine produce spinal motor and sensory blockade²⁴. LFM-A12 is a specific inhibitor of the EGFR tyrosine kinase, and doxylamine potentially inhibits the expression of the long noncoding RNA in non-homologous end joining pathway 1 (LINP1), which is regulated by the EGF signaling pathway^{25, 26}. Given the common use of similarity-based repurposing, the integration of omics features can aid in the identification of new MOAs for compounds.

To improve the machine learning models for MOA prediction, we evaluated distance-based and tree-based algorithms, training with transcriptomic, phenomic, or merged features, and training on the top 10% of features by importance, but performance remained low (Fig. 4). The XGBoost and Extra Tree algorithms overfitted the training set, which could result from the disparity between the number of variables and the number of samples (also called the curse of dimensionality)²⁰. In the KNN models, training with all transcriptomic and phenomic features and training with the top 10% of features by importance led to similar accuracies of 0.03–0.08. A recent study also demonstrated low performance of machine learning models trained on the L1000 or Cell Painting features of 1,327 compounds with 511 MOAs using deep learning and ensemble architectures (area under the precision-recall curve of 0.04)¹. Thus, a systematic approach encompassing MOA annotation, robust acquisition of experimental results, and computational support for normalization, feature selection, and training algorithms would be required to use profiling readouts for MOA prediction. Notably, glucocorticoid receptor agonist was the MOA best predicted by features selected from the transcriptomic, phenomic, and concatenated profiles (Table 2), indicating that MOA-specific responses of profiling assays can be used to plan profiling.

The present study had several limitations. First, the two profiling assays used in our study were performed separately, so the differential experimental conditions may have led to increased variance in describing the compound’s perturbations. Currently, comprehensive datasets including a large number of compounds under the relevant cell condition have limited availability. However, data-sharing policies and preference for unbiased screening methods from biotech companies will likely accelerate large-scale analyses of various profiling assays. Second, the lack of a dominant platform for phenomic data processing can lead to additional variation in a compound’s perturbation. Despite the aforementioned limitations, our analysis of transcriptomic and phenomic datasets can aid in the design of profiling assays to evaluate the MOAs of compounds. In summary, our results demonstrated that the L1000 transcriptomic features were more diverse than the Cell Painting phenomic features. The use of unsupervised and supervised machine learning suggests that these profiling assays can identify new drug pairs based on similarity and predict a distinct set of MOAs.

Data sources

To obtain transcriptomic profiles of small molecules, “L1000 gene expression profiling assay–DOS small molecule perturbagens (LDG-1191: LDS-1194)” dataset was downloaded from the LINCS Data portal (https://lincsportal.ccs.miami.edu/datasets/view/LDG-1191) in May 2021. The corresponding phenomic data were extracted from “A dataset of images and morphological profiles of 30,000 small-molecule treatments using the Cell Painting assay” retrieved from the GIGA database in May, 2021 (http://www.gigadb.org/dataset/view/id/100200).

Machine learning framework

The distribution of transcriptomic and phenomic features was visualized using t-distributed stochastic neighbor embedding (tSNE) and Isomap techniques. Unsupervised clustering of all features and compounds was analyzed by pairwise cosine or Spearman correlations.

In the context of machine learning, models were constructed to predict a compound’s MOA using four commonly used algorithms: distance-based k-nearest neighbor (KNN) classifier along with decision tree-based Adaptive Boosting (AdaBoost), eXtreme Gradient Boosting (XGBoost), and Extra Tree (ET) classifiers. The data were split into a 70% training set and a 30% separate validation set using the train_test_split function from the Scikit-learn library. Each model was trained on the training set to estimate model parameters, followed by evaluation of the validation set to gauge performance.

The performance of each model was evaluated based on accuracy (Eq. 1), precision (Eq. 2), recall (Eq. 3), and F1-score (Eq. 4). All machine learning models were implemented with default hyperparameters using the Scikit-learn library in Python (version 3.11.0). Scatter plots, heatmaps, and histograms were generated using the Seaborn and Matplotlib libraries in Python. For machine learning, the features were rescaled to fall between 0 and 1 using the MinMaxScaler of the Scikit-learn library.

\(\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}}\)	(Eq. 1)
\(\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}}\)	(Eq. 2)
\(\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}}\)	(Eq. 3)
\(\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}\)	(Eq. 4)

(TP, true positive; TN, true negative; FP, false positive; FN, false negative)
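As a worked example of Eqs. 1–4, the short sketch below computes the metrics from confusion counts for a single MOA class. The counts are hypothetical: they were chosen so that precision and recall equal 1.000 and 0.250, the validation values reported for one class in Table 2, but the actual counts are not given in the source.

```python
def accuracy(tp, tn, fp, fn):
    """Eq. 1: fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision_recall_f1(tp, fp, fn):
    """Eqs. 2-4 for one class; returns 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts: 1 of 4 true class members is recovered, with no false positives.
precision, recall, f1 = precision_recall_f1(tp=1, fp=0, fn=3)
print(precision, recall, round(f1, 3))
```

The example shows how perfect precision can coexist with a modest F1-score when most true members of a class are missed.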

Conflict of Interest

None

Acknowledgements

This work was partly supported by National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT)(No. NRF-2021R1A2C2014145) and by Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT)(No.RS-2022-00155857, Artificial Intelligence Convergence Innovation Human Resources Development (Chungnam National University))

Author contributions

I.Y.B. and T.G. contributed to data collection, analysis, and interpretation of the study. T.T.C. contributed to data analysis and manuscript writing. D.K. contributed to the idea, design, and interpretation of the study. S.J.L. contributed to the idea, design, interpretation, and manuscript of the study. The contents and publication of the manuscript have been approved by all authors.

Data availability statement

The data that support the findings of this study are available from the LINCS Data portal (https://lincsportal.ccs.miami.edu/datasets/view/LDG-1191) and GIGA database (http://www.gigadb.org/dataset/view/id/100200).

Additional Information

None.

1. Way, G.P. et al. Morphology and gene expression profiling provide complementary information for mapping cell state. Cell Syst 13, 911-+ (2022).
2. Kwon, O.S., Kim, W., Cha, H.J. & Lee, H. In silico drug repositioning: from large-scale transcriptome data to therapeutics. Arch Pharm Res 42, 879–889 (2019).
3. Caicedo, J.C. et al. Data-analysis strategies for image-based cell profiling. Nat Methods 14, 849–863 (2017).
4. Jamal, S., Goyal, S., Shanker, A. & Grover, A. Predicting neurological Adverse Drug Reactions based on biological, chemical and phenotypic properties of drugs using machine learning models. Sci Rep 7 (2017).
5. Misra, B.B., Langefeld, C.D., Olivier, M. & Cox, L.A. Integrated Omics: Tools, Advances, and Future Approaches. J Mol Endocrinol (2018).
6. Liu, A., Seal, S., Yang, H. & Bender, A. Using chemical and biological data to predict drug toxicity. SLAS Discov 28, 53–64 (2023).
7. Subramanian, A. et al. A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles. Cell 171, 1437–1452 e1417 (2017).
8. Han, H.W. et al. LINCS L1000 dataset-based repositioning of CGP-60474 as a highly potent anti-endotoxemic agent. Sci Rep 8 (2018).
9. Tanaka, T. et al. Computational Screening Strategy for Drug Repurposing Identified Niclosamide as Inhibitor of Vascular Calcification. Front Cardiovasc Med 8, 826529 (2021).
10. Zhao, K., Shi, Y.J. & So, H.C. Prediction of Drug Targets for Specific Diseases Leveraging Gene Perturbation Data: A Machine Learning Approach. Pharmaceutics 14 (2022).
11. Gao, S.Q. et al. Modeling drug mechanism of action with large scale gene-expression profiles using GPAR, an artificial intelligence platform. BMC Bioinformatics 22 (2021).
12. Bray, M.A. et al. Cell Painting, a high-content image-based assay for morphological profiling using multiplexed fluorescent dyes. Nat Protoc 11, 1757–1774 (2016).
13. Cimini, B.A. et al. Optimizing the Cell Painting assay for image-based profiling. Nat Protoc 18, 1981–2013 (2023).
14. Rohban, M.H. et al. Systematic morphological profiling of human gene and allele function via Cell Painting. eLife 6 (2017).
15. Rohban, M.H. et al. Virtual screening for small-molecule pathway regulators by image-profile matching. Cell Syst 13, 724-+ (2022).
16. Elsherif, L. et al. Machine Learning to Quantitate Neutrophil NETosis. Sci Rep 9 (2019).
17. Cox, M.J. et al. Tales of 1,008 small molecules: phenomic profiling through live-cell imaging in a panel of reporter cell lines. Sci Rep 10 (2020).
18. Zoffmann, S. et al. Machine learning-powered antibiotics phenotypic drug discovery. Sci Rep 9 (2019).
19. Trapotsi, M.A. et al. Cell Morphological Profiling Enables High-Throughput Screening for PROteolysis TArgeting Chimera (PROTAC) Phenotypic Signature. ACS Chem Biol 17, 1733–1744 (2022).
20. Mirza, B., Wang, W., Wang, J., Choi, H., Chung, N.C. & Ping, P. Machine Learning and Integrative Analysis of Biomedical Big Data. Genes (Basel) 10 (2019).
21. Bray, M.A. et al. A dataset of images and morphological profiles of 30 000 small-molecule treatments using the Cell Painting assay. GigaScience 6, 1–5 (2017).
22. Pedregosa, F. et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res 12, 2825–2830 (2011).
23. Sexton, J.Z. et al. Machine Learning and Assay Development for Image-based Phenotypic Profiling of Drug Treatments, in Assay Guidance Manual. (eds. S. Markossian et al.) (Bethesda (MD); 2004).
24. Tzeng, J.I., Chiu, C.C., Wang, J.J., Hung, C.H. & Chen, Y.W. Spinal sensory and motor blockade by intrathecal doxylamine and triprolidine in rats. J Pharm Pharmacol 70, 1654–1661 (2018).
25. Shang, L.M. et al. Genome-wide RNA-sequencing dataset reveals the prognostic value and potential molecular mechanisms of lncRNA in non-homologous end joining pathway 1 in early stage Pancreatic Ductal Adenocarcinoma. J Cancer 11, 5556–5567 (2020).
26. Sakthianandeswaren, A., Liu, S. & Sieber, O.M. Long noncoding RNA LINP1: scaffolding non-homologous end joining. Cell Death Discov 2 (2016).

No competing interests reported.

SuppPTMoAfinal231018.docx
