Methods of Analyses of Open Science (Genelab) Datasets: Differential Gene Expression and GSEA Pathway Analysis
The genelab datasets (GLDS) that were analyzed for this work are: OSD-21, OSD-52, OSD-99, OSD-101, OSD-103, OSD-104, OSD-105, OSD-195, and OSD-202. Information on the experimental investigations leading to the datasets is provided in Table 1.
Table 1. Genelab datasets analyzed in the manuscript
Identifier
|
Title
|
Authors, title, publisher, version, DOI
|
Analyzed Tissue
|
OSD-21
|
Effects of spaceflight on murine skeletal muscle gene expression
|
Barth J. "Effects of spaceflight on murine skeletal muscle gene expression", NASA Open Science Data Repository, Version 3, http://doi.org/10.25966/c36b-3g68
|
Gastrocnemius muscle
|
OSD-52
|
Expression data from SPHINX (SPaceflight of Huvec: an INtegrated eXperiment)
|
Bradamante S, Versari S, Longinotti G, Barenghi L, Maier J. "Expression data from SPHINX (SPaceflight of Huvec: an INtegrated eXperiment)", NASA Open Science Data Repository, Version 4, http://doi.org/10.26030/nt3p-p547
|
Endothelial
|
OSD-99
|
Rodent Research-1 (RR1) NASA Validation Flight: Mouse extensor digitorum longus muscle transcriptomic and epigenomic data
|
Galazka J, Globus R "Rodent Research 1", GeneLab, Version 4, http://doi.org/10.26030/1h3m-3q49
|
Extensor digitorum longus muscle
|
OSD-101
|
Rodent Research-1 (RR1) NASA Validation Flight: Mouse left gastrocnemius muscle transcriptomic, proteomic, and epigenomic data
|
Galazka J, Globus R "Rodent Research 1", GeneLab, Version 5, http://doi.org/10.26030/sdmt-ae51
|
Gastrocnemius muscle
|
OSD-103
|
Rodent Research-1 (RR1) NASA Validation Flight: Mouse quadriceps muscle transcriptomic, proteomic, and epigenomic data
|
Galazka J, Globus R "Rodent Research 1", GeneLab, Version 4, http://doi.org/10.26030/9vzk-b116
|
Quadriceps muscle
|
OSD-104
|
Rodent Research-1 (RR1) NASA Validation Flight: Mouse soleus muscle transcriptomic and epigenomic data
|
Galazka J, Globus R "Rodent Research 1", GeneLab, Version 4, http://doi.org/10.26030/em9r-w619
|
Soleus muscle
|
OSD-105
|
Rodent Research-1 (RR1) NASA Validation Flight: Mouse tibialis anterior muscle transcriptomic, proteomic, and epigenomic data
|
Galazka J, Globus R "Rodent Research 1", GeneLab, Version 4, http://doi.org/10.26030/xgw6-6t64
|
Tibialis anterior muscle
|
OSD-195
|
Effects of 21 days of bedrest on human skeletal muscle gene expression
|
Rullman E. "Effects of 21 days of bedrest on human skeletal muscle gene expression", NASA Open Science Data Repository, Version 1, http://doi.org/10.26030/r6bv-rk07
|
Vastus lateralis muscle
|
OSD-202
|
Low dose (0.04 Gy) irradiation (LDR) and hindlimb unloading (HLU) microgravity in mice: brain transcriptomic and epigenomic data
|
Mao X. "Low dose (0.04 Gy) irradiation (LDR) and hindlimb unloading (HLU) microgravity in mice: brain transcriptomic and epigenomic data", NASA Open Science Data Repository, Version 8, http://doi.org/10.26030/ewfb-7g23
|
Brain
|
The Genelab datasets are re-analyzed to produce the t-score. The Differential Expression (DE) analysis of RNA-Seq datasets (OSD-99, OSD-101, OSD-103, OSD-104, OSD-105, OSD-202) is performed using DESeq2 version 1.26.065, in R software version 4.1.2. Expected counts from the RSEM step were extracted and rounded up to the next integer and used as input for DE analysis. The Differential Expression (DE) analysis of microarray datasets (OSD-21, OSD-52, OSD-195) is performed using the package limma66, in R version 4.1.2. The ensembl IDs of the genes in the DE datasets were annotated to their corresponding gene symbols using the R package biomaRt67,68.
We performed gene set enrichment analysis (GSEA) on the differentially expressed datasets using fgsea69 to determine to what extent aging and putative frailty pathways were impacted in spaceflight. Tentative aging and frailty gene pathways were downloaded from the Molecular Signatures Database (MSigDB)26,70 – a joint project of UC San Diego and Broad Institute. MSigDB is a multicollection of databases among which are databases of curated gene sets, and databases of ontology gene sets that are used in the pathway analysis. The latest version MSigDB v2022.1 was used in the pathway search.
In search of pathways that are potentially related to frailty, in MSigDB, we entered keywords of processes that are common denominators to aging in different organisms, and are described in6. The processes of interest are: changes in genomic and genetic material; cellular processes; nutrient signaling; and responses at the tissue level with bone and muscle as the tissues of interest. Changes in genomic and genetic material that are deemed to potentially associate with frailty include genome instability, epigenetic alterations, histone modifications, dna methylation, dna damage, and changes in chromatin structure. Cellular processes that are of interest to frailty pathways include: 1) processes that are key to maintaining the cell cycle – cell senescence, stem-cell generation, and the pathways that involve p53, the protein tumor suppressor; 2) mechanisms involved in proteostasis (protein regulation); 3) mitochondrial processes involving dysfunction and disease; and 4) intercellular processes that involve inflammation and intercellular communication. The actions of the protein complexes – proteasome, lysosome, and chaperone – are considered in the network of pathways that regulate proteins. Putative pathways of nutrient sensing and signaling include pathways mediated by insulin signaling and mTor signaling, and the role of Ampk.
The downloaded pathways were identified as putative frailty pathways when at least two (2) putative frailty biomarker genes were present among the genes that are involved in the pathway. This is applied from the following reasoning: if multiple genes that can possibly interact to express a phenotype are present in a pathway, then that pathway can be associated with that phenotype.
The outcome of the fGSEA analysis is the normalized enrichment score (NES) of any given pathway, alongside valuable information on the analysis, such as the adjusted p-value, and the leading-edge genes of the pathway. The normalized enrichment score is the enrichment score normalized based on the number of genes in the gene set. It indicates the representation of the pathway genes in the dataset gene list, which is priorly ranked according to the values of the t-score of the genes. A positive NES indicates that the pathway genes are mostly represented at the top of the gene list, while a negative NES indicates that the pathway genes are mostly represented at the bottom of the gene list. The plots of normalized enrichment score (NES) of the different pathways for a given dataset are produced using ggplot271, and the heatmaps across different datasets or different genes are obtained using a complex heatmap R package72.
Analysis on Frailty Biomarker Genes and Pathways
A list of Frailty Biomarkers for humans and mice were generated based on a literature survey. Subsequently, differentially expressed (log2foldchange and adjPvalue <0.5) frailty biomarkers genes were identified from the following two human datasets (OSD-52 and 195) and seven mice datasets (OSD-21, 99, 101, 103, 104, 105, and 202). Subsequently, to generate the venn diagram, R language v4.1.073 was used, through the library ggplot2 v3.4.071. It was extracted from the table of frailty genes, the genes unique to mice and humans and the genes in common for the two (the intersections). After generating the diagram, a list of genes was extracted and manually added to the diagram. To generate the two upset plot graphs (for mice genes and human genes), the R language73 was also used, using the library ggplot2 v3.4.071 and ComplexUpset v1.3.574. Finally, to generate the heatmap, the python 3 v3.10.0 language was used, through the Seaborn v0.12.1 and Clustermap v0.11.12 libraries. Two heatmaps were generated (one for the mice genes and the other for the human genes), relating the genes to the respective OSD datasets. To complement the heatmap analyses, a dendrogram has also been added for each graph.
Inspiration4 (i4) sample collection
Inspiration4 was the world’s first all-civilian mission to orbit Earth. Four civilians, two males and two females, spent three days in low-Earth orbit (LEO) at 585 km above Earth. The mission launched from NASA Kennedy Space Center on September 15th, 2021 and splashed down in the Atlantic Ocean near Cape Canaveral on September 18th, 2021. Several human related experiments were carried out to study the effects of spaceflight on human health and performance in collaboration with SpaceX, the Translational Research Institute for Space Health (TRISH) at Baylor College of Medicine (BCM), and Weill Cornell Medicine. The experiments carried out on the Inspiration4 crew-members were performed in accordance with the relevant guidelines at the principal investigators’ institutions. Moreover, the different study designs and the corresponding methods to collect and analyze the biological samples were approved by the corresponding institutional IRB. All biological data derived from the Inspiration4 mission were collected pre and post flights. For this study, only data from blood samples were used. Pre-flight samples were collected at L-92, L-44, and L-3 days prior to launch to space. Upon return, post-flight samples were collected at R+1, R+45, and R+82 days.
i4 PBMC Single cell sequencing and analysis
Blood samples were collected before (Pre-launch: L-92, L-44, and L-3) and after (Return; R+1, R+45, and R+82) the spaceflight. Chromium Next GEM Single Cell 5’ v2, 10x Genomics was used to generate single cell data from isolated PBMCs. Subpopulations were annotated based on Azimuth human PBMC reference75. Dot plots are generated by the Seurat R package.
JAXA Cell-Free Epigenome (CFE) Study RNA quantification data
Aggregated RNA differential expression data and study protocols were shared through NASA’s Open Science Data Repository with accession number: OSD-53029. Plasma cell-free RNA samples for RNA-seq analysis were derived from blood samples collected from 6 astronauts before, during, and after spaceflight on the ISS. Mean expression values were obtained from normalized read counts of 6 astronauts for each time point. Heatmaps were made for the frailty genes on the normalized values per time point using R package pheatmap version 1.0.12.
Quantification of genes associated with sarcopenia
The curated short gene list of predictors of sarcopenia was obtained by analyzing a population of age matched individuals with and without sarcopenia using the expression superseries GSE111017. This list contains 21 genes that can predict sarcopenia in patients with an accuracy >75%. The method for obtaining the gene list was performed as previously described41. Here, we used GOstat 2022-11-03 on R version 4.2 to find enriched Biological Process and Molecular Function terms (Fig A and C). Using the obtained GO terms we found that some of the Frailty linked genes from JAXA CFE were also part of some of the same processes (Fig B and D). The figure was created using the GOplot package 1.0.2 and modified to display the MAS for each gene. Finally, to determine the relevance of the shared 21 genes between all gene lists (Fig E), we assessed their expression in the mice datasets OSD-21, OSD-52, OSD-91, OSD-195, and OSD-202. Finally, the heatmap was generated as before using python 3 v3.10.0 language, through the Seaborn v0.12.1 and Clustermap v0.11.12 libraries and a dendrogram has also been added for each graph.
Metabolic Flux Simulation Analysis
We ran metabolic flux simulation on all available metabolic reactions for each RNA sequencing sample while applying our custom-made context-specific constraint-based metabolic modeling approach significantly updated from27,76,77. This updated simulation model was constructed based on the RECON3D metabolic model78 whose metabolic reactions were subsetted through CORDA79 and Cobrapy80 with gene-reaction-rule, where enzyme expression levels are linked to their corresponding metabolic reactions. For CORDA, flux confidence parameters ranging from 3 to 1 was decided as; i) 55%, 25%, and 20% for OSD-91; ii) 45%, 40%, and 15% for OSD-127, in order to minimize standard deviation of flux-levels across samples from each group for all groups. Also, several essential pathways such as ‘Glycolysis/Gluconeogenesis’, ‘CoA Synthesis’, ‘CoA Catabolism’, ‘NAD Metabolism’, ‘Fatty Acid Synthesis’, ‘Fatty Acid Oxidation’, and ‘Biomass and Maintenance Functions’ were manually activated for the simulation model stability. Note that ‘Oxidative Phosphorylation’ and ‘Citric Acid Cycle’ were not included in the essential pathways to take into account mitochondrial dysregulation. Our metabolic model optimized for NAD biosynthesis by Gurobi solver as LP (Gurobi Optimization, L. L. C., 2020) whose reference manual can be found in https://www.gurobi.com. NAD biosynthesis capacity through all pathways of human was computed by estimating optimized ‘NAD sink’ reaction flux level where the sink was defined as large pools of metabolites importing metabolites from/to the system. While gene expression levels were applied into the model by each sample, the other parameters were maintained identically for all simulation, where an environment i.e., Jupyter notebook 6.1.5 on Python 3.7.7 was utilized. The simulation reported outcomes as flux levels of all available reactions through the custom-made flux balance analysis (FBA) analyses and the levels were analyzed as grouped variables for comparison between ‘Flight’ and ‘Ground’ groups. Since it is impossible to presume variance or normality between ‘Flight’ and ‘Ground’, non-parametric Van der Waerden (VdW) test was applied to properly compare their flux levels using the R matrixTests package (v. 0.1.9). The comparison is illustrated as heatmaps with row-wise Z-scores on flux levels per each reaction in Figure 7.