Publications

2023

A fair experimental comparison of neural network architectures for latent representations of multi-omics for drug response prediction

Recent years have seen a surge of novel neural network architectures for the integration of multi-omics data for prediction. Most of these architectures include either encoders alone or encoders and decoders, i.e., autoencoders of various sorts, to transform multi-omics data into latent representations. One important parameter is the depth of integration: the point at which the latent representations are computed or merged, which can be early, intermediate, or late. The literature on integration methods is growing steadily; however, close to nothing is known about the relative performance of these methods under fair experimental conditions and under consideration of different use cases.

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-023-05166-7
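As a rough illustration of integration depth, the sketch below shows where the omics layers are merged in early, intermediate and late integration. It is a minimal data-flow sketch, not the paper's comparison framework: the encoders and predictors are hypothetical stand-ins (plain Python functions) for trained neural networks, and averaging the per-layer predictions is only one of several possible late-integration rules.

```python
from typing import Callable, Dict, List

Vector = List[float]
Encoder = Callable[[Vector], Vector]   # hypothetical stand-in for a trained encoder network
Predictor = Callable[[Vector], float]  # hypothetical stand-in for a trained prediction head


def early_integration(sample: Dict[str, Vector], predictor: Predictor) -> float:
    # Merge first: concatenate the raw features of all omics layers, then predict once.
    merged = [x for layer in sorted(sample) for x in sample[layer]]
    return predictor(merged)


def intermediate_integration(sample: Dict[str, Vector],
                             encoders: Dict[str, Encoder],
                             predictor: Predictor) -> float:
    # Encode each layer separately, merge the latent representations, then predict once.
    latent = [z for layer in sorted(sample) for z in encoders[layer](sample[layer])]
    return predictor(latent)


def late_integration(sample: Dict[str, Vector],
                     predictors: Dict[str, Predictor]) -> float:
    # Predict per layer and merge only the outputs (here: a simple average).
    outputs = [predictors[layer](sample[layer]) for layer in sorted(sample)]
    return sum(outputs) / len(outputs)


# Toy usage with hypothetical layers and trivial "models".
sample = {"expression": [1.0, 2.0, 3.0], "mutation": [0.0, 1.0]}
encoders = {"expression": lambda v: [sum(v) / len(v)], "mutation": lambda v: [max(v)]}
print(early_integration(sample, sum))                                  # 7.0
print(intermediate_integration(sample, encoders, sum))                 # 3.0
print(late_integration(sample, {"expression": sum, "mutation": sum}))  # 3.5
```

The only thing that changes between the three functions is the point at which information from the layers is combined: raw features, latent vectors, or final predictions.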
2023

Overview, evaluation, and development of group variable selection methods for knowledge integration, Talk, International Biometric Society: Austria-Switzerland Region

Author: Buch, G.


Introduction:
Many datasets have a natural group structure due to high correlations or contextual similarities of variables. Group variable selection methods can account for such structure in the selection process to identify variables that are related to each other and share a common and traceable relationship with the response variable. There is a need for a systematic review of implemented approaches, a fair comparison and evaluation of these techniques, and the development of improved methods for omics datasets.

Methods:
A systematic literature search was conducted to identify group variable selection methods implemented in R. The selection performance of the identified methods was evaluated in simulation studies. A subset of the best-performing approaches was used to select predictors for time-to-event, regression, and classification tasks in bootstrap samples from a prospective cohort study (MyoVasc; NCT04064450). The Sparse Group Exponential Penalty (SGE) was proposed to address the limitations of existing methods. Its performance was compared with established techniques in simulation studies, and it was applied to data from a randomized clinical trial (EmDia; NCT02932436).


Results:
The systematic review identified 14 methods, which were classified into knowledge-driven and data-driven approaches. The first category includes group-level and bi-level selection methods, while two-step and collinearity-tolerant approaches constitute the second. Simulation studies showed an advantage of bi-level selection methods over the other approaches. In the real-world setting, the bi-level selection methods also outperformed the traditional LASSO, as they treated the variables of a group consistently and selected correlated variables together. SGE demonstrated superior variable and group selection in almost all settings where the number of observations exceeded the number of variables. When there were fewer observations than variables, SGE was the best bi-level selection method when only a few groups contained predictive signal.

Conclusions:
A variety of methods can incorporate a group structure of predictors in the selection process. The choice of the most appropriate method depends on the specific research question and demands careful consideration. Bi-level selection methods, particularly the SGE, appear promising for exploratory analysis of grouped omics data, such as lipidomics data.

2023

A systematic review and evaluation of statistical methods for group variable selection

This review condenses the knowledge on variable selection methods implemented in R and appropriate for datasets with grouped features. The focus is on regularized regressions identified through a systematic review of the literature, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. A total of 14 methods are discussed, most of which use penalty terms to perform group variable selection. Depending on how the methods account for the group structure, they can be classified into knowledge-driven and data-driven approaches. The former encompass group-level and bi-level selection methods, while two-step approaches and collinearity-tolerant methods constitute the latter. The identified methods are briefly explained and their performance is compared in a simulation study. This comparison demonstrated that group-level selection methods, such as the group minimax concave penalty, are superior to other methods in selecting relevant variable groups but are inferior in identifying important individual variables in scenarios where not all variables in the groups are predictive. Individual variables are better identified by bi-level selection methods such as group bridge. Two-step and collinearity-tolerant approaches such as elastic net and ordered homogeneity pursuit least absolute shrinkage and selection operator are inferior to knowledge-driven methods but provide results without requiring prior knowledge. Possible applications in proteomics are considered, leading to suggestions on which method to use depending on existing prior knowledge and the research question.

https://pubmed.ncbi.nlm.nih.gov/36546512/
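The difference between group-level and bi-level selection can be seen in a single proximal (shrinkage) step. The sketch below assumes the sparse group lasso penalty, λ((1 − α)√p_g‖β_g‖₂ + α‖β_g‖₁), as a stand-in; it is not the group MCP or group bridge penalty discussed in the review. With α = 0 it reduces to the group lasso, which keeps or drops a group as a whole; with 0 < α < 1 it can additionally set individual coefficients inside a retained group to zero. The function names and example coefficients are hypothetical.

```python
import math
from typing import List


def soft_threshold(x: float, t: float) -> float:
    # Lasso shrinkage: move x toward zero by t, clipping at zero.
    return math.copysign(max(abs(x) - t, 0.0), x)


def prox_sparse_group_lasso(beta_g: List[float], lam: float, alpha: float) -> List[float]:
    """One proximal step (unit step size) for a single group under the penalty
    lam * ((1 - alpha) * sqrt(p_g) * ||beta_g||_2 + alpha * ||beta_g||_1)."""
    # Step 1 (within-group sparsity): element-wise soft-thresholding by lam * alpha.
    u = [soft_threshold(b, lam * alpha) for b in beta_g]
    # Step 2 (group sparsity): shrink the whole group, or drop it entirely.
    norm = math.sqrt(sum(v * v for v in u))
    group_threshold = lam * (1.0 - alpha) * math.sqrt(len(beta_g))
    if norm <= group_threshold:
        return [0.0] * len(beta_g)
    scale = 1.0 - group_threshold / norm
    return [scale * v for v in u]


strong_group = [3.0, 0.2, -0.1]  # hypothetical: one predictive variable, two weak ones
weak_group = [0.2, -0.1]

group_lasso = prox_sparse_group_lasso(strong_group, lam=0.5, alpha=0.0)
bi_level = prox_sparse_group_lasso(strong_group, lam=0.5, alpha=0.5)
print(group_lasso)  # all three coefficients stay non-zero (shrunken)
print(bi_level)     # only the first coefficient survives
print(prox_sparse_group_lasso(weak_group, lam=0.5, alpha=0.0))  # whole group set to zero
```

This mirrors the trade-off described above: the group-level step cannot discard the two weak variables of a relevant group, whereas the bi-level step can.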
2023

Effects of empagliflozin on left ventricular diastolic function in addition to usual care in individuals with type 2 diabetes mellitus-results from the randomized, double-blind, placebo-controlled EmDia trial

Background:
The sodium-glucose co-transporter 2 inhibitor empagliflozin improves cardiovascular outcome in patients with type 2 diabetes mellitus (T2DM) and heart failure. Experimental studies suggest a direct cardiac effect of empagliflozin associated with an improvement in left ventricular diastolic function.

Methods:
In the randomized, double-blind, two-armed, placebo-controlled, parallel group trial EmDia, patients with T2DM and elevated left ventricular E/e´ ratio were enrolled and randomized 1:1 to receive empagliflozin 10 mg/day versus placebo. The primary endpoint was the change of left ventricular E/e´ ratio after 12 weeks of intervention.

Results:
A total of 144 patients with T2DM and an elevated left ventricular E/e´ ratio were enrolled (age 68.9 ± 7.7 years; 14.1% women; E/e´ ratio 9.61 [8.24/11.14]; left ventricular ejection fraction 58.9% ± 5.6%). After 12 weeks of intervention, empagliflozin resulted in a significantly greater decrease in the primary endpoint E/e´ ratio, by -1.18 (95% confidence interval (CI) -1.72/-0.65; P < 0.0001), compared with placebo. The beneficial effect of empagliflozin was consistent across all subgroups and also occurred in subjects with heart failure with preserved ejection fraction (n = 30). Additional effects of empagliflozin on body weight, HbA1c, uric acid, red blood cell count, hemoglobin, mean corpuscular hemoglobin, and hematocrit were detected (all P < 0.001). Approximately one-third of the reduction in E/e´ by empagliflozin could be explained by the variables examined.

Conclusions:
Empagliflozin improves diastolic function in patients with T2DM and elevated end-diastolic pressure. Since the positive effects were consistent in patients with and without heart failure with preserved ejection fraction, the data add mechanistic insight into the beneficial cardiovascular effect of empagliflozin.

https://pubmed.ncbi.nlm.nih.gov/36763159/
2023

Rationale and design of the effects of EMpagliflozin on left ventricular DIAstolic function in diabetes (EmDia) study

Abstract:
Background: Data from the EMPA-REG OUTCOME study have demonstrated a beneficial effect of the sodium-glucose cotransporter 2 inhibitor empagliflozin on cardiovascular outcome in patients with type 2 diabetes. The reduction in cardiovascular mortality and hospitalization due to heart failure might be partly explained by direct effects of empagliflozin on cardiac diastolic function. The EmDia trial investigates the short-term effects of empagliflozin compared with placebo on the left ventricular E/E' ratio as a surrogate of left ventricular diastolic function.

Methods:
EmDia is a single-center, randomized, double-blind, two-arm, placebo-controlled, parallel-group phase IV study. Individuals with type 2 diabetes mellitus (T2DM) are randomized 1:1 to receive empagliflozin 10 mg per day or placebo for 12 weeks. The main inclusion criteria are a diagnosis of T2DM with stable glucose-lowering and/or dietary treatment, an elevated HbA1c level (6.5-10.0% if receiving glucose-lowering therapy, or 6.5-9.0% if drug-naïve), and diastolic cardiac dysfunction with left ventricular E/E' ≥ 8. The primary end point is the difference in the change in the E/E' ratio between treatment groups after 12 weeks. Secondary end points include the effect of empagliflozin on left ventricular systolic function, measures of vascular structure and function, and humoral cardiovascular biomarkers (i.e., brain natriuretic peptide, troponin, C-reactive protein). In addition, the multidimensional biodatabase enables exploratory analyses of molecular biomarkers to gain insights into possible mechanisms of the effects of empagliflozin on human health in a systems medicine-oriented, multi-omics approach.

Conclusion:
By evaluating the short-term effect of empagliflozin with a comprehensive biobanking program, the EmDia Study offers an opportunity to primarily assess the effects on diastolic function but also to examine effects on clinical and molecular cardiovascular traits.

https://pubmed.ncbi.nlm.nih.gov/34939776/
2023

Unsupervised clustering of venous thromboembolism patients by clinical features at presentation identifies novel endotypes that improve prognostic stratification

Background:
Individuals with acute venous thromboembolism (VTE) constitute a heterogeneous group of patients with diverse clinical characteristics and outcomes.

Objectives:
To identify endotypes of individuals with acute VTE based on clinical characteristics at presentation through unsupervised cluster analysis and to evaluate their molecular proteomic profile and clinical outcome.

Methods:
Data from 591 individuals from the Genotyping and Molecular phenotyping of Venous thromboembolism (GMP-VTE) project were explored. Hierarchical clustering was applied to 58 variables to define VTE endotypes. Clinical characteristics, three-year incidence of thromboembolic events or death, and acute-phase plasma proteomics were assessed.

Results:
Four endotypes were identified, exhibiting different patterns of clinical characteristics and clinical course. Endotype 1 (n = 300), comprising older individuals with comorbidities, had the highest incidence of thromboembolic events or death (HR [95% CI]: 3.76 [1.96-7.19]), followed by endotype 4 (n = 127) (HR [95% CI]: 2.55 [1.26-5.16]), characterised by men with a history of VTE and provoking risk factors, and endotype 3 (n = 57) (HR [95% CI]: 1.57 [0.63-3.87]), composed of young women with provoking risk factors, vs. reference endotype 2 (n = 107). The reference endotype comprised individuals diagnosed with PE without comorbidities, who had the lowest incidence of the investigated endpoint. Differentially expressed proteins associated with the endotypes were related to distinct biological processes, supporting differences in molecular pathophysiology. The endotypes had superior prognostic ability compared with existing risk stratifications such as provoked vs. unprovoked VTE and D-dimer levels.

Conclusion:
Four endotypes of VTE were identified by unsupervised phenotype-based clustering that diverge in clinical outcome and plasmatic protein signature. This approach might support the future development of individualized treatment in VTE.

https://pubmed.ncbi.nlm.nih.gov/37202285/
2023

Much higher prevalence of keratoconus than announced: results of the Gutenberg Health Study (GHS)

Keratoconus has been regarded as a rare corneal disease, with a prevalence previously estimated at 1:2000. The aim of our study was to investigate the prevalence of keratoconus in a large German cohort and to evaluate possible associated factors.

Method:
In the Gutenberg Health Study, a population-based, prospective, monocentric cohort study, 12,423 subjects aged 40-80 years were examined at the 5-year follow-up. Subjects underwent detailed medical history taking and general and ophthalmologic examinations, including Scheimpflug imaging. Keratoconus diagnosis was performed in two steps: all subjects with conspicuous TKC analysis of corneal tomography were included in further grading. Prevalence and 95% confidence intervals were calculated. Logistic regression analysis was carried out to investigate associations with age, sex, BMI, thyroid hormone, smoking, diabetes, arterial hypertension, atopy, allergy, steroid use, sleep apnea, asthma, and depression.

Results:
Of 10,419 subjects, 75 eyes of 51 subjects were classified as having keratoconus. The prevalence for keratoconus in the German cohort was 0.49% (1:204; 95% CI: 0.36-0.64%) and was approximately equally distributed across the age decades. No gender predisposition could be demonstrated. Logistic regression showed no association between keratoconus and age, sex, BMI, thyroid hormone, smoking, diabetes, arterial hypertension, atopy, allergy, steroid use, sleep apnea, asthma, and depression in our sample.

Conclusion:
Using the latest technologies (Scheimpflug imaging), we found that the prevalence of keratoconus in a mainly Caucasian population is approximately tenfold higher than previously reported in the literature. Contrary to previous assumptions, we did not find associations with sex, existing atopy, thyroid dysfunction, diabetes, smoking, or depression.

https://pubmed.ncbi.nlm.nih.gov/37314521/
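The headline figures follow from simple arithmetic on the reported counts; prevalence is counted per subject (51 subjects), not per eye (75 eyes). The sketch below reproduces the point estimate and the 1:204 ratio. The abstract does not state which interval method was used, so the Wilson score interval here is an assumption; it gives bounds close to, but not necessarily identical with, the reported 0.36-0.64%.

```python
import math
from typing import Tuple


def prevalence_with_wilson_ci(cases: int, n: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (prevalence, lower, upper) using the Wilson score interval."""
    p = cases / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, center - half_width, center + half_width


# Counts from the abstract: 51 affected subjects among 10,419 analysed subjects.
p, lower, upper = prevalence_with_wilson_ci(51, 10_419)
print(f"prevalence {100 * p:.2f}% (about 1:{round(1 / p)})")
print(f"Wilson 95% CI {100 * lower:.2f}-{100 * upper:.2f}%")
```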
2023

Ionmob: A Python Package for Prediction of Peptide Collisional Cross-Section Values

Abstract
Motivation

Including ion mobility separation (IMS) in mass spectrometry proteomics experiments is useful to improve coverage and throughput. Many IMS devices enable linking the experimentally derived mobility of an ion to its collisional cross-section (CCS), a highly reproducible physicochemical property that depends on the ion’s mass, charge and conformation in the gas phase. Thus, known peptide ion mobilities can be used to tailor acquisition methods or to refine database search results. The large space of potential peptide sequences, driven also by post-translational modifications (PTMs) of amino acids, motivates an in silico predictor for peptide CCS. Recent studies explored the general performance of various machine-learning techniques; however, the workflow engineering part was of secondary importance. For the sake of applicability, such a tool should be generic and data-driven, and should be easy to adapt to individual workflows for experimental design and data processing.

Results
We created ionmob, a Python-based framework for data preparation, training, and prediction of collisional cross-section values of peptides. It is easily customizable and includes a set of pretrained, ready-to-use models and preprocessing routines for training and inference. Using a set of ≈21,000 unique phosphorylated peptides and ≈17,000 MHC ligand sequence and charge state pairs, we expand the space of peptides that can be integrated into CCS prediction. Lastly, we investigate the applicability of in silico predicted CCS to increase confidence in identified peptides by applying re-scoring methods and demonstrate that predicted CCS values complement existing predictors for that task.

Availability
The Python package is available on GitHub: https://github.com/theGreatHerrLebert/ionmob.

https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btad486/7237255
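The link between an ion's mobility and its CCS mentioned in the motivation is commonly made with the Mason-Schamp equation, CCS = (3ze / 16N₀) · √(2π / (μ k_B T)) · 1/K₀, where μ is the reduced mass of the ion and the drift gas. The sketch below converts a reduced inverse mobility (1/K₀ in V·s/cm², the quantity reported by trapped ion mobility instruments) into a CCS in Å². It illustrates the physics only and is not part of ionmob's API; the nitrogen drift gas, the 305 K temperature and the approximation of the ion mass as m/z × z are assumptions.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # C
BOLTZMANN = 1.380649e-23             # J/K
DALTON = 1.66053906660e-27           # kg
LOSCHMIDT = 2.6867811e25             # gas number density at 273.15 K and 101.325 kPa, m^-3


def ccs_from_inverse_mobility(inv_k0: float, mz: float, charge: int,
                              gas_mass_da: float = 28.0134,
                              temperature_k: float = 305.0) -> float:
    """Mason-Schamp CCS in square angstroms from reduced inverse mobility 1/K0 (V*s/cm^2).

    Assumptions: nitrogen drift gas (28.0134 Da), 305 K, ion mass approximated as m/z * z.
    """
    ion_mass_da = mz * charge
    reduced_mass_kg = ion_mass_da * gas_mass_da / (ion_mass_da + gas_mass_da) * DALTON
    k0_si = (1.0 / inv_k0) * 1e-4  # cm^2/(V*s) -> m^2/(V*s)
    prefactor = 3.0 * charge * ELEMENTARY_CHARGE / (16.0 * LOSCHMIDT)
    thermal = math.sqrt(2.0 * math.pi / (reduced_mass_kg * BOLTZMANN * temperature_k))
    return prefactor * thermal / k0_si * 1e20  # m^2 -> square angstroms


# Hypothetical doubly charged peptide ion at m/z 500 with 1/K0 = 1.0 V*s/cm^2.
print(round(ccs_from_inverse_mobility(1.0, 500.0, 2), 1))
```

For a fixed m/z and charge the CCS scales linearly with 1/K₀, which is why a predicted CCS can be turned back into an expected mobility window for re-scoring.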
2023

Lipid-focused cardiovascular disease research: trends and opportunities (submitted to Journal of Clinical Medicine).

Anyaegbunam, A., P. More, J.F. Fontaine, V. ten Cate, K. Bauer, U. Distler, E. Araldi, L. Bindila, P. Wild and M.A. Andrade-Navarro.

https://www.scirp.org/journal/ijcm/
2023

Four-dimensional trapped ion mobility spectrometry lipidomics for high throughput clinical profiling of human blood samples

Lipidomics encompassing automated lipid extraction, a four-dimensional (4D) feature selection strategy for confident lipid annotation as well as reproducible and cross-validated quantification can expedite clinical profiling. Here, we determine 4D descriptors (mass to charge, retention time, collision cross section, and fragmentation spectra) of 200 lipid standards and 493 lipids from reference plasma via trapped ion mobility mass spectrometry to enable the implementation of stringent criteria for lipid annotation. We use 4D lipidomics to confidently annotate 370 lipids in reference plasma samples and 364 lipids in serum samples, and reproducibly quantify 359 lipids using level-3 internal standards. We show the utility of our 4D lipidomics workflow for high-throughput applications by reliable profiling of intra-individual lipidome phenotypes in plasma, serum, whole blood, venous and finger-prick dried blood spots.

https://www.nature.com/articles/s41467-023-36520-1
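The idea of stringent annotation, accepting a lipid only when all four descriptors agree with a reference, can be sketched as a tolerance match. The tolerances (5 ppm for m/z, 0.2 min for retention time, 1% for CCS, 20 ppm for fragments), the data structure and the library entries below are hypothetical placeholders chosen for illustration, not the criteria used in the paper; fragmentation spectra are simplified to a list of required fragment ions.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Lipid4D:
    name: str
    mz: float      # mass-to-charge ratio
    rt_min: float  # retention time in minutes
    ccs: float     # collision cross section in square angstroms
    fragments: List[float] = field(default_factory=list)  # fragment m/z values


def ppm_error(observed: float, reference: float) -> float:
    return (observed - reference) / reference * 1e6


def fragments_match(observed: Sequence[float], required: Sequence[float], ppm_tol: float) -> bool:
    # Every required reference fragment must be found among the observed fragments.
    return all(any(abs(ppm_error(o, r)) <= ppm_tol for o in observed) for r in required)


def annotate(feature: Lipid4D, library: Sequence[Lipid4D], mz_ppm: float = 5.0,
             rt_tol_min: float = 0.2, ccs_pct: float = 1.0, frag_ppm: float = 20.0) -> List[str]:
    """Return the names of all library entries that match the feature in all four dimensions."""
    return [
        ref.name for ref in library
        if abs(ppm_error(feature.mz, ref.mz)) <= mz_ppm
        and abs(feature.rt_min - ref.rt_min) <= rt_tol_min
        and abs(feature.ccs - ref.ccs) / ref.ccs * 100.0 <= ccs_pct
        and fragments_match(feature.fragments, ref.fragments, frag_ppm)
    ]


# Hypothetical library: two entries that differ only in CCS.
library = [
    Lipid4D("lipid_A", 760.5851, 12.30, 285.0, [184.0733]),
    Lipid4D("lipid_B", 760.5851, 12.30, 300.0, [184.0733]),
]
feature = Lipid4D("unknown", 760.5860, 12.35, 286.0, [184.0735, 500.0])
print(annotate(feature, library))
```

In the example, m/z, retention time and fragments alone cannot separate the two candidates; the CCS dimension resolves the ambiguity.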
2023

DNA methylation and cardiovascular disease in humans: a systematic review and database of known CpG methylation sites

Background:
Cardiovascular disease (CVD) is the leading cause of death worldwide and is considered one of the most environmentally driven diseases. The role of DNA methylation, in response to individual exposures, in the development and progression of CVD is still poorly understood, and a synthesis of the evidence is lacking.

Results:
A systematic review of articles examining measurements of DNA cytosine methylation in CVD was conducted in accordance with PRISMA (preferred reporting items for systematic reviews and meta-analyses) guidelines. The search yielded 5,563 articles from PubMed and CENTRAL databases. From 99 studies with a total of 87,827 individuals eligible for analysis, a database was created combining all CpG-, gene- and study-related information. It contains 74,580 unique CpG sites, of which 1,452 CpG sites were mentioned in ≥ 2, and 441 CpG sites in ≥ 3 publications. Two sites were referenced in ≥ 6 publications: cg01656216 (near ZNF438) related to vascular disease and epigenetic age, and cg03636183 (near F2RL3) related to coronary heart disease, myocardial infarction, smoking and air pollution. Of 19,127 mapped genes, 5,807 were reported in ≥ 2 studies. Most frequently reported were TEAD1 (TEA Domain Transcription Factor 1) and PTPRN2 (Protein Tyrosine Phosphatase Receptor Type N2) in association with outcomes ranging from vascular to cardiac disease. Gene set enrichment analysis of 4,532 overlapping genes revealed enrichment for the Gene Ontology molecular function “DNA-binding transcription activator activity” (q = 1.65 × 10⁻¹¹) and the biological process “skeletal system development” (q = 1.89 × 10⁻²³). Gene enrichment demonstrated that general CVD-related terms are shared, while “heart” and “vasculature” specific genes have more disease-specific terms, such as PR interval for “heart” or platelet distribution width for “vasculature.” STRING analysis revealed significant protein–protein interactions between the products of the differentially methylated genes (p = 0.003), suggesting that dysregulation of the protein interaction network could contribute to CVD. Overlaps with curated gene sets from the Molecular Signatures Database showed enrichment of genes in hemostasis (p = 2.9 × 10⁻⁶) and atherosclerosis (p = 4.9 × 10⁻⁴).

Conclusion:
This review highlights the current state of knowledge on significant associations between DNA methylation and CVD in humans. An open-access database of reported CpG methylation sites, genes and pathways that may play an important role in this relationship has been compiled.

https://clinicalepigeneticsjournal.biomedcentral.com/articles/10.1186/s13148-023-01468-y
2023

Clinical profile and outcome of isolated pulmonary embolism: a systematic review and meta-analysis

Background:
Isolated pulmonary embolism (PE) appears to be associated with a specific clinical profile and sequelae compared to deep vein thrombosis (DVT)-associated PE. The objective of this study was to identify clinical characteristics that discriminate both phenotypes, and to characterize their differences in clinical outcome.

Methods:
We performed a systematic review and meta-analysis of studies comparing PE phenotypes. A systematic search of the electronic databases PubMed and CENTRAL was conducted, from inception until January 27, 2023. Exclusion criteria were irrelevant content, inability to retrieve the article, language other than English or German, the article comprising a review or case study/series, and inappropriate study design. Data on risk factors, clinical characteristics and clinical endpoints were pooled using random-effects meta-analyses.

Findings:
Fifty studies with 435,768 PE patients were included. In studies with a low risk of bias, 30% [95% CI 19–42%, I² = 97%] of PE were isolated. The Factor V Leiden [OR: 0.47, 95% CI 0.37–0.58, I² = 0%] and prothrombin G20210A mutations [OR: 0.55, 95% CI 0.41–0.75, I² = 0%] were significantly less prevalent among patients with isolated PE. Female sex [OR: 1.30, 95% CI 1.17–1.45, I² = 79%], recent invasive surgery [OR: 1.31, 95% CI 1.23–1.41, I² = 65%], a history of myocardial infarction [OR: 2.07, 95% CI 1.85–2.32, I² = 0%], left-sided heart failure [OR: 1.70, 95% CI 1.37–2.10, I² = 76%], peripheral artery disease [OR: 1.36, 95% CI 1.31–1.42, I² = 0%] and diabetes mellitus [OR: 1.23, 95% CI 1.21–1.25, I² = 0%] were significantly more frequently represented among isolated PE patients. In a synthesis of clinical outcome data, the risk of recurrent VTE in isolated PE was half that of DVT-associated PE [RR: 0.55, 95% CI 0.44–0.69, I² = 0%], while the risk of arterial thrombosis was nearly 3-fold higher [RR: 2.93, 95% CI 1.43–6.02, I² = 0%].

Interpretation:
Our findings suggest that isolated PE appears to be a specific entity that may signal a long-term risk of arterial thrombosis. Randomised controlled trials are necessary to establish whether alternative treatment regimens are beneficial for this patient subgroup.

https://www.thelancet.com/journals/eclinm/article/PIIS2589-5370(23)00150-5/fulltext
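The abstract states that data were pooled with random-effects meta-analyses but not which estimator was used. As an assumption, the sketch below uses the common DerSimonian-Laird estimator on log odds ratios and also returns the I² heterogeneity statistic reported throughout the findings. The inputs in the example are made-up study estimates, not data from the meta-analysis.

```python
import math
from typing import Sequence, Tuple


def dersimonian_laird(log_effects: Sequence[float],
                      std_errors: Sequence[float]) -> Tuple[float, Tuple[float, float], float]:
    """Random-effects pooling of log odds ratios.

    Returns (pooled odds ratio, 95% CI as a tuple, I^2 in percent).
    """
    w = [1.0 / se ** 2 for se in std_errors]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_effects)) / sum_w
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_effects))  # Cochran's Q
    df = len(log_effects) - 1
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]  # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, log_effects)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    ci = (math.exp(pooled - 1.96 * se_pooled), math.exp(pooled + 1.96 * se_pooled))
    return math.exp(pooled), ci, i2


# Made-up example: three consistent studies, then two conflicting ones.
consistent = dersimonian_laird([math.log(2.0)] * 3, [0.1, 0.2, 0.3])
conflicting = dersimonian_laird([math.log(0.5), math.log(2.0)], [0.1, 0.1])
print(consistent)
print(conflicting)
```

When the studies agree, τ² and I² are zero and the result equals a fixed-effect analysis; when they conflict, τ² grows, the study weights become more equal and the confidence interval widens.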
2023

Tinnitus Prevalence in the Adult Population—Results from the Gutenberg Health Study

Abstract
Background and Objectives: Tinnitus is a common symptom in medical practice, although data on its prevalence vary. The underlying pathophysiological mechanism is still not fully understood, but hearing loss is thought to be an important risk factor for the occurrence of tinnitus. The aim of this study was to assess tinnitus prevalence in a large German cohort and to determine its dependence on hearing impairment. Materials and Methods: The Gutenberg Health Study (GHS) is a population-based cohort study representative of the population of Mainz and its district. Participants were asked whether they suffer from tinnitus and how much they are burdened by it. Extensive audiological examinations using bone and air conduction were also performed. Results: 4942 participants (mean age: 61.0 years; 2550 men and 2392 women) were included in the study. The overall prevalence of tinnitus was 26.1%. Men were affected significantly more often than women. The prevalence of tinnitus increased with age, peaking at ages 75 to 79 years. Considering only annoying tinnitus, the prevalence was 9.8%. Logistic regression showed that participants with severe to complete hearing loss (>65 dB) were more likely to have tinnitus. Conclusions: Tinnitus is a common symptom, and given demographic changes, its prevalence is expected to increase.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10052845/
2023

Effect of Empagliflozin Compared to Placebo on the Plasma Lipidome in Patients with Type 2 Diabetes Mellitus – Results from the EmDia Trial, Poster, DGK

Author: Araldi E., Bauer K., Koeck T., Buch G., Baker D., Lerner R., Tenzer S., Andrade-Navarro M. A., Rapp S., ten Cate V., Nuber M., Lackner K. J., Daiber A., Münzel T., Wild P. S., Bindila L., Prochaska J. H.

Background:
Empagliflozin has recently emerged as an effective treatment to reduce the risk of cardiovascular death and hospitalization in patients with heart failure (HF). However, the molecular changes driven by Empagliflozin treatment and responsible for the amelioration of cardiac parameters are largely unknown. In this work, the effects of Empagliflozin on the lipidome of patients with heart failure and type 2 diabetes mellitus (T2DM) were investigated.

Methods:
Samples were obtained from the EmDia study (NCT02932436, 144 participants). Participants with T2DM and elevated left ventricular end-diastolic pressure at baseline (as measured via the lateral E/e´ ratio) were randomized 1:1 to receive Empagliflozin or placebo. Identical (sub)clinical and molecular characterization, including biodata banking, was performed at baseline and after 12 weeks of intervention. Lipids were quantified by mass spectrometry in a 4D-LC/TIMS-IMS lipidomics approach at both time points. The lipid signatures reflecting the effect of Empagliflozin treatment were identified using sparse group LASSO regularized regression. Lipids that mapped to clinical features altered by Empagliflozin treatment were investigated with linear regressions.

Results:
Sparse group LASSO regularized regression selected a signature of 27 lipids (at least 90% sample coverage) across several lipid classes with significantly different abundance under Empagliflozin vs. placebo treatment. Approximately 74% of lipids in the Empagliflozin lipid signature (N=20) were associated with at least one clinical feature affected by Empagliflozin. In particular, changes in the Empagliflozin lipid signature significantly explained changes in E/E’, the primary endpoint of the study (estimate -0.45, p-value < 0.01), and changes in secondary endpoints (BMI, HbA1c, hemoglobin, erythrocyte counts, uric acid, eGFR). Within each lipid class (ceramides, sphingomyelins, etc.), virtually all lipids showed a consistent relation to Empagliflozin treatment and a consistent association with the primary endpoint (E/E’ ratio) and secondary endpoints (red blood cell count, hemoglobin, BMI, HbA1c, and uric acid).

Conclusions:
The analysis of the lipidome of Empagliflozin or placebo treated participants of the EmDia study provided insights into putative molecular mechanisms of action of Empagliflozin through modulating lipids which might contribute to an improvement of clinical features.

https://dgk.org/kongress_programme/jt2023/aP542.html
2023

A Fair Experimental Comparison of Neural Network Architectures for Latent Representations of Multi-Omics for Drug Response Prediction, Talk and Poster, ISCB

Author: Hauptmann T. and Kramer S.

Recent years have seen a surge of novel neural network architectures for multi-omics integration. One important parameter is the integration depth: the point at which the latent representations are computed or merged, which can be early, intermediate, or late. The literature on integration methods is growing steadily; however, close to nothing is known about the relative performance of these methods under fair experimental conditions and under consideration of different use cases. We developed a comparison framework that trains multi-omics integration methods under equal conditions. We incorporated four recent deep learning methods, early integration, PCA, and a novel method, Omics Stacking, that combines the advantages of intermediate and late integration. Experiments were conducted on a drug response data set with multiple omics data. Our experiments confirmed that early integration has the lowest predictive performance. Overall, statistically significant differences were rarely observed; however, in terms of the average ranks of methods, Super.FELT performed best in a cross-validation setting and Omics Stacking best on the external test set. When faced with a new data set, Super.FELT is a good option in the cross-validation setting, and Omics Stacking in the external test set setting.

https://www.iscb.org/ismbeccb2023-programme/tracks/general-computational-biology
2023

midiaPASEF maximizes information content in data-independent acquisition proteomics

Abstract:

Data-independent acquisition (DIA) approaches provide comprehensive records of all detectable precursor and fragment ions. Here we introduce midiaPASEF, a novel DIA scan mode using mobility-specific micro-encoding of overlapping quadrupole windows to optimally cover the ion population in the ion mobility/mass-to-charge plane. Using overlapping ion mobility-encoded quadrupole windows, midiaPASEF maximizes information content in DIA acquisitions, which enables the determination of the precursor m/z of each fragment ion with a precision of less than 2 Th. The Snakemake-based MIDIAID pipeline integrates algorithms for multidimensional peak detection and for machine-learning-based classification of precursor-fragment relationships. The MIDIAID pipeline enables fully automated processing and multidimensional deconvolution of midiaPASEF files and exports highly specific DDA-like MS/MS spectra, which are suitable for de novo sequencing and can be searched directly with established tools including PEAKS, FragPipe and Mascot. midiaPASEF acquisition identifies over 40 unique peptides per second and provides powerful library-free DIA analyses, including of phosphopeptidome and immunopeptidome samples.

https://www.biorxiv.org/content/10.1101/2023.01.30.526204v1.full
2023

Disturbed Plasma Lipidomic Profiles in Females with Diffuse Large B-Cell Lymphoma: A Pilot Study

Abstract

Lipidome dysregulation is a hallmark of cancer and inflammation. The global plasma lipidome and the sub-lipidome of inflammatory pathways have not been reported in diffuse large B-cell lymphoma (DLBCL). In a pilot study of plasma lipid variation in female DLBCL patients and BMI-matched disease-free controls, we performed targeted lipidomics using LC-MRM to quantify lipid mediators of inflammation and immunity, and those known or hypothesised to be involved in cancer progression: sphingolipids, resolvin D1, arachidonic acid (AA)-derived oxylipins, such as hydroxyeicosatetraenoic acids (HETEs) and dihydroxyeicosatrienoic acids, along with their membrane structural precursors. We report on the role of the eicosanoids in the separation of DLBCL from controls, along with lysophosphatidylinositol LPI 20:4, implying notable changes in lipid metabolic and/or signalling pathways, particularly pertaining to the AA lipoxygenase pathway and glycerophospholipid remodelling in the cell membrane. We propose the set of S1P, SM 36:1, SM 34:1 and PI 34:1 as a DLBCL lipid signature that could serve as a basis for prospective validation in larger DLBCL cohorts. Additionally, untargeted lipidomics indicates a substantial change in overall lipid metabolism in DLBCL. Plasma lipid profiling of DLBCL patients helps to better understand the specific lipid dysregulations and pathways in this cancer.

https://www.mdpi.com/2072-6694/15/14/3653
2023

Circulating microRNAs predict recurrence and death following venous thromboembolism.

Background: Recurrent events frequently occur after venous thromboembolism (VTE) and remain difficult to predict based on established genetic, clinical, and proteomic contributors. The role of circulating microRNAs (miRNAs) has yet to be explored in detail.

Objectives: To identify circulating miRNAs predictive of recurrent VTE or death, and to interpret their mechanistic involvement.

Methods: Data from 181 participants of a cohort study of acute VTE and 302 individuals with a history of VTE from a population-based cohort were investigated. Next-generation sequencing was performed on EDTA plasma samples to detect circulating miRNAs. The endpoint of interest was recurrent VTE or death. Penalized regression was applied to identify an outcome-relevant miRNA signature, and results were validated in the population-based cohort. The involvement of miRNAs in coregulatory networks was assessed using principal component analysis, and the associated clinical and molecular phenotypes were investigated. Mechanistic insights were obtained from target gene and pathway enrichment analyses.

Results: A total of 1950 miRNAs were detected across cohorts after postprocessing. In the discovery cohort, 50 miRNAs were associated with recurrent VTE or death (cross-validated C-index, 0.65). A weighted miRNA score predicted outcome over an 8-year follow-up period (HR per SD, 2.39; 95% CI, 1.98-2.88; P < .0001). Twenty miRNAs were validated in the independent validation cohort (OR per SD for the score, 3.47; 95% CI, 2.37-5.07; P < .0001; cross-validated area under the curve, 0.61). Principal component analysis revealed 5 miRNA networks with distinct relationships to clinical phenotype and outcome. Mapping of target genes indicated regulation via transcription factors and kinases involved in signaling pathways associated with fibrinolysis.

Conclusion: Circulating miRNAs predicted the risk of recurrence or death after VTE over several years, both in the acute and chronic phases.

https://pubmed.ncbi.nlm.nih.gov/37481073/
2023

Gender Differences and the Impact of Partnership and Children on Quality of Life During the COVID-19 Pandemic

Abstract
Objectives: The COVID-19 pandemic and its protective measures have changed the daily lives of families and may have affected quality of life (QoL). The aim of this study was to analyze gender differences in QoL and to examine individuals living in different partnership and family constellations.

Methods: Data from the Gutenberg COVID-19 cohort study (N = 10,250) with two measurement time points during the pandemic (2020 and 2021) were used. QoL was assessed using the EUROHIS-QOL questionnaire. Descriptive analyses and autoregressive regressions were performed.

Results: Women reported lower QoL than men, and QoL was significantly lower at the second measurement time point in both men and women. Older age, male gender, no migration background, and higher socioeconomic status, as well as partnership and children (especially in men), were protective factors for QoL. Women living with children under 14 and single mothers reported significantly lower QoL.

Conclusion: Partnership and family were protective factors for QoL. However, women with young children and single mothers are vulnerable groups for lower QoL. Support is especially needed for women with young children.

https://pubmed.ncbi.nlm.nih.gov/37284508/
2023

Much higher prevalence of keratoconus than announced: results of the Gutenberg Health Study (GHS).

Abstract
Keratoconus appears to be a rare corneal disease with a prevalence previously estimated at 1:2000. The aim of our study was to investigate the prevalence of keratoconus in a large German cohort and to evaluate possible associated factors.

Method:
In the population-based, prospective, monocentric Gutenberg Health Study, 12,423 subjects aged 40-80 years were examined at the 5-year follow-up. Subjects underwent detailed medical history taking and a general and ophthalmologic examination including Scheimpflug imaging. Keratoconus diagnosis was performed in two steps: all subjects with conspicuous TKC analysis of corneal tomography were included in further grading. Prevalence and 95% confidence intervals were calculated. Logistic regression analysis was carried out to investigate associations with age, sex, BMI, thyroid hormone, smoking, diabetes, arterial hypertension, atopy, allergy, steroid use, sleep apnea, asthma, and depression.

Results:
Of 10,419 subjects, 75 eyes of 51 subjects were classified as having keratoconus. The prevalence for keratoconus in the German cohort was 0.49% (1:204; 95% CI: 0.36-0.64%) and was approximately equally distributed across the age decades. No gender predisposition could be demonstrated. Logistic regression showed no association between keratoconus and age, sex, BMI, thyroid hormone, smoking, diabetes, arterial hypertension, atopy, allergy, steroid use, sleep apnea, asthma, and depression in our sample.

Conclusion: Using the latest technology (Scheimpflug imaging), we found the prevalence of keratoconus in a mainly Caucasian population to be approximately tenfold higher than previously reported in the literature. Contrary to previous assumptions, we did not find associations with sex, existing atopy, thyroid dysfunction, diabetes, smoking, and depression.

https://pubmed.ncbi.nlm.nih.gov/37314521/
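The headline prevalence figures of the keratoconus study can be reproduced from the counts given in its Results (51 affected subjects out of 10,419). The abstract does not state which confidence-interval method was used; the Wilson score interval below is our assumption, so its bounds need not match the reported 0.36-0.64% exactly (exact binomial and other methods give slightly different bounds).

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (approximate 95% CI for z = 1.96)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


cases, subjects = 51, 10_419          # subjects (not eyes) with keratoconus / subjects analysed
prevalence = cases / subjects
low, high = wilson_interval(cases, subjects)

print(f"prevalence: {prevalence:.2%}")                   # prevalence: 0.49%
print(f"about 1 in {round(1 / prevalence)}")            # about 1 in 204
print(f"95% CI (Wilson): {low:.2%} to {high:.2%}")      # 95% CI (Wilson): 0.37% to 0.64%
print(f"ratio to 1:2000: {prevalence * 2000:.1f}-fold")  # ratio to 1:2000: 9.8-fold
```

The last line shows where "approximately tenfold" comes from: 1:204 versus the earlier estimate of 1:2000.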
2023

Plasma protein signatures for high on-treatment platelet reactivity to aspirin and clopidogrel in peripheral artery disease.

Abstract
Background:
A significant proportion of patients with peripheral artery disease (PAD) displays a poor response to aspirin and/or the platelet P2Y12 receptor antagonist clopidogrel. This phenomenon is reflected by high on-treatment platelet reactivity (HTPR) in platelet function assays in vitro and is associated with an increased risk of adverse cardiovascular events.

Objective:
This study aimed to elucidate specific plasma protein signatures associated with HTPR to aspirin and clopidogrel in PAD patients.

Methods and results:
Based on targeted plasma proteomics, 184 proteins from two cardiovascular Olink panels were measured in 105 PAD patients. VerifyNow ASPI- and P2Y12-test values were transformed into a continuous variable representing HTPR as a spectrum instead of HTPR defined by a cut-off level. Using the Boruta random forest algorithm, the importance for HTPR was confirmed for three plasma proteins in the aspirin group, six in the clopidogrel group and ten in the pooled group (clopidogrel or aspirin). Network analysis demonstrated clusters with CD84, SLAMF7, IL1RN and THBD for clopidogrel and with F2R, SELPLG, HAVCR1, THBD, PECAM1, TNFRSF10B, MERTK and ADM for the pooled group. F2R, TNFRSF10B and ADM were more highly expressed in Fontaine stage III patients than in Fontaine stage II patients, suggesting a relation to PAD severity.

Conclusions:
A plasma protein signature, including eight targets involved in proatherogenic dysfunction of blood cell-vasculature interaction, coagulation and cell death, is associated with HTPR (aspirin and/or clopidogrel) in PAD. These proteins may serve as important systems-based determinants of poor platelet responsiveness to aspirin and/or clopidogrel in PAD and other cardiovascular diseases and may help identify novel treatment strategies.

https://pubmed.ncbi.nlm.nih.gov/37708596/
2023

Discriminative machine learning for maximal representative subsampling

Abstract
Biased population samples pose a prevalent problem in the social sciences. To address it, we present two novel methods based on positive-unlabeled learning to mitigate bias. Both methods leverage auxiliary information from a representative data set and train machine learning classifiers to determine the sample weights. The first method, maximum representative subsampling (MRS), uses a classifier to iteratively remove instances from the biased data set, by assigning them a sample weight of 0, until it aligns with the representative one. The second method, Soft-MRS, is a variant of MRS that iteratively adapts sample weights instead of removing samples completely. To assess the effectiveness of our approach, we induced artificial bias in a public census data set and examined the corrected estimates. We compare the performance of our methods against existing techniques, evaluating the ability of sample weights created with Soft-MRS or MRS to minimize differences and improve downstream classification tasks. Lastly, we demonstrate the applicability of the proposed methods in a real-world study of resilience research, exploring the influence of resilience on voting behavior. Our work addresses the issue of bias in the social sciences, among other fields, and provides a versatile methodology for bias reduction based on machine learning. Based on our experiments, we recommend using MRS for downstream classification tasks and Soft-MRS for downstream tasks where the relative bias of the dependent variable is relevant.

https://www.nature.com/articles/s41598-023-48177-3
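The MRS loop described in this abstract can be illustrated with a minimal sketch built on our own simplifying assumptions, not the published implementation: a Laplace-smoothed frequency table over one categorical feature stands in for the machine learning classifier, one instance is removed per iteration, and the loop stops once the classifier's AUC for separating the retained biased sample from the representative sample falls to a tolerance `tol`. All function names and the tolerance are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def fit_scores(rep: Sequence[Hashable], biased: Sequence[Hashable],
               alpha: float = 1.0) -> Dict[Hashable, float]:
    """Toy classifier: Laplace-smoothed P(instance is from the biased set | category)."""
    rep_n, bias_n = Counter(rep), Counter(biased)
    return {c: (bias_n[c] + alpha) / (bias_n[c] + rep_n[c] + 2 * alpha)
            for c in set(rep_n) | set(bias_n)}


def auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Mann-Whitney estimate of the ROC AUC; ties count as one half."""
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def mrs_sketch(rep: Sequence[Hashable], biased: Sequence[Hashable],
               tol: float = 0.51, max_iter: int = 10_000) -> List[int]:
    """Greedy MRS-style subsampling: returns a 0/1 weight per biased instance."""
    weights = [1] * len(biased)
    for _ in range(max_iter):
        kept = [i for i, w in enumerate(weights) if w]
        if not kept:
            break
        scores = fit_scores(rep, [biased[i] for i in kept])
        # Stop once the classifier can barely separate the two samples.
        if auc([scores[biased[i]] for i in kept], [scores[x] for x in rep]) <= tol:
            break
        # Remove (weight 0) the kept instance that looks most typical of the biased set.
        worst = max(kept, key=lambda i: scores[biased[i]])
        weights[worst] = 0
    return weights


representative = ["a"] * 50 + ["b"] * 50   # hypothetical 50/50 population
biased_sample = ["a"] * 80 + ["b"] * 20    # over-represents category "a"
w = mrs_sketch(representative, biased_sample)
print(sum(w[:80]), sum(w[80:]))            # 21 20
```

Starting from an 80:20 sample against a 50:50 reference, the sketch zeroes out "a" instances until 21 remain alongside all 20 "b" instances. Soft-MRS, as described above, would instead adapt the weights gradually rather than setting them to 0.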
2023

Autoantibodies against the chemokine receptor 3 predict cardiovascular risk

Abstract
Background and Aims

Chronic inflammation and autoimmunity contribute to cardiovascular (CV) disease. Recently, autoantibodies (aAbs) against the CXC-motif-chemokine receptor 3 (CXCR3), a G protein-coupled receptor with a key role in atherosclerosis, have been identified. The role of anti-CXCR3 aAbs for CV risk and disease is unclear.

Methods
Anti-CXCR3 aAbs were quantified by a commercially available enzyme-linked immunosorbent assay in 5000 participants (availability: 97.1%) of the population-based Gutenberg Health Study with extensive clinical phenotyping. Regression analyses were carried out to identify determinants of anti-CXCR3 aAbs and relevance for clinical outcome (i.e. all-cause mortality, cardiac death, heart failure, and major adverse cardiac events comprising incident coronary artery disease, myocardial infarction, and cardiac death). Last, immunization with CXCR3 and passive transfer of aAbs were performed in ApoE(−/−) mice for preclinical validation.

Results
The analysis sample included 4195 individuals (48% female, mean age 55.5 ± 11 years) after exclusion of individuals with autoimmune disease, immunomodulatory medication, acute infection, and history of cancer. Independent of age, sex, renal function, and traditional CV risk factors, increasing concentrations of anti-CXCR3 aAbs translated into higher intima–media thickness, left ventricular mass, and N-terminal pro-B-type natriuretic peptide. Adjusted for age and sex, anti-CXCR3 aAbs above the 75th percentile predicted all-cause death [hazard ratio (HR) (95% confidence interval) 1.25 (1.02, 1.52), P = .029], driven by excess cardiac mortality [HR 2.51 (1.21, 5.22), P = .014]. A trend towards a higher risk for major adverse cardiac events [HR 1.42 (1.0, 2.0), P = .05] along with increased risk of incident heart failure [HR per standard deviation increase of anti-CXCR3 aAbs: 1.26 (1.02, 1.56), P = .03] may contribute to this observation. Targeted proteomics revealed a molecular signature of anti-CXCR3 aAbs reflecting immune cell activation and cytokine–cytokine receptor interactions associated with an ongoing T helper cell 1 response. Finally, ApoE(−/−) mice immunized against CXCR3 displayed increased anti-CXCR3 aAbs and exhibited a higher burden of atherosclerosis compared to non-immunized controls, correlating with concentrations of anti-CXCR3 aAbs in the passive transfer model.

Conclusions
In individuals free of autoimmune disease, anti-CXCR3 aAbs were abundant, related to CV end-organ damage, and predicted all-cause death as well as cardiac morbidity and mortality in conjunction with the acceleration of experimental atherosclerosis.

https://academic.oup.com/eurheartj/article/44/47/4935/7370225
2022

Bi-level variable selection with the sparse group penalty framework, Talk, 67th GMDS Annual Conference / 13th TMF Annual Congress

Author: Buch, G., Schulz, A., Schmidtmann, I., Strauch, K., Wild, P. S.

Introduction:
Bi-level selection methods account for grouped predictors in the selection process to identify relevant variable groups and highlight their predictive members. This property is particularly helpful when analyzing omics datasets, as such data are often characterized by a natural group structure due to high correlations or contextual similarities of features. One of the best-known bi-level selection approaches combines the least absolute shrinkage and selection operator (LASSO) [1] with the group LASSO [2] in an additive manner: the sparse group LASSO (SGL) [3].
A generalization of SGL that enables combinations of other shrinkage terms is desirable, as the LASSO components have some shortcomings that can be addressed by using alternative penalties.

Methods:
To enable the combination of various shrinkage conditions as in SGL, a framework for sparse group penalties (SGP) is proposed. Within this framework, we combined the minimax concave penalty (MCP) [4], the smoothly clipped absolute deviation (SCAD) [5], the exponential penalty (EP) [6] and their group versions analogous to SGL. The resulting methods are the sparse group MCP (SGM), the sparse group SCAD (SGS) and the sparse group EP (SGE). A coordinate descent algorithm based on local linear approximation [7] was implemented in C++ to solve their objective functions for linear and logistic regression. Simulated datasets were used to determine optimal values for the tuning parameter α, a mixing parameter that determines the influence of the group information in the selection process. The performance of the new methods in variable and group selection was compared with that of other bi-level selection methods (group exponential LASSO [6], composite MCP [7] and group bridge [8]) in simulation studies. Finally, the novel approaches were applied to the problem of detecting regulated lipids in an interventional trial (EmDia study, ClinicalTrials.gov Identifier: NCT02932436).

Results:
Low values of α, such as 1/10, led to a group-level emphasized selection by the SGPs, while higher values, such as 1/2, led to better results at the variable level. Setting α to 1/3 provided balanced performance at both levels. Using this value, SGE was superior for variable and group selection in almost all cases where the number of variables was smaller than the number of observations. In settings with more variables than observations, SGE was the best approach when few groups were relevant, SGM when a moderate number of groups were predictive, and SGS when many groups contained predictive signals. Classical SGL was consistently inferior to the other bi-level selection methods with regard to variable and group selection, but its predictive performance was strong in some situations. In the applied example, the results of the SGPs differed especially in their sparsity at the group and variable levels. SGE generated the most parsimonious model, followed by SGM and SGS, while SGL created the largest model.

Conclusions:
Replacing the LASSO components in SGL with other shrinkage terms provides improvements in multiple performance criteria, making methods such as SGM, SGS, and SGE preferable over SGL. The advantages of these novel techniques are underscored by their ability to achieve better performance than alternative bi-level selection approaches, which the original SGL fails to do.
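The SGP framework combines an individual and a group penalty in SGL style. As a hedged illustration, the sketch below only evaluates such a combined penalty for a given coefficient vector; it does not implement the C++ coordinate descent solver. It assumes the form α·Σ_j p(|β_j|; λ) + (1 - α)·Σ_g p(‖β_g‖₂; λ√p_g), where p_g is the group size, which reduces to SGL for the lasso penalty; that the SGP framework weights its group terms exactly this way is our assumption. The MCP, SCAD and exponential-penalty formulas are the standard ones, and γ = 3, a = 3.7 and τ = 1/3 are common defaults chosen here for illustration.

```python
import math
from typing import Callable, Dict, Sequence

Penalty = Callable[[float, float], float]  # (t >= 0, lam) -> penalty value


def lasso(t: float, lam: float) -> float:
    return lam * t


def mcp(t: float, lam: float, gamma: float = 3.0) -> float:
    # Minimax concave penalty: lasso-like near 0, constant beyond gamma * lam.
    return lam * t - t * t / (2 * gamma) if t <= gamma * lam else gamma * lam * lam / 2


def scad(t: float, lam: float, a: float = 3.7) -> float:
    # Smoothly clipped absolute deviation: linear, then quadratic, then constant.
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return (a + 1) * lam * lam / 2


def exponential(t: float, lam: float, tau: float = 1 / 3) -> float:
    # Exponential penalty: slope lam at 0, bounded above by lam**2 / tau.
    return lam * lam / tau * -math.expm1(-tau * t / lam)


def sparse_group_penalty(beta: Sequence[float], groups: Dict[str, Sequence[int]],
                         lam: float, alpha: float, pen: Penalty) -> float:
    """alpha * sum_j pen(|beta_j|, lam) + (1 - alpha) * sum_g pen(||beta_g||_2, lam * sqrt(p_g))."""
    individual = sum(pen(abs(b), lam) for b in beta)
    group = 0.0
    for idx in groups.values():
        norm = math.sqrt(sum(beta[j] ** 2 for j in idx))
        group += pen(norm, lam * math.sqrt(len(idx)))
    return alpha * individual + (1 - alpha) * group


beta = [0.5, -1.0, 0.0, 2.0]
groups = {"g1": [0, 1], "g2": [2, 3]}   # hypothetical grouping, e.g. two lipid classes
for name, pen in [("SGL", lasso), ("SGM", mcp), ("SGS", scad), ("SGE", exponential)]:
    print(name, round(sparse_group_penalty(beta, groups, lam=0.2, alpha=1 / 3, pen=pen), 4))
```

Swapping `pen` is the whole point of the framework: the lasso keeps growing linearly with |β|, whereas MCP, SCAD and the exponential penalty level off, so large coefficients are shrunk less.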

2022

Sparse group penalties for bi-level variable selection, Poster, MSCoreSys Status Meeting 2022

Author: Buch, G., Schulz, A., Schmidtmann, I., Strauch, K., Wild, P. S.

Introduction
An important characteristic of many omics data sets is their intrinsic group structure due to high correlations or contextual similarities of features. Bi-level selection methods account for such groupings in the selection process to identify relevant variable groups and highlight their predictive members. One of the best-known approaches of this kind combines the least absolute shrinkage and selection operator (LASSO) with the group LASSO in an additive manner: the sparse group LASSO (SGL). Since LASSO has some shortcomings that can be addressed by using alternative penalties, a generalization of SGL that enables combinations of other shrinkage terms is desirable.
Methods
We propose a framework for sparse group penalties (SGP) that allows the combination of different SGL-style shrinkage conditions. Within this framework, we combined the minimax concave penalty (MCP), the smoothly clipped absolute deviation (SCAD), the exponential penalty (EP) and their group versions analogous to SGL: sparse group MCP (SGM), sparse group SCAD (SGS) and sparse group EP (SGE). The corresponding objective functions were solved using a locally approximated coordinate descent algorithm, which we implemented in C++. The performance of the new methods in variable and group selection was compared with that of other bi-level selection methods (group exponential LASSO, composite MCP and group bridge) in simulation studies.
Results
SGE was superior for variable and group selection in almost all settings where the number of observations exceeded the number of variables. In cases with fewer observations than variables, SGE was the best method when few groups contained predictive signals, SGM when a moderate number of groups were relevant, and SGS when many groups were predictive. The classical SGL was always inferior to the other bi-level selection techniques in terms of variable and group selection, but its predictive performance was convincing in some situations.
Conclusions
Replacing the LASSO components in SGL with other penalties offers advantages with respect to several performance criteria, making approaches such as SGM, SGS, and SGE preferable to SGL. The benefits of these novel techniques are underlined by their ability to achieve better results than alternative bi-level selection methods, which SGL fails to do.

2022

Proteolizard: a Python-based framework for access, processing and visualization of timsTOF raw data, Poster, Status meeting

Author: David Teschner, Konstantin Bob, Jennifer Leclaire, Thomas Kemmer, Mateusz K. Łącki, Michał Startek, David Gomez-Zepeda, Bertil Schmidt, Stefan Tenzer, Andreas Hildebrandt

Abstract:
Valuable insights into the high-dimensional driving factors of diseases such as heart failure can be gained by analysing samples with high-throughput omics technologies. The newly introduced timsTOF mass spectrometer is a notable device implementing such technology. Here, peak capacity and acquisition speed are of the greatest experimental interest, but gains in both come at the cost of higher dimensionality, since ion mobility measurements add a further dimension to the generated datasets.
It is crucial that processing of the underlying data is not constrained by its increased complexity and volume, while retaining the ability to be flexibly integrated into existing workflows. We therefore present Proteolizard: a collection of software tools integrating high-performance C++ code with user-friendly Python bindings. They enable seamless integration of timsTOF raw data into the Python-centric stack of machine learning libraries such as TensorFlow, PyTorch or scikit-learn. This allows effective utilization of multi-core systems or accelerators such as GPUs and the implementation of new algorithms based on, e.g., deep learning.

2022

Protective behavior and SARS-CoV-2 infection risk in the population - Results from the Gutenberg COVID-19 study

During the SARS-CoV-2 pandemic, preventive measures like physical distancing, wearing face masks, and hand hygiene have been widely applied to mitigate viral transmission. Beyond increasing vaccination coverage, preventive measures remain urgently needed. The aim of the present project was to assess the effect of protective behavior on SARS-CoV-2 infection risk in the population.

https://pubmed.ncbi.nlm.nih.gov/36316662/
2022

Cardiovascular profiling in the diabetic continuum: results from the population-based Gutenberg Health Study.

The study sample comprised 15,010 individuals aged 35-74 years from the population-based Gutenberg Health Study. Subjects were classified into euglycaemia, prediabetes and T2DM according to clinical and metabolic (HbA1c) information. The prevalence of prediabetes was 9.5% (n = 1415) and that of T2DM 8.9% (n = 1316). Prediabetes and T2DM showed significantly increased prevalence ratios (PR) for age, obesity, active smoking, dyslipidemia, and arterial hypertension compared to euglycaemia (for all, P < 0.0001). In a robust Poisson regression analysis, prediabetes was established as an independent predictor of clinically prevalent cardiovascular disease (PR for prediabetes 1.20, 95% CI 1.07-1.35, P = 0.002) and represented a risk factor for asymptomatic cardiovascular organ damage independent of traditional risk factors (PR 1.04, 95% CI 1.01-1.08, P = 0.025). Prediabetes was associated with a 1.5-fold increased 10-year risk of cardiovascular disease compared to euglycaemia. In Cox regression analysis, prediabetes (HR 2.10, 95% CI 1.76-2.51, P < 0.0001) and T2DM (HR 4.28, 95% CI 3.73-4.92, P < 0.0001) indicated an increased risk of death. After adjustment for age, sex and traditional cardiovascular risk factors, only T2DM (HR 1.89, 95% CI 1.63-2.20, P < 0.0001) remained independently associated with increased all-cause mortality.

https://link.springer.com/article/10.1007/s00392-021-01879-y
2022

A systematic review and evaluation of statistical methods for group variable selection

Abstract
This review condenses the knowledge on variable selection methods implemented in R and appropriate for datasets with grouped features. The focus is on regularized regressions identified through a systematic review of the literature, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. A total of 14 methods are discussed, most of which use penalty terms to perform group variable selection. Depending on how the methods account for the group structure, they can be classified into knowledge-driven and data-driven approaches. The former encompass group-level and bi-level selection methods, while two-step approaches and collinearity-tolerant methods constitute the latter. The identified methods are briefly explained and their performance is compared in a simulation study. This comparison demonstrated that group-level selection methods, such as the group minimax concave penalty, are superior to other methods in selecting relevant variable groups but are inferior in identifying important individual variables in scenarios where not all variables in the groups are predictive. This can be better achieved by bi-level selection methods such as group bridge. Two-step and collinearity-tolerant approaches, such as elastic net and ordered homogeneity pursuit least absolute shrinkage and selection operator, are inferior to knowledge-driven methods but provide results without requiring prior knowledge. Possible applications in proteomics are considered, leading to suggestions on which method to use depending on existing prior knowledge and the research question.

https://onlinelibrary.wiley.com/doi/full/10.1002/sim.9620
2022

Subtype-specific plasma signatures of platelet-related protein releasate in acute pulmonary embolism

There is evidence that plasma protein profiles differ in the two subtypes of pulmonary embolism (PE), isolated PE (iPE) and deep vein thrombosis (DVT)-associated PE (DVT-PE), in the acute phase. The aim of this study was to determine specific plasma signatures for proteins related to platelets in acute iPE and DVT-PE compared to isolated DVT (iDVT).

https://pubmed.ncbi.nlm.nih.gov/36274391/
2022

Locality-sensitive hashing enables efficient and scalable signal classification in high-throughput mass spectrometry raw data

Mass spectrometry is an important experimental technique in the field of proteomics. However, the analysis of certain mass spectrometry data faces a combination of two challenges: first, even a single experiment produces a large amount of multi-dimensional raw data and, second, signals of interest are not single peaks but patterns of peaks that span the different dimensions. The rapidly growing amount of mass spectrometry data increases the demand for scalable solutions. Furthermore, existing approaches for signal detection usually rely on strong assumptions about the signals' properties.

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-04833-5
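The title of this paper names locality-sensitive hashing (LSH) without the abstract specifying the hash family. One common family is random-hyperplane LSH (SimHash), in which vectors separated by a small angle receive hash codes that differ in few bits, so similar peak patterns can be bucketed without comparing all pairs. The sketch below assumes a signal pattern has already been flattened into a fixed-length intensity vector; this representation, the hash family and the function names are assumptions for illustration, not necessarily what the paper uses.

```python
import random
from typing import List, Sequence


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> List[List[float]]:
    """Random Gaussian normal vectors; each one defines one hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def simhash(vec: Sequence[float], planes: Sequence[Sequence[float]]) -> int:
    """Set bit k if vec lies on the non-negative side of hyperplane k."""
    code = 0
    for k, plane in enumerate(planes):
        if sum(p * v for p, v in zip(plane, vec)) >= 0:
            code |= 1 << k
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash codes."""
    return bin(a ^ b).count("1")


planes = make_hyperplanes(dim=8, n_bits=32)
pattern = [1.0, 0.0, 3.0, 0.0, 0.0, 2.0, 0.0, 1.0]    # hypothetical binned intensities
similar = [1.0, 0.05, 3.0, 0.0, 0.0, 2.0, 0.0, 1.0]   # same pattern, slight noise
different = [0.0, 2.0, 0.0, 1.0, 3.0, 0.0, 1.0, 0.0]  # orthogonal to `pattern`
h = simhash(pattern, planes)
print(hamming(h, simhash(similar, planes)), hamming(h, simhash(different, planes)))
```

The expected fraction of differing bits equals the angle between two vectors divided by π, so the noisy copy should differ in almost no bits while the orthogonal pattern differs in about half. In an LSH index, the codes (or bands of bits) serve as bucket keys, and only colliding candidates need a full comparison.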
2022

Mass Spectrometry to investigate the Heart Failure Syndrome within the DIASyM Core, Poster, Status meeting, Heidelberg

Author: Thierry Schmidlin, Elisa Araldi, Laura Bindila, Miguel A. Andrade-Navarro, Andreas Hildebrandt, Stefan Kramer, Philipp S. Wild, Stefan Tenzer

Heart failure (HF) affects 15 million people in Europe and is associated with a substantial public health burden and a poor survival prognosis. Its molecular mechanisms are largely unknown, requiring a systems-medicine approach that uses machine learning methods to decipher molecular pathophysiologic sub-phenotypes based on comprehensive multi-OMICs phenomapping. The DIASyM research core develops, optimizes and standardizes ion-mobility-enhanced mass spectrometric workflows for multi-OMICs-based biomaterial characterization at the proteome, lipidome and metabolome levels, primarily using data-independent acquisition workflows. We generate unbiased mass spectrometric sample records at the multi-OMICs level, which provide a rich resource for data mining and modelling. DIASyM develops the software and methods for data processing, interpretation and high-level integration, which will be independently validated in other research cores to ensure cross-center compatibility and reproducibility. This poster describes the activities of the DIASyM groups developing novel mass spectrometry-based approaches to unravel the molecular mechanisms of the heart failure syndrome.

2022

Cross-platform comparison of peptide ion mobility separation, Poster, ASMS, Minneapolis

Author: Hein D., Distler U., Gomez-Zepeda D., Łącki M.K., Tenzer S.

Introduction
Currently, several types of ion mobility separation devices are integrated into commercial instrument platforms that are used routinely in proteomic analyses, including field asymmetric ion mobility separation (FAIMS), trapped ion mobility separation (TIMS) and traveling wave ion mobility separation (TWIMS). We analyzed the separation capabilities of these three ion mobility separation devices and investigated correlations between the ion mobility separation of tryptic peptides derived from complex proteomic samples on three different instrument platforms: Waters Synapt G2-S, Bruker TimsTOF Pro 2 and Thermo Orbitrap Exploris 480 with FAIMS Pro.

Mass spectrometric analysis
Tryptic digests (200 ng) of HeLa cells were analyzed by LC-IMS-MS on three different platforms. On the TimsTOF Pro 2, samples were injected on a nanoElute system using a 20 cm Aurora column, separated by a 47 min gradient, analyzed in PASEF-DDA mode and processed with MaxQuant. For the Synapt G2-S HDMS, samples were injected on a nanoAcquity UPLC system and separated using a 2-hour method on an HSS-T3 column. Samples were measured in triplicate in UDMSE mode, and data were analyzed with PLGS and IsoQuant. To investigate the separation capabilities of the FAIMS Pro device, 31 FAIMS voltages were each covered in an individual injection on an Ultimate3000-Exploris 480 platform.

Methods
All data analysis was performed in R. As a first overview of the data, Fig. 1 sketches the dependence of peptide appearance across successive FAIMS voltages. Next, we investigated whether this property is also reflected in the inverse ion mobility observed on the other platforms. The distribution of inverse ion mobility per FAIMS voltage (Fig. 2) shows that the FAIMS voltage correlates with the inverse ion mobility. In Fig. 3, we additionally considered the mass-to-charge ratio, another well-known factor correlated with inverse ion mobility. A natural question is whether the two can simultaneously correlate with the appearance of peptides at a given FAIMS voltage; Fig. 3 shows that this is the case.

Conclusion
By comparing observed ion mobilities and the respective optimal FAIMS compensation voltages for more than 40,000 tryptic peptides across three different types of ion mobility separation devices, we analyzed their distinct separation properties for peptides in the gas phase. All three IMS separators apply slightly different separation conditions in the gas phase, which are reflected in systematic differences in the observed ion mobilities. Our analysis reveals a higher correlation between traveling wave and trapped ion mobility separators, while lower correlations are observed between optimal compensation voltages in a FAIMS Pro device and ion mobility values reported by both TIMS- and TWIMS-based instruments. Our correlation analysis revealed a positive correlation between the optimal FAIMS voltage for each peptide and its respective ion mobility measurements on the other platforms. In detail, the correlation between the Orbitrap and the Synapt G2-S HDMS was around R = 0.86, and between the Orbitrap and the TimsTOF Pro 2 R = 0.70. The highest correlation was observed between ion mobility measurements of the TimsTOF Pro 2 and the Synapt G2-S HDMS (R = 0.95).

2022

High-Throughput human plasma proteome analysis using FAIMS pro interface, Poster, IMSC, Maastricht

Author: Hein D., Distler U., Kumm E., Łącki M., Gomez-Zepeda D., Tenzer S.


Due to the composition and associated properties of plasma, mass spectrometry-based proteomic analysis of human plasma is challenging. Compared to other tissues, the otherall number of proteins is lower and their concentrations vary considerably. For these reasons, high-throughput analysis methods of plasma samples need to be constantly monitored and adapted in order to reach maximal
proteome coverage.
To achieve a high number of detected peptides and proteins in high-throughput analysis we decided to use the FAIMS Pro Interface. The FAIMS Pro Interface uses an asymmetric electric field on the cylindrical electrode to separate ions by their ion mobility, including only ions with a specific ion mobility. For optimal results, different voltages of the electrode and mass over charge filter combinations were tested.
DIA-NN2 and MaxQuant were used for raw data analysis. DIA-NN employs neural networks and interference correction and is thus particularly well suited for high-throughput set up, allowing fast analysis and deep coverage of proteomes. MaxQuant is a quantitative proteomic software which is able to analyze large datasets and uses serval labeling techniques.
The numbers of peptides and proteins identified in human plasma with the FAIMS Pro Interface differ by an order of magnitude from those obtained with standard methods. For specific voltages, our data show a significant increase in identified peptides and proteins compared with the standard method.
We demonstrate that a careful selection of two or three FAIMS voltages results in significantly higher numbers of uniquely identified proteins and peptides. When combined with additional mass-to-charge filtering, the number of identified proteins and peptides drops slightly, while the quality of the identifications improves.
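Choosing two or three compensation voltages amounts to picking the combination whose identifications overlap least. A minimal sketch, assuming (hypothetically) that single-CV test runs each yield a set of peptide or protein identifiers; `best_cv_combination` and its input format are illustrative, not the authors' procedure:

```python
from itertools import combinations

def best_cv_combination(ids_by_cv, k):
    """Return the k compensation voltages whose identifications have the
    largest union, together with the size of that union.

    ids_by_cv maps a compensation voltage (e.g. -45) to the set of
    identifiers found at that CV (hypothetical input). Exhaustive search is
    affordable for the handful of CVs a FAIMS method typically tests.
    """
    best_combo, best_count = None, -1
    for combo in combinations(sorted(ids_by_cv), k):
        union = set().union(*(ids_by_cv[cv] for cv in combo))
        if len(union) > best_count:
            best_combo, best_count = combo, len(union)
    return best_combo, best_count
```

The union of single-CV runs is only an upper bound for a multi-CV method, because the cycle time of a single run is shared between the CVs.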

2022

Identifying predictive signals in grouped datasets with the novel Sparse Group Smoothly Clipped Absolute Deviation, Talk, 17th Annual Conference of the DGEpi

Author: Buch, G., Schulz, A., Schmidtmann, I., Strauch, K., Wild, P. S.

Introduction:
Many datasets exhibit a group structure, such as lipid markers that are interrelated based on chemical and biochemical principles. Considering such groupings in a model identification task improves selection, but existing methods fall short, especially when the dataset contains more variables than observations. We propose the Sparse Group Smoothly Clipped Absolute Deviation (SGS) to improve selection in such settings.

Methods:
A simulation study was conducted to optimize α, the tuning parameter of SGS. To evaluate their model selection performance, SGS, Sparse Group LASSO (SGL), composite Minimax Concave Penalty (cMCP), and Group Exponential LASSO (GEL) were compared on simulated datasets. The approaches were then applied to data from a randomized phase IV clinical trial in individuals with diabetes mellitus to identify lipids and lipid groups regulated by empagliflozin intake (EmDia; NCT02932436). Correct and permuted groupings were provided to investigate the impact of groupings on the selection process of the approaches.

Results:
SGS with tuned parameters (α set to 1/3, λ determined with 10-fold cross-validation) was superior to the other model selection techniques in many of the simulated settings. Especially when many variables were related to the response, SGS performed well in both variable and group selection. In the use case, SGS selected more lipids (N = 16) than cMCP (N = 2) and GEL (N = 1) when the grouping was correct, and obtained similar results when the grouping was incorrect (2, 2, and 1 lipids, respectively). SGL created the largest model in both situations (36 and 6 lipids).

Conclusions:
SGS incorporates groupings more strongly than cMCP and GEL in the selection process, without the risk of selecting spurious signals when group formations are incorrect. Since these findings are based on simulation studies and a real-world use case, SGS can be recommended for selection tasks with prior knowledge of groupings and for datasets with more features than observations.
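SGS builds on the smoothly clipped absolute deviation (SCAD) penalty of Fan and Li (2001), which grows linearly near zero like the LASSO but levels off for large coefficients, so strong effects are not shrunk. The sketch below evaluates SCAD and a hypothetical sparse-group combination that mirrors the sparse group LASSO: an individual-level term weighted by α plus a group-level term on the group norms weighted by 1 - α, with √(group size) weights. This combination, the weighting convention, and a = 3.7 are assumptions for illustration; the exact SGS definition may differ. The default α = 1/3 follows the tuned value reported in the results.

```python
import math

def scad(t, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001) at a scalar t (requires a > 2)."""
    t = abs(t)
    if t <= lam:
        return lam * t                                    # LASSO-like part
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))  # quadratic transition
    return lam * lam * (a + 1) / 2                        # constant: no further shrinkage

def sparse_group_scad(beta, groups, lam, alpha=1/3, a=3.7):
    """Hypothetical sparse-group SCAD penalty in the style of the sparse group LASSO.

    beta:   list of coefficients
    groups: list of index lists, one per group
    alpha:  weight of the individual-level term; (1 - alpha) weights the group term
    """
    individual = sum(scad(b, lam, a) for b in beta)
    group = sum(
        math.sqrt(len(g)) * scad(math.sqrt(sum(beta[j] ** 2 for j in g)), lam, a)
        for g in groups
    )
    return alpha * individual + (1 - alpha) * group
```

Because SCAD is flat beyond aλ, a coefficient of 5 and one of 50 incur the same penalty, which is the property that distinguishes SCAD-based selection from LASSO-based selection.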

2021

Interpretability of bi-level variable selection methods, Poster, 16th Annual Conference of the DGEpi

Author: Buch, G., Schulz, A., Schmidtmann, I., Strauch, K., Wild, P. S.


Background: Many datasets possess a natural group structure due to high correlations or contextual similarities of variables. Incorporating this information in a selection process enables the identification of relevant variable groups and also relevant members of those groups. It has been argued that incorporating such prior knowledge can improve the interpretability of the selection output, but this hypothesis has not yet been investigated for bi-level selection methods. A comparison of bi-level selection methods with the gold standard LASSO for variable selection can provide insights into the interpretability of the selection results.
Methods: Composite Minimax Concave Penalty (cMCP), Group Exponential LASSO (GEL), Sparse Group LASSO (SGL), and LASSO as reference method were used to select predictors in a time-to-event (survival), regression (linear trait) and classification (binary trait) task. For this purpose, three group formations based on prior knowledge, correlation structure, or random assignment were provided. Selections were done in 1,000 bootstrap samples derived from a cohort of 1,001 patients (MyoVasc-study; NCT04064450). Interpretability of the generated models was assessed by selection accuracy, group consistency, and collinearity tolerance.
Results: Bi-level selection methods outperformed LASSO in all three dimensions of interpretability, for most selection tasks considered. Here, cMCP demonstrated superiority in selection accuracy in most applications, while GEL and SGL were superior in group consistency and collinearity tolerance. The performance of bi-level selection methods was maintained even when group formation was inaccurate.
Conclusions: If there is interest in interpreting the selection results and information on relationships between variables is available, bi-level selection methods appear preferable to LASSO, owing to their ability to treat the variables of a group consistently and their tendency to select correlated variables together.
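Selection in bootstrap samples can be sketched with the LASSO reference method: refit the model on each resample and record how often each variable receives a non-zero coefficient. A minimal NumPy sketch, assuming a continuous (regression) outcome, a fixed penalty `lam` instead of a tuned one, and plain cyclic coordinate descent; `lasso_cd` and `selection_frequencies` are hypothetical helpers, and the sketch does not reproduce the bi-level methods or the interpretability metrics (selection accuracy, group consistency, collinearity tolerance) used in the study:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100, tol=1e-6):
    """LASSO via cyclic coordinate descent, minimising
    (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.
    Assumes standardised columns of X and a centred y (no intercept)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = np.maximum((X ** 2).sum(axis=0) / n, 1e-12)
    r = y.astype(float).copy()               # residual y - X b (b starts at zero)
    for _ in range(n_iter):
        b_old = b.copy()
        for j in range(p):
            r += X[:, j] * b[j]              # partial residual without variable j
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-thresholding
            r -= X[:, j] * b[j]              # put the updated contribution back
        if np.max(np.abs(b - b_old)) < tol:
            break
    return b

def selection_frequencies(X, y, lam, n_boot=1000, seed=0):
    """Share of bootstrap samples in which each variable has a non-zero LASSO coefficient."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)     # draw n rows with replacement
        Xb, yb = X[idx], y[idx]
        sd = Xb.std(axis=0)
        Xb = (Xb - Xb.mean(axis=0)) / np.where(sd > 0, sd, 1.0)
        yb = yb - yb.mean()
        counts += lasso_cd(Xb, yb, lam) != 0
    return counts / n_boot
```

Selection frequencies near 1 indicate variables that are chosen stably across resamples; bi-level methods would replace `lasso_cd` with a group-aware solver.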

2021

A systematic review and evaluation of methods for group variable selection, Talk, 66th GMDS Annual Conference / 12th TMF Annual Congress

Author: Buch, G., Schulz, A., Schmidtmann, I., Strauch, K., Wild, P. S.

Introduction:
Many datasets have a natural group structure due to high correlations or contextual similarities of variables, as in proteomics. Group variable selection methods are able to account for such structure in the selection process to identify variables that are related to each other and share a common and traceable relationship with the response variable.
To date, only selective comparisons of group variable selection methods are available, but a review is needed that systematically identifies and evaluates the wide range of existing approaches.

Methods:
A structured literature search was conducted, adhering to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) recommendations, to identify group variable selection methods that had a usable software implementation and were suitable for studying Gaussian, binomial, or time-to-event data.
The selection performance of the identified methods was evaluated based on the correlation between true and generated models in simulation studies with a varying number of variables associated with a Gaussian-distributed response variable.
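The abstract does not specify which correlation between true and generated models was used. One common choice, assumed here, is the phi coefficient (equivalently the Matthews correlation coefficient) between 0/1 indicators of truly associated and selected variables; `model_correlation` is a hypothetical helper:

```python
import math

def model_correlation(true_vars, selected_vars, p):
    """Phi coefficient (Pearson correlation of 0/1 indicators) between the set
    of truly associated variables and the set selected by a method, over p variables."""
    t = [1 if j in true_vars else 0 for j in range(p)]
    s = [1 if j in selected_vars else 0 for j in range(p)]
    tp = sum(a & b for a, b in zip(t, s))          # correctly selected
    fp = sum((1 - a) & b for a, b in zip(t, s))    # selected but not associated
    fn = sum(a & (1 - b) for a, b in zip(t, s))    # associated but missed
    tn = p - tp - fp - fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Returning 0.0 when a method selects nothing (or everything) is a convention chosen for this sketch, since the coefficient is undefined in that case.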

Results:
The systematic literature review revealed 14 methods for selecting variable groups, which can be classified into knowledge-driven and data-driven approaches. The first category includes group-level and bi-level selection methods, which use pre-defined group formations; the second comprises two-step and collinearity-tolerant approaches, which use the correlation structure of the data to select related variables. Group-level and two-step approaches select all or none of the variables in a group, while bi-level and collinearity-tolerant methods allow sparsity even within groups of variables.
Simulation studies demonstrate that group-level selection methods, such as Group MCP, are superior to other methods in selecting relevant variable groups, but are inferior in identifying important individual variables when not all variables in the groups are predictive. This can be better achieved by bi-level selection methods such as Group Bridge. Two-step and collinearity-tolerant approaches such as the Elastic Net and Ordered Homogeneity Pursuit LASSO are inferior to knowledge-driven methods, but provide comparable results without requiring prior knowledge.
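The difference between all-or-none group selection and sparsity within groups is visible in the proximal operators that many penalized-regression solvers apply at each step. The sketch uses the group LASSO and sparse group LASSO penalties as the simplest representatives; Group MCP and Group Bridge, named above, use different penalties, so this illustrates the selection behaviour rather than those specific methods:

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding: proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def group_prox(v, t):
    """Proximal operator of t * ||v||_2 for one group: shrinks the whole group
    and sets it to zero when its norm is at most t (all-or-none selection)."""
    norm = np.linalg.norm(v)
    return np.zeros_like(v) if norm <= t else (1 - t / norm) * v

def sparse_group_prox(v, lam, alpha):
    """Proximal operator of the sparse group LASSO penalty for one group,
    alpha*lam*||v||_1 + (1-alpha)*lam*||v||_2: element-wise soft-thresholding
    followed by group shrinkage, which yields bi-level selection."""
    return group_prox(soft_threshold(v, alpha * lam), (1 - alpha) * lam)
```

With `v = np.array([3.0, 0.2, -0.1])`, `group_prox(v, 1.0)` keeps all three entries (shrunk), whereas `sparse_group_prox(v, 1.0, 0.5)` returns 2.0 for the first entry and zeros for the other two, dropping the weak members of a selected group.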

Discussion:
Methods in all four categories are suitable for analyzing data with variables that have a natural group structure. The choice of the appropriate method depends on the objective and the availability of prior information. If the interest is in identifying related variables associated with a response variable, group-level selection and two-step methods are recommended, whereas bi-level selection and collinearity-tolerant methods are appropriate when the interest is in identifying individual variables associated with a given response from within a structure of related variables.
Since the results of the simulation study indicate that inclusion of prior information improves the selection process, such information should be used when available. A potential application for analyzing omics datasets could use the information on coexpression or biological function to group variables.

Conclusions:
A variety of methods can incorporate a natural group structure of predictors in selection. This improves selection, especially when the group structure is known and does not need to be estimated via the correlation structure. Since the identified methods are specialized for different situations, the choice of an appropriate method strongly depends on the research question.

2021

Variable selection using regularized regressions, Workshop, MSCoreSys Summer School – “Mass spectrometry meets systems medicine”

Author: Buch, G.

The course will cover statistical learning methods that extend classical regression with regularization terms to identify variables predictive of a dependent variable in omics data. The underlying theory of the methods will be discussed, as well as practical aspects of their application, such as adjusting for confounders and accounting for interrelated variables. Hands-on training will be integrated.

2021

Interpretability of bi-level variable selection methods, Poster, Status Meeting of the MSCoreSys Initiative

Author: Buch, G., Schulz, A., Schmidtmann, I., Strauch, K., Wild, P. S.

Background: Many datasets possess a natural group structure due to high correlations or contextual similarities of variables. Incorporating this information in a selection process enables the identification of relevant variable groups and also relevant members of those groups. It has been argued that incorporating such prior knowledge can improve the interpretability of the selection output, but this hypothesis has not yet been investigated for bi-level selection methods. A comparison of bi-level selection methods with the gold standard LASSO for variable selection can provide insights into the interpretability of the selection results.
Methods: Composite Minimax Concave Penalty (cMCP), Group Exponential LASSO (GEL), Sparse Group LASSO (SGL), and LASSO as reference method were used to select predictors in a time-to-event (survival), regression (linear trait) and classification (binary trait) task. For this purpose, three group formations based on prior knowledge, correlation structure, or random assignment were provided. Selections were done in 1,000 bootstrap samples derived from a cohort of 1,001 patients (MyoVasc-study; NCT04064450). Interpretability of the generated models was assessed by selection accuracy, group consistency, and collinearity tolerance.
Results: Bi-level selection methods outperformed LASSO in all three dimensions of interpretability, for most selection tasks considered. Here, cMCP demonstrated superiority in selection accuracy in most applications, while GEL and SGL were superior in group consistency and collinearity tolerance. The performance of bi-level selection methods was maintained even when group formation was inaccurate.
Conclusions: If there is interest in interpreting the selection results and information on relationships between variables is available, bi-level selection methods appear preferable to LASSO, owing to their ability to treat the variables of a group consistently and their tendency to select correlated variables together.

2021

LipiDisease: associate lipids to diseases using literature mining

Lipids play an essential role in cellular assembly and signaling. Dysregulation of these functions has been linked to many complications, including obesity, diabetes, metabolic disorders, cancer and more. Investigating lipid profiles in such conditions can provide insights into cellular functions and possible interventions; hence the field of lipidomics has expanded in recent years. Even though the role of individual lipids in diseases has been investigated, there has been no resource for disease enrichment analysis that considers the cumulative association of a lipid set. To address this, we have implemented the LipiDisease web server. The tool analyzes millions of records from the PubMed biomedical literature database discussing lipids and diseases, predicts their association and ranks them according to false discovery rates generated by random simulations. The tool takes into account 4270 diseases and 4798 lipids. Since the tool extracts the information from PubMed records, the number of diseases and lipids will be expanded over time as the biomedical literature grows.
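The abstract states that associations are ranked by false discovery rates from random simulations but does not describe the scoring scheme. A minimal sketch of one generic approach, assuming a disease is scored by how many query lipids are co-mentioned with it, and the FDR at a score is estimated as the expected number of diseases reaching that score for random lipid sets of the same size, divided by the observed number; all names and the input format are hypothetical, not LipiDisease's implementation:

```python
import random

def cooccurrence_scores(lipid_set, disease_lipids):
    """Score each disease by how many lipids of the query set are linked to it.
    disease_lipids maps a disease name to the set of lipids co-mentioned with it
    (hypothetical input, e.g. derived from PubMed records)."""
    return {d: len(lipid_set & lips) for d, lips in disease_lipids.items()}

def empirical_fdr(lipid_set, disease_lipids, universe, n_sim=1000, seed=0):
    """Estimate an FDR per disease by comparing observed scores with scores
    obtained for random lipid sets of the same size drawn from the universe."""
    rng = random.Random(seed)
    observed = cooccurrence_scores(lipid_set, disease_lipids)
    universe = sorted(universe)                 # fixed order for reproducibility
    null_scores = []
    for _ in range(n_sim):
        random_set = set(rng.sample(universe, len(lipid_set)))
        null_scores.extend(cooccurrence_scores(random_set, disease_lipids).values())
    fdr = {}
    for d, s in observed.items():
        n_obs = sum(1 for v in observed.values() if v >= s)
        n_null = sum(1 for v in null_scores if v >= s) / n_sim  # expected calls per random set
        fdr[d] = min(1.0, n_null / n_obs)
    return dict(sorted(fdr.items(), key=lambda kv: kv[1]))     # most significant first
```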

https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btab559/6343440
2021

MaxDIA enables library-based and library-free data-independent acquisition proteomics

MaxDIA is a software platform for analyzing data-independent acquisition (DIA) proteomics data within the MaxQuant software environment. Using spectral libraries, MaxDIA achieves deep proteome coverage with substantially better coefficients of variation in protein quantification than other software. MaxDIA is equipped with accurate false discovery rate (FDR) estimates on both the library-to-DIA match and protein levels, including when whole-proteome predicted spectral libraries are used. This is the foundation of discovery DIA: hypothesis-free analysis of DIA samples without a library and with reliable FDR control. MaxDIA performs three- or four-dimensional feature detection of fragment data, and scoring of matches is augmented by machine learning on the features of an identification. MaxDIA's bootstrap DIA workflow performs multiple rounds of matching with increasing quality of recalibration and stringency of matching to the library. Combining MaxDIA with two new technologies, BoxCar acquisition and trapped ion mobility spectrometry, leads in both cases to deep and accurate proteome quantification.
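The abstract does not detail how MaxDIA estimates its FDR. As background, the generic target-decoy idea used widely in proteomics estimates the FDR at a score cut-off as the ratio of decoy to target matches above it, and converts these estimates into q-values. A minimal sketch of that generic procedure, not MaxDIA's implementation; `target_decoy_qvalues` is a hypothetical helper and score ties are ignored for simplicity:

```python
def target_decoy_qvalues(matches):
    """Compute q-values from scored matches with the target-decoy approach.

    matches: list of (score, is_decoy) tuples; higher scores are better.
    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))   # estimated FDR at this cut-off
    # q-value: the smallest FDR at which this match would still be accepted,
    # i.e. the running minimum of the FDR taken from the bottom of the list.
    qvals = fdrs[:]
    for i in range(len(qvals) - 2, -1, -1):
        qvals[i] = min(qvals[i], qvals[i + 1])
    return [(s, d, q) for (s, d), q in zip(ranked, qvals)]
```

Accepting all target matches with a q-value at or below 0.01 corresponds to a 1% FDR threshold.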

https://www.nature.com/articles/s41587-021-00968-7
2021

Right atrium size in the general population

Echocardiography is the most common routine cardiac imaging method. Nevertheless, only limited data on sex-specific reference limits for right atrium (RA) dimensions are available. Transthoracic echocardiographic RA measurements were studied in 9511 participants of the Gutenberg-Health-Study. A reference sample of 1942 cardiovascularly healthy subjects without chronic obstructive pulmonary disease was defined. We assessed RA dimensions and defined sex-specific reference limits using the 95th percentile of the reference sample. Results showed sex-specific differences, with larger RA dimensions in men that were attenuated by standardization for body height. RA volume was 20.2 ml/m in women (5th-95th percentile: 12.7-30.4 ml/m) and 26.1 ml/m in men (5th-95th percentile: 16.0-40.5 ml/m). Multivariable regressions identified body mass index (BMI), coronary artery disease (CAD), chronic heart failure (CHF) and atrial fibrillation (AF) as independent key correlates of RA volume in both sexes. All-cause mortality after a median follow-up period of 10.7 (9.81/11.6) years was higher in individuals whose RA volume/height was outside the 95% reference limit (HR 1.70 [95% CI 1.29-2.23], P = 0.00014). Based on a large community-based sample, we present sex-specific reference values for RA dimensions normalized for height. RA volume varies with BMI, CHF, CAD and AF in both sexes. Individuals with RA volume outside the reference limit had a 1.7-fold higher mortality than those within the reference limits.
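The reference-limit approach can be sketched as percentiles of height-indexed RA volume (ml/m) within each sex in the healthy reference sample, with individuals flagged when they exceed the 95th percentile. The function names, the input layout, and the use of NumPy's default linear-interpolation percentiles are assumptions; the study's exact percentile estimator is not stated:

```python
import numpy as np

def sex_specific_limits(volume_ml, height_m, sex):
    """5th and 95th percentiles of height-indexed RA volume (ml/m) for each sex,
    computed from a reference sample of cardiovascularly healthy participants."""
    indexed = np.asarray(volume_ml, dtype=float) / np.asarray(height_m, dtype=float)
    sex = np.asarray(sex)
    return {s: tuple(np.percentile(indexed[sex == s], [5, 95])) for s in np.unique(sex)}

def above_upper_limit(volume_ml, height_m, sex, limits):
    """True if the height-indexed RA volume exceeds the 95th-percentile limit for that sex."""
    return volume_ml / height_m > limits[sex][1]
```

The limits are derived once from the reference sample and then applied to every participant of the full cohort.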

https://pubmed.ncbi.nlm.nih.gov/34795353/
2021

OpenTIMS, TimsPy, and TimsR: Open and Easy Access to timsTOF Raw Data

The Bruker timsTOF Pro is an instrument that couples trapped ion mobility spectrometry (TIMS) to high-resolution time-of-flight (TOF) mass spectrometry (MS). For proteomics, lipidomics, and metabolomics applications, the instrument is typically interfaced with a liquid chromatography (LC) system. The resulting LC-TIMS-MS data sets are, in general, several gigabytes in size and are stored in the proprietary Bruker Tims data format (TDF). The raw data can be accessed using proprietary binaries in C, C++, and Python on Windows and Linux operating systems. Here we introduce a suite of computer programs for data access, including OpenTIMS, TimsR, and TimsPy. OpenTIMS is a C++ library capable of reading Bruker TDF files. It opens up Bruker's proprietary codebase. TimsPy and TimsR build on top of OpenTIMS, enabling fast and user-friendly access to the raw data from Python and R. Both programs are available under a GPL3 license on all major platforms, extending the possibility of interacting with timsTOF data to macOS. Additionally, OpenTIMS can translate Bruker data into HDF5 files that can be easily analyzed from Python with the vaex module. OpenTIMS and TimsPy therefore provide easy and quick access to Bruker timsTOF raw data.

https://pubs.acs.org/doi/10.1021/acs.jproteome.0c00962
2021

Pyproteolizard - A Python interface for high-performance processing of timsTOF raw data, Poster, Status Meeting

Author: David Teschner, Konstantin Bob, Jennifer Leclaire, Thomas Kemmer, Mateusz K. Łącki, Michał Startek, Bertil Schmidt, Stefan Tenzer, Andreas Hildebrandt

Abstract:
Analyzing samples with high-throughput omics technologies can yield valuable insight into the high-dimensional factors driving diseases such as heart failure. The recently introduced timsTOF mass spectrometer is a notable instrument implementing such technology. Its peak capacity and acquisition speed are of great experimental interest, but the added ion mobility measurements also increase the dimensionality of the generated datasets.
It is crucial that processing of the underlying data is not constrained by its increased complexity and volume, while remaining flexible enough to be integrated into existing workflows. We therefore present (Py)proteolizard: a combination of high-performance processing tools written in C++ with user-friendly Python bindings. It allows seamless integration of timsTOF data with algorithms from the locality-sensitive hashing and deep-learning families. Furthermore, it enables fast visual inspection of data slices such as mass spectrometry (MS) features.
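The abstract mentions locality-sensitive hashing (LSH) without naming the hash family. A minimal sketch of one common family, random-hyperplane LSH (SimHash) for cosine similarity, applied to hypothetical binned intensity vectors such as spectra or feature slices; the function names are illustrative assumptions, not (Py)proteolizard's API:

```python
import numpy as np

def simhash_signatures(vectors, n_bits=16, seed=0):
    """Random-hyperplane LSH (SimHash): each bit records on which side of a random
    hyperplane a vector lies, so vectors with a small angle between them tend to
    share most bits. Returns a boolean array of shape (n_vectors, n_bits)."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_bits, vectors.shape[1]))
    return (vectors @ planes.T) > 0

def bucket_keys(signatures):
    """Pack each boolean signature into one integer bucket key."""
    weights = 1 << np.arange(signatures.shape[1], dtype=np.int64)
    return signatures.astype(np.int64) @ weights
```

Vectors that point in similar directions tend to land in the same bucket, so candidate pairs can be found by grouping equal keys instead of comparing all pairs; in practice several independent hash tables are usually combined to raise recall.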