Publications

2025

Multicenter Longitudinal Quality Assessment of MS-Based Proteomics in Plasma and Serum

ABSTRACT: Advancing MS-based proteomics toward clinical applications revolves around developing standardized start-to-finish and fit-for-purpose workflows for clinical specimens. Steps along the method design involve the determination and optimization of several bioanalytical parameters such as selectivity, sensitivity, accuracy, and precision. In a joint effort, eight proteomics laboratories belonging to the MSCoreSys initiative including the CLINSPECT-M, MSTARS, DIASyM, and SMART-CARE consortia performed a longitudinal round-robin study to assess the analysis performance of plasma and serum as clinically relevant samples. A variety of LC-MS/MS setups including mass spectrometer models from ThermoFisher and Bruker as well as LC systems from ThermoFisher, Evosep, and Waters Corporation were used in this study. As key performance indicators, sensitivity, precision, and reproducibility were monitored over time. Protein identifications range between 300 and 400 IDs across different state-of-the-art MS instruments, with timsTOF Pro, Orbitrap Exploris 480, and Q Exactive HF-X being among the top performers. Overall, 71 proteins are reproducibly detectable in all setups in both serum and plasma samples, and 22 of these proteins are FDA-approved biomarkers, which are reproducibly quantified (CV < 20% with label-free quantification). In total, the round-robin study highlights a promising baseline for bringing MS-based measurements of serum and plasma samples closer to clinical utility.

https://pmc.ncbi.nlm.nih.gov/articles/PMC11894660/

2025

Heart rate variability in patients with cardiovascular diseases

Abstract
Heart rate variability (HRV) has been reported to predict overall mortality and the risk of cardiovascular disease events in patients, including those with heart failure. However, inconsistent methods of recording and analyzing HRV parameters, along with a lack of randomized data substantiating its clinical efficacy and potential to guide treatment decisions for improved patient outcomes, have limited its use in clinical settings. With the advancements in technologies such as artificial intelligence and machine learning, and emergence of ablation procedures that can alter autonomic function, this article re-explores HRV assessment methods, their potential for clinical application, the issues encountered in using them in clinical research, and potential approaches to studying HRV in the future (Graphical Abstract).

https://www.sciencedirect.com/science/article/pii/S0033062025000817?via%3Dihub

2025

Estimated annual healthcare costs after acute pulmonary embolism: results from a prospective multicentre cohort study

Aims
Patients surviving acute pulmonary embolism (PE) require long-term treatment and follow-up. We estimated the chronic economic impact of PE on the German healthcare system.

Methods and results
We calculated the direct cost of illness during the first year after discharge for the index PE, analysing data from a multicentre prospective cohort study in Germany. Main and accompanying readmission diagnoses were used to calculate DRG-based hospital reimbursements; anticoagulation costs were estimated from the exact treatment duration and each drug’s unique national identifier; and outpatient post-PE care costs from guideline-recommended algorithms and national reimbursement catalogues. Of 1017 patients enrolled at 17 centres, 958 (94%) completed ≥3-month follow-up; of those, 24% were rehospitalized (0.34 [95% CI 0.30–0.39] readmissions per PE survivor). Age, coronary artery, pulmonary and kidney disease, diabetes, and (in the sensitivity analysis of 837 patients with complete 12-month follow-up) cancer, but not recurrent PE, were independent cost predictors by hurdle gamma regression accounting for zero readmissions. The estimated rehospitalization cost was €1138 (95% CI 896–1420) per patient. Anticoagulation duration was 329 (IQR 142–365) days, with estimated average per-patient costs of €1050 (median 972; IQR 458–1197); costs of scheduled ambulatory follow-up visits amounted to €181. Total estimated direct per-patient costs during the first year after PE ranged from €2369 (primary analysis) to €2542 (sensitivity analysis).

Conclusion
By estimating per-patient costs and identifying cost drivers of post-PE care, our study may inform decisions concerning implementation and reimbursement of follow-up programmes aiming at improved cardiovascular prevention.

https://pubmed.ncbi.nlm.nih.gov/38950900/

2025

Risk tools for predicting long-term sequelae based on symptom profiles after known and undetected SARS-CoV-2 infections in the population

Abstract
The aim was to determine the profile of long-term symptoms after known and undetected SARS-CoV-2 infections and to generate tools for risk and diagnostic assessment of Post-COVID syndrome (PCS). In the population-based Gutenberg COVID-19 Study (N = 10,250), sequential, systematic screening for SARS-CoV-2 was performed in 2020/2021. Individuals received a standardized interview on newly occurred or worsened symptoms since the infection or the pandemic. Robust Poisson regression models were fit to compare the frequency of symptoms between groups. Two scores were developed using machine learning techniques and prospectively validated in an independent cohort. Among n = 942 individuals, prevalence of long-term symptoms was 36.4% among individuals with known SARS-CoV-2 infection, 25.0% in those unknowingly infected, and 28.1% among the controls. Individuals with known infection more often reported smell (Prevalence ratio [PR] = 13.66 [95% confidence interval 4.99;37.41]) and taste disturbances (PR = 5.57 [2.62;11.81]), forgetfulness (PR = 2.88 [1.55;5.35]), concentration difficulties (PR = 2.83 [1.55;5.16]), trouble with balance (PR = 2.74 [1.18;6.35]), and dyspnea (PR = 2.22 [1.18;4.19]) than controls. The risk score for predicting long-term sequelae based on symptoms during the acute infection had a cross-validated AUC of 0.74 and 0.72 when applied in an independent cohort (N = 6,570). The diagnostic score providing a probability of the presence of PCS had a cross-validated AUC of 0.66 and of 0.64 in the validation cohort (N = 3,176). Individuals with and without SARS-CoV-2 infection reported persistent symptoms, but symptoms attributable to PCS were identified. The data-driven scores may help guide further diagnostic decisions in the initial management of PCS.

https://pubmed.ncbi.nlm.nih.gov/40387979/

2025

Personalized app-based coaching for improving physical activity in heart failure with preserved ejection fraction patients compared with standard care: rationale and design of the MyoMobile Study

Abstract
Aims: Patients suffering from heart failure with preserved ejection fraction (HFpEF) often exhibit a sedentary lifestyle, contributing to the worsening of their condition. Although there is an inverse relationship between physical activity (PA) and adverse cardiovascular outcomes, the implementation of Class Ia PA guidelines is hindered by low participation in supervised and structured programmes, which are not suitable for a diverse population of HFpEF patients. The MyoMobile study has been designed to assess the effect of a 12-week, app-based coaching programme on promoting PA in patients with HFpEF.

Methods and results: The MyoMobile study was a single-centre, randomized, controlled three-armed parallel group clinical trial with prospective data collection to investigate the effect of a personalized mobile app health intervention compared with usual care on PA levels in patients with HFpEF. Major inclusion criteria were age ≥ 45 years, a diagnosis of HFpEF, LVEF > 40%, and current HF symptoms (NYHA Class I-III). Major exclusion criteria included acute decompensated HF, non-ambulatory status, recent acute coronary syndrome or cardiac surgery, alternative diagnoses for HF symptoms, active cancer treatment, and physical or medical conditions affecting mobility. Participants were recruited from hospitals, general practices, and practices specialized in internal medicine and cardiology in the Rhine-Main area, Germany. Participants underwent an objective 7-day PA measurement with a 3D accelerometer (Dynaport, McRoberts) at screening and after the 12-week intervention period. Following the screening, eligible participants were randomized into one of three groups: standard care (PA consulting), the intervention arm with app-based PA tracking and coaching, or the intervention arm with tracking but without coaching. The primary efficacy endpoint was the change in average daily step count between the average step count at baseline and at the end of the intervention, comparing standard care to a 12-week app-based PA coaching intervention.

Conclusion: Exercise intolerance is a primary symptom in HFpEF patients, leading to poor quality of life and HF-related adverse outcomes due to physical inactivity. The MyoMobile study was designed to investigate the use of app-based coaching to improve PA in patients with HFpEF with a personalized, home-based intervention, focusing on simple step counts for flexibility and ease of integration into daily routines.

https://pubmed.ncbi.nlm.nih.gov/40110212/

2025

Gender-specific changes in vision-related quality of life over time - results from the population-based Gutenberg Health Study

Abstract
Purpose: To investigate potential gender- and age-specific changes over time in vision-related quality of life (VRQoL) on a population-based level. Further, factors associated with changes in VRQoL will be explored.

Methods: The Gutenberg Health Study is a population-based, prospective, observational, single-center cohort study in Germany. VRQoL was quantified at baseline and 5-year follow-up using the visual function scale (VFS) and socio-emotional scale (SES-VRQoL). VFS and SES-VRQoL are calculated using the "National Eye Institute 25-Item Visual Functioning Questionnaire" (NEI-VFQ-25). Both scales range from 0 to 100: 0 corresponds to the sum that would be achieved if a participant had answered all items with the worst performance, and 100 corresponds to the sum of all items answered with the best possible performance. Distance-corrected visual acuity was measured in both eyes. Univariable and multivariable linear regression analyses were conducted to identify ophthalmic and sociodemographic predictors of VRQoL.

Results: A total of 10,152 participants (mean age 54.2 years; 49.2% female) were included in the analysis. The mean visual functioning decreased from 89.6 (IQR: 81.3, 95.1) at baseline to 85.9 (IQR: 79.2, 92.6) at 5-year follow-up in the VFS (p < 0.001). Participants' socio-emotional well-being remained the same from baseline to 5-year follow-up in the SES-VRQoL. In multivariable linear regression analysis, older age (0.03, p = 0.002) and female gender (-1.00, p < 0.001) were associated with a VFS change. Higher baseline socioeconomic status was associated with a slightly positive increase in VFS (0.07, p = 0.001). Deterioration of visual acuity in the better and worse-seeing eye was associated with negative VFS change over 5 years (better-seeing eye: -5.41, p < 0.001, worse-seeing eye: -7.35, p < 0.001). Baseline socioeconomic status was associated with SES-VRQoL change (0.06, p < 0.001). The negative change in visual acuity showed an association with negative SES-VRQoL in the better (-4.15, p < 0.001) and worse-seeing eye (-3.75, p < 0.001). Stratification of the regression models by age and gender showed greater reductions in VFS scores with visual acuity changes in participants aged 65 years or older and a more pronounced decrease in female participants over 5 years.

Conclusions: This study demonstrated an association between visual acuity change and change in VRQoL over 5 years, with a greater decrease in female participants and participants aged 65 years or older. The better-seeing eye and the worse-seeing eye both had an impact on changes in VRQoL.

https://pubmed.ncbi.nlm.nih.gov/39934353/

2025

Rustims: An Open-Source Framework for Rapid Development and Processing of timsTOF Data-Dependent Acquisition Data

Abstract
Mass spectrometry is essential for analyzing and quantifying biological samples. The timsTOF platform is a prominent commercial tool for this purpose, particularly in bottom-up acquisition scenarios. The additional ion mobility dimension requires more complex data processing, yet most current software solutions for timsTOF raw data are proprietary or closed-source, limiting integration into custom workflows. We introduce rustims, a framework implementing a flexible toolbox designed for processing timsTOF raw data, currently focusing on data-dependent acquisition (DDA-PASEF). The framework employs a dual-language approach, combining efficient, multithreaded Rust code with an easy-to-use Python interface. This allows for implementations that are fast, intuitive, and easy to integrate. With imspy as its main Python scripting interface and sagepy for Sage search engine bindings, rustims enables fast, integrable, and intuitive processing. We demonstrate its capabilities with a pipeline for DDA-PASEF data including rescoring and integration of third-party tools like the Prosit intensity predictor and an extended ion mobility model. This pipeline supports tryptic proteomics and nontryptic immunopeptidomics data, with benchmark comparisons to FragPipe and PEAKS. Rustims is available on GitHub under the MIT license, with installation packages for multiple platforms on PyPI and all analysis scripts accessible via Zenodo.

https://pubs.acs.org/doi/10.1021/acs.jproteome.4c00966

2025

Extended coverage of human serum glycosphingolipidome by 4D-RP-LC TIMS-PASEF unravels association with Parkinson’s disease

Abstract
Glycosphingolipids (GSLs) are important targets in immune, infectious, lysosomal storage diseases, cancer, and neurodegenerative diseases. Circulatory GSLs profiling in clinical samples is restricted by the lack of mid- and high-throughput analytical methods and deep coverage of long-chain sialylated glycosphingolipidome. We present a 4-dimensional (4D)-glycosphingolipidomics platform for routine glycosphingolipidome profiling encompassing: extraction and fractionation of sialylated GSLs with 3 to 15 monosaccharides, neutral GSLs and sulfatides; µL-flow reversed-phase LC-TIMS-PASEF MS analysis; semi-quantification strategy adapted for fractionated glycosphingolipidome, and referential CCS, RT, and m/z values for GSLs annotation. 4D-glycosphingolipidomics of human serum reveals a high structural heterogeneity, amounting to 376 GSLs: 159 GSLs of ganglio- and neolacto-series, 145 neutral GSLs and 72 sulfatides. Here we demonstrate the platform’s utility for clinical profiling of Parkinson’s disease (PD) sera. 41 neolacto- and ganglio-species discriminate PD patients from controls and 14 GSLs differentiate sex subgroups, laying the foundation for further functional GSL studies with PD.

https://www.nature.com/articles/s41467-025-59755-6#Ack1

2025

A Map of the Lipid–Metabolite–Protein Network to Aid Multi-Omics Integration

Abstract
The integration of multi-omics data offers transformative potential for elucidating complex molecular mechanisms underlying biological processes and diseases. In this study, we developed a lipid–metabolite–protein network that combines a protein–protein interaction network and enzymatic and genetic interactions of proteins with metabolites and lipids to provide a unified framework for multi-omics integration. Using hyperbolic embedding, the network visualizes connections across omics layers, accessible through a user-friendly Shiny R (version 1.10.0) software package. This framework ranks molecules across omics layers based on functional proximity, enabling intuitive exploration. Application in a cardiovascular disease (CVD) case study identified lipids and metabolites associated with CVD-related proteins. The analysis confirmed known associations, like cholesterol esters and sphingomyelin, and highlighted potential novel biomarkers, such as 4-imidazoleacetate and indoleacetaldehyde. Furthermore, we used the network to analyze empagliflozin’s temporal effects on lipid metabolism. Functional enrichment analysis of proteins associated with lipid signatures revealed dynamic shifts in biological processes, with early effects impacting phospholipid metabolism and long-term effects affecting sphingolipid biosynthesis. Our framework offers a versatile tool for hypothesis generation, functional analysis, and biomarker discovery. By bridging molecular layers, this approach advances our understanding of disease mechanisms and therapeutic effects, with broad applications in computational biology and precision medicine.

https://www.mdpi.com/2218-273X/15/4/484

2025

MR1-ligand cross-linking identifies vitamin B6 metabolites as TCR-reactive antigens

Summary
Major histocompatibility complex class I-related protein 1 (MR1) plays a central role in the immune recognition of infected cells and can mediate T cell detection of cancer. Knowledge of the nature of the ligands presented by MR1 is still sparse and has been limited by a lack of efficient approaches for MR1 ligand discovery. Here, we present a cross-linking strategy to investigate Schiff base-bound MR1 ligands. Our methodology employs reductive amination to stabilize the labile Schiff base bond between MR1 and its ligand, allowing for the detection of ligands as covalent MR1 adducts by mass spectrometry-based proteomics. We apply our approach to identifying vitamin B6 vitamers pyridoxal and pyridoxal 5′-phosphate (PLP) as MR1 ligands and show that both compounds are recognized by T cells expressing either A-F7, a mucosal-associated invariant T (MAIT) cell T cell receptor (TCR), or MC.7.G5, an MR1-restricted TCR reported to recognize cancer cells, highlighting them as immunogenic MR1 ligands.

https://www.cell.com/cell-reports-methods/fulltext/S2667-2375(25)00156-0

2025

Effect of Empagliflozin on the plasma lipidome in patients with type 2 diabetes mellitus: results from the EmDia clinical trial

Abstract
Background
Sodium-glucose cotransporter 2 (SGLT2) inhibitors, such as Empagliflozin, are antidiabetic drugs that reduce glucose levels and have emerged as a promising therapy for patients with heart failure (HF), although the exact molecular mechanisms underlying their cardioprotective effects remain to be fully elucidated. The EmDia study, a randomized, double-blind trial conducted at the University Medical Center of Mainz, has confirmed the beneficial effects of Empagliflozin in HF patients after both one and twelve weeks of treatment. In this work, we aimed to assess whether changes in lipid profiles driven by Empagliflozin use in HF patients in the EmDia trial could assist in gaining a better understanding of its cardioprotective mechanisms.
Methods
Lipid analysis of blood plasma from 144 patients from the EmDia trial was conducted using 4D-LC-TIMS/IMS lipidomics. Lipid signatures after treatment for one and twelve weeks, respectively, were obtained with sparse group LASSO regularized regression models. Linear regression models were employed to highlight associations between significantly changed clinical traits and lipids.
Results
The lipid signatures after one week of treatment consisted of 37 lipids from the lipid groups lysophosphatidylcholine (LPC), phosphatidylcholine (PC), phosphatidylethanolamine (PE), sphingomyelin (SM), and triacylglycerol (TG). After twelve weeks, the signature comprised 24 lipids from the same five lipid groups, along with Ceramides (Cer). Three of five lipids altered at both time points showed consistent directional trends. Empagliflozin treatment led to significant alterations in the lipidome, including increases in both beneficial lipids, such as LPCs, and potentially harmful species, notably ceramides, which have been implicated in lipotoxicity and cardiovascular risk.
Conclusion
This study identified distinct lipid signatures associated with Empagliflozin treatment after both one and twelve weeks, respectively, with five lipids overlapping between signatures and three with consistent directions, revealing that some of the beneficial effects of Empagliflozin could be through lipid modulation. Notably, Empagliflozin-modulated lipids associated with changes in clinical traits and lipid-specific profiles among clinical subgroups were observed. However, challenges remain in establishing direct associations between individual lipids and clinical outcomes. Future research integrating lipidomics data with other omics datasets could provide a more comprehensive understanding of the identified lipid signatures and their potential roles in health and diseases.

https://cardiab.biomedcentral.com/articles/10.1186/s12933-025-02916-0

2025

TIMS2Rescore: A Data Dependent Acquisition-Parallel Accumulation and Serial Fragmentation-Optimized Data-Driven Rescoring Pipeline Based on MS2Rescore

Abstract

The high throughput analysis of proteins with mass spectrometry (MS) is highly valuable for understanding human biology, discovering disease biomarkers, identifying therapeutic targets, and exploring pathogen interactions. To achieve these goals, specialized proteomics subfields, including plasma proteomics, immunopeptidomics, and metaproteomics, must tackle specific analytical challenges, such as an increased identification ambiguity compared to routine proteomics experiments. Technical advancements in MS instrumentation can mitigate these issues by acquiring more discerning information at higher sensitivity levels. This is exemplified by the incorporation of ion mobility and parallel accumulation and serial fragmentation (PASEF) technologies in timsTOF instruments. In addition, AI-based bioinformatics solutions can help overcome ambiguity issues by integrating more data into the identification workflow. Here, we introduce TIMS2Rescore, a data-driven rescoring workflow optimized for DDA-PASEF data from timsTOF instruments. This platform includes new timsTOF MS2PIP spectrum prediction models and IM2Deep, a new deep learning-based peptide ion mobility predictor. Furthermore, to fully streamline data throughput, TIMS2Rescore directly accepts Bruker raw mass spectrometry data and search results from ProteoScape and many other search engines, including Sage and PEAKS. We showcase TIMS2Rescore performance on plasma proteomics, immunopeptidomics (HLA class I and II), and metaproteomics data sets. TIMS2Rescore is open-source and freely available at:
https://github.com/compomics/tims2rescore.

https://pubs.acs.org/doi/full/10.1021/acs.jproteome.4c00609

2025

Benchmarking Software for DDA-PASEF Immunopeptidomics

Abstract

Mass spectrometry (MS) is the method of choice for high-throughput identification of immunopeptides, which are generated by intracellular proteases, unlike proteomics peptides that are typically derived from trypsin-digested proteins. Therefore, the search space for immunopeptides is not limited by proteolytic specificity, requiring more sophisticated software algorithms to handle the increased complexity. Despite the widespread use of MS in immunopeptidomics, there is a lack of systematic evaluation of data processing software, making it challenging to identify the optimal solution. In this study, we provide a comprehensive benchmarking of the most widely used data-dependent acquisition (DDA)-based software platforms for immunopeptidomics: MaxQuant, FragPipe, PEAKS and MHCquant. The evaluation was conducted using data obtained from the JY cell line using the Thunder-DDA-PASEF method. We assessed each software’s ability to identify immunopeptides and compared their identification confidence. Additionally, we examined potential biases in the results and tested the impact of database size on immunopeptide identification efficiency. Our findings demonstrate that all software platforms successfully identify the most prominent subset of immunopeptides with 1% false discovery rate (FDR) control, achieving medium to high identification confidence correlations. The largest number of immunopeptides was identified using the commercial PEAKS software, which is closely followed by FragPipe, making the latter a viable non-commercial alternative. However, we observed that larger database sizes negatively impacted the performance of some software platforms more than others. These results provide valuable insights into the strengths and limitations of current MS data processing tools for immunopeptidomics, supporting the immunopeptidomics/MS community in determining the right choice of software.

https://www.biorxiv.org/content/10.1101/2025.05.28.656277v2.abstract

2025

Echocardiographic Measures Read by Artificial Intelligence Enable Accurate and Rapid Prediction of the Worsening of Heart Failure

Abstract
Background and Aims
Automatic echocardiographic measurements using artificial intelligence have shown promising results; however, they have not been compared with manual measurements regarding heart failure progression and algorithm runtime.
Methods
Data came from the prospective HF study MyoVasc (NCT04064450), which involved a highly standardised 5-hour examination, including comprehensive echocardiography, at a dedicated study centre between January 2013 and April 2018. Worsening of HF was a primary composite endpoint, recorded by structured follow-up, death certificates, and medical records. The automated assessment was performed using EchoDL, eight 3D convolutional neural networks (CNNs) trained to predict clinical parameters.
Results
Manual and automatic left ventricular ejection fraction, E/E’-ratio and left ventricular mass demonstrated a good intraclass correlation coefficient (LVEF: 0.75 [95% confidence interval (CI) 0.75-0.77], E/E’-ratio: 0.59 [CI 0.56-0.61], LVM: 0.64 [CI 0.62-0.66]). After a median follow-up of 3.8 years (IQR 2.1-5.0), 470 patients experienced worsening of HF. In multivariable Cox analysis, comparison of manually and automatically assessed LVEF, E/E’-ratio and LVM demonstrated risk estimates slightly in favour of the CNNs. Direct comparison of C-indices showed significantly better model performance for automatically determined LVEF (0.71 vs. 0.73, p=0.038) and E/E’-ratio (0.64 vs. 0.66, p=0.013) and a trend for LVM (0.66 vs. 0.68, p=0.063). Echo-DL required an average of 1053.4ms (95% CI 1050.7-1056.0) to analyse a four-second-long echocardiogram.
Conclusions
Automated analysis of echocardiograms using 3D CNNs was comparable to manual measurements in predicting HF-specific outcomes. Echo-DL offers potential time savings and improved risk prediction in clinical settings, allowing integration into echocardiographic hardware.

https://academic.oup.com/ehjdh/advance-article/doi/10.1093/ehjdh/ztaf120/8286249?login=true

2025

Role of Heart Rate Recovery in Chronic Heart Failure: Results From the MyoVasc Study

Abstract
Background: Cardiac autonomic dysfunction is associated with heart failure (HF). Reduced heart rate recovery (HRR) indicates impaired parasympathetic reactivation after physical activity. Heart rate recovery 60 seconds after peak effort (HRR60) is linked to autonomic dysfunction, but data on its relevance across HF phenotypes are scarce. This study aimed to identify clinical determinants of HRR60 in an HF cohort and assess its relationship with clinical outcomes.

Methods: Data from the MyoVasc study (NCT04064450; N=3289) were analyzed. Participants underwent standardized clinical phenotyping including cardiopulmonary exercise testing. HRR60 was defined as the heart rate decline 60 seconds after exercise termination. Clinical determinants of HRR60 were evaluated using multivariate regression, whereas Cox regression analyses assessed all-cause death and worsening of HF.

Results: The analysis sample comprised 1289 individuals (median age, 66.0 [interquartile range {IQR}, 58.0-73.0] years, 30.4% women) ranging from stage B to stage C/D according to the universal definition of HF. Age, sex, smoking, obesity, peripheral artery disease, and chronic kidney disease were identified as determinants of HRR60. HRR60 showed a strong association with all-cause death (hazard ratio [HR] for HRR60 [10 bpm], 1.56 [95% CI, 1.32-1.85]; P<0.0001) and worsening of HF (HR for HRR60 [10 bpm], 1.36 [95% CI, 1.10-1.69]; P=0.0052) independent of age, sex, and clinical profile. Sensitivity analysis showed a stronger association with worsening HF in HF with preserved left ventricular ejection fraction (P for interaction=0.027).

Conclusions: HRR60 was associated with clinical outcome in chronic HF. Because it showed a stronger association with outcomes in HF with preserved ejection fraction, future research should consider phenotype-specific differences.

https://pubmed.ncbi.nlm.nih.gov/40371587/

2024

Thunder-DDA-PASEF enables high-coverage immunopeptidomics and is boosted by MS2Rescore with MS2PIP timsTOF fragmentation prediction model

Abstract
Human leukocyte antigen (HLA) class I peptide ligands (HLAIps) are key targets for developing vaccines and immunotherapies against infectious pathogens or cancer cells. Identifying HLAIps is challenging due to their high diversity, low abundance, and patient individuality. Here, we develop a highly sensitive method for identifying HLAIps using liquid chromatography-ion mobility-tandem mass spectrometry (LC-IMS-MS/MS). In addition, we train a timsTOF-specific peak intensity MS2PIP model for tryptic and non-tryptic peptides and implement it in MS2Rescore (v3) together with the CCS predictor from ionmob. The optimized method, Thunder-DDA-PASEF, semi-selectively fragments singly and multiply charged HLAIps based on their IMS and m/z. Moreover, the method employs the high sensitivity mode and extended IMS resolution with fewer MS/MS frames (300 ms TIMS ramp, 3 MS/MS frames), doubling the coverage of immunopeptidomics analyses, compared to the proteomics-tailored DDA-PASEF (100 ms TIMS ramp, 10 MS/MS frames). Additionally, rescoring boosts the HLAIps identification by 41.7% to 33%, resulting in 5738 HLAIps from as little as one million JY cell equivalents, and 14,516 HLAIps from 20 million. This enables in-depth profiling of HLAIps from diverse human cell lines and human plasma. Finally, profiling JY and Raji cells transfected to express the SARS-CoV-2 spike protein results in 16 spike HLAIps, thirteen of which have been reported to elicit immune responses in human patients.

https://pubmed.ncbi.nlm.nih.gov/38480730/

2024

Deep immune phenotyping of heart failure to identify molecular signatures of metabolically-induced dysregulations in peripheral blood mononuclear cells reveals perturbations in cytotoxic activity

Author: Maximilian Nuber

Poster, 90. DGK-Jahrestagung
The aim of the study is to find relevant genes and cell types in heart failure and metabolic dysregulation. For that purpose, PBMC samples from 64 individuals with Heart Failure with preserved Ejection Fraction (HFpEF), Heart Failure with reduced Ejection Fraction (HFrEF) and controls were submitted to single cell RNA sequencing. The individuals from whom samples were taken were further stratified by metabolic dysregulation, defined by glycated hemoglobin and insulin resistance. The deliberate selection of heterogeneous samples with a range of disease severities enables the analysis of the onset and progression of metabolic dysregulation in heart failure.
More specifically, the study aims to detect inflammatory signatures in circulating immune cells in heart failure individuals, caused by metabolically dysregulated processes in the visceral fat tissue.
Current methods in single cell RNA sequencing analysis rely on pseudo-bulking. In this process, the integer gene expression counts of cells within a cell type are summed, simulating bulk RNA sequencing of that cell type, and the respective methods for differential expression are applied. This approach diminishes expression differences present in small subsets of cells in heterogeneous datasets such as our own. Therefore, high-performance generalized linear mixed models for differential expression analysis were constructed, taking into account the hierarchical data structure of single cell RNA sequencing data. In this way, a small but relevant number of genes differentially expressed in both heart failure and metabolic dysregulation was identified. The identified genes are known for their role in inflammatory processes and the priming of other cell types in processes of the immune system.
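The pseudo-bulking step described above can be illustrated with a minimal sketch. It is not the analysis code of the study: the function name `pseudobulk`, the toy counts, and the choice to group by donor as well as by cell type are assumptions made for illustration only.

```python
from collections import defaultdict


def pseudobulk(counts, cell_types, samples):
    """Sum integer counts over all cells sharing a (sample, cell type) pair.

    counts     -- list of per-cell count vectors (one integer per gene)
    cell_types -- cell-type label of each cell
    samples    -- donor/sample label of each cell (grouping by donor is an assumption)
    """
    n_genes = len(counts[0])
    bulk = defaultdict(lambda: [0] * n_genes)
    for vec, ctype, sample in zip(counts, cell_types, samples):
        profile = bulk[(sample, ctype)]
        for gene_index, count in enumerate(vec):
            profile[gene_index] += count
    return dict(bulk)


# Toy data: 5 cells, 3 genes, 2 donors (all values are made up).
counts = [
    [0, 2, 1],
    [1, 0, 0],
    [9, 0, 0],  # one strongly expressing cell in a small subset
    [0, 1, 3],
    [2, 2, 0],
]
cell_types = ["NK", "NK", "NK", "T", "NK"]
samples = ["donor_A", "donor_A", "donor_A", "donor_A", "donor_B"]

bulk = pseudobulk(counts, cell_types, samples)
for key in sorted(bulk):
    print(key, bulk[key])
```

After summation, only one profile per donor and cell type remains, so the contribution of the single strongly expressing cell can no longer be separated from that of its neighbours. This is the loss of information that motivates modelling individual cells with mixed models instead.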

2024

Interpretability of bi-level variable selection methods

Abstract
Variable selection is usually performed to increase interpretability, as sparser models are easier to understand than full models. However, a focus on sparsity is not always suitable, for example, when features are related due to contextual similarities or high correlations. Here, it may be more appropriate to identify groups and their predictive members, a task that can be accomplished with bi-level selection procedures. To investigate whether such techniques lead to increased interpretability, group exponential LASSO (GEL), sparse group LASSO (SGL), composite minimax concave penalty (cMCP), and least absolute shrinkage and selection operator (LASSO) as a reference method were used to select predictors in time-to-event, regression, and classification tasks in bootstrap samples from a cohort of 1001 patients. Different groupings based on prior knowledge, correlation structure, and random assignment were compared in terms of selection relevance, group consistency, and collinearity tolerance. The results show that bi-level selection methods are superior to LASSO in all criteria. The cMCP demonstrated superiority in selection relevance, while SGL was convincing in group consistency. An all-round capacity was achieved by GEL: the approach jointly selected correlated and content-related predictors while maintaining high selection relevance. This method seems recommendable when variables are grouped and interpretation is of primary interest.

https://onlinelibrary.wiley.com/doi/full/10.1002/bimj.202300063

2024

Unbiased clustering and molecular characterisation of novel metabolic phenotypes in a heart failure cohort

Author: Ekaterina Esenkova

Talk
Introduction:
Heart failure is a heterogeneous clinical syndrome characterized by the inability of the heart to pump enough blood and thus provide enough oxygen to the human body. The current classification of heart failure is based on the left ventricular ejection fraction (LVEF). Although the classification clearly defines heart failure subtypes and stages, and clinical guidelines are available for their management, it is preferable to prevent the disorder or to take control of it from its onset. Evidence suggests that a molecular connection between human metabolism and heart failure exists and that it could be exploited for patient stratification and as a therapeutic target. With this in mind, the investigation of subgroups beyond the established metabolic disorders is a potential strategy for finding novel molecular biomarkers of the onset and progression of heart failure.
Methods:
The current research attempts to identify clinically relevant metabolic subgroups of individuals with the help of a similarity network-based approach, based on targeted proteomics data of a heart failure cohort (the MyoVasc study - NCT04064450). With Olink targeted proteomics technology, six protein abundance panels (535 distinct proteins) from 3063 subjects (1103 females) were obtained. Using nonlinear combination methods, the created networks were integrated into a single similarity network and then used for clustering with unbiased network-based approaches.
Results:
The clustering distinguished two subgroups with similar profiles for metabolic disorders, age, and heart failure proportions, but with a difference in worsening of heart failure and all-cause death outcomes. Comparison of the clinical profiles showed a difference in atrial fibrillation. The differential protein abundance profiles were then compared between the two clusters to derive novel insights into clinical subtypes, and highlighted a significant contribution of proteins involved in lipid metabolism to the unbiased subtyping rationale.
Conclusion:
The study attempted to fill the gaps in previous patient stratification by using a large sample size with a broader variety of variables. These results will pave the way to precision medicine and will allow the analysis to indirectly integrate and/or adjust for clinically relevant social parameters such as lifestyle, parental medical history, and demographics, which are often left out due to their lower contribution at the pre-processing steps in comparison to more objective laboratory parameters. Besides that, further analysis will be performed with the inclusion of targeted lipidomics data. Previous studies have already clustered the heart failure population based on various clinical and biochemical parameters. However, the clustering was guided by well-established metabolic parameters, and thus the results can be highly biased. The current project allows an unbiased, data-driven approach that holds more potential to identify novel and clinically relevant metabolic subgroups.

2024

Recent advances in cardiovascular disease research driven by metabolomics technologies in the context of systems biology

Authors: Boyao Zhang & Thierry Schmidlin

Abstract

Traditional risk factors and biomarkers of cardiovascular diseases (CVD) have been mainly discovered through clinical observations. Nevertheless, there is still a gap in knowledge in more sophisticated CVD risk factor stratification and more reliable treatment outcome prediction, highlighting the need for a more comprehensive understanding of disease mechanisms at the molecular level. This need has been addressed by integrating information derived from multiomics studies, which provides systematic insights into the different layers of the central dogma in molecular biology. With the advancement of technologies such as NMR and UPLC-MS, metabolomics has become a powerhouse in pharmaceutical and clinical research for high-throughput, robust, quantitative characterisation of metabolic profiles in various types of biospecimens. In this review, we highlight the versatile value of metabolomics spanning from targeted and untargeted identification of novel biomarkers and biochemical pathways, to tracing drug pharmacokinetics and drug-drug interactions for more personalised medication in CVD research.

https://www.nature.com/articles/s44324-024-00028-z

2024

Sparse Group Penalties for bi-level variable selection

Abstract
Many data sets exhibit a natural group structure due to contextual similarities or high correlations of variables, such as lipid markers that are interrelated based on biochemical principles. Knowledge of such groupings can be used through bi-level selection methods to identify relevant feature groups and highlight their predictive members. One of the best known approaches of this kind combines the classical Least Absolute Shrinkage and Selection Operator (LASSO) with the Group LASSO, resulting in the Sparse Group LASSO (SGL). We propose the Sparse Group Penalty (SGP) framework, which allows for a flexible combination of different SGL-style shrinkage conditions. Analogous to SGL, we investigated the combination of the Smoothly Clipped Absolute Deviation (SCAD), the Minimax Concave Penalty (MCP) and the Exponential Penalty (EP) with their group versions, resulting in the Sparse Group SCAD, the Sparse Group MCP, and the novel Sparse Group EP (SGE). Those shrinkage operators provide refined control of the effect of group formation on the selection process through a tuning parameter. In simulation studies, SGPs were compared with other bi-level selection methods (Group Bridge, composite MCP, and Group Exponential LASSO) for variable and group selection evaluated with the Matthews correlation coefficient. We demonstrated the advantages of the new SGE in identifying parsimonious models, but also identified scenarios that highlight the limitations of the approach. The performance of the techniques was further investigated in a real-world use case for the selection of regulated lipids in a randomized clinical trial.
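To make the combination of LASSO and Group LASSO concrete, the following sketch evaluates the Sparse Group LASSO penalty in its commonly used form, lam * (alpha * ||beta||_1 + (1 - alpha) * sum_g sqrt(p_g) * ||beta_g||_2). The abstract does not state the exact parameterization used in the SGP framework, so the mixing parameter `alpha`, the group weights sqrt(p_g), the function name, and the toy coefficients are assumptions.

```python
import math


def sparse_group_lasso_penalty(beta, groups, lam, alpha):
    """Evaluate lam * (alpha * L1 term + (1 - alpha) * group term).

    beta   -- coefficient vector
    groups -- group label of each coefficient
    lam    -- overall penalty strength
    alpha  -- mixing weight: 1.0 gives the LASSO, 0.0 the Group LASSO
    """
    l1_term = sum(abs(b) for b in beta)
    members_by_group = {}
    for b, g in zip(beta, groups):
        members_by_group.setdefault(g, []).append(b)
    # Each group contributes its Euclidean norm, weighted by sqrt(group size).
    group_term = sum(
        math.sqrt(len(members)) * math.sqrt(sum(b * b for b in members))
        for members in members_by_group.values()
    )
    return lam * (alpha * l1_term + (1 - alpha) * group_term)


# Hypothetical coefficients for two groups of two lipid markers each.
beta = [3.0, 4.0, 0.0, 0.0]
groups = ["class_1", "class_1", "class_2", "class_2"]

for alpha in (0.0, 0.5, 1.0):
    print(alpha, sparse_group_lasso_penalty(beta, groups, lam=1.0, alpha=alpha))
```

The L1 term pushes individual coefficients to zero, while the group term is non-differentiable only when a whole group is zero, which is what allows entire groups to be dropped. Replacing the two terms with SCAD, MCP, or exponential penalties and their group versions would follow the same additive pattern, but those variants are not sketched here.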

https://pubmed.ncbi.nlm.nih.gov/38747086/

2024

Integrated cellular 4D-TIMS lipidomics and transcriptomics for characterization of anti-inflammatory and anti-atherosclerotic phenotype of MyD88-KO macrophages

Abstract
Introduction: Recent progress in cell isolation technologies and high-end omic technologies has allowed investigation of single cell sets across multiple omic domains and a thorough exploration of cellular function and various functional stages. While most multi-omic studies focused on dual RNA and protein analysis of single cell population, it is crucial to include lipid and metabolite profiling to comprehensively elucidate molecular mechanisms and pathways governing cell function, as well as phenotype at different functional stages. Methods: To address this gap, a cellular lipidomics and transcriptomics phenotyping approach employing simultaneous extraction of lipids, metabolites, and RNA from single cell populations combined with untargeted cellular 4 dimensional (4D)-lipidomics profiling along with RNA sequencing was developed to enable comprehensive multi-omic molecular profiling from the lowest possible number of cells. Reference cell models were utilized to determine the minimum number of cells required for this multi-omics analysis. To demonstrate the feasibility of higher resolution cellular multi-omics in early-stage identification of cellular phenotype changes in pathological and physiological conditions we implemented this approach for phenotyping of macrophages in two different activation stages: MyD88-knockout macrophages as a cellular model for atherosclerosis protection, and wild type macrophages. Results and Discussion: This multi-omic study enabled the determination of the lipid content remodeling in macrophages with anti-inflammatory and atherosclerotic protective function acquired by MyD88-KO, hence expedites the understanding of the molecular mechanisms behind immune cells effector functionality and of possible molecular targets for therapeutic intervention. 
An enriched functional role of phosphatidylcholine and plasmenyl/plasmalogens was shown here to accompany the genetic changes underlying the macrophages' acquisition of anti-inflammatory function, a finding that can serve as a reference for macrophage reprogramming studies and for general immune and inflammation responses to disease.

https://pubmed.ncbi.nlm.nih.gov/39247623/

2024

Heart rate variability: reference values and role for clinical profile and mortality in individuals with heart failure

Abstract
Aims
To establish reference values and clinically relevant determinants for measures of heart rate variability (HRV) and to assess their relevance for clinical outcome prediction in individuals with heart failure.

Methods
Data from the MyoVasc study (NCT04064450; N = 3289), a prospective cohort on chronic heart failure with a highly standardized, 5 h examination, and Holter ECG recording were investigated. HRV markers were selected using a systematic literature screen and a data-driven approach. Reference values were determined from a healthy subsample. Clinical determinants of HRV were investigated via multivariable linear regression analyses, while their relationship with mortality was investigated by multivariable Cox regression analyses.

Results
Holter ECG recordings were available for analysis in 1001 study participants (mean age 64.5 ± 10.5 years; female sex 35.4%). While the most frequently reported HRV markers in literature were from time and frequency domains, the data-driven approach revealed predominantly non-linear HRV measures. Age, sex, dyslipidemia, family history of myocardial infarction or stroke, peripheral artery disease, and heart failure were strongly related to HRV in multivariable models. In a follow-up period of 6.5 years, acceleration capacity [HRperSD 1.53 (95% CI 1.21/1.93), p = 0.0004], deceleration capacity [HRperSD: 0.70 (95% CI 0.55/0.88), p = 0.002], and time lag [HRperSD 1.22 (95% CI 1.03/1.44), p = 0.018] were the strongest predictors of all-cause mortality in individuals with heart failure independently of cardiovascular risk factors, comorbidities, and medication.

Conclusion
HRV markers are associated with the cardiovascular clinical profile and are strong and independent predictors of survival in heart failure. This underscores clinical relevance and interventional potential for individuals with heart failure.

https://link.springer.com/article/10.1007/s00392-023-02248-7

2024

Mental health symptoms and burdens after a SARS-CoV-2 infection

Abstract
Background
Previous studies have found adverse effects on mental health following infection with SARS-CoV-2. This study investigates whether mental health is also impaired in unknowingly infected individuals. In addition, the relevance of the severity of the infection and the time since the onset of infection were analyzed.

Methods
Data from the population-representative Gutenberg COVID-19 Study (GCS) were used (N = 2,267). SARS-CoV-2 infection was determined multimodally by self-report, throat swabs (acute infections) and antibody measurements (previous infections). Participants completed self-report questionnaires on mental health.

Results
Neither unknowing nor knowing SARS-CoV-2 infection had an impact on mental health. However, symptom severity and previous depression or anxiety predicted higher levels of depressiveness, anxiety and somatic complaints. Our results confirm findings suggesting that the severity of the initial infection and previous mental illness, but not knowledge of the infection, are the most important predictors of negative mental health outcomes following SARS-CoV-2 infection.

Conclusion
The results suggest that mental health care should focus on individuals who suffer from a severe acute COVID-19 infection or have a history of mental illness.

https://pmc.ncbi.nlm.nih.gov/articles/PMC11645784/

2024

Change in Systemic Medication and its Influence on Intraocular Pressure - Results From the Gutenberg Health Study

Abstract
Purpose: The purpose of this study was to investigate the relationship between the change in systemic medication and intraocular pressure (IOP) on a population-based level.

Methods:
The Gutenberg Health Study is a population-based prospective observational cohort study in Germany. As part of the baseline examination (2007-2012) and 5-year follow-up examination (2012-2017), IOP was measured by non-contact tonometry. Systemic medication was recorded at both time points. Multivariable regression analyses were carried out to analyze associations. Moreover, we calculated the dose-response relationship for the dosage change of selective beta-blockers with IOP change over 5 years.

Results:
The analysis population included 19,161 eyes of 9633 participants. IOP change was lower in participants with new intake of selective beta-blockers (-0.31 mm Hg, P < 0.001) and increased in those with discontinuation of selective beta-blocker intake (+0.28 mm Hg, P = 0.02). Associations between IOP change and statins and calcium channel blockers (CCBs) could be attributed to co-medications. There was a dose-response relationship for change in selective beta-blocker intake and change in IOP (-0.16 mm Hg/100 mg, P = 0.02).

Conclusions:
Use of systemic selective beta-blockers is associated with an IOP change on a population level, whereas the association with other systemic medications on IOP change could be explained by co-medication use or change in blood pressure. Patients undergoing IOP monitoring and management should routinely be asked about changes in systemic medications.

https://pubmed.ncbi.nlm.nih.gov/39625443/

2023

A fair experimental comparison of neural network architectures for latent representations of multi-omics for drug response prediction

Recent years have seen a surge of novel neural network architectures for the integration of multi-omics data for prediction. Most of the architectures include either encoders alone or encoders and decoders, i.e., autoencoders of various sorts, to transform multi-omics data into latent representations. One important parameter is the depth of integration: the point at which the latent representations are computed or merged, which can be either early, intermediate, or late. The literature on integration methods is growing steadily; however, close to nothing is known about the relative performance of these methods under fair experimental conditions and under consideration of different use cases.
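The notion of integration depth can be illustrated with a toy sketch. It does not reproduce any architecture from the paper: the linear `encode` function, the weights, the two omics layers, and the reading of "late" as merging per-layer predictions are all assumptions chosen to keep the example small.

```python
def encode(features, weights):
    """Toy linear encoder: one latent value per weight row."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def predict(latent):
    """Toy prediction head: sum of the latent values."""
    return sum(latent)


# Two made-up omics layers measured on the same sample.
expression = [1.0, 2.0]
mutation = [0.0, 1.0, 1.0]

# Early integration: concatenate the raw features, then encode once.
w_joint = [
    [0.5, 0.5, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0, 1.0],
]
early_latent = encode(expression + mutation, w_joint)

# Intermediate integration: encode each layer separately, then merge the latents.
w_expression = [[1.0, 1.0]]
w_mutation = [[1.0, 0.0, 1.0]]
intermediate_latent = encode(expression, w_expression) + encode(mutation, w_mutation)

# Late integration: keep the layers separate up to the prediction, then average.
late_prediction = (
    predict(encode(expression, w_expression)) + predict(encode(mutation, w_mutation))
) / 2

print(early_latent, intermediate_latent, late_prediction)
```

The three variants differ only in where the layers meet: before any encoding, after per-layer encoding, or after per-layer prediction. In a real model the encoders would be trained neural networks rather than fixed weight matrices.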

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-023-05166-7

2023

Overview, evaluation, and development of group variable selection methods for knowledge integration, Talk, International Biometric Society: Austria-Switzerland Region

Author: Buch, G.

Introduction:
Many datasets have a natural group structure due to high correlations or contextual similarities of variables. Group variable selection methods can account for such structure in the selection process to identify variables that are related to each other and share a common and traceable relationship with the response variable. There is a need for a systematic review of implemented approaches, a fair comparison and evaluation of these techniques, and the development of improved methods for omics datasets.

Methods:
A systematic literature search was conducted to identify group variable selection methods implemented in R. The selection performance of the identified methods was evaluated within simulation studies. A subset of the best performing approaches was used to select predictors in a time-to-event, regression, and classification task in bootstrap samples from a prospective cohort study (MyoVasc; NCT04064450). The Sparse Group Exponential Penalty (SGE) was proposed to address the limitations of existing methods. Its performance was compared to established techniques in simulation studies and applied to data from a randomized clinical trial (EmDia; NCT02932436).

Results:
The systematic review revealed 14 methods, which were classified into knowledge-driven and data-driven approaches. The first category includes group-level and bi-level selection methods, while two-step and collinearity-tolerant approaches constitute the second category. Simulation studies show the advantage of bi-level selection methods over the other approaches. In the real-world scenario, the bi-level selection methods were also shown to outperform the traditional LASSO, as they were able to treat variables of a group consistently and select correlated variables together. SGE demonstrated superiority in variable and group selection in almost all settings where the number of observations exceeded the number of variables. In cases where there were fewer observations than variables, SGE was the best bi-level selection method when few groups contained predictive signals.

Conclusions:
A variety of methods can incorporate a group structure of predictors in the selection process. The choice of the most appropriate method is dependent on the specific research question and demands careful consideration. Bi-level selection methods, particularly the SGE, appear promising for exploratory analysis of grouped omics data, such as lipidomics data.

2023

A systematic review and evaluation of statistical methods for group variable selection

This review condenses the knowledge on variable selection methods implemented in R and appropriate for datasets with grouped features. The focus is on regularized regressions identified through a systematic review of the literature, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. A total of 14 methods are discussed, most of which use penalty terms to perform group variable selection. Depending on how the methods account for the group structure, they can be classified into knowledge and data-driven approaches. The first encompass group-level and bi-level selection methods, while two-step approaches and collinearity-tolerant methods constitute the second category. The identified methods are briefly explained and their performance compared in a simulation study. This comparison demonstrated that group-level selection methods, such as the group minimax concave penalty, are superior to other methods in selecting relevant variable groups but are inferior in identifying important individual variables in scenarios where not all variables in the groups are predictive. This can be better achieved by bi-level selection methods such as group bridge. Two-step and collinearity-tolerant approaches such as elastic net and ordered homogeneity pursuit least absolute shrinkage and selection operator are inferior to knowledge-driven methods but provide results without requiring prior knowledge. Possible applications in proteomics are considered, leading to suggestions on which method to use depending on existing prior knowledge and research question.

https://pubmed.ncbi.nlm.nih.gov/36546512/

2023

Effects of empagliflozin on left ventricular diastolic function in addition to usual care in individuals with type 2 diabetes mellitus-results from the randomized, double-blind, placebo-controlled EmDia trial

Background:
The sodium-glucose co-transporter 2 inhibitor empagliflozin improves cardiovascular outcome in patients with type 2 diabetes mellitus (T2DM) and heart failure. Experimental studies suggest a direct cardiac effect of empagliflozin associated with an improvement in left ventricular diastolic function.

Methods:
In the randomized, double-blind, two-armed, placebo-controlled, parallel group trial EmDia, patients with T2DM and elevated left ventricular E/E´ ratio were enrolled and randomized 1:1 to receive empagliflozin 10 mg/day versus placebo. The primary endpoint was the change of left ventricular E/E´ ratio after 12 weeks of intervention.

Results:
A total of 144 patients with T2DM and an elevated left ventricular E/e´ ratio were enrolled (age 68.9 ± 7.7 years; 14.1% women; E/e´ ratio 9.61 [8.24/11.14]; left ventricular ejection fraction 58.9% ± 5.6%). After 12 weeks of intervention, empagliflozin resulted in a significantly greater decrease in the primary endpoint E/e´ ratio, by -1.18 (95% confidence interval [CI] -1.72/-0.65; P < 0.0001), compared with placebo. The beneficial effect of empagliflozin was consistent across all subgroups and also occurred in subjects with heart failure and preserved ejection fraction (n = 30). Additional effects of empagliflozin on body weight, HbA1c, uric acid, red blood cell count, hemoglobin, mean corpuscular hemoglobin, and hematocrit were detected (all P < 0.001). Approximately one-third of the reduction in E/e´ by empagliflozin could be explained by the variables examined.

Conclusions:
Empagliflozin improves diastolic function in patients with T2DM and elevated end-diastolic pressure. Since the positive effects were consistent in patients with and without heart failure with preserved ejection fraction, the data add a mechanistic insight for the beneficial cardiovascular effect of empagliflozin.

https://pubmed.ncbi.nlm.nih.gov/36763159/

2023

Rationale and design of the effects of EMpagliflozin on left ventricular DIAstolic function in diabetes (EmDia) study

Abstract:
Background: Data of the EMPA-REG OUTCOME study have demonstrated a beneficial effect of the sodium-glucose cotransporter 2 inhibitor empagliflozin on cardiovascular outcome in patients with type 2 diabetes. The reduction in cardiovascular mortality and hospitalization due to heart failure might be in part explained by the direct effects of empagliflozin on cardiac diastolic function. The EmDia trial investigates the short-term effects of empagliflozin compared to placebo on the left ventricular E/E' ratio as a surrogate of left ventricular diastolic function.

Methods:
EmDia is a single-center, randomized, double-blind, two-arm, placebo-controlled, parallel group study of phase IV. Individuals with diabetes mellitus type 2 (T2DM) are randomized 1:1 to receive empagliflozin 10 mg per day or a placebo for 12 weeks. The main inclusion criteria are a diagnosis of T2DM with stable glucose-lowering and/or dietary treatment, an elevated HbA1c level (6.5-10.0% if receiving glucose-lowering therapy, or 6.5-9.0% if drug-naïve), and diastolic cardiac dysfunction with left ventricular E/E' ≥ 8. The primary end point is the difference of the change in the E/E' ratio by treatment groups after 12 weeks. Secondary end points include assessment of the effect of empagliflozin on left ventricular systolic function, measures of vascular structure and function, as well as humoral cardiovascular biomarkers (i.e. brain natriuretic peptide, troponin, C-reactive protein). In addition, the multidimensional biodatabase enables explorative analyses of molecular biomarkers to gain insights into possible mechanisms of the effects of empagliflozin on human health in a systems medicine-oriented, multiomics approach.

Conclusion:
By evaluating the short-term effect of empagliflozin with a comprehensive biobanking program, the EmDia Study offers an opportunity to primarily assess the effects on diastolic function but also to examine effects on clinical and molecular cardiovascular traits.

https://pubmed.ncbi.nlm.nih.gov/34939776/

2023

Unsupervised clustering of venous thromboembolism patients by clinical features at presentation identifies novel endotypes that improve prognostic stratification

Background:
Individuals with acute venous thromboembolism (VTE) constitute a heterogeneous group of patients with diverse clinical characteristics and outcome.

Objectives:
To identify endotypes of individuals with acute VTE based on clinical characteristics at presentation through unsupervised cluster analysis and to evaluate their molecular proteomic profile and clinical outcome.

Methods:
Data from 591 individuals from the Genotyping and Molecular phenotyping of Venous thromboembolism (GMP-VTE) project were explored. Hierarchical clustering was applied to 58 variables to define VTE endotypes. Clinical characteristics, three-year incidence of thromboembolic events or death, and acute-phase plasma proteomics were assessed.

Results:
Four endotypes were identified, exhibiting different patterns of clinical characteristics and clinical course. Endotype 1 (n = 300), comprising older individuals with comorbidities, had the highest incidence of thromboembolic events or death (HR [95 % CI]: 3.76 [1.96-7.19]), followed by endotype 4 (n = 127) (HR [95 % CI]: 2.55 [1.26-5.16]), characterised by men with history of VTE and provoking risk factors, and endotype 3 (n = 57) (HR [95 % CI]: 1.57 [0.63-3.87]), composed of young women with provoking risk factors, vs. reference endotype 2 (n = 107). The reference endotype was constituted by individuals diagnosed with PE without comorbidities, who had the lowest incidence of the investigated endpoint. Differentially expressed proteins associated with the endotypes were related to distinct biological processes, supporting differences in molecular pathophysiology. The endotypes had superior prognostic ability compared to existing risk stratifications such as provoked vs unprovoked VTE and D-dimer levels.

Conclusion:
Four endotypes of VTE were identified by unsupervised phenotype-based clustering that diverge in clinical outcome and plasmatic protein signature. This approach might support the future development of individualized treatment in VTE.

https://pubmed.ncbi.nlm.nih.gov/37202285/

2023

Much higher prevalence of keratoconus than announced: results of the Gutenberg Health Study (GHS)

Keratoconus appears to be a rare corneal disease with a prevalence previously estimated at 1:2000. The aim of our study was to investigate the prevalence of keratoconus in a large German cohort and to evaluate possible associated factors.

Method:
In the population-based, prospective, monocentric cohort study, Gutenberg Health Study, 12,423 subjects aged 40-80 years were examined at the 5-year follow-up. Subjects underwent a detailed medical history and a general and ophthalmologic examination including Scheimpflug imaging. Keratoconus diagnosis was performed in two steps: all subjects with conspicuous TKC analysis of corneal tomography were included in further grading. Prevalence and 95% confidence intervals were calculated. Logistic regression analysis was carried out to investigate association with age, sex, BMI, thyroid hormone, smoking, diabetes, arterial hypertension, atopy, allergy, steroid use, sleep apnea, asthma, and depression.

Results:
Of 10,419 subjects, 75 eyes of 51 subjects were classified as having keratoconus. The prevalence for keratoconus in the German cohort was 0.49% (1:204; 95% CI: 0.36-0.64%) and was approximately equally distributed across the age decades. No gender predisposition could be demonstrated. Logistic regression showed no association between keratoconus and age, sex, BMI, thyroid hormone, smoking, diabetes, arterial hypertension, atopy, allergy, steroid use, sleep apnea, asthma, and depression in our sample.
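The reported prevalence can be reproduced from the subject counts above. The sketch below also computes a 95% confidence interval with the Wilson score method; the abstract does not say which interval method the authors used, so this choice is an assumption and the lower bound may differ slightly from the published 0.36%.

```python
import math


def wilson_interval(cases, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = cases / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Subject-level counts taken from the Results paragraph.
cases, n = 51, 10419

prevalence = cases / n
one_in = n / cases
lower, upper = wilson_interval(cases, n)

print(f"prevalence = {prevalence:.2%} (about 1:{one_in:.0f})")
print(f"95% CI (Wilson, assumed method) = {lower:.2%} to {upper:.2%}")
```

The calculation is done per subject (51 of 10,419), not per eye, which matches the 0.49% and 1:204 figures given in the text.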

Conclusion:
The prevalence of keratoconus disease in a mainly Caucasian population is approximately tenfold higher than previously reported in the literature using latest technologies (Scheimpflug imaging). Contrary to previous assumptions, we did not find associations with sex, existing atopy, thyroid dysfunction, diabetes, smoking, and depression.

https://pubmed.ncbi.nlm.nih.gov/37314521/

2023

Ionmob: A Python Package for Prediction of Peptide Collisional Cross-Section Values

Abstract
Motivation
Including ion mobility separation (IMS) in mass spectrometry proteomics experiments is useful to improve coverage and throughput. Many IMS devices enable linking the experimentally derived mobility of an ion to its collisional cross-section (CCS), a highly reproducible physicochemical property dependent on the ion’s mass, charge and conformation in the gas phase. Thus, known peptide ion mobilities can be used to tailor acquisition methods or to refine database search results. The large space of potential peptide sequences, driven also by post-translational modifications (PTMs) of amino acids, motivates an in silico predictor for peptide CCS. Recent studies explored the general performance of varying machine-learning techniques; however, the workflow engineering part was of secondary importance. For the sake of applicability, such a tool should be generic, data-driven and offer the possibility to be easily adapted to individual workflows for experimental design and data processing.

Results
We created ionmob, a Python-based framework for data preparation, training, and prediction of collisional cross-section values of peptides. It is easily customizable and includes a set of pretrained, ready-to-use models and preprocessing routines for training and inference. Using a set of ≈ 21,000 unique phosphorylated peptides and ≈ 17,000 MHC ligand sequence and charge state pairs, we expand upon the space of peptides that can be integrated into CCS prediction. Lastly, we investigate the applicability of in silico predicted CCS to increase confidence in identified peptides by applying methods of re-scoring and demonstrate that predicted CCS values complement existing predictors for that task.

Availability
The Python package is available on GitHub: https://github.com/theGreatHerrLebert/ionmob.

https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btad486/7237255?login=true

2023

Lipid-focused cardiovascular disease research: trends and opportunities (Submitted to Journal of Clinical Medicine).

Anyaegbunam, A., P. More, J.F. Fontaine, V. ten Cate, K. Bauer, U. Distler, E. Araldi, L. Bindila, P. Wild and M.A. Andrade-Navarro.

https://www.scirp.org/journal/ijcm/

2023

Four-dimensional trapped ion mobility spectrometry lipidomics for high throughput clinical profiling of human blood samples

Lipidomics encompassing automated lipid extraction, a four-dimensional (4D) feature selection strategy for confident lipid annotation as well as reproducible and cross-validated quantification can expedite clinical profiling. Here, we determine 4D descriptors (mass to charge, retention time, collision cross section, and fragmentation spectra) of 200 lipid standards and 493 lipids from reference plasma via trapped ion mobility mass spectrometry to enable the implementation of stringent criteria for lipid annotation. We use 4D lipidomics to confidently annotate 370 lipids in reference plasma samples and 364 lipids in serum samples, and reproducibly quantify 359 lipids using level-3 internal standards. We show the utility of our 4D lipidomics workflow for high-throughput applications by reliable profiling of intra-individual lipidome phenotypes in plasma, serum, whole blood, venous and finger-prick dried blood spots.

https://www.nature.com/articles/s41467-023-36520-1

2023

DNA methylation and cardiovascular disease in humans: a systematic review and database of known CpG methylation sites

Background:
Cardiovascular disease (CVD) is the leading cause of death worldwide and considered one of the most environmentally driven diseases. The role of DNA methylation in response to the individual exposure for the development and progression of CVD is still poorly understood and a synthesis of the evidence is lacking.

Results:
A systematic review of articles examining measurements of DNA cytosine methylation in CVD was conducted in accordance with PRISMA (preferred reporting items for systematic reviews and meta-analyses) guidelines. The search yielded 5,563 articles from PubMed and CENTRAL databases. From 99 studies with a total of 87,827 individuals eligible for analysis, a database was created combining all CpG-, gene- and study-related information. It contains 74,580 unique CpG sites, of which 1,452 CpG sites were mentioned in ≥ 2, and 441 CpG sites in ≥ 3 publications. Two sites were referenced in ≥ 6 publications: cg01656216 (near ZNF438) related to vascular disease and epigenetic age, and cg03636183 (near F2RL3) related to coronary heart disease, myocardial infarction, smoking and air pollution. Of 19,127 mapped genes, 5,807 were reported in ≥ 2 studies. Most frequently reported were TEAD1 (TEA Domain Transcription Factor 1) and PTPRN2 (Protein Tyrosine Phosphatase Receptor Type N2) in association with outcomes ranging from vascular to cardiac disease. Gene set enrichment analysis of 4,532 overlapping genes revealed enrichment for Gene Ontology molecular function “DNA-binding transcription activator activity” (q = 1.65 × 10⁻¹¹) and biological process “skeletal system development” (q = 1.89 × 10⁻²³). Gene enrichment demonstrated that general CVD-related terms are shared, while “heart” and “vasculature” specific genes have more disease-specific terms, such as PR interval for “heart” or platelet distribution width for “vasculature.” STRING analysis revealed significant protein–protein interactions between the products of the differentially methylated genes (p = 0.003), suggesting that dysregulation of the protein interaction network could contribute to CVD. Overlaps with curated gene sets from the Molecular Signatures Database showed enrichment of genes in hemostasis (p = 2.9 × 10⁻⁶) and atherosclerosis (p = 4.9 × 10⁻⁴).

Conclusion:
This review highlights the current state of knowledge on the significant relationship between DNA methylation and CVD in humans. An open-access database has been compiled of reported CpG methylation sites, genes and pathways that may play an important role in this relationship.

https://clinicalepigeneticsjournal.biomedcentral.com/articles/10.1186/s13148-023-01468-y#Ack1

2023

Clinical profile and outcome of isolated pulmonary embolism: a systematic review and meta-analysis

Background:
Isolated pulmonary embolism (PE) appears to be associated with a specific clinical profile and sequelae compared to deep vein thrombosis (DVT)-associated PE. The objective of this study was to identify clinical characteristics that discriminate both phenotypes, and to characterize their differences in clinical outcome.

Methods:
We performed a systematic review and meta-analysis of studies comparing PE phenotypes. A systematic search of the electronic databases PubMed and CENTRAL was conducted, from inception until January 27, 2023. Exclusion criteria were irrelevant content, inability to retrieve the article, language other than English or German, the article comprising a review or case study/series, and inappropriate study design. Data on risk factors, clinical characteristics and clinical endpoints were pooled using random-effects meta-analyses.

Findings:
Fifty studies with 435,768 PE patients were included. In low risk of bias studies, 30% [95% CI 19–42%, I² = 97%] of PE were isolated. The Factor V Leiden [OR: 0.47, 95% CI 0.37–0.58, I² = 0%] and prothrombin G20210A mutations [OR: 0.55, 95% CI 0.41–0.75, I² = 0%] were significantly less prevalent among patients with isolated PE. Female sex [OR: 1.30, 95% CI 1.17–1.45, I² = 79%], recent invasive surgery [OR: 1.31, 95% CI 1.23–1.41, I² = 65%], a history of myocardial infarction [OR: 2.07, 95% CI 1.85–2.32, I² = 0%], left-sided heart failure [OR: 1.70, 95% CI 1.37–2.10, I² = 76%], peripheral artery disease [OR: 1.36, 95% CI 1.31–1.42, I² = 0%] and diabetes mellitus [OR: 1.23, 95% CI 1.21–1.25, I² = 0%] were significantly more frequently represented among isolated PE patients. In a synthesis of clinical outcome data, the risk of recurrent VTE in isolated PE was half that of DVT-associated PE [RR: 0.55, 95% CI 0.44–0.69, I² = 0%], while the risk of arterial thrombosis was nearly 3-fold higher [RR: 2.93, 95% CI 1.43–6.02, I² = 0%].
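
As a rough illustration of how pooled odds ratios and heterogeneity statistics of this kind are obtained, the sketch below recovers log-scale effects and standard errors from reported confidence intervals and pools them with a random-effects model. The abstract only states that random-effects meta-analyses were used; the DerSimonian-Laird estimator is an assumption, and the three study-level inputs are hypothetical values, not data from the paper.

```python
import math

def log_or_and_se(or_, lower, upper, z=1.96):
    """Recover log(OR) and its standard error from a reported 95% CI."""
    return math.log(or_), (math.log(upper) - math.log(lower)) / (2 * z)

def dersimonian_laird(effects, ses, z=1.96):
    """Pool log-scale effects; return (pooled OR, CI low, CI high, I2 in %)."""
    w = [1 / se**2 for se in ses]                       # inverse-variance weights
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                       # between-study variance
    w_star = [1 / (se**2 + tau2) for se in ses]
    pooled = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    se_pooled = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return (math.exp(pooled), math.exp(pooled - z * se_pooled),
            math.exp(pooled + z * se_pooled), i2)

# Hypothetical study-level odds ratios with 95% CIs (NOT taken from the paper).
studies = [(0.45, 0.30, 0.68), (0.52, 0.33, 0.82), (0.44, 0.27, 0.72)]
effects, ses = zip(*(log_or_and_se(*s) for s in studies))
pooled_or, ci_low, ci_high, i2 = dersimonian_laird(effects, ses)
print(f"pooled OR = {pooled_or:.2f} [{ci_low:.2f}, {ci_high:.2f}], I2 = {i2:.0f}%")
```

An I² of 0% means the observed spread between studies is no larger than expected from sampling error alone, in which case the random-effects result coincides with the fixed-effect one.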

Interpretation:
Our findings suggest that isolated PE appears to be a specific entity that may signal a long-term risk of arterial thrombosis. Randomised controlled trials are necessary to establish whether alternative treatment regimens are beneficial for this patient subgroup.

https://www.thelancet.com/journals/eclinm/article/PIIS2589-5370(23)00150-5/fulltext#secsectitle0155

2023

Tinnitus Prevalence in the Adult Population—Results from the Gutenberg Health Study

Abstract
Background and Objectives: Tinnitus is a common symptom in medical practice, although data on its prevalence vary. As the underlying pathophysiological mechanism is still not fully understood, hearing loss is thought to be an important risk factor for the occurrence of tinnitus. The aim of this study was to assess tinnitus prevalence in a large German cohort and to determine its dependence on hearing impairment. Materials and Methods: The Gutenberg Health Study (GHS) is a population-based cohort study and representative of the population of Mainz and its district. Participants were asked whether they suffer from tinnitus and how much they are burdened by it. Extensive audiological examinations using bone- and air-conduction were also performed. Results: 4942 participants (mean age: 61.0, 2550 men and 2392 women) were included in the study. The overall prevalence of tinnitus was 26.1%. Men were affected significantly more often than women. The prevalence of tinnitus increased with age, peaking at ages 75 to 79 years. Considering only annoying tinnitus, the prevalence was 9.8%. Logistic regression showed that participants with severe to complete hearing loss (>65 dB) were more likely to have tinnitus. Conclusions: Tinnitus is a common symptom, and given demographic changes, its prevalence is expected to increase.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10052845/

2023

Effects of empagliflozin on left ventricular diastolic function in addition to usual care in individuals with type 2 diabetes mellitus-results from the randomized, double-blind, placebo-controlled EmDia trial

Abstract:
Background: The sodium-glucose co-transporter 2 inhibitor empagliflozin improves cardiovascular outcome in patients with type 2 diabetes mellitus (T2DM) and heart failure. Experimental studies suggest a direct cardiac effect of empagliflozin associated with an improvement in left ventricular diastolic function.

https://pubmed.ncbi.nlm.nih.gov/36763159/

2023

Effect of Empagliflozin Compared to Placebo on the Plasma Lipidome in Patients with Type 2 Diabetes Mellitus – Results from the EmDia Trial, Poster, DGK

Author: Araldi E., Bauer K., Koeck T., Buch G., Baker D., Lerner R., Tenzer S., Andrade-Navarro M. A., Rapp S., ten Cate V., Nuber M., Lackner K. J., Daiber A., Münzel T., Wild P. S., Bindila L., Prochaska J. H.

Background:
Empagliflozin has recently emerged as an effective treatment to reduce the risk of cardiovascular death and hospitalization in patients with heart failure (HF). However, the molecular changes driven by Empagliflozin treatment and responsible for the amelioration of cardiac parameters are largely unknown. In this work, the effects of Empagliflozin on the lipidome of patients with heart failure and type 2 diabetes mellitus (T2DM) were investigated.

Methods:
Samples were obtained from the EmDia study (NCT02932436, 144 participants). Participants with T2DM and elevated left ventricular end-diastolic pressure at baseline (as measured via the lateral E/e′ ratio) were randomized 1:1 to receive Empagliflozin or placebo. Identical (sub)clinical and molecular characterization including biodata banking was performed at baseline and after 12 weeks of intervention. Lipids were quantified by mass spectrometry in a 4D-LC/TIMS-IMS lipidomics approach at both time points. The lipid signatures reflecting the effect of Empagliflozin treatment were identified using sparse group LASSO regularized regression. Lipids that mapped to clinical features altered by Empagliflozin treatment were investigated with linear regressions.

Results:
Sparse group LASSO regularized regression selected a signature of 27 lipids (at least 90% sample coverage) across several lipid classes with significantly different abundance in Empagliflozin vs placebo treatment. Approximately 74% of lipids in the Empagliflozin lipid signature (N=20) were associated with at least one clinical feature affected by Empagliflozin. In particular, changes in the Empagliflozin lipid signature significantly explained changes in E/e′, the primary endpoint of the study (estimate -0.45, p-value < 0.01), and changes in secondary endpoints (BMI, HbA1c, hemoglobin, erythrocyte counts, uric acid, eGFR). Within each lipid class (ceramides, sphingomyelins, etc.), virtually all lipids showed a consistent relation to Empagliflozin treatment and a consistent association with the primary endpoint (E/e′ ratio) and secondary endpoints (red blood cell count, hemoglobin, BMI, HbA1c, and uric acid).

Conclusions:
The analysis of the lipidome of Empagliflozin or placebo treated participants of the EmDia study provided insights into putative molecular mechanisms of action of Empagliflozin through modulating lipids which might contribute to an improvement of clinical features.

https://dgk.org/kongress_programme/jt2023/aP542.html

2023

A Fair Experimental Comparison of Neural Network Architectures for Latent Representations of Multi-Omics for Drug Response Prediction, Talk and Poster, ISCB

Author: Hauptmann T. and Kramer S.

Recent years have seen a surge of novel neural network architectures for multi-omics integration. One important parameter is the integration depth: the point at which the latent representations are computed or merged, which can be early, intermediate, or late. The literature on integration methods grows steadily; however, close to nothing is known about the relative performance of these methods under fair experimental conditions and under consideration of different use cases. We developed a comparison framework that trains multi-omics integration methods under equal conditions. We incorporated four recent deep learning methods, early integration, PCA, and a novel method, Omics Stacking, that combines the advantages of intermediate and late integration. Experiments were conducted on a drug response data set with multiple omics data. Our experiments confirmed that early integration has the lowest predictive performance. Overall, statistically significant differences were rarely observed; however, in terms of the average ranks of methods, Super.FELT performed best in a cross-validation setting and Omics Stacking best on the external test set. When faced with a new data set, Super.FELT is therefore a good option in the cross-validation setting and Omics Stacking in the external test set setting.

https://www.iscb.org/ismbeccb2023-programme/tracks/general-computational-biology

2023

midiaPASEF maximizes information content in data-independent acquisition proteomics

Abstract:

Data-independent acquisition (DIA) approaches provide comprehensive records of all detectable precursor and fragment ions. Here we introduce midiaPASEF, a novel DIA scan mode using mobility-specific micro-encoding of overlapping quadrupole windows to optimally cover the ion population in the ion mobility-mass to charge plane. Using overlapping ion mobility-encoded quadrupole windows, midiaPASEF maximizes information content in DIA acquisitions, which enables the determination of the precursor m/z of each fragment ion with a precision of less than 2 Th. The Snakemake-based MIDIAID pipeline integrates algorithms for multidimensional peak detection and for machine-learning-based classification of precursor-fragment relationships. The MIDIAID pipeline enables fully automated processing and multidimensional deconvolution of midiaPASEF files and exports highly specific DDA-like MSMS spectra which are suitable for de novo sequencing and can be searched directly with established tools including PEAKS, FragPipe and Mascot. midiaPASEF acquisition identifies over 40 unique peptides per second and provides powerful library-free DIA analyses including phosphopeptidome and immunopeptidome samples.

https://www.biorxiv.org/content/10.1101/2023.01.30.526204v1.full

2023

Disturbed Plasma Lipidomic Profiles in Females with Diffuse Large B-Cell Lymphoma: A Pilot Study

Abstract

Lipidome dysregulation is a hallmark of cancer and inflammation. The global plasma lipidome and sub-lipidome of inflammatory pathways have not been reported in diffuse large B-cell lymphoma (DLBCL). In a pilot study of plasma lipid variation in female DLBCL patients and BMI-matched disease-free controls, we performed targeted lipidomics using LC-MRM to quantify lipid mediators of inflammation and immunity, and those known or hypothesised to be involved in cancer progression: sphingolipids, resolvin D1, arachidonic acid (AA)-derived oxylipins, such as hydroxyeicosatetraenoic acids (HETEs) and dihydroxyeicosatrienoic acids, along with their membrane structural precursors. We report on the role of the eicosanoids in the separation of DLBCL from controls, along with lysophosphatidylinositol LPI 20:4, implying notable changes in lipid metabolic and/or signalling pathways, particularly pertaining to AA lipoxygenase pathway and glycerophospholipid remodelling in the cell membrane. We suggest here the set of S1P, SM 36:1, SM 34:1 and PI 34:1 as DLBCL lipid signatures which could serve as a basis for the prospective validation in larger DLBCL cohorts. Additionally, untargeted lipidomics indicates a substantial change in the overall lipid metabolism in DLBCL. The plasma lipid profiling of DLBCL patients helps to better understand the specific lipid dysregulations and pathways in this cancer.

https://www.mdpi.com/2072-6694/15/14/3653

2023

Circulating microRNAs predict recurrence and death following venous thromboembolism.

Background: Recurrent events frequently occur after venous thromboembolism (VTE) and remain difficult to predict based on established genetic, clinical, and proteomic contributors. The role of circulating microRNAs (miRNAs) has yet to be explored in detail.

Objectives: To identify circulating miRNAs predictive of recurrent VTE or death, and to interpret their mechanistic involvement.

Methods: Data from 181 participants of a cohort study of acute VTE and 302 individuals with a history of VTE from a population-based cohort were investigated. Next-generation sequencing was performed on EDTA plasma samples to detect circulating miRNAs. The endpoint of interest was recurrent VTE or death. Penalized regression was applied to identify an outcome-relevant miRNA signature, and results were validated in the population-based cohort. The involvement of miRNAs in coregulatory networks was assessed using principal component analysis, and the associated clinical and molecular phenotypes were investigated. Mechanistic insights were obtained from target gene and pathway enrichment analyses.

Results: A total of 1950 miRNAs were detected across cohorts after postprocessing. In the discovery cohort, 50 miRNAs were associated with recurrent VTE or death (cross-validated C-index, 0.65). A weighted miRNA score predicted outcome over an 8-year follow-up period (HR per SD, 2.39; 95% CI, 1.98-2.88; P < .0001). The independent validation cohort validated 20 miRNAs (OR per SD for score, 3.47; 95% CI, 2.37-5.07; P < .0001; cross-validated area under the curve, 0.61). Principal component analysis revealed 5 miRNA networks with distinct relationships to clinical phenotype and outcome. Mapping of target genes indicated regulation via transcription factors and kinases involved in signaling pathways associated with fibrinolysis.
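
For readers unfamiliar with the C-index reported above, the following minimal sketch shows how Harrell's concordance index is computed for censored time-to-event data: 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking. The data are hypothetical, and the sketch is not the authors' cross-validated pipeline.

```python
def harrell_c_index(times, events, scores):
    """Fraction of comparable pairs in which the higher risk score
    belongs to the subject with the earlier observed event."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable if subject i had an event before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5   # ties in score count as half
    return concordant / comparable

# Hypothetical follow-up times (years), event flags (1 = event, 0 = censored)
# and risk scores; none of these values come from the study.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
scores = [0.9, 0.4, 0.5, 0.1]
c_index = harrell_c_index(times, events, scores)
print(f"C-index = {c_index:.2f}")
```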

Conclusion: Circulating miRNAs predicted the risk of recurrence or death after VTE over several years, both in the acute and chronic phases.

https://pubmed.ncbi.nlm.nih.gov/37481073/

2023

Gender Differences and the Impact of Partnership and Children on Quality of Life During the COVID-19 Pandemic

Abstract
Objectives: The COVID-19 pandemic and its protective measures have changed the daily lives of families and may have affected quality of life (QoL). The aim of this study was to analyze gender differences in QoL and to examine individuals living in different partnership and family constellations.

Methods: Data from the Gutenberg COVID-19 cohort study (N = 10,250) with two measurement time points during the pandemic (2020 and 2021) were used. QoL was assessed using the EUROHIS-QOL questionnaire. Descriptive analyses and autoregressive regressions were performed.

Results: Women reported lower QoL than men, and QoL was significantly lower at the second measurement time point in both men and women. Older age, male gender, no migration background, and higher socioeconomic status, as well as partnership and children (especially in men), were protective factors for QoL. Women living with children under 14 and single mothers reported significantly lower QoL.

Conclusion: Partnership and family were protective factors for QoL. However, women with young children and single mothers are vulnerable groups for lower QoL. Support is especially needed for women with young children.

https://pubmed.ncbi.nlm.nih.gov/37284508/

2023

Plasma protein signatures for high on-treatment platelet reactivity to aspirin and clopidogrel in peripheral artery disease.

Abstract
Background:
A significant proportion of patients with peripheral artery disease (PAD) displays a poor response to aspirin and/or the platelet P2Y12 receptor antagonist clopidogrel. This phenomenon is reflected by high on-treatment platelet reactivity (HTPR) in platelet function assays in vitro and is associated with an increased risk of adverse cardiovascular events.

Objective:
This study aimed to elucidate specific plasma protein signatures associated with HTPR to aspirin and clopidogrel in PAD patients.

Methods and results:
Based on targeted plasma proteomics, 184 proteins from two cardiovascular Olink panels were measured in 105 PAD patients. VerifyNow ASPI- and P2Y12-test values were transformed to a continuous variable representing HTPR as a spectrum instead of cut-off level-defined HTPR. Using the Boruta random forest algorithm, the importance of three plasma proteins for HTPR in the aspirin group, six in the clopidogrel group and ten in the pooled group (clopidogrel or aspirin) was confirmed. Network analysis demonstrated clusters with CD84, SLAMF7, IL1RN and THBD for clopidogrel and with F2R, SELPLG, HAVCR1, THBD, PECAM1, TNFRSF10B, MERTK and ADM for the pooled group. F2R, TNFRSF10B and ADM were more highly expressed in Fontaine III patients compared to Fontaine II, suggesting their relation with PAD severity.

Conclusions:
A plasma protein signature, including eight targets involved in proatherogenic dysfunction of blood cell-vasculature interaction, coagulation and cell death, is associated with HTPR (aspirin and/or clopidogrel) in PAD. These proteins may serve as important systems-based determinants of poor platelet responsiveness to aspirin and/or clopidogrel in PAD and other cardiovascular diseases and may help to identify novel treatment strategies.

https://pubmed.ncbi.nlm.nih.gov/37708596/

2023

Discriminative machine learning for maximal representative subsampling

Abstract
Biased population samples pose a prevalent problem in the social sciences. Therefore, we present two novel methods that are based on positive-unlabeled learning to mitigate bias. Both methods leverage auxiliary information from a representative data set and train machine learning classifiers to determine the sample weights. The first method, named maximum representative subsampling (MRS), uses a classifier to iteratively remove instances, by assigning a sample weight of 0, from the biased data set until it aligns with the representative one. The second method is a variant of MRS, Soft-MRS, that iteratively adapts sample weights instead of removing samples completely. To assess the effectiveness of our approach, we induced artificial bias in a public census data set and examined the corrected estimates. We compare the performance of our methods against existing techniques, evaluating the ability of sample weights created with Soft-MRS or MRS to minimize differences and improve downstream classification tasks. Lastly, we demonstrate the applicability of the proposed methods in a real-world study of resilience research, exploring the influence of resilience on voting behavior. Through our work, we address the issue of bias in social science, amongst others, and provide a versatile methodology for bias reduction based on machine learning. Based on our experiments, we recommend using MRS for downstream classification tasks and Soft-MRS for downstream tasks where the relative bias of the dependent variable is relevant.
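
The iterative removal idea behind MRS can be sketched in a few lines. This is a heavily simplified illustration, not the published algorithm: the one-dimensional feature, the hand-written logistic regression, the rule of dropping the instances the classifier scores as most "biased-looking", and the fixed number of rounds (instead of a stopping rule based on alignment with the representative set) are all assumptions made for this example.

```python
import math
import random

def fit_logistic(xs, ys, steps=200, lr=0.5):
    """Fit p(y=1 | x) = sigmoid(a*x + b) with plain gradient descent."""
    a = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = 1.0 / (1.0 + math.exp(-(a * x + b))) - y
            grad_a += err * x
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def mrs_sketch(biased, representative, drop=5, rounds=20):
    """Return the biased instances that still have weight 1 after `rounds` removals."""
    kept = list(biased)
    representative = list(representative)
    for _ in range(rounds):
        xs = kept + representative
        ys = [1] * len(kept) + [0] * len(representative)  # 1 = biased sample
        a, b = fit_logistic(xs, ys)
        # Highest score = looks least like the representative sample.
        kept.sort(key=lambda x: a * x + b)
        kept = kept[:-drop]  # the `drop` highest-scoring instances get weight 0
    return kept

# Hypothetical data: the biased sample is shifted relative to the representative one.
rng = random.Random(0)
representative = [rng.gauss(0.0, 1.0) for _ in range(200)]
biased = [rng.gauss(0.8, 1.0) for _ in range(200)]
kept = mrs_sketch(biased, representative)
mean = lambda v: sum(v) / len(v)
print(f"mean before: {mean(biased):.2f}, after: {mean(kept):.2f}")
```

A Soft-MRS-style variant would, under the same assumptions, shrink each instance's weight in proportion to its score instead of setting it to 0.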

https://www.nature.com/articles/s41598-023-48177-3

2023

Autoantibodies against the chemokine receptor 3 predict cardiovascular risk

Abstract
Background and Aims

Chronic inflammation and autoimmunity contribute to cardiovascular (CV) disease. Recently, autoantibodies (aAbs) against the CXC-motif-chemokine receptor 3 (CXCR3), a G protein-coupled receptor with a key role in atherosclerosis, have been identified. The role of anti-CXCR3 aAbs for CV risk and disease is unclear.

Methods
Anti-CXCR3 aAbs were quantified by a commercially available enzyme-linked immunosorbent assay in 5000 participants (availability: 97.1%) of the population-based Gutenberg Health Study with extensive clinical phenotyping. Regression analyses were carried out to identify determinants of anti-CXCR3 aAbs and relevance for clinical outcome (i.e. all-cause mortality, cardiac death, heart failure, and major adverse cardiac events comprising incident coronary artery disease, myocardial infarction, and cardiac death). Last, immunization with CXCR3 and passive transfer of aAbs were performed in ApoE(−/−) mice for preclinical validation.

Results
The analysis sample included 4195 individuals (48% female, mean age 55.5 ± 11 years) after exclusion of individuals with autoimmune disease, immunomodulatory medication, acute infection, and history of cancer. Independent of age, sex, renal function, and traditional CV risk factors, increasing concentrations of anti-CXCR3 aAbs translated into higher intima–media thickness, left ventricular mass, and N-terminal pro-B-type natriuretic peptide. Adjusted for age and sex, anti-CXCR3 aAbs above the 75th percentile predicted all-cause death [hazard ratio (HR) (95% confidence interval) 1.25 (1.02, 1.52), P = .029], driven by excess cardiac mortality [HR 2.51 (1.21, 5.22), P = .014]. A trend towards a higher risk for major adverse cardiac events [HR 1.42 (1.0, 2.0), P = .05] along with increased risk of incident heart failure [HR per standard deviation increase of anti-CXCR3 aAbs: 1.26 (1.02, 1.56), P = .03] may contribute to this observation. Targeted proteomics revealed a molecular signature of anti-CXCR3 aAbs reflecting immune cell activation and cytokine–cytokine receptor interactions associated with an ongoing T helper cell 1 response. Finally, ApoE(−/−) mice immunized against CXCR3 displayed increased anti-CXCR3 aAbs and exhibited a higher burden of atherosclerosis compared to non-immunized controls, correlating with concentrations of anti-CXCR3 aAbs in the passive transfer model.

Conclusions
In individuals free of autoimmune disease, anti-CXCR3 aAbs were abundant, related to CV end-organ damage, and predicted all-cause death as well as cardiac morbidity and mortality in conjunction with the acceleration of experimental atherosclerosis.

https://academic.oup.com/eurheartj/article/44/47/4935/7370225

2023

A Systematic Review of Lipid-Focused Cardiovascular Disease Research: Trends and Opportunities

Abstract
Lipids are important modifiers of protein function, particularly as parts of lipoproteins, which transport lipophilic substances and mediate cellular uptake of circulating lipids. As such, lipids are of particular interest as blood biological markers for cardiovascular disease (CVD) as well as for conditions linked to CVD such as atherosclerosis, diabetes mellitus, obesity and dietary states. Notably, lipid research is particularly well developed in the context of CVD because of the relevance and multiple causes and risk factors of CVD. The advent of methods for high-throughput screening of biological molecules has recently resulted in the generation of lipidomic profiles that allow monitoring of lipid compositions in biological samples in an untargeted manner. These and other earlier advances in biomedical research have shaped the knowledge we have about lipids in CVD. To evaluate the knowledge acquired on the multiple biological functions of lipids in CVD and the trends in their research, we collected a dataset of references from the PubMed database of biomedical literature focused on plasma lipids and CVD in human and mouse. Using annotations from these records, we were able to categorize significant associations between lipids and particular types of research approaches, distinguish non-biological lipids used as markers, identify differential research between human and mouse models, and detect the increasingly mechanistic nature of the results in this field. Using known associations between lipids and proteins that metabolize or transport them, we constructed a comprehensive lipid–protein network, which we used to highlight proteins strongly connected to lipids found in the CVD-lipid literature. Our approach points to a series of proteins for which lipid-focused research would bring insights into CVD, including Prostaglandin G/H synthase 2 (PTGS2, a.k.a. COX2) and Acylglycerol kinase (AGK). 
In this review, we summarize our findings, putting them in a historical perspective of the evolution of lipid research in CVD.

https://www.mdpi.com/1467-3045/45/12/618

2022

Bi-level variable selection with the sparse group penalty framework, Talk, 67th GMDS Annual Conference / 13th TMF Annual Congress

Author: Buch, G., Schulz, A., Schmidtmann, I., Strauch, K., Wild, P. S.

Introduction:
Bi-level selection methods account for grouped predictors in the selection process to identify relevant variable groups and highlight their predictive members. This property is particularly helpful when analyzing omics datasets, as such data is often characterized by a natural group structure due to high correlations or contextual similarities of features. One of the best known bi-level selection approaches combines the least absolute shrinkage and selection operator (LASSO) [1] with the group LASSO [2] in an additive manner: sparse group LASSO (SGL) [3].
A generalization of SGL that enables combinations of other shrinkage terms is desirable, as the LASSO components have some shortcomings that can be addressed by using alternative penalties.

Methods:
To enable the combination of various shrinkage conditions as in SGL, a framework for sparse group penalties (SGP) is proposed. Within this framework, we have combined the minimax concave penalty (MCP) [4], the smoothly clipped absolute deviation (SCAD) [5], the exponential penalty (EP) [6] and their group versions analogous to SGL. The emerging methods are the sparse group MCP (SGM), the sparse group SCAD (SGS) and the sparse group EP (SGE). A coordinate descent algorithm with local linear approximation [7] was implemented in C++ to solve their objective functions for linear and logistic regressions. Simulated datasets were used to determine optimal values for the tuning parameter α, a mixing parameter that determines the influence of the group information in the selection process. The performance of the new methods in variable and group selection was compared with other bi-level selection methods (group exponential LASSO [6], composite MCP [7] and group Bridge [8]) in simulation studies. Finally, the novel approaches were applied to the problem of detecting regulated lipids in an interventional trial (EmDia study, ClinicalTrials.gov Identifier: NCT02932436).
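
To make the role of the mixing parameter α concrete, the sketch below evaluates the classical sparse group LASSO penalty for a small coefficient vector. The parametrization (α weighting the LASSO term, 1 − α weighting the group term, and square-root group-size weights) follows a common textbook convention and is an assumption here; it is chosen so that small α emphasizes the group level, as described in the abstract. The coefficient values and groups are hypothetical.

```python
import math

def sparse_group_lasso_penalty(beta, groups, lam, alpha):
    """lam * (alpha * ||beta||_1 + (1 - alpha) * sum_g sqrt(p_g) * ||beta_g||_2).

    `groups` is a list of index lists; p_g is the size of group g.
    """
    l1_term = sum(abs(b) for b in beta)
    group_term = 0.0
    for idx in groups:
        group_norm = math.sqrt(sum(beta[j] ** 2 for j in idx))
        group_term += math.sqrt(len(idx)) * group_norm
    return lam * (alpha * l1_term + (1 - alpha) * group_term)

# Hypothetical coefficients in three groups; the second group is entirely zero.
beta = [0.5, -1.0, 0.0, 0.0, 2.0]
groups = [[0, 1], [2, 3], [4]]
for alpha in (1 / 10, 1 / 3, 1 / 2):
    value = sparse_group_lasso_penalty(beta, groups, lam=1.0, alpha=alpha)
    print(f"alpha = {alpha:.2f}: penalty = {value:.3f}")
```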

Results:
Low values for α such as 1/10 lead to a group-level emphasized selection of the SGPs, while higher values such as 1/2 lead to better results at the variable-level. Setting α to 1/3 provides a balanced performance at both levels. Using this value, SGE was superior for variable and group selection in almost all cases where the number of variables was less than that of observations. In settings where there were more variables than observations, SGE was the best approach when few groups were relevant, SGM when a moderate number of groups were predictive, and SGS when many groups contained predictive signals. Classical SGL was consistently inferior to the other bi-level selection methods in regard to variable and group selection, but its predictive performance was strong in some situations. In the applied example, the results of the SGPs differ especially in their sparsity on the group and variable level. SGE generated the most parsimonious model followed by SGM and SGS, while SGL created the largest model.

Conclusions:
Replacing the LASSO components in SGL with other shrinkage terms provides improvements in multiple performance criteria, making methods such as SGM, SGS, and SGE preferable over SGL. The advantages of these novel techniques are underscored by their ability to achieve better performance than alternative bi-level selection approaches, which the original SGL fails to do.
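
The penalty structure described in the methods above can be illustrated with a minimal sketch. It assumes the common SGL convention in which α weights the variable-level term and (1 - α) the group-level term, with the group term scaled by the square root of the group size; this matches the reported behaviour that low α emphasizes group-level selection. The function names, the default values of `gamma` and `a`, and the way MCP and SCAD are plugged into both terms are illustrative assumptions, not the authors' C++ implementation. The exponential penalty is omitted.

```python
import math

def lasso(t, lam):
    """LASSO penalty evaluated at a non-negative argument t."""
    return lam * t

def mcp(t, lam, gamma=3.0):
    """Minimax concave penalty (MCP) evaluated at a non-negative argument t."""
    if t <= gamma * lam:
        return lam * t - t * t / (2.0 * gamma)
    return 0.5 * gamma * lam * lam

def scad(t, lam, a=3.7):
    """Smoothly clipped absolute deviation (SCAD) at a non-negative argument t."""
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    return 0.5 * lam * lam * (a + 1.0)

def sparse_group_penalty(beta, groups, lam, alpha, penalty=lasso):
    """Additive bi-level penalty (assumed form, see lead-in).

    alpha weights the variable-level term, (1 - alpha) the group-level term.
    With penalty=lasso this reduces to the sparse group LASSO penalty.
    """
    # Variable-level term: penalty applied to each absolute coefficient.
    variable_term = sum(penalty(abs(b), alpha * lam) for b in beta)
    # Group-level term: penalty applied to each group's Euclidean norm.
    group_term = 0.0
    for g in set(groups):
        members = [b for b, gi in zip(beta, groups) if gi == g]
        norm = math.sqrt(sum(b * b for b in members))
        group_lam = (1.0 - alpha) * lam * math.sqrt(len(members))
        group_term += penalty(norm, group_lam)
    return variable_term + group_term
```

Calling `sparse_group_penalty(beta, groups, lam, 1/3, penalty=mcp)` would then evaluate an MCP-based variant at the balanced α value reported in the results.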

2022

Sparse group penalties for bi-level variable selection, Poster, MSCoreSys Status Meeting 2022

Author: Buch, G., Schulz, A., Schmidtmann, I., Strauch, K., Wild, P. S.

Introduction
An important characteristic of many omics data sets is their intrinsic group structure due to high correlations or contextual similarities of features. Bi-level selection methods account for such groupings in the selection process to identify relevant variable groups and highlight their predictive members. One of the best known approaches of this kind combines the least absolute shrinkage and selection operator (LASSO) with the group LASSO in an additive manner: sparse group LASSO (SGL). Since LASSO has some shortcomings that can be addressed by using alternative penalties, a generalization of SGL that enables combinations of other shrinkage terms is desirable.
Methods
We propose a framework for sparse group penalties (SGP) that allows the combination of different SGL-style shrinkage conditions. Within this framework, we have combined the minimax concave penalty (MCP), the smoothly clipped absolute deviation (SCAD), the exponential penalty (EP) and their group versions analogous to SGL: sparse group MCP (SGM), sparse group SCAD (SGS) and sparse group EP (SGE). Corresponding objective functions were solved using a locally approximated coordinate descent, which we implemented in C++. The performance of the new methods in variable and group selection was compared with other bi-level selection methods (group exponential LASSO, composite MCP and group Bridge) in simulation studies.
Results
SGE demonstrated superiority for variable and group selection in almost all settings where the number of observations exceeded the number of variables. In cases where there were fewer observations than variables, SGE was the best method when few groups contained predictive signals, SGM when a moderate number of groups were relevant, and SGS when many groups were predictive. The classical SGL was always inferior to the other bi-level selection techniques in terms of variable and group selection, but its predictive performance was convincing in some situations.
Conclusions
Replacing the LASSO components in SGL with other penalties offers advantages with respect to several performance criteria, making approaches such as SGM, SGS, and SGE advisable over SGL. The benefits of these novel techniques are underlined by their ability to achieve better results than alternative bi-level selection methods, which SGL fails to do.

2022

Proteolizard - A Python-based framework for access, processing and visualization of timsTOF raw data, Poster, Status meeting

Author: David Teschner, Konstantin Bob, Jennifer Leclaire, Thomas Kemmer, Mateusz K. Łącki, Michał Startek, David Gomez-Zepeda, Bertil Schmidt, Stefan Tenzer, Andreas Hildebrandt

Abstract:
Valuable insights into high-dimensional driving factors of diseases like heart failure can be gained by analysing samples using high-throughput omics technologies. The newly introduced timsTOF mass spectrometer is a notable device implementing such technology. Here, peak capacity and acquisition speed are of the greatest experimental interest but at the same time increase the dimensionality of generated datasets through the addition of ion mobility measurements.
It is crucial for the processing of the underlying data not to be constrained by its increased complexity and volume while retaining the ability to be flexibly integrated into existing workflows. We therefore present Proteolizard: a collection of software tools integrating high-performance C++ code with user friendly Python bindings. They enable seamless integration of timsTOF raw data into the Python-centric stack of machine learning libraries such as TensorFlow, PyTorch or scikit-learn. This allows for an effective utilization of multi-core systems or accelerators such as GPUs and implementation of new algorithms based on e.g., deep learning.

2022

Protective behavior and SARS-CoV-2 infection risk in the population - Results from the Gutenberg COVID-19 study

During the SARS-CoV-2 pandemic, preventive measures like physical distancing, wearing face masks, and hand hygiene have been widely applied to mitigate viral transmission. Beyond increasing vaccination coverage, preventive measures remain urgently needed. The aim of the present project was to assess the effect of protective behavior on SARS-CoV-2 infection risk in the population.

https://pubmed.ncbi.nlm.nih.gov/36316662/

2022

Cardiovascular profiling in the diabetic continuum: results from the population-based Gutenberg Health Study.

The study sample comprised 15,010 individuals aged 35-74 years of the population-based Gutenberg Health Study. Subjects were classified into euglycaemia, prediabetes and T2DM according to clinical and metabolic (HbA1c) information. The prevalence of prediabetes was 9.5% (n = 1415) and of T2DM 8.9% (n = 1316). Prediabetes and T2DM showed a significantly increased prevalence ratio (PR) for age, obesity, active smoking, dyslipidemia, and arterial hypertension compared to euglycaemia (for all, P < 0.0001). In a robust Poisson regression analysis, prediabetes was established as an independent predictor of clinically-prevalent cardiovascular disease (PR for prediabetes 1.20, 95% CI 1.07-1.35, P = 0.002) and represented a risk factor for asymptomatic cardiovascular organ damage independent of traditional risk factors (PR 1.04, 95% CI 1.01-1.08, P = 0.025). Prediabetes was associated with a 1.5-fold increased 10-year risk for cardiovascular disease compared to euglycaemia. In Cox regression analysis, prediabetes (HR 2.10, 95% CI 1.76-2.51, P < 0.0001) and T2DM (HR 4.28, 95% CI 3.73-4.92, P < 0.0001) indicated an increased risk of death. After adjustment for age, sex and traditional cardiovascular risk factors, only T2DM (HR 1.89, 95% CI 1.63-2.20, P < 0.0001) remained independently associated with increased all-cause mortality.

https://link.springer.com/article/10.1007/s00392-021-01879-y

2022

A systematic review and evaluation of statistical methods for group variable selection

Abstract
This review condenses the knowledge on variable selection methods implemented in R and appropriate for datasets with grouped features. The focus is on regularized regressions identified through a systematic review of the literature, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. A total of 14 methods are discussed, most of which use penalty terms to perform group variable selection. Depending on how the methods account for the group structure, they can be classified into knowledge and data-driven approaches. The first encompass group-level and bi-level selection methods, while two-step approaches and collinearity-tolerant methods constitute the second category. The identified methods are briefly explained and their performance compared in a simulation study. This comparison demonstrated that group-level selection methods, such as the group minimax concave penalty, are superior to other methods in selecting relevant variable groups but are inferior in identifying important individual variables in scenarios where not all variables in the groups are predictive. This can be better achieved by bi-level selection methods such as group bridge. Two-step and collinearity-tolerant approaches such as elastic net and ordered homogeneity pursuit least absolute shrinkage and selection operator are inferior to knowledge-driven methods but provide results without requiring prior knowledge. Possible applications in proteomics are considered, leading to suggestions on which method to use depending on existing prior knowledge and research question.

https://onlinelibrary.wiley.com/doi/full/10.1002/sim.9620

2022

Subtype-specific plasma signatures of platelet-related protein releasate in acute pulmonary embolism

There is evidence that plasma protein profiles differ in the two subtypes of pulmonary embolism (PE), isolated PE (iPE) and deep vein thrombosis (DVT)-associated PE (DVT-PE), in the acute phase. The aim of this study was to determine specific plasma signatures for proteins related to platelets in acute iPE and DVT-PE compared to isolated DVT (iDVT).

https://pubmed.ncbi.nlm.nih.gov/36274391/

2022

Locality-sensitive hashing enables efficient and scalable signal classification in high-throughput mass spectrometry raw data

Mass spectrometry is an important experimental technique in the field of proteomics. However, analysis of certain mass spectrometry data faces a combination of two challenges: first, even a single experiment produces a large amount of multi-dimensional raw data and, second, signals of interest are not single peaks but patterns of peaks that span the different dimensions. The rapidly growing amount of mass spectrometry data increases the demand for scalable solutions. Furthermore, existing approaches for signal detection usually rely on strong assumptions concerning the signals' properties.

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-022-04833-5
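
As a rough illustration of the general idea behind locality-sensitive hashing, the sketch below hashes numeric vectors with random hyperplanes, so that vectors pointing in similar directions tend to share a bucket. The abstract above does not state which hash family the publication uses; random hyperplane hashing, the function names, and the representation of signals as plain numeric vectors are all assumptions made for this example.

```python
import random

def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random hyperplane normals with Gaussian entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def signature(vector, hyperplanes):
    """One bit per hyperplane: the side of the hyperplane the vector lies on."""
    bits = []
    for normal in hyperplanes:
        dot = sum(n_i * v_i for n_i, v_i in zip(normal, vector))
        bits.append(1 if dot >= 0.0 else 0)
    return tuple(bits)

def bucketize(vectors, hyperplanes):
    """Group vector indices by signature; similar vectors tend to collide."""
    buckets = {}
    for index, vector in enumerate(vectors):
        buckets.setdefault(signature(vector, hyperplanes), []).append(index)
    return buckets
```

Because candidate comparisons are then restricted to vectors within the same bucket, the number of pairwise comparisons can stay far below the quadratic cost of comparing everything with everything.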

2022

Mass Spectrometry to investigate the Heart Failure Syndrome within the DIASyM Core, Poster, Status meeting, Heidelberg

Author: Thierry Schmidlin, Elisa Araldi, Laura Bindila, Miguel A. Andrade-Navarro, Andreas Hildebrandt, Stefan Kramer, Philipp S. Wild, Stefan Tenzer

Heart failure (HF) affects 15 million people in Europe and is associated with a substantial public health burden and poor survival prognosis. Its molecular mechanisms are largely unknown, requiring a systems-medicine approach utilizing machine learning methods to decipher molecular pathophysiologic sub-phenotypes based on a comprehensive multi-OMICs phenomapping. The DIASyM research core develops, optimizes and standardizes ion-mobility enhanced mass spectrometric workflows for multi-OMICs-based biomaterial characterization on the proteome, lipidome and metabolome level using primarily data-independent acquisition workflows. We generate unbiased mass spectrometric sample records on multi-OMICS level, which provide a rich resource for data mining and modelling. DIASyM develops the software and methods for data processing, interpretation and high-level integration, which will be independently validated in other research cores to ensure cross-center compatibility and reproducibility. This poster describes the activities of the DIASyM groups developing novel mass spectrometry-based approaches to unravel the molecular mechanisms of the heart failure syndrome.

2022

CROSS PLATFORM COMPARISON OF PEPTIDE ION MOBILITY SEPARATION, Poster, ASMS, Minneapolis.

Author: Hein D., Distler U., Gomez-Zepeda D., Łącki M.K., Tenzer S.

Introduction
Currently, several types of ion mobility separation devices are integrated into commercial instrument platforms which are used routinely in proteomic analyses, including field asymmetric ion mobility separation (FAIMS), trapped ion mobility separation (TIMS) and traveling wave (TWIMS). We analyzed the separation capabilities of the three ion mobility separation devices and investigated correlations between ion mobility separation of tryptic peptides derived from complex proteomic samples on three different instrument platforms, including Waters Synapt G2-S, Bruker TimsTOF Pro 2 and Thermo Orbitrap Exploris 480 with FAIMS Pro.

Mass spectrometric analysis
Tryptic digests (200 ng) of HeLa cells were analyzed by LC-IMS-MS on three different platforms. On the TimsTOF Pro 2, samples were injected on a nanoElute system using a 20 cm Aurora column and separated by a 47 min gradient, analyzed in PASEF-DDA mode and processed with MaxQuant. For the Synapt G2-S HDMS, samples were injected on a nanoAcquity UPLC system and separated using a 2-hour method on a HSS-T3 column. Samples were measured in triplicate in UDMSE mode and data were analyzed with PLGS and IsoQuant. To investigate the separation capabilities of the FAIMS Pro device, 31 FAIMS voltages were covered in individual injections each on an Ultimate3000-Exploris 480 platform.

Methods
The entire data analysis was performed in R. As a first overview of the data, Fig. 1 sketches the dependence between peptide appearance in successive FAIMS voltages. In the next step, we wanted to clarify whether this property is also visible in the inverse ion mobility (observed on other platforms). The distribution plot of the inverse ion mobility per FAIMS voltage (Fig. 2) shows that the FAIMS voltage correlates with the inverse ion mobility. In Fig. 3, we additionally considered another well-known factor correlated with inverse ion mobility: the mass-to-charge ratio. A natural question is whether the two can simultaneously correlate with the appearance of peptides at a given FAIMS voltage. Fig. 3 shows that this is the case.

Conclusion
By comparing observed ion mobilities and respective optimal FAIMS compensation voltages for more than 40,000 tryptic peptides across three different types of ion mobility separation devices, we analyzed their distinct separation properties for peptides in the gas phase. All three IMS separators apply slightly different separation conditions in the gas phase, which are reflected in systematic differences in the observed ion mobilities. Our analysis reveals higher correlation between traveling wave and trapped ion mobility separators, while lower correlations are observed between optimal compensation voltages in a FAIMS Pro device and ion mobility values reported by both TIMS and TWIMS-based instruments. Our correlation analysis revealed positive correlation between the optimal FAIMS voltages for each peptide and their respective ion mobility measurements on other platforms. In detail, the correlation between the Orbitrap and the Synapt G2-S HDMS was around R=0.86 and between the Orbitrap and the TimsTOF Pro 2 R=0.70. Highest correlation was observed between ion mobility measurements of TimsTOF Pro 2 and Synapt G2-S HDMS (R=0.95).
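
The pairwise comparison described in the conclusion can be sketched as a correlation between per-peptide values measured on two platforms. The poster text does not state which correlation coefficient R denotes or how peptides were matched across platforms; the sketch below assumes Pearson's coefficient and two already matched lists of equal length, and the function name is hypothetical. The analysis itself was performed in R, so this Python version is only an illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of centered cross-products and centered squares.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

In such a comparison, `x` could hold the optimal FAIMS compensation voltage of each peptide and `y` the ion mobility value of the same peptide on another platform.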

2022

High-Throughput human plasma proteome analysis using FAIMS pro interface, Poster, IMSC, Maastricht

Author: Hein D., Distler U., Kumm E., Łącki M., Gomez-Zepeda D., Tenzer S.

Due to the composition and associated properties of plasma, mass spectrometry-based proteomic analysis of human plasma is challenging. Compared to other tissues, the overall number of proteins is lower and their concentrations vary considerably. For these reasons, high-throughput analysis methods of plasma samples need to be constantly monitored and adapted in order to reach maximal proteome coverage.
To achieve a high number of detected peptides and proteins in high-throughput analysis we decided to use the FAIMS Pro Interface. The FAIMS Pro Interface uses an asymmetric electric field on the cylindrical electrode to separate ions by their ion mobility, including only ions with a specific ion mobility. For optimal results, different voltages of the electrode and mass over charge filter combinations were tested.
DIA-NN2 and MaxQuant were used for raw data analysis. DIA-NN employs neural networks and interference correction and is thus particularly well suited for high-throughput setups, allowing fast analysis and deep coverage of proteomes. MaxQuant is a quantitative proteomics software package that is able to analyze large datasets and supports several labeling techniques.
The number of identified peptides and proteins in human plasma obtained with FAIMS Pro Interface differs by an order of magnitude compared to standard methods. For specific voltages, our data shows a significant increase of identified peptides and proteins compared to the standard method.
We demonstrate that a meticulous selection of two or three FAIMS voltages results in significantly higher numbers of uniquely identified proteins and peptides. If paired with additional mass to charge filtering, the number of identified proteins and peptides drops slightly, while boosting the quality of findings.

2022

Identifying predictive signals in grouped datasets with the novel Sparse Group Smoothly Clipped Absolute Deviation, Talk, 17th Annual Conference of the DGEpi

Author: Buch, G., Schulz, A., Schmidtmann, I., Strauch, K., Wild, P. S.

Introduction:
Many datasets exhibit a group structure, like lipid markers that are interrelated based on chemical and biochemical principles. Considering such groupings in a model identification task improves selection, but existing methods are not sufficient for the need, especially when the dataset contains more variables than observations. We propose the Sparse Group Smoothly Clipped Absolute Deviation (SGS), to improve selections in such settings.

Methods:
A simulation study was conducted to optimize α, the tuning parameter of SGS. To evaluate their performance in model selection, SGS, Sparse Group LASSO (SGL), composite Minimax Concave Penalty (cMCP), and Group Exponential LASSO (GEL) were compared in artificial datasets. The approaches were then applied to data from a randomized clinical, phase-IV trial in individuals with diabetes mellitus to identify lipids and lipid groups regulated by empagliflozin intake (EmDia; NCT02932436). Correct and permuted groupings were provided to investigate the impact of groupings on the selection process of the approaches.

Results:
SGS with tuned parameters (α value set to 1/3, λ determined with 10-fold cross-validation) was superior to other model selection techniques in many of the simulated settings. Especially when many variables were related to the response, SGS performed convincingly in variable and group selection. When applied to the use case, SGS identified more lipids (selected features, N=16) compared to cMCP (2) and GEL (1) when grouping was correct and obtained similar results when grouping was incorrect (2, 2, 1). SGL created the largest model in both situations (36, 6).

Conclusions:
SGS incorporates groupings more strongly than cMCP and GEL in the selection process without the risk of selecting suspicious signals in settings with incorrect group formations. Since these findings are based on simulation studies and a real-world use case, SGS can be recommended for selection tasks with prior knowledge of groupings and datasets with more features than observations.

2021

Interpretability of bi-level variable selection methods, Poster, 16th Annual Conference of the DGEpi

Author: Buch, G., Schulz, A., Schmidtmann, I., Strauch, K., Wild, P. S.

Background: Many datasets possess a natural group structure due to high correlations or contextual similarities of variables. Incorporating this information in a selection process enables the identification of relevant variable groups and also relevant members of those groups. It has been argued that incorporating such prior knowledge can improve the interpretability of the selection output, but this hypothesis has not yet been investigated for bi-level selection methods. A comparison of bi-level selection methods with the gold standard LASSO for variable selection can provide insights into the interpretability of the selection results.
Methods: Composite Minimax Concave Penalty (cMCP), Group Exponential LASSO (GEL), Sparse Group LASSO (SGL), and LASSO as reference method were used to select predictors in a time-to-event (survival), regression (linear trait) and classification (binary trait) task. For this purpose, three group formations based on prior knowledge, correlation structure, or random assignment were provided. Selections were done in 1,000 bootstrap samples derived from a cohort of 1,001 patients (MyoVasc-study; NCT04064450). Interpretability of the generated models was assessed by selection accuracy, group consistency, and collinearity tolerance.
Results: Bi-level selection methods outperformed LASSO in all three dimensions of interpretability, for most selection tasks considered. Here, cMCP demonstrated superiority in selection accuracy in most applications, while GEL and SGL were superior in group consistency and collinearity tolerance. The performance of bi-level selection methods was maintained even when group formation was inaccurate.
Conclusions: If there is interest in interpreting the selection results and information on relationships between variables is available, the use of bi-level selection methods seems to be recommended over LASSO. This is due to their ability to treat variables of a group consistently and the tendency to select correlated variables together.
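
The bootstrap design mentioned in the methods above can be sketched generically: a selection procedure is rerun on samples drawn with replacement, and the results are summarized across runs. In the sketch below, the per-variable selection frequency is a hypothetical summary and not one of the three interpretability metrics named in the abstract, and the covariance-threshold selector is a toy stand-in for cMCP, GEL, SGL, or LASSO. All names are illustrative.

```python
import random
from collections import Counter

def bootstrap_selection_frequency(rows, select, n_boot=1000, seed=0):
    """Fraction of bootstrap samples in which each variable is selected.

    rows   -- list of dicts, one per observation
    select -- function mapping a list of rows to a set of variable names
    """
    rng = random.Random(seed)
    counts = Counter()
    n = len(rows)
    for _ in range(n_boot):
        # Draw n observations with replacement from the original data.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        counts.update(select(sample))
    return {name: counts[name] / n_boot for name in counts}

def covariance_selector(threshold, response="y"):
    """Toy selector: keep predictors whose |covariance| with the response is large."""
    def select(sample):
        y = [row[response] for row in sample]
        mean_y = sum(y) / len(y)
        chosen = set()
        for name in sample[0]:
            if name == response:
                continue
            x = [row[name] for row in sample]
            mean_x = sum(x) / len(x)
            cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / len(x)
            if abs(cov) > threshold:
                chosen.add(name)
        return chosen
    return select
```

Any selection method that returns a set of variable names can be passed as `select`, which keeps the resampling loop independent of the method being evaluated.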

2021

A systematic review and evaluation of methods for group variable selection, Talk, 66th GMDS Annual Conference / 12th TMF Annual Congress

Author: Buch, G., Schulz, A., Schmidtmann, I., Strauch, K., Wild, P. S.

Introduction:
Many datasets have a natural group structure due to high correlations or contextual similarities of variables, like in proteomics. Group variable selection methods are able to account for such structure in the selection process to identify variables that are related to each other and share a common and traceable relationship with the response variable.
To date, only selective comparisons of group variable selection methods are available, but a review is needed that systematically identifies and evaluates the wide range of existing approaches.

Methods:
A structured literature search was conducted, adhering to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses recommendations, to identify group variable selection methods that were sufficiently programmed and suitable for studying Gaussian, binomial, or time-to-event data types.
The selection performance of the identified methods was evaluated based on the correlation between true and generated models within simulation studies defined by a varying number of variables associated with a Gaussian distributed response variable.

Results:
The systematic literature review revealed 14 methods for selecting variable groups, which can be classified into knowledge-driven and data-driven approaches. The first category includes group-level and bi-level selection methods that use pre-defined group formations, while two-step and collinear tolerant approaches constitute the second category, which use the correlation structure of the data to select related variables. Group-level and two-step approaches select all or none of the variables in a group, while bi-level and collinear tolerant methods propose sparsity even within groups of variables.
Simulation studies demonstrate that group-level selection methods, such as Group MCP, are superior to other methods in selecting relevant variable groups, but are inferior in identifying important individual variables once not all variables in the groups are predictive. This can be better achieved by bi-level selection methods such as Group Bridge. Two-step and collinearity tolerant approaches such as the Elastic Net and Ordered Homogeneity Pursuit LASSO are inferior to knowledge-driven methods but provide comparable results without prior knowledge.

Discussion:
Methods in all four categories are suitable for analyzing data with variables that have a natural group structure. The choice of the appropriate method depends on the objective and the availability of prior information. If the interest is to identify related variables associated with a response variable, group-level selection, and two-step methods are recommended, while bi-level selection and collinear tolerant methods are appropriate, when identifying variables associated with a given response from a structure of related variables is of interest.
Since the results of the simulation study indicate that inclusion of prior information improves the selection process, such information should be used when available. A potential application for analyzing omics datasets could use the information on coexpression or biological function to group variables.

Conclusions:
A variety of methods can incorporate a natural group structure of predictors in selection. This improves selection, especially when the group structure is known and does not need to be estimated via the correlation structure. Since the identified methods are specialized for different situations, the choice of an appropriate method strongly depends on the research question.

2021

Variable selection using regularized regressions, Workshop, MSCoreSys Summer School – “Mass spectrometry meets systems medicine”

Author: Buch, G.

The course will cover statistical learning methods that extend classical regression with regularization terms to identify predictive variables for a dependent variable in omics data. The underlying theory of the methods will be discussed, as well as practical aspects for their application, such as adjusting for confounders and accounting for interrelated variables. Hands on training will be integrated.

2021

Interpretability of bi-level variable selection methods, Poster, Status Meeting of the MSCoreSys Initiative

Author: Buch, G., Schulz, A., Schmidtmann, I., Strauch, K., Wild, P. S.

Many datasets possess a natural group structure due to high correlations or contextual similarities of variables. Incorporating this information in a selection process enables the identification of relevant variable groups and also relevant members of those groups. It has been argued that incorporating such prior knowledge can improve the interpretability of the selection output, but this hypothesis has not yet been investigated for bi-level selection methods. A comparison of bi-level selection methods with the gold standard LASSO for variable selection can provide insights into the interpretability of the selection results.
Methods: Composite Minimax Concave Penalty (cMCP), Group Exponential LASSO (GEL), Sparse Group LASSO (SGL), and LASSO as reference method were used to select predictors in a time-to-event (survival), regression (linear trait) and classification (binary trait) task. For this purpose, three group formations based on prior knowledge, correlation structure, or random assignment were provided. Selections were done in 1,000 bootstrap samples derived from a cohort of 1,001 patients (MyoVasc-study; NCT04064450). Interpretability of the generated models was assessed by selection accuracy, group consistency, and collinearity tolerance.
Results: Bi-level selection methods outperformed LASSO in all three dimensions of interpretability, for most selection tasks considered. Here, cMCP demonstrated superiority in selection accuracy in most applications, while GEL and SGL were superior in group consistency and collinearity tolerance. The performance of bi-level selection methods was maintained even when group formation was inaccurate.
Conclusions: If there is interest in interpreting the selection results and information on relationships between variables is available, the use of bi-level selection methods seems to be recommended over LASSO. This is due to their ability to treat variables of a group consistently and the tendency to select correlated variables together.

2021

LipiDisease: associate lipids to diseases using literature mining

Lipids play an essential role in cellular assembly and signaling. Dysregulation of these functions has been linked with many complications including obesity, diabetes, metabolic disorders, cancer and more. Investigating lipid profiles in such conditions can provide insights into cellular functions and possible interventions. Hence, the field of lipidomics has been expanding in recent years. Even though the role of individual lipids in diseases has been investigated, there is no resource to perform disease enrichment analysis considering the cumulative association of a lipid set. To address this, we have implemented the LipiDisease web server. The tool analyzes millions of records from the PubMed biomedical literature database discussing lipids and diseases, predicts their association and ranks them according to false discovery rates generated by random simulations. The tool takes into account 4270 diseases and 4798 lipids. Since the tool extracts the information from PubMed records, the number of diseases and lipids will be expanded over time as the biomedical literature grows.

https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btab559/6343440

2021

MaxDIA enables library-based and library-free data-independent acquisition proteomics

MaxDIA is a software platform for analyzing data-independent acquisition (DIA) proteomics data within the MaxQuant software environment. Using spectral libraries, MaxDIA achieves deep proteome coverage with substantially better coefficients of variation in protein quantification than other software. MaxDIA is equipped with accurate false discovery rate (FDR) estimates on both library-to-DIA match and protein levels, including when using whole-proteome predicted spectral libraries. This is the foundation of discovery DIA: hypothesis-free analysis of DIA samples without library and with reliable FDR control. MaxDIA performs three- or four-dimensional feature detection of fragment data, and scoring of matches is augmented by machine learning on the features of an identification. MaxDIA's bootstrap DIA workflow performs multiple rounds of matching with increasing quality of recalibration and stringency of matching to the library. Combining MaxDIA with two new technologies, BoxCar acquisition and trapped ion mobility spectrometry, both lead to deep and accurate proteome quantification.

https://www.nature.com/articles/s41587-021-00968-7

2021

Right atrium size in the general population

Echocardiography is the most common routine cardiac imaging method. Nevertheless, only few data about sex-specific reference limits for right atrium (RA) dimensions are available. Transthoracic echocardiographic RA measurements were studied in 9511 participants of the Gutenberg-Health-Study. A reference sample of 1942 cardiovascular healthy subjects without chronic obstructive pulmonary disease was defined. We assessed RA dimensions and sex-specific reference limits were defined using the 95th percentile of the reference sample. Results showed sex-specific differences with larger RA dimensions in men that were attenuated by standardization for body-height. RA-volume was 20.2 ml/m in women (5th-95th: 12.7-30.4 ml/m) and 26.1 ml/m in men (5th-95th: 16.0-40.5 ml/m). Multivariable regressions identified body-mass-index (BMI), coronary artery disease (CAD), chronic heart failure (CHF) and atrial fibrillation (AF) as independent key correlates of RA-volume in both sexes. All-cause mortality after median follow-up-period of 10.7 (9.81/11.6) years was higher in individuals who had RA volume/height outside the 95% reference limit (HR 1.70 [95%CI 1.29-2.23], P = 0.00014). Based on a large community-based sample, we present sex-specific reference-values for RA dimensions normalized for height. RA-volume varies with BMI, CHF, CAD and AF in both sexes. Individuals with RA-volume outside the reference limit had a 1.7-fold higher mortality than those within reference limits.

https://pubmed.ncbi.nlm.nih.gov/34795353/

2021

OpenTIMS, TimsPy, and TimsR: Open and Easy Access to timsTOF Raw Data

The Bruker timsTOF Pro is an instrument that couples trapped ion mobility spectrometry (TIMS) to high-resolution time-of-flight (TOF) mass spectrometry (MS). For proteomics, lipidomics, and metabolomics applications, the instrument is typically interfaced with a liquid chromatography (LC) system. The resulting LC-TIMS-MS data sets are, in general, several gigabytes in size and are stored in the proprietary Bruker Tims data format (TDF). The raw data can be accessed using proprietary binaries in C, C++, and Python on Windows and Linux operating systems. Here we introduce a suite of computer programs for data accession, including OpenTIMS, TimsR, and TimsPy. OpenTIMS is a C++ library capable of reading Bruker TDF files. It opens up Bruker's proprietary codebase. TimsPy and TimsR build on top of OpenTIMS, enabling swift and user-friendly data access to the raw data with Python and R. Both programs are available under a GPL3 license on all major platforms, extending the possibility to interact with timsTOF data to macOS. Additionally, OpenTIMS is capable of translating Bruker data into HDF5 files that can be easily analyzed from Python with the vaex module. OpenTIMS and TimsPy therefore provide easy and quick access to Bruker timsTOF raw data.

https://pubs.acs.org/doi/10.1021/acs.jproteome.0c00962

2021

Pyproteolizard - A Python interface for high-performance processing of timsTOF raw data, Poster, Status Meeting

Author: David Teschner, Konstantin Bob, Jennifer Leclaire, Thomas Kemmer, Mateusz K. Łącki, Michał Startek, Bertil Schmidt, Stefan Tenzer, Andreas Hildebrandt

Abstract:
Valuable insights into high-dimensional driving factors of diseases like heart failure can be gained by analyzing samples using high-throughput omics technologies. The newly introduced timsTOF mass spectrometer is a notable device implementing such technology. Here, peak capacity and acquisition speed are of the greatest experimental interest but at the same time increase the dimensionality of generated datasets through the addition of ion mobility measurements.
It is crucial for the processing of the underlying data not to be constrained by its increased complexity and volume while retaining the ability to be flexibly integrated into existing workflows. We therefore present (Py)proteolizard: a mix of high-performance processing tools written in C++ together with user-friendly Python bindings. It allows for a seamless integration of timsTOF data with algorithms from the locality-sensitive hashing and deep-learning family. Furthermore, it enables a fast visual inspection of data slices such as mass spectrometry (MS) features.