Researchers led by Tarek Eissa from the BIRD team of the attoworld group have developed a new strategy to improve machine learning applications in molecular diagnostics. Variability in biological samples and in measurement techniques has long limited prediction accuracy when models encounter out-of-distribution data, meaning observations that fall outside the range of data the model was trained on.
In their latest article, published in PNAS Nexus, the authors introduce Contextual Out-of-Distribution Integration (CODI), a method that enriches training datasets with simulated variations reflecting real-world complexity. This lets machine learning models recognize and interpret a broader range of real-world samples, turning data variability from an obstacle into an asset. The team applied CODI to infrared spectroscopic data from human blood samples and demonstrated significant improvements in prediction accuracy.
This advance opens the way to more robust diagnostic tools and to practical applications of machine learning in molecular analytics, particularly in medical diagnostics.
Illustration: Tarek Eissa
Original Publication:
CODI: Enhancing machine learning-based molecular profiling through contextual out-of-distribution integration
T. Eissa, M. Huber, B. Obermayer-Pietsch, B. Linkohr, A. Peters, F. Fleischmann & M. Žigman
PNAS Nexus 3, pgae449 (2024)