1st EUOS/SLAS Joint Challenge: Compound Solubility

The EU Open Screen Data Mining Competition Results

Robert Harmel

Scientific Project and Industry Liaison Manager

EU-OPENSCREEN

Robert Harmel is a scientific project manager responsible for academic screening projects and industry engagement at EU-OPENSCREEN ERIC, a non-profit European research infrastructure for drug discovery. He studied organic chemistry at Nijmegen University and received a PhD in chemical biology from the Leibniz Institute for Molecular Pharmacology in Berlin in 2020. Robert has experience in organic synthesis and in developing new analytical tools and assays for mass spectrometry, NMR, and fluorescence. He is a co-author of 15 peer-reviewed scientific publications and brings a broad understanding of chemistry, biology, and analytics.

Wenyu Wang

Researcher

University of Helsinki

Wenyu Wang is a researcher and data scientist at the University of Helsinki. He has seven years of project experience applying modern data science techniques to support cancer drug discovery. He has won three international data science competitions on drug targets, drug sensitivity, and patient stratification.

Andrea Kopp

Student

LMU

Andrea Kopp has been interested in the life sciences since studying chemistry and biochemistry at the Ludwig-Maximilians-Universität (LMU) in Munich in 2016. Later, they joined the computer science program at LMU, where they recently finished their Bachelor of Science. In a joint project with Helmholtz Munich and LMU, Andrea compared the performance of different models within the Kaggle challenge "1st EUOS/SLAS Joint Challenge: Compound Solubility" and developed the winning model. At SLAS Europe 2023, they present their contribution to the challenge. Currently, Andrea is at the pharmacy department of LMU, focusing on finding novel, potent drug candidates with a generative recurrent neural network.

EU-OPENSCREEN and Open Data
Most life science researchers do not have easy access to suitable drug screening platforms and compound collections, which are generally expensive to purchase and maintain and require specialist expertise to operate. This is often a major limitation for chemical tool development in academia and slows down innovative drug discovery projects. To address these needs, researchers from more than 20 academic institutes in Europe joined forces to launch EU-OPENSCREEN in 2018. Under the legal entity of an ERIC (European Research Infrastructure Consortium), this new organization is designed for long-term scientific and financial sustainability and is open to collaborations with academic and industry scientists from all over the world. It offers access to a wide range of screening platforms for biochemical, biophysical, and cellular screening, commercial and academic compound libraries, and medicinal chemistry expertise. One of our founding principles is to provide all our high-quality screening data in our open-access European Chemical Biology Database (ECBD), which is based on the FAIR principles. Along these lines, we created a large solubility data set from our rationally selected collection of 100,000 commercial compounds. To harness the value of this data set, we ran a data mining competition together with SLAS on the Kaggle platform, in which 100 teams participated to develop accurate prediction methods for such data sets.
The Challenge of Predicting Compound Solubility
In this section, I give an overview of the challenge of predicting solubility from EU-OPENSCREEN data, how the results were evaluated, and how the winner was selected.
Winning model: OCHEM-generated consensus model
We participated in the 1st EUOS/SLAS Joint Challenge on Kaggle, a three-way classification task based on the solubility of 100k small drug-like compounds. Throughout the competition, we made intensive use of the OCHEM platform (On-line chemical database with modeling environment) to generate the best descriptor-based method. The final predictions were generated by a consensus model that averaged over 28 single models.
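As a rough illustration of consensus averaging for a three-class task, the sketch below averages per-class scores across several models and picks the highest-scoring class for each compound. It is not the OCHEM implementation: the function name `consensus_predict`, the toy scores, and the assumption that each single model outputs class probabilities are all hypothetical, since the abstract does not say which model outputs were averaged or how the average was mapped to a class.

```python
from typing import List, Sequence


def consensus_predict(
    per_model_probs: Sequence[Sequence[Sequence[float]]],
) -> List[int]:
    """Average class scores over models and return the top class per compound.

    per_model_probs[m][i][c] is the (assumed) probability from model m
    that compound i belongs to solubility class c.
    """
    n_models = len(per_model_probs)
    n_compounds = len(per_model_probs[0])
    n_classes = len(per_model_probs[0][0])

    labels = []
    for i in range(n_compounds):
        # Unweighted mean of each class score across all single models.
        mean_scores = [
            sum(per_model_probs[m][i][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ]
        # The consensus label is the class with the highest mean score.
        labels.append(max(range(n_classes), key=mean_scores.__getitem__))
    return labels


# Toy input (made up): 3 models, 2 compounds, 3 classes.
toy_scores = [
    [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]],
    [[0.5, 0.4, 0.1], [0.2, 0.5, 0.3]],
    [[0.2, 0.6, 0.2], [0.1, 0.2, 0.7]],
]
consensus_labels = consensus_predict(toy_scores)
```

In this toy case, the third model disagrees with the other two on the first compound, yet the averaged scores still favor the class that the majority scored highest. The winning entry would correspond to the same idea with 28 single models instead of three.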
Round Table Discussion