2019 AI in Process Automation Symposium

The SLAS 2019 AI in Process Automation Symposium course package contains 18 presentations on: 

AI in Drug Discovery Today 

Data Automation 

AI in Screening 

AI in Chemistry 

Based on presenter permission, 18 of the 23 total SLAS 2019 AI in Process Automation Symposium presentations are available on-demand. The SLAS AI in Process Automation Program Committee selects conference speakers based on the innovation, relevance and applicability of their research, as well as how well it addresses the interests and priorities of today’s life sciences discovery and technology community. All presentations are published with the permission of the presenters.

Alán Aspuru-Guzik, Ph.D.

Professor of Chemistry and Computer Science

University of Toronto

Alán Aspuru-Guzik’s research lies at the interface of computer science with chemistry and physics. He works on the integration of robotics, machine learning and high-throughput quantum chemistry for the development of “self-driving laboratories,” which promise to accelerate the rate of scientific discovery. Alán also develops quantum computer algorithms for quantum machine learning and has pioneered quantum algorithms for the simulation of matter. He is jointly appointed Professor of Chemistry and Computer Science at the University of Toronto and was previously a full professor at Harvard University. He is a co-founder of Zapata Computing and Kebotix, two early-stage ventures in quantum computing and self-driving laboratories, respectively.

Jeremy Jenkins, Ph.D.

Executive Director

Novartis Institutes for BioMedical Research

Jeremy is Executive Director and Head of Data Science in Chemical Biology & Therapeutics at the Novartis Institutes for BioMedical Research. He joined Novartis 16 years ago, following a postdoc at Harvard Medical School and a PhD in Molecular Genetics from The Ohio State University. His global data science team collaborates in early discovery areas of Computational Biology, Cheminformatics, Imaging, and Machine Learning. Jeremy received the 2011 Corwin Hansch Award for contributions to the field of QSAR, and is a recipient of the Novartis Leading Scientist award.

Casandra Mangroo, Ph.D.

Head of Science

BenchSci

Casandra is Head of Science at BenchSci. She applies research experience from her Ph.D. in Virology from the University of Toronto as the Science Team lead and product manager of BenchSci’s knowledge graph. Casandra is closely involved with developing the machine learning training sets and is responsible for the integrity and comprehensiveness of the scientific data on the BenchSci platform. She also works closely with BenchSci commercial leadership, R&D and engineering teams to implement new data initiatives, drive innovation and expand platform applications.

Casimir Wierzynski, Ph.D.

Senior Director, AI Products Group

Intel

Casimir Wierzynski is Senior Director, Office of the CTO, in the Artificial Intelligence Product Group at Intel. He leads research efforts to identify, synthesize, and incubate emerging technologies that will enable the next generation of AI systems. Before joining Intel in 2017, Cas led research teams in neuromorphic computing and autonomous robotics at Qualcomm.

Cas received his BS and MS in electrical engineering at MIT, a BA in mathematics at Cambridge University as a British Marshall Scholar, and a PhD at Caltech in Computation and Neural Systems.

Steven Kuntz, Ph.D.

Sr. Scientist

Ionis Pharmaceuticals

Steven Kuntz conducts sequencing library creation, analysis, and protocol development at Ionis Pharmaceuticals in support of Functional Genomics and Drug Discovery. He received his PhD working on the robustness of muscle differentiation networks in the labs of Barbara Wold and Paul Sternberg at Caltech and did a postdoc on temperature-responsive developmental regulatory control with Michael Eisen at UC Berkeley.

Joshua Kangas, Ph.D.

Assistant Teaching Professor

Carnegie Mellon University

A successful computational biology researcher needs a strong understanding of both computational techniques and the biological processes and methods that generate the data being analyzed. In the courses Joshua teaches at Carnegie Mellon University, students generate experimental data in wet-lab experiments and learn to apply computational techniques to the analysis of those data. Joshua constantly looks for ways to keep the course offerings as modern as possible in both the wet-lab and computational techniques students learn. At times, this has included modernizing classic labs or creating new modules based on novel research. He teaches courses for high school students, undergraduates, and graduate students (M.S. and Ph.D.).

He was also integral in the design and setup of the Automation Lab used by the M.S. Automated Science program at Carnegie Mellon University.

Juan Caicedo

Research Fellow

Broad Institute of MIT and Harvard

Juan Caicedo is a Research Fellow at the Broad Institute of MIT and Harvard, where he investigates the use of deep learning to analyze microscopy images. Prior to this, he studied object detection in large-scale image collections, also using deep learning, at the University of Illinois at Urbana-Champaign. Juan obtained a PhD from the National University of Colombia and, as a graduate student, completed research internships at Google Research, Microsoft Research, and Queen Mary University of London, working on problems in large-scale image classification, image enhancement, and medical image analysis. His research interests include computer vision, machine learning and computational biology.

Oren Kraus, BASc, MASc, Ph.D.

Co-founder & CTO

Phenomic AI

Oren Kraus co-founded Phenomic AI after completing his Ph.D. in Dr. Brendan Frey's lab at the University of Toronto. His research focused on applying deep learning to high-throughput microscopy screens used in drug discovery and cell biology research. Together with Jimmy Ba and collaborators at the Donnelly Centre for Cellular and Biomolecular Research (CCBR), Oren was one of the first to publish the application of deep learning to microscopy data. Oren founded Phenomic AI with the goal of accelerating the interpretation of phenotypes in biomedical images with machine learning for applications in drug discovery and cellular diagnostics.

Lakshmi Akella, Ph.D.

Senior Scientist

Biogen

Lakshmi Akella is a computational chemist currently working as a Senior Scientist in the Biotherapeutic & Medicinal Sciences Division at Biogen. Prior to Biogen, she worked at H3 Biomedicine, the Broad Institute and Tripos.

She holds advanced degrees in Organic Chemistry and Software Engineering and conducted postdoctoral research at the University of Minnesota.

She has several years of industry experience in enabling molecule design, with particular involvement in ligand- and structure-based design, virtual screening, library design, algorithm development, predictive modeling, high-throughput screening technologies and cheminformatics.

Stuart Chambers

Sr. Scientist

Amgen

Stuart Chambers is a Senior Scientist in the Genome Analysis Unit at Amgen with a background in stem cells, development, and neuroscience. His group uses induced pluripotent stem cells to model aspects of disease for drug development and works to create, evaluate, and disseminate technology for Amgen.

Connor Coley, Ph.D.

Postdoctoral Associate

MIT

Connor W. Coley is a postdoctoral associate at the Broad Institute of MIT and Harvard. His work in computer assistance and automation for organic synthesis has included the development of a data-driven synthesis planning program and in silico strategies for predicting the outcomes of organic reactions. For his work in this field, Connor has been named one of C&EN’s “Talented Twelve” and one of Forbes Magazine’s “30 Under 30” for Healthcare. His continuing research interests are in how data science and laboratory automation can be used to streamline discovery in the chemical sciences. He received his B.S. and Ph.D. in Chemical Engineering from Caltech and MIT, respectively. In 2020, he will return to MIT as an Assistant Professor in the Department of Chemical Engineering.

Sebastian Steiner, Ph.D.

Postdoctoral Research Fellow

University of British Columbia

I completed my MSc degree in Vienna, Austria, in the field of natural product synthesis, and then moved on to do my PhD in Glasgow, Scotland, in synthesis automation. I am interested in designing tools and writing software to enable lab automation, with a special focus on robotic reaction monitoring. In my spare time I train Historical European Martial Arts (sword fighting!), and I enjoy cooking and baking.

Jonathon Grob

Investigator

Novartis Institutes for BioMedical Research

I am a chemist who is passionate about leveraging my experience in medicinal chemistry and technology development to instigate a transformation toward automation- and digital-enabled drug hunting. In 18 years at Novartis, I worked at three sites in my first four years. I currently work with a global group of entrepreneurial, automation-loving, data-science-enabled team members to deliver MicroCycle 1.0, a new integrated drug discovery platform for NIBR.

Keynote Speaker
Keynote Address: The Materials for Tomorrow, Today
In this talk, I argue that for materials discovery, one needs to go beyond simple computational screening approaches followed by traditional experimentation. I have been working on the design and implementation of what I call “materials acceleration platforms” (MAPs). MAPs are enabled by the confluence of three disparate fields, namely artificial intelligence (AI), high-throughput quantum chemistry (HTQC), and robotics. The integration of prediction, synthesis and characterization in an AI-driven closed-loop approach promises to accelerate materials discovery by a factor of 10, or even 100. I will describe our efforts around this topic under the Mission Innovation umbrella platform.
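
To make the closed-loop pattern concrete, here is a minimal sketch of a predict-synthesize-measure loop. It illustrates the general MAP idea, not the speaker's platform: `featurize` and `run_experiment` are hypothetical stand-ins for quantum-chemistry descriptors and a robotic synthesis/characterization step, and the acquisition rule is a simple upper confidence bound.

```python
# Minimal closed-loop "materials acceleration platform" sketch.
# Hypothetical stand-ins: featurize() and run_experiment() would wrap
# quantum-chemistry descriptors and a robotic synthesis/measurement step.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def featurize(candidate):
    # Placeholder: map a candidate material to a descriptor vector.
    return np.asarray(candidate, dtype=float)

def run_experiment(candidate):
    # Placeholder: synthesize and characterize; here, a toy objective.
    return -np.sum((np.asarray(candidate) - 0.5) ** 2)

candidates = [np.random.rand(4) for _ in range(200)]  # design space
X, y = [], []

# Seed the loop with a few random experiments.
for c in candidates[:5]:
    X.append(featurize(c)); y.append(run_experiment(c))

model = GaussianProcessRegressor()
for _ in range(20):  # budget of 20 autonomous iterations
    model.fit(np.array(X), np.array(y))
    pool = [c for c in candidates if not any(np.array_equal(c, x) for x in X)]
    mu, sigma = model.predict(np.array(pool), return_std=True)
    best = pool[int(np.argmax(mu + sigma))]  # upper-confidence-bound pick
    X.append(featurize(best)); y.append(run_experiment(best))

print("best measured property:", max(y))
```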
AI in Drug Discovery Today
Artificial and Augmented Intelligence in Early Drug Discovery
Drug discovery data continues to grow in size and diversity due to technological advances and automation. Correspondingly, opportunities are emerging for applying machine learning to classification, regression, clustering, prediction, and recommenders, in an effort to make each decision in the process more predictive. Examples are provided of where AI is being applied to support decision-making, ranging from imaging to medicinal chemistry to ‘omics, in addition to implications for changing our approach to data stewardship and data science as a discipline in the pharmaceutical industry.
AI-Assisted Antibody Selection: The Application Of Machine Learning To Accelerate The Drug Discovery Process
With a 50% failure rate, inappropriate antibodies used in scientific experiments waste millions of research dollars and can delay the drug discovery process by months. Research scientists across industry and academia turn to scientific publications and related original research documents to find evidence for antibodies proven to work in their specific experimental context. However, existing publication and antibody search tools are limited in their ability to decode scientific experiments, and scientists are forced to manually comb through hundreds, if not thousands, of results to find the information they need to choose the best antibody for their research. BenchSci’s AI is trained by Ph.D. researchers in the life sciences to identify which antibodies have been successfully used in specific experimental contexts. Advances in the fields of computation and machine learning paved the way for the development of BenchSci’s proprietary image- and text-based machine learning algorithms, along with bioinformatics ontologies, to extract relevant experimental data from original research documents. This information is contextualized within a knowledge graph that powers an AI-assisted antibody selection platform, enabling scientists to rapidly select appropriate products, improve the efficiency of target validation experiments and drive projects forward. Our goal in developing this technology is to shorten the R&D and pre-clinical phases by giving research scientists the ability to leverage the power of AI technology to streamline the experimental design process and ultimately bring treatments to market faster.
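
As a toy illustration of the kind of contextual lookup such a knowledge graph enables, consider ranking antibodies by published evidence in a requested experimental context. The records and field names below are invented for the example; the real platform is ML-extracted and far richer.

```python
# Toy contextual antibody lookup over extracted experimental facts.
# Records and field names are invented for illustration only.
records = [
    {"antibody": "Ab-101", "target": "TP53", "technique": "Western blot",
     "organism": "human", "evidence_figures": 12},
    {"antibody": "Ab-102", "target": "TP53", "technique": "IHC",
     "organism": "mouse", "evidence_figures": 3},
    {"antibody": "Ab-103", "target": "TP53", "technique": "Western blot",
     "organism": "human", "evidence_figures": 7},
]

def rank_antibodies(target, technique, organism):
    """Return antibodies with published evidence in the requested context,
    ranked by the number of supporting figures."""
    hits = [r for r in records
            if r["target"] == target
            and r["technique"] == technique
            and r["organism"] == organism]
    return sorted(hits, key=lambda r: r["evidence_figures"], reverse=True)

for r in rank_antibodies("TP53", "Western blot", "human"):
    print(r["antibody"], r["evidence_figures"])
```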
Data Automation
Privacy Preserving Machine Learning and Analytics
One of the key challenges in deploying machine learning (ML) at scale is how to help data owners learn from their data while protecting their privacy. This issue has become more pressing with the advent of regulations such as the General Data Protection Regulation. It might seem as though "privacy-preserving machine learning" would be a self-contradiction: ML wants data, while privacy hides data. Researchers from academia and industry have been marrying ideas from cryptography and machine learning to provide the seemingly paradoxical ability to learn from data without seeing it, and to learn aggregate properties of populations without learning about any particular individual. In this talk we will review three privacy-preserving ML techniques (homomorphic encryption, multi-party computation, and differential privacy) and the recent rapid progress that has been made in all three as they transition from research topics into production tools for data scientists working with sensitive data.
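
Of the three techniques, differential privacy is the simplest to show in a few lines. Below is a minimal sketch (not from the talk) of the standard Laplace mechanism for a counting query, whose sensitivity is 1; epsilon is the privacy budget, and smaller epsilon means more noise and stronger privacy.

```python
# Laplace mechanism for a differentially private count query.
# A counting query has sensitivity 1: adding or removing one record
# changes the true count by at most 1.
import numpy as np

def private_count(values, predicate, epsilon):
    true_count = sum(1 for v in values if predicate(v))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

ages = [34, 45, 29, 61, 52, 38, 47]
# Each query spends privacy budget; epsilon=0.5 gives noise of scale 2.
print(private_count(ages, lambda a: a > 40, epsilon=0.5))
```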
Increasing Sequencing Throughput By Making Robots Work Harder, But Always Checking Their Work
Declines in the cost of next-generation sequencing and increases in its capacity make high-replicate, genome-wide gene expression profiling possible during drug discovery for the first time. This makes library creation, rather than sequencing, the limiting factor in data generation. The reagent and time costs of sequencing library creation, along with substantial data processing requirements, must be combated with infrastructure to process and analyze more samples. Miniaturization, multiplexing, and automation (both mechanical and computational) enable us to process an order of magnitude more samples than benchtop approaches, at nearly half the cost per sample. However, to guarantee robust data at scale, we require careful precautions, randomizations, and embedded controls to spot systematic and sporadic errors. We pre-process samples using Python scripts to organize complex studies with double-barcoding, allowing for pseudo-randomization and integration of controls from the very first step. Our process takes advantage of robotic plate control and both acoustic and displacement liquid handlers. Fully implemented, our process can generate over 2,000 sequencing libraries per week. Co-Author: Sagar Damle, PhD
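
The pre-processing step can be sketched generically. The code below (an illustration, not the authors' scripts) lays out a 384-well plate with shuffled samples and embedded controls, assigning each well a unique dual (i5/i7) barcode pair; the fixed random seed makes the pseudo-randomization reproducible.

```python
# Generic sketch of a pseudo-randomized plate layout with embedded
# controls and dual (i5/i7) barcodes. Barcode names, control names and
# counts are placeholders, not the authors' design.
import random

random.seed(42)  # fixed seed -> reproducible "pseudo-randomization"

samples = [f"sample_{i:03d}" for i in range(1, 373)]
controls = ["pos_ctrl"] * 6 + ["neg_ctrl"] * 6     # embedded controls
contents = samples + controls                       # 384 entries total
random.shuffle(contents)

rows = "ABCDEFGHIJKLMNOP"                           # 16 rows x 24 cols = 384
wells = [f"{r}{c:02d}" for r in rows for c in range(1, 25)]

i5 = [f"i5_{k:02d}" for k in range(1, 17)]          # one i5 per row
i7 = [f"i7_{k:02d}" for k in range(1, 25)]          # one i7 per column
barcodes = [(a, b) for a in i5 for b in i7]         # unique dual index per well

layout = [
    {"well": w, "content": s, "i5": bc[0], "i7": bc[1]}
    for w, s, bc in zip(wells, contents, barcodes)
]
print(layout[0])
```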
Automated Science Education and AI Driven Closed Loop Experimentation
Until recently, the primary mechanism for training scientists in the use of automation was providing experience working with automated systems in research labs already using automation technology. At Carnegie Mellon University, we have started the first Automated Science Master’s Degree program in the country. Students in this program are trained to design and implement AI-driven closed-loop experimentation processes. They receive hands-on experience using modern automation equipment to perform experiments, are trained to use machine learning and related methods for data analysis, and learn to use artificial intelligence methods to automatically decide which experiments to run next in a campaign. To this end, we have spent the last year identifying automation equipment suppliers for our Automation Teaching Lab. This effort has put us in a unique position, having surveyed the current state of much of the hardware and software in industry from the perspective of both AI-driven automation and education. In this presentation, we will discuss our findings from the perspective of education and AI-driven experimentation. From an educational perspective, we were primarily concerned with the ability to have multiple students use the system and its software in parallel. We also considered the ability for software to simulate protocols developed by students; the ability to test in simulation is particularly crucial when there are throughput or consumables limitations. Lastly, since extensive machine learning/AI computation typically cannot be performed locally on the hardware included with most integrated automation systems, we considered the ability for remote control through an application programming interface (API) and, related to this, the extent to which data generated on the system could be automatically moved to an external environment for processing. To the best of our knowledge, no single company currently offers a comprehensive package addressing all of these needs. As such, we will discuss strengths and weaknesses of industry offerings in these areas (and more) during the presentation, but we will not discuss the offerings of specific companies. Co-Authors: Christopher J. Langmead, Robert F. Murphy, Ph.D.
AI in Screening
Image-based Profiling Using Deep Learning
Biological images have been used in research for a long time. In the past, microscopy images were used as qualitative data to display example phenotypes or for visual inspection of experiments. Now, images are used as quantitative sources of information, providing hundreds of parameters of cell state that discriminate different conditions. In this talk, I will cover the main steps of an image analysis workflow that transforms images into single-cell measurements, and will discuss computational approaches that include modern deep learning methods that we have developed in our lab. I will also present biological applications of image-based profiling, including my own research work to study genetic mutations in lung cancer.
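
The first steps of such a workflow (segment cells, then measure each one) can be sketched with classical tools. Below is a minimal illustration using scikit-image; it stands in for the far richer pipelines, including deep-learning segmenters, discussed in the talk.

```python
# Classical first steps of an image-based profiling workflow:
# segment nuclei, then extract per-cell measurements.
# A minimal scikit-image sketch; a real pipeline would be far richer.
import numpy as np
from skimage import filters, measure, morphology

def single_cell_profiles(nuclei_image):
    """Return one feature dict per detected cell."""
    threshold = filters.threshold_otsu(nuclei_image)
    mask = morphology.remove_small_objects(nuclei_image > threshold, min_size=50)
    labels = measure.label(mask)
    profiles = []
    for region in measure.regionprops(labels, intensity_image=nuclei_image):
        profiles.append({
            "area": region.area,
            "eccentricity": region.eccentricity,
            "mean_intensity": region.mean_intensity,
        })
    return profiles

# Toy image: two bright blobs on a dark background.
img = np.zeros((128, 128))
img[20:40, 20:40] = 1.0
img[80:110, 60:90] = 0.8
print(len(single_cell_profiles(img)), "cells found")
```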
Automating Phenotypic Screening with AI
Imaging-based phenotypic screening of cell-based disease models has become an indispensable tool for modern drug discovery. These assays are used in primary and secondary screening and can even be used to identify novel drug targets. Despite the adoption of automated microscopy-based screening, typically referred to as high-content screening (HCS), analyzing and interpreting the complex imaging data produced by these systems remains a challenging bottleneck. Although acquiring images for each multi-well plate takes an hour or less, analyzing the imaging data can take weeks and typically requires hands-on programming by data scientists and computer vision experts. Advances in machine learning, specifically deep learning, have enabled the development of software platforms that can automate this process and provide valuable insights to scientists within hours of completing experiments.

Here we describe a cloud-enabled, end-to-end automated platform for storing, managing, analyzing, and visualizing HCS data using recent advancements in deep learning. The workflow for a typical screening experiment involves seamlessly uploading raw screening data from the HCS system to a cloud-based storage instance. From there, the data is automatically imported into a compute instance running an open-source image database and viewer developed primarily for microscopy data. The experimental metadata, including the assay plate layout, is imported simultaneously and is used to annotate all the wells in the screen.

Once the screening data is in the database, two deep learning-based workflows are launched to automatically analyze the screen. The first workflow clusters single-cell phenotypes in the screen and allows researchers to explore and annotate the data using an interactive scatterplot. The workflow uses weakly supervised learning, a recently developed method in which a deep convolutional neural network (CNN) is trained to classify which unique experimental condition (i.e. well) each single cell belongs to. After this network is trained, activations from intermediate layers of the network are used as a powerful lower-dimensional feature representation to cluster and visualize single cells in the screen. Researchers then use the interactive visualization tool to explore and annotate phenotypes of interest. After phenotypes and treatments of interest are identified, the second deep learning workflow is used to classify them using a segmentation-free approach. Here, a deep convolutional multiple instance learning model is trained to classify entire fields-of-view in the screen based on control treatments. This classifier is then used to score the rest of the treatments screened, typically identifying hits from a drug library.

This end-to-end system has been deployed on internal projects at Phenomic AI focused on assay optimization, including selecting informative immunofluorescent probes and cytokine or drug concentrations. It has also been used to identify functional phenotypic hits from small-scale drug and antibody screens, and to explore 3D tumor spheroids and complex cell-cell interactions with data provided by our industry partners.
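
The core trick in the first workflow (train a CNN to predict which well each cell crop came from, then reuse an intermediate layer as a phenotype embedding) can be sketched briefly. The PyTorch code below is a toy stand-in, not Phenomic AI's model; the architecture, crop size and class count are assumptions.

```python
# Weakly supervised phenotype embedding, sketched in PyTorch.
# Train a small CNN to classify which well (treatment) a cell crop came
# from; after training, the penultimate layer serves as the embedding.
import torch
import torch.nn as nn

N_WELLS = 384  # one class per experimental condition (well); assumed

class WellClassifier(nn.Module):
    def __init__(self, embed_dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim), nn.ReLU(),
        )
        self.head = nn.Linear(embed_dim, N_WELLS)

    def forward(self, x):
        return self.head(self.features(x))

    def embed(self, x):
        # Intermediate activations used as the phenotype representation.
        with torch.no_grad():
            return self.features(x)

model = WellClassifier()
crops = torch.randn(8, 1, 64, 64)          # batch of single-cell crops
wells = torch.randint(0, N_WELLS, (8,))    # weak labels: source well only
loss = nn.CrossEntropyLoss()(model(crops), wells)
loss.backward()                            # one training step (optimizer omitted)
print(model.embed(crops).shape)            # -> torch.Size([8, 64])
```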
Application of Machine Learning Methods To Improve Hit Calling Accuracy For High Content Image Based Screens
Formation of α-synuclein inclusions is linked to the pathogenesis of Parkinson’s disease. A high-content imaging screen was conducted to probe Biogen’s chemogenomic set for inhibitors of α-synuclein inclusion formation in an M17 neuroblastoma cell line stably expressing α-syn 3K-GFP. Confidence in hits was hindered by a high hit rate and apparent unwanted mechanisms of inclusion reduction by many compounds (e.g. cytotoxicity). Improved hit-calling accuracy is much needed in this high-information-content space. We generated a workflow to process the large number of complex image-based descriptors obtained from a high-content image-based screening campaign to identify inhibitors of α-synuclein inclusion formation. In this study, alongside the user-defined image analysis sequence in Columbus software, we applied a combination of unsupervised and supervised machine learning models to high-content image-based screening data from Biogen’s chemogenomic set of 3,070 compounds. We demonstrate the value of this approach in confirming hits relative to current hit-calling methods and in evaluating additional hits.
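
A generic version of this unsupervised-plus-supervised combination is sketched below: PCA compresses the per-well image descriptors and a random forest trained on plate controls scores the screened compounds. This illustrates the class of approach, not the authors' workflow; all data and thresholds are synthetic.

```python
# Generic unsupervised + supervised combination for image-based hit calling.
# PCA compresses the descriptors; a random forest trained on controls
# scores the screened compounds. Synthetic data throughout.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_features = 500                                  # image-based descriptors per well
ctrl_X = rng.normal(size=(200, n_features))       # neutral + tool-compound controls
ctrl_y = np.array([0] * 100 + [1] * 100)          # 0 = neutral, 1 = inclusion inhibited
ctrl_X[ctrl_y == 1, :20] += 1.5                   # toy phenotype signal

clf = make_pipeline(PCA(n_components=20), RandomForestClassifier(random_state=0))
clf.fit(ctrl_X, ctrl_y)

compound_X = rng.normal(size=(3070, n_features))  # screened chemogenomic set
scores = clf.predict_proba(compound_X)[:, 1]      # probability of active phenotype
hits = np.flatnonzero(scores > 0.8)
print(len(hits), "candidate hits")
```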
AI in Chemistry
Label Free Induced Pluripotent Stem Cell Counting With Deep Learning For Automation
Artificial intelligence (AI) and deep learning can build smarter and faster automated systems while reducing the number of steps required in the process. We wanted to evaluate how well an AI could predict cell numbers of human induced pluripotent stem cells (iPSCs) from phase-contrast or bright-field images for the purposes of automating iPSC culture maintenance. The shift to an automated process can result in several advantages over traditional tissue culture techniques. The current state of the art in stem cell culture is 100% manual labor and is rife with operator-to-operator variation, limiting experimental scale, complexity, and precision. We have designed a fully automated method that obviates the need to conduct cell counts, reduces the number of tasks needed to passage iPSCs, increases iPSC culture consistency, and enables portability of protocols from one research site to another. AI has gained the most traction in image analysis and pattern recognition. Based on the supposition that iPSCs look like heads in a crowd, we adapted an image-based deep learning neural network successful in crowd counting. The deep learning was conducted using a training dataset that consisted of paired bright-field and Hoechst-stained images, where ground truth was determined by nuclear object detection. The neural network transforms an image into a topological density map, added as an image channel, where the total map density is equal to the total cell count across the image. This relationship of cell number to density holds true from the entire dish down to a small user-defined region of the dish. Once trained, the AI enables spatial representation of cells across the dish, across the very wide range of cell densities encountered with iPSC cultures, without the need for cell labeling. The image-based AI algorithm can determine a precise number of iPSCs within the dish with an average percent error of 5.6%. We can visually observe model training across epochs within the neural network while it is learning to create the correct density maps from the phase-contrast images. It is capable of ignoring well edges, dust particulates, and intensity effects due to the meniscus, and is oftentimes superior to fluorescence-labeled object detection. It enables us to measure cell density more rapidly and ‘in line’ with the automation, greatly reducing the time and number of steps required for automated iPSC culturing. We are now engineering a working proof-of-concept prototype capable of making a go/no-go decision for stem cell passaging and split ratios based on bright-field image captures.
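
The counting principle is easy to demonstrate: because each cell contributes unit mass to the predicted density map, the count over any region is just the sum of the map over that region. The sketch below builds a synthetic map (in the talk, a CNN regresses it from bright-field images) and reads off total and regional counts.

```python
# Counting from a density map: the total (or any regional) cell count is
# the sum of the density over the corresponding pixels. The map here is
# synthetic; in the talk it is regressed by a CNN from bright-field
# images, trained against Hoechst-derived ground truth.
import numpy as np

def gaussian_blob(shape, center, sigma=3.0):
    """Unit-mass Gaussian: one 'cell' contributes exactly 1 to the map."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    g = np.exp(-((yy - center[0]) ** 2 + (xx - center[1]) ** 2) / (2 * sigma**2))
    return g / g.sum()

density = np.zeros((256, 256))
for c in [(40, 50), (120, 200), (200, 90)]:       # three cells
    density += gaussian_blob(density.shape, c)

print(round(density.sum(), 2))                    # total count -> 3.0
print(round(density[:128, :].sum(), 2))           # top-half count -> ~2.0
```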
AI Assistance for Chemical Synthesis
Creating and Deploying Self Driving Laboratories For Process Optimization
Optimizing multicomponent reactions is a challenge which lends itself well to automation due to the large number of experiments usually required. In addition to classic high-throughput and Design of Experiments methodologies, the application of machine learning algorithms has recently come into focus; coupled with a robotic platform and on-line analytical tools, the ideal of a fully autonomous robotic researcher comes within reach. While historic examples of this concept have often been tailored to answer rather narrow questions, we herein present a powerful and versatile platform for interrogating a wide range of reaction systems. Using an N9 robot by North Robotics as our central platform, we have developed a range of tools enabling us to formulate reactions, take samples, and perform on-line HPLC and MS/MS analysis, all under full automation. Results are passed on to ChemOS, a machine learning engine, which evaluates the collected data and provides a new set of experimental conditions, generating a closed feedback loop that enables rapid optimization of complex reaction systems. To showcase the power of our platform, we have deployed several variations of this autonomous toolkit to tackle a diverse range of chemical applications. In a first demonstration, these include optimization of reaction parameters for catalytic transformations and discovery of champion thin-film materials for organic photovoltaic devices. Current progress and development will be discussed.
Development of MicroCycle 1.0: A New Tool for Drug Discovery
Drug discovery requires developing a holistic understanding of the interaction between chemical and biological space. In the early phase, it is critical to develop this knowledge in a rapid and unbiased way, while benefiting from 150 years of drug hunting experience. We envisaged an iterative approach for project teams to navigate early drug discovery. Each iteration begins with a digital-human collaboration to select chemical starting points (scaffolds and building blocks), leading to synthesis on a biologically relevant scale, followed by purification and characterization of these compounds, and testing in a range of integrated, automated and relevant assays. The process concludes with a machine learning-based multi-parameter analysis of the data in order to select the next round of compounds. The first-generation platform, MicroCycle 1.0, has been developed and delivered in our Basel and Cambridge labs. We look forward to sharing the story of platform development along with some key early results.
Lightning Session
Rapid Automated Computation, Coupling, Cleavage, Chromatography Execution (RAC4E) Platform Driving Drug Discovery
Synthetic peptide therapeutics have played a notable role in medical advances over the past two decades, with a more recent focus on diagnostics and personalized therapeutics. However, the field has struggled with a slow pace of discovery, optimization, and manufacturing due to limitations in the manufacturing processes: coupling, cleavage, and chromatography. Advances in laboratory automation have helped improve the productivity of expert scientists; however, these technologies have not been able to meet the demand for custom peptide sequences to enable rapid, cost-effective drug discovery, especially when compared to DNA synthesis. Inefficient synthesis techniques often take weeks, if not months, to produce the target molecule, becoming a major bottleneck in the drug discovery pipeline. Given that peptides are assembled from one reaction and a discrete number of building blocks, machine learning (ML) has the ability to drastically increase the speed of manufacturing by enabling sequence-specific optimization, significantly reducing the overall drug discovery time. Mytide’s Rapid Automated Computation, Coupling, Cleavage, and Chromatography Execution (RAC4E) platform takes a holistic approach to peptide manufacturing, starting with the use of ML models to inform sequence-specific strategies while closing the loop with data collection at every process step, which feeds back into future synthesis plans. Our solid-phase slug flow (SPSF) technology offers the opportunity to combine real-time process analytics with ML models to provide a tailored coupling process for each and every individual sequence produced. Integrating optimization strategies into RAC4E serves as a foundation for efficient library development, leading the way for direct synthesis to bioassays and driving innovation with our collaborators, partners and internal drug discovery pipelines. Authors: Dale Thomas, PhD; Justin Lummiss, PhD; Kevin Shi, PhD; Chase Olle, MEngM
Employing Supervised Learning Methods To Optimize pH Targeting In Chemical Solutions
Chemical solutions are ubiquitous in research, development and production within the fields of chemistry, biology, pharmaceuticals and many others. The pH of a solution is an essential property to control for a large number of experimental use cases, but that control should not come at any cost: ionic strength and other experimentally relevant criteria also need to be controlled.

We present LabMinds' Solution Recipe Platform (SolReP), which was developed using performance data gathered from LabMinds' Revo (SLAS New Product Award winner, 2016). The aim of the Platform's continual development is to optimize solution compositions in order to reliably control pH in the presence of varied solution compositions. We can extrapolate control of pH outside of training regions of parameter space, including untrained combinations of chemicals. Our product improves solution composition, bringing a robustness and accuracy that is often lacking; with that comes a growing preference for our solution recipes and, in turn, more data on which to further improve those recipes: a virtuous cycle for all. We have targeted a core set of buffer systems with pharmaceutically relevant use cases (acetate, citrate, histidine, phosphate, Tris) across a broad range of concentrations and pHs. Uptake of these solution recipes among our customers has been strong, replacing their in-house solution SOPs where regulations allow. We target an area of pH and concentration space that is larger than simple, popular calculators are suited to, and develop our platform using a growing data set that is much larger than an individual scientist can rely on when developing solution compositions in an ad hoc manner.

Solution recipe compositions have historically been developed either with simple theoretical calculators, commonly with a small region of applicability and little empirical input, or through trial-and-error development at individual points of interest. Our Platform extends the complexity and validity of the calculator approach and introduces a suite of supervised learning methodologies to ensure that continual performance data improves the power of its predictions. The pH of a solution can be predicted by mechanistic, semi-mechanistic, empirical and model-free approaches. Many of the mechanistic and semi-mechanistic models are a century old, and the published parameters for the few chemical species available are of variable quality. Using supervised learning, we can train these models, empirical models and model-free methodologies to improve their predictive power. Characterizing the identifiability of parameters is necessary both to assess the structural suitability of the models and to aid with experimental design. Parameter identifiability for the mechanistic and semi-mechanistic models can be achieved only by training in regions of pH, ionic-strength and temperature space where the various parameters dominate the dynamics. Unidentifiable parameters do not mean that a model has no predictive power, but they reinforce the importance of characterizing the sensitivity of its predictions to those parameters, which we achieved by simulating experiments sampling the posterior distribution. LabMinds' SolReP provides a significant step toward a data-centric approach to laboratory work.
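
To illustrate the semi-mechanistic idea, the sketch below fits a Henderson-Hasselbalch core plus an empirical ionic-strength correction to measured (composition, pH) pairs with scipy. The correction form, coefficient and toy data are invented for the example; LabMinds' actual models are richer.

```python
# Semi-mechanistic pH model: Henderson-Hasselbalch core plus a fitted
# ionic-strength correction, trained on measured (composition, pH) pairs.
# A minimal sketch of the general idea; toy data, not LabMinds' model.
import numpy as np
from scipy.optimize import curve_fit

def ph_model(X, pKa, k_ionic):
    base, acid, ionic_strength = X
    # Mechanistic term (Henderson-Hasselbalch) + empirical correction.
    return pKa + np.log10(base / acid) - k_ionic * np.sqrt(ionic_strength)

# Toy training data: acetate buffer compositions and measured pH values.
base = np.array([0.05, 0.10, 0.10, 0.20, 0.05])   # M acetate
acid = np.array([0.10, 0.10, 0.05, 0.10, 0.05])   # M acetic acid
I    = np.array([0.05, 0.10, 0.10, 0.20, 0.05])   # ionic strength, M
ph   = np.array([4.39, 4.68, 4.98, 4.90, 4.65])   # "measured" pH

params, _ = curve_fit(ph_model, (base, acid, I), ph, p0=[4.76, 0.5])
print("fitted pKa=%.2f, ionic coefficient=%.2f" % tuple(params))
```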
Data-driven Prediction of Battery Cycle Life Before Capacity Degradation
Accurately predicting the lifetime of complex, nonlinear systems such as lithium-ion batteries is critical for accelerating technology development. However, diverse aging mechanisms, significant device variability and dynamic operating conditions have remained major challenges. We generate a comprehensive dataset consisting of 124 commercial lithium iron phosphate/graphite cells cycled under fast-charging conditions, with widely varying cycle lives ranging from 150 to 2,300 cycles. Using discharge voltage curves from early cycles yet to exhibit capacity degradation, we apply machine-learning tools to both predict and classify cells by cycle life. Our best models achieve 9.1% test error for quantitatively predicting cycle life using the first 100 cycles (exhibiting a median increase of 0.2% from initial capacity) and 4.9% test error using the first 5 cycles for classifying cycle life into two groups. This work highlights the promise of combining deliberate data generation with data-driven modelling to predict the behavior of complex dynamical systems.
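
In the spirit of the abstract, the sketch below trains a regularized linear model on a handful of early-cycle features to predict log cycle life, then reports percent error on held-out cells. The feature set and data are synthetic placeholders, not the talk's dataset.

```python
# Early-prediction sketch: a regularized linear model on features from
# early-cycle discharge curves predicts log10(cycle life). The features
# (e.g. variance of the cycle-100-minus-cycle-10 discharge curve, early
# fade slope, initial resistance) and data are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_cells = 124                                   # matches the dataset size

X = rng.normal(size=(n_cells, 3))               # placeholder early-cycle features
true_w = np.array([-0.35, -0.10, -0.05])
log_life = 3.0 + X @ true_w + rng.normal(scale=0.05, size=n_cells)

X_tr, X_te, y_tr, y_te = train_test_split(X, log_life, random_state=0)
model = ElasticNetCV(cv=5).fit(X_tr, y_tr)

pred = 10 ** model.predict(X_te)                # back to cycles
actual = 10 ** y_te
err = np.mean(np.abs(pred - actual) / actual)
print("mean test error: %.1f%%" % (100 * err))
```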