2022 AI Data Pipelines for Life Sciences Symposium

This two-day symposium will allow participants to explore how AI data pipelines are integrated into the life sciences. Attendees will learn about MLOps, applications, techniques, and architectures of data and their uses in the life sciences.

Topics include:

  • Explore the tools and techniques to collect and prepare data sets to train a ML model.
  • Discuss methods for curating pre-existing data sets for use in machine learning model training.
  • Discuss the role of AutoML in the future of training machine learning models.
  • Discuss current tools and techniques used for high-throughput screening – how can automated lab equipment improve the processes involved in drug discovery and manufacturing?
  • Explore how using ML/AI connections in the cloud and the internet of things can improve life science manufacturing.



You built a machine learning model that gives great predictions, but now what? How do you deploy it so that it can be used by others in your company? How do you update your model as new data becomes available? How do you provide traceability as to which model (and data) were used to make which predictions that led to decisions? Machine Learning Operations, or MLOps, focuses on these topics and more, helping you maintain the product of your machine learning data pipeline.

Navigating Data Substrates

Data ingestion and preparation enable downstream learning, ranging from statistical analysis to ML/AI. This session explores available tools and systems used to aggregate, store, and search for scientific datasets. Attendees can expect to learn how to "prime the pump" to accept high-quality, near-real-time data to support a wide array of use cases and desired experimental outcomes.


The internet of things (IoT) and AI offer substantial value for research and development, drug discovery, and life sciences manufacturing. This session will discuss the application of IoT/AI tools and techniques in these topics and provide specific attention to efforts in wet lab automation and high-throughput screening. Attendees can expect to learn about how these applications intend to increase efficiency in routine and novel laboratory functions, supply greater control over equipment operation, and pave the way for more real-time analysis and data exploration.


The first computer neural network was created in the 1950s, but a lot has happened since then. This session will focus on both machine learning basics and the latest developments in machine learning and AI techniques. Advanced topics discussed may include AutoML, active learning, reinforcement learning, and data visualization.


This session will focus on systems architectures for AI-driven laboratories, especially cloud labs. Representative topics include: automated sample tracking and data capture; data management, scalability, and security; programming languages and user interfaces enabling remote execution and user training; semantic layers; and AI/ML-based analysis and control.

Matt Rasmussen

VP Software Engineering


Matt Rasmussen, Ph.D., serves as the vice president of software engineering at Insitro, a machine learning-driven drug discovery and development company, where his team develops data pipelines and infrastructure for machine learning and high-throughput biology to transform the way medicines are created. Previously, as the vice president of engineering for Myriad Genetics, Rasmussen led engineering teams focused on software automation and genomic data pipelines to make high-complexity genetic testing routine in clinical practice. During his time at Counsyl, he developed and scaled the software behind several successful prenatal genetic testing products. In 2010, Rasmussen earned his Ph.D. in computer science from the Massachusetts Institute of Technology (MIT), where he developed efficient bioinformatic algorithms with applications in evolutionary genomics and population genetics.

Naim Matasci

Director of Bioinformatics and Computational Biology

Lawrence J. Ellison Institute for Transformative Medicine

Naim Matasci leads the Ellison Institute’s computational lab as the Director of Bioinformatics and Computational Biology. In his role, he supports the Institute's researchers by providing analytical guidance and expertise across the entirety of the project life cycle. Dr. Matasci earned his MSc in Molecular Biology from the University of Zurich, Switzerland. Later, Dr. Matasci joined the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, to study the evolution of human protein expression for his PhD. He then joined the iPlant Collaborative team at the University of Arizona, now CyVerse, where he helped design and develop a data-centric computational infrastructure for the Life Sciences. His current research areas are digital pathology and genomics, in particular for cancer diagnosis.

Kevin C. Cassidy

Application Scientist

Dassault Systèmes, BIOVIA

Kevin Cassidy, Ph.D., has a scientific focus in computational biophysics, biology, and chemistry. He is a part of the biosciences technical sales group at Dassault Systèmes BIOVIA and supports several solutions within Small Molecule Therapeutics Design, including Generative Therapeutics Design, Discovery Studio Simulation, and Insight.

Ilya Goldberg

Chief Science Officer


Dr. Ilya Goldberg has spent the majority of his career at the intersection of biology and imaging, and played a leading role in the development of image informatics and machine learning for bio-medical imaging since the emergence of these fields in the late '90s. At ViQi, Ilya leads development of high-throughput imaging assays using AIs. Prior to ViQi, Ilya co-founded a company that developed the first medical device to receive regulatory clearance that uses an AI to help diagnose lung cancer in CT screening exams. Prior to this, Ilya led a research group at the NIH National Institute on Aging, where his group developed machine learning software for image processing in biology and medicine, and studied the molecular mechanisms of aging in humans and model organisms. As a postdoc at MIT, Ilya co-founded the OME project, which continues to be used for imaging infrastructure in large image repositories. Ilya has over 60 peer-reviewed scientific articles from his years at Johns Hopkins, Harvard, MIT, and NIH in molecular and cell biology, pattern recognition, image informatics and the basic biology of aging.

Sree Vadlamudi

Vice President, Head of Business Development EU, RoW


As Vice President, Head of Business Development EU and RoW, Iktos AI, Sree leads strategic business development outreach, alliance management, and marketing & communication activities. Sree has a successful track record in Strategic Business Development, Technology Commercialisation and Alliance Management positions with increasing responsibility in the bio-pharma industry. Sree was instrumental in securing and managing partnerships with biotechs and pharma as well as maximising the value of AI-based technology platforms and the growth of the business. Sree also held senior research management positions at biotechs and virtual biotechs. Sree is a named author of multiple patents and publications and has delivered numerous invited talks at international conferences. Sree holds a PhD in Medicinal Chemistry and an MBA with merit from Lancaster Business School.

Berke Buyukkucak

CEO & Founder


Berke Buyukkucak is the CEO and Co-founder at Superbio.ai. He received BSc & MSc degrees in Biomedical Engineering from Brown University. He was part of a research group that won the Healthcare Design Competition at Johns Hopkins University. He is passionate about democratizing machine learning for life sciences and decreasing the cost of adoption.

Beth Cimini, Ph.D.

Group Leader, Image Analysis, Imaging Platform

Broad Institute

Beth Cimini leads the image assay development team within the Imaging Platform of the Broad Institute of MIT and Harvard. Her team works with biologists to help them create image analysis workflows and makes the open-source image analysis software CellProfiler.

Cimini joined Anne Carpenter’s lab at the Broad Institute in June 2016 after completing her doctoral studies in biochemistry and molecular biology at the University of California-San Francisco in the Blackburn lab. She also holds a B.A. in biochemistry and molecular biology from Boston University.

Ian M. Kerman

Director of Customer Success


Ian Kerman studied bioinformatics and molecular biology at the University of California, San Diego. Soon after starting as a research associate at a biotech startup, Ian began applying machine learning techniques to his company’s screening and assay data. Ian later joined a life science-focused data science company, helping laboratory scientists process, analyze, and extract insights from their data. More recently, Ian studied machine learning at the Georgia Institute of Technology and is the Director of Customer Success at LabVoice, an AI-powered digital assistant company for scientists. When he isn’t helping scientists analyze and automate their processes, he spends time with his husky-pug or SCUBA diving with sharks.

Steven van Helden, Ph.D

Pivot Park Screening Centre

Steven van Helden studied chemistry at Utrecht University and, after obtaining his Ph.D., worked in various roles in the pharmaceutical industry for 20 years. From 2003 he was responsible for High Throughput Screening (HTS) operations and strategy at Organon/MSD. After the closure of those research facilities, he developed a business plan for continuation of the screening activities in Oss, The Netherlands. This led to the formation of the Pivot Park Screening Centre (PPSC) and a central role for this company in the European Lead Factory. Steven is now Chief Technology Officer at PPSC.

Jie Li

Graduate Student Researcher

University of California, Berkeley

Jie (Jerry) Li is a fifth-year PhD student in the Teresa Head-Gordon Lab at the University of California, Berkeley. During his PhD he has developed several machine learning models that encompass multiple fields of theoretical and computational chemistry, including predicting NMR chemical shifts of crystalline small molecules and aqueous proteins, developing equivariant message passing neural networks for molecular energy and force prediction, and designing workflows that combine generative models with reinforcement learning to propose small molecule inhibitors that have strong interactions with a given protein. He is broadly interested in applying cutting-edge AI technology to solve chemical biology questions that are difficult to tackle.

Rupert R. Dodkins

Machine Learning and Image Analysis Scientist

ViQi, Inc

During his DPhil at the University of Oxford and postdoc at the University of California, Dr. Dodkins applied machine learning to exoplanet imaging with single-photon cryogenic detectors. At ViQi, Inc, he develops machine learning algorithms for high-content virology assays. Dr. Dodkins jointly owns the Guinness World Record for Most Skateboard Heelflips in One Minute.

Paul Jensen

Assistant Professor

University of Michigan

Paul Jensen is an assistant professor of biomedical engineering at the University of Michigan. Paul earned bachelor’s degrees in chemical and biomedical engineering from the University of Minnesota and a Ph.D. in biomedical engineering from the University of Virginia. His research group studies the oral microbiome using artificial intelligence, laboratory automation, and high-throughput genomics.

Toby Blackburn, MBA

Head of Business Development and Strategy

Emerald Cloud Lab

Toby Blackburn serves as the Head of BD and Strategy at Emerald Cloud Lab (ECL), a physical laboratory which scientists can access remotely via the internet that allows them to run, analyze, and interpret experiments without setting foot in the lab. He holds an MBA from Duke University’s Fuqua School of Business, and a B.S. in Chemical Engineering from North Carolina State University.

Leah McGuire

Machine Learning Engineer


Leah McGuire is the tech lead for the Automation and Analytics teams at Benchling, working on integrating with laboratory equipment and providing tools to analyze the results of scientific experiments. Before joining Benchling, Leah was a Machine Learning Architect at Salesforce, building AutoML capabilities for Salesforce Einstein. She got her start in data science at LinkedIn, after completing a PhD and a Postdoctoral Fellowship in Computational Neuroscience at the University of California, San Francisco, and at the University of California, Berkeley, where she studied the neural encoding and integration of sensory signals.

Mike Tarselli

Chief Scientific & Knowledge Officer


Mike Tarselli, Ph.D., MBA is the Chief Scientific Officer for TetraScience, a Boston-based start-up building the Scientific Data Cloud. He has held scientific and leadership roles at SLAS, Novartis, Millennium, ARIAD, and Biomedisyn. Mike has received awards and fellowships from IUPAC, Wikipedia, ACS, NSF, and the Burroughs-Wellcome Trust. He volunteers in roles promoting scientific education and diversity, including the National Science Foundation, the Pistoia Alliance, the NIH Assay Guidance Manual, and the UMass College of Natural Sciences Advisory Board.


Orchestrating lab, data pipelines, and ML to automate pooled optical CRISPR screening at scale
At insitro, we are developing a lab and software platform for cell-based disease modeling and drug discovery at scale. Within the field, pooled CRISPR screening has emerged as a powerful method of uncovering entire gene networks and modulators of critical biomarkers. Previous methods of CRISPR screening were either limited to fitness and FACS-sortable phenotypes or were cost-prohibitive at higher scales, such as perturb-seq. More recently developed screening methods, such as Pooled Optical Screening in Human (POSH) (Feldman et al., 2019), allow perturbagens (gene targeting gRNAs) to be amplified and directly measured via in situ sequencing while maintaining cellular structure and spatial features, thus enabling a wide range of potential imaging assays (e.g. cell migration, calcium signaling, CellPaint, protein aggregation, multicellular/cell-cell interaction, etc). Building on this research, we have developed an automated platform to enable pooled optical screening at industrial capacity. The platform includes multiple automated workcells to perform POSH in situ sequencing by synthesis, a custom Python-based driver for complex microscope acquisition protocols, and automated sync to cloud storage. Once in the cloud, we perform highly parallelized multistage image processing with rich data provenance recording, as well as self-supervised ML model training (e.g. DINO) for phenotype feature extraction. In this talk, we discuss our initial results validating this platform using a druggable-genome scale screen for phospho-S6 (pS6) expression in A549 cells, which recovers known genes in the mTOR pathway. We also share specific lessons learned from developing complex lab data generation systems. In particular, we find that expressive data workflows and automated data lineage recording are critical for managing rapidly developing systems with multiple data modalities.
These insights led to a new data science framework, redun, which we’ve recently open sourced (https://github.com/insitro/redun). Overall, this work highlights the importance of coordinated design of lab automation and data infrastructure in order to generate datasets optimized for the latest AI/ML techniques.
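To make the idea of automated data lineage recording concrete, here is a minimal Python sketch. It is not redun's API and not insitro's implementation; the names LINEAGE, fingerprint, tracked, normalize, and mean_feature are hypothetical, and the two toy steps merely stand in for image-processing and feature-extraction stages.

```python
import functools
import hashlib
import json

# Hypothetical in-memory lineage log; a real system would persist this.
LINEAGE = []


def fingerprint(value):
    """Return a short, stable hash of any JSON-serializable value."""
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:12]


def tracked(step):
    """Wrap a pipeline step so that every call appends a lineage record."""
    @functools.wraps(step)
    def wrapper(*args):
        result = step(*args)
        LINEAGE.append({
            "step": step.__name__,
            "inputs": [fingerprint(a) for a in args],
            "output": fingerprint(result),
        })
        return result
    return wrapper


@tracked
def normalize(intensities):
    # Toy stand-in for an image-processing stage.
    peak = max(intensities)
    return [x / peak for x in intensities]


@tracked
def mean_feature(values):
    # Toy stand-in for a feature-extraction stage.
    return sum(values) / len(values)


raw = [2.0, 4.0, 8.0]
feature = mean_feature(normalize(raw))

# Each record links a step to hashes of its inputs and its output, so the
# output hash of one step matches the input hash of the next.
for record in LINEAGE:
    print(record)
```

Because every record stores content hashes rather than file names, a downstream value can be traced back through the chain of steps that produced it, even if intermediate data are moved or renamed.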
Protein design using deep learning
Proteins mediate the critical processes of life and beautifully solve the challenges faced during the evolution of modern organisms. Our goal is to design a new generation of proteins that address current-day problems not faced during evolution. In contrast to traditional protein engineering efforts, which have focused on modifying naturally occurring proteins, we design new proteins from scratch to optimally solve the problem at hand. We now use two approaches. First, guided by Anfinsen’s principle that proteins fold to their global free energy minimum, we use the physically based Rosetta method to compute sequences for which the desired target structure has the lowest energy. Second, we use deep learning methods to design sequences predicted to fold to the desired structures. In both cases, following the computation of amino acid sequences predicted to fold into proteins with new structures and functions, we produce synthetic genes encoding these sequences, and characterize them experimentally. In this talk, I will describe recent advances in protein design using both approaches.
An interactive 5D image data management and analysis system to support AI applications
Microscopy imaging has become an essential component of drug discovery and testing in cancer research. The introduction of more advanced biomimetic models such as organoids provides novel challenges across the entirety of the workflow, from sample preparation and processing to data management and analysis. On the analysis side, it is especially important to adopt systems that will ensure that the collected images and derived data are stored and structured in a way that makes them suitable for machine learning and artificial intelligence applications. At the same time, the system will need to maintain the ability for researchers to review these images and provide analytical and technical insights. We have built and deployed an image management system that bridges and connects disparate instruments across sites and makes images and the associated metadata available to both researchers and downstream cloud-based advanced analytics in a secure and organized fashion.
Driving Efficiencies with AI and Machine Learning in R&D Labs
Science-driven pharmaceutical and biotech companies are increasingly recognizing the value of AI and Machine Learning to better leverage existing knowledge resources, to accelerate R&D, and to improve operations. Traditionally, computational scientists and data scientists were the primary users of these tools in this space. Only recently have research organizations been making these tools available directly to end users (“democratization”), greatly enhancing the value of these tools to the organization. Combining domain expertise with AI (the “human-in-the-loop” approach) promises to deliver much improved results as opposed to the situation in which computational experts are the AI gatekeepers. Generative Therapeutics Design (GTD) is a recently developed solution which offers domain experts access to these powerful tools. In a customer case study using a SAR data set pertaining to the discovery of SYK inhibitors, we show how several common types of problems in medicinal chemistry can be effectively addressed via GTD. Although this is a cloud-based solution, computational experts can incorporate in-house or third-party algorithms to augment the solution. Insights gleaned from advanced models and analytics are fueling discoveries, automating complex tasks, and transforming an array of industries.
AutoHCS: Automated AI-based scoring of dose-response high-content screens
AutoHCS™ is an AI-based system developed by ViQi Inc. that automatically detects and scores dose-response in high-content compound screens. This system is cloud-based so there is no software or specialized computing hardware to install locally. The inputs to the analysis are limited to images from automated plate imagers using one or more brightfield or fluorescence channels, and a plate map specifying the locations of negative controls, one or more positive controls, and each compound, concentration, and replicate. These inputs are sufficient to produce a report consisting of several violin plots for each compound, scoring the cellular response to compound concentration as a phenotype similarity to each of the positive controls. In addition, each compound is scored for a dose-dependent phenotypic change without regard to controls, to allow discovery of novel phenotypes. Each compound concentration is also compared to the set of positive and negative controls processed together in a multi-way classifier and presented as a set of dendrograms. Finally, the phenotypes induced by the highest concentrations of all compounds are compared to each other in a dendrogram to visualize any phenotypic clusters. This system does not depend on accurately segmenting cells, but on consistent patterns in cellular response correlated with control phenotypes and compound concentrations. Eliminating the dependence on accurate segmentation allows this system to work equally robustly and nonparametrically on any fluorescence or brightfield channel without loss of sensitivity. The automation of searching for appropriate AI training parameters eliminates any additional criteria or assumptions for interpretation, making the analysis free of subjectively chosen imaging or AI training parameters.
This system uses the robust pattern recognition abilities of modern AIs to enable scoring of high-content screens in an entirely automated, objective manner.
Navigating Data Substrates
Synergistic Drug Design Using AI and Automation
Mobilizing Machine Learning Research Community for Life Sciences
The open source and research community had an incredible impact on fields like natural language processing, computer vision, and augmented reality. Why not for life sciences? What are the challenges and opportunities in mobilizing the machine learning community for life sciences research?
The Cell Painting Gallery - Resources and Lessons Learned
As high-content, high-throughput microscopy becomes more common, biologists now must figure out how and where to store their imaging data. Likewise, as ML experts learn to extract meaningful biological data from these images, where to find them and how to interpret them becomes an increasing challenge. We present here two image repositories we have been involved in creating, the Broad Bioimage Benchmark Collection and the Cell Painting Gallery, as well as lessons learned in creating them.
Zip Codes, the NYC MTA, and COVID-19
Expanding HTS hitlist – An in silico data-mining tool to report on qualitative gene-compound relationships of HTS hits and their corresponding Nearest Neighbours
High Throughput Screening (HTS) is an important tool for finding starting points for the development of new medicines for a wide variety of diseases. HTS assesses large libraries of small molecules with drug-like properties of unknown biological function. When hits are found, the selection of the most promising compounds for follow-up testing is often based on limited available biological information. To improve selection of the best compounds, we have developed an in silico tool to assess the literature for similar compounds for which biological properties have been reported. Combining this information with actual test results and a chemical assessment of the compounds facilitates hit optimization and hit-to-lead efforts. The AI tool creates a reference set of ChEMBL compounds related to the HTS hits (Tanimoto > 0.6) to perform a literature keyword-based search in a knowledge base with > 200 million relations between drugs, targets and diseases. The tool creates a configurable dashboard-type report. The current version shows two visualizations: a heatmap and a network graph displaying literature matches with genes, diseases, and pathways. The heatmap shows the strength of these matches whereas the network graph provides more insight into the clustering of the relationships. Concise tables show relevant data from literature like assay information and literature ranking. In this presentation we will present our fully automated ultra High Throughput Screening workflow and show examples of how the new in silico AI tool helps to add value to hit lists.
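The Tanimoto cutoff mentioned above can be illustrated with a short Python sketch. This is not the presenters' tool: the fingerprints are toy sets of integer "on" bits, and the names tanimoto, nearest_neighbours, hit_bits, and reference_set are hypothetical. In practice the fingerprints would come from a cheminformatics toolkit.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of fingerprint 'on' bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nearest_neighbours(hit, reference, threshold=0.6):
    """Names of reference compounds whose similarity to the hit exceeds the threshold."""
    return sorted(
        name for name, bits in reference.items()
        if tanimoto(hit, bits) > threshold
    )


# Toy fingerprints (assumed data, for illustration only).
hit_bits = {1, 2, 3, 4, 5}
reference_set = {
    "ref_A": {1, 2, 3, 4, 6},  # 4 shared bits out of 6 distinct bits
    "ref_B": {1, 2, 7, 8, 9},  # 2 shared bits out of 8 distinct bits
    "ref_C": {1, 2, 3, 4, 5},  # identical to the hit
}

neighbours = nearest_neighbours(hit_bits, reference_set)
print(neighbours)
```

Only the compounds that clear the 0.6 cutoff would be carried forward into the literature search; a lower threshold widens the reference set at the cost of less closely related neighbours.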
Reinforcement Learning with Real-time Docking of 3D Structures to Cover Chemical Space
We propose a novel framework that generates new inhibitor molecules for target proteins by combining deep reinforcement learning (RL) with real-time molecular docking on 3-dimensional structures. We illustrate the inhibitor mining (iMiner) approach on the main protease (MPro) protein of SARS-CoV-2, with results further validated via consensus of two different docking programs, as well as for drug-likeness and ease of synthesis. Our proposed molecules exhibit an optimized, precise, and energetically consistent fit within the catalytic binding pocket, illustrating the effectiveness from a theoretical standpoint. Moreover, our approach is expected to work for other protein targets, and the similarity of the generated molecules compared to a given starting structure can be tuned to allow optimizing hit structures found by experiments.
A Cloud-based Rapid and Scalable Viral Infectivity Assay for Vaccine and Antiviral Screening
Viral infectivity assays are an essential step for automated screening of viral vaccines and antiviral drugs. However, the incubation periods of plaque and TCID50 assays can be as long as 14 days, and alternatives like the fluorescent focus assay (FFA) require antibodies and extensive sample preparation or GFP-labeled viruses. Further, automated image analysis tools for interpreting FFA require manual parameter selection, which can make the assay subjective. In response, ViQi, Inc. has developed AVIA™ (Automated Viral Infectivity Assay). This assay uses machine learning on brightfield images to detect signs of viral infection. It does so by identifying subtle phenotypic changes within cells long before these changes are detectable by manual inspection. Infection phenotypes can be identified within a few hours of exposure to the virus, and can be done on live cells without any sample preparation or fluorescence imaging. The output of this assay is an infectivity measurement similar to a multiplicity of infection (MOI). The assay does not require any parameter tuning by the user, ensuring objectivity and ease of use. The ViQi cloud automatically completes a one-time AI training from a single 96-well training plate containing cells infected with virus dilutions. Once trained, this AI can then process assay plates containing multiple dilutions and replicates of the infection conditions being tested. The assay reports a quantitative result for each well with an infection rate within the linear range of the assay. Consequently, the number of samples per plate is typically 8-16 times greater than a typical TCID50 assay plate layout. Initial training reports are typically returned within a day, and assay reports are emailed back in under an hour. Thus far, our machine learning models have been successfully trained on ten viruses including DNA, RNA, enveloped, and non-enveloped virus types.
This includes viruses that do not reliably have manually observable cytopathic effects, such as HIV and Adenovirus. AVIA is deployed on ViQi, a cloud-based analysis platform with integrated workflow management, input and output traceability, and a suite of data visualization tools. Together, this system provides researchers with a scalable and reproducible analytic tool for measuring viral infectivity in automated screens.
Learning to solve biological puzzles with automated experiments
Artificial intelligence (AI) has stunned the world of competitive gaming, besting human experts in chess, Go, StarCraft, and other complex, multi-player games. All of the superhuman game-playing systems rely on reinforcement learning, a branch of AI where agents learn through practice without any prior knowledge. To capitalize on these developments, our team has “gamified” science by casting biological research questions as combinatorial puzzles. These puzzles can then be solved by reinforcement learning using the techniques invented for board games or video games. The downside of reinforcement learning is that agents must be allowed to “play” and learn through trial and error. Playing biology requires designing, executing, and learning from wet-lab experiments. We developed an automated system that performs combinatorial experiments on demand. Our system, called Deep Phenotyping, executes and analyzes up to 10,000 independent experiments per day. To solve biological puzzles, we gave control of the Deep Phenotyping system to a reinforcement learning agent. Each morning, the agent analyzes the incoming results, updates its strategy, and plans a new batch of experiments. The experiments are executed in the afternoon and incubated overnight. Combining AI and Deep Phenotyping creates a closed-loop system where agents can learn without prior knowledge or human input. We challenged our AI system with a combinatorial puzzle: identifying the essential amino acids for bacteria in the oral microbiome. Although there are over one million possible combinations of the 20 amino acids, our reinforcement learning agent tested only 0.32% of these combinations before discerning the metabolic logic of the bacterium. Moreover, the agent’s solution scored better than a human expert at predicting growth.
We also show how transfer learning allows agents to repurpose information from previous games and reduce the number of wet-lab experiments required for future games with different organisms. These results demonstrate that AI and laboratory automation can answer complex biological questions even when the AI agents lack prior biological knowledge.
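The scale of the amino acid puzzle described above can be checked with a few lines of Python. This is a back-of-the-envelope sketch that assumes each of the 20 amino acids is simply present in or absent from the growth medium, which gives 2^20 possible combinations; the abstract itself states only "over one million". The variable names are illustrative.

```python
# Assumption: each amino acid is either included in or left out of the medium.
n_amino_acids = 20
total_combinations = 2 ** n_amino_acids

# Fraction of combinations the agent tested, as reported in the abstract.
fraction_tested = 0.0032
experiments_run = round(total_combinations * fraction_tested)

# Daily capacity of the automated system, as reported in the abstract.
experiments_per_day = 10_000
days_for_exhaustive_search = total_combinations / experiments_per_day

print(total_combinations)
print(experiments_run)
print(days_for_exhaustive_search)
```

Under this assumption the agent needed only a few thousand experiments, which fits within a single day of the stated 10,000-experiment capacity, whereas exhaustively testing every combination would take roughly 105 days at that rate.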
Architectures: Cloud Labs Provide the Infrastructure to Enable Deployment of AI/ML in Science at Scale
The biggest impediments to applying advancements in AI to physical lab infrastructure have been the disparate data flows, the lack of interoperability between instrumentation, and the absence of a common language for everything in the lab (people, instruments, materials, storage, etc.) to communicate through. As a by-product of developing a cloud lab which must be able to faithfully execute any sequence of tasks and experiments that a user submits, these challenges have been addressed. In this talk, I will walk through the key features and design choices of a cloud lab that solve these problems, without an inordinate effort to wrangle data after the fact. Additionally, I will cover some of the forward-looking features that ECL is developing to fully leverage the wealth of data a cloud lab generates to streamline the use of AI and ML tools to both interpret data. Finally, we will look at how a cloud lab can be used as a closed-loop system to fully deliver on the promise of AI-driven experimentation.
Humans helping machines help humans run machines: Combining automation and domain knowledge to enable productized modeling of the growth of biologics
The biopharma industry is increasingly leveraging innovative technologies such as automation, ML, AI and NLP to aid in research. However, for these models to be successful, the underlying data needs to be standardized (FAIR), high-quality (validated), and secure. Benchling R&D Cloud provides the right data foundation and tools to capture and centralize the data needed to build and deploy models. Within Benchling, data models are customized for each customer, allowing for flexibility to capture nuances in distinct research processes. However, where commonalities in processes exist, we can leverage industry knowledge to provide sophisticated insights about data with built-in modeling. In this talk, I will highlight how Benchling R&D Cloud enables the capture of FAIR data and highlight its application in one ML-driven use case: growth optimization. We have built a modeling pipeline tailored to the growth of biologics. This modeling pipeline automatically adjusts to different data formats and distributions while capturing commonalities of the underlying systems being modeled. We can produce high-quality models across a range of customers without having a specialist per customer to build the models. I will describe both the automation steps (and libraries) used and the domain-specific modeling adjustments needed to create this growth of biologics modeling pipeline.
FAIR Data Ingestion and Harmonization
This talk will discuss three major aims: the scale of Big Scientific Data, challenges encountered in data engineering and MLOps, and some specific use case explorations.