Machine learning models to support chemical R&D recognised with Best Paper Award

Jose Pablo Folch

A team from Imperial and BASF has won the Computers & Chemical Engineering Best Paper Award 2023 for AI techniques that could boost chemical R&D.

The prestigious journal in process systems engineering rated the paper as the best of over 280 published that year.

The process of trial and error in chemical R&D is costly, with some experiments taking weeks. So chemists need to find optimal manufacturing settings with as few experiments as possible.

The paper, by PhD candidate Jose Pablo Folch and colleagues from BASF and Imperial, adapts a set of classical Bayesian statistical techniques to the specific research and development (R&D) methods used by chemical companies. These techniques are used to extract the most useful information possible from a finite number of experiments.

"The editorial team appreciated the way that the authors’ work spanned theory and practice by developing new Bayesian optimization approaches and applying them to an industrially relevant application," said Professor Stratos Pistikopoulos of Texas A&M, Computers & Chemical Engineering Editor in Chief.

"It’s fantastic to receive recognition from the flagship journal in the process systems engineering community for work that is both academically pioneering and has important practical value to industry," said co-author Ruth Misener, a professor in the Department of Computing and the EPSRC Centre for Doctoral Training in Statistics and Machine Learning.

An R&D challenge

Experimentation is key to R&D in the chemical industries. Before setting up a new production line or facility, industrial chemists test a range of manufacturing parameters, such as temperature settings and raw materials, to maximise the purity of the product and minimise economic and environmental costs.

This process of trial and error is itself costly: some experiments take weeks or months and consume significant resources. The chemists therefore need to find near-optimal manufacturing settings with as few iterations of their experiments as possible.

The Imperial and BASF team behind the winning Computers & Chemical Engineering paper have created a new algorithm based on Bayesian optimisation, a statistical technique that can be used to help obtain the best manufacturing parameters in a small number of experiments.

Bayesian optimisation uses a small set of initial experimental data to fit a curve that models the relationship between a given manufacturing parameter (for example, temperature) and performance (for example, purity). It assigns varying levels of certainty to different parts of the curve based on the data available.

It tells experimenters which parameter value to test next by striking a compromise between values that are already expected to perform strongly and higher-risk values whose results are highly uncertain but could perform better still. After each experiment, it updates the curve and its uncertainty estimates before recommending the next one.
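The loop described above can be illustrated with a short sketch. This is a generic, textbook-style version of classical Bayesian optimisation and not the algorithm from the winning paper. It assumes a Gaussian process as the curve, an upper-confidence-bound rule as the compromise between promising and uncertain values, and a made-up `purity` function standing in for a real experiment. The kernel length scale, the `beta` weighting and the starting temperatures are likewise illustrative choices.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential similarity between two 1-D arrays of settings."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_grid, noise=1e-4):
    """Mean and standard deviation of a zero-mean Gaussian process on x_grid."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_go = rbf_kernel(x_grid, x_obs)
    mean = k_go @ np.linalg.solve(k_oo, y_obs)
    # Prior variance is 1; subtract the part explained by the observations.
    var = 1.0 - np.sum(k_go * np.linalg.solve(k_oo, k_go.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def purity(temperature):
    """Hypothetical experiment: purity peaks at a (rescaled) temperature of 0.7."""
    return np.exp(-((temperature - 0.7) ** 2) / 0.02)

def run_bo(n_iterations=10, beta=2.0):
    """Run a few rounds of 'fit the curve, pick the next experiment'."""
    x_grid = np.linspace(0.0, 1.0, 201)   # candidate temperatures, rescaled to [0, 1]
    x_obs = np.array([0.1, 0.5, 0.9])     # small set of initial experiments
    y_obs = purity(x_obs)
    for _ in range(n_iterations):
        mean, std = gp_posterior(x_obs, y_obs, x_grid)
        # High mean = expected strong performance; high std = uncertain but promising.
        ucb = mean + beta * std
        x_next = x_grid[np.argmax(ucb)]
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, purity(x_next))
    best = np.argmax(y_obs)
    return x_obs[best], y_obs[best]

if __name__ == "__main__":
    best_x, best_y = run_bo()
    print(f"best temperature setting: {best_x:.3f}, purity: {best_y:.3f}")
```

Raising `beta` makes the sketch favour uncertain regions; lowering it makes it favour settings that already look good.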

Chemistry advances

The algorithm devised by the researchers improves upon classical Bayesian optimisation by better accommodating the particular experimental practices used in chemical R&D.

"Machine learning is growing fast, and in this paper we’ve shown how to apply some state-of-the-art techniques from machine learning to chemistry. The challenge is making sure the maths actually describes the real-world problem - the collaboration with chemists and data scientists from BASF has allowed us to do this," said Mr Folch.

"There are several things in chemistry that don’t work well with Bayesian optimisation," added Professor Misener. "This paper deals with two of those - ‘multi-fidelity’, the fact that some data-sources return more reliable data than others. The other, ‘asynchronous batch’, is that experiments take varying amounts of time, and you might be running multiple experiments at once."

One widely used approach in chemical R&D is to perform experiments using relatively quick and cheap approximations of the manufacturing processes under development. For example, researchers developing techniques for manufacturing electric vehicle batteries might complement their slow and costly experiments on the pouch batteries used in vehicles with approximations using easy-to-produce coin batteries.

The new algorithm is designed to recommend these cheaper but less accurate experiments to test parameter values where the predicted results are highly uncertain, and more costly but accurate experiments where the results are less uncertain.

It is also designed to accommodate the fact that the quicker experiments return results more quickly than the slower and more accurate ones. It helps avoid resources sitting idle by basing its recommended experiments on the data currently available and the results that are unknown but expected to come in.
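A rough sketch can make these two ideas concrete. It is not the method from the paper. It assumes a simple Gaussian process model, fills in experiments that are still running with the model's own predicted result (a common heuristic sometimes called "kriging believer"), and picks the cheap or accurate experiment with a plain uncertainty threshold. The `recommend` function, the `std_threshold` value and the fidelity labels are hypothetical names introduced for illustration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential similarity between two 1-D arrays of settings."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-4):
    """Mean and standard deviation of a zero-mean Gaussian process at x_query."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_qo = rbf_kernel(x_query, x_obs)
    mean = k_qo @ np.linalg.solve(k_oo, y_obs)
    var = 1.0 - np.sum(k_qo * np.linalg.solve(k_oo, k_qo.T).T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def recommend(x_done, y_done, x_pending, x_grid, beta=2.0, std_threshold=0.3):
    """Suggest the next setting and fidelity while other experiments are still running."""
    x_pending = np.asarray(x_pending, dtype=float)
    if x_pending.size > 0:
        # Assumption: treat each running experiment as if it had already
        # returned the model's current prediction, so it is not repeated.
        y_guess, _ = gp_posterior(x_done, y_done, x_pending)
        x_all = np.concatenate([x_done, x_pending])
        y_all = np.concatenate([y_done, y_guess])
    else:
        x_all, y_all = x_done, y_done
    mean, std = gp_posterior(x_all, y_all, x_grid)
    best = np.argmax(mean + beta * std)
    # Very uncertain region: a cheap approximation is informative enough.
    # Better-understood region: spend the budget on the accurate experiment.
    fidelity = "cheap" if std[best] > std_threshold else "accurate"
    return x_grid[best], fidelity

if __name__ == "__main__":
    grid = np.linspace(0.0, 1.0, 201)
    done_x = np.array([0.1, 0.5, 0.9])
    done_y = np.array([0.0, 0.135, 0.135])
    print(recommend(done_x, done_y, [], grid))      # nothing running
    print(recommend(done_x, done_y, [0.7], grid))   # one experiment still running at 0.7
```

In this sketch, marking a setting as pending shrinks the model's uncertainty around it, so the next recommendation moves elsewhere instead of waiting for the result.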

The researchers ran an empirical test on battery data supplied by BASF, showing that, with a limited amount of experimentation, their algorithm can identify better manufacturing settings than classical Bayesian optimisation.

Real-world innovation

The research was carried out as part of a large-scale partnership between Imperial and BASF, the world’s largest chemical company, that is working to develop advanced forms of chemistry and bring them to the outside world to create a more efficient and sustainable chemical sector.

The partnership has recently resulted in the formation of a spinout company, SOLVE, founded by Dr Linden Schrecker with Mr Folch as Chief Scientific Officer. SOLVE is bringing novel experimentation and AI techniques like those outlined in the winning Computers & Chemical Engineering paper into active use by the chemical and pharmaceutical industries, creating benefits for the economy and environment.

    Combining Multi-Fidelity Modelling and Asynchronous Batch Bayesian Optimization by Jose Folch, Robert M Lee, Behrang Shafei, David Walz, Calvin Tsay, Mark van der Wilk, and Ruth Misener.

Partnership opportunities for industry

Companies that wish to learn about collaborating on or commercialising university research can discover opportunities by contacting Imperial Enterprise.