The Digital Molecular Design and Fabrication ( DigiFAB ) Institute and the Faculty of Natural Sciences Data Science Theme gathered 47 students from across different Departments in Imperial to compete in their latest hackathon, inspired by large language models (LLMs).
Multidisciplinary teams of undergraduate and postgraduates students as well as early career researchers competed against one another to solve science challenges with LLM technologies, such as OpenAI’s ChatGPT, as well as open-source packages.
"This year, large language models like ChatGPT have been really topical, so we thought it might be interesting to do a hackathon on the topic," said Professor Kim Jelfs from the Department of Chemistry.
"We were excited to see teams from the Departments of Chemistry, Life Sciences, Mathematics, Physics, Computing, Chemical Engineering and Aeronautics," said Professor Jelfs.
From deconstructing chemical targets to answering exam questions
Teams were tasked with tackling two challenges with LLMs: one was to predict retrosynthesis routes for new molecules, the other was to extract knowledge from pre-existing literature to answer chemistry and physics exam questions.Participants had to find the synthetic pathways to around 50 unique molecules. Dr Alexander Ganose , a lecturer from the Department of Chemistry, said that the best-performing teams utilised open-source software - such as the chemical synthesis package known as Molecular Transformer, a machine-learning model inspired by language translation.
Other packages, such as paper-qa, that extracts and organises information, also helped teams tackle the next challenge. Participants were asked to answer 10 Imperial exam questions from past papers, when given textbooks to extract information from.
Discovering the best way to pose questions or requests to LLMs, such as getting them to assume the role of a chemist or providing important contextual information, also improved the results that teams got.
We gave them some example code that they could start from, but the teams were developing their own solutions using completely new technologies... Dr Alexander Ganose Department of Chemistry
"We gave them some example code that they could start from, but the teams were developing their own solutions using completely new technologies," said Dr Ganose, "People were getting very creative."
Teams were also able to hear from keynote speakers: Dr Kevin Jablonka (Helmholtz Institute for Polymers in Energy Applications of the University of Jena and the Helmholtz Centre in Berlin) and Dr Michael Pierler (OpenBioML and StabilityAI). Both speakers were involved in developing natural language processing based software packages, such as ChemNLP as part of OpenBioML.
Congratulations to the winning teams
FIRST PLACE: Team 7Ruiqi Wu, Yuchen Lou, Shirui Wang (Department of Chemistry) and Chin Yong Tan (Department of Mathematics).
"It was a fun competition overall, and I think most of the fun came from having the liberty to do whatever we wanted with the code," said Yuchen Lou, an undergraduate student from the Department of Chemistry.
Each team member received £100 in prize money.
SECOND PLACE: Team 2
Tanuj Karia, Shubhani Paliwal, Lingfeng Gui, Benjamin Tan and Gustavo Chaparro, who are all PhD students from the Molecular Systems Engineering Group in the Department of Chemical Engineering.
"We are not the most prominent experts on machine experts on machine learning or LLMs, so the DigiFAB Hackathon was a challenging and fun experience," Chaparro said.
"We see a lot of potential in LLMs for our field, like predicting the thermophysical properties of fluids, which could lead to the design of better, more efficient, and greener chemical processes! We really value this experience," he said.
Each team member received £75 in prize money.
THIRD PLACE: Team 10
Suchaya Mahuttanatan (Department of Chemistry), Xiaoyi Sun and Jason Li (Department of Physics).
Each team member received £50 in prize money.