The Expert Toxicologist has left the building. Or have they? With generative Artificial Intelligence (AI) technology advancing rapidly, we look into AI-generated toxicology content and decide whether we should “get our coats”.
A hot topic across the world
Generative AI systems apply machine learning to data sets, using algorithms to identify patterns and structures in the source data that can be used to create new and original content which feels like it has been created by a human. And they are a “hot topic” in a variety of workplaces, not just ours.
The benefits of machine learning in toxicology are undeniable. By leveraging machine learning and AI approaches, it is now possible to rapidly develop physiologically based pharmacokinetic (PBPK) models for hundreds of chemicals, and to create in silico models capable of rapidly predicting toxicity for a large number of chemicals with accuracy approximating in vivo animal experiments for some endpoints. Not to mention the ability to rapidly analyse large amounts of different types of data (e.g., high-throughput screening data, toxicogenomic data, imaging) to generate new insights into toxicity mechanisms. It wasn’t impossible in the “paper era”, but it would have been a mammoth and probably thankless undertaking.
Putting toxicology AI to the test
We asked ourselves, can the human expert toxicologist be entirely replaced by a robot? We tried out one of the latest generative AI toxicology reporting models to find out.
First off, the model is easy to use and very user-friendly – just copy in your CAS RN and off you go. There are several report options, from a general brief toxicity report (maximum 16,000 words) to more detailed reports on specific endpoints, for example carcinogenicity, reproductive/developmental toxicity and ecotoxicity – but, strangely, not genotoxicity. Reports are generated rapidly, within 5 minutes, so an entire suite of reports for a chemical can be produced in around half an hour. So, first impressions are that the model does “what it says on the tin” – it provides a first draft in minutes rather than hours, allowing “you to focus your professional skills on perfecting the final product.”
And professional skills are most definitely still required. The output is a hazard characterisation. There is no facility to enter the data that are required to provide a contextualised human health risk assessment. And while the hazard characterisations appear largely accurate, they are brief, lacking focus on health-critical endpoints and, in some cases, lacking consistency. Two reports generated on the same day on the plasticiser bis(2-ethylhexyl) phthalate (DEHP) differed, not just in terms of format but, more concerningly, also in content and references cited. The reference lists for DEHP and another well-studied chemical, aluminium, were suspiciously short and primarily cited individual studies. A lack of input from Expert Groups such as the OECD, the WHO and EFSA, to name but a few, was a major weakness. The references cited were also quite old; for DEHP, the most recent reference cited in the AI-generated report was from 2013, unusual for a chemical which has been subject to intense scrutiny in recent years because of its endocrine disrupting potential. But at least they were real references on the target chemical.
For less well-studied chemicals, the output ranged from disappointing (returning a null result) to concerning. When asked about 1,3-di-tert-butyl benzene (1,3-DTBB), a chemical for which little experimental data exist and for which most human health risk assessments rely on read-across, the model generated a brief toxicity report, specific reports covering carcinogenicity and reproductive toxicity, and a No-Observed-Effect Level (NOEL). We have to say, this had us scratching our heads – had we missed something? Where had these data come from? Unfortunately, we were unable to determine the source of the data or verify the output – the cited references, which nominally look authentic, naming a real toxicity journal with volume and page numbers, simply do not exist – a total, and potentially dangerous, figment of the model’s imagination.
Our final thoughts
Clearly, generative AI toxicology reporting models for human health risk assessment are still very much in their infancy. There is no doubt that the model we tested, as it consumes and processes more of the available toxicity data in the world at large, will at some point perform more impressively. As of now, having a rapidly generated first draft to work from does seem tempting, but as we know, one shouldn’t yield to temptation. The current limitations are obvious: this first draft may be a complete fantasy or, at best, a totally arbitrary and inconsistent selection of very old data. The inability to contextualise evidence, deal with complex and/or related datasets, problem-solve, think critically or creatively, or bring nuance and narrative to a dataset are just a few of its present weaknesses. And, as with all “black box algorithms”, the lack of transparency means that a human sanity check is probably always going to be required; and that will not be a cursory task, as the toxicologist will need to do their own searches and select and evaluate the key studies. It is hard to see what the model is currently reliably bringing to this particular party.
We are confident that our staid consultancy of 60+ years can outperform the current AI technology. Our database, TRACE, along with our vast library of chemical hazard and risk assessment monographs, provides not only a far superior initial search and a much deeper immediate level of understanding, but also the assurance that our clients are getting actual real data, understood by actual real humans. Well, actual real scientists, at least.
Phew (puts coat back on the peg).