The study, published in BMJ Health & Care Informatics and funded by UKRI, recreated four AI models that previous research had reported as having a success rate greater than 70% in identifying liver disease from the results of blood tests.
After rebuilding the algorithms and demonstrating that they achieved the same results as in earlier studies, the research team looked at how they performed by sex, and found that they missed 44% of the cases of liver disease among women, compared to 23% among men.
The researchers found that the two algorithms judged best at screening for disease across all patients had the biggest gap between the sexes: they performed worst for women relative to men.
Lead author and PhD candidate Dr Isabel Straw (UCL Institute of Health Informatics) said: "AI algorithms are increasingly used in hospitals to assist doctors in diagnosing patients. Our study shows that, unless these algorithms are investigated for bias, they may only help a subset of patients, leaving other groups with worse care."
Dr Straw said: "We need to be really careful that medical AI doesn’t worsen existing inequalities in healthcare. When we hear of an algorithm that is more than 90% accurate at identifying disease, we need to ask: accurate for who? High accuracy overall may hide poor performance for some groups."
The AI models investigated by the researchers had been trained using the Indian Liver Patient Dataset (ILPD), which is used extensively to create algorithms that predict liver disease.
The researchers noted that disparities in the performance of AI models for men and women likely reflected existing inequalities in care. The biochemical markers of disease used by the algorithms, such as lower albumin levels, are already widely used by clinicians and appear to be a more effective indicator of disease in men.
Past research shows that women are less likely to be diagnosed with liver disease, and more likely to have severe disease with worse outcomes.
For the latest paper, the researchers looked at 30 studies describing algorithms that screened for liver disease and found that none of them discussed sex differences, even though disparities in care are likely to lead to bias in AI.
The researchers were unable to assess the performance of the AI models for different ethnicities, as they did not have the relevant data, but said there was evidence that markers used to predict liver disease were less effective for marginalised racial groups.
Liver cirrhosis (scarring of the liver caused by long-term damage) accounts for an estimated 1.8% of deaths in Europe. Common causes include drinking too much alcohol over many years; being infected with hepatitis for a long time, particularly hepatitis B or hepatitis C; and a severe form of non-alcoholic fatty liver disease (NAFLD).