Machine learning systems already shape everyday parts of our lives, from spam filters to product recommendations and social media feeds. Now a further shift is underway: generative AI is being built into these systems to write code, label data, explain decisions, and even support decision-making.
It may seem efficient. Michael Lowndes isn’t convinced that’s wise.
In a paper published in the Cell Press journal Patterns, the Heriot-Watt University computer scientist argues that incorporating large language models into machine learning workflows can make those systems harder to understand, harder to audit, and more vulnerable to security flaws, legal issues, bias, and bad decisions. His central point is not that generative AI is useless for machine learning. Rather, it is that the trade-offs are being underestimated.
“Machine learning developers need to be aware of the risks of using GenAI in machine learning and find a smart balance between increased functionality and the associated risks,” says Lowndes. “Given the current limitations of generative AI, I think this is a clear example of how just because something can be done doesn’t mean it should be done.”
His paper is structured as a tutorial and practical warning for people building machine learning systems, rather than as a report on a single new experiment. It also discusses how generative AI is currently being used and where problems can arise.
Where generative AI is making inroads into workflows
Lowndes describes four main roles for generative AI in machine learning. It can sit inside a machine learning pipeline as part of the decision-making process. It can help design and code the pipeline. It can generate synthetic training data or preprocess and label existing data. And it can analyze the results and write reports on what the model did.
Each role comes with its own challenges. When you combine several of them, Lowndes argues, the risks start to add up.
“If GenAI is operating in different ways within machine learning workflows and systems, they can interact in ways that are unpredictable and difficult to understand,” he says. “My advice at this point is to avoid adding too much complexity to how you use GenAI in machine learning, especially if you are working in an area with high risks that impact people’s lives and livelihoods.”
This warning matters most in fields such as medicine and finance, where mistakes are not trivial. In his paper, Lowndes uses two real-world examples, a hospital triage system and a loan approval system, to illustrate how these risks play out.
The medical example involves an in-hospital tool that uses language models to determine the severity of a case and which specialist should respond. The banking example relies on a commercial generative AI service to decide whether to approve a loan, consulting internal policy documents and other tools along the way.
In both cases, the system is appealing for the same reasons many companies are drawn to generative AI: speed, automation, and reduced labor costs.
These are also precisely the kinds of systems that can cause real harm if they fail.
Opaque systems, uncertain decisions
One of the big problems is that large language models introduce errors, and finding those errors isn’t always easy.
They can hallucinate facts, write flawed code, propose weak designs, or give different answers to the same prompt. Lowndes argues that this is especially dangerous in machine learning, because developers may rely on AI-generated suggestions for steps that affect everything downstream, including training, evaluation, deployment, and monitoring.
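One of these failure modes, different answers to the same prompt, is easy to measure before a model is trusted with a downstream step. The sketch below is only an illustration and is not taken from the paper: `agreement_rate` and `make_fake_model` are hypothetical names, and the fake model stands in for whatever real model call a team would use.

```python
from collections import Counter
from typing import Callable


def agreement_rate(generate: Callable[[str], str], prompt: str, runs: int = 5) -> float:
    """Return the share of runs that produced the most common answer."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Normalise lightly so whitespace and case differences are not counted.
    answers = [generate(prompt).strip().lower() for _ in range(runs)]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / runs


def make_fake_model(canned: list[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a real model call: cycles through canned answers."""
    state = {"i": 0}

    def fake(prompt: str) -> str:
        answer = canned[state["i"] % len(canned)]
        state["i"] += 1
        return answer

    return fake


if __name__ == "__main__":
    flaky = make_fake_model(["urgent", "urgent", "routine", "urgent"])
    # Three of four runs agree, so this prints 0.75.
    print(agreement_rate(flaky, "Triage: chest pain, age 64", runs=4))
```

A low agreement rate does not show which answer is right, only that the output is unstable enough to need human review before anything downstream depends on it.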
The paper also emphasizes that newer or larger models are not automatically better. Lowndes points out that in some cases, older or simpler models can outperform fancy generative systems for certain tasks. Before adding generative AI, developers should ask themselves if they need it at all.
This question becomes more pressing when explainability becomes an issue.
“In fields like health care and finance, there are laws about machine learning systems being able to prove that they are reliable and being able to explain how they arrive at decisions,” Lowndes says. “LLM is so opaque that it becomes very difficult as soon as you start using it.”
His concern is not just that these systems are difficult to interpret. It is also that people may overestimate how much an apparent explanation, including a model’s “reasoning” trace, actually tells them. These traces can sound convincing while still being unreliable.
Data breaches and technical debt
The paper devotes considerable attention to security and governance risks.
Remotely hosted models often require data to be sent to an external server. This can lead to data breaches and cybersecurity issues, especially when sensitive medical, financial, or internal business information is involved. Systems that use search tools, external databases, or agentic AI capabilities can magnify the risk further.
Lowndes also warns that generative AI may exacerbate rather than solve classic machine learning problems. Synthetic data can contain hidden biases from the original training data used to build the model. The generated labels or preprocessing steps can distort the dataset. Code created by AI may contain mistakes, outdated packages, or fabricated dependencies.
This kind of convenience can later turn into technical debt. Teams may have to maintain code that they didn’t fully understand in the first place.
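One of the code risks mentioned above, fabricated dependencies, can be partly caught before AI-generated code is ever run. The following is a minimal sketch under stated assumptions, not a method from the paper: `unresolved_imports` is a hypothetical helper that parses the generated source with Python's standard `ast` module and uses `importlib.util.find_spec` to list imported top-level modules that cannot be found in the current environment.

```python
import ast
import importlib.util


def unresolved_imports(source: str) -> list[str]:
    """List top-level modules imported by `source` that cannot be found locally."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are skipped.
            modules.add(node.module.split(".")[0])
    return sorted(m for m in modules if importlib.util.find_spec(m) is None)


if __name__ == "__main__":
    generated = (
        "import json\n"
        "from os import path\n"
        "import totally_made_up_pkg_xyz\n"
    )
    print(unresolved_imports(generated))
```

A flagged name is only a prompt for manual review: the package may simply not be installed yet. Conversely, a name that resolves is not proof that the package is trustworthy or current.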
The problem of bias
Bias is also a recurring problem. Many generative models are trained on large datasets collected from the internet, so they can absorb uneven representation, stereotypes, and unfair patterns. These biases can spill over into data generation, feature engineering, model decisions, and written explanations.
“It is important that the public is aware of the limitations of the GenAI system,” Lowndes says. “Companies deploy these systems for purposes such as cost savings, and while they can potentially improve the end-user experience, they can also have negative consequences such as bias and inequity.”
He argues that developers need to manually review code and output, document exactly where generative AI is used, and carefully consider whether the supposed efficiency gains are worth the risks.
The paper also makes clear that the risks are not limited to development. Problems can also arise after deployment, especially if a remotely hosted model changes over time, prompts stop behaving the same way, or users learn how to game the system.
Practical implications of the research
The paper offers a simple message to developers, businesses, and the general public: treat generative AI in machine learning as a source of trade-offs, not magic.
For low-stakes applications, some risk may be acceptable. But Lowndes argues that systems affecting health, finances, and access to services demand greater care. That means relying on human oversight, limiting unnecessary complexity, manually checking output, staying alert to bias and security issues, and resisting the urge to delegate much of a workflow to unreliable but confident-sounding systems.
His broader point is that automation can make machine learning both more powerful and more vulnerable. The more generative AI is integrated into a system, the more difficult it can be to understand what that system is actually doing. It also makes it difficult to know who is responsible if something goes wrong.
