Move legacy lab data to production-ready AI

Decades of legacy laboratory information management systems (LIMS), disconnected instrument software, and nested Excel spreadsheets have left laboratories with mountains of unstructured data. Despite constant pressure to implement artificial intelligence (AI) and machine learning (ML) for rapid discovery and automation, bridging the gap between messy spreadsheet ecosystems and scalable models remains a major challenge. James Smagala, bioinformatics practice manager at Yahara Software, provides a practical framework for untangling legacy data, gaining the trust of skeptical science veterans, and avoiding common infrastructure traps that cause promising AI pilots to fail.

Photo: James Smagala, bioinformatics practice manager at Yahara Software.

Should we fix our data architecture first, or can AI act as a cleaner?

Lab managers staring down a maze of legacy LIMS data and fragmented spreadsheets can feel paralyzed by the sheer volume of cleanup work. That raises a classic chicken-or-egg question: do you need to spend months or even years manually rebuilding your data architecture first, or can modern AI tools act as a “cleaner” and tidy up the mess for you?

Because manual documentation in spreadsheets introduces significant data integrity vulnerabilities and compliance risks, finding an efficient way forward matters. In a regulated environment, electronic records systems must adhere to strict oversight rules, such as the FDA’s 21 CFR Part 11, to ensure data reliability, security, and full auditability. The answer, according to Smagala, is a strategic, multi-part approach. Labs can’t simply throw raw, chaotic data at an AI and expect magic. However, a perfect data architecture is not a prerequisite for taking the first steps.

Create the blueprint first

“First, you need to have a little bit of organization and architecture in place, and you need to know exactly what it’s going to look like when it’s finished,” Smagala explains. Before implementing tools, laboratories must define standard classifications, data schemas, and data guidelines. With global compliance frameworks such as ISO/IEC 17025 governing competence and quality standards for testing and calibration laboratories, establishing standardized and validated operational baselines is a non-negotiable step. Without a concept map of how data should be classified, it is impossible to train an AI to do the classifying.
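To make the blueprint idea concrete, here is a minimal Python sketch of what a schema plus a controlled vocabulary might look like. Everything in it is hypothetical and chosen only for illustration: the field names, the `ASSAY_TYPES` vocabulary, and the `validate_record` helper do not come from Smagala or from any standard.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical controlled vocabulary; a real lab would define its own.
ASSAY_TYPES = {"ELISA", "qPCR", "FISH"}


@dataclass(frozen=True)
class AssayRecord:
    """The target shape every legacy row should end up in (assumed fields)."""
    sample_id: str
    assay_type: str
    run_date: date
    value: float
    unit: str


def validate_record(raw: dict) -> Tuple[Optional[AssayRecord], List[str]]:
    """Return (record, problems). The record is None if any problem is found."""
    problems: List[str] = []

    sample_id = str(raw.get("sample_id", "")).strip()
    if not sample_id:
        problems.append("missing sample_id")

    assay_type = str(raw.get("assay_type", "")).strip()
    if assay_type not in ASSAY_TYPES:
        problems.append(f"unknown assay_type: {assay_type!r}")

    try:
        run_date = date.fromisoformat(str(raw.get("run_date", "")))
    except ValueError:
        run_date = None
        problems.append("run_date is not an ISO date (YYYY-MM-DD)")

    try:
        value = float(raw.get("value"))
    except (TypeError, ValueError):
        value = None
        problems.append("value is not numeric")

    unit = str(raw.get("unit", "")).strip()
    if not unit:
        problems.append("missing unit")

    if problems:
        return None, problems
    return AssayRecord(sample_id, assay_type, run_date, value, unit), []
```

The point of the sketch is the order of operations: the target shape and the allowed categories are written down first, so any cleanup tool, AI or otherwise, has something explicit to be checked against.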

Release the AI cleaner

Once this foundational blueprint is in place, labs can use AI as an iterative cleanup mechanism. “Once you have a basic foundation, you can use something like the AI janitor idea as a mechanism to start iterating quickly on your data, get it to a cleaner state, and archive it into a slightly longer-term repository depending on how you want to organize it,” Smagala said.
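One small piece of such an iteration loop can be sketched as follows: mapping the inconsistent column headers found in legacy spreadsheets onto the blueprint's column names. This sketch deliberately swaps in a classical fuzzy string matcher from Python's standard library (`difflib`) as a stand-in for the AI step, and the `CANONICAL_COLUMNS` list and the `cutoff` value are assumptions made for illustration.

```python
import difflib
import re
from typing import Dict, List, Optional

# Hypothetical blueprint column names.
CANONICAL_COLUMNS = ["sample_id", "assay_type", "run_date", "value", "unit"]


def normalize(header: str) -> str:
    """Lowercase a header and collapse spaces and punctuation to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", header.lower()).strip("_")


def map_headers(headers: List[str], cutoff: float = 0.8) -> Dict[str, Optional[str]]:
    """Map each legacy header to a blueprint column, or None if nothing is close.

    Headers mapped to None are left for a person to classify rather than guessed.
    """
    mapping: Dict[str, Optional[str]] = {}
    for header in headers:
        matches = difflib.get_close_matches(
            normalize(header), CANONICAL_COLUMNS, n=1, cutoff=cutoff
        )
        mapping[header] = matches[0] if matches else None
    return mapping
```

Each pass leaves a shorter list of unmapped headers for a human to resolve, and those decisions can be folded back into the blueprint before the next pass.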

Comparison of GenAI and machine learning

The cleanup strategy must also be consistent with the end goals of the project. When preparing data for traditional machine learning (ML) applications such as predictive analytics and automated assay design, data quality standards must be extremely high.

For machine learning: ML algorithms are very sensitive to bias and noise. “Before embarking on a full AI implementation or machine learning implementation, it is absolutely important to organize your data up front. They rely heavily on starting with the right data and data of a certain quality,” Smagala warns. Regulatory expectations largely reflect this requirement. For example, the FDA’s Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD) Action Plan emphasizes using representative, robust, and unbiased datasets to build safe clinical algorithms.
For generative AI (GenAI): If the goal is to use large language models (LLMs) or retrieval-augmented generation (RAG) to query a laboratory’s standard operating procedures (SOPs) or historical reports, the laboratory can operate with slightly less structural rigidity. Although GenAI can parse unstructured text more flexibly, it still requires an underlying organizational framework to extract trusted values.

Proving that AI respects biological nuances, not just mathematics

One of the biggest cultural hurdles lab managers face when implementing AI is pushback from veteran scientists. These people have spent decades honing their expertise in their field. To them, neural networks often seem like “black boxes” that perform blind mathematical matrix multiplications without any real understanding of the biological or chemical context.

To be fair, Smagala agrees with the skeptics. “A model that isn’t perfectly tuned, an AI that isn’t perfectly tuned, might not actually respect the biological nuances, and I think a lot of the skeptics who say, ‘I want it to behave a certain way,’ have a valid concern about that,” he says.

Gaining buy-in from highly trained professionals requires active efforts to build trust in the workplace and foster open, two-way communication. To demonstrate that AI tools respect the underlying science, lab managers must move away from “set-it-and-forget-it” implementations and embrace active governance. This proactive stance aligns with broader trends in technology legislation, most notably the risk-based tiers of the European Union’s AI Act, which requires human oversight, transparent record-keeping, and rigorous validation for high-risk applications.

Human-in-the-loop model

AI models don’t work alone. To convince skeptical scientists, lab managers need to demonstrate guardrails. This means implementing the following:

Explicit guidelines and constraints, with biological rules, standard molecular structures, or thermodynamic limits hard-coded directly into the preprocessing pipeline.
Rigorous feedback loops and review processes, with seasoned scientists serving as the primary “editors.”
A workflow system where AI accelerates the tedious and repetitive elements of data analysis while human experts retain final approval.
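The three guardrails above can be sketched together in a few lines of Python. This is an illustration under stated assumptions, not a description of any particular product: the `HARD_LIMITS` table, the `Suggestion` class, and the status names are all hypothetical, and the limits shown are deliberately simple ones (a fraction must lie between 0 and 1, a temperature cannot fall below absolute zero, a concentration cannot be negative).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hard limits that no AI suggestion may violate (illustrative examples only;
# a real lab would encode its own domain rules here).
HARD_LIMITS: Dict[str, Callable[[float], bool]] = {
    "gc_fraction": lambda v: 0.0 <= v <= 1.0,
    "temperature_c": lambda v: v >= -273.15,
    "concentration_ng_per_ul": lambda v: v >= 0.0,
}


@dataclass
class Suggestion:
    """A value proposed by an AI step, waiting to be checked and reviewed."""
    field_name: str
    value: float
    status: str = "pending"  # pending -> rejected, or needs_review -> approved
    notes: List[str] = field(default_factory=list)


def apply_guardrails(s: Suggestion) -> Suggestion:
    """Reject rule violations outright; send everything else to a human."""
    check = HARD_LIMITS.get(s.field_name)
    if check is None:
        s.status = "needs_review"
        s.notes.append("no hard limit defined for this field")
    elif not check(s.value):
        s.status = "rejected"
        s.notes.append("violates hard limit")
    else:
        s.status = "needs_review"
    return s


def human_approve(s: Suggestion, reviewer: str) -> Suggestion:
    """Only a named reviewer can move a suggestion to 'approved'."""
    if s.status != "needs_review":
        raise ValueError(f"cannot approve a suggestion with status {s.status!r}")
    s.status = "approved"
    s.notes.append(f"approved by {reviewer}")
    return s
```

Note that nothing in this sketch is ever auto-approved: the rules can only reject, and approval always passes through a named reviewer, which also leaves a trail in `notes`.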

Beyond anecdotes to statistical verification

As labs move toward more complex ML systems, the path to convincing skeptics lies in rigorous scientific validation. A great example of this is the hybrid pipeline developed by Yahara Software in collaboration with human oncology researchers at the University of Wisconsin-Madison.

Previously, fluorescence in situ hybridization (FISH) microscopy evaluation required graduate students to spend hours manually tracing cell boundaries and counting chromosomes to detect genetic abnormalities, a process that was highly susceptible to fatigue and subjective bias. To automate this without losing scientific integrity, the team built a two-tier hybrid pipeline.

Tier 1 (foundation model segmentation): To identify cell boundaries, the pipeline used MicroSAM (μSAM), a microscopy-specific adaptation of Meta’s open-source Segment Anything Model (SAM). The model’s strong out-of-the-box segmentation capabilities allowed the lab to delineate cell boundaries without first having to manually label large proprietary training datasets.
Tier 2 (classical deterministic logic): Once the boundaries were defined, the software completely moved away from machine learning. To count individual fluorescent signals within these boundaries, the pipeline applied a classical and highly stable bright spot detection algorithm.

By using deterministic mathematics for the actual counting step, the team kept the analysis transparent and auditable. If a result is in doubt, a developer or scientist can inspect the exact parameters of the spot detection algorithm.
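The article does not say which spot detection algorithm the team used, so the following is only a sketch of one classical approach to the second tier: threshold the image, then count connected groups of bright pixels inside each segmented cell. The function name `count_spots` and the `threshold` and `min_pixels` parameters are assumptions, and the code is written in plain Python for readability; a real pipeline would use an image-processing library.

```python
from collections import deque
from typing import Dict, List


def count_spots(image: List[List[float]], labels: List[List[int]],
                threshold: float, min_pixels: int = 1) -> Dict[int, int]:
    """Count bright spots per cell.

    image:  2D grid of pixel intensities.
    labels: 2D grid of the same shape; 0 is background and k > 0 marks cell k
            (for example, the output of a segmentation step).
    A spot is a 4-connected group of at least `min_pixels` pixels that all
    exceed `threshold` and all belong to the same cell.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    counts: Dict[int, int] = {}
    for r in range(rows):
        for c in range(cols):
            cell = labels[r][c]
            if cell > 0:
                counts.setdefault(cell, 0)  # report cells with zero spots too
            if cell == 0 or seen[r][c] or image[r][c] <= threshold:
                continue
            # Flood-fill this bright region, staying inside the same cell.
            size = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and labels[ny][nx] == cell
                            and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size >= min_pixels:
                counts[cell] += 1
    return counts
```

Because the only tunable inputs are `threshold` and `min_pixels`, a skeptical reviewer can rerun the count on the same image with the same values and get the same answer, which is the auditability property described above.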

“It returns expected results with higher accuracy and consistency rates than most humans can perform,” Smagala said of the validated pipeline. “Now you’re ready to use that model in production. Until that point, it’s natural to be skeptical.”

What are the main red flags that indicate a successful AI pilot cannot scale?

Making an AI pilot succeed is relatively easy. In a controlled, isolated environment with one talented scientist and a clean, well-curated dataset, AI almost always shows great results. But most lab AI pilots fail to scale, according to Smagala. He identifies two major red flags that doom pilots once they are introduced into a high-volume production environment.

Operations and infrastructure blind spots

Successful pilots are inherently small-scale. They often run on a personal API access key or rely on a curated, static dataset.

“These infrastructures used for full-scale operations require a thorough construction plan and typically require support from multiple teams across the enterprise organization,” Smagala points out. Without early strategic alignment, a laboratory’s digital transformation efforts and new technology rollouts can create severe change fatigue among technical staff. To keep workflows running smoothly, operations leaders must plan major transitions so that bench technicians are shielded from that fatigue.

The solution is to involve cross-functional stakeholders, including IT, DevOps, cybersecurity, and data engineers, before writing the first line of pilot code.

The “happy path” validation trap

In pilot mode, developers often establish value with one or two important, clean use cases where nothing goes wrong: no noisy inputs, no corrupted files, no edge cases.

“If you don’t have a sufficiently rich and robust program for how the data interacts once you scale this up, you’re going to fail,” Smagala warns.

To scale without the administrative drag that comes with rapid expansion, labs cannot treat validation and testing as a simple checklist at the end of the process.

The solution is to treat AI pilots like full-fledged scientific experiments. That means building in explicit validation protocols, quality control checkpoints, and edge-case test plans so that, by the time the pilot is complete, it has sampled data diverse enough to show the system is production-ready.
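An edge-case test plan can be as simple as a table of awkward inputs paired with the behavior you expect. The sketch below applies that idea to a hypothetical parser for a single spreadsheet cell; the `parse_measurement` function, the `MISSING_TOKENS` set, and the specific cases are illustrative assumptions, not a prescribed protocol.

```python
import math
from typing import Optional

# Spellings treated as "no value recorded" (assumed; extend per lab).
MISSING_TOKENS = {"", "na", "n/a", "nan", "null", "-"}


def parse_measurement(raw: object) -> Optional[float]:
    """Parse one legacy spreadsheet cell into a float, or None if missing.

    Raises ValueError for anything that is neither a number nor a known
    missing-value token, so bad input is surfaced instead of guessed at.
    """
    if raw is None:
        return None
    text = str(raw).strip()
    if text.lower() in MISSING_TOKENS:
        return None
    # Accept a thousands separator such as "1,234.5" (assumes US-style numbers).
    cleaned = text.replace(",", "")
    try:
        value = float(cleaned)
    except ValueError:
        raise ValueError(f"unparseable measurement: {raw!r}") from None
    if not math.isfinite(value):
        raise ValueError(f"non-finite measurement: {raw!r}")
    return value


# Edge-case plan: (input, expected). Expected is a float, None, or the
# ValueError class when the parser should refuse the input.
EDGE_CASES = [
    ("12.5", 12.5),              # happy path
    ("  7 ", 7.0),               # stray whitespace
    ("1,234.5", 1234.5),         # thousands separator
    ("N/A", None),               # missing-value token
    (None, None),                # empty cell
    ("<LOD", ValueError),        # censored value needs a human decision
    ("12.5 ng/uL", ValueError),  # unit mixed into the value column
    ("inf", ValueError),         # non-finite number
]


def run_edge_cases() -> list:
    """Return a list of (input, expected, got) for every failing case."""
    failures = []
    for raw, expected in EDGE_CASES:
        try:
            got = parse_measurement(raw)
        except ValueError:
            got = ValueError
        if got != expected:
            failures.append((raw, expected, got))
    return failures
```

Only the first case is a happy path. The rest are the kinds of inputs that a clean pilot dataset tends to hide, and each new failure seen in production can be added to `EDGE_CASES` so the plan grows with the system.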

What is a practical roadmap to lab AI integration?

Step	Priority area	Practical deliverable
1	Data architecture blueprint	Establish a standardized taxonomy for lab data before purchasing AI tools
2	Targeted data cleanup	Clean, categorize, and archive legacy spreadsheets against the established blueprint using basic AI tools
3	Human-in-the-loop verification	Bring in veteran scientists as key reviewers to build guidelines and constraints into AI workflows
4	Cross-departmental cooperation	Coordinate proactively with IT, data engineering, and compliance departments during the pilot phase
5	Edge-case stress testing	Run explicit experiments to stress test AI models and confirm resilience in large-scale production environments

Moving a lab from a chaotic, spreadsheet-dependent state to a capable AI-enabled facility is not purely a technical challenge; it is also an organizational and scientific one. By defining a data blueprint up front, respecting the deep expertise of scientific teams, and treating AI implementation with the same rigor applied to physical experiments, labs can avoid common pitfalls and create truly modern, scalable environments.
