Lab AI Data Barriers: How to Maximize ROI

Sensational headlines about multimillion-dollar AI breakthroughs, such as generative drug discovery and AI-assisted interpretation of genomic variations, often make advanced computing feel out of reach for midsize labs without large budgets. In a session at the 2026 Leadership Summit on Deploying AI in the Laboratory, Yahara Software’s bioinformatics practice manager James Smagala and chief technology officer Adam Steinert demystified AI in the modern laboratory. Their core message: while large-scale drug discovery grabs the headlines, operational AI offers an immediate, high-value return on investment (ROI) for midsize, clinical, and industrial laboratories.

James Smagala, Bioinformatics Practice Manager at Yahara Software.

According to research highlighted during the session, a significant percentage of organizations leveraging AI report moderate to high ROI. The trick to capturing this value is to shift the focus from speculative scientific advances to the fundamental data generated every day.

How is operational AI different from generative tools?

To successfully implement AI, lab managers must first understand where the technology actually fits into their workflow.

Generative AI (Gemini, ChatGPT, Claude, etc.) and advanced bioinformatics represent one end of the spectrum, but labs are seeing much more direct value in targeted machine learning and machine vision. These operational tools address practical bottlenecks.

  • Workflow automation: Bridge the gap between disparate systems (such as instruments and LIMS) so that data moves between them without manual handoffs.
  • Predictive maintenance of equipment: Analyze service records and technician usage to predict equipment failures before they interrupt work.
  • Quality control (QC) automation: Streamline raw data validation, automate quality control routines, and identify anomalous batches early in the analysis pipeline.
  • Machine vision and object detection: Automate cell or particle counting, measurement, and morphological profiling.

Automating these processes does not replace human experts. Instead, it elevates them. Smagala pointed out that automation removes tedium from workflows, improves quality, increases sample throughput, and gives experts the opportunity to focus on critical thinking.
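
As a rough illustration of the QC bullet above, here is a minimal sketch of flagging anomalous batches early. It assumes each batch has a single numeric control reading and uses a modified z-score built on the median and the median absolute deviation (MAD). The function name, batch IDs, readings, and the 3.5 cutoff are illustrative assumptions, not something presented in the session.

```python
from statistics import median


def flag_anomalous_batches(batch_values, threshold=3.5):
    """Return IDs of batches whose modified z-score exceeds the threshold.

    batch_values maps a batch ID to one numeric control reading.
    The median and MAD are used because they are less distorted by the
    outliers we are trying to find than the mean and standard deviation.
    """
    values = list(batch_values.values())
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no measurable spread, so nothing can be scored
    flagged = []
    for batch_id, value in batch_values.items():
        modified_z = 0.6745 * (value - center) / mad
        if abs(modified_z) > threshold:
            flagged.append(batch_id)
    return flagged


# Hypothetical control readings for seven batches.
control_readings = {
    "B001": 10.1, "B002": 9.9, "B003": 10.0, "B004": 10.2,
    "B005": 9.8, "B006": 14.5, "B007": 10.1,
}
flagged = flag_anomalous_batches(control_readings)
print(flagged)
```

A rule this simple will not catch every failure mode, but it is cheap to validate and gives a reviewer a short list to inspect before the batch moves further down the pipeline.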

Why does data quality hinder machine learning?

The most important baseline for any AI initiative can be summed up in one of the Yahara team’s favorite sayings: “AI science is data science, and data science is science.” Many labs have been operating for 15, 20, or 25 years and believe they have a treasure trove of training data. But when you try to run an AI project, you run into what Steinert and Smagala call the “unfortunate surprise of AI.” Operational data is rarely cleaned, normalized, or linked to validated results through standard method validation practices. In the real-world project discussed during the session, the lab had 220,000 sample records, but only 800 had enough consistent metadata to be used for machine learning.
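
For scale, 800 usable records out of 220,000 is roughly 0.36 percent. A first-pass readiness check can be sketched as below, assuming records are available as dictionaries. The required field names and the sample records are hypothetical, and a real lab would define "usable" against its own validated methods.

```python
# Hypothetical metadata fields a record needs before it can be used for training.
REQUIRED_FIELDS = ("sample_id", "method", "instrument", "result")


def is_usable(record):
    """A record is usable only if every required field is present and non-blank."""
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            return False
    return True


def usable_fraction(records):
    """Return (usable_records, fraction_usable) for a list of record dicts."""
    usable = [r for r in records if is_usable(r)]
    fraction = len(usable) / len(records) if records else 0.0
    return usable, fraction


# Made-up records showing typical gaps: a blank method and a missing instrument.
records = [
    {"sample_id": "S1", "method": "M-01", "instrument": "HPLC-2", "result": 4.2},
    {"sample_id": "S2", "method": "", "instrument": "HPLC-2", "result": 3.9},
    {"sample_id": "S3", "method": "M-01", "result": 4.0},
    {"sample_id": "S4", "method": "M-02", "instrument": "HPLC-1", "result": 4.4},
]
usable, fraction = usable_fraction(records)
print(f"{len(usable)} of {len(records)} usable ({fraction:.0%})")

# The figures quoted in the session, for comparison.
session_fraction = 800 / 220_000
```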

Adam Steinert, Chief Technology Officer at Yahara Software.

Over a decade, SOPs change, equipment is upgraded, software is patched, and different technicians enter data in different ways. The result is “temporal drift.” When unnormalized data is fed into an AI model, it introduces significant bias and produces highly consistent but fundamentally inaccurate results.
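
One simple way to look for this kind of drift before training is to compare the same control measurement across time periods. The sketch below assumes records reduce to (year, value) pairs and flags any year whose mean differs from a chosen baseline year by more than a tolerance. The data, the baseline year, and the tolerance are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def period_means(records):
    """Group (year, value) pairs by year and return the mean for each year."""
    by_year = defaultdict(list)
    for year, value in records:
        by_year[year].append(value)
    return {year: mean(values) for year, values in sorted(by_year.items())}


def drifted_periods(records, baseline_year, tolerance):
    """Return years whose mean differs from the baseline mean by more than tolerance."""
    means = period_means(records)
    baseline = means[baseline_year]
    return [year for year, m in means.items() if abs(m - baseline) > tolerance]


# Hypothetical control readings; imagine an instrument upgrade before 2022.
control_history = [
    (2015, 5.0), (2015, 5.2),
    (2018, 5.1), (2018, 5.3),
    (2022, 6.4), (2022, 6.6),
]
drifted = drifted_periods(control_history, baseline_year=2015, tolerance=0.5)
print(drifted)
```

A flagged period does not say what changed, only that records from that period should not be pooled with the baseline until someone traces the cause and normalizes the data.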

For lab managers, this means that an AI project is first and foremost a data cleanup project. Cleansing, normalizing, and structuring data can often take up to a year, but if the data foundation is strong, actual AI implementation takes only a fraction of that time.

How can I perform image analysis without a custom model?

Steinert presented a case study that addressed a common bottleneck: manual microscopy analysis.

In collaboration with Dr. Pippa Cosper of the Department of Human Oncology at the University of Wisconsin-Madison, the team investigated a workflow that included FISH (fluorescence in situ hybridization) microscopy. The study required graduate students to manually view cell images, identify boundaries, and count chromosomes to find duplications and deletions. This process was time-consuming, highly repetitive, and subject to individual human bias.

To automate this, the team designed a hybrid pipeline.

  • Cell segmentation with pre-trained AI: The team used MicroSAM, a pre-trained cell image model, to identify cell boundaries within dense clusters without a custom training dataset.
  • Traditional logic for counting: Instead of a complex AI model, the team implemented a classical, highly stable bright-spot detection algorithm within the identified boundaries.

This hybrid approach offers an important lesson for laboratory managers: don’t use complex AI models when simple mathematics or classic computer science is sufficient. By combining pre-trained AI vision with traditional algorithms, the team built a highly consistent and reproducible pipeline that outputs structured JSON data, allowing Dr. Cosper’s lab to scale its research without manual bottlenecks.
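
The counting half of such a pipeline can be sketched in a few lines. This is not the team's implementation: it assumes a segmentation step (for example, a pre-trained model) has already produced a label mask in which 0 is background and each positive integer marks one cell, and it stands in a plain intensity threshold plus flood fill for the bright-spot detector. The tiny image, the mask, the threshold, and the JSON field names are all illustrative.

```python
import json


def count_spots_per_cell(image, labels, threshold):
    """Count connected groups of bright pixels inside each labeled cell.

    image and labels are same-shaped lists of lists. A pixel is "bright"
    if its intensity is at least threshold. Bright pixels that touch
    (up, down, left, right) within the same cell count as one spot.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    counts = {}
    for r in range(rows):
        for c in range(cols):
            cell = labels[r][c]
            if cell == 0:
                continue  # background pixel
            counts.setdefault(cell, 0)
            if seen[r][c] or image[r][c] < threshold:
                continue
            counts[cell] += 1  # first pixel of a new spot
            seen[r][c] = True
            stack = [(r, c)]
            while stack:  # flood-fill the rest of this spot
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and labels[ny][nx] == cell
                            and image[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return counts


# Hypothetical mask with two cells separated by a background column.
labels = [
    [1, 1, 1, 0, 2, 2],
    [1, 1, 1, 0, 2, 2],
    [1, 1, 1, 0, 2, 2],
    [1, 1, 1, 0, 2, 2],
]
# Hypothetical single-channel intensities.
image = [
    [9, 0, 0, 0, 0, 0],
    [0, 0, 8, 0, 9, 9],
    [0, 0, 8, 0, 0, 0],
    [7, 0, 0, 0, 0, 0],
]
counts = count_spots_per_cell(image, labels, threshold=5)
report = json.dumps(
    [{"cell_id": cell, "spot_count": n} for cell, n in sorted(counts.items())]
)
print(report)
```

Because the counting step is deterministic, the same image and mask always produce the same JSON, which makes the output straightforward to audit against a manual count.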

What are the long-term validation requirements?

For lab managers accustomed to purchasing commercial off-the-shelf (COTS) software such as LIMS, AI requires a major mindset shift in project management and budgeting.

AI efforts work differently. Although they often start small with minimal upfront costs, operational expenditures (OpEx) grow as systems are aligned, integrated, and scaled throughout the organization. AI models operate on real-world, ever-changing data and require continuous monitoring.

During the session, the team cautioned that AI systems evolve over time and should be treated as living systems rather than set-and-forget projects. Lab managers must plan for ongoing change control, ongoing data quality validation, and periodic model recalibration to ensure long-term accuracy.
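
What a monitoring check might look like in practice: the sketch below assumes the lab periodically has an expert verify a sample of model outputs and stores each check as a (model_value, reference_value) pair in chronological order. The window size, the 80 percent agreement floor, and the data are hypothetical. A real threshold would come from the lab's own validation plan.

```python
def agreement_rate(pairs):
    """Fraction of (model_value, reference_value) pairs that match exactly."""
    return sum(1 for model, reference in pairs if model == reference) / len(pairs)


def needs_recalibration(pairs, window=5, min_agreement=0.8):
    """True if agreement over the most recent `window` checks falls below the floor."""
    recent = pairs[-window:]
    if not recent:
        return False  # nothing has been verified yet
    return agreement_rate(recent) < min_agreement


# Hypothetical spot checks: the model agrees with the expert early on, then slips.
spot_checks = [
    (2, 2), (3, 3), (2, 2), (4, 4), (2, 2),
    (3, 2), (2, 2), (4, 3), (2, 2), (3, 2),
]
early_flag = needs_recalibration(spot_checks[:5])
current_flag = needs_recalibration(spot_checks)
print(early_flag, current_flag)
```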

This continuous monitoring is also important for regulatory compliance. Agencies such as the Food and Drug Administration (FDA) and the Centers for Medicare and Medicaid Services (CMS) are becoming increasingly adept at evaluating software platforms. For example, the FDA has provided specific guidance on artificial intelligence in medical devices, focusing on lifecycle management, good machine learning practices, and having a change management plan in place.

Similarly, clinical laboratories must align software validation with established CLIA program standards managed by CMS to ensure test accuracy and reliability. Validation of AI tools therefore follows the same rigorous, documented validation procedures as traditional laboratory equipment.

How should teams build their initial roadmap?

If your executive team is pushing for AI adoption, or your lab has identified bottlenecks, the Yahara team recommended the following steps to get your initial efforts underway.

  • Assess data readiness: Catalog your existing data assets and map where they reside, whether they are structured or semi-structured, and how their format has changed over the past five years.
  • Prioritize data quality over problem size: Target initial AI pilots at the cleanest, most organized datasets rather than the most complex bottlenecks to ensure a faster, smoother proof of concept.
  • Focus on consistency over accuracy: Prioritize pipeline reproducibility first, then work with scientists to fine-tune parameters for accuracy.
  • Involve IT and data science early: Engage immediately with technology partners to address security, data privacy (particularly avoiding PII disclosure), and hardware infrastructure as day-one requirements.
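
The "consistency over accuracy" step above lends itself to a simple automated check. The sketch below assumes the pipeline returns JSON-serializable output and considers it reproducible if repeated runs on the same input produce an identical fingerprint. The helper names and the toy pipeline are hypothetical stand-ins for a lab's real analysis step.

```python
import hashlib
import json


def output_fingerprint(result):
    """SHA-256 of a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_reproducible(pipeline, sample, runs=3):
    """Run the pipeline several times on one input and compare fingerprints."""
    fingerprints = {output_fingerprint(pipeline(sample)) for _ in range(runs)}
    return len(fingerprints) == 1


def toy_pipeline(sample):
    """Hypothetical stand-in for a real analysis step."""
    return {"count": len(sample), "total": sum(sample)}


reproducible = is_reproducible(toy_pipeline, [1, 2, 3])
print(reproducible)
```

Passing this check says nothing about whether the answers are right. It only establishes a stable baseline that scientists can then tune against.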

AI is no longer just a tool for big tech and big pharma. By focusing on operational data, cleaning up existing pipelines, and leveraging pre-trained models, labs can eliminate manual drudgery, optimize equipment usage, and free scientists to focus on what they do best: real science.


