What do you really need for advanced testing?



Greg Prewitt and Marc Jacobs

Advanced testing has become one of the semiconductor industry’s most promising frontiers: adaptive binning, feedforward models, and real-time analysis that extract signals from mountains of measurement data. But beneath all that ambition lies a problem. It’s not computing or algorithms. It’s data. More specifically, it’s the humble, fundamental question of whether the data flowing through the fab-to-test chain is clean, complete, and properly correlated in the first place.

At our recent PDF Solutions User Conference, we showcased some of the solutions we offer to ensure accurate data is available throughout the testing supply chain.

In this blog post, we identify where the real bottlenecks are, what a good data infrastructure looks like, and why the industry’s aspirations for machine learning have outpaced the data plumbing required to support it.

1. How often are adaptive testing and binning compromised by poor data correlation?

Simply put, it happens more often than engineers realize, and it is much harder to detect than a simple measurement error.

When human analysts perform exploratory analysis, intuition lets them work around incomplete metadata. They notice that the data is disparate, recognize anomalies, and adjust course. That tolerance disappears the moment automation is introduced. When test results from one operation are fed into a model, or when voltage thresholds and current readings from an earlier step are passed to a later test operation, the software cannot intuitively route around broken associations. If the metadata is not aligned, downstream operations simply do not receive the context they are designed to use.

The failure mode is quiet: a test operation expects previous measurements or predictions, and they are simply not there, because the associations in the data were never made.
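
To make that concrete, here is a minimal Python sketch of the kind of explicit guard an automated flow needs where a human analyst would simply have used judgment. The record structure and parameter names (die_id, vth_probe, idd_probe) are hypothetical, not taken from any particular test program.

```python
# Minimal sketch: guard a feedforward-dependent test step against missing
# upstream context. Field names are hypothetical placeholders.

REQUIRED_UPSTREAM_KEYS = {"vth_probe", "idd_probe"}  # parameters the next test op expects

def check_feedforward_context(die_record: dict) -> list[str]:
    """Return the upstream parameters missing for this die."""
    return sorted(REQUIRED_UPSTREAM_KEYS - die_record.keys())

incoming = [
    {"die_id": "A01", "vth_probe": 0.42, "idd_probe": 1.3e-3},
    {"die_id": "A02", "vth_probe": 0.44},   # idd_probe never correlated
    {"die_id": "A03"},                      # metadata link broken entirely
]

for die in incoming:
    missing = check_feedforward_context(die)
    if missing:
        # A human analyst would notice the gap; an automated flow must be told.
        print(f"{die.get('die_id', '?')}: missing upstream data {missing}, "
              "falling back to default test flow")
```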

This is not a special case. Both extracted parameter feedforward and model-driven feature engineering are becoming increasingly mainstream in testing complex advanced packages. The customers who take data quality most seriously make it a formal key performance indicator (KPI) and monitor data standardization metrics and data health scores on a weekly basis. This level of vigilance suggests that data correlation failures are common and require sustained, structured attention rather than reactive debugging.

2. What does a good data infrastructure look like?

Where are manufacturers falling short? The most common gap is metadata. Capturing all of the correct identity information is still a challenge. When test data is fed back into design and process decisions, missing or mismatched identifiers are more than an inconvenience: they break the link that enables feedforward and adaptive testing.

There are two fundamental requirements for a good data infrastructure that most manufacturers either underinvest in or implement only partially.

The first is collecting data directly at the tool. Relying on outsourced semiconductor assembly and test (OSAT) providers to bundle and deliver data after the fact creates opportunities for omissions that compromise the integrity of the data semiconductor companies receive. Collecting directly at the point of measurement, by contrast, provides the best data quality and the most complete picture.

The second requirement is cross-referencing against a system of record, typically a manufacturing execution system (MES) or enterprise resource planning (ERP) system, that knows which lots have been released for build and packaging, which devices should be grouped together, and what the expected lot structure is. Cross-referencing incoming data against this ground truth serves two purposes: it allows the data to be augmented and corrected when something is missing or wrong, and it provides a benchmark for continuously evaluating data quality. Both are preferable to relying on manual input or OSAT-provided fields, which may be inconsistent or incomplete.
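
As an illustration, a minimal sketch of this cross-referencing step might look like the following. The manifest fields and lot identifiers are invented for the example and do not reflect any particular MES or ERP interface.

```python
# Minimal sketch: audit incoming test data against a system-of-record manifest.
# All structures and values below are illustrative assumptions.

mes_manifest = {
    "LOT123": {"expected_units": 500, "product": "X100", "released_for_assembly": True},
    "LOT124": {"expected_units": 250, "product": "X100", "released_for_assembly": False},
}

incoming_lots = {
    "LOT123": {"units_reported": 498, "product": "X100"},
    "LOT124": {"units_reported": 250, "product": "X100"},
    "LOT999": {"units_reported": 300, "product": "X100"},   # unknown to the MES
}

def audit_against_system_of_record(manifest, incoming):
    issues = []
    for lot_id, data in incoming.items():
        truth = manifest.get(lot_id)
        if truth is None:
            issues.append((lot_id, "lot not found in system of record"))
            continue
        if not truth["released_for_assembly"]:
            issues.append((lot_id, "lot was never released for build/packaging"))
        if data["units_reported"] != truth["expected_units"]:
            issues.append((lot_id, f"unit count mismatch: "
                                   f"{data['units_reported']} vs {truth['expected_units']}"))
    return issues

for lot, problem in audit_against_system_of_record(mes_manifest, incoming_lots):
    print(lot, "->", problem)   # feeds both data correction and a data-health score
```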

3. How do you distinguish measurement issues from true process variation?

This is an area we call “test process control” and involves two levels of logic.

The first question that needs to be answered is whether the deviation is a measurement problem: poor contacts, worn probe cards, leaks in the test setup, or other instrumentation artifacts that can generate false alarms or completely invalidate certain results. These setup-induced anomalies must be ruled out before drawing any conclusions about the process.

Once the setup is confirmed to be sound, the second step is to monitor a small set of critical parameters that have the greatest impact on the diagnosis. Monitoring everything is not realistic, so the focus is on the parameters that are most diagnostically relevant. For example, one of PDF Solutions’ customers pays particular attention to the die temperature at which measurements are taken, treating it as a plausibility check before interpreting electrical results. Automated rules can watch these parameters and trigger alerts when excursions occur.
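
A rule of this kind can be very simple. The sketch below shows one hypothetical plausibility check on die temperature; the limits and field names are assumptions for illustration, not values from any customer.

```python
# Minimal sketch of an automated plausibility rule on a monitored parameter
# (die temperature before interpreting electrical results). Limits are invented.

TEMP_LOW_C, TEMP_HIGH_C = 20.0, 95.0   # hypothetical valid test-temperature window

def flag_temperature_excursions(measurements):
    """Yield (unit_id, temp_c) for units whose die temperature is implausible."""
    for unit_id, temp_c in measurements:
        if not (TEMP_LOW_C <= temp_c <= TEMP_HIGH_C):
            yield unit_id, temp_c

results = [("U1", 34.2), ("U2", 118.7), ("U3", 33.9), ("U4", -3.0)]

for unit_id, temp_c in flag_temperature_excursions(results):
    # Electrical results from these units are suspect until the setup is checked.
    print(f"Excursion: {unit_id} measured at {temp_c} °C; hold electrical interpretation")
```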

Can analytics automatically determine whether an excursion originated in the process or in the test chain? The system can recognize that something is wrong and aggregate metrics (daily, weekly, and lot-by-lot trends) to reveal it, but automatically narrowing down the root cause is limited by the available data. Distributed manufacturing rarely gives any one party visibility into every data type across the supply chain, which limits the scope of automated attribution. The system makes suggestions, but engineers still need to close the loop.

Essentially, the value of analysis in this context is not to make a final judgment but to generate a ranked list of plausible root causes. Trained engineers can quickly reject physically nonsensical proposals and focus their investigation on what remains. If the machine comes up with five possibilities, three of which merit further investigation and two of which are complete nonsense, the engineer can dismiss those two almost immediately and spend the time on the three that matter.

4. Where is machine learning being applied effectively, and where is it still just ambition?

After years of widespread enthusiasm and uncertain adoption, the industry has reached what it calls “de facto acceptance” for one particular use case: using models to generate data for feedforward between test steps.

Today, this mechanism is still largely asynchronous: after each step, the model is run, features are engineered, predictions are made, and the results are fed into the next operation. The benefit is twofold: improved test efficiency and reduced quality risk. This is real, in production with multiple customers, and it is the clearest demonstration of the value of machine learning (ML) in today’s test data chain.
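
As a rough illustration of that asynchronous pattern, the sketch below runs a stand-in model over a completed lot and stages a per-unit risk score for the next operation to consume. The feature names, threshold, and scoring logic are placeholders for illustration, not a production recipe.

```python
# Minimal sketch of asynchronous feedforward: after one test operation completes,
# a batch job engineers features, runs a model, and stages predictions for the
# next operation. All names and thresholds are hypothetical.

from statistics import mean

def engineer_features(unit_results):
    """Collapse raw measurements from the previous step into model features."""
    return {"mean_vth": mean(unit_results["vth"]), "max_idd": max(unit_results["idd"])}

def predict_risk(features):
    """Stand-in for a trained model; returns a risk score used to adapt the next test."""
    return 0.8 if features["max_idd"] > 2e-3 else 0.1

def feedforward_batch(completed_lot):
    staged = {}
    for unit_id, unit_results in completed_lot.items():
        staged[unit_id] = predict_risk(engineer_features(unit_results))
    return staged   # e.g. written to a store the next test operation reads at insertion

lot = {
    "U1": {"vth": [0.41, 0.42], "idd": [1.1e-3, 1.2e-3]},
    "U2": {"vth": [0.47, 0.49], "idd": [2.4e-3, 2.6e-3]},
}
print(feedforward_batch(lot))   # {'U1': 0.1, 'U2': 0.8} -> U2 gets extended test coverage
```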

What is not yet ready for widespread deployment is synchronous real-time model inference that runs inline with the test operations themselves. While some vendors can technically support this, most still cannot justify the investment or maintain the level of confidence required to process model output in real time without human review.

The issue of confidence is central. Large language models that advertise 90% accuracy are not performing at the level required in semiconductor manufacturing. A 10% error rate may be acceptable for consumer applications, but it can be devastating in high-stakes yield and quality decisions. As a result, human oversight remains essential, not because the models are bad, but because the cost of acting unchecked on a bad model output is too high to accept.

If we look more closely, we see a logical endpoint: a model that monitors a model. If the Tier 1 model’s predictions start to diverge from the actual test results, the monitoring model can detect the divergence and flag it before it propagates to downstream decision-making. So far, this is still not commonly done at scale, but the logic is sound since the amount of data generated already exceeds what human reviewers can actually inspect.
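
One way to picture such a monitoring model is a rolling comparison of the Tier 1 model’s predictions against actual results, as in the sketch below. The window size and error threshold are arbitrary assumptions.

```python
# Minimal sketch of a "model that monitors a model": compare Tier 1 predictions
# against actual test results and flag drift before it reaches downstream decisions.

from collections import deque

class DriftMonitor:
    def __init__(self, window=200, max_error=0.05):
        self.errors = deque(maxlen=window)   # rolling window of |predicted - actual|
        self.max_error = max_error

    def update(self, predicted, actual):
        self.errors.append(abs(predicted - actual))
        mean_err = sum(self.errors) / len(self.errors)
        return mean_err > self.max_error     # True -> raise a review flag

monitor = DriftMonitor(window=3, max_error=0.05)
pairs = [(0.50, 0.51), (0.48, 0.47), (0.52, 0.61), (0.50, 0.64)]  # drift begins
for predicted, actual in pairs:
    if monitor.update(predicted, actual):
        print(f"Divergence detected at prediction {predicted}: route to human review")
```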

PDF Solutions has also developed a product line aimed at filling this gap: models make a preliminary pass over all of the collected data and surface the subsets that require human attention. The goal is to give analysts a manageable review workload without forcing them to sift through everything manually.

5. What are the biggest data quality problems manufacturers can actually solve?

There are two possible answers to this question.

The first is consistency in metadata, specifically consistent labeling of tests and measurements across products and facilities. This problem becomes acute when companies try to train models that generalize across product families. Even within a single organization, tests that measure the same underlying electrical characteristic may carry different names across product lines, making it impossible for models to recognize the correspondence. Usually, a group of experienced engineers around a table can resolve it; machines cannot, at least not yet. Investing in naming standards and test data models is therefore not a bureaucratic exercise. It is a prerequisite for scalable ML.
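
A naming standard ultimately boils down to a mapping that machines can apply once engineers have agreed on it. The sketch below illustrates the idea with invented test names; any real mapping would be far larger and product-specific.

```python
# Minimal sketch: map product-specific test names onto canonical names so a model
# can recognize that they measure the same characteristic. All names are invented.

CANONICAL_TEST_NAMES = {
    "VTH_NMOS_25C":  "threshold_voltage",
    "Vt_N_room":     "threshold_voltage",
    "NVT_TYP":       "threshold_voltage",
    "IDDQ_STBY":     "standby_leakage",
    "Idd_quiescent": "standby_leakage",
}

def normalize_test_label(raw_name: str) -> str:
    try:
        return CANONICAL_TEST_NAMES[raw_name]
    except KeyError:
        # Unmapped names are exactly what the engineers around the table must resolve.
        raise KeyError(f"No canonical mapping for test '{raw_name}'; model training blocked")

print(normalize_test_label("Vt_N_room"))   # -> threshold_voltage
print(normalize_test_label("IDDQ_STBY"))   # -> standby_leakage
```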

A second, and perhaps more structural, source of problems is mergers and acquisitions. When two companies merge, they almost inevitably bring incompatible data standards, different MES systems, different lot naming conventions, and different test labeling schemes into the shared organization. Rationalizing these differences is technically demanding and politically complex, and it is rarely treated as a top priority during integration. But it sits squarely on the critical path for intelligent testing initiatives that rely on cross-facility data.

There is also a temporal dimension to this problem. There are both future issues (upcoming standardization) and historical issues (what to do with legacy products that were built on older systems and may remain in production for another decade). To derive value from integrated analysis, both parts must be addressed.

Chiplets add an additional layer of complexity. Traceability challenges become acute when integrated devices from multiple companies or multiple internal teams are bundled into a single package. If a package fails, tracing the failure through multiple chiplets, each of which can have its own ID scheme, its own test history, and its own data format, requires an infrastructure that the industry is still building. Traceability within a single company’s chiplet ecosystem is difficult enough. Across company boundaries, you may need a neutral data intermediary that can analyze combined data without exposing proprietary process information to competitors. Although initial efforts in this direction exist, the problem is not yet resolved.
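
One way to picture the needed infrastructure is a package-level record that ties each chiplet’s native identifiers and test-history pointers to a single package ID. The sketch below is hypothetical; the suppliers, IDs, and references are invented.

```python
# Hypothetical sketch: one package-level traceability record that links each
# chiplet's own ID scheme and test-history pointer to a single package ID.

from dataclasses import dataclass, field

@dataclass
class ChipletRecord:
    supplier: str          # who built and tested this chiplet
    native_id: str         # ID in the supplier's own scheme
    test_history_ref: str  # pointer to the supplier's test data, not the data itself

@dataclass
class PackageTraceability:
    package_id: str
    chiplets: list[ChipletRecord] = field(default_factory=list)

pkg = PackageTraceability(
    package_id="PKG-0001",
    chiplets=[
        ChipletRecord("LogicCo",   "LC-7F3A",     "logicco-feed/LC-7F3A"),
        ChipletRecord("MemVendor", "MV_2024_118", "osat-feed/MV_2024_118"),
    ],
)

# When the package fails, every chiplet's test history is reachable from one record.
for c in pkg.chiplets:
    print(f"{pkg.package_id}: {c.supplier} chiplet {c.native_id} -> {c.test_history_ref}")
```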

The underlying theme

Across all of these topics, a consistent theme emerges: the value of intelligent testing is limited by the quality and structure of the data fed into it. Models, analytics, and adaptive algorithms are only as reliable as the measurements and metadata they consume.

The practical implication for manufacturers is that the highest-leverage investments today are not in more sophisticated algorithms. They are in the infrastructure underneath: direct data collection at the tool, cross-referencing against systems of record, consistent metadata standards, and traceability links that connect test history back to the device at every stage of the manufacturing process.

From our perspective, the greatest value an analytics provider can offer is the ability to collect, align, and normalize data, and to deploy models wherever they are needed. The models themselves matter, but the underlying data engineering platform matters more.

The industry has overcome the computing bottleneck. The next mountain to climb is the data quality bottleneck, and unlike computing, it can’t be solved by buying better hardware.

Let’s explore these topics in more detail.

Marc Jacobs is senior director of solution architecture, fabless solutions at PDF Solutions.


