Overcoming Regression Debugging Challenges with Machine Learning

Automatically discover the root cause of simulation regression errors.


Modern semiconductor development requires many electronic design automation (EDA) tools to be run many times over the course of a project. Every step, from architectural exploration and design through final implementation and manufacturing readiness, involves methodology loops that must be repeated over and over.

Among these complex development flows, functional simulation stands out. It takes billions of simulation cycles to ensure that a chip design does everything it’s supposed to do without unintended behavior. This is not a one-time effort: every time part of the design changes, the entire simulation test suite, or at least a significant portion of it, must be rerun. The suite also grows over the course of the project, as tests are added to verify new features and to target areas of the design where bugs have been found.

Simulation regression requires a large number of tests to be run on a regular basis, typically a sampled subset nightly and the full suite weekly. Running these tests consumes substantial resources, and every failing test has to be investigated. Engineers make mistakes when adding new features to their designs or enhancing their test suites, and the resulting errors must be debugged and resolved.

In addition, tests that previously passed may fail in the updated regression run: new features often break existing functionality, and code edits can have unexpected ramifications. After significant changes to the verification environment, every test may fail. Debugging these failures is primarily manual and involves multiple steps:

Check in the latest changes to the design and testbench code
Run a regression simulation
Analyze log files containing thousands of test failures
Categorize the errors and sort them into “bins” by error type
Prioritize the bins to identify where the problem most likely lies
Perform root cause analysis (RCA) to try to pinpoint the actual bug
Try to fix the bug by changing the design or verification code
Start the loop over from the beginning
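The categorize and prioritize steps are often partly scripted before any ML is involved. The Python sketch below shows one common, purely illustrative approach, not a description of any Synopsys tool: it reduces each error message to a signature by masking run-specific values (hex addresses, simulation times, integers), groups failing tests by signature, and ranks the bins by size on the assumption that one bug often causes many failures. The masking rules and the function names signature, bin_failures, and prioritize are hypothetical.

```python
import re

# Illustrative sketch only: masking rules and names are hypothetical,
# not part of Verdi RDA or any other tool.
# Run-specific details that vary between failures caused by the same bug.
_VOLATILE = [
    (re.compile(r"0x[0-9a-fA-F]+"), "<HEX>"),
    (re.compile(r"\b\d+(?:\.\d+)?\s*(?:ns|ps|us)\b"), "<TIME>"),
    (re.compile(r"\b\d+\b"), "<N>"),
]


def signature(message: str) -> str:
    """Reduce an error message to a run-independent signature."""
    sig = message.strip()
    for pattern, token in _VOLATILE:  # order matters: hex and times first
        sig = pattern.sub(token, sig)
    return sig


def bin_failures(failures):
    """Group (test_name, error_message) pairs into bins keyed by signature."""
    bins = {}
    for test, message in failures:
        bins.setdefault(signature(message), []).append(test)
    return bins


def prioritize(bins):
    """Rank bins largest first, assuming one bug often fails many tests."""
    return sorted(bins.items(), key=lambda item: len(item[1]), reverse=True)


if __name__ == "__main__":
    failures = [
        ("test_a", "UVM_ERROR @ 1200 ns: scoreboard mismatch addr=0x40 exp=3 got=5"),
        ("test_b", "UVM_ERROR @ 860 ns: scoreboard mismatch addr=0x44 exp=1 got=0"),
        ("test_c", "Assertion fifo_overflow failed at 15 ns"),
    ]
    for sig, tests in prioritize(bin_failures(failures)):
        print(len(tests), sig)
```

Hand-written masking rules like these are brittle: a reworded message or an unmasked value splits one bug across several bins, which is part of the motivation for learning the relationships between failures instead.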

This process depends heavily on the expertise of the development engineers. Years of experience give us a sense of how best to classify and prioritize defects and assign them to the appropriate design and verification engineers for root cause analysis and remediation. However, because experienced engineers are hard to find, this manual approach adds significant time and resources to the project. Chip development teams have long sought better ways to manage and debug regression loops.

Artificial intelligence (AI) based on machine learning (ML) can now automatically analyze, bin, triage, and investigate regression failures and discover their root causes. By leveraging the vast amount of information gleaned from thousands of regression runs on a project, AI can complement traditional engineering expertise. Because it automates and accelerates three of the steps in each loop (binning, triage, and root cause analysis), ML can deliver faster and more accurate debugging than manual methods. By enabling engineers to find, understand, and fix bugs faster, ML can make overall debugging up to 30x more efficient.

The Regression Debug Automation (RDA) feature of the Synopsys Verdi Automated Debug System uses such ML techniques to automatically find the root cause of simulation regression failures. RDA classifies and analyzes raw regression failures to identify root causes of failures in designs and testbenches. Automating regression log analysis, binning, triage, and RCA significantly reduces manual effort.

RDA begins by collecting data from regression runs, such as simulation log files, value change dump (trace) files, and simulation databases compiled with the design and testbench. It then uses ML to explore relationships among the failures reported in the verification logs and bins the results. This process has been shown to produce accurate results 90% of the time and reduces overall triage time. After binning, RDA performs failure analysis and prioritization: it takes the failure bins and determines, based on the failure characteristics, whether each problem lies in the design or in the testbench.
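To make the idea of exploring relationships between failures concrete, the following Python sketch clusters error messages by how many words they share. It is a deliberately simple stand-in (greedy clustering on Jaccard similarity of word tokens, with an assumed threshold of 0.6), not the ML model used in Verdi RDA, whose details the article does not describe; the function names tokens, jaccard, and cluster are hypothetical.

```python
import re

# Illustrative stand-in for ML-based binning; not Verdi RDA's model.
# Function names and the 0.6 threshold are hypothetical choices.


def tokens(message: str) -> frozenset:
    """Lower-case word tokens, with hex values and digits removed first so
    run-specific numbers do not affect similarity."""
    text = re.sub(r"0x[0-9a-f]+|\d+", " ", message.lower())
    return frozenset(re.findall(r"[a-z_]+", text))


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity |A & B| / |A | B|, defined as 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(messages, threshold=0.6):
    """Greedy single-pass clustering: each message joins the first cluster
    whose representative (first member) is at least `threshold` similar,
    otherwise it starts a new cluster. Returns lists of message indices."""
    reps = []      # token set of each cluster's first member
    clusters = []  # message indices belonging to each cluster
    for i, msg in enumerate(messages):
        t = tokens(msg)
        for rep, members in zip(reps, clusters):
            if jaccard(t, rep) >= threshold:
                members.append(i)
                break
        else:  # no sufficiently similar cluster found
            reps.append(t)
            clusters.append([i])
    return clusters


if __name__ == "__main__":
    msgs = [
        "UVM_ERROR @ 1200 ns: scoreboard mismatch on read addr=0x40",
        "UVM_ERROR @ 860 ns: scoreboard mismatch on write addr=0x44",
        "Assertion fifo_overflow failed at 15 ns",
    ]
    print(cluster(msgs))  # [[0, 1], [2]]
```

Unlike exact signature matching, similarity-based grouping tolerates small wording differences (here "read" versus "write"), at the cost of a threshold that has to be tuned per project.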

RDA uses multiple technologies to find the root cause of failures. For design failures, it compares signal values from passing and failing tests to identify the points where they diverge near the test failure, and a visualization shows the RCA paths and signal value changes in the design. To resolve the root cause of testbench failures, the RDA Debug Facilitator automatically collects debug data for each failure bin. It displays protocol transactions and related details, and uses reverse debugging to go back in time and see the root cause of the problem.
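The pass/fail signal comparison can be illustrated with a small Python sketch. It assumes each waveform has already been extracted (for example, from a VCD file) into a list of (time, value) change events per signal, and that the passing and failing runs are aligned in time, such as the same test and seed before and after a change; both assumptions and all function names are hypothetical, not part of RDA. For each signal it finds the earliest time the two runs disagree, then keeps the divergences that fall within a window before the failure.

```python
from bisect import bisect_right

# Illustrative sketch; assumes pre-extracted, time-aligned waveforms.
# Function names are hypothetical, not Verdi RDA APIs.


def value_at(changes, t):
    """Value of a signal at time t, given (time, value) change events sorted
    by time; None before the first change."""
    times = [time for time, _ in changes]
    idx = bisect_right(times, t)
    return changes[idx - 1][1] if idx else None


def first_divergence(pass_changes, fail_changes):
    """Earliest time at which two waveforms of one signal differ, or None."""
    # Values only change at event times, so checking those times suffices.
    event_times = sorted({t for t, _ in pass_changes} | {t for t, _ in fail_changes})
    for t in event_times:
        if value_at(pass_changes, t) != value_at(fail_changes, t):
            return t
    return None


def divergence_points(passing, failing, fail_time, window):
    """(time, signal) pairs for signals whose passing and failing values first
    diverge within `window` time units before `fail_time`, earliest first."""
    points = []
    for sig in passing.keys() & failing.keys():
        t = first_divergence(passing[sig], failing[sig])
        if t is not None and fail_time - window <= t <= fail_time:
            points.append((t, sig))
    return sorted(points)


if __name__ == "__main__":
    passing = {
        "req": [(0, "0"), (10, "1"), (20, "0")],
        "grant": [(0, "0"), (12, "1"), (22, "0")],
        "data": [(0, "00"), (12, "a5")],
    }
    failing = {
        "req": [(0, "0"), (10, "1"), (20, "0")],
        "grant": [(0, "0"), (15, "1"), (25, "0")],
        "data": [(0, "00"), (15, "ff")],
    }
    print(divergence_points(passing, failing, fail_time=30, window=25))
    # [(12, 'data'), (12, 'grant')]
```

Sorting by time puts the earliest divergence first, which is often closer to the cause than the point where a checker finally reported the error.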

Synopsys Verdi RDA includes additional features to further save engineers time and effort.

Failed tests are automatically rerun in simulation with reverse debugging and other debug features enabled
Testbench RCA recognizes the widely used Universal Verification Methodology (UVM)
RCA is performed on test failures involving unknown (X) values, reducing the number of groups
Test failures due to simulation are excluded
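For failures involving unknown (X) values, one way to see how RCA can reduce the number of groups is to trace each X back through the design's fan-in to the signals where it originates, then group failures by those sources. The Python sketch below assumes a precomputed driver graph and, for each failing test, the set of signals that are X at the failure time; it illustrates the general idea rather than RDA's algorithm, and the names x_roots and group_by_x_source, like the example signals, are hypothetical.

```python
# Illustrative sketch of grouping X-related failures by X source.
# The driver graph, signal names, and functions are hypothetical.


def x_roots(signal, drivers, x_signals):
    """Walk back from an X-valued signal through its X-valued drivers and
    return the X signals that have no X-valued driver (the likely X sources)."""
    roots, seen, stack = set(), set(), [signal]
    while stack:
        sig = stack.pop()
        if sig in seen:  # guards against feedback loops such as FSM state
            continue
        seen.add(sig)
        x_drivers = [d for d in drivers.get(sig, []) if d in x_signals]
        if x_drivers:
            stack.extend(x_drivers)
        else:
            roots.add(sig)
    return frozenset(roots)


def group_by_x_source(failures, drivers):
    """Group (test, failing_signal, x_signals) records by their X sources."""
    groups = {}
    for test, failing_signal, x_signals in failures:
        key = x_roots(failing_signal, drivers, x_signals)
        groups.setdefault(key, []).append(test)
    return groups


if __name__ == "__main__":
    # Hypothetical fan-in graph: each signal maps to the signals driving it.
    drivers = {
        "out_valid": ["fsm_state", "en"],
        "fsm_state": ["next_state"],
        "next_state": ["cfg_mode", "fsm_state"],
        "out_data": ["data_reg"],
        "data_reg": ["cfg_mode", "din"],
    }
    failures = [
        ("test_1", "out_valid", {"out_valid", "fsm_state", "next_state", "cfg_mode"}),
        ("test_2", "out_data", {"out_data", "data_reg", "cfg_mode"}),
        ("test_3", "out_data", {"out_data", "data_reg", "din"}),
    ]
    groups = group_by_x_source(failures, drivers)
    print(len(groups))  # 2: test_1 and test_2 share the X source cfg_mode
```

In this toy example, test_1 and test_2 fail on different outputs, so message-based binning would likely separate them, yet both trace back to the same X source and collapse into one group.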

All these automated techniques harness ML to accelerate the three most difficult steps of the regression loop. More accurate debugging means that fixes are much more likely to be correct the first time, which greatly reduces the overall number of loops in the project. Verdi RDA reduces the number of failed tests that engineers must debug manually and saves considerable time and effort across all of the failures. This maximizes regression utilization, focuses manual effort on high-value debugging instead of automatable tasks, and cuts the overall regression debugging effort on a chip project in half.

For more information, please refer to the white paper.

Robert Ruiz


Robert Ruiz is Director of Product Management at Synopsys, Inc. Ruiz has held various marketing and technical positions for test automation and functional verification products at Synopsys, Novas Software, and Viewlogic Systems. His background includes over 17 years of experience in advanced test design methodologies and several years as an ASIC designer. Ruiz holds a BSEE from Stanford University.
