
Science aims to discover concise, explanatory formulas that are consistent with background theory and experimental data. Traditionally, scientists have derived laws of nature by manipulating equations and verifying them experimentally, but this process is slow and laborious. Although the scientific method has advanced our understanding, the rate of discovery and its economic impact appear to have stagnated, in part because the most easily accessible scientific insights have already been found. Addressing this slowdown requires methods that integrate background knowledge with experimental data to discover more complex laws of nature. Recent advances in global optimization, driven by improved computational power and algorithms, provide a promising tool for this kind of scientific discovery.
Researchers from Imperial College Business School, Samsung AI, and IBM propose an approach to scientific discovery that models axioms and laws as polynomials. Using binary variables and logical constraints, they solve polynomial optimization problems via mixed-integer linear or semidefinite optimization, with the results certified by Positivstellensatz proofs. Their method can derive well-known results, such as Kepler's third law and the radiated gravitational-wave power equation, from background hypotheses and data. The approach guarantees consistency with background theory and experimental data and yields a formal proof of each discovered law. Unlike deep learning methods, which may produce unverifiable results, this technique enables scalable and reliable discovery of new scientific laws.
The study establishes basic definitions and notation for scalars, vectors, matrices, and sets: b denotes a scalar, x a vector, A a matrix, and Z a set. It also defines the norms and cones standard in the sum-of-squares (SOS) optimization literature, and introduces Putinar's Positivstellensatz as the machinery for deriving new laws from existing ones. AI-Hilbert seeks a low-complexity polynomial model q(x) = 0 that is consistent with the background axioms G and H, fits the experimental data, and satisfies degree bounds. The resulting optimization problem uses a hyperparameter λ to balance the model's fidelity to the data against its fidelity to the hypotheses.
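To make the certificate idea concrete, here is a minimal sketch of what a polynomial derivation of Kepler's third law looks like. The two axioms and the multiplier polynomials below are an illustrative toy derivation (they are not copied from the paper), but the pattern — expressing the candidate law q as a polynomial combination of the axioms and checking that the identity expands to zero — is exactly the kind of algebraic proof the method produces.

```python
import sympy as sp

# Symbols: orbital speed v, period T, radius r, gravitational constant G, central mass M
v, T, r, G, M = sp.symbols("v T r G M")

# Toy background axioms (illustrative, not the paper's exact formulation):
#   h1 = 0 : the orbit circumference is covered in one period, v*T = 2*pi*r
#   h2 = 0 : circular-orbit force balance, v**2 * r = G*M
h1 = v*T - 2*sp.pi*r
h2 = v**2 * r - G*M

# Candidate law q = 0 : Kepler's third law, G*M*T**2 = 4*pi**2 * r**3
q = G*M*T**2 - 4*sp.pi**2 * r**3

# A Positivstellensatz-style certificate writes q as a polynomial
# combination of the axioms: q = b1*h1 + b2*h2.
b1 = r*(v*T + 2*sp.pi*r)
b2 = -T**2

# The certificate is valid iff the difference expands to the zero polynomial.
residual = sp.expand(q - (b1*h1 + b2*h2))
print(residual)  # 0 -> q provably follows from the axioms
```

In AI-Hilbert the multipliers b1 and b2 are not guessed by hand; they are decision variables of the optimization problem, so a feasible solution is simultaneously a discovered law and its formal derivation.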
AI-Hilbert is a paradigm of scientific discovery that identifies polynomial laws consistent with experimental data and a background knowledge base of polynomial equalities and inequalities. Inspired by David Hilbert's work on the relationship between sums of squares and nonnegative polynomials, AI-Hilbert ensures that discovered laws are axiomatically correct given the background theory. When the background theory is inconsistent, the approach identifies the source of the inconsistency through best-subset selection, determining the subset of hypotheses that best explains the data. This contrasts with current data-driven approaches, which can produce erroneous results in limited-data settings and cannot distinguish valid findings from invalid ones or explain how they were derived.
AI-Hilbert integrates data and theory to formulate hypotheses: theory reduces the search space and compensates for noisy or sparse data, while data compensates for inconsistent or incomplete theory. The approach formulates a polynomial optimization problem from the background theory and data, reduces it to a semidefinite optimization problem, and solves it to obtain candidate formulas together with their formal derivations. Hyperparameters control model complexity, and a distance metric quantifies how closely a discovered law is related to the background theory. Experimental validation demonstrates that AI-Hilbert can derive correct symbolic formulas from a complete and consistent background theory without any numerical data, handle inconsistent axioms, and outperform competing methods on a variety of test problems.
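The data-driven side of the pipeline can be illustrated with a deliberately simplified sketch: instead of the paper's mixed-integer semidefinite formulation, the toy below recovers Kepler's third law from synthetic data by ordinary least squares over a basis of candidate monomials. The variable names, basis choice, and units are assumptions made for this example only.

```python
import numpy as np

# Toy discovery sketch (not the actual AI-Hilbert pipeline):
# recover Kepler's third law T**2 = a**3 (in units where G*M/(4*pi**2) = 1)
# by regressing T**2 against a basis of candidate monomials.
a = np.linspace(1.0, 8.0, 40)   # semi-major axis samples
T = a**1.5                      # noiseless periods obeying the law

# Candidate monomial basis for the right-hand side of the law
basis = {
    "1":   np.ones_like(a),
    "a":   a,
    "T":   T,
    "a^2": a**2,
    "a*T": a*T,
    "a^3": a**3,
}
names = list(basis)
M = np.column_stack([basis[n] for n in names])

# Least-squares fit of T**2 as a combination of the candidate monomials;
# with exact data the unique solution puts weight 1 on a^3 and 0 elsewhere.
coef, *_ = np.linalg.lstsq(M, T**2, rcond=None)
law = {n: c for n, c in zip(names, coef)}
print(law)  # coefficient ~1.0 on "a^3", ~0.0 on the rest
```

AI-Hilbert goes further than this sketch in two ways: the λ-weighted objective trades this data-fit term off against consistency with the background axioms, and the solver returns a Positivstellensatz certificate rather than bare coefficients.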
This work introduces an innovative method for scientific discovery that combines real algebraic geometry and mixed-integer optimization to derive new scientific laws from incomplete axioms and noisy data. Unlike traditional methods that rely solely on theory or on data, the approach uses both, enabling discovery where data are scarce and theory is incomplete. AI-Hilbert identifies implicit polynomial relationships between variables, an advantage for the non-explicit representations common in science. Future directions include extending the framework to non-polynomial settings, automating hyperparameter tuning, and improving scalability by optimizing the underlying computational techniques.
Check out the paper for more details. All credit for this research goes to the researchers of this project.

Sana Hassan, a Consulting Intern at Marktechpost and a dual degree student at Indian Institute of Technology Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, she brings a fresh perspective to the intersection of AI and real-world solutions.