Mathematical background
Mixed-integer optimization
Mixed-integer programming is a type of constrained optimization problem that involves decision variables that can be both integer and continuous. A mixed-integer programming problem can be defined as follows:
$$\begin{array}{ll}\mathop{\min }\limits_{{\bf{x}}\in {{\mathbb{R}}}^{n},\,{\bf{y}}\in {{\mathbb{Z}}}^{m}}&f({\bf{x}},{\bf{y}})\\ \,\text{subject to}\,&{g}_{i}({\bf{x}},{\bf{y}})\le 0,\quad i=1,2,\ldots ,k,\\ &{h}_{j}({\bf{x}},{\bf{y}})=0,\quad j=1,2,\ldots ,l.\end{array}$$
where \(f:{{\mathbb{R}}}^{n}\times {{\mathbb{Z}}}^{m}\to {\mathbb{R}}\) is the objective function, which involves both real and integer decision variables. The goal is to minimize (or maximize) this function, subject to inequality constraints gi(x, y) ≤ 0, i = 1, 2, …, k, where \({g}_{i}:{{\mathbb{R}}}^{n}\times {{\mathbb{Z}}}^{m}\to {\mathbb{R}}\), and equality constraints hj(x, y) = 0, j = 1, 2, …, l, where \({h}_{j}:{{\mathbb{R}}}^{n}\times {{\mathbb{Z}}}^{m}\to {\mathbb{R}}\).
If functions f, gi and hi are linear, the problem is a mixed-integer linear programming (MILP) problem. If all variables are continuous, the problem is a linear programming (LP) problem. Problems with integer variables are harder to solve as they are combinatorial in nature.
For these classes of problems, there exist many mathematical solvers that provide exact solutions with optimality guarantees, such as CPLEX, Gurobi, CBC, HIGHs and SCIP. We exploit the MILP representability of many network inference problems, allowing us to use these advanced solvers to find optimal solutions accurately.
Network flows
Network flows23,24 are a well-studied and general class of problems that can be used to solve a wide variety of scientific and engineering problems. They model the distribution of quantities across a network, subject to constraints such as flow conservation and capacity constraints.
Given its versatility, different types of problems can be reduced to or formulated using network flows64,65,66, for which efficient optimization methods have been developed. We build upon this idea to unify diverse network inference problems by reducing them to network flow-based problems. By leveraging this framework, we can extend existing methods to account for joint inference on multi-sample scenarios.
We first introduce the basic network flow problem on a directed graph \({\mathcal{G}}=({\mathcal{V}},{\mathcal{E}})\), where \({\mathcal{V}}\) is the set of vertices and \({\mathcal{E}}\) is the set of directed edges. Let \({\bf{x}}\in {{\mathbb{R}}}^{| {\mathcal{E}}| }\) be the vector of flows on the edges, \({\bf{b}}\in {{\mathbb{R}}}^{| {\mathcal{V}}| }\) be the vector of net flows at the vertices and \({{\bf{x}}}_{max}\in {{\mathbb{R}}}^{| {\mathcal{E}}| }\) be the vector of upper bounds (capacities) for the flows on the edges. Let \(A\in {{\mathbb{R}}}^{| {\mathcal{V}}| \times | {\mathcal{E}}| }\) be the vertex–edge incidence matrix of the graph \({\mathcal{G}}\) defined as
$${A}_{ij}=\left\{\begin{array}{ll}-1\quad &\,\text{if}\,\,{v}_{i}={\mathrm{tail}}({e}_{j}),\\ 1\quad &\,\text{if}\,\,{v}_{i}={\mathrm{head}}({e}_{j}),\\ 0\quad &\,\text{otherwise}\,.\end{array}\right.$$
(1)
where head and tail are functions that, given edge \(e=({v}_{t},{v}_{h})\in {\mathcal{E}}\), return the head (vh) and the tail (vt) of e.
The network flow problem aims to find the vector of flows x that satisfies the flow capacity constraints and flow conservation constraints. This can be expressed as a linear system
$$A{\bf{x}}={\bf{b}},\quad 0\le {\bf{x}}\le {{\bf{x}}}_{{\mathrm{max}}},$$
(2)
where the vector b is defined as
$${{\bf{b}}}_{i}=\left\{\begin{array}{ll}-B\quad &\,\text{if}\,\,{v}_{i}=s,\\ B\quad &\,\text{if}\,\,{v}_{i}=t,\\ 0\quad &\,\text{otherwise},\end{array}\right.$$
where s is the source vertex, t is the sink vertex and B is the total flow from the source to the sink. The vector b represents the net flow entering or leaving each vertex, and it is non-zero only for source and sink vertices, with sources having negative values (net outflow) and sinks having positive values (net inflow).
In typical network flow problems, such as the minimum-cost flow problem, we are interested in finding an optimal point of this set Ω with respect to a linear objective function
$${{\bf{x}}}^{* }=\arg \mathop{\min }\limits_{{\bf{x}}\in \Omega }f({\bf{x}}),$$
where f is frequently defined as f(x) = cTx, with c representing the cost per unit flow on each edge, and Ω is the feasible set of valid flows in the network, defined as
$$\varOmega =\{{\bf{x}}\in {{\mathbb{R}}}^{| {\mathcal{E}}| }| A{\bf{x}}={\bf{b}},\,0\le {\bf{x}}\le {{\bf{x}}}_{{\mathrm{max}}}\}.$$
As the flows are continuous values and the objective function and constraints are linear, the problem is an LP problem. LP problems are efficiently solved using algorithms such as the Simplex and Interior Point methods67, leveraging duality theory to provide bounds on the optimal value. Many mathematical solvers incorporate these algorithms to provide accurate solutions even for large-scale problems.
The canonical form of the network flow problem as an LP can be expressed as the minimization of a linear function subject to (s.t.) linear constraints
$$\begin{array}{ll}\,\text{min.}\,&{{\bf{c}}}^{T}{\bf{x}},\\ \,\text{s.t.}\,&A{\bf{x}}={\bf{b}},\\ &0\le {\bf{x}}\le {{\bf{x}}}_{{\mathrm{max}}}.\end{array}$$
More complex network problems can be modelled by extending this simple network flow model. In particular, network inference problems, in which the goal is not only to find the values of certain parameters but also to determine the structure of the network, can be modelled by incorporating binary indicator variables to decide whether a certain edge is included in the solution. For example, let \({\bf{y}}\in {\{0,1\}}^{| {\mathcal{E}}| }\) be a binary vector where yj = 1 if edge ej is included in the network and yj = 0 otherwise. This extension transforms the problem into a MILP, which can be formulated as follows:
$$\begin{array}{ll}\;\qquad{\text{min.}}&{{\bf{c}}}^{T}{\bf{x}}+{{\bf{d}}}^{T}{\bf{y}},\\ \,{\text{subject to}}\,&A{\bf{x}}={\bf{b}},\\ &0\le {\bf{x}}\le {{\bf{x}}}_{\rm{max}}\odot {\bf{y}},\\ &{\bf{y}}\in {\{0,1\}}^{| {\mathcal{E}}| },\end{array}$$
where d represents the fixed costs associated with including each edge, and ⊙ denotes hadamard (element-wise) product. The constraint 0 ≤ x ≤ xmax ⊙ y ensures that flow can only occur on edges that are included in the network.
The optimization problem represented by this MILP formulation is harder to solve than the standard LP problem owing to the presence of binary decision variables. However, it is more flexible and allows us to model a wide variety of problems that involve searching for the right combinatorial structures, such as paths, trees, acyclic graphs or other types of structures in an exact way. Moreover, MILP solvers are able to provide solutions along with certificates of optimality, ensuring that the solutions meet the user’s defined criteria.
Model
In this section, we present the core model behind CORNETO. Our framework redefines network inference by transforming it into a network flow-based problem, leveraging the mixed-integer optimization framework commonly used in such problems. This unified approach enables joint inference by utilizing multiple samples simultaneously, allowing the model to improve network inference by borrowing information across all samples. In addition, our framework offers several key advantages: (1) all methods are implemented consistently with a common vocabulary and API; (2) we identify and reuse common components across different methods, streamlining development; and (3) our framework provides exact algorithm implementations, allowing solvers to find optimal or suboptimal solutions with certificates of optimality, giving users control over the solution quality.
Hypergraphs
To support a wide variety of methods and PKNs, CORNETO operates on directed hypergraphs. A directed hypergraph \({\mathcal{H}}\) is defined as \({\mathcal{H}}=({\mathcal{V}},{\mathcal{E}})\), where \({\mathcal{V}}\) is a set of vertices, and \({\mathcal{E}}\) is a set of directed hyperedges. Each hyperedge \({e}_{i}\in {\mathcal{E}}\) is an ordered pair (Ti, Hi), with \({T}_{i}\subseteq {\mathcal{V}}\) as the tail and \({H}_{i}\subseteq {\mathcal{V}}\) as the head of the hyperedge.
To accurately model diverse network interactions, particularly those requiring signed coefficients such as stoichiometry in metabolic networks, the vertex–edge incidence matrix for hypergraphs is defined on the basis of these direct coefficients.
Let \({c}_{ij}\in {\mathbb{R}}\) be a coefficient that quantifies the specific, signed involvement of vertex vi in hyperedge ej. The sign of cij indicates the nature of the participation of vi in ej (for example, as a reactant or product, source or target of influence), while its magnitude represents the extent of this participation.
The vertex–edge incidence matrix \({\bf{A}}\in {{\mathbb{R}}}^{| {\mathcal{V}}| \times | {\mathcal{E}}| }\) is defined as
$${{\bf{A}}}_{ij}=\left\{\begin{array}{ll}{c}_{ij}\quad &\,\text{if vertex}\,\,{v}_{i}\,\,\text{is in hyperedge}\,\,{e}_{j},\\ 0\quad &\,\text{otherwise}\,.\end{array}\right.$$
(3)
For a hyperedge ej = (Tj, Hj), the sets Tj (tail) and Hj (head) define the structural components of the hyperedge. When considering a ‘forward’ flow or process through ej (that is, when the corresponding flow variable xj > 0):
-
Vertices vi ∈ Tj typically correspond to inputs or reactants, and their coefficients cij would often be negative (for example cij < 0).
-
Vertices vi ∈ Hj typically correspond to outputs or products, and their coefficients cij would often be positive (for example cij > 0).
For example, for a simple directed-graph edge ej = (vt, vh) representing flow from vt to vh, the typical coefficients would be \({c}_{{v}_{t},\,j}=-1\) and \({c}_{{v}_{h},\,j}=1\). All other \({c}_{{v}_{k},\,j}=0\) for this edge.
For a metabolic reaction ej such as M1 + 2M2 → M3, where Tj = {M1, M2} and Hj = {M3}, the coefficients would be \({c}_{{M}_{1},\,j}=-1\), \({c}_{{M}_{2},\,j}=-2\) and \({c}_{{M}_{3},\,j}=1\).
This definition allows the flow-conservation constraint Ax = 0 (where x is the vector of flows through hyperedges) to correctly represent complex balances, such as stoichiometric steady states in metabolic networks. A negative flow xj for a hyperedge ej would then signify that the process occurs in the reverse direction relative to the signs defined by the cij coefficients, as is commonly done in FBA.
In this framework, a column of A (representing one hyperedge) can contain multiple non-zero entries, with varying signs and magnitudes. This directly reflects hyperedges with multiple vertices in their tail and head sets, enabling the representation of complex relationships such as biochemical reactions (for example A + B ⇌ C + D) as well as or other multi-component processes. This differs from standard graphs, where each column in A (representing an edge) typically has exactly one −1 and one +1 entry (or is all zeros). In this way, the incidence matrix A plays a role directly analogous to the stoichiometric matrix in FBA when applied to metabolic networks.
Because a standard directed graph can be viewed as a particular instance of a directed hypergraph, in the following we will use \({\mathcal{H}}\) to denote hypergraphs (including graphs as a special case) and \({\mathcal{G}}\) to denote only graphs, when specific methods are not designed to operate on hypergraphs.
Data mapping function
Given a dataset \({\bf{D}}\in {{\mathbb{R}}}^{m\times n}\) (m features and n samples, such as genes and cells), and a PKN \({\mathcal{H}}=({\mathcal{V}},{\mathcal{E}})\), the function \(\phi :({\mathcal{H}},{\bf{D}})\to {{\mathcal{H}}}^{{\prime} }\) is defined as
$$\phi :({\bf{D}},{\mathcal{H}})\to {{\mathcal{H}}}^{{\prime} }=({{\mathcal{V}}}^{{\prime} },{{\mathcal{E}}}^{{\prime} }),$$
(4)
where
-
\({{\mathcal{V}}}^{{\prime} }=\{{v}^{{\prime} }\,| \,{v}^{{\prime} }=(v,{{\bf{d}}}_{v})\}\) is the new set of vertices. Each vertex \({v}^{{\prime} }=(v,{{\bf{d}}}_{v})\) is associated with a vector \({{\bf{d}}}_{v}\in {{\mathbb{R}}}^{n}\), which contains the computed data for that vertex across all samples in the dataset. Specifically,
$${{\bf{d}}}_{v}=\left[{d}_{v}^{(1)},{d}_{v}^{(2)},\ldots ,{d}_{v}^{(n)}\right],$$
where \({d}_{v}^{(i)}\in {\mathbb{R}}\) is the computed feature value for vertex v in the ith sample.
-
\({{\mathcal{E}}}^{{\prime} }=\{{e}^{{\prime} }\,| \,{e}^{{\prime} }=(e,{{\bf{d}}}_{e})\}\) is the new set of edges. Each edge \({e}^{{\prime} }=(e,{{\bf{d}}}_{e})\) is associated with a vector \({{\bf{d}}}_{e}\in {{\mathbb{R}}}^{n}\), which contains the computed values related to the interaction between the two vertices connected by edge e across all samples. Specifically,
$${{\bf{d}}}_{e}=\left[{d}_{e}^{(1)},{d}_{e}^{(2)},\ldots ,{d}_{e}^{(n)}\right],$$
where \({d}_{e}^{(i)}\in {\mathbb{R}}\) represents the computed value associated with the edge e for the ith sample.
The data mapping function generalizes the process of transforming omics data into network features by applying specific rules or transformations that align with a PKN. For example, in metabolic networks, gene–protein reaction rules are used to calculate reaction values (edges) based on the expression levels of specific genes from the dataset. Other operations might involve simpler transformations, such as discretization of values, where, for example, gene expression or protein levels are converted into binary states to represent activation or inhibition.
Graph transformation function
This function takes an annotated graph \({{\mathcal{H}}}^{{\prime} }\) as input and returns a new graph by modifying its structure, such as by adding or removing vertices or edges.
Given an annotated graph \({{\mathcal{H}}}^{{\prime} }=({{\mathcal{V}}}^{{\prime} },{{\mathcal{E}}}^{{\prime} })\), we define the graph transformation function \(\psi :{{\mathcal{H}}}^{{\prime} }\to {{\mathcal{H}}}^{{\prime\prime} }\) as
$$\psi ({{\mathcal{H}}}^{{\prime} })={{\mathcal{H}}}^{{\prime\prime} }=\left({{\mathcal{V}}}^{{\prime\prime} }=({{\mathcal{V}}}^{{\prime} }\setminus {{\mathcal{V}}}_{r})\cup {{\mathcal{V}}}_{a},{{\mathcal{E}}}^{{\prime\prime} }=({{\mathcal{E}}}^{{\prime} }\setminus {{\mathcal{E}}}_{r})\cup {{\mathcal{E}}}_{a}\right).$$
The function ψ modifies \({{\mathcal{H}}}^{{\prime} }\) by removing vertices in the set \({{\mathcal{V}}}_{r}\) and edges in \({{\mathcal{E}}}_{r}\), and by adding vertices from \({{\mathcal{V}}}_{a}\) and edges from \({{\mathcal{E}}}_{a}\), resulting in the new vertex set \({{\mathcal{V}}}^{{\prime\prime} }\) and edge set \({{\mathcal{E}}}^{{\prime\prime} }\). These transformations are used to adapt the structure of a given graph to be compatible with a network flow problems and to implement optimization steps to reduce the complexity of the methods.
Each method in CORNETO provides a specific implementation of ψ by specifying how the vertices and edges are added or removed to align with its reformulation based on network flows.
Network flows on bidirectional hypergraphs
We extend the basic network flow problem (equation (2)) to support the modelling of network inference methods on hypergraphs. This extension allows us to incorporate complex biological prior knowledge that is not compatible with simple graph representations68.
Instead of using special types of vertices s and t to maintain flow balance, as in the traditional source–sink model, we extend the concept by adding hyperedges with no tail to inject flow and no head to extract flow. Specifically, to inject a certain amount of flow into a vertex v, it suffices to add a hyperedge \({e}_{{\mathrm{source}}}=({{\emptyset}},\{v\})\), which has an empty tail and the vertex v in its head, with the flow value set to the desired amount (for example, 1 unit). Similarly, to extract flow from a vertex v, a hyperedge \({e}_{{\mathrm{sink}}}=(\{v\},{{\emptyset}})\) can be added to the graph, which has the vertex v in its tail and an empty head. Multiple edges to inject and remove flow can be added to the network.
The feasible set of flows is defined by the constraints on flow conservation and capacity limits, given by
$$\varOmega =\{{\bf{x}}\in {{\mathbb{R}}}^{| {\mathcal{E}}| }| {\bf{A}}{\bf{x}}=0,\,{{\bf{x}}}_{{\mathrm{min}}}\le {\bf{x}}\le {{\bf{x}}}_{{\mathrm{max}}}\},$$
(5)
where xmin represents the lower bounds on the flow values, and xmax represents the upper bounds. If xmin contains negative values, it indicates the possibility of negative flows, meaning that the flow can traverse the edge in the opposite direction. This modification allows bidirectional flows on hyperedges, controlled by the specified upper and lower bounds.
In this formulation, the vector b (which represents external flow demands) is implicitly incorporated into the structure of the hyperedges. By embedding the flow injection at vertex v and extraction at vertex v through special hyperedges esource and esink, the external demands b are integrated into the incidence matrix A of the extended hypergraph. This transformation allows the flow problem to be represented as a homogeneous system Ax = 0, where the flow balance constraints are maintained naturally through the hypergraph structure.
The reformulation of methods in CORNETO’s framework involve the use of graph transformations to manipulate the structure of the graph and to convert the original problem into a network flow-based problem. As a simple example, the shortest path problem from vertex s to vertex t can be reduced to a min-flow problem69 by adding an edge to inject flow through s and extracting the same amount of flow through t and then minimizing the sum of the cost of each edge multiplied by the flow (Supplementary Fig. 2).
Multi-flows on bidirectional hypergraphs
A key feature of CORNETO is its ability to model problems involving multiple samples. This capability requires not only learning individual networks for each sample but also linking flows across these networks to support joint inference. Building upon the previously defined concepts for single network flows, we now extend the notion to multi-flows on bidirectional hypergraphs. This approach allows us to handle multiple types of flow simultaneously, modelling different samples within the same framework.
Similar to multi-commodity network flows70, which involve multiple resources (commodities) moving through a network simultaneously with shared constraints, we extend the formulation to support multiple simultaneous flows. In the context of the framework, each sample or condition is associated with a flow, and each flow can be subjected to different constraints.
Let n be the number of samples (commodities). The flow vector x previously defined is now extended to a flow matrix \({\bf{X}}\in {{\mathbb{R}}}^{| {\mathcal{E}}| \times n}\), where xij represents the flow of sample j along hyperedge ei. Similarly, we define \({{\bf{X}}}_{{\mathrm{min}}},{{\bf{X}}}_{{\mathrm{max}}}\in {{\mathbb{R}}}^{| {\mathcal{E}}| \times n}\) to be the lower and upper bounds for the flow through the edges, for each sample.
Given X, Xmin and Xmax, the multi-flow problem on bidirected hypergraphs is given by
$$\begin{array}{rcl}&&{\bf{A}}{\bf{X}}=0\\ &&{{\bf{X}}}_{{\mathrm{min}}}\le {\bf{X}}\le {{\bf{X}}}_{{\mathrm{max}}}.\\ \end{array}$$
This formulation, however, treats each sample as an independent network flow problem. We will use this as a building block to define other problems, starting from this multi-flow problem. To model interconnected flows, we can introduce additional transformations and constraints on X.
As a simple example, in a multi-commodity problem, instead of having independent edge capacities per commodity, a single capacity constraint for each edge is imposed. The sum of absolute flows across all commodities cannot exceed these shared bounds. Given shared bounds xmin and xmax for all commodities, we transform the multi-flow problem into a multi-commodity flow problem by summing the total flows for each edge:
$$\begin{array}{l}{\bf{A}}{\bf{X}}=0\\ {{\bf{x}}}_{{\mathrm{min}}}\le {\bf{| X| }}{{\bf{1}}}_{n}\le {{\bf{x}}}_{{\mathrm{max}}}.\\ \end{array}$$
Here, the term ∣X∣ represents the absolute value of the flow matrix, ensuring that the sum of flows accounts for any potential negative values. The vector 1n is a column vector of n ones. The multiplication computes the sum of flows across all commodities.
To handle the nonlinearity introduced by the absolute value ∣X∣, it is possible to linearize it by introducing auxiliary variables for each xij in X. This involves defining two non-negative variables, \({x}_{ij}^{+}\) and \({x}_{ij}^{-}\), such that \({x}_{ij}={x}_{ij}^{+}-{x}_{ij}^{-}\) and \(| {x}_{ij}| ={x}_{ij}^{+}+{x}_{ij}^{-}\). This transformation replaces the absolute value constraint with linear constraints, making the problem solvable using standard integer LP techniques.
Constrained optimization problems
In CORNETO, network inference methods are implemented as constrained optimization problems. An optimization problem is defined as a tuple:
$${\mathcal{P}}=(\,f,\varOmega ),$$
(6)
where
-
\(f:{\mathcal{X}}\to {\mathbb{R}}\) is the objective function to be minimized, which operates on the decision variable space \({\mathcal{X}}\);
-
Ω is the feasible set, defined by equalities and inequalities on the variables
$$\varOmega =\left\{x\in {\mathcal{X}}\,\left\vert \,\begin{array}{l}{g}_{i}(x)\le 0,\quad i=1,\ldots ,k,\\ {h}_{j}(x)=0,\quad j=1,\ldots ,l\end{array}\right.\right\}.$$
We also define a trivial objective function as
$${f}_{0}:{\mathcal{X}}\to {\mathbb{R}},\quad {f}_{0}(x)=0\quad \forall x\in {\mathcal{X}}.$$
(7)
In this case, (f0, Ω) simply involves finding any feasible point in Ω.
All methods implemented in the framework use linear functions for f, g and h, over continuous and binary variables (with the exception of the multi-sample metabolic flux adjustment method, where the objective function is quadratic instead of linear).
Problem composition
Problems can be composed to define and reuse subproblems, thereby creating variations of existing methods without rewriting common pieces of code.
Let us consider two constrained optimization problems
$${{\mathcal{P}}}_{1}\,=\,\left(\,{f}_{1},\,{\varOmega }_{1}\right),\qquad {{\mathcal{P}}}_{2}\,=\,\left(\,{f}_{2},\,{\varOmega }_{2}\right),$$
where
-
f1 and f2 are objective functions to be minimized, and
-
Ω1 and Ω2 are their respective feasible sets (each expressed in terms of the variables that appear in the corresponding problem).
We build the composed problem \({{\mathcal{P}}}_{c}\,=\,\left({\,f}_{c},\,{\Omega }_{c}\right)\) as follows:
-
Combined objective. Choose non-negative weights α and β and set
$${f}_{c}\,=\,\alpha \,{f}_{1}+\beta \,{f}_{2}.$$
-
Combined feasible set.
$${\varOmega }_{c}\,=\,{\varOmega }_{1}\,\cap \,{\varOmega }_{2},$$
that is, all constraints from \({{\mathcal{P}}}_{1}\) and \({{\mathcal{P}}}_{2}\) must hold simultaneously. Variables that appear in both original problems are implicitly identified; variables that appear in only one remain independent.
The composed optimization problem therefore reads
$$\mathop{\min }\limits_{x\in {\varOmega }_{c}}\,{f}_{c}(x).$$
To model, compose and solve constrained optimization problems in Python, CORNETO implements a multi-backend strategy that decouples the high-level modelling of network inference problems from the low-level canonical forms used by solvers. It currently supports CVXPY71 and PICOS72. These backends also support sparse matrices and vectorization, which CORNETO leverages for efficient implementation of methods.
Building blocks
CORNETO defines a series of building blocks, composed by a set of variables and constraints, that can be used to construct subproblems. These subproblems can be combined to compose the feasible set of a given network inference method, representing the valid space of solutions with specific properties.
By combining these building blocks, it is possible to develop network inference methods without the need to reimplement common constraints, such as enforcing acyclicity on a graph. For example, Steiner tree problems explore the space of acyclic trees, and as such, we can easily build tree-based problems by imposing acyclicity on flows, which would serve as the foundation for all methods that infer tree-like networks. This approach not only simplifies the process of constructing new algorithms but also ensures that any improvements or extensions to the building blocks can be easily reused across multiple methods, allowing broader and more efficient enhancements without the need for redundant implementations.
Flow
Given a directed (hyper)graph \({\mathcal{H}}=({\mathcal{V}},{\mathcal{E}})\), and upper and lower bounds for each edge in each sample \({{\bf{X}}}_{{\mathrm{min}}},{{\bf{X}}}_{{\mathrm{max}}}\in {{\mathbb{R}}}^{| {\mathcal{E}}| \times n}\),
$${\varOmega }_{{\mathrm{Flow}}}({\mathcal{H}})=\{{\bf{X}}| {\bf{A}}{\bf{X}}=0,\,{{\bf{X}}}_{{\mathrm{min}}}\le {\bf{X}}\le {{\bf{X}}}_{{\mathrm{max}}}\},$$
(8)
where \({\bf{X}}\in {{\mathbb{R}}}^{| {\mathcal{E}}| \times n}\) is the variable representing the flows for each edge and sample. A is the vertex–edge incidence matrix of \({\mathcal{H}}\) (equation (1)).
Free variable indicator constraints
Given \({\bf{X}}\in {{\mathbb{R}}}^{m\times n}\) such that X ∈ [Xmin, Xmax],
$${\varOmega }_{{\mathrm{Free}}}({\bf{X}})=\{{\bf{Y}}| \,{{\bf{X}}}_{{\mathrm{min}}}\odot {\bf{Y}}\le {\bf{X}}\le {{\bf{X}}}_{{\mathrm{max}}}\odot {\bf{Y}}\},$$
(9)
where Y ∈ {0, 1}m×n is a binary variable such that Yi, j = 0 ⇒ Xi, j = 0.
Non-zero indicator constraints
Given \({\bf{X}}\in {{\mathbb{R}}}^{m\times n}\), and X ∈ [Xmin, Xmax],
$${\varOmega }_{\ne 0}({\bf{X}})=\left\{{\bf{Y}}\in {\{0,1\}}^{m\times n}\,| \,{\bf{Y}}\,\text{satisfies}\,({\rm{C}}1)-({\rm{C}}3)\right\},$$
(10)
where the constraints are defined as
$${\bf{Y}}={{\bf{Y}}}^{+}+{{\bf{Y}}}^{-}$$
(C1)
$${{\bf{X}}}_{\min }\odot {{\bf{Y}}}^{-}+\epsilon {{\bf{Y}}}^{+}\le {\bf{X}}\le {{\bf{X}}}_{\max }\odot {{\bf{Y}}}^{+}-\epsilon {{\bf{Y}}}^{-}.$$
(C3)
Here, Y+, Y− ∈ {0, 1}m×n variables are introduced for linearizing the samples on X, such that \({{\bf{Y}}}_{i,\,j}^{+}=1\ \iff \ {{\bf{X}}}_{i,\,j}\ge \epsilon\) (indicating that X is active in the positive direction if and only if \({{\bf{Y}}}_{i,\,j}^{+}=1\)), and \({{\bf{Y}}}_{i,\,j}^{-}=1\ \iff \ {{\bf{X}}}_{i,\,j}\le -\epsilon\) (indicating that X is active in the negative direction if and only if \({{\bf{Y}}}_{i,\,j}^{-}=1\)).
Acyclic
Given a graph \({\mathcal{G}}=({\mathcal{V}},{\mathcal{E}})\), and the associated X flow variables,
$${\varOmega }_{{\rm{Acyc}}}({\mathcal{G}})=\left\{{\bf{Y}}\in {\varOmega }_{\ne 0}({\bf{X}})\,\left\vert \,{\bf{Y}}\,\text{satisfies}\,({\rm{C}}4)-({\rm{C}}6)\right.\right\},$$
(11)
where the constraints are defined as
$$1\le {{\bf{L}}}_{v,:}\le | {\mathcal{V}}| ,\,\forall v\in {\mathcal{V}}$$
(C4)
$${{\bf{L}}}_{v,:}-{{\bf{L}}}_{u,:}\ge {{\bf{Y}}}_{e,:}^{+}+(1-| {\mathcal{V}}| )(1-{{\bf{Y}}}_{e,:}^{+})$$
(C5)
$${{\bf{L}}}_{u,:}-{{\bf{L}}}_{v,:}\ge {{\bf{Y}}}_{e,:}^{-}+(1-| {\mathcal{V}}| )(1-{{\bf{Y}}}_{e,:}^{-})$$
(C6)
for all edges \(e=(u,v)\in {\mathcal{E}}\). Here, \({\bf{L}}\in {{\mathbb{R}}}^{| {\mathcal{V}}| \times n}\) is a continuous variable matrix that encodes a valid topological ordering of the vertices in the graph for each of the n samples. \({\bf{Y}}\in {\{0,1\}}^{| {\mathcal{E}}| \times n}\) is a non-zero binary indicator for the flows (equation (10)). We use u and v to refer to the indices of the source and target vertices of an edge in L. The constraints are enforced for all columns (samples) in the matrices. Constraint (C4) ensures that, for each vertex v, its assigned order Lv,: is between 1 and \(| {\mathcal{V}}|\). Constraints (C5) and (C6) enforce that, if an edge \(e=(u,v)\in {\mathcal{E}}\) is directed from u to v (that is, \({{\bf{Y}}}_{e,:}^{+}=1\)), then the topological order of u must be less than that of v across all samples; specifically, Lv,: − Lu,: ≥ 1. Similarly, if the edge is directed from v to u (that is, \({{\bf{Y}}}_{e,:}^{-}=1\)), then the order of v must be less than that of u; that is, Lu,: − Lv,: ≥ 1. These conditions guarantee that the resulting topological ordering is consistent with an acyclic graph structure for all samples, depending on the directionality of the edges as defined by Y+ and Y−.
Note that the acyclicity constraint is specifically defined for graphs and not for hypergraphs. It is used in methods that aim to find directed acyclic graphs or tree structures within a PKN.
Tree
A tree is a directed acyclic graph where each vertex has at most one incoming edge. The tree constraint can be enforced by building on the acyclic structure \({\varOmega }_{{\rm{Acyc}}}({\mathcal{G}})\) and adding a constraint that ensures each vertex \(v\in {\mathcal{V}}\) has no more than one incoming edge across all samples. The constraint is formulated as follows:
$${\varOmega }_{{\rm{Tree}}}({\mathcal{G}})=\left\{{\bf{Y}}\in {\varOmega }_{{\rm{Acyc}}}({\mathcal{G}})\,\left\vert \,\sum _{e\in \,\text{In}\,(v)}{{\bf{Y}}}_{e,:}\le 1,\,\forall v\in {\mathcal{V}}\right.\right\},$$
(12)
where In(v) denotes the set of incoming edges to vertex v.
CORNETO’s generalized joint network inference problem from prior knowledge
We are now ready to introduce the general optimization problem that CORNETO proposes, which generalizes network inference methods from prior knowledge to a joint optimization with a common structure for all methods within the framework.
The objective of this joint optimization is to estimate the network structure for each sample by leveraging information from all samples, thereby improving accuracy and reducing variance in the predicted networks. CORNETO formulates this problem as a constrained optimization task, incorporating a structured sparsity-inducing penalty through network flows.
Given the union (annotated) graph \({{\mathcal{H}}}_{u}\), the general optimization problem is defined as
$$\begin{array}{ll}\mathop{\min }\limits_{{\bf{X}},{\bf{Y}}}&\frac{1}{n}\mathop{\sum }\limits_{i=1}^{n}f({{\bf{X}}}_{:,i},{{\bf{Y}}}_{:,i};{{\mathcal{H}}}_{u})+\lambda \parallel {\bf{Y}}{{\bf{1}}}_{n}{\parallel }_{0}\\ \,\text{s.t.}\,&{\bf{X}}\in {\varOmega }_{{\mathrm{Flow}}}({{\mathcal{H}}}_{u})\\ &{\bf{Y}}\in {\varOmega }_{{\mathrm{Free}}}({\bf{X}})\\ &({\bf{X}},{\bf{Y}})\in {\varOmega }_{{\mathcal{M}}}({\bf{X}},{\bf{Y}}),\end{array}$$
(13)
where
-
\(f({{\bf{X}}}_{:,i},{{\bf{Y}}}_{:,i};{{\mathcal{H}}}_{u})\) is the linear objective function term for sample i. The inclusion of \({{\mathcal{H}}}_{u}\) indicates that the specific values for the variables of the problem are derived from data corresponding to sample i within the annotations of the union graph \({{\mathcal{H}}}_{u}\). For example, if f involves edge costs for sample i, these costs are extracted from the ith component of the edge data vectors de (where \(e\in {{\mathcal{E}}}_{u}\)) that were established by the data mapping function ϕ.
-
\({\bf{X}}\in {{\mathbb{R}}}^{| {{\mathcal{E}}}_{u}| \times n}\) represents the network flows variable, where each row corresponds to the flow of an edge and each column corresponding to the flows of a sample.
-
\({\bf{Y}}\in {\{0,1\}}^{| {{\mathcal{E}}}_{u}| \times n}\) is a binary indicator matrix, such that Yi, j = 0 ⇒ Xi, j = 0. This variable can be used to block flows through specific edges in the network.
-
\({\varOmega }_{{\mathcal{M}}}\) are method-specific constraints involving X and Y.
-
λ is a regularization parameter controlling sparsity across samples.
-
1n is a vector of ones of length n.
The structured sparsity-inducing penalty λ∥Y1n∥0, where λ is a regularization parameter controlling the trade-off between sparsity and fit, enables joint inference by minimizing the number of active edges (non-zero entries in Y, which are edges potentially carrying flow) that are present in any sample or condition. The ℓ0 regularization can be implemented in a MILP problem by linearizing the expression (Supplementary Information).
CORNETO’s network inference method definition
Based on the previously defined components, a network inference method in this framework is defined as a tuple \({\mathcal{M}}=(\phi ,\psi ,{{\mathcal{P}}}_{{\mathcal{M}}})\), with \({{\mathcal{P}}}_{{\mathcal{M}}}=(\,f,{\varOmega }_{{\mathcal{M}}})\).
Given a method \({\mathcal{M}}\), a PKN \({\mathcal{H}}\) and a data matrix \({\bf{D}}\in {{\mathbb{R}}}^{m\times n}\), the general optimization problem is constructed by applying the following steps:
-
(1)
\({{\mathcal{H}}}^{{\prime} }=\phi ({\mathcal{H}},{\bf{D}})\)
-
(2)
\({{\mathcal{H}}}_{i}^{{\prime\prime} }=\psi ({{\mathcal{H}}}^{{\prime} }),\quad \forall i\in \{1,2,\ldots ,n\}\)
-
(3)
\({{\mathcal{H}}}_{u}=\left(\mathop{\bigcup }\nolimits_{i = 1}^{n}{{\mathcal{V}}}_{i}^{{\prime\prime} },\mathop{\bigcup }\nolimits_{i = 1}^{n}{{\mathcal{E}}}_{i}^{{\prime\prime} }\right)\)
-
(4)
Build \({\mathcal{P}}\) by plugging \({{\mathcal{P}}}_{{\mathcal{M}}}\) into the global problem template (equation (13)).
Shortest paths
We begin by illustrating how the simple shortest path problem can be implemented in this framework, and extended to multi-sample settings.
Given a directed graph \({\mathcal{G}}=({\mathcal{V}},{\mathcal{E}})\), with \({\bf{d}}\in {{\mathbb{R}}}_{+}^{| {\mathcal{E}}| }\) representing the non-negative costs associated with the edges, and two vertices \(s,t\in {\mathcal{V}}\), the shortest path problem can be reformulated as a flow optimization problem73. To do this, we need to transform the original graph into \({{\mathcal{G}}}^{{\prime} }\) by adding two edges \(({{\emptyset}},\{s\})\) and \((\{t\},{{\emptyset}})\) to inject or extract flow through the s–t vertices, and then we can use \({\varOmega }_{{\mathrm{Flow}}}({\mathcal{G}})\), defined previously:
$$\begin{array}{ll}\mathop{\min }\limits_{{\bf{x}}}&{{\bf{d}}}^{\top }{\bf{x}}\\ \,\text{s.t.}\,&{\bf{x}}\in {\varOmega }_{{\mathrm{Flow}}}({\mathcal{G}})\\ &{{\bf{x}}}_{({{\emptyset}},s)}=1\\ &{{\bf{x}}}_{(t,{{\emptyset}})}=1.\end{array}$$
(14)
The objective is to minimize the total cost of flow, where x represents the flow along the edges, constrained by xmin = 0 and xmax = 1. The constraints \({{\bf{x}}}_{({{\emptyset}},s)}=1\) and \({{\bf{x}}}_{(t,{{\emptyset}})}=1\) inject and extract one unit of flow at the source s and target t, respectively, and also guarantee that the solution is not the trivial zero flow. These constraints are specific to the shortest path method.
Multi-sample shortest paths
Given that we can reformulate the shortest path problem in CORNETO using flows, the method can be extended to handle multiple samples simultaneously, where each sample has its own source, target, and edge costs. In this case, instead of a single cost vector c, we have a matrix \({\bf{D}}\in {{\mathbb{R}}}_{+}^{| {\mathcal{E}}| \times n}\), where each column corresponds to the costs for a different sample, as well as a list of source vertices {s1, s2, …, sn} and a list of target vertices {t1, t2, …, tn} for the n samples.
The goal is to solve the shortest path problem for all n samples simultaneously, while also introducing a regularization term to promote edge sparsity across samples. This is formulated as a joint optimization problem:
$$\begin{array}{ll}\mathop{\min }\limits_{{\bf{X}}}&\frac{1}{n}\mathop{\sum }\limits_{i=1}^{n}{{\bf{D}}}_{:,i}^{\top }{{\bf{X}}}_{:,i}+\lambda \parallel {\bf{Y}}{{\bf{1}}}_{n}{\parallel }_{0}\\ \,\text{s.t.}\,&{\bf{X}}\in {\varOmega }_{{\mathrm{Flow}}}({\mathcal{G}})\\ &{\bf{Y}}\in {\varOmega }_{{\mathrm{Free}}}({\bf{X}})\\ &{{\bf{X}}}_{i,\,j}=1\quad \forall i| {e}_{i}=({{\emptyset}},\{{s}_{j}\}),\,j=1,\ldots ,n\\ &{{\bf{X}}}_{i,\,j}=0\quad \forall i| {e}_{i}=({{\emptyset}},\{{s}_{k}\}),\,\forall k\ne j,\,j=1,\ldots ,n\\ &{{\bf{X}}}_{i,\,j}=1\quad \forall i| {e}_{i}=(\{{t}_{j}\},{{\emptyset}}),\,j=1,\ldots ,n\\ &{{\bf{X}}}_{i,\,j}=0\quad \forall i| {e}_{i}=(\{{t}_{k}\},{{\emptyset}}),\,\forall k\ne j,\,j=1,\ldots ,n,\end{array}$$
(15)
where s1, s2, …, sn are the input vertices for the shortest path associated with each sample, and t1, t2, …, tn are the target or destination vertices.
The last four constraints involving the flow matrix X ensure that, for each sample j, flow is injected into the graph only through the edges connected to the specific source node sj and is extracted only through the edges connected to the specific target node tj. All other edges involving source or target vertices not associated with sample j have no flow. This guarantees that the flow is restricted to the relevant edges for each sample, maintaining the integrity of the source–target pair for the shortest path. It should be noted that, when there is one single sample, these constraints reduce to the ones in the single shortest path (equation (14)).
The single- and multi-sample shortest path problem can then be defined in CORNETO as \({{\mathcal{M}}}_{SP}=(\phi ,\psi ,{{\mathcal{P}}}_{SP})\) and \({{\mathcal{P}}}_{SP}=(\,f,{\varOmega }_{SP})\), where
-
\(\phi ({\bf{D}},{\mathcal{G}})=(\{{v}^{{\prime} }\,| \,{v}^{{\prime} }=(v,{\bf{0}})\},\{{e}^{{\prime} }\,| \,{e}^{{\prime} }=(e,{{\bf{d}}}_{e})\})\), that is, data mapping trivially maps the data (weights or cost of the edges) to the edges of the graph, and vertices do not have any cost associated to them.
-
\(\psi ({{\mathcal{G}}}^{{\prime} })=\left(\right.{{\mathcal{V}}}^{{\prime} },{{\mathcal{E}}}^{{\prime} }\cup \{({e}_{s},{\bf{0}}),({e}_{t},{\bf{0}})\}\), where \({e}_{s}=({{\emptyset}},\{s\})\) and \({e}_{t}=(\{t\},{{\emptyset}})\), and \(s,t\in {{\mathcal{V}}}^{{\prime} }\) are the provided source and target vertices.
-
f(x) = d⊤x, where d is a column vector with the costs of the edges for a given sample, and x is the variable vector of flows (for a given sample) provided by CORNETO in the general problem (equation (13)).
-
ΩSP corresponds to the last four constraints of equation (15) involving the injection or extraction of a unit of flow, which is the method specific set of constraints to formulate the shortest path as a flow problem.
By plugging f and ΩSP into equation (13), we already have multi-sample version of the shortest path problem in CORNETO. Note that, if there is only one sample and λ = 0, the problem is equivalent to a standard shortest path. If there is only one sample and λ > 0, then the single shortest path is regularized so, among all the paths with the mininum cost (if there is more than one solution), the shortest one (less number of edges) is returned. If there are n samples and λ = 0, then the problem is equivalent to solve independently each shortest path.
Supplementary Fig. 2 shows an example for all the components of the framework for the single shortest path problem.
Steiner trees
The Steiner tree problem can be defined as follows. Given an undirected graph \({\mathcal{G}}=({\mathcal{V}},{\mathcal{E}})\), and a set of terminal vertices \(T\subseteq {\mathcal{V}}\) with ∣T∣ ≥ 2, the goal is to find a tree \(S=({{\mathcal{V}}}_{S},{{\mathcal{E}}}_{S})\) within the graph \({\mathcal{G}}\) that includes all the terminal vertices, that is, \(T\subseteq {{\mathcal{V}}}_{S}\), while minimizing the total weight of the edges in S.
Steiner trees have two basic constraints: (1) all terminals must be selected as part of the solution; and (2) the solution is a tree. The reformulation of Steiner trees in CORNETO thus requires expressing them using the concepts defined in the framework. First, for the first constraint, the initial graph is transformed such that a new edge is added to every terminal vertex. One vertex is chosen as a root, and an edge to inject flow is added, and for the remaining terminals, we add one edge to extract flow. If no root is provided, the first terminal is selected as a root. It should be noted that, in a Steiner tree (undirected), any terminal vertex could be a root because all the terminals have to be connected. A constraint is added such that the input edge injects u ≥ 1 units of flow, and every other edge has to extract at most \(\frac{u}{| T| -1}\) of the injected flow. This constrains the space of feasible solutions to the space of subgraphs where the terminals are connected to the root through any path from the PKN.
The method SteinerTree, which uses a directed graph \({\mathcal{G}}\), can be implemented as follows:
-
\({{\mathcal{G}}}^{{\prime} }=\phi ({\bf{D}},{\mathcal{G}})=(\{{v}^{{\prime} }\,| \,{v}^{{\prime} }=(v,{\bf{0}})\},\{{e}^{{\prime} }\,| \,{e}^{{\prime} }=(e,{{\bf{d}}}_{e})\})\)
-
\({{\mathcal{G}}}^{{\prime\prime} }=\psi ({{\mathcal{G}}}^{{\prime} })\)\(=\left(\right.{{\mathcal{V}}}^{{\prime} },{{\mathcal{E}}}^{{\prime} }\cup \{(({{\emptyset}},{r}_{j}),{\bf{0}})\}\)\(\cup \left\{((t,{{\emptyset}}),{\bf{0}})|\right.\)\(\left.\left.t\in T\setminus \{{r}_{j}\}\right\}\right),\)\(\forall j\in\)\(\{1,2,\ldots ,n\}\) that is, an edge to inject flow is added to the root vertex, and one edge to extract flow is added to every other terminal that is not the root. These edges have no cost (de = 0). There can be one different root and set for terminals per sample, indexed by j.
-
ΩST further restricts the variables of the main problem, by imposing:
-
– \({\bf{Y}}\in {\varOmega }_{{\mathrm{Tree}}}({{\mathcal{G}}}_{u})\), where \({{\mathcal{G}}}_{u}\) is the union graph of the n modified graphs \({{\mathcal{G}}}^{{\prime\prime} }\).
-
– For each sample s, the flow injected into the root terminal rj through the edge \(({{\emptyset}},{r}_{j})\) must be equal to u ≥ 1:
$${{\bf{X}}}_{({{\emptyset}},{r}_{j}),\,j}=u\quad \forall j\in \{1,2,\ldots ,n\}.$$
-
– For each sample s, the flow extracted from each non-root terminal t ∈ Tj⧹{rj} through the edge \((t,{{\emptyset}})\) must be equal to \(\frac{u}{| {T}_{j}| -1}\):
$${{\bf{X}}}_{(t,{{\emptyset}}),\,j}=\frac{1}{| {T}_{j}| -1}\quad \forall j,\,\forall t\in {T}_{j}\setminus \{{r}_{j}\}.$$
-
– Each edge that injects flow through a root defined for a different sample cannot inject flow in the current sample:
$${{\bf{X}}}_{({{\emptyset}},r),\,j}=0\quad \forall j,\,\forall r\ne {r}_{j}.$$
-
– Similarly, for the edges to extract flow through the terminals,
$${{\bf{X}}}_{(t,{{\emptyset}}),\,j}=0\quad \forall j,\,\forall t\notin {T}_{j}.$$
-
-
f(y) = d⊤y, which minimizes the selection of edges in the tree.
For directed Steiner trees, the root node must be provided, because the reachability of vertices changes depending on the directions of the edges. Directionality of flows is controlled by lower and upper bounds on the edges. If an edge has a negative lower bound and positive upper bound, the edge behaves as a undirected edge. If the lower bound is 0, the edge is directed.
PCST
The PCST problem can be defined as follows. Given a graph \({\mathcal{G}}=({\mathcal{V}},{\mathcal{E}})\) and a non-negative prize function \(p:V\to {{\mathbb{R}}}^{+}\) that assigns a prize to each vertex, along with a weight function \(w:E\to {{\mathbb{R}}}^{+}\) that assigns a cost to each edge, the goal is to find a tree \(S=({{\mathcal{V}}}_{S},{{\mathcal{E}}}_{S})\) within the graph \({\mathcal{G}}\) that minimizes the total cost of the edges minus the total prize of the vertices included in the tree, minimizing
$$\sum _{e\in {E}_{S}}w(e)-\sum _{v\in {V}_{S}}p(v).$$
The solution to the PCST problem balances the cost of connecting the vertices (using the edges) against the benefits (prizes) obtained by including vertices in the tree.
PCST in CORNETO is implemented using the Steiner tree implementation, with minimal changes. First, because terminals are not forced to be selected but they are part of the optimization, flow is not forced through them. Instead, the selection of prized terminals is moved to the objective function. Thus, the objective function is defined as
$$f({\bf{y}})={{\bf{d}}}^{\top }{\bf{y}}-{{\bf{p}}}^{\top }{\bf{y}},$$
where \({\bf{d}}\in {{\mathbb{R}}}^{| {{\mathcal{E}}}_{u}| }\) is the vector containing the edge costs for a given sample, and \({\bf{p}}\in {{\mathbb{R}}}^{| {{\mathcal{E}}}_{u}| }\) the vector with prizes (associated to the attached edges for removing flow through the prized terminals) taken from the annotated union graph \({{\mathcal{G}}}_{u}\).
It should be noted that the vectors d and p are mutually exclusive, that is, for any index i, if di ≠ 0, then pi = 0, and if pi ≠ 0, then di = 0. This condition ensures that an edge either contributes to the edge cost (through d) or is associated with a prize (through p), but not both, making the problem equivalent to the original definition of the PCST.
FBA
FBA16 is a mathematical approach used to analyse the flow of metabolites through a metabolic network. Given a metabolic network represented by a stoichiometric matrix S, where each element Si,j indicates the stoichiometric coefficient of metabolite i in reaction j, FBA aims to find the distribution of fluxes \({\bf{v}}\in {{\mathbb{R}}}^{n}\) through the network that maximizes or minimizes a particular objective function, such as the biomass production of a given organism.
The key assumptions and components of FBA are as follows:
-
Steady-state assumption: the system is assumed to be in a steady state, meaning that the concentration of each metabolite does not change over time. The flux represents the rate at which each reaction occurs, typically expressed in units of millimoles per gram of dry weight per hour.
-
Constraints: the fluxes are subject to constraints based on reaction capacities.
-
Objective function: a linear objective function of the fluxes, such as maximizing biomass yield (for example, output flux from an artificial biomass reaction in the metabolic network).
A genome-scale metabolic network can be modelled in CORNETO using a hypergraph \({\mathcal{H}}=({\mathcal{V}},{\mathcal{E}})\), by setting the coefficients of the incidence matrix A to the stoichiometric coefficients of the metabolic network. Then, the stoichiometric matrix S is equivalent to the incidence matrix of the hypergraph. By using the lower and upper bounds on reaction fluxes, the set of feasible metabolic fluxes is exactly the set of feasible flows in the hypergraph \({\varOmega }_{{\mathrm{Flow}}}({\mathcal{H}})\) (equation (5)). Thus, FBA is a particular instance of a single-sample (hyper)network flow on CORNETO with λ = 0.
The objective function is defined as
$$f({\bf{x}})=-{{\bf{c}}}^{\top }{\bf{x}},$$
where \({\bf{c}}={\{0,1\}}^{| {{\mathcal{E}}}_{u}| }\) is a provided constant vector indicating the reaction fluxes to be maximized.
The FBA method does not need data mapping or graph transformations, because no omics data are used, and the PKN typically contains import and export reactions such that the space of feasible solutions contain non-zero flux (flows) solutions. CORNETO provides a method that uses COBRApy74 to import any genome-scale metabolic network as a CORNETO hypergraph.
sFBA
sFBA31,75 is a variant of FBA that seeks to find an optimal flux distribution with the additional constraint of minimizing the number of active reactions (reactions carrying non-zero flux) in the metabolic network. The goal is to identify the sparsest set of reactions that can support a given metabolic objective (for example, maximizing biomass production). This method is particularly useful for identifying core metabolic functions or simplifying metabolic networks for analysis.
To find the smallest set of active reactions (that is, those with non-zero fluxes) in a metabolic network that maximizes a given linear objective function, sFBA can be exactly implemented by activating the ℓ0 penalty on the selected edges y (λ > 0), and adding a constraint on the fluxes, for example, \({{\bf{x}}}_{{\mathrm{biomass}}}\ge 0.95\times {{\bf{x}}}_{{\mathrm{biomass}}}^{* }\), where \({{\bf{x}}}_{{\mathrm{biomass}}}^{* }\) is the optimal value of biomass flux in the standard FBA solution. This constraint ensures that the sparse solution retains at least 95% of the maximum possible biomass production, while minimizing the number of active reactions.
The multi-sample sFBA method implemented with CORNETO extends this definition to find a metabolic network per condition such that the union of the networks is minimal, while still satisfying all the metabolic constraints of each condition, such as different cell substrates, different reaction flux bounds or any other type of constraint typically used in FBA modelling. This is achieved by reformulating the sFBA problem using CORNETO’s framework, which creates n simultaneous flow problems that are equivalent to n coupled FBA problems. The metabolic fluxes for each problem are captured in the \({\bf{X}}\in {{\mathbb{R}}}^{| {{\mathcal{E}}}_{u}| \times n}\) variable created by CORNETO, and the flux indicator variable \({\bf{Y}}\in {\{0,1\}}^{| {{\mathcal{E}}}_{u}| \times n}\), which indicates if reaction i in condition j can carry metabolic flux or is blocked in all samples. Instead of minimizing the selection of reactions with non-zero flux within each individual condition, we aim to minimize it across all samples simultaneously. We define sparsity as the minimum number of different selected reactions. Thus, maximizing sparsity across samples is equivalent to minimizing the total number of unique reactions in the union of the n different inferred metabolic networks. This definition of structured sparsity is exactly what CORNETO captures with the regularization term λ∥Y1n∥0.
It should be noted that, in sFBA, the only term in the objective function is the minimization of active reactions. As a result, increasing λ does not have a progressive effect; it only distinguishes between λ = 0 (no sparsity) and λ > 0 (sparsity applied).
Multi-sample sFBA for flux adjustment
We extended the idea of sFBA to support flux adjustment by incorporating an additional ℓ2 regularization on the flux values (with parameter β) to control their magnitude, while maintaining global sparsity through a structured ℓ0 penalty (with parameter λ). The method minimizes the discrepancy between the predicted and observed fluxes, subject to the same metabolic constraints as standard sFBA. The objective functon that optimizes is
$$\mathop{\min }\limits_{\mathbf{X},\mathbf{Y}}\,\parallel \mathbf{M}\,\odot\,(\mathbf{X}-\hat{\mathbf{X}})\parallel_{2,1}+\beta\,\parallel \mathbf{X}\parallel_{2,1}+\lambda\,\parallel \mathbf{Y}\parallel_{0}$$
where ∥⋅∥F, where \({\bf{M}}\in {\{0,1\}}^{| {{\mathcal{E}}}| \times n}\) masks the unmeasured fluxes, and \(\parallel {\bf{X}}{\parallel }_{2,1}{:}=\mathop{\sum }\nolimits_{i=1}^{n}{{{\parallel{\bf{X}}}_{:,i}}\parallel}_2\) denotes the column-wise mixed ℓ2,1 norm.
It is important to note that the use of ℓ2 regularization introduces a quadratic term in the objective function, converting the problem into a mixed integer quadratic program (MIQP), also supported in CORNETO.
Metabolic network inference from omics (iMAT)
The iMAT method27 is a FBA-based method used to reconstruct context-specific metabolic networks by integrating omics data with a genome-scale metabolic model. The method identifies a subset of reactions that are consistent with the omics data while ensuring that the resulting network satisfies the constraints imposed by FBA.
Given a metabolic network, and an omics-based classification of reactions into three categories:
-
\({C}_{{\mathrm{H}}}\subseteq \{1,\ldots ,| {{\mathcal{E}}}_{u}| \}\): high-confidence reactions, which are strongly supported by omics data.
-
\({C}_{{\mathrm{L}}}\subseteq \{1,\ldots ,| {{\mathcal{E}}}_{u}| \}\): low-confidence reactions, which are weakly supported by omics data.
-
\({C}_{{\mathrm{U}}}=\{1,\ldots ,| {{\mathcal{E}}}_{u}| \}\setminus ({C}_{{\mathrm{H}}}\cup {C}_{{\mathrm{L}}}).\): uncertain reactions, with ambiguous or no supporting data.
The iMAT method solves the following optimization problem:
$$\mathop{\max }\limits_{{\bf{y}}\in {\{0,1\}}^{n}}\sum _{j\in {C}_{{\mathrm{H}}}}{{\bf{y}}}_{j}-\sum _{j\in {C}_{{\mathrm{L}}}}{{\bf{y}}}_{j}.$$
Subject to the FBA constraints. The binary variable yj ∈ {0, 1} ensures that a reaction j can carry flux only if it is included in the network (that is, yj = 1).
In CORNETO, this method is implemented similarly to the sFBA problem, but using the non-zero indicator Y ∈ Ω≠0(X) instead of ΩFree. Omics data are mapped using the gene–protein reaction rules in the metabolic network, with a specific \(\phi ({\bf{D}},{\mathcal{H}})\) (Supplementary Information). The objective function is then
$$f({\bf{y}})={{\bf{d}}}_{h}^{\top }(1-{\bf{y}})+{{\bf{d}}}_{l}^{\top }{\bf{y}},$$
where \({{\bf{d}}}_{h}={{\bf{1}}}_{{C}_{{\mathrm{H}}}}\in {\{0,1\}}^{n}\) and \({{\bf{d}}}_{l}={{\bf{1}}}_{{C}_{{\mathrm{L}}}}\in {\{0,1\}}^{n}\), with 1S being the indicator function.
Sign-consistent intracellular signalling network (CARNIVAL)
We implemented the CARNIVAL method using the flow-based formulation for multi-sample signalling network inference from transcriptomics. To compare with the original Carnival R for single samples, we implemented a more flexible and efficient version of Carnival R (Supplementary Information).
Given a directed (signed) PKN \({\mathcal{G}}\), where \({\bf{s}}\in {\{-1,1\}}^{| {\mathcal{E}}| }\) is the vector of edge interaction signs (−1, inhibition; 1, activation) from the PKN, a matrix \({{\bf{D}}}^{| {\mathcal{V}}| \times n}\) with TF activities (where negative values indicate inhibition and positive values indicate activation), and \({\bf{P}}\in {\{-1,1\}}^{| {{\mathcal{E}}}_{u}\times n| }\) is a matrix indicating which vertices from the PKN are perturbed in each sample.
-
\(\phi ({\bf{D}},{\mathcal{G}})=(\{{v}^{{\prime} }\,| \,{v}^{{\prime} }=(v,{{\bf{d}}}_{v})\},\{{e}^{{\prime} }\,| \,{e}^{{\prime} }=(e,{\bf{0}})\})\).
-
\(\psi ({{\mathcal{G}}}^{{\prime} })=\left({{\mathcal{V}}}^{{\prime} },{{\mathcal{E}}}^{{\prime} }\cup \left\{({{\emptyset}},\{{v}_{i}\})\,| \,{{\bf{p}}}_{i}\ne 0\right\}\cup \left\{(\{{v}_{k}\},{{\emptyset}})\,| \,{{\bf{d}}}_{k}\ne 0\right\}\right)\), where:
-
– pi represents the ith entry of a given column j of P (that is, pi = Pi,j), which corresponds to the perturbation status of vertex vi in sample j (with pi ≠ 0 indicating that vertex vi is perturbed in this sample),
-
– dk represents the kth entry of the same column j of D (that is, dk = Dk,j), which corresponds to the TF activity of vertex vk in sample j (with dk ≠ 0 indicating that TF vk is active or inhibited in this sample).
-
-
ΩCarnival has the following constraints and variables:
$$\begin{array}{l}{\bf{Y}}\in {\varOmega }_{{\mathrm{Tree}}}({\bf{X}})\\ {{\bf{E}}}^{{\rm{act}}}+{{\bf{E}}}^{{\rm{inh}}}\le {\bf{Y}}\\ {{\bf{V}}}^{{\rm{act}}}+{{\bf{V}}}^{{\rm{inh}}}\le {\bf{1}}\\ {{\bf{E}}}^{{\rm{act}}}\le ({{\bf{H}}}^{\top }{{\bf{V}}}^{{\rm{act}}})\odot {{\bf{1}}}_{{\bf{s}}\ > \ 0}+({{\bf{H}}}^{\top }{{\bf{V}}}^{{\rm{inh}}})\odot {{\bf{1}}}_{{\bf{s}} < 0}\\ {{\bf{E}}}^{{\rm{inh}}}\le ({{\bf{H}}}^{\top }{{\bf{V}}}^{{\rm{act}}})\odot {{\bf{1}}}_{{\bf{s}} < 0}+({{\bf{H}}}^{\top }{{\bf{V}}}^{{\rm{inh}}})\odot {{\bf{1}}}_{{\bf{s}}\ > \ 0}\\ {{\bf{V}}}_{i,\,j}^{{\rm{act}}}-{{\bf{V}}}_{i,\,j}^{{\rm{inh}}}={{\bf{P}}}_{i,\,j}\quad \,\text{if}\,\quad {{\bf{P}}}_{i,\,j}\ne 0,\end{array}$$
where \({\bf{H}}\in {\{0,1\}}^{| {{\mathcal{V}}}_{u}| \times | {{\mathcal{E}}}_{u}| }\) is defined as
$${\bf{H}}=\max (-{\bf{A}},0)=\left\{\begin{array}{ll}1\quad &\,\text{if vertex}\,v\,\text{is the tail of edge}\,e,\\ 0\quad &\,\text{otherwise}\,.\end{array}\right.$$
-
\(f({\bf{v}})={\left({\bf{1}}-{\bf{v}}\odot \mathrm{sign}\,({\bf{d}})\right)}^{\top }| {\bf{d}}|\), where \({\bf{v}}={{\bf{V}}}_{:,\,j}^{{\rm{act}}}-{{\bf{V}}}_{:,\,j}^{{\rm{inh}}}\) and d = D:,j, for a given sample j.
