Graph foundation model for relational data



Relational databases make up the majority of enterprise data and power many prediction services across Google, as well as other services people use daily, such as content recommendation and traffic forecasting. Most non-trivial applications employ multiple tables (in fact, some elaborate applications at Google may need to maintain hundreds of tables), and extracting practical value from such a network of tables is rather non-trivial. Traditional tabular machine learning (ML) methods, such as decision trees, struggle to fully utilize the connectivity structure of these relational schemas.

Meanwhile, recent advances in ML provide a set of tools for building graph neural networks (GNNs) tailored to graph-structured data, where industry-relevant tasks can be framed as node classification (or regression) or graph-level prediction. However, most GNNs are fixed to the specific graph on which they were trained and cannot generalize to new graphs with new nodes, edge types, features, and node labels. For example, a model trained on a large 100M-node citation graph benchmark cannot be reused for your own graph (say, transactions between users and products) because the feature and label spaces differ so much, so you must retrain the same model from scratch on your own data. Although some initial attempts have demonstrated the feasibility of the concept on specific link prediction and node classification tasks, there is not yet a generalist model that can learn meaningful representations across relational data and tackle all node-, link-, and graph-level prediction tasks.
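To make the "network of tables" idea concrete, here is a minimal sketch of one common way to view relational tables as a graph: each row becomes a typed node and each foreign-key reference becomes a typed edge. The tables, column names, and the `tables_to_graph` helper are hypothetical and chosen only for illustration; this is not a description of how any particular model or library builds its graphs.

```python
from collections import defaultdict

# Toy relational tables (hypothetical data): each row is a dict keyed by column name.
users = [
    {"user_id": 1, "country": "DE"},
    {"user_id": 2, "country": "US"},
]
products = [
    {"product_id": 10, "price": 5.0},
    {"product_id": 11, "price": 12.5},
]
transactions = [
    {"txn_id": 100, "user_id": 1, "product_id": 10},
    {"txn_id": 101, "user_id": 1, "product_id": 11},
    {"txn_id": 102, "user_id": 2, "product_id": 11},
]


def tables_to_graph(tables, primary_keys, foreign_keys):
    """Turn rows into typed nodes and foreign-key references into typed edges.

    This helper is an illustrative assumption, not an API from the source.
    """
    nodes = {}                 # (table, primary key value) -> feature dict
    edges = defaultdict(list)  # (src table, fk column, dst table) -> [(src, dst)]

    for table, rows in tables.items():
        pk = primary_keys[table]
        fk_cols = {col for (t, col) in foreign_keys if t == table}
        for row in rows:
            # Key columns define identity and structure; the rest are node features.
            nodes[(table, row[pk])] = {
                k: v for k, v in row.items() if k != pk and k not in fk_cols
            }

    for (table, fk_col), target in foreign_keys.items():
        pk = primary_keys[table]
        for row in tables[table]:
            edges[(table, fk_col, target)].append(
                ((table, row[pk]), (target, row[fk_col]))
            )

    return nodes, dict(edges)


tables = {"users": users, "products": products, "transactions": transactions}
primary_keys = {"users": "user_id", "products": "product_id", "transactions": "txn_id"}
foreign_keys = {
    ("transactions", "user_id"): "users",
    ("transactions", "product_id"): "products",
}

nodes, edges = tables_to_graph(tables, primary_keys, foreign_keys)
print(len(nodes), "nodes;", sum(len(v) for v in edges.values()), "edges")
# prints: 7 nodes; 6 edges
```

In this view, predicting a column of one table (for example, a per-user label) becomes a node-level task on the `users` nodes, and the edge types carry the connectivity that a single flattened table would lose.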

Today we explore the possibility of designing a single model that excels on interconnected relational tables and, at the same time, generalizes to any set of tables, features, and tasks without additional training. We are excited to share recent progress on the development of such graph foundation models (GFMs), which push graph learning and tabular ML well beyond standard baselines.


