Zero-shot foundation model for tabular data

Tabular data forms the backbone of enterprise data infrastructure and powers a significant portion of critical predictive machine learning applications. From predicting customer churn to identifying financial fraud, tabular regression and classification tasks are ubiquitous. For many years, supervised tree-based algorithms such as AdaBoost, XGBoost, and Random Forest have dominated the field, providing robust performance on structured data.

However, significant bottlenecks exist in the lifecycle of implementing these traditional models. Adapting an XGBoost model to a new dataset is never just a .fit() step; it always requires tedious manual work. Data scientists must invest countless hours in extensive hyperparameter optimization and domain-specific feature engineering just to extract reliable signals from raw data.

Meanwhile, recent advances in the broader field of machine learning, particularly in large language models (LLMs), have changed the way we approach new tasks. LLMs have demonstrated a remarkable ability for zero-shot prediction through in-context learning (ICL). With this technique, a pre-trained model picks up a new task from examples and instructions supplied in its input context, without any update to the weights of the underlying model.

Today we’re introducing TabFM, a foundation model specifically designed for classification and regression on tabular data. TabFM eliminates the need for manual model training, hyperparameter tuning, and complex feature engineering by framing tabular prediction as an ICL problem. We are happy to share that this approach allows users to generate high-quality predictions for previously unseen tables in a single forward pass. TabFM is now available on Hugging Face and GitHub.
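
To make the ICL framing concrete, here is a toy sketch of what "prediction from context, with no training step" can look like on a table. It is not TabFM's API or architecture: the function name in_context_predict, the temperature parameter, and the small churn table are all hypothetical, and a simple distance-based softmax weighting stands in for a pre-trained model's attention over the context rows.

```python
import math


def in_context_predict(context_rows, context_labels, query_row, temperature=1.0):
    """Predict a label for query_row using only the labeled rows passed as context.

    Toy stand-in for in-context learning (NOT TabFM): nothing is fitted or
    stored between calls; the labeled rows are just part of the input.
    """
    # Squared Euclidean distance from the query to each context row.
    distances = [
        sum((q - c) ** 2 for q, c in zip(query_row, row))
        for row in context_rows
    ]
    # Softmax over negative distances: closer rows get larger weights.
    # Subtracting the minimum distance keeps the exponentials numerically stable.
    min_d = min(distances)
    scores = [math.exp(-(d - min_d) / temperature) for d in distances]
    total = sum(scores)
    weights = [s / total for s in scores]

    # Sum the weights per class to get a probability-like score for each label.
    class_weight = {}
    for w, label in zip(weights, context_labels):
        class_weight[label] = class_weight.get(label, 0.0) + w
    return max(class_weight, key=class_weight.get), class_weight


# Hypothetical churn table: (monthly_spend, support_tickets) -> label.
context_rows = [(20.0, 5.0), (25.0, 4.0), (80.0, 0.0), (75.0, 1.0)]
context_labels = ["churn", "churn", "stay", "stay"]

# One call, no training loop: the labeled rows travel with the query.
prediction, probs = in_context_predict(context_rows, context_labels, (78.0, 1.0))
print(prediction, probs)
```

The point of the sketch is the call shape rather than the predictor itself: labeled rows and the query go in together, and a prediction comes out of a single call. A foundation model would replace the hand-written weighting with behavior learned during pre-training.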
