MIT researchers teach AI model to interpret charts | Massachusetts Institute of Technology News

To accelerate and refine decision-making in fast-paced global markets, companies may deploy generative artificial intelligence models to help summarize and interpret charts commonly used in market summaries and financial reports.

However, even modern vision language models can struggle with this task, because it requires a model to integrate visual, numerical, and linguistic understanding. Even companies that have invested in cutting-edge models may receive inaccurate or incomplete information.

To close this performance gap, researchers at MIT and the MIT-IBM Computing Institute have developed a multifaceted resource for AI users specifically designed to teach vision language models (VLMs) how to effectively interpret charts.

They used novel data generation methods to build a state-of-the-art dataset containing more than 1 million distinct charts. The dataset also encodes many visual, linguistic, and numerical components of each chart image, so that a model trained on it can reason more reliably about the information in the chart.

The researchers used this dataset, called ChartNet, to train a set of open-source VLMs. Many of these small models performed significantly better than commercial models that are orders of magnitude larger on tasks such as data extraction and chart summarization.

By enabling open source models to outperform commercial models, ChartNet has the potential to make it easier for small and medium-sized businesses with limited budgets to take advantage of AI. Open source datasets can improve the capabilities of AI models for tasks such as analyzing business trends and interpreting scientific figures.

“We developed ChartNet to be a one-stop shop for understanding charts, essentially covering everything needed by AI models and the practitioners who train them. We hope our work will motivate researchers to achieve state-of-the-art performance with small models that don’t require endless computations,” says Giovana Kondic, a graduate student in MIT’s Department of Electrical Engineering and Computer Science (EECS) and lead author of a paper on ChartNet.

The paper has numerous co-authors from MIT, the MIT-IBM Computing Research Lab, and IBM Research. Among them are Pengyuan Li, a research staff member at IBM Research; Dhiraj Joshi, a senior scientist at IBM Research; Isaac Sanchez, a software engineer at IBM Research; Ord Oliva, director of strategic industry engagement at the MIT Schwarzman College of Computing, director of the MIT-IBM Computing Institute, and senior research fellow at the Computer Science and Artificial Intelligence Laboratory (CSAIL); and Rogelio Feliz, principal scientist and manager of the MIT-IBM Computing Institute. The research will be presented at the IEEE Computer Vision and Pattern Recognition Conference.

Dataset bottleneck

Researchers have made great strides in developing generative AI models that excel at natural language processing and reasoning about natural images. However, Kondic says there is less research focused on interpreting the complex multimodal data contained within graphs.

But for large and small businesses in almost every industry, understanding charts is an important task.

“The financial industry thrives on charts, and if vision language models can extract information from charts, such as trend descriptions, it facilitates many downstream workflows,” says Joshi.

The lack of high-quality training data is a major bottleneck hindering the development of VLMs that can accurately interpret charts. Many datasets contain limited chart images taken from the internet, often lacking the scale or additional information needed for models to interpret the underlying data.

“Visual language models, unlike our brains, may need to see thousands of examples during training to reliably recognize something as a line graph,” Kondic says.

Researchers sought to overcome these shortcomings by generating synthetic data. Synthetic data is artificially generated by algorithms that mimic the statistical properties of real data.

The ChartNet dataset contains more than 1 million high-quality chart images, each accompanied by the corresponding code, a textual description, and the numerical data used to generate the chart. Additionally, each data point includes a question-answer pair to teach the model how to correctly answer questions about the chart image.

“These additional data modes guide the model to connect and reconcile the different information that the chart image encodes,” Kondic says.
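As a rough illustration of what one such multimodal training example could look like, here is a minimal Python sketch. The class name, field names, and sample values are all hypothetical and chosen only for illustration; the article does not describe ChartNet's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class ChartRecord:
    """One hypothetical training example (illustrative fields, not ChartNet's real schema)."""

    image_path: str     # rendered chart image
    plotting_code: str  # code that produced the image
    data_table: list    # underlying numerical data, one dict per row
    description: str    # textual description of the chart
    qa_pairs: list = field(default_factory=list)  # (question, answer) tuples

    def table_values(self) -> set:
        """Collect every cell in the data table as a string."""
        return {str(value) for row in self.data_table for value in row.values()}

    def answers_found_in_table(self) -> bool:
        """Crude cross-modal check: every answer appears somewhere in the table."""
        values = self.table_values()
        return all(answer in values for _, answer in self.qa_pairs)


# Invented sample data, for illustration only.
record = ChartRecord(
    image_path="charts/revenue.png",
    plotting_code="ax.bar(['Q1', 'Q2'], [10, 14])",
    data_table=[{"quarter": "Q1", "revenue": 10}, {"quarter": "Q2", "revenue": 14}],
    description="Revenue rose from 10 in Q1 to 14 in Q2.",
    qa_pairs=[("Which quarter had higher revenue?", "Q2")],
)
print(record.answers_found_in_table())
```

Keeping the image, code, table, description, and question-answer pairs in one record is what would let a training pipeline check them against one another, as the toy `answers_found_in_table` check does.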

Data generation

To build ChartNet, the researchers created a two-stage synthetic data generation pipeline.

First, an automated system converts an existing set of chart images into code. The system then iteratively extends that code to change various aspects of each chart, such as chart type, data values, topics, and colors.

“You can start with one graph as a seed and come up with hundreds of extensions to it. In this way, we were able to build a dataset containing over a million diverse images,” Kondic explains.
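The idea of expanding one seed chart into many variants can be sketched in a few lines of Python. This toy version swaps in simple string templating and random choices for whatever automated system the researchers actually used; the template, the function name `make_variants`, and the lists of chart types and colors are assumptions made for illustration.

```python
import random

# Hypothetical seed: plotting code with slots for the aspects to vary.
SEED_TEMPLATE = """import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
labels = {labels!r}
values = {values!r}
fig, ax = plt.subplots()
ax.{chart_type}(labels, values, color={color!r})
ax.set_title({title!r})
fig.savefig({out_path!r})
"""

CHART_TYPES = ["bar", "plot", "scatter"]  # matplotlib Axes methods
COLORS = ["tab:blue", "tab:orange", "tab:green"]


def make_variants(labels, values, title, n, seed=0):
    """Return n code strings that vary chart type, color, and data values."""
    rng = random.Random(seed)  # fixed seed keeps the output reproducible
    variants = []
    for i in range(n):
        scaled = [round(v * rng.uniform(0.5, 1.5), 2) for v in values]
        variants.append(
            SEED_TEMPLATE.format(
                labels=labels,
                values=scaled,
                chart_type=rng.choice(CHART_TYPES),
                color=rng.choice(COLORS),
                title=f"{title} (variant {i})",
                out_path=f"variant_{i}.png",
            )
        )
    return variants


variants = make_variants(["Q1", "Q2", "Q3"], [10.0, 14.0, 9.0], "Revenue", n=5)
print(variants[0])
```

Each returned string is a standalone plotting script, so the generating code and its underlying data values stay attached to every variant.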

The pipeline also includes an automated quality checking process to ensure that the synthetic data is of high quality. This process verifies that the code is executable and that the rendered chart image is accurate and clean.

“We don’t just want to generate a diverse sample, we also want the information to be presented in a meaningful way,” she says.
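The first part of such a quality check, confirming that generated code actually runs and produces an image file, might look like the following Python sketch. The function name and parameters are hypothetical, and testing for a non-empty output file is only a crude stand-in for the visual accuracy and cleanliness checks described above, which the article does not detail.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def passes_basic_checks(code: str, expected_output: str, timeout_s: float = 30.0) -> bool:
    """Run generated code in a scratch directory and confirm it wrote a non-empty file."""
    with tempfile.TemporaryDirectory() as workdir:
        try:
            result = subprocess.run(
                [sys.executable, "-c", code],
                cwd=workdir,          # relative output paths land in the scratch dir
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False              # code that hangs is rejected
        if result.returncode != 0:
            return False              # code that raises is rejected
        output = Path(workdir) / expected_output
        return output.is_file() and output.stat().st_size > 0


# Stand-in "plotting" code that simply writes some bytes to the expected path.
print(passes_basic_checks("open('chart.png', 'wb').write(b'data')", "chart.png"))
```

Running each candidate in a separate process with a timeout means a crashing or hanging script cannot stall the rest of the pipeline.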

ChartNet also includes a set of chart data points annotated by human experts. This provides access to additional chart types and validated supporting data.

Joshi adds that practitioners can use the annotated data to fine-tune existing VLMs to further improve performance for specific applications.

The researchers tested ChartNet by training models from IBM’s Granite Vision series and several other open source models of various sizes and evaluating them on a variety of chart interpretation tasks. This dataset improved the accuracy of all models in chart reconstruction, chart data extraction, chart summarization, and chart question answering.

When trained on ChartNet, small open source models consistently outperformed much larger commercial models.

“Many previous training datasets focused solely on answering simple questions about charts. With ChartNet, we sought to go beyond that and generate data that supports all aspects of robust chart understanding,” says Kondic.

The researchers plan to continue extending ChartNet by incorporating more complex levels of data in the future. They also hope to leverage feedback from the research community.

This research was partially funded by the MIT-IBM Computing Research Lab.
