Data2Story uses 7 AI agents to turn CSV files into verified, interactive news articles

A three-part diagram showing how a data journalist agent transforms a CSV dataset of card selections into a multimodal website containing text, interactive demos, and graphs through research, data analysis, and narrative storytelling. — Data2Story transforms raw datasets into verifiable, multimodal web articles. The example shown is a dataset of card choices from 1,354 respondents. |Image: Lin et al.

The authors demonstrate their system using a dataset that has received little coverage so far: the 2026 FIFA World Cup schedule. From the schedule and the host cities, the system generates climate-focused articles with interactive maps.

Roughly four out of 10 games are scheduled in locations classified by players’ association FIFPRO as having a very high risk of heat, with humidity rather than temperature being the main factor. The authors emphasize that these are typical weather conditions and are not predictions for an actual tournament.

Six screenshots of three automatically generated data stories covering the 2026 FIFA World Cup and climate, ArXiv posts from 1991 to 2026, and a time usage diary. Each includes a title image and matching data visualization. — Data2Story generates stories from datasets, from World Cup stadium climate to ArXiv trends to how people spend their day, without any human input. |Image: Lin et al.

“Inspector” panel makes all claims traceable

The core feature of the system is the Inspector, a panel that displays structured evidence for each sentence and asset. Every annotated sentence, graph, and interactive element has its own index card that displays the exact line of code (and the data file behind it) or an external URL that supports the claim.

Screenshot of generated article about Trump. Contains statements linked by arrows to two types of evidence: an externally referenced article and a Python script that reproduces the stated value of 20.1 percent. — The inspector links each statement to an external source or executable script that recalculates the numbers from the data. |Image: Lin et al.

This makes 93% of all displayed statements traceable to their origin. The researchers stress that this doesn’t mean the statements are correct, just that they can be verified. Doubt a number? Run the code. The baseline for articles written by humans is 25%, in part because journalists rarely publish their analysis code. The researchers argue that this difference reflects both shortcomings in journalistic practice and strengths of the system.
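The distinction between traceable and correct can be made concrete with a minimal sketch. The `EvidenceCard` class, its field names, and the sample cards below are hypothetical illustrations, not the authors' data model; the only idea taken from the article is that a claim counts as traceable when it points to code plus a data file, or to an external URL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvidenceCard:
    """Hypothetical index card: one claim plus the evidence behind it."""
    claim: str
    script: Optional[str] = None     # code that recomputes the number (assumed field)
    data_file: Optional[str] = None  # dataset the script reads (assumed field)
    url: Optional[str] = None        # external source for non-computed claims

    def is_traceable(self) -> bool:
        # Traceable means a reader can follow the claim to code and data,
        # or to a URL. It says nothing about whether the claim is correct.
        return bool(self.url) or bool(self.script and self.data_file)


def traceable_share(cards: list) -> float:
    """Fraction of cards whose claim can be followed back to evidence."""
    if not cards:
        return 0.0
    return sum(card.is_traceable() for card in cards) / len(cards)


# Invented sample cards, for illustration only.
cards = [
    EvidenceCard("computed claim", script="analysis/share.py", data_file="data/input.csv"),
    EvidenceCard("sourced claim", url="https://example.org/report"),
    EvidenceCard("unsupported claim"),
    EvidenceCard("script without data", script="analysis/other.py"),
]

print(f"{traceable_share(cards):.0%}")  # prints "50%"
```

A real checker would also have to confirm that the script runs and reproduces the stated value, which this sketch does not attempt.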

7 agents, 1 editing workflow

Behind each story is a chain of seven specialized agents, which the team calls a “virtual newsroom.” Tables rarely tell the whole story, so the “detective” searches the web for context. For the World Cup data, it links the host cities to FIFPRO’s heat risk assessment and Open-Meteo’s climate data.

The “analyst” runs code instead of guessing numbers. The “editor” chooses which findings drive the story. The “designer” selects the appropriate medium, such as a geographic map or a musical audio clip. The “programmer” builds the HTML pages, the “auditor” checks the layout for errors, and the “inspector” ties everything back to the source.

Pipeline diagram showing the detective, analyst, editor, designer, programmer, and auditor agents producing the HTML page while the inspector tracks every step. — Each agent role in Data2Story’s virtual newsroom handles one step, from research to layout. The inspector links all statements to their source. |Image: Lin et al.
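The division of labor can be sketched as a chain of functions that pass along a shared story state. This is only a toy illustration under stated assumptions: the real agents are LLM-driven, the strictly sequential order is assumed from the description above, and every function body, field name, and the three-row sample CSV are invented stand-ins.

```python
import csv
import io


def detective(story):
    # Stand-in for web research; a real agent would search for context.
    story["context"] = ["(placeholder) external context for the dataset"]
    return story


def analyst(story):
    # Compute the number with code instead of guessing it.
    rows = list(csv.DictReader(io.StringIO(story["csv"])))
    high = sum(row["heat_risk"] == "very high" for row in rows)
    story["findings"] = [{
        "claim": f"{high} of {len(rows)} games carry very high heat risk",
        "evidence": "count of rows where heat_risk == 'very high'",
    }]
    return story


def editor(story):
    story["lead"] = story["findings"][0]  # pick the finding that drives the story
    return story


def designer(story):
    story["medium"] = "map"  # choose a presentation form
    return story


def programmer(story):
    story["html"] = f"<article><p>{story['lead']['claim']}</p></article>"
    return story


def auditor(story):
    html = story["html"]
    story["layout_ok"] = html.startswith("<article>") and html.endswith("</article>")
    return story


def inspector(story):
    # Tie every claim back to the evidence that produced it.
    story["inspector"] = {f["claim"]: f["evidence"] for f in story["findings"]}
    return story


PIPELINE = [detective, analyst, editor, designer, programmer, auditor, inspector]


def run(csv_text):
    story = {"csv": csv_text}
    for agent in PIPELINE:
        story = agent(story)
    return story


# Invented sample data, not the real World Cup schedule.
SAMPLE = "match,city,heat_risk\n1,A,very high\n2,B,low\n3,C,very high\n"
story = run(SAMPLE)
print(story["html"])
```

Keeping the evidence next to each finding from the analyst step onward is what lets the final inspector step link claims to their origin without re-deriving anything.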

The base model is Claude Opus 4.7 running in Claude Code. For images, video, and audio, the system calls models through OpenRouter, such as gpt-5.4-image-2, seedance-2.0, and lyria-3-pro-preview.

53 readers rate agent articles higher than human original articles

The researchers paired 18 public datasets with the matching human-written original articles from three different sources. They used concise briefings from The Economist, lavishly designed long-form articles from The Pudding, and community datasets from Tidy Tuesday. Fifty-three recruited readers rated both versions across five categories: visual design, narrative rhythm, data transparency, verifiability of claims, and insights gained.

Data2Story won all five categories. The biggest lead was transparency, at +1.49 on a 7-point scale. Overall, 74 percent preferred the agent article, 25 percent preferred the human article, and 2 percent said it was a tie.

The picture changes depending on the source. The agents clearly won on the data-heavy Economist briefings and Tidy Tuesday pieces. For The Pudding’s reports, which often take design teams weeks to create, the result was a statistical tie. The agents couldn’t beat a handcrafted presentation.

Bar chart comparing agents and humans across 18 article pairs. The agent writes more but shorter sentences (82.2 vs. 56.6 sentences, 16.0 vs. 20.9 words per sentence) and covers 50.4 percent of the human findings, versus 35.1 percent in the opposite direction. — Across 18 article pairs, Data2Story covers about half of the human findings, while the human articles capture only about a third of the agent’s findings, most notably for The Economist pieces. |Image: Lin et al.

When measuring which findings from the human-written articles also appear in the agent-written articles, Data2Story covers about half. Conversely, only 35% of the agent’s statements are contained in the human text.

Agents add many findings of their own but only partially capture the editorial core. The overlap is highest for the short, formulaic Economist briefings, where agents replicate 73 percent of human findings. This is probably because these texts stay very close to the standard statistics that the agent calculates.
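The two percentages differ because coverage is measured in both directions with different denominators. A minimal sketch, assuming findings have already been normalized so that matching ones share a label (how the researchers actually match findings is not described here, and the sets below are invented):

```python
def coverage(reference: set, candidate: set) -> float:
    """Share of the reference article's findings that also appear in the candidate."""
    if not reference:
        return 0.0
    return len(reference & candidate) / len(reference)


# Invented, pre-matched finding labels for one article pair.
human = {"trend_up", "regional_gap", "policy_cause", "expert_quote"}
agent = {"trend_up", "regional_gap", "median_value", "outlier", "seasonality", "top_10"}

print(coverage(human, agent))  # share of human findings the agent also has
print(coverage(agent, human))  # share of agent findings the human also has
```

In this toy pair the agent covers half of the human findings, while the human article covers only a third of the agent's, because the agent article contains more findings overall.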

Where humans still win

The researchers flag three areas where human authors continue to lead. On the editorial side, reporters explain what the data cannot. A report on Repair Cafes found that repair rates are low because phone, car, and tractor manufacturers intentionally block access to diagnostic tools and parts. This is a thesis based on reporting, not on the data. The agent shows what breaks, but the “why” remains hidden.

Comparison of two article versions about Repair Cafes. The human report above includes explanatory text about the right to repair, and the agent version below displays a bar chart of repair costs broken down by the top 20 product types. — The human report explains why repairs fail. Data2Story only graphs repair rates by product type. |Image: Lin et al.

In terms of creative design, The Pudding’s piece on stand-up comedy turns the complete transcript of Ali Wong’s show into a user interface. Next to each line is a circle sized to the length of the laugh. For the same content, the agent merely embeds a static YouTube thumbnail.

Comparison of two article versions on stand-up shows. The human Pudding report above uses the full transcript as the user interface, and the agent version below shows a static Netflix thumbnail and play button. — The Pudding team converts the entire transcript into an interface. Data2Story embeds a clickable thumbnail. |Image: Lin et al.

The Economist’s visualization of the space race layers government and private providers, success rates, and annotations into a single, dense graphic. The agent spreads the same data across multiple graphs, and the point is lost.

Comparison of two space race visualizations. The Economist's densely annotated diagram above shows government and private launch providers in a single view. The interactive agent version below uses a year slider and unannotated bare launch counts. — The Economist combines government and commercial launches and annotations into one graphic. Data2Story spreads the data across interactive views without annotations. |Image: Lin et al.

Collaborator, not replacement

The authors frame Data2Story as a newsroom tool. Humans provide perspective and reporting, while agents handle calculations, graphics, and machine-verifiable sourcing.

This may prove most useful for topics that newsrooms cannot cover due to lack of capacity, or niche datasets that would otherwise never result in readable articles. One limitation is that Data2Story currently runs on full autopilot. A version with human feedback is left for future work. The site is published at data2story.github.io and the code is on GitHub.

Machine verifiability is exactly where current AI systems continue to stumble. A recent Peking University benchmark found that leading models often gave the correct answer in document analysis but cited the wrong sources, a problem researchers called the “attribution illusion.”

Other research suggests that AI search agents do little research, mostly confirming what they already know from training. Data2Story attempts to close this gap by having the analyst calculate numbers using executable code instead of guessing, and by having the inspector link every statement to its source. Perplexity follows a similar philosophy with “search as code,” where models write their own web searches instead of calling black-box APIs.
