The standard “text-in, text-out” paradigm will only take you so far.
Real applications that deliver real value should be able to look at visuals, reason through complex problems, and produce results that downstream systems can actually use.
In this post, we will design such a stack by bringing together three powerful capabilities: multimodal input, reasoning, and structured output.
To illustrate this, we will walk through a practical example: a time-series anomaly detection system for e-commerce order data, built with OpenAI's o3 model. Specifically, we show how to pair o3's reasoning capability with image input and emit validated JSON that downstream systems can easily consume.
By the end, our app will:
- See: Analyze the time series chart of e-commerce order volumes
- Think: Identify anomalous patterns
- Integrate: Output a structured anomaly report
You will leave with working code that can be reused for a variety of use cases beyond anomaly detection.
Let's dive in.
Interested in the broader landscape of how LLMs are applied to anomaly detection? Check out my previous post on how LLMs enhance anomaly detection, where I summarized seven new application patterns that you shouldn't miss.
1. Case study
In this post, we aim to build an anomaly detection solution that identifies anomalous patterns in e-commerce order time series data.
For this case study, we generated three sets of synthetic daily order data. The datasets represent three different profiles of daily orders over roughly one month. We shaded the weekends to make the seasonality visible. The x-axis indicates the day of the week.



Each figure contains one specific type of anomaly (can you spot them?). Later, we will use these figures to test our anomaly detection solution and see whether it can accurately recover those anomalies.
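The generation code for the synthetic data is not shown in this post. As a rough illustration of what such data could look like, here is a minimal sketch that uses only the standard library. The function name make_daily_orders and every number in it (baseline levels, noise, spike size, dates) are hypothetical choices for illustration, not the values behind Figures 1-3.

```python
import random
from datetime import date, timedelta


def make_daily_orders(start, n_days=35, weekday_level=130, weekend_level=115,
                      noise=4.0, spike_on=None, spike_size=40, seed=0):
    """Build a list of {"date", "orders"} rows with a weekday/weekend pattern.

    All default values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    rows = []
    for i in range(n_days):
        day = start + timedelta(days=i)
        # Saturday (5) and Sunday (6) get the lower weekend baseline
        base = weekend_level if day.weekday() >= 5 else weekday_level
        orders = base + rng.gauss(0, noise)
        # Optionally inject a single-day spike anomaly
        if spike_on is not None and day == spike_on:
            orders += spike_size
        rows.append({"date": day, "orders": round(orders)})
    return rows


rows = make_daily_orders(date(2025, 8, 4), spike_on=date(2025, 8, 20))
print(rows[:3])
```

Passing the rows to pd.DataFrame(rows) yields a dataframe with "date" and "orders" columns, which is the shape the plotting code later in this post expects.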
2. Our Solution
2.1 Overview
Unlike traditional machine learning approaches that require tedious feature engineering and model training, our current approach is much simpler. It works in the following steps:
- We prepare a figure that visualizes the e-commerce order time series data.
- We prompt the reasoning model o3 and ask it to examine the time series image we feed it and determine whether an anomalous pattern exists.
- The o3 model outputs its findings in a predefined JSON format.
And that's it. Simple.
Of course, to deliver this solution, the o3 model must be able to accept image input and emit structured output. We will see how to do that shortly.
2.2 Setting up the reasoning model
As mentioned before, we will use the o3 model, OpenAI's flagship reasoning model, which can tackle complex multi-step problems with state-of-the-art performance. Specifically, we use an Azure OpenAI endpoint to call the model.
Make sure you put the endpoint, API key, and deployment name in a .env file. You can then proceed to set up the LLM client:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from openai import AzureOpenAI
from dotenv import load_dotenv
import os
load_dotenv()
# Setup LLM client
endpoint = os.getenv("api_base")
api_key = os.getenv("o3_API_KEY")
api_version = "2025-04-01-preview"
model_name = "o3"
deployment = os.getenv("deployment_name")
LLM_client = AzureOpenAI(
    api_key=api_key,
    api_version=api_version,
    azure_endpoint=endpoint
)
We use the following instruction as the system message for the o3 model (tuned with GPT-5):
instruction = f"""
[Role]
You are a meticulous data analyst.
[Task]
You will be given a line chart image related to daily e-commerce orders.
Your task is to identify prominent anomalies in the data.
[Rules]
The anomaly kinds can be spike, drop, level_shift, or seasonal_outlier.
A level_shift is a sustained baseline change (≥ 5 consecutive days), not a single point.
A seasonal_outlier happens if a weekend/weekday behaves unlike peers in its category.
For example, weekend orders are usually lower than the weekdays'.
Read dates/values from axes; if you can’t read exactly, snap to the nearest tick and note uncertainty in explanation.
The weekends are shaded in the figure.
"""
In the instruction above, we clearly defined the role of the LLM, the task the LLM should complete, and the rules the LLM should follow.
To limit the complexity of our case study, we deliberately specified only four anomaly types that the LLM needs to identify. We also provided clear definitions of those anomaly types to remove ambiguity.
Finally, we injected a bit of domain knowledge about e-commerce patterns: weekend orders are expected to be lower than weekday orders. Incorporating domain know-how is generally considered good practice for guiding the model's analysis.
Now that the model is set up, let's discuss how to prepare the image for the o3 model to consume.
2.3 Image preparation
To enable o3's multimodal capability, we need to provide the figure in a specific format: either a publicly accessible web URL or a Base64-encoded data URL. Since our figures are generated locally, we use the second approach.
What is Base64 encoding, anyway? Base64 is a way to represent binary data (such as an image file) using only text characters that are safe to transmit over the Internet. It converts binary image data into a string of letters, numbers, and a few symbols.
And what about data URLs? A data URL is a type of URL that embeds the file content directly in the URL string rather than pointing to a file location.
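To make these two ideas concrete, here is a small standard-library sketch that wraps a few raw bytes in a data URL and then unwraps them again. The helper names bytes_to_data_url and data_url_to_bytes are hypothetical and exist only for this illustration. The eight bytes used are the fixed signature that every PNG file starts with.

```python
import base64


def bytes_to_data_url(raw: bytes, mime_type: str = "image/png") -> str:
    """Encode raw bytes as Base64 text and wrap them in a data URL."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def data_url_to_bytes(data_url: str) -> bytes:
    """Recover the raw bytes from a Base64 data URL."""
    # Everything after the first comma is the Base64 payload
    _header, encoded = data_url.split(",", 1)
    return base64.b64decode(encoded)


png_signature = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of any PNG file
url = bytes_to_data_url(png_signature)
print(url)  # data:image/png;base64,iVBORw0KGgo=
```

The round trip is lossless: decoding the URL returns exactly the bytes that went in, which is why an image can travel inside a JSON request body as plain text.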
We can use the following function to handle this conversion automatically:
import io
import base64

def fig_to_data_url(fig, fmt="png"):
    """
    Converts a Matplotlib figure to a base64 data URL without saving to disk.

    Args:
    -----
    fig (matplotlib.figure.Figure): The figure to convert.
    fmt (str): The format of the image ("png", "jpeg", etc.)

    Returns:
    --------
    str: The data URL representing the figure.
    """
    buf = io.BytesIO()
    fig.savefig(buf, format=fmt, bbox_inches="tight")
    buf.seek(0)
    base64_encoded_data = base64.b64encode(buf.read()).decode("utf-8")
    mime_type = f"image/{fmt.lower()}"
    return f"data:{mime_type};base64,{base64_encoded_data}"
Essentially, our function first saves the Matplotlib figure to an in-memory buffer. It then encodes the binary PNG data as Base64 text and wraps it in the desired data URL format.
Assuming we have access to the synthetic daily order data, we can use the following function to generate the plot and convert it into the proper data URL format in one go:
def create_fig(df):
    """
    Create a Matplotlib figure and convert it to a base64 data URL.
    Weekends (Sat–Sun) are shaded.

    Args:
    -----
    df: dataframe contains one profile of daily order time series.
        dataframe has "date" and "orders" columns.

    Returns:
    --------
    image_url: The data URL representing the figure.
    """
    df = df.copy()
    df['date'] = pd.to_datetime(df['date'])

    fig, ax = plt.subplots(figsize=(8, 4.5))
    ax.plot(df["date"], df["orders"], linewidth=2)
    ax.set_xlabel('Date', fontsize=14)
    ax.set_ylabel('Daily Orders', fontsize=14)

    # Weekend shading
    start = df["date"].min().normalize()
    end = df["date"].max().normalize()
    cur = start
    while cur <= end:
        if cur.weekday() == 5:  # Saturday 00:00
            span_start = cur  # Sat 00:00
            # Sun 00:00: the span runs from the Saturday point to the Sunday point
            span_end = cur + pd.Timedelta(days=1)
            ax.axvspan(span_start, span_end, alpha=0.12, zorder=0)
            cur += pd.Timedelta(days=2)  # skip Sunday
        else:
            cur += pd.Timedelta(days=1)

    # Title
    title = f'Daily Orders: {df["date"].min():%b %d, %Y} - {df["date"].max():%b %d, %Y}'
    ax.set_title(title, fontsize=16)

    # Format x-axis dates
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'))
    ax.xaxis.set_major_locator(mdates.WeekdayLocator(interval=1))
    plt.tight_layout()

    # Obtain url
    image_url = fig_to_data_url(fig)
    return image_url
Figures 1-3 were generated by the plotting routine above.
2.4 Structured Output
In this section, we discuss how to ensure that the o3 model outputs a consistent JSON format instead of free-form text. This is known as “structured output,” and it is one of the key enablers for integrating LLMs into existing automated workflows.
To achieve this, we start by defining a schema that governs the expected output structure. We use a Pydantic model:
from pydantic import BaseModel, Field
from typing import Literal
from datetime import date

AnomalyKind = Literal["spike", "drop", "level_shift", "seasonal_outlier"]

class DateWindow(BaseModel):
    start: date = Field(description="Earliest plausible date the anomaly begins (ISO YYYY-MM-DD)")
    end: date = Field(description="Latest plausible date the anomaly ends, inclusive (ISO YYYY-MM-DD)")

class AnomalyReport(BaseModel):
    when: DateWindow = Field(
        description=(
            "Minimal window that contains the anomaly. "
            "For single-point anomalies, use the interval that covers reading uncertainty, if the tick labels are unclear"
        )
    )
    y: int = Field(description="Approx value at the anomaly’s most representative day (peak/lowest), rounded")
    kind: AnomalyKind = Field(description="The type of the anomaly")
    why: str = Field(description="One-sentence reason for why this window is unusual")
    date_confidence: Literal["low","medium","high"] = Field(
        default="medium", description="Confidence that the window localization is correct"
    )
Our Pydantic schema tries to capture both the quantitative and qualitative aspects of the detected anomalies. For each field, we specify its data type (e.g., int for numerical values, Literal for a fixed set of choices).
We also use the Field function to provide a detailed description of each key. These descriptions are especially important because they effectively serve as inline instructions for o3, helping it understand the semantic meaning of each component.
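To see what these type constraints enforce, here is a minimal sketch, assuming Pydantic v2 is installed. It re-declares a trimmed copy of the schema (field descriptions omitted) so that it runs on its own, and the two JSON strings are made-up illustrations rather than real o3 output.

```python
from datetime import date
from typing import Literal

from pydantic import BaseModel, ValidationError

AnomalyKind = Literal["spike", "drop", "level_shift", "seasonal_outlier"]


class DateWindow(BaseModel):
    start: date
    end: date


class AnomalyReport(BaseModel):
    when: DateWindow
    y: int
    kind: AnomalyKind
    why: str
    date_confidence: Literal["low", "medium", "high"] = "medium"


# Illustrative payload (not real model output); date_confidence is left out
good = (
    '{"when": {"start": "2025-08-19", "end": "2025-08-21"}, '
    '"y": 166, "kind": "spike", "why": "Single-day jump."}'
)
report = AnomalyReport.model_validate_json(good)
print(report.when.start, report.date_confidence)

# An anomaly kind outside the allowed set is rejected
bad = good.replace('"spike"', '"blip"')
try:
    AnomalyReport.model_validate_json(bad)
    rejected = False
except ValidationError:
    rejected = True
print("invalid kind rejected:", rejected)
```

The ISO date strings are parsed into datetime.date objects, the missing date_confidence falls back to its default, and any value outside the Literal options fails validation instead of silently flowing downstream.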
That covers multimodal input and structured output. Now it is time to put them together in one LLM call.
2.5 Calling the o3 model
To interact with o3 using multimodal input and structured output, we use the LLM_client.beta.chat.completions.parse() API. Key arguments include:
- model: The deployment name.
- messages: The message object sent to the o3 model.
- max_completion_tokens: The maximum number of tokens the model can generate for the response. Note that reasoning models like o3 internally generate reasoning tokens to “think” through the problem, and this limit is an upper bound on the visible output tokens and the reasoning tokens combined, so leave enough headroom for both.
- response_format: The Pydantic model that defines the expected JSON schema structure.
- reasoning_effort: A control knob that dictates how much computational effort o3 spends on reasoning. Available options are low, medium, and high.
We can define a helper function to interact with the o3 model:
def anomaly_detection(instruction, fig_path,
                      response_format, prompt=None,
                      deployment="o3", reasoning_effort="high"):
    # Compose messages
    messages=[
        { "role": "system", "content": instruction},
        { "role": "user", "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": fig_path,
                    "detail": "high"
                }
            },
        ]}
    ]

    # Add prompt if it is given
    if prompt is not None:
        messages[1]["content"].append({"type": "text", "text": prompt})

    # Invoke LLM API
    response = LLM_client.beta.chat.completions.parse(
        model=deployment,
        messages=messages,
        max_completion_tokens=4000,
        reasoning_effort=reasoning_effort,
        response_format=response_format
    )
    return response.choices[0].message.parsed.model_dump()
Note that the messages object accepts both text and image content. The text prompt is optional, since we only use the figure to prompt the model.
We set "detail": "high" to enable high-resolution image processing. This is necessary for the current case study, because o3 needs to read fine details such as axis tick labels, data point values, and subtle visual patterns. Keep in mind, however, that high-detail processing consumes more tokens and incurs higher API costs.
Finally, .parsed.model_dump() converts the parsed JSON output into a regular Python dictionary.
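One practical detail when handing this dictionary to a downstream system: the date fields come back as datetime.date objects, which json.dumps cannot serialize by default. Here is a minimal standard-library sketch of one way to handle that. The helper name to_json is hypothetical, and the dictionary is an illustrative stand-in with the same shape as the helper's return value.

```python
import json
from datetime import date


def _encode(obj):
    """Fallback encoder: turn date objects into ISO strings."""
    if isinstance(obj, date):
        return obj.isoformat()
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


def to_json(report: dict) -> str:
    """Serialize a report dictionary that may contain date objects."""
    return json.dumps(report, default=_encode)


# Illustrative stand-in for the dictionary returned by the helper
result = {
    "when": {"start": date(2025, 8, 19), "end": date(2025, 8, 21)},
    "y": 166,
    "kind": "spike",
    "why": "Example explanation.",
    "date_confidence": "medium",
}
payload = to_json(result)
print(payload)
```

If you prefer to stay within Pydantic, calling model_dump(mode="json") on the parsed object instead of plain model_dump() should give JSON-compatible values directly.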
That's it for the implementation. Let's look at some results next.
3. Results
In this section, we feed the previously generated figures to the o3 model and ask it to identify potential anomalies.
3.1 Spike anomaly
# df_spike_anomaly is the dataframe of the first set of synthetic data (Figure 1)
spike_anomaly_url = create_fig(df_spike_anomaly)
# Anomaly detection
result = anomaly_detection(instruction,
                           spike_anomaly_url,
                           response_format=AnomalyReport,
                           reasoning_effort="medium")
print(result)
In the call above, spike_anomaly_url is the data URL of Figure 1. The resulting output is shown below:
{
    'when': {'start': datetime.date(2025, 8, 19), 'end': datetime.date(2025, 8, 21)},
    'y': 166,
    'kind': 'spike',
    'why': 'Single day orders jump to ~166, far above adjacent days that sit near 120–130.',
    'date_confidence': 'medium'
}
We can see that the o3 model returned the output exactly in the format we designed. Now we can take this result and generate a visualization programmatically:
# Create image
fig, ax = plt.subplots(figsize=(8, 4.5))
df_spike_anomaly['date'] = pd.to_datetime(df_spike_anomaly['date'])
ax.plot(df_spike_anomaly["date"], df_spike_anomaly["orders"], linewidth=2)
ax.set_xlabel('Date', fontsize=14)
ax.set_ylabel('Daily Orders', fontsize=14)
# Format x-axis dates
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'))
ax.xaxis.set_major_locator(mdates.WeekdayLocator(interval=1))
# Add anomaly overlay
start_date = pd.to_datetime(result['when']['start'])
end_date = pd.to_datetime(result['when']['end'])
# Add shaded region
ax.axvspan(start_date, end_date, alpha=0.3, color='red', label=f"Anomaly ({result['kind']})")
# Add text annotation
mid_date = start_date + (end_date - start_date) / 2 # Middle of anomaly window
ax.annotate(
    result['why'],
    xy=(mid_date, result['y']),
    xytext=(10, 20),  # Offset from the point
    textcoords='offset points',
    bbox=dict(boxstyle='round,pad=0.5', fc='yellow', alpha=0.7),
    arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0.1'),
    fontsize=10,
    wrap=True
)
# Add legend
ax.legend()
plt.xticks(rotation=0)
plt.tight_layout()
The generated visualization looks like this:

We can see that the o3 model correctly identified the spike anomaly present in this first synthetic dataset.
Not bad, especially considering that we only prompted the LLM and did not perform any traditional model training.
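Because the data is synthetic, the injected anomaly date is known, so this kind of check can also be done in code rather than by eye. Below is a minimal sketch. The helper name window_contains and the ground-truth date are hypothetical (the post does not state the exact injected date), and the report dictionary only mirrors the window from the output above.

```python
from datetime import date, timedelta


def window_contains(report: dict, true_date: date, tolerance_days: int = 0) -> bool:
    """Return True if true_date falls inside the reported window.

    tolerance_days widens the window on both sides to allow for
    tick-reading uncertainty.
    """
    slack = timedelta(days=tolerance_days)
    start = report["when"]["start"] - slack
    end = report["when"]["end"] + slack
    return start <= true_date <= end


# Window copied from the reported output; other fields omitted for brevity
report = {"when": {"start": date(2025, 8, 19), "end": date(2025, 8, 21)}}

assumed_true_date = date(2025, 8, 20)  # hypothetical ground truth
print(window_contains(report, assumed_true_date))
```

Running the same check over many synthetic series would turn this visual spot check into a simple hit-rate metric.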
3.2 Level shift anomaly
# df_level_shift_anomaly is the dataframe of the 2nd set of synthetic data (Figure 2)
level_shift_anomaly_url = create_fig(df_level_shift_anomaly)
# Anomaly detection
result = anomaly_detection(instruction,
                           level_shift_anomaly_url,
                           response_format=AnomalyReport,
                           reasoning_effort="medium")
print(result)
The resulting output is shown below.
{
    'when': {'start': datetime.date(2025, 8, 26), 'end': datetime.date(2025, 9, 2)},
    'y': 150,
    'kind': 'level_shift',
    'why': 'Orders suddenly jump from the 120-135 range to ~150 on Aug 26 and remain elevated for all subsequent days, indicating a sustained baseline change.',
    'date_confidence': 'high'
}
Again, we can see that the model accurately identified the “level_shift” anomaly in the plot.

3.3 Seasonal anomaly
# df_seasonality_anomaly is the dataframe of the 3rd set of synthetic data (Figure 3)
seasonality_anomaly_url = create_fig(df_seasonality_anomaly)
# Anomaly detection
result = anomaly_detection(instruction,
                           seasonality_anomaly_url,
                           response_format=AnomalyReport,
                           reasoning_effort="medium")
print(result)
The resulting output is shown below.
{
    'when': {'start': datetime.date(2025, 8, 23), 'end': datetime.date(2025, 8, 24)},
    'y': 132,
    'kind': 'seasonal_outlier',
    'why': 'Weekend of Aug 23-24 shows order volumes (~130+) on par with surrounding weekdays, whereas other weekends consistently drop to ~115, making it an out-of-season spike.',
    'date_confidence': 'high'
}
This is a challenging case. Nonetheless, our o3 model handled it properly, with precise localization and a clear reasoning trace. Pretty impressive.

4. Summary
Congratulations! We have successfully built a fully functional anomaly detection solution for time series data that works entirely through visualization and prompting.
By feeding daily order plots to the o3 reasoning model and constraining its output to a JSON schema, the LLM was able to identify three different anomaly types with precise localization. All of this was achieved without training any ML model. Impressive!
Taking a step back, we can see that the solution we built illustrates a broader pattern of combining three capabilities:
- See: Multimodal input lets the model consume figures directly.
- Think: Step-by-step reasoning capability for tackling complex problems.
- Integrate: Structured output that downstream systems can easily consume (e.g., to generate visualizations).
The combination of multimodal input + reasoning + structured output creates a versatile foundation for practically useful LLM applications.
Now you have the building blocks ready. What do you want to build next?
