Structured output with LLMs: JSON mode, function calling, and when to use each


We’ve talked about many common techniques for optimizing the performance and cost of AI applications, such as response streaming and prompt caching. Today, I want to talk about something a little different, but equally important when building real-world AI apps: structured, machine-readable output.

Most of the examples I’ve shared so far have involved free-text responses from AI models. When the user asks a question, the model responds in natural language and we simply display that response to the user in some way. It’s pretty simple and easy. But what if the model needs to return data in a specific format (such as a JSON object) so that it can be further processed programmatically later? What if you need a model that extracts specific fields from text or images, populates a database entry, or triggers subsequent actions based on its response? In such cases, getting a big chunk of text back is not very convenient. 🤔

Fortunately, there are several solutions to this problem. There are two main approaches to obtaining structured, machine-readable output from an LLM: JSON mode and function calling (also called tool use). These two are often confused (understandably so, since they both produce structured output), but their purposes are quite different. In addition, OpenAI introduced a stricter variant of function calling called Structured Outputs, which takes schema enforcement one step further, as we will see later. In this post, we’ll take a closer look at all three to understand how each works under the hood and when to use each.

So let’s take a look!


1. What is JSON mode?

JSON mode is the simpler approach to getting machine-readable output from an LLM. It is basically a parameter that you set on your API request to tell the model to always return a valid JSON object. That’s all you need. Nevertheless, this simplicity comes at a price: beyond the output being valid, parsable JSON, there is no guarantee about its structure or schema (remember, we don’t define a schema, field names, types, etc.).

For example, using OpenAI’s API in Python, we can enable JSON mode by adding the parameter response_format={"type": "json_object"} to the model call. More specifically:

from openai import OpenAI

client = OpenAI(api_key="your_api_key")

response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},
    messages=[
        {
            "role": "system",
            "content": "You are a helpful assistant. Always respond in JSON format."
        },
        {
            "role": "user",
            "content": "Extract the name, age, and city from this text: 'Maria is 32 years old and lives in Athens.'"
        }
    ]
)

print(response.choices[0].message.content)

And the response looks like this:

{
  "name": "Maria",
  "age": 32,
  "city": "Athens"
}

And voilà! ✨ With one simple parameter change, we get valid JSON back every time. No string parsing or weird regex hacks required.

However, there is a catch. JSON mode guarantees that the output is valid JSON, but it does not guarantee any specific structure. If you run the same example multiple times, you may end up with slightly different field names or a slightly different structure each time. For example, one run might return "name" and another "full_name". This is a problem if you are trying to reliably extract certain fields programmatically.

Another thing to note is that, beyond setting response_format={"type": "json_object"}, it is recommended to always explicitly tell the model to respond with JSON in the system prompt. Notice that in the example above we also added “Always respond in JSON format.” to the system prompt. If you don’t do this, the model’s behavior can be unpredictable: it may return valid JSON, but not always.
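
Because JSON mode does not pin down field names, downstream code usually has to check the parsed object before using it. Here is a minimal sketch of that idea. The alias table and the normalize_person helper are hypothetical names made up for this illustration, not part of any API, and the input string simply stands in for a model response.

```python
import json

# Hypothetical aliases the model might use for each field we care about.
FIELD_ALIASES = {
    "name": ("name", "full_name", "person_name"),
    "age": ("age", "years_old"),
    "city": ("city", "location"),
}


def normalize_person(raw_json: str) -> dict:
    """Parse JSON-mode output and map known aliases onto canonical field names."""
    data = json.loads(raw_json)  # JSON mode should make this succeed
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")

    result = {}
    for canonical, aliases in FIELD_ALIASES.items():
        for alias in aliases:
            if alias in data:
                result[canonical] = data[alias]
                break
        else:
            # None of the aliases were present: fail loudly instead of guessing.
            raise KeyError(f"missing field: {canonical}")
    return result


print(normalize_person('{"full_name": "Maria", "age": 32, "city": "Athens"}'))
# {'name': 'Maria', 'age': 32, 'city': 'Athens'}
```

This kind of patching works for small cases, but it quickly becomes tedious, which is a good part of the motivation for defining a schema up front.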


2. What is function calling?

Function calling (or tool use) is a more advanced approach to obtaining structured, machine-readable output from an LLM. Rather than simply asking the model to format the response as JSON, we provide a schema. That is, we explicitly define a formal description of the structure that the output should follow. In this way, the model is more constrained to return data that matches that schema. In other words, with function calling we predefine what fields are expected, what types those fields should be, which are required, which are not, and so on.

Here’s how the same extraction example looks using a function call.

from openai import OpenAI
import json

client = OpenAI(api_key="your_api_key")

# define the schema of the output we expect
tools = [
    {
        "type": "function",
        "function": {
            "name": "extract_person_info",
            "description": "Extract personal information from a text",
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {
                        "type": "string",
                        "description": "The full name of the person"
                    },
                    "age": {
                        "type": "integer",
                        "description": "The age of the person"
                    },
                    "city": {
                        "type": "string",
                        "description": "The city the person lives in"
                    }
                },
                "required": ["name", "age", "city"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "extract_person_info"}},
    messages=[
        {
            "role": "user",
            "content": "Extract the name, age, and city from this text: 'Maria is 32 years old and lives in Athens.'"
        }
    ]
)

# parse the structured output
tool_call = response.choices[0].message.tool_calls[0]
result = json.loads(tool_call.function.arguments)
print(result)

The output should look like this:

{
  "name": "Maria",
  "age": 32,
  "city": "Athens"
}

The output of this example using function calling is the same as the one we got using JSON mode. Nevertheless, the important difference is that with function calling the output is consistent, unlike in JSON mode. The model is expected to follow the defined schema every time, with the field names, types, and other attributes we defined in it.


Bonus: A little more information about function calls

Before moving on to structured output, it’s worth pausing to explain a little more about the original motivation and usage behind function calling, which is about more than just getting structured output. Fundamentally, function calling is the foundation of agentic AI workflows. More specifically, in agentic settings, the LLM doesn’t just respond to the user’s question; rather, it decides what action to take next based on the user’s input.

For example, imagine a customer support assistant that can look up an order, issue a refund, or escalate to a human agent in response to a user’s question. Using function calling, each of these candidate actions can be defined as a “tool” (function), and the model’s output specifies which action to call based on the input and what arguments to use. The example below defines two of them:

tools = [
    {
        "type": "function",
        "function": {
            "name": "lookup_order",
            "description": "Look up the status of a customer order",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string", "description": "The order ID"}
                },
                "required": ["order_id"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "issue_refund",
            "description": "Issue a refund for a customer order",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string"},
                    "reason": {"type": "string"}
                },
                "required": ["order_id", "reason"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    tools=tools,
    messages=[
        {"role": "user", "content": "I want a refund for order #12345, it arrived broken."}
    ]
)

tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name)       # "issue_refund"
print(tool_call.function.arguments)  # '{"order_id": "12345", "reason": "arrived broken"}'

The message object in the API response looks like this:

ChatCompletionMessage(
    content=None,
    role='assistant',
    tool_calls=[
        ChatCompletionMessageToolCall(
            id='call_abc123',
            type='function',
            function=Function(
                name='issue_refund',
                arguments='{"order_id": "12345", "reason": "arrived broken"}'
            )
        )
    ]
)

The print statements then output something like this:

issue_refund
{"order_id": "12345", "reason": "arrived broken"}

So what’s going on here? The model returns a tool_calls object instead of a regular text response (notice that content is None). Inside the tool_calls object, we can see what the model has decided to call: issue_refund (not lookup_order), with the arguments filled in based on what the user said. Our code then parses those arguments and executes the actual refund logic in our system.

The model doesn’t just return the requested data; it decides which of the available actions is most appropriate to perform and fills in the appropriate arguments in its response. This way, we can take these arguments and actually perform the corresponding action in our system. This is the true power of function calling, and the reason it is a fundamental component of agentic AI applications.

But let’s get back to machine-readable output; we’ll talk more about agentic AI workflows and function calling in another post.
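
To make the “perform the corresponding action” step concrete, here is a minimal sketch of how a tool call could be routed to local code. The two handler functions and the HANDLERS table are hypothetical stand-ins for real business logic, and fake_tool_call imitates the shape of response.choices[0].message.tool_calls[0] so that the sketch runs without calling the API.

```python
import json
from types import SimpleNamespace


# Hypothetical local implementations of the two tools defined above.
def lookup_order(order_id: str) -> str:
    return f"Order {order_id}: status lookup requested"


def issue_refund(order_id: str, reason: str) -> str:
    return f"Refund issued for order {order_id} ({reason})"


# Map each tool name the model may return to the function that implements it.
HANDLERS = {"lookup_order": lookup_order, "issue_refund": issue_refund}


def dispatch(tool_call) -> str:
    """Run the local function named by a tool call, using the model's arguments."""
    handler = HANDLERS.get(tool_call.function.name)
    if handler is None:
        raise ValueError(f"unknown tool: {tool_call.function.name}")
    # The arguments arrive as a JSON string, so they must be parsed first.
    arguments = json.loads(tool_call.function.arguments)
    return handler(**arguments)


# Stand-in for a real tool call object, so the sketch runs offline.
fake_tool_call = SimpleNamespace(
    function=SimpleNamespace(
        name="issue_refund",
        arguments='{"order_id": "12345", "reason": "arrived broken"}',
    )
)
print(dispatch(fake_tool_call))  # Refund issued for order 12345 (arrived broken)
```

A lookup table keeps the model’s choice and the executed code explicitly linked, and an unknown tool name fails fast instead of silently doing nothing.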


3. What about structured output?

A stricter variation of function calling is Structured Outputs. Even though function calling pushes the model to produce output according to the defined schema, the output is not really strictly constrained. In practice, this means that deviations from the defined schema can still occur. Such deviations include:

  • A field marked as required is omitted because the model has trouble determining its value.
  • Additional fields that are not defined in the schema are added.
  • A field defined as an integer is returned as a string, e.g. "32" instead of 32.

…and so on.

This means that the model is trying to follow the schema, but this is still best-effort generation. Like any other LLM output, the output here is generated one predicted token at a time, and the schema is just a strong hint. It’s still quite possible for the token-by-token generation to get derailed somewhere along the way and produce output that deviates from the defined schema.
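
When relying on plain function calling, it can therefore help to check the returned arguments before acting on them. Below is a minimal sketch that detects the three deviations listed above for a flat schema. The find_deviations helper is a hypothetical name for this illustration and only handles a few primitive types; a full JSON Schema validator would be the more robust choice in practice.

```python
import json

# Maps JSON Schema type names to Python types (only the ones this sketch needs).
PYTHON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def find_deviations(arguments_json: str, schema: dict) -> list:
    """Return a list of problems; an empty list means the output conforms."""
    data = json.loads(arguments_json)
    properties = schema.get("properties", {})
    problems = []

    # Deviation 1: required fields that the model left out.
    for field in schema.get("required", []):
        if field not in data:
            problems.append(f"missing required field: {field}")

    for field, value in data.items():
        # Deviation 2: fields that the schema never defined.
        if field not in properties:
            problems.append(f"unexpected field: {field}")
            continue
        # Deviation 3: values of the wrong type, e.g. "32" instead of 32.
        type_name = properties[field].get("type")
        expected = PYTHON_TYPES.get(type_name)
        if expected is None:
            continue  # type not covered by this sketch
        # bool is a subclass of int in Python, so reject it for numeric fields.
        bool_as_number = isinstance(value, bool) and type_name != "boolean"
        if not isinstance(value, expected) or bool_as_number:
            problems.append(f"wrong type for {field}: expected {type_name}")
    return problems


person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "city": {"type": "string"},
    },
    "required": ["name", "age", "city"],
}
print(find_deviations('{"name": "Maria", "age": "32", "nickname": "Mia"}', person_schema))
# ['missing required field: city', 'wrong type for age: expected integer', 'unexpected field: nickname']
```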


Structured Outputs, on the other hand, takes function calling a step further by ensuring that all fields in the defined schema always appear in the output exactly as defined. There are no missing fields and no extra fields. The key difference is that OpenAI uses constrained decoding behind the scenes. This means that at each token step, the model is only allowed to generate tokens that keep the output valid according to the schema. In other words, the schema is not only requested through the prompt, but is also enforced at the generation level.

Enabling Structured Outputs with OpenAI is as simple as setting strict: true in the function definition:
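
To build some intuition for constrained decoding, here is a deliberately tiny toy. It is not how OpenAI implements it. The “model” is just a fixed score per token, and the “schema” is a hard-coded list of the tokens that are allowed at each step for an object of the form {"age": <integer>}. All names and numbers are made up for illustration.

```python
import json

# Toy "model": a fixed preference score per token, standing in for real logits.
TOKEN_SCORES = {
    "Sure!": 0.9, '"thirty-two"': 0.8, "32": 0.7, "{": 0.6,
    '"age"': 0.5, ":": 0.4, "}": 0.3, "7": 0.1,
}

# Toy "schema": the tokens that keep the output valid at each step
# for an object of the form {"age": <integer>}.
ALLOWED_BY_STEP = [{"{"}, {'"age"'}, {":"}, {"32", "7"}, {"}"}]


def decode(constrained: bool) -> str:
    """Greedy decoding; when constrained, mask out tokens the schema forbids."""
    output = []
    for allowed in ALLOWED_BY_STEP:
        candidates = allowed if constrained else TOKEN_SCORES.keys()
        # Pick the highest-scoring token among the candidates for this step.
        output.append(max(candidates, key=TOKEN_SCORES.__getitem__))
    return "".join(output)


constrained_text = decode(constrained=True)
print(constrained_text)              # {"age":32}
print(json.loads(constrained_text))  # {'age': 32}
print(decode(constrained=False))     # Sure!Sure!Sure!Sure!Sure!
```

Without the mask, the toy model keeps picking its favorite token and never produces JSON. With the mask, it can only choose among tokens that keep the output valid, so the result always parses and always has the expected field.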

tools = [
    {
        "type": "function",
        "function": {
            "name": "extract_person_info",
            "strict": True,  # enables Structured Outputs
            "parameters": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "city": {"type": "string"}
                },
                "required": ["name", "age", "city"],
                "additionalProperties": False
            }
        }
    }
]
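
Notice that in the definition above every property is listed in required and additionalProperties is set to False, which is what strict mode expects. For an existing, looser schema, a small helper can apply those two settings recursively. This is only a sketch: make_strict is a hypothetical helper written for this post, not an SDK function, and it does not cover every JSON Schema feature.

```python
import copy


def make_strict(schema: dict) -> dict:
    """Return a copy of a schema with every object closed and all properties required."""
    strict = copy.deepcopy(schema)  # leave the caller's schema untouched

    def visit(node):
        if isinstance(node, dict):
            if node.get("type") == "object" and "properties" in node:
                # Every declared property becomes required ...
                node["required"] = list(node["properties"].keys())
                # ... and undeclared properties are disallowed.
                node["additionalProperties"] = False
            for value in node.values():
                visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    visit(strict)
    return strict


loose_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "address": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
        },
    },
    "required": ["name"],
}
strict_schema = make_strict(loose_schema)
print(strict_schema["required"])                           # ['name', 'address']
print(strict_schema["properties"]["address"]["required"])  # ['city']
```

Keep in mind that this turns previously optional fields into required ones. If a field is truly optional, one common workaround is to also allow null as its type, but check the current documentation before relying on that.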

But again, this comes at a price. Structured Outputs is available on GPT-4o and later models; with older models, you have to fall back to JSON mode. Only a subset of JSON Schema is supported, and OpenAI preprocesses each new schema, so the first request with a given schema can be a little slower.

Nevertheless, this is the strictest way to enforce a particular schema on the model’s output. For production systems where reliability and consistency are really important, this is generally the safest option.


But aren’t these all the same thing?

JSON mode, function calling, and Structured Outputs may appear to do the same thing, since they all essentially get JSON out of the model. Nevertheless, as we have already seen, they differ greatly in what they guarantee and in what they were designed for. In particular:

  • Schema enforcement: JSON mode returns valid JSON, but with no structural guarantees. Function calling returns JSON that is expected to match the defined schema (specific field names, types, and required fields), but deviations are still possible. Structured Outputs goes a step further and enforces the schema at the generation level, ruling out such deviations.
  • Use case: Use JSON mode when you want a machine-readable response but can tolerate a variable format. Function calling is primarily designed for cases where the model needs to trigger an action or pass arguments to an external tool, and it also serves as the general-purpose option for machine-readable output. Structured Outputs is function calling with guaranteed schema adherence, making it ideal for production pipelines that require consistent output.
  • Ease of setup: JSON mode is the easiest option to configure. You just change one parameter, with no schema definition. Function calling and Structured Outputs, in contrast, require you to design and configure a JSON Schema.

That said, as a general rule of thumb, OpenAI itself recommends using Structured Outputs instead of JSON mode whenever possible.


On my mind

Getting machine-readable output from an LLM, and choosing the right approach to do so, can make a big difference in the reliability and maintainability of your AI applications. Free-text responses are great for conversational interfaces, but when an LLM becomes a component of a larger system (feeding data downstream, triggering actions, populating a database, etc.), structured responses become essential. JSON mode, function calling, and Structured Outputs can each provide such output at a different level of rigor. As with many decisions in AI engineering, the right choice depends on what you’re building and how much variation you can tolerate.


All images are by the author unless otherwise noted.


