Introducing structured output for custom model imports in Amazon Bedrock

Machine Learning


Amazon Bedrock Custom Model Import allows you to deploy and scale fine-tuned foundation models, or your own proprietary models, in a fully managed, serverless environment. You can bring your own models to Amazon Bedrock, scale them safely without managing any infrastructure, and integrate them with other Amazon Bedrock features.

Today, we’re excited to announce that we’ve added structured output to custom model imports. Structured output constrains the model generation process in real time, ensuring that all tokens produced by the model conform to the schema you define. Rather than relying on prompt engineering tricks or fragile post-processing scripts, you can now generate structured output directly at inference time.

For certain production applications, predictability of model output is more important than creative flexibility. While customer service chatbots can benefit from diverse and natural responses, order processing systems require precise structured data that adheres to a predefined schema. Structured output bridges this gap by preserving the intelligence of the underlying model while validating that the output meets strict formatting requirements.

This represents a shift from free-form text generation to output that is consistent, machine-readable, and designed to integrate seamlessly with enterprise systems. Free-form text is great for human consumption, but production applications require greater precision. Businesses cannot tolerate ambiguity due to natural language variation when systems rely on structured output to reliably connect with APIs, databases, and automated workflows.

In this post, you will learn how to implement structured output for Custom Model Import in Amazon Bedrock: what structured output is, how to enable it in your API calls, and how to apply it to real-world scenarios that require structured, predictable output.

Understanding structured output

Structured output (also known as constrained decoding) is a way to force LLM output to conform to a predefined schema, such as valid JSON. Rather than allowing the model to freely choose tokens from its probability distribution, constrained decoding limits the choices at each step to tokens that maintain structural validity. If a candidate token would violate the schema (for example, by producing invalid JSON, inserting stray characters, or using unexpected field names), it is rejected and the model selects another allowed token. This real-time validation ensures that the final output is consistent, machine-readable, and ready for use by downstream applications without additional post-processing.
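
To build intuition, here is a toy sketch of the token-masking idea (illustrative only, not Amazon Bedrock's actual implementation): at each step, candidate tokens that would break the constraint are filtered out before sampling. The allowed values and candidate tokens below are hypothetical.

```python
# Toy sketch of constrained decoding (illustrative, not Bedrock's internals).
# Suppose the schema restricts a field to an enum of approved values; during
# generation, only tokens that keep the partial value a prefix of some
# allowed value survive the mask.
ALLOWED_VALUES = {"billing", "technical", "general"}

def mask_tokens(partial_value, candidate_tokens):
    """Return the candidate tokens that keep the value schema-valid."""
    return [
        tok for tok in candidate_tokens
        if any(v.startswith(partial_value + tok) for v in ALLOWED_VALUES)
    ]

# After the model has emitted "bill", only "ing" can still complete an
# allowed value, so the other candidates are rejected before sampling.
print(mask_tokens("bill", ["ing", "board", "ed"]))  # ['ing']
```

A production-grade constrained decoder walks a grammar or finite-state automaton compiled from the JSON schema, but the filtering principle is the same.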

Without structured output, developers often resort to prompt instructions such as "Respond with JSON only." While this approach sometimes works, it remains unreliable because of the inherently probabilistic nature of LLMs. These models generate text by sampling from a probability distribution, and the natural variability that makes their responses feel human creates significant challenges for automated systems.

Consider a customer support application that categorizes tickets. If the model responds with "It seems like it's a billing issue," "We classify this as: Billing," or "Category = Billing;", downstream code cannot reliably interpret the results. Instead, production systems need predictable, structured output. For example:

{
  "category": "billing",
  "priority": "high",
  "sentiment": "negative"
}

Such responses allow applications to automatically route tickets, trigger workflows, or update databases without human intervention. By providing predictable, schema-aligned responses, structured output transforms an LLM from a conversational tool into a reliable system component that can be integrated with databases, APIs, and business logic. This capability opens new possibilities for automation while preserving the intelligent inference that underpins the value of these models.
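
For example, a downstream routing function can consume such a response directly. This is a minimal sketch; the queue names are hypothetical.

```python
import json

# A structured response of the shape shown above
raw = '{"category": "billing", "priority": "high", "sentiment": "negative"}'

# Hypothetical destination queues, for illustration
ROUTES = {
    "billing": "billing-team-queue",
    "technical": "support-engineering-queue",
    "general": "frontline-queue",
}

def route_ticket(model_output):
    """Parse the schema-constrained JSON and pick a destination queue."""
    ticket = json.loads(model_output)
    # Because the schema guarantees a known category value, this lookup
    # cannot fail on free-form variations like "Category = Billing;".
    return ROUTES[ticket["category"]]

print(route_ticket(raw))  # billing-team-queue
```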

In addition to improved reliability and simplified post-processing, structured output provides additional benefits that enhance the performance, security, and safety of your production environment.

  • Reduced token usage and faster responses: By restricting generation to a defined schema, structured output removes redundant free-form text, reducing the number of tokens produced. Because tokens are generated sequentially, shorter outputs mean faster responses, lower latency, and better overall performance and cost efficiency.
  • Enhanced resistance to prompt injection: Structured output narrows the model's expression space and helps prevent the generation of arbitrary or unsafe content. Malicious parties cannot easily insert instructions, code, or unexpected text outside of the defined structure, because each field must match its expected type and format.
  • Safety and policy management: Structured output lets you design schemas that help block harmful or policy-violating content. By restricting fields to approved values, enforcing patterns, and limiting free-form text, schemas help ensure that output meets regulatory requirements.
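
The safety point above can be sketched as a schema that constrains fields to approved values using enums and regex patterns. The field names below are illustrative, and the sketch assumes Pydantic v2.

```python
from enum import Enum
from pydantic import BaseModel, Field

class Category(str, Enum):
    billing = "billing"
    technical = "technical"
    general = "general"

class Ticket(BaseModel):
    # The enum limits this field to three approved values
    category: Category
    # The regex pattern rules out arbitrary free-form text
    priority: str = Field(pattern="^(low|medium|high)$")

schema = Ticket.model_json_schema()
print(schema["$defs"]["Category"]["enum"])  # ['billing', 'technical', 'general']
```

When this schema is enforced at inference time, the model cannot emit values outside the approved sets, regardless of what the prompt asks for.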

In the following sections, we explore how structured output works with custom model imports in Amazon Bedrock and provide an example of enabling structured output in an API call.

Using Structured Output with Amazon Bedrock Custom Model Import

First, let’s assume that you have already imported a Hugging Face model into Amazon Bedrock using the Custom Model Import feature.

Prerequisites

Before continuing, make sure you have the following:

  • An active AWS account with access to Amazon Bedrock
  • Custom models created in Amazon Bedrock using the custom model import feature
  • Appropriate AWS Identity and Access Management (IAM) permissions to invoke the model through Amazon Bedrock Runtime

With these prerequisites in place, let’s explore how to implement structured output using imported models.

To start using structured output with custom model import in Amazon Bedrock, first set up your environment. In Python, this involves creating a Bedrock Runtime client and initializing the tokenizer from the imported Hugging Face model.

The Bedrock Runtime client provides access to imported models through the InvokeModel API. The tokenizer applies the chat template that matches the imported model, which defines how user, system, and assistant messages are combined into a single prompt, where role markers (such as <|user|> and <|assistant|>) are inserted, and where the model’s response begins.

By calling tokenizer.apply_chat_template(messages, tokenize=False), you can generate prompts that exactly match the input format expected by your model. This is essential for consistent and reliable inference, especially when structured output is enabled.

import json

import boto3
from transformers import AutoTokenizer
from botocore.config import Config

# HF model identifier imported into Bedrock
hf_model_id = "<>"  # Example: "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
model_arn = "arn:aws:bedrock:<>:<>:imported-model/your-model-id"
region = "<>"

# Initialize tokenizer aligned with your imported model 
tokenizer = AutoTokenizer.from_pretrained(hf_model_id)

# Initialize Bedrock client
bedrock_runtime = boto3.client(
    service_name="bedrock-runtime",
    region_name=region)

Implementing structured output

When you invoke a custom model in Amazon Bedrock, you can enable structured output by adding a response_format block to the request payload. This block accepts a JSON schema that defines the structure of the model’s response. During inference, the model applies this schema in real time, ensuring that each generated token conforms to the defined structure. The following walkthrough shows how to implement structured output using a simple address extraction task.

Step 1: Define the data structure

You can use a Pydantic model to define the expected output. It serves as a typed contract for the data you want to extract.

from pydantic import BaseModel, Field

class Address(BaseModel):
    street_number: str = Field(description="Street number")
    street_name: str = Field(description="Street name including type (Ave, St, Rd, etc.)")
    city: str = Field(description="City name")
    state: str = Field(description="Two-letter state abbreviation")
    zip_code: str = Field(description="5-digit ZIP code")

Step 2: Generate JSON Schema

Pydantic can automatically convert your data model to JSON Schema.

schema = Address.model_json_schema()
address_schema = {
    "name": "Address",
    "schema": schema
}

This schema defines the type, description, and required status of each field, creating a blueprint for the model to follow during generation.
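
If you print the generated schema, you can confirm that Pydantic marks every field as required (this sketch assumes Pydantic v2 and repeats the model from Step 1 so it runs standalone):

```python
from pydantic import BaseModel, Field

class Address(BaseModel):
    street_number: str = Field(description="Street number")
    street_name: str = Field(description="Street name including type (Ave, St, Rd, etc.)")
    city: str = Field(description="City name")
    state: str = Field(description="Two-letter state abbreviation")
    zip_code: str = Field(description="5-digit ZIP code")

schema = Address.model_json_schema()
print(schema["type"])      # object
print(schema["required"])  # all five field names, in declaration order
```

Because every field appears in the schema's "required" list, the model cannot legally omit any of them during constrained generation.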

Step 3: Prepare the input message

Format the input using the chat format expected by your model.

messages = [{
    "role": "user",
    "content": "Extract the address: 456 Tech Boulevard, San Francisco, CA 94105"
}]

Step 4: Apply the chat template

Use the model’s tokenizer to generate formatted prompts.

prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

Step 5: Build the request payload

Create the request body with a response_format block that references the schema:

request_body = {
    'prompt': prompt,
    'temperature': 0.1,
    'max_gen_len': 1000,
    'top_p': 0.9,
    'response_format': {
        "type": "json_schema",
        "json_schema": address_schema
    }
}

Step 6: Invoke the model

Submit the request using the InvokeModel API:

response = bedrock_runtime.invoke_model(
    modelId=model_arn,
    body=json.dumps(request_body),
    accept="application/json",
    contentType="application/json"
)

Step 7: Parse the response

Extract the generated text from the response.

result = json.loads(response['body'].read().decode('utf-8'))
raw_output = result['choices'][0]['text']
print(raw_output)

Because the schema marks each field as required, the model’s response includes all of them:

{
  "street_number": "456",
  "street_name": "Tech Boulevard",
  "city": "San Francisco",
  "state": "CA",
  "zip_code": "94105"
}

The output is clean, valid JSON that you can use directly in your applications without any additional parsing, filtering, or cleanup.
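
Optionally, you can validate the returned JSON back into the Pydantic model from Step 1 for typed field access. This is a minimal sketch that uses a hard-coded sample in place of the live raw_output so it runs standalone.

```python
from pydantic import BaseModel, Field

class Address(BaseModel):
    street_number: str = Field(description="Street number")
    street_name: str = Field(description="Street name including type (Ave, St, Rd, etc.)")
    city: str = Field(description="City name")
    state: str = Field(description="Two-letter state abbreviation")
    zip_code: str = Field(description="5-digit ZIP code")

# Sample response text, standing in for raw_output from Step 7
raw_output = '{"street_number": "456", "street_name": "Tech Boulevard", "city": "San Francisco", "state": "CA", "zip_code": "94105"}'

# Round-trip validation: raises pydantic.ValidationError if the output
# ever deviates from the schema, giving you a hard guarantee in code.
address = Address.model_validate_json(raw_output)
print(address.city)      # San Francisco
print(address.zip_code)  # 94105
```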

Conclusion

Structured output with Amazon Bedrock’s Custom Model Import provides an effective way to generate structured, schema-aligned output from your models. By moving validation to the model inference itself, structured output reduces the need for complex post-processing workflows and error handling code.

Structured output produces output that is predictable and easy to integrate into your system, supporting a variety of use cases. Examples include building financial applications that require accurate data extraction, healthcare systems that require structured clinical documentation, and customer service systems that require consistent ticket classification.

Start experimenting with structured output using custom model import today and transform the way your AI applications deliver consistent, production-ready results.


About the author

Manoj Selvakumar is a Generative AI Specialist Solutions Architect at AWS, where he helps organizations design, prototype, and scale AI-powered solutions in the cloud. With expertise in deep learning, scalable cloud-native systems, and multi-agent orchestration, he focuses on turning emerging innovations into production-ready architectures that drive measurable business value. He is passionate about turning complex AI concepts into practical applications and enabling customers to innovate responsibly at scale, from early experiments to enterprise deployments. Before joining AWS, Manoj worked in consulting, delivering data science and AI solutions to enterprise clients and building end-to-end machine learning systems supported by strong MLOps practices for training, deployment, and monitoring in production.

Yangyang Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, working on cutting-edge AI/ML technologies to help customers use generative AI to achieve their desired outcomes. She graduated from Texas A&M University with a PhD in electrical engineering. Outside of work, she loves traveling, working out, and exploring new things.

Lokeswaran Ravi is a Senior Deep Learning Compiler Engineer at AWS, specializing in ML optimization, model acceleration, and AI security. He focuses on building a secure ecosystem that increases efficiency, reduces costs, and democratizes AI technology, making cutting-edge ML accessible and impactful across industries.

Revendra Kumar is a Senior Software Development Engineer at Amazon Web Services. In his current role, he focuses on model hosting and inference MLOps on Amazon Bedrock. Previously, he worked on hosting quantum computers in the cloud and developing infrastructure solutions for on-premises cloud environments. Outside of work, Revendra enjoys staying active by playing tennis and hiking.

Muzart Tuman is a software engineer with experience in deep learning, machine learning optimization, and AI-driven applications, which he applies to solving real-world problems in scalable, efficient, and accessible ways. His goal is to create impactful tools that not only improve technical capabilities but also drive meaningful change across industries and communities.


