OpenAI's GPT OSS models are now available on SageMaker JumpStart

Today, we are excited to announce the availability of OpenAI's new open weight GPT OSS models, gpt-oss-120b and gpt-oss-20b, in Amazon SageMaker JumpStart. With this launch, you can now deploy OpenAI's newest reasoning models to build, experiment, and responsibly scale your generative AI ideas on AWS.

This post shows you how to get started with these models on SageMaker JumpStart.

Solution overview

The OpenAI GPT OSS models (gpt-oss-120b and gpt-oss-20b) excel at coding, scientific analysis, and mathematical reasoning tasks. Both models have a 128K context window and adjustable reasoning levels (low/medium/high) to suit your specific requirements. They support external tool integration and can be used in agentic workflows through frameworks such as Strands Agents, an open source AI agent SDK. The full chain-of-thought output feature gives you detailed visibility into the model's reasoning process. Using the OpenAI SDK, you can call your SageMaker endpoint directly by updating the endpoint. The models benefit from enterprise-grade security and seamless scaling, while still giving you the flexibility to modify and customize them for your specific business needs.
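The adjustable reasoning levels appear in the request payloads later in this post as a reasoning object with an effort field. As a minimal sketch, the hypothetical helper with_reasoning_effort below (our own name, not part of the SageMaker or OpenAI SDKs) validates the level before attaching it to a payload dictionary. The example prompt is illustrative only.

```python
# Hypothetical helper (not part of the SageMaker or OpenAI SDKs).
ALLOWED_EFFORTS = ("low", "medium", "high")


def with_reasoning_effort(payload: dict, effort: str = "medium") -> dict:
    """Return a copy of payload with the reasoning effort set."""
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"effort must be one of {ALLOWED_EFFORTS}, got {effort!r}")
    updated = dict(payload)  # shallow copy so the caller's dict is untouched
    updated["reasoning"] = {"effort": effort}
    return updated


base = {
    "model": "/opt/ml/model",
    "input": "Summarize the CAP theorem.",  # illustrative prompt
    "max_output_tokens": 200,
}
request = with_reasoning_effort(base, "high")
print(request["reasoning"])
```

Rejecting unknown levels on the client side may surface typos before a request reaches the endpoint.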

SageMaker JumpStart is a fully managed service that offers state-of-the-art foundation models (FMs) for use cases such as content writing, code generation, question answering, copywriting, summarization, classification, and information retrieval. It provides a collection of pre-trained models that you can deploy quickly, accelerating the development and deployment of machine learning (ML) applications. One of the key components of SageMaker JumpStart is the model hub, which offers a vast catalog of pre-trained models, such as those from OpenAI, for a variety of tasks.

You can discover and deploy OpenAI models in Amazon SageMaker Studio or programmatically through the Amazon SageMaker Python SDK, and derive model performance and MLOps controls with Amazon SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, and container logs. The models are deployed in a secure AWS environment and under your VPC controls, helping to support data security for enterprise security needs.

You can discover GPT OSS models in the US East (Ohio, N. Virginia) and Asia Pacific (Mumbai, Tokyo) AWS Regions.

Throughout this example, we use the gpt-oss-120b model. The same steps can be replicated with the gpt-oss-20b model.

Prerequisites

The following prerequisites are required to deploy a GPT OSS model:

  • An AWS account that will contain your AWS resources.
  • An AWS Identity and Access Management (IAM) role to access SageMaker. To learn more about how IAM works with SageMaker, see AWS Identity and Access Management for Amazon SageMaker AI.
  • Access to SageMaker Studio, a SageMaker notebook instance, or an interactive development environment (IDE) such as PyCharm or Visual Studio Code. We recommend using SageMaker Studio for straightforward deployment and inference.
  • Access to the recommended instance types for the model size you want to deploy. Recommendations for these instances can be found on the SageMaker JumpStart model card. The default instance type for both models is p5.48xlarge, but you can also use other P5 family instances where available. To verify that you have the required service quotas, complete the following steps:
    • On the Service Quotas console, under AWS Services, choose Amazon SageMaker.
    • Make sure you have sufficient quota for the instance type required for your endpoint deployment.
    • Make sure at least one of these instance types is available in your target Region.
    • If necessary, request a quota increase and contact your AWS account team for assistance.
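The quota check above can also be sketched programmatically. The following is a minimal sketch, assuming a boto3 Service Quotas client is passed in. The function name find_endpoint_quotas is our own, and the "endpoint usage" substring filter is an assumption about how SageMaker quota names are worded, so verify it against what the console shows for your account.

```python
def find_endpoint_quotas(client, instance_type: str) -> dict:
    """Map quota name -> value for endpoint-usage quotas of instance_type.

    client is expected to be boto3.client("service-quotas"). The
    "endpoint usage" name filter is an assumption about quota naming.
    """
    matches = {}
    paginator = client.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode="sagemaker"):
        for quota in page["Quotas"]:
            name = quota["QuotaName"]
            if instance_type in name and "endpoint usage" in name:
                matches[name] = quota["Value"]
    return matches


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   client = boto3.client("service-quotas", region_name="us-east-1")
#   print(find_endpoint_quotas(client, "ml.p5.48xlarge"))
```

A value of 0 for the matching quota would indicate that a quota increase is needed before the endpoint can be created.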

Deploy gpt-oss-120b through the SageMaker JumpStart UI

Complete the following steps to deploy gpt-oss-120b through SageMaker JumpStart:

  1. On the SageMaker console, choose Studio in the navigation pane.
  2. First-time users will be prompted to create a domain. If not, choose Open Studio.
  3. On the SageMaker Studio console, access SageMaker JumpStart by choosing JumpStart in the navigation pane.
  4. On the SageMaker JumpStart landing page, search for gpt-oss-120b using the search box.
  5. Choose a model card to view details about the model, such as the license, the data used to train it, and how to use it. Before you deploy the model, review the configuration and model details on the model card. The model details page includes the following information:
    1. Model name and provider information.
    2. A Deploy button to deploy the model.
  6. Choose Deploy to proceed with deployment.
    1. For Endpoint name, enter an endpoint name (up to 50 characters).
    2. For Number of instances, enter a number between 1 and 100 (default: 1).
    3. For Instance type, select your instance type. For optimal performance with gpt-oss-120b, we recommend a GPU-based instance type such as p5.48xlarge.
  7. Choose Deploy to deploy the model and create an endpoint.

When the deployment is complete, the endpoint status changes to InService. At this point, the model is ready to accept inference requests through the endpoint, and you can invoke it and integrate it with your applications using the SageMaker Runtime client.
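As a minimal sketch of invoking the endpoint with the SageMaker Runtime client, the hypothetical helper invoke_json below wraps the boto3 invoke_endpoint call. The endpoint name shown in the usage comment is a placeholder, and the payload shape mirrors the request format used later in this post.

```python
import json


def invoke_json(runtime_client, endpoint_name: str, payload: dict) -> dict:
    """Send a JSON payload to a SageMaker endpoint and decode the JSON reply.

    runtime_client is expected to be boto3.client("sagemaker-runtime").
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    # The response body is a streaming object; read it fully before decoding.
    return json.loads(response["Body"].read())


# Usage, assuming boto3 is installed and the endpoint is InService
# ("my-gpt-oss-endpoint" is a placeholder name):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   result = invoke_json(
#       runtime,
#       "my-gpt-oss-endpoint",
#       {"model": "/opt/ml/model", "input": "Hello", "max_output_tokens": 50},
#   )
```

Passing the client in as an argument keeps the helper easy to exercise with a stub when no endpoint is available.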

Deploy gpt-oss-120b with the SageMaker Python SDK

To deploy using the SDK, start by selecting the gpt-oss-120b model, specified by the model_id with the value openai-reasoning-gpt-oss-120b. You can deploy the model of your choice on SageMaker using the Python SDK examples in the following sections. Similarly, you can deploy gpt-oss-20b using its model ID.

Enable web search on your model with EXA

By default, models in SageMaker JumpStart run with network isolation. The GPT OSS models come with a built-in tool for web search using EXA, a meaning-based web search API powered by embeddings. To use this tool, OpenAI requires customers to obtain an API key from EXA and pass this key as an environment variable to their JumpStartModel instance when deploying through the SageMaker Python SDK. The following code shows how to deploy the model on SageMaker with network isolation disabled and pass the EXA API key to the model:

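from sagemaker.jumpstart.model import JumpStartModel

# The EULA must be accepted explicitly before deployment.
accept_eula = True

model = JumpStartModel(
    model_id="openai-reasoning-gpt-oss-120b",
    # Network isolation must be off for the web search tool to reach EXA.
    enable_network_isolation=False,
    env={
        # Supply your own EXA API key here.
        "EXA_API_KEY": ""
    }
)
predictor = model.deploy(
    accept_eula=accept_eula
)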

You can change these configurations by specifying other non-default values in JumpStartModel. The end user license agreement (EULA) value must be explicitly set to True to accept the terms. In the preceding deployment, network isolation is set at deployment time, so turning it back on requires creating a new endpoint.
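Rather than hardcoding the key in a notebook, one option is to read it from the process environment. The following is a minimal sketch; the helper name exa_env is our own, and it assumes you have exported EXA_API_KEY in the shell or notebook environment before deploying.

```python
import os


def exa_env(environ=os.environ) -> dict:
    """Build the env mapping for JumpStartModel from the process environment."""
    key = environ.get("EXA_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set EXA_API_KEY before deploying with web search enabled.")
    return {"EXA_API_KEY": key}


# Usage with the deployment code above:
#   model = JumpStartModel(
#       model_id="openai-reasoning-gpt-oss-120b",
#       enable_network_isolation=False,
#       env=exa_env(),
#   )
```

Failing early on a missing key avoids creating an endpoint whose web search tool cannot authenticate.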

Optionally, you can deploy your model with the JumpStart default values (with network isolation enabled) as follows:

from sagemaker.jumpstart.model import JumpStartModel 
accept_eula = True 
model = JumpStartModel(model_id="openai-reasoning-gpt-oss-120b") 
predictor = model.deploy(accept_eula=accept_eula)

Run inference with the SageMaker predictor

After the model is deployed, you can run inference against the deployed endpoint through the SageMaker predictor:

payload = {
    "model": "/opt/ml/model",
    "input": [
        {
            "role": "system",
            "content": "You are a good AI assistant"
        },
        {
            "role": "user",
            "content": "Hello, how is it going?"
        }
    ],
    "max_output_tokens": 200,
    "stream": "false",
    "temperature": 0.7,
    "top_p": 1
}
    
response = predictor.predict(payload)
print(response['output'][-1]['content'][0]['text'])

You will get the following answer:

Hey there! All good on my end—just ready to dive into whatever you need. How’s it going on your side?
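The indexing in the print statement above assumes that the last output item carries text. As a minimal sketch, the hypothetical helper final_text below performs the same lookup but raises a descriptive error when the last item has no text content, as is the case for a function call item. The sample dictionaries are illustrative and not captured from a real endpoint.

```python
def final_text(response: dict) -> str:
    """Return the text of the last output item, or raise a clear error."""
    output = response.get("output") or []
    if not output:
        raise ValueError("response has no output items")
    last = output[-1]
    content = last.get("content")
    if not content:
        raise ValueError(
            f"last output item has no text content (type={last.get('type')!r})"
        )
    return content[0]["text"]


# Illustrative response shape (not captured from a real endpoint).
sample = {"output": [{"content": [{"text": "Hello from the model."}]}]}
print(final_text(sample))
```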

Function calling

The GPT OSS models were trained on the harmony response format for defining conversation structures, generating reasoning output, and structuring function calls. The format is designed to mimic the OpenAI Responses API, so if you have used that API before, this format should feel familiar. The models should not be used without the harmony format. The following example shows tool use with this format:

payload = {
  "model": "/opt/ml/model",
  "input": "System: You are ChatGPT, a large language model trained by OpenAI.\nKnowledge cutoff: 2024-06\nCurrent date: 2024-08-05\n\nreasoning: medium\n\n# Valid channels: analysis, commentary, final. Channel must be included for every message.\nCalls to these tools must go to the commentary channel: 'functions'.\n\n# Tools\n\n## functions\n\nnamespace functions {\n\n// Gets the current weather for a specific location.\ntype get_current_weather = (_: {\n// The city and state/country, e.g. \"San Francisco, CA\" or \"London, UK\"\nlocation: string,\n// Temperature unit preference\nunit?: \"celsius\" | \"fahrenheit\", // default: celsius\n}) => any;\n\n} // namespace functions\n\nDeveloper: You are a helpful AI assistant. Provide clear, concise, and helpful responses.\n\nHuman: What's the weather like in Seattle?\n\nAssistant:",
  "instructions": "You are a helpful AI assistant. Provide clear, concise, and helpful responses.",
  "max_output_tokens": 2048,
  "stream": "false",
  "temperature": 0.7,
  "reasoning": {
    "effort": "medium"
  },
  "tools": [
    {
      "type": "function",
      "name": "get_current_weather",
      "description": "Gets the current weather for a specific location",
      "parameters": {
        "type": "object",
        "properties": {
          "location": {
            "type": "string",
            "description": "The city and state/country, e.g. 'San Francisco, CA' or 'London, UK'"
          },
          "unit": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"],
            "default": "celsius",
            "description": "Temperature unit preference"
          }
        },
        "required": ["location"]
      }
    }
  ],
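}

response = predictor.predict(payload)
# Assumption: the function call is the last item in the output list.
print(response['output'][-1])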

You will get the following answer:

{'arguments': '{"location":"Seattle, WA"}', 'call_id': 'call_596a67599df2465495fd444772ff9539', 'name': 'get_current_weather', 'type': 'function_call', 'id': 'ft_596a67599df2465495fd444772ff9539', 'status': None}
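The model only describes the call; your application still has to execute it. The following is a minimal sketch of dispatching such an item to a local Python function. The TOOLS registry, run_function_call, and the stand-in get_current_weather implementation are all hypothetical names introduced for illustration, and how the result is sent back to the model is outside the scope of this sketch.

```python
import json


def get_current_weather(location: str, unit: str = "celsius") -> dict:
    """Stand-in implementation; a real one would query a weather service."""
    return {"location": location, "unit": unit, "forecast": "placeholder"}


# Hypothetical registry mapping tool names to local callables.
TOOLS = {"get_current_weather": get_current_weather}


def run_function_call(item: dict) -> dict:
    """Execute the local function named by a function_call output item."""
    if item.get("type") != "function_call":
        raise ValueError(f"not a function call: {item.get('type')!r}")
    func = TOOLS[item["name"]]
    kwargs = json.loads(item["arguments"])  # arguments arrive as a JSON string
    return {"call_id": item["call_id"], "result": func(**kwargs)}


# Fields copied from the response shown above.
item = {
    "arguments": '{"location":"Seattle, WA"}',
    "call_id": "call_596a67599df2465495fd444772ff9539",
    "name": "get_current_weather",
    "type": "function_call",
}
print(run_function_call(item))
```

Keeping the call_id alongside the result makes it possible to match each tool result to the call that requested it.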

Clean up

Once the notebook is finished running, make sure to delete any resources you created in the process to avoid additional charges. For more information, see Delete Endpoints and Resources.

predictor.delete_model()
predictor.delete_endpoint()

Conclusion

In this post, we demonstrated how to deploy and get started with OpenAI's GPT OSS models (gpt-oss-120b and gpt-oss-20b) on SageMaker JumpStart. These reasoning models bring advanced capabilities for coding, scientific analysis, and mathematical reasoning tasks directly to your AWS environment, with enterprise-grade security and scalability. Try out the new models and share your feedback in the comments.

Thank you to everyone who contributed to the launch: Malav Shastri, Varun Morishetty, Evan Kravitz, Benjamin Crabtree, Shen Teng, Loki Ravi, Mike James, Sadaf Fardeen, Siddharth Shah.


About the authors

Nithin Vijeaswaran, Specialist Solution Architect
Brean Warner, Enterprise Solutions Architect
Pradun Ramadrai, Senior Software Development Engineer
Yotam Moss, Software Development Manager
June Won, Principal Product Manager


