Modern applications increasingly require complex and lengthy coordination between services, including multi-step payment processing, orchestration of AI agents, and approval processes that await human decisions. Building these has traditionally required significant effort to implement state management, handle failures, and integrate multiple infrastructure services.
Starting today, you can use AWS Lambda durable functions to build reliable multi-step applications directly within the familiar AWS Lambda experience. Durable functions are regular Lambda functions with the same event handler and integrations you already know. You write sequential code in your preferred programming language, and durable functions track progress, automatically retry on failures, and suspend execution for up to one year at defined points. You don’t pay for idle compute while you wait.
AWS Lambda durable functions provide these capabilities using a checkpoint and replay mechanism known as durable execution. After you enable durable execution for a function, you add the new open source durable execution SDK to your function code. You then use SDK primitives such as “steps” to add automatic checkpointing and retries to your business logic, and “waits” to suspend execution efficiently without compute charges. If execution terminates unexpectedly, Lambda resumes from the last checkpoint by replaying the event handler from the beginning and skipping operations that already completed.
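To make the checkpoint and replay idea concrete, here is a minimal toy model in plain Python. It does not use the durable execution SDK, and the names ToyDurableContext, handler, and store are hypothetical stand-ins invented for this sketch. It only illustrates the general pattern: a completed step stores its result, and a later replay of the handler returns the stored result instead of running the step again.

```python
from typing import Any, Callable, Dict, List


class ToyDurableContext:
    """Toy stand-in for a durable context; NOT the real SDK."""

    def __init__(self, checkpoints: Dict[str, Any]) -> None:
        # Checkpoints outlive a single invocation (here: a plain dict).
        self.checkpoints = checkpoints
        self.executed: List[str] = []  # steps that really ran this time

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        if name in self.checkpoints:
            # Replay: return the stored result without re-running the step.
            return self.checkpoints[name]
        result = fn()
        self.checkpoints[name] = result  # checkpoint after success
        self.executed.append(name)
        return result


def handler(ctx: ToyDurableContext, crash_after_first: bool) -> int:
    a = ctx.step("load", lambda: 20)
    if crash_after_first:
        raise RuntimeError("simulated crash between steps")
    b = ctx.step("double", lambda: a * 2)
    return a + b


store: Dict[str, Any] = {}

# First invocation crashes after the first step has been checkpointed.
first = ToyDurableContext(store)
try:
    handler(first, crash_after_first=True)
except RuntimeError:
    pass

# Second invocation re-runs the handler from the top; "load" is skipped.
second = ToyDurableContext(store)
result = handler(second, crash_after_first=False)
print(first.executed, second.executed, result)
```

The second run executes only the "double" step, because "load" is served from the checkpoint store. This is also why code outside of steps should be deterministic: it runs again on every replay.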
Getting started with AWS Lambda durable functions
Let me walk you through how to use durable functions.
First, I create a new Lambda function in the console and select Author from scratch. In the Durable execution section, I select Enable. Note that the durable function setting can only be set during function creation and currently can’t be changed for existing Lambda functions.

After I create my Lambda durable function, I can get started with the provided code.

Lambda durable functions introduce two core primitives that handle state management and recovery:
- Steps: The context.step() method adds automatic retries and checkpointing to your business logic. After a step is completed, it is skipped during replay.
- Wait: The context.wait() method pauses execution for a specified duration, terminating the function and then suspending and resuming execution without compute charges.
Additionally, Lambda durable functions provide other operations for more complex patterns:
- create_callback() creates a callback that you can use to await the result of an external event, such as an API response or a human approval.
- wait_for_condition() pauses until a specific condition is met, such as polling a REST API for process completion.
- parallel() and map() support advanced concurrency use cases.
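The polling pattern behind an operation like wait_for_condition() can be sketched in plain Python. This is not the SDK's implementation or signature; poll_until and its parameters are hypothetical names chosen for this illustration, and the pauses are only added up rather than slept. In a durable function, each pause would be a suspended wait with no compute running.

```python
from typing import Callable, Tuple


def poll_until(
    check: Callable[[int], bool],
    interval_s: float,
    max_polls: int,
) -> Tuple[bool, float]:
    """Call check(poll_number) until it returns True or polls run out.

    Returns (condition_met, total_wait_s). Nothing actually sleeps here;
    the function only totals the pauses that would separate the polls.
    """
    total_wait_s = 0.0
    for poll_number in range(1, max_polls + 1):
        if check(poll_number):
            return True, total_wait_s
        if poll_number < max_polls:
            total_wait_s += interval_s  # pause before the next poll
    return False, total_wait_s


# Hypothetical external job that reports completion on the third poll.
done, waited = poll_until(lambda n: n >= 3, interval_s=30.0, max_polls=5)
print(done, waited)

# A job that never completes exhausts all polls.
gave_up, waited_all = poll_until(lambda n: False, interval_s=30.0, max_polls=5)
print(gave_up, waited_all)
```

Bounding the number of polls keeps a condition that never becomes true from waiting indefinitely.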
Build a production-ready order processing workflow
Next, let’s extend the default example to build a production-ready order processing workflow. It shows how to use callbacks for external approvals, handle errors properly, and configure a retry strategy. The code is intentionally kept simple to focus on these core concepts. A full implementation could use Amazon Bedrock to enhance the validation step and add AI-powered order analysis.
Here’s how the order processing workflow works:
- First, validate_order() checks the order data to ensure all required fields are present.
- Next, send_for_approval() sends the order for external human approval and waits for the callback response, suspending execution without compute charges.
- Then, process_order() completes order processing.
- Throughout the workflow, try-except error handling distinguishes between terminal errors that stop execution immediately and recoverable errors inside steps that trigger automatic retries.
Here is the complete order processing workflow, including step definitions and main handler.
import random

from aws_durable_execution_sdk_python import (
    DurableContext,
    StepContext,
    durable_execution,
    durable_step,
)
from aws_durable_execution_sdk_python.config import (
    Duration,
    StepConfig,
    CallbackConfig,
)
from aws_durable_execution_sdk_python.retries import (
    RetryStrategyConfig,
    create_retry_strategy,
)


@durable_step
def validate_order(step_context: StepContext, order_id: str) -> dict:
    """Validates order data using AI."""
    step_context.logger.info(f"Validating order: {order_id}")
    # In production: calls Amazon Bedrock to validate order completeness and accuracy
    return {"order_id": order_id, "status": "validated"}


@durable_step
def send_for_approval(step_context: StepContext, callback_id: str, order_id: str) -> dict:
    """Sends order for approval using the provided callback token."""
    step_context.logger.info(f"Sending order {order_id} for approval with callback_id: {callback_id}")
    # In production: send callback_id to external approval system
    # The external system will call Lambda SendDurableExecutionCallbackSuccess or
    # SendDurableExecutionCallbackFailure APIs with this callback_id when approval is complete
    return {
        "order_id": order_id,
        "callback_id": callback_id,
        "status": "sent_for_approval"
    }


@durable_step
def process_order(step_context: StepContext, order_id: str) -> dict:
    """Processes the order with retry logic for transient failures."""
    step_context.logger.info(f"Processing order: {order_id}")
    # Simulate flaky API that sometimes fails
    if random.random() > 0.4:
        step_context.logger.info("Processing failed, will retry")
        raise Exception("Processing failed")
    return {
        "order_id": order_id,
        "status": "processed",
        "timestamp": "2025-11-27T10:00:00Z",
    }


@durable_execution
def lambda_handler(event: dict, context: DurableContext) -> dict:
    try:
        order_id = event.get("order_id")

        # Step 1: Validate the order
        validated = context.step(validate_order(order_id))
        if validated["status"] != "validated":
            raise Exception("Validation failed")  # Terminal error - stops execution
        context.logger.info(f"Order validated: {validated}")

        # Step 2: Create callback
        callback = context.create_callback(
            name="awaiting-approval",
            config=CallbackConfig(timeout=Duration.from_minutes(3))
        )
        context.logger.info(f"Created callback with id: {callback.callback_id}")

        # Step 3: Send for approval with the callback_id
        approval_request = context.step(send_for_approval(callback.callback_id, order_id))
        context.logger.info(f"Approval request sent: {approval_request}")

        # Step 4: Wait for the callback result
        # This blocks until external system calls SendDurableExecutionCallbackSuccess or SendDurableExecutionCallbackFailure
        approval_result = callback.result()
        context.logger.info(f"Approval received: {approval_result}")

        # Step 5: Process the order with custom retry strategy
        retry_config = RetryStrategyConfig(max_attempts=3, backoff_rate=2.0)
        processed = context.step(
            process_order(order_id),
            config=StepConfig(retry_strategy=create_retry_strategy(retry_config)),
        )
        if processed["status"] != "processed":
            raise Exception("Processing failed")  # Terminal error
        context.logger.info(f"Order successfully processed: {processed}")
        return processed

    except Exception as error:
        context.logger.error(f"Error processing order: {error}")
        raise error  # Re-raise to fail the execution
This code demonstrates several important concepts:
- Error handling: The try-except block handles terminal errors. When an unhandled exception is thrown outside of a step (such as the validation check), it terminates the execution immediately. This is useful when retrying makes no sense, such as with invalid order data.
- Step retries: Inside a step such as process_order, exceptions trigger automatic retries, either with the default strategy (Step 1) or with a configured RetryStrategy (Step 5). This handles transient failures such as temporary API unavailability.
- Logging: I use context.logger in the main handler and step_context.logger inside steps. The context logger suppresses duplicate logs during replay.
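The retry behavior can be approximated in plain Python to see what max_attempts=3 and backoff_rate=2.0 imply. This is a sketch under stated assumptions, not the SDK's retry implementation: run_with_retries and make_flaky are hypothetical helpers, max_attempts is assumed to count total attempts, and the 1-second initial delay is invented for illustration. Delays are recorded rather than slept.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def run_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 3,
    backoff_rate: float = 2.0,
    initial_delay_s: float = 1.0,  # hypothetical; not taken from the SDK
) -> Tuple[T, List[float]]:
    """Run fn, retrying on any exception up to max_attempts in total.

    Returns the result and the delays that would precede each retry.
    The delays are only recorded, never slept, to keep the sketch fast.
    """
    delays: List[float] = []
    delay = initial_delay_s
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(), delays
        except Exception:
            if attempt == max_attempts:
                raise  # retries exhausted: the error becomes terminal
            delays.append(delay)
            delay *= backoff_rate
    raise AssertionError("unreachable")


def make_flaky(failures_before_success: int) -> Callable[[], str]:
    """Build a function that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def flaky() -> str:
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise RuntimeError("Processing failed")
        return "processed"

    return flaky


status, waits = run_with_retries(make_flaky(2))
print(status, waits)

# The example step fails when random.random() > 0.4, roughly 60% of the
# time, so all three attempts fail with probability 0.6 ** 3.
p_all_fail = 0.6 ** 3
print(round(p_all_fail, 3))
```

Under these assumptions, about one execution in five would still exhaust its retries against the simulated flaky API, so the final failure path is worth testing too.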
Now I create a test event with an order_id and invoke the function asynchronously to start the order workflow. I navigate to the Test tab and fill in the optional Durable execution name to identify this execution. Note that durable functions provide built-in idempotency. If I invoke the function twice with the same execution name, the second invocation returns the existing execution result instead of creating a duplicate.
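The effect of name-based idempotency can be pictured with a small toy registry. It is an illustration only and says nothing about how Lambda implements this internally; ToyExecutionRegistry and its methods are hypothetical names invented for the sketch.

```python
from typing import Any, Callable, Dict


class ToyExecutionRegistry:
    """Toy model of name-based idempotency; NOT how Lambda implements it."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}
        self.runs = 0  # how many times a workflow body actually ran

    def start(self, name: str, workflow: Callable[[], Any]) -> Any:
        if name in self._results:
            # Same execution name: return the existing result, run nothing.
            return self._results[name]
        self.runs += 1
        self._results[name] = workflow()
        return self._results[name]


def order_workflow() -> Dict[str, str]:
    return {"order_id": "123", "status": "processed"}


registry = ToyExecutionRegistry()
first_result = registry.start("order-123", order_workflow)
second_result = registry.start("order-123", order_workflow)
print(first_result == second_result, registry.runs)
```

A natural choice for the execution name is a business identifier such as the order ID, so that a retried client request maps to the same execution.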

I can monitor the execution by navigating to the Durable executions tab in the Lambda console:

Here I can see the status and timing of each step. The execution shows CallbackStarted followed by InvocationCompleted, which indicates that the function has terminated and execution is suspended to avoid idle charges while waiting for the approval callback.

I can now complete the callback directly from the console by choosing Send success or Send failure, or programmatically using the Lambda API.

I choose Send success.

After the callback completes, the execution resumes and processes the order. If the process_order step fails because of the simulated flaky API, it is automatically retried based on the configured strategy. Once an attempt succeeds, the execution completes successfully.

Monitoring executions using Amazon EventBridge
You can also use Amazon EventBridge to monitor the execution of durable functions. Lambda automatically sends execution status change events to the default event bus, allowing you to build downstream workflows, send notifications, or integrate with other AWS services.
To receive these events, create an EventBridge rule on the default event bus with the following pattern:
{
  "source": ["aws.lambda"],
  "detail-type": ["Durable Execution Status Change"]
}
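To see which events a pattern like this selects, here is a small Python sketch of flat, exact-value matching. It covers only the matching this particular pattern needs, not the full EventBridge pattern language, and the sample events (including the contents of their "detail" fields) are hypothetical.

```python
import json
from typing import Any, Dict, List

PATTERN: Dict[str, List[str]] = {
    "source": ["aws.lambda"],
    "detail-type": ["Durable Execution Status Change"],
}


def matches(pattern: Dict[str, List[str]], event: Dict[str, Any]) -> bool:
    """Return True if every pattern field lists the event's value.

    Covers only flat, exact-value matching, which is all this pattern
    uses; EventBridge supports many more operators.
    """
    return all(event.get(field) in allowed for field, allowed in pattern.items())


# Hypothetical event envelopes; "detail" contents are made up for illustration.
status_event = {
    "source": "aws.lambda",
    "detail-type": "Durable Execution Status Change",
    "detail": {"status": "SUCCEEDED"},
}
other_event = {"source": "aws.s3", "detail-type": "Object Created", "detail": {}}

print(matches(PATTERN, status_event), matches(PATTERN, other_event))

# A rule definition takes the pattern as a JSON string.
pattern_json = json.dumps(PATTERN)
```

Every field named in the pattern must match, so an event from another source is ignored even when it arrives on the same default event bus.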
What you need to know
Here are some important points to note:
- Availability: Lambda durable functions are now available in the US East (Ohio) AWS Region. For the latest regional availability, visit the AWS Features by Region page.
- Programming language support: At launch, AWS Lambda durable functions support JavaScript/TypeScript (Node.js 22/24) and Python (3.13/3.14). We recommend bundling the durable execution SDK with your function code using your preferred package manager. The SDKs evolve quickly, so bundling lets you update the dependency easily as new features become available.
- Using Lambda versions: When deploying durable functions to production, use Lambda versions to ensure that replay always happens on the same code version. If you update your function code while an execution is suspended, replay uses the version that started the execution, which prevents inconsistencies from code changes during long-running workflows.
- Testing your durable functions: You can test durable functions locally without AWS credentials using a separate testing SDK with pytest integration, and use the AWS Serverless Application Model (AWS SAM) command line interface (CLI) for more complex integration tests.
- Open source SDKs: The durable execution SDKs are open source for JavaScript/TypeScript and Python. You can review the source code, contribute improvements, and stay up to date with the latest features.
- Pricing: For more information about AWS Lambda durable functions pricing, see the AWS Lambda pricing page.
Visit the AWS Lambda console to start using AWS Lambda durable functions. For more information, see the AWS Lambda durable functions documentation page.
Happy building!
— Donnie
