AWS Bedrock AgentCore: Controls, Governance, and the Architectural Decisions That Shape Everything Else

Part 3 of 3: Policy, Guardrails, Memory, Observability, Registry, Cost, and the Patterns That Harden Into Defaults

Parts 1 and 2 of this series covered the strategic case for platform foundations and the infrastructure layer; Runtime, Identity, and Gateway. With those in place, you can deploy agents and connect them to tools in a governed, auditable way. This post covers what comes next: the controls that keep agents operating safely, the governance patterns that make spend and access attributable, the architectural decisions around model access and cost attribution, and the lessons that only surface once you are building in production.

AgentCore Policy: Deterministic Enforcement Outside the Model

The platform concern: LLMs cannot guarantee their own behavioural boundaries. An agent that is instructed not to access financial data may still attempt to do so if a user crafts the right prompt. Business rules encoded in system prompts are suggestions, not controls. At enterprise scale, you need enforcement that is deterministic, auditable, and completely independent of the model’s probabilistic outputs.

What Policy provides: A policy engine that intercepts all agent traffic through AgentCore Gateways and evaluates every request against defined policies before tool access is granted. It operates entirely outside of agent code. The model cannot reason around it, and the agent cannot bypass it.

Policies are authored in Cedar (an open-source policy language purpose-built for fine-grained authorisation) or in plain English, which AgentCore automatically translates to Cedar. Before policies are applied to live traffic, automated reasoning checks validate them for common authoring errors: overly permissive grants, overly restrictive rules, and logically unsatisfiable conditions that would silently block everything. All policy decisions are logged to CloudWatch, giving you an auditable record of every enforcement action.

The practical consequence of this architecture is significant: you can enforce access controls based on user identity and tool input parameters, and those controls hold regardless of how the agent was prompted. An agent that is policy-restricted from modifying production records cannot modify production records, even if someone asks it to. That’s a qualitatively different security posture than hoping the system prompt holds.

Policy also supports a log-only mode that evaluates requests against defined policies without blocking them. This makes it practical to introduce policy enforcement incrementally — you can observe what would have been blocked in production before switching to enforce mode, rather than discovering overly restrictive rules by breaking a live agent.

Bedrock Guardrails: Safe and Compliant Model Behaviour at the Infrastructure Level

The platform concern: Even when an agent is behaving exactly as instructed, the foundation model it uses may produce outputs that violate content policies, expose sensitive data, discuss topics the organisation has explicitly restricted, or generate responses that create regulatory risk. Solving this with prompt engineering alone is fragile: prompts can be bypassed, and they don’t give you auditability or consistent enforcement across model versions.

What Guardrails provides: An evaluation layer that intercepts both user inputs and model responses against configurable policies, applied at the inference API level across InvokeModel, Converse , and their streaming variants. When a guardrail triggers on an input, the model is never invoked; the request is blocked before incurring inference cost. When it triggers on an output, the response is replaced with a configured blocked message or sensitive content is masked in place.

The policy types cover the main enterprise use cases:

Content filters: detect and block harmful content categories (violence, hate speech, sexual content) at configurable severity thresholds

Denied topics: prevent the model from engaging with specific subject areas defined by your organisation (competitor products, legal matters, restricted domains)

Sensitive information filters: automatically redact PII and confidential data from responses before they reach the user

Word filters: block specific terms or phrases

Image content filters: evaluate image inputs and outputs where multimodal models are in use

Different guardrail configurations can be applied to different agents, allowing stricter controls where the risk profile demands it. A customer-facing agent and an internal analyst tool can carry different policies without any change to model configuration.

It is worth being explicit about how Policy and Guardrails relate, because they are frequently confused. Guardrails govern what the model says. Policy governs what the agent does. An agent can produce entirely compliant model outputs and still attempt to call a tool it should not. A sound control model applies both: Guardrails at the inference layer, Policy at the tool access layer. Neither replaces the other.

AgentCore Memory: Managed Persistence With Governance Built In

The platform concern: Stateless agents are limited agents. Useful assistants need to remember context across sessions: previous interactions, user preferences, in-progress tasks. But implementing persistence yourself means making decisions about storage, retention, scope boundaries, and data governance that compound into a significant surface area. Memory implemented as a database with a session key is memory without governance.

What Memory provides: Managed context persistence scoped to the agent, user, and session by design. When agents are redeployed, scaled, or replaced, memory persists correctly without you managing the underlying storage. The scoping model ensures that context doesn’t leak across session or user boundaries, which matters both for correctness and for compliance with data handling obligations.

For enterprise deployments, the main benefit is that memory governance decisions, including retention periods, access boundaries, and what categories of information should be stored, can be made at the platform level rather than delegated to individual agent teams. This is significantly cheaper to design early than to retrofit once agents are live and users are relying on persistent context. Once users depend on an agent remembering them, changing the retention model requires coordinated changes across the platform and agent code — and a conversation with users about why their history has changed.

AgentCore Observability: Tracing That Works Like the Rest of Your Stack

The platform concern: Debugging a misbehaving agent is hard when you can’t see what it did. The model invocation is a black box. The tool calls are distributed. The latency could be in the model, the Gateway, or a downstream API. Without end-to-end tracing, every incident starts with a guessing game, and every investigation involves piecing together logs from multiple systems that weren’t designed to correlate.

What Observability provides: Distributed tracing integrated with AWS X-Ray and OpenTelemetry, covering the full request path from the initial invocation through model inference and every tool call the agent makes. Traces are correlated across components and flow into CloudWatch alongside your other operational metrics. No separate AI monitoring console to context-switch into during an incident.

Performance metrics and SLA tracking are included. The Observability APIs also support metadata queries that can serve as a foundation for an agent registry: a queryable record of what agents are deployed, what they’re connected to, and how they’re performing at any point in time.

AgentCore Registry: A Governed Catalogue for the Full Agent Fleet

AgentCore Registry is currently in public preview and is expected to reach general availability shortly.

The platform concern: As the number of agents grows, a simple question becomes surprisingly hard to answer: what agents actually exist across the organisation, who owns them, what they connect to, and whether a team about to build something new could instead reuse something that already works. Without a structured answer to that question, you get agent sprawl: parallel development of overlapping capabilities, no visibility into the full fleet, and no governed process for publishing or retiring agents. A shared document can serve this purpose for a handful of agents. It does not scale to dozens or hundreds.

What Registry provides: A fully managed discovery and governance service that maintains a centralised catalogue of agents, tools, MCP servers, agent skills, and custom resources across the organisation. Each entry is a registry record: a structured metadata object describing what a resource is, what it does, and how to reach it. The registry is not a deployment service; it does not run agents. It is a record of what exists, where it lives, and who is responsible for it.

Discovery uses a hybrid approach combining semantic and keyword search, designed to be queried by both humans and AI agents. A search for “payment processing” can surface entries tagged as “billing” or “invoicing”; the registry understands intent, not just exact terms. This matters most for preventing duplicate development: before a team builds a new agent, they can search the catalogue to confirm whether an equivalent capability already exists and is available for reuse.

Governance follows a structured publication lifecycle: draft, pending approval, approved. Administrators configure the registry and set approval requirements; publishers submit records; curators review and approve or reject them. Amazon EventBridge can be configured to notify curators when records enter the approval queue, integrating the publication process into your existing operational tooling. Records carry version control and can be deprecated and retired as capabilities evolve. Authorisation is handled through IAM credentials or JWT tokens from your corporate identity provider, controlling who can publish to the registry and who can search it.

The registry integrates with the rest of the AgentCore suite. Agents and MCP servers hosted on Runtime can be catalogued; tools exposed through Gateway can be registered; AWS CloudTrail logs all registry API operations for audit. The registry also exposes a remote MCP endpoint, which means an AI agent can query the registry directly to discover other agents or tools, enabling coordination patterns where one agent finds and delegates to another via the catalogue. Alongside the Observability APIs, this gives you a queryable operational picture of the full agent fleet: what is registered, what is running, and how it is performing.

Cost Governance Deserves Its Own Section

Of all the foundations, cost governance is the most consistently deprioritised until the pain arrives. By the time a surprise cloud bill materialises, the attribution work is retroactive at best.

The mechanism to implement this correctly is Application Inference Profiles: named, tagged wrappers around foundation model ARNs. Instead of agents invoking models directly, every agent uses a profile. This single architectural decision enables tag-based spend tracking per team or application, AWS Budgets alerts tied directly to profile tags, IAM policies scoped to specific profiles rather than raw model ARNs, and Cost Anomaly Detection monitoring per-profile spend patterns.

Application Inference Profiles also address a governance concern that sits alongside cost attribution. When a profile is created per agent, per team, or per AWS account, IAM permissions can be scoped so that each consumer can invoke only the model behind their associated profile, with no access to other model ARNs and no ability to call models provisioned for other agents. Combined with Service Control Policies that deny direct model invocation entirely, every consumer in the organisation is required to go through a named, governed profile. The result is that access to a specific model, or to a more capable or expensive model, must be granted explicitly through profile provisioning rather than being available to anyone who knows the model ARN.

The complement is a Central AI Account pattern: foundation models provisioned in a dedicated account, Service Control Policies preventing application accounts from invoking Bedrock directly, and all model access flowing through inference profile ARNs with cross-account IAM roles. Every model invocation across the organisation is attributable, filterable, and budgetable.

A tagging schema worth locking in early:

Tag Key

Purpose

ApplicationCI

Links spend to the CMDB service record

Application

Groups costs at the platform level

Owner

Routes budget alerts to the right team

Environment

Separates prod vs non-prod spend

ModelID

Quick filtering in Cost Explorer

These five tags give you everything you need to answer “who is spending what on which model” without building custom attribution tooling.

A second attribution mechanism works at the identity layer rather than the resource layer. AWS Bedrock automatically records the IAM principal making each inference call and surfaces it in CUR 2.0 via the line_item_iam_principal column, capturing IAM user ARNs, assumed role ARNs, and federated identities. Tags attached to those principals appear in CUR 2.0 with an iamPrincipal/ prefix, letting you slice spend by team, project, or cost centre using dimensions already present in your IAM configuration, without creating any new resource types.

This covers four caller patterns: direct IAM users, dedicated application roles, federated identities via OIDC or SAML, and gateway patterns where session tags are passed dynamically at role assumption using --role-session-name and --tags.

The two approaches answer different questions. Application Inference Profiles attribute spend to the agent or application, which is the right model when you want to track at the platform layer and enforce spend controls through IAM and AWS Budgets. IAM principal attribution attributes spend to the caller identity, which is the right model when you want user or team-level visibility and prefer to work within existing IAM infrastructure. For most enterprise deployments, both are worth applying together.

When to Consider a Model Gateway

AWS Bedrock AgentCore’s Gateway handles integration between agents and the tools they call. It does not address a separate concern: what happens when your organisation needs to consume models that are not hosted on Bedrock. GPT-4o, Gemini, Mistral, or models hosted internally on SageMaker may be needed for specific use cases, or an existing agent fleet may already be built against another provider’s API. When that is the case, a dedicated model gateway, sometimes called an LLM proxy, is worth evaluating as a separate infrastructure concern.

The core function of a model gateway is to sit between your agents and any number of model providers, presenting a unified API surface regardless of what sits behind it. Agents call one endpoint; the gateway routes, authenticates, rate-limits, and logs each request. AWS publishes a reference architecture for a multi-provider generative AI gateway that builds on LiteLLM, an open source proxy supporting over 100 providers behind an OpenAI-compatible interface, deployed on Amazon ECS or EKS. The same entry point can cover Bedrock, SageMaker, OpenAI, and Anthropic’s direct API simultaneously.

A simpler option for environments that use only Bedrock is the Amazon API Gateway and Lambda pattern documented by the AWS Architecture team. API Gateway handles authentication, rate limiting, and quota management; a Lambda authorizer integrates with your existing identity provider; a Lambda function signs requests and forwards them to Bedrock. The operational overhead is minimal, but it does not extend to external providers.

Both approaches deliver the controls that make a gateway valuable at enterprise scale: provider credentials stored once in AWS Secrets Manager rather than distributed across agent codebases, a single audit trail across all model invocations, cost attribution by team or use case, and rate limiting that prevents any single consumer from running up uncapped spend.

Routing Strategies and Their Trade-offs

Once a gateway is in place, routing decisions become the main source of ongoing complexity. Three approaches are well documented:

Static routing directs each agent or task type to a fixed model. Straightforward to implement and reason about, but requires manual reconfiguration as requirements change.

Semantic routing uses embeddings and similarity matching to select the right model based on query content. Scales well across many categories but requires ongoing maintenance of reference prompts.

LLM-assisted routing uses a classifier model to select the target model. Handles nuanced classification well but adds latency and inference cost to every request.

A hybrid approach that combines semantic routing for broad categorisation with a classifier for fine-grained decisions often performs best at enterprise scale, but also carries the highest implementation and maintenance cost. This pattern suits deployments spanning multiple domains such as finance, legal, and HR, where the routing surface is large and diverse.

Weigh the Overhead Before Committing

A model gateway is a meaningful infrastructure commitment. It introduces an additional component that must be deployed with high availability, kept current, secured against prompt injection at the gateway layer, and monitored separately from your agents. If your organisation operates entirely within Bedrock, most of what a gateway provides is already available natively: AgentCore Gateway handles agent-to-tool integration, Application Inference Profiles handle model access governance and cost attribution, and Bedrock Guardrails handle content controls.

The justification for a dedicated model gateway is strongest when three conditions apply together: your organisation needs models that are not available on Bedrock, you want a consistent API surface so agents are not directly coupled to any one provider’s SDK, and you have enough model diversity that governing access centrally at the gateway is cheaper than managing it per agent.

If only one or two of those conditions apply, the operational overhead may not justify the investment. This decision warrants a clear-eyed analysis of your actual model landscape and access patterns before committing to the infrastructure.

What the Architecture Diagrams Don’t Show

Architecture diagrams show you the components and how they connect. They don’t show you which decisions are cheap to revisit and which are not, where two components that look similar are actually solving different problems at different layers, or what it means to build on a platform that is actively changing around you. Those things only emerge from the work.

Policy and Guardrails serve different layers, and you need both. Guardrails operate at the model inference layer and protect against unsafe content and PII exposure. AgentCore Policy operates at the tool access layer and enforces behavioural boundaries: what the agent is allowed to do, not just what it’s allowed to say. Neither replaces the other. An agent can produce compliant model outputs and still attempt to call a tool it shouldn’t. A sound control model applies guardrails to what the model says and policy to what the agent does.

Feature

AgentCore Policy

Bedrock Guardrails

Where it operates

Tool access layer, via AgentCore Gateway

Model inference layer

What it governs

What the agent is allowed to do: tool and API access

What the model is allowed to say: content, topics, and sensitive data

Enforcement mechanism

Cedar policies evaluated before each tool call is granted

Content, topic, and data filters evaluated on inputs and outputs

When blocking occurs

Before the tool call is executed

Inputs: before the model is invoked (inference discarded). Outputs: after inference, before the response is returned

Bypassed by prompt injection

No. Operates outside the model and agent code entirely

No. Operates at the inference API level, independent of the prompt

Testing mode

Log-only mode evaluates policies without blocking, for safe pre-production testing

No equivalent mode; active on all configured model invocations

Decisions logged to

CloudWatch metrics and logs

CloudWatch

The platform is moving fast. Components that were in private preview at design time have since gone GA; AgentCore Policy and account-level guardrails are recent examples. Building on L1 CDK constructs rather than L2 alpha constructs gives you a more stable deployment foundation, even if it’s more verbose. Plan for components to mature mid-engagement.

The Gateway authentication decision matters at architecture time. IAM-based authentication for tightly-coupled agent/tool pairs. Cognito-based authentication for independently-operated services accessed across teams. Getting this wrong and retrofitting it later is painful: it touches the agent code, the Gateway configuration, and the downstream service authentication setup simultaneously.

Memory governance is cheaper to design early than retrofit later. Once agents are live and users are relying on persistent context, changing retention policies and scope boundaries requires coordinated changes across the platform and agent code. Design these decisions from the start.