A practical guide to shipping enterprise agents that work: using AWS Bedrock AgentCore to build and deploy, and Veris AI to simulate, grade, and improve.
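Building an enterprise AI agent is a two-part problem: getting the agent built and deployed to production, and making sure it is actually ready for real users before it gets there.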
AWS Bedrock AgentCore solves the first half. Veris AI solves the second. Together they give enterprise teams a workflow that goes from a five-line scaffold to a battle-tested production agent, without "release-and-pray" rollouts or production data leaving your environment.
This doc walks through the problem, the joint workflow, a worked example (a Medical Triage Agent on AWS HealthLake), and how the same pattern applies to any enterprise agent.
Most agent teams today follow the same loop: small controlled rollout, hope nothing breaks, then gradually expand. That works for consumer toys. It does not scale to the enterprise, for three reasons.
Enterprise agents touch PHI, PII, financial records, internal documents, customer contracts. You cannot afford to find out in production that the agent leaks data, hallucinates a record, or mishandles a privileged workflow. You need confidence before the first real user.
You can write a handful of manual test cases. You cannot manually anticipate the full distribution of real user behavior: phrasing variations, partial information, contradictory instructions, adversarial inputs. The edge cases you don't think of are the ones that break in production.
Real enterprise integrations like Epic/FHIR, Salesforce, payment APIs, and internal services are slow and risky to test against. Sometimes you literally cannot test against them. Every iteration cycle is gated on access, data setup, and side-effect cleanup. Improvements stall.
The net effect: teams ship under-tested, then either firefight in production or never roll out broadly enough to matter.
AgentCore handles the production path. You write your agent logic; AgentCore handles the container, runtime, identity, networking, scaling, and tool auth.
Veris AI handles the readiness path. You hand Veris AI the same agent code. Veris AI wraps it in a sandbox that mocks every service it depends on, simulates the users it will talk to, generates a comprehensive scenario set, runs everything in parallel, grades the results, and returns specific fixes.
To make this concrete, here is an end-to-end example using a healthcare agent that hits every enterprise pain point.
What it does:
Why it's hard:
The entire scaffold is roughly five lines:
```python
from bedrock_agentcore.runtime import BedrockAgentCoreApp

app = BedrockAgentCoreApp()

@app.entrypoint
def invoke(payload):
    # `agent` is your agent object, constructed elsewhere in this module
    return agent.run(payload["prompt"])

@app.websocket
async def chat(ws):
    ...
```
@app.entrypoint and @app.websocket turn your agent into a deployable, stateful service with HTTP and WebSocket endpoints out of the box: no server code, no load balancer config.
Local dev and deployment use the same CLI:
```shell
agentcore dev -p 8088                # run locally
agentcore invoke --dev --port 8088   # hit it with a prompt
agentcore deploy                     # ship to production
```
agentcore deploy packages the container, pushes it to Bedrock, wires up the IAM task role, and exposes the production endpoint. Every redeploy lands in the same environment, so iteration is fast and predictable.
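For scripted checks, the local server can also be called over HTTP rather than through the CLI. The sketch below only builds the request and does not send it. It assumes the dev server follows the AgentCore Runtime HTTP contract (a POST to `/invocations` with a JSON body) on the port passed to `agentcore dev`, and that the payload carries a `prompt` key as in the scaffold above. `build_invoke_request` is a hypothetical helper name.

```python
import json
import urllib.request

def build_invoke_request(prompt: str, port: int = 8088) -> urllib.request.Request:
    """Build (but do not send) a POST to the locally running agent."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://localhost:{port}/invocations",  # assumed endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_invoke_request("I have had chest pain since this morning")
print(req.get_method(), req.full_url)

# To send it while `agentcore dev` is running:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode())
```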
Tools are just decorated Python functions:
```python
@agent.tool
def search_patient(name: str, dob: str) -> Patient: ...

@agent.tool
def get_conditions(patient_id: str) -> list[Condition]: ...

@agent.tool
def book_referral(patient_id: str, specialty: str) -> Appointment: ...
```
The Medical Triage Agent exposes nine such tools, all backed by AWS HealthLake. AgentCore's runtime handles SigV4 signing and IAM at the task-role level. No API keys in env vars, no custom auth code, no IAM plumbing in the agent itself.
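To illustrate what "decorated Python functions" buys you, here is a minimal, framework-free sketch of a tool registry. It is not AgentCore's or any agent framework's implementation: the `tool` decorator, the `TOOLS` registry, and the `describe` helper are hypothetical names, and the tool bodies are stubs. The point is that a decorator can capture each function's name and typed signature, which is the information a model or a scenario generator needs in order to call it.

```python
import inspect
from typing import Callable

TOOLS: dict[str, Callable] = {}  # hypothetical registry: tool name -> function

def tool(fn: Callable) -> Callable:
    """Register a function as a tool and return it unchanged."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def search_patient(name: str, dob: str) -> dict:
    """Look up a patient by name and date of birth (stubbed)."""
    return {"id": "p-001", "name": name, "dob": dob}

@tool
def get_conditions(patient_id: str) -> list:
    """Return a patient's recorded conditions (stubbed)."""
    return []

def describe(name: str) -> dict:
    """Summarize a registered tool's parameters from its type hints."""
    sig = inspect.signature(TOOLS[name])
    return {
        "name": name,
        "params": {p.name: p.annotation.__name__ for p in sig.parameters.values()},
    }

print(describe("search_patient"))
```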
The agent is live. You can hit it in the AgentCore console. But how do you actually test it? Two bad options:
Veris AI is the third option.
You wrap the same agent in a Dockerfile and point it at Veris AI-mocked services. The agent's code is identical to production; only its dependencies are swapped for stateful simulations:
No production APIs touched. No real patient data. Every service the agent depends on still behaves like the real thing.
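What "stateful" means here can be shown with a small in-memory stand-in. This is an illustrative sketch, not Veris AI's mock implementation: `MockHealthRecords` and its methods are hypothetical, and the patient record is synthetic. The property that matters is that a write made by one tool call is visible to later reads, so multi-step conversations behave as they would against the real service.

```python
from dataclasses import dataclass, field
from itertools import count

@dataclass
class MockHealthRecords:
    """In-memory stand-in for a patient-records service (hypothetical)."""
    patients: dict = field(default_factory=dict)      # patient_id -> record
    appointments: list = field(default_factory=list)  # booked referrals
    _ids: count = field(default_factory=lambda: count(1))

    def add_patient(self, patient_id: str, name: str, dob: str) -> None:
        self.patients[patient_id] = {"name": name, "dob": dob}

    def book_referral(self, patient_id: str, specialty: str) -> dict:
        if patient_id not in self.patients:
            raise KeyError(f"unknown patient: {patient_id}")
        appt = {
            "id": f"appt-{next(self._ids)}",
            "patient_id": patient_id,
            "specialty": specialty,
        }
        self.appointments.append(appt)  # state change persists across calls
        return appt

    def list_referrals(self, patient_id: str) -> list:
        return [a for a in self.appointments if a["patient_id"] == patient_id]

svc = MockHealthRecords()
svc.add_patient("p-001", "Synthetic Patient", "1980-01-01")
svc.book_referral("p-001", "cardiology")
print(svc.list_referrals("p-001"))
```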
Veris AI reads the agent's code, understands its tools, and generates a comprehensive scenario set covering every tool and every meaningful path. You can steer it toward happy paths, edge cases, adversarial inputs, or specific categories.
For the triage agent: 25 scenarios spanning routine appointments to emergency escalations, with 100% tool coverage. Scenarios at this breadth are nearly impossible to write by hand.
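Tool coverage is straightforward to compute once each scenario records which tools it is expected to exercise. The sketch below uses the three tool names shown earlier and three made-up scenarios. The `Scenario` structure and the `tool_coverage` function are assumptions for illustration, not Veris AI's schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Scenario:
    name: str
    category: str              # e.g. "happy_path", "edge_case", "adversarial"
    expected_tools: frozenset  # tools this scenario should exercise

ALL_TOOLS = {"search_patient", "get_conditions", "book_referral"}

scenarios = [
    Scenario("routine appointment", "happy_path",
             frozenset({"search_patient", "book_referral"})),
    Scenario("ambiguous date of birth", "edge_case",
             frozenset({"search_patient"})),
    Scenario("history review before referral", "happy_path",
             frozenset({"search_patient", "get_conditions", "book_referral"})),
]

def tool_coverage(scenarios, all_tools):
    """Return (fraction of tools exercised, set of tools never exercised)."""
    used = set().union(*(s.expected_tools for s in scenarios))
    missing = set(all_tools) - used
    return 1 - len(missing) / len(all_tools), missing

coverage, missing = tool_coverage(scenarios, ALL_TOOLS)
print(f"coverage={coverage:.0%}, missing={sorted(missing)}")
```

A non-empty `missing` set is the signal to generate or write more scenarios before trusting the run.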
Scenarios run in parallel. Twenty-five scenarios finish in roughly the same wall-clock time as one. For each scenario you get:
When something fails, the trace tells you exactly where.
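The wall-clock claim follows from running scenarios concurrently: total time tracks the slowest scenario rather than the sum of all of them. A minimal sketch with `asyncio`, where `run_scenario` is a hypothetical stand-in that sleeps for a fixed duration in place of a real simulated conversation:

```python
import asyncio
import time

async def run_scenario(scenario_id: int, duration_s: float = 0.1) -> dict:
    """Stand-in for one simulated conversation; sleeps instead of doing work."""
    await asyncio.sleep(duration_s)
    return {"scenario": scenario_id, "passed": True}

async def run_all(n: int) -> list:
    # gather() starts every scenario at once and preserves input order
    return await asyncio.gather(*(run_scenario(i) for i in range(n)))

start = time.perf_counter()
results = asyncio.run(run_all(25))
elapsed = time.perf_counter() - start

print(f"{len(results)} scenarios in {elapsed:.2f}s")
```

Run sequentially, the same 25 sleeps would take about 2.5 seconds. The speedup only holds while there is enough capacity to run every scenario at once.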
The end-of-run report includes:
The fixes are integration-ready. The same simulation data can also feed downstream model tuning (SFT, RL).
Every change runs as a candidate against the production baseline. Side-by-side comparison: green for improvements, red for regressions, column by column. Catch problems before merge, not after deploy. Runs fit naturally into CI: every commit, nightly, or on-demand.
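A CI gate over that comparison can be as simple as diffing per-metric scores. The metric names, the scores, and the `tolerance` threshold below are invented for illustration; this is a sketch of the idea, not Veris AI's report format.

```python
def compare_runs(baseline: dict, candidate: dict, tolerance: float = 0.01) -> dict:
    """Label each shared metric as improved, regressed, or unchanged.

    Scores are assumed to be 'higher is better' values in [0, 1].
    """
    verdicts = {}
    for metric in sorted(baseline.keys() & candidate.keys()):
        delta = candidate[metric] - baseline[metric]
        if delta > tolerance:
            verdicts[metric] = "improved"
        elif delta < -tolerance:
            verdicts[metric] = "regressed"
        else:
            verdicts[metric] = "unchanged"
    return verdicts

baseline = {"task_success": 0.80, "tool_accuracy": 0.90, "escalation_safety": 0.95}
candidate = {"task_success": 0.88, "tool_accuracy": 0.90, "escalation_safety": 0.85}

verdicts = compare_runs(baseline, candidate)
regressions = [m for m, v in verdicts.items() if v == "regressed"]
print(verdicts)

# In CI, exit non-zero when `regressions` is non-empty to block the merge.
```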
Going through this loop on the Medical Triage Agent produced a few outcomes worth calling out:
The Medical Triage Agent is one shape of a general pattern. Any enterprise agent that depends on:
...has the same shape of problem and the same shape of solution.
| Industry | Agent | Sensitive surface | Integration to mock |
|---|---|---|---|
| Healthcare | Triage, prior auth, scheduling | PHI | FHIR / HealthLake / EHR |
| Financial svc. | KYC, dispute resolution, ops | PII, account data | Core banking, fraud APIs |
| Insurance | Claims intake, adjudication | Claims, medical records | Policy / claims systems |
| Customer ops | Tier-1 support, returns | Customer + order data | CRM, OMS, payments |
| Internal ops | IT helpdesk, HR, procurement | Employee data, contracts | ServiceNow, Workday |
| Sales | Prospecting, deal-desk | Pipeline, pricing | Salesforce, CPQ |
The pattern is the same in every row:
When teams try to do this in-house, they typically end up writing:
Veris AI ships those components. AgentCore ships the production runtime. Your team writes the agent.
Spin up an account and run your first simulation at console.veris.ai.
From there:
Add @app.entrypoint, decorate your tools, and agentcore deploy.
Wrap the agent in a Dockerfile. Point at the Veris AI-mocked versions of your enterprise services via console.veris.ai.
The result: an agent that's been through hundreds of realistic conversations, with edge cases caught, fixes verified, and zero exposure to real production systems or customer data, before it ever talks to a real user.
That's the joint promise. AgentCore gets your agent out there. Veris AI makes sure it's actually ready.
Or book a demo at veris.ai.