Frontier agent – AWS DevOps Agent

Disciplined release management

Release readiness review

AWS DevOps Agent helps verify that code changes are release-ready during code generation by checking adherence to standards, dependency impacts, and access controls. It runs functional verification in an AWS-managed verification environment to confirm that your software builds and runs as expected. DevOps Agent maps cross-repository dependencies to surface breaking changes before merge, and it uses deterministic mathematical verification to check that infrastructure changes do not drift permissions outside AWS Well-Architected best practices. By understanding your full service topology, it reasons about blast radius and reviews each change in the context of the broader system.
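Blast-radius reasoning over a service topology can be pictured as a graph traversal: starting from a changed resource, walk the dependency edges in reverse to find everything that directly or transitively depends on it. The minimal sketch below assumes a hypothetical, hand-written `DEPENDS_ON` map and a helper named `blast_radius`; it illustrates the general idea only and is not how AWS DevOps Agent builds or queries its topology.

```python
from collections import deque

# Hypothetical topology: each key depends on the services/resources in its list.
DEPENDS_ON: dict[str, list[str]] = {
    "checkout-api": ["orders-service", "payments-service"],
    "orders-service": ["orders-table"],
    "payments-service": ["payments-queue"],
    "reporting-job": ["orders-table"],
    "orders-table": [],
    "payments-queue": [],
}


def blast_radius(changed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every node that directly or transitively depends on `changed`."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents: dict[str, list[str]] = {node: [] for node in depends_on}
    for node, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(node)

    affected: set[str] = set()
    queue = deque([changed])
    while queue:  # breadth-first search over the inverted graph
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in affected:  # skip nodes already reached
                affected.add(parent)
                queue.append(parent)
    return affected


if __name__ == "__main__":
    print(sorted(blast_radius("orders-table", DEPENDS_ON)))
    # ['checkout-api', 'orders-service', 'reporting-job']
```

Breadth-first search keeps the traversal linear in the size of the graph, and the `affected` set prevents revisiting nodes when dependencies form a diamond or a cycle.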

Autonomous release testing

AWS DevOps Agent generates and runs change-specific test plans for web and API-based applications in customer-provisioned environments, catching regressions, UX issues, and integration failures before they reach production. Tests target the risk areas surfaced during the release readiness review rather than a static regression suite.
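Targeting tests at risk areas, rather than running a static regression suite, amounts to selecting only the tests that exercise what a change touches. The minimal sketch below assumes a hypothetical path-to-component map (`COMPONENT_OF_PATH`) and test catalog (`TEST_COVERAGE`); it illustrates the selection idea only and does not reproduce how AWS DevOps Agent generates its test plans.

```python
# Hypothetical mapping from source path prefixes to the components they belong to.
COMPONENT_OF_PATH = {
    "src/cart/": "cart",
    "src/payments/": "payments",
    "src/search/": "search",
}

# Hypothetical test catalog: test name -> components it exercises.
TEST_COVERAGE = {
    "test_add_to_cart_ui": {"cart"},
    "test_checkout_flow": {"cart", "payments"},
    "test_refund_api": {"payments"},
    "test_search_ranking": {"search"},
}


def components_touched(changed_files: list[str]) -> set[str]:
    """Map each changed file to a component by its path prefix."""
    touched = set()
    for path in changed_files:
        for prefix, component in COMPONENT_OF_PATH.items():
            if path.startswith(prefix):
                touched.add(component)
    return touched


def select_tests(changed_files: list[str]) -> list[str]:
    """Keep only tests that exercise at least one touched component."""
    risky = components_touched(changed_files)
    return sorted(name for name, covers in TEST_COVERAGE.items() if covers & risky)


if __name__ == "__main__":
    # A payments change selects the checkout and refund tests, not search.
    print(select_tests(["src/payments/refunds.py", "README.md"]))
```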

Built into the developer workflow

Get immediate feedback on standards deviations, dependency breakage, and test results without waiting for other teams to review your changes or for a pipeline to run. AWS DevOps Agent delivers results through pull requests, coding agent IDEs, and CI/CD pipelines so developers stay in flow from code generation through deployment.

Autonomous incident response

Automated investigations

AWS DevOps Agent integrates with ticketing and alarming systems like PagerDuty and ServiceNow to automatically launch investigations from incident tickets, accelerating incident response within your existing workflows to reduce mean time to resolution (MTTR).

Incident coordination

You can also initiate and guide investigations using interactive chat. AWS DevOps Agent acts as a member of your operations team, working directly within your collaboration tools like ServiceNow and Slack to share findings and coordinate response. When needed, create an AWS Support case directly from an investigation, giving AWS Support experts immediate context for faster resolution.

Correlated alarm insights

AWS DevOps Agent automatically triages incidents and correlates related alarms to identify when they originate from the same event. Knowing immediately which alarms are related and which require separate investigation accelerates incident response, reduces noise, and lets teams focus on the most critical issues first.
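One simple way to picture alarm correlation is to group alarms that fire within a short time window and watch a shared component. The sketch below is a hypothetical heuristic, not AWS DevOps Agent's method; the `Alarm` fields, the component tags, and the 300-second default window are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    name: str
    fired_at: int               # seconds since an arbitrary epoch
    components: frozenset[str]  # hypothetical tags for the components the alarm watches


def correlate(alarms: list[Alarm], window_s: int = 300) -> list[set[str]]:
    """Group alarm names into clusters that likely share one underlying event.

    Two alarms are linked when they fire within `window_s` seconds of each
    other and watch at least one common component; links are transitive.
    """
    parent = list(range(len(alarms)))  # union-find forest over alarm indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(alarms)):
        for j in range(i + 1, len(alarms)):
            a, b = alarms[i], alarms[j]
            if abs(a.fired_at - b.fired_at) <= window_s and a.components & b.components:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters: dict[int, set[str]] = {}
    for i, alarm in enumerate(alarms):
        clusters.setdefault(find(i), set()).add(alarm.name)
    return list(clusters.values())


if __name__ == "__main__":
    alarms = [
        Alarm("checkout-5xx", 1_000, frozenset({"checkout-api", "orders-table"})),
        Alarm("orders-throttled", 1_060, frozenset({"orders-table"})),
        Alarm("search-latency", 1_030, frozenset({"search-service"})),
    ]
    for cluster in correlate(alarms):
        print(sorted(cluster))
```

Here the two alarms touching `orders-table` land in one cluster, while `search-latency` stays separate even though it fired in between, because it shares no component with the others.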

Root cause analysis

AWS DevOps Agent integrates with observability tools, code repositories, and CI/CD pipelines to correlate and analyze telemetry, code, and deployment data, and it shares the hypotheses it explored, its observations, and its root cause findings. Through systematic investigation, AWS DevOps Agent identifies the root cause of issues stemming from system changes, input anomalies, resource limits, component failures, and dependency issues across your entire environment.

Detailed mitigation plans

Once AWS DevOps Agent has identified the root cause, it provides detailed mitigation plans that include actions to resolve the incident, validate success, and revert a change if needed. AWS DevOps Agent also provides agent-ready instructions that another frontier agent can implement, for example, code improvements that the Kiro autonomous agent can carry out.

Continuously improving investigations

AWS DevOps Agent enhances its investigation capabilities by reviewing past investigations to create learned investigation skills. These skills capture how to triage events and produce root cause analyses and mitigation plans, so investigations become faster and more accurate over time.

Example use cases

Through systematic investigation of alarms stemming from system changes, input anomalies, resource limits, component failures, and dependency issues across your entire stack, AWS DevOps Agent guides DevOps teams with targeted mitigation steps, reducing mean time to resolution (MTTR) from hours to minutes. For example:

System changes: If an incident is caused by Amazon DynamoDB throttling because a recent code change uses the table inefficiently and drives up latency, AWS DevOps Agent may recommend rolling back the change as an immediate mitigation.
System changes: If an incident is caused by Amazon SNS subscription errors due to filter policy mismatch following a code deployment, AWS DevOps Agent may recommend rolling back the code change that altered the message structure as an immediate mitigation to restore message flow.
Input anomalies: If an incident is caused by AWS Lambda throttling on notifications due to high traffic exceeding limits, AWS DevOps Agent may recommend increasing concurrency limits as an immediate mitigation.
Input anomalies: If an incident is caused by Amazon SNS message publish failures due to messages exceeding the size limit, AWS DevOps Agent may recommend adding message size validation before publishing to Amazon SNS as an immediate mitigation.
Resource limits: If an incident is caused by API throttling due to exceeded rate limits, AWS DevOps Agent may recommend raising rate and burst limits as an immediate mitigation.
Resource limits: If an incident is caused by Amazon DynamoDB throttling due to exceeded write capacity, AWS DevOps Agent may recommend increasing write capacity as an immediate mitigation.
Component failures: If an incident is caused by performance degradation due to cold start latency, AWS DevOps Agent may recommend increasing provisioned concurrency as an immediate mitigation.
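Rate and burst limits are commonly modeled as a token bucket: tokens refill at a steady rate, the bucket holds at most `burst` tokens, and each request spends one token. Amazon API Gateway, for example, describes its request throttling in terms of a token bucket. The sketch below is a generic illustration with hypothetical numbers, not API Gateway's implementation or AWS DevOps Agent's logic, showing why raising the burst limit lets more of a sudden spike through.

```python
class TokenBucket:
    """Token-bucket limiter: `rate` tokens refill per second, up to `burst`."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.last = 0.0             # time of the last refill, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` is admitted."""
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # throttled


def admitted(rate: float, burst: int, arrivals: list[float]) -> int:
    """Count how many requests with the given arrival timestamps get through."""
    bucket = TokenBucket(rate, burst)
    return sum(bucket.allow(t) for t in arrivals)


if __name__ == "__main__":
    spike = [0.0] * 50  # 50 requests arriving at the same instant
    print(admitted(rate=10, burst=20, arrivals=spike))  # 20
    print(admitted(rate=10, burst=40, arrivals=spike))  # 40
```

With a 50-request spike, a burst of 20 admits 20 requests and a burst of 40 admits 40; the steady `rate` only matters once time passes and the bucket refills, and even then the refill is capped at `burst`.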

Proactive site reliability

Targeted recommendations

AWS DevOps Agent analyzes patterns across historical incidents to provide actionable recommendations that strengthen four key areas: observability, infrastructure optimization, deployment pipeline enhancement, and application resilience. For example, AWS DevOps Agent can identify testing gaps that would have prevented an issue from reaching production. Recommendations also include agent-ready specs, so you can hand off updates to application or infrastructure code to your coding agent or a colleague. This drives continuous improvement without the need to manage a backlog.

Early issue detection

AWS DevOps Agent identifies gaps in observability coverage and opportunities to fine-tune your alarms, reducing mean time to detection (MTTD) so you can catch issues before they become larger problems. For example, after identifying that detection of recent failures took too long, AWS DevOps Agent may recommend implementing monitoring and anomaly detection closer to the error source to shorten detection time and prevent extended outages.

Continuous learning

Using a learning loop driven by your team’s feedback on its recommendations, AWS DevOps Agent continually refines what it suggests, aligns with your operational priorities, and delivers increasingly relevant recommendations tailored to your organization’s needs.

Continuous service improvements

AWS DevOps Agent analyzes patterns across historical incidents to provide targeted recommendations that prevent future outages and strengthen system resilience. By evaluating real incidents, it delivers specific, actionable improvements that reduce both the frequency and the impact of similar issues in four key areas: observability, infrastructure optimization, deployment pipeline enhancement, and application resilience.

Example use cases

Observability improvement: AWS DevOps Agent may recommend adjusting alarm thresholds from 15 failures over 20 minutes to 3 failures within 5 minutes for critical authentication systems to reduce detection time, preventing extended integration outages.
Observability improvement: AWS DevOps Agent may recommend implementing targeted Amazon CloudWatch metric filters that track anomalous "Access Denied" patterns related to IAM role changes, enabling faster detection than the previous alarm.
Infrastructure improvement: After determining that an Amazon DynamoDB table schema doesn't match the service's main access pattern, forcing inefficient full table scans, AWS DevOps Agent may recommend creating a global secondary index (GSI) with the frequently queried attribute as the partition key. This would turn Scan operations into Query operations, reducing latency from 2,500-3,500 ms to under 100 ms and preventing throttling.
Infrastructure improvement: AWS DevOps Agent’s analysis shows that the application has adequate resources but is constrained by a single-pod bottleneck, where all requests queue to one instance during traffic spikes. AWS DevOps Agent may recommend adding a Horizontal Pod Autoscaler in the Kubernetes cluster, which automatically scales the service horizontally based on demand and distributes the load across multiple pods.
Deployment pipeline: After analyzing failed Amazon ECS deployments, AWS DevOps Agent may recommend enabling automatic rollbacks and monitoring deployment states with Amazon EventBridge. These changes help detect and address task health check failures quickly, preventing disruption to customer transactions.
Deployment pipeline: After analyzing deployment failures, AWS DevOps Agent may recommend mandatory pre-deployment validation of Amazon Managed Service for Prometheus connectivity for Amazon ECS task definitions. This recommendation would reduce failed deployments by detecting connectivity issues during the deployment process.
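To see why tightening the authentication alarm from 15 failures over 20 minutes to 3 failures within 5 minutes shortens detection, the sketch below replays hypothetical failure streams against both rules. The failure timings and the helper `first_alarm_time` are invented for illustration, and checking the condition at each failure timestamp is a simplification of how CloudWatch alarms evaluate aggregated metric periods.

```python
from bisect import bisect_right
from typing import Optional


def first_alarm_time(failures: list[float], threshold: int, window: float) -> Optional[float]:
    """Return the earliest failure time (minutes) at which at least `threshold`
    failures fall inside the trailing window (t - window, t], or None.

    Simplification: real alarms evaluate metric periods; this checks the
    condition at each failure timestamp instead.
    """
    times = sorted(failures)
    for i, t in enumerate(times):
        # Index of the first failure strictly newer than t - window.
        start = bisect_right(times, t - window)
        if i + 1 - start >= threshold:
            return t
    return None


if __name__ == "__main__":
    steady = list(range(30))         # one failure per minute, starting at minute 0
    print(first_alarm_time(steady, threshold=15, window=20))  # 14
    print(first_alarm_time(steady, threshold=3, window=5))    # 2
    sparse = list(range(0, 60, 2))   # one failure every 2 minutes
    print(first_alarm_time(sparse, threshold=15, window=20))  # None
    print(first_alarm_time(sparse, threshold=3, window=5))    # 4
```

With steady failures of one per minute, the tighter rule fires at minute 2 instead of minute 14. With failures every two minutes, the 15-in-20 rule never fires, because at most 10 failures ever fall inside its window, while the 3-in-5 rule fires at minute 4.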

On-Demand SRE task handling

Always-available operations teammate

Ask DevOps Agent any operational question and get immediate, contextual answers grounded in your actual environment without navigating between consoles or monitoring tools. Beyond Q&A, create, save, and share custom charts and reports such as daily ops health summaries or 4xx error trends that help you track operational metrics and communicate insights with your team.

Custom SRE agents

Create and schedule custom agents within Agent Spaces that run on a cadence or in response to events. For example, create a daily database health report that checks slow database queries and parameter tuning opportunities, or build an agent that reviews logs from the past 24 hours and flags anomalies for review. Add your own sub-agents that DevOps Agent can invoke as part of its workflows. Share custom agents across your team to standardize on proven patterns.

Access from anywhere

AWS DevOps Agent operates as a remote server, so other applications or agents can invoke release readiness checks, trigger investigations, or query operational health. It supports the Model Context Protocol (MCP) and the Agent2Agent (A2A) protocol, enabling coding agents, planning agents, or partner-built agents to use DevOps Agent capabilities without custom integration code.
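MCP messages are JSON-RPC 2.0 requests, so a client that wants to use a remote MCP server first lists the tools the server exposes with `tools/list` and then invokes one with `tools/call`. The sketch below only builds those message bodies. It omits the `initialize` handshake and the HTTP transport, and the tool name `start_investigation` and its arguments are hypothetical, not a documented AWS DevOps Agent tool.

```python
import itertools
import json
from typing import Any, Optional

_request_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session


def mcp_request(method: str, params: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 request object of the kind MCP clients send."""
    message: dict[str, Any] = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Discover the tools the server exposes, then call one of them.
list_tools = mcp_request("tools/list")
call_tool = mcp_request(
    "tools/call",
    {
        # Hypothetical tool name and arguments; a real server advertises its
        # own tool names and input schemas in its tools/list response.
        "name": "start_investigation",
        "arguments": {"ticket_id": "INC-1234", "service": "checkout-api"},
    },
)

if __name__ == "__main__":
    print(json.dumps(list_tools))
    print(json.dumps(call_tool, indent=2))
```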

Production intelligence

Application mapping

AWS DevOps Agent learns your environment, automatically discovering applications, their component services, and the resources that compose these services. It maps these relationships into a dynamic, continuously updated topology so you don't have to manually map out dependencies across individual services and resources. By correlating this live resource map with telemetry, code, and deployment data, AWS DevOps Agent enables faster incident resolution, proactive prevention of future issues, and context-aware answers grounded in how your applications actually run.

Extensible agent skills

AWS DevOps Agent improves at every task the more you use it, reviewing past investigations to develop sharper triage strategies and learning your policies and deployment patterns to deliver more precise readiness reviews. Add reusable, modular skills that encode your runbooks, architectural standards, and operational practices so the agent executes tasks consistently and reliably. The agent's capabilities compound over time, both from what it learns and what you teach it.

Built-in and custom integrations

AWS DevOps Agent integrates out of the box with Amazon CloudWatch, Datadog, Dynatrace, New Relic, Splunk, Grafana, GitHub, GitLab, Azure DevOps, ServiceNow, PagerDuty, Slack, and Microsoft Teams. Connect to remote MCP servers to extend further into proprietary systems, open-source observability signals such as Prometheus, customer-managed version control, and internal runbooks. Use private connections to keep confidential traffic within your private networks rather than routing it over the internet.

Next steps

Try AWS DevOps Agent

Get started with AWS DevOps Agent

Read how AWS DevOps Agent acts as your operations teammate across AWS, multicloud, and on-prem environments

Find out how DevOps Agent credits are included with your AWS Support Plan
