Most production AWS environments at Fortune 100 companies are not built by the company that runs them. They are built by vendors — specialty firms shipping AWS infrastructure as a contracted deliverable. The vendor designs the architecture, writes the CDK or Terraform, ships releases on a monthly cadence, and hands operations back to the buyer.

This arrangement has a name in the procurement world: vendor-shipped infrastructure. It has another name in the buyer's engineering team: the system that keeps breaking and nobody can explain why.

Vendor-Shipped AWS Gatekeeping is the senior IC discipline that sits between vendor releases and the buyer's production. It is buy-side technical authority over a sell-side delivery process. It is the work of catching what the vendor missed, before the vendor's monthly release hits your $3,000-per-minute revenue system.

This guide describes the role, the failure modes, the framework, and the hiring decision.

Why vendor-shipped AWS exists

Three forces created the vendor-shipped AWS pattern over the last decade:

Specialty knowledge concentration. A vendor that has shipped sixteen healthcare-exchange AWS deployments has institutional knowledge that no single Fortune 100 team can reproduce internally.
Capital structure preference. Buyers prefer OpEx (vendor contract) over CapEx (internal team build-out). The vendor invoice is one line item; an internal cloud team is a multi-year commitment.
Vendor commercial velocity. Vendors iterate faster than internal teams. A monthly vendor release cycle ships more change than an annual internal release cycle.

The arrangement works — until it doesn't. Vendors optimize for shipping. Buyers operate the result. The gap between those two incentives is where production incidents live.

The buyer-vendor technical dynamic

In a vendor-shipped AWS relationship, the buyer typically holds:

The AWS account (or accounts) the vendor deploys into
The commercial contract specifying scope, deliverables, and SLAs
The production operational responsibility (because the buyer's customers are the ones impacted by outages)
The downstream system relationships (publishers, consumers, payment rails, customer interfaces)

The vendor holds:

The architectural design
The Infrastructure-as-Code source (CDK, Terraform, CloudFormation)
The release cadence and pipeline
The implementation expertise on the specific platform

The buyer has accountability without authorship. The vendor has authorship without operational accountability. This asymmetry produces a specific failure mode: vendor decisions that are reasonable in isolation but unreasonable in the buyer's production context.

Common failure modes

Having operated as the sole buyer-side gatekeeper across multiple vendor-shipped AWS systems, I see the same failure patterns recur:

1. The reasonable-default IAM problem

The vendor's default IAM role uses "Action": "*" on whatever resources their service touches. This is reasonable from the vendor's perspective: they need access to ship features. It is unreasonable from the buyer's perspective, because any compromise of the vendor's credentials grants attackers the same broad access. Vendor-shipped AWS environments routinely fail SOC 2 audits on exactly this finding.
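A wildcard check like this is easy to automate. The sketch below is a minimal illustration, not a complete policy linter: it walks a standard IAM policy document and flags Allow statements whose Action is "*" or a service-wide wildcard such as "s3:*". The function name, the sample policy, and the bucket ARN are hypothetical, and the check deliberately ignores NotAction and condition keys.

```python
import json


def find_wildcard_statements(policy: dict) -> list[dict]:
    """Return Allow statements whose Action is "*" or a service-wide wildcard."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM permits a single statement object
        statements = [statements]

    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):  # Action may be a string or a list
            actions = [actions]
        wild = [a for a in actions if a == "*" or a.endswith(":*")]
        if wild:
            flagged.append({
                "sid": stmt.get("Sid", "<no Sid>"),
                "actions": wild,
                "resource": stmt.get("Resource"),
            })
    return flagged


# Hypothetical vendor policy, for illustration only.
vendor_policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {"Sid": "VendorDeploy", "Effect": "Allow", "Action": "*",
     "Resource": "arn:aws:s3:::example-vendor-bucket/*"},
    {"Sid": "ReadOrders", "Effect": "Allow",
     "Action": ["dynamodb:GetItem", "dynamodb:Query"], "Resource": "*"}
  ]
}
""")

for finding in find_wildcard_statements(vendor_policy):
    print(finding)
```

Only the VendorDeploy statement is flagged here. The second statement has a wildcard Resource but specific actions, which is a separate question worth its own check.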

2. The ungoverned NAT Gateway

The vendor's Terraform template includes a NAT Gateway in the default-VPC pattern. The vendor's developer places a single service that needs outbound internet access into a private subnet, which justifies the NAT Gateway. Months later, the buyer's AWS bill shows $4,500/month for NAT data processing, for a service that turns out to need internet access only during a five-minute deployment step. This pattern, repeated across services, accounts for most of the 30%+ AWS waste in vendor-shipped environments.
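A quick sanity check on a bill like that is to convert the charge back into traffic volume. The sketch below assumes a NAT Gateway data-processing rate of $0.045 per GB; that rate is an assumption for illustration and varies by region, so substitute the current price for yours.

```python
# Assumption: USD per GB processed by the NAT Gateway. Check your region's rate.
NAT_PRICE_PER_GB = 0.045


def implied_monthly_gb(monthly_charge_usd: float,
                       price_per_gb: float = NAT_PRICE_PER_GB) -> float:
    """Convert a monthly data-processing charge into the traffic volume it implies."""
    return monthly_charge_usd / price_per_gb


gb = implied_monthly_gb(4500)
print(f"{gb:,.0f} GB/month (about {gb / 1000:.0f} TB)")
```

At the assumed rate, $4,500 implies on the order of 100 TB a month through the gateway. A five-minute deployment step does not generate that, so the review question becomes what else is routing through it.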

3. The undeclared dependency cascade

The vendor's release adds a Lambda function that calls an internal DynamoDB table they've also added. They forget to mention that the Lambda's IAM role inherits permissions from a parent stack and that DynamoDB writes now affect throughput on a separate table the buyer's customers depend on. The first the buyer hears about the dependency is when the customer-facing table starts throttling.

4. The observability gap

Vendors ship their own internal observability — CloudWatch dashboards designed for the vendor's perspective on the system. The buyer's downstream consumers, however, need different signals: customer-facing latency percentiles, payment success rates, queue depth on internal integration points. These rarely appear in the vendor's default observability layer because the vendor doesn't operate them.

5. The vendor SLA versus customer SLA mismatch

The vendor contracts a 99.5% SLA. The buyer commits a 99.9% SLA to their own customers. The buyer's effective SLA is bounded by the vendor's, but the contract was negotiated by procurement before engineering reviewed the math. Over a 30-day month, 99.5% permits about 216 minutes of downtime and 99.9% permits about 43, so the 0.4-point gap means roughly 173 extra minutes of downtime per month, close to three hours, that the buyer absorbs.
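The SLA arithmetic is worth running before the contract is signed. A minimal sketch, assuming a 30-day month (43,200 minutes); the function name is illustrative:

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: a 30-day month


def allowed_downtime_minutes(sla_percent: float,
                             period_minutes: int = MINUTES_PER_MONTH) -> float:
    """Downtime budget, in minutes, that an availability SLA permits per period."""
    return period_minutes * (1 - sla_percent / 100)


vendor = allowed_downtime_minutes(99.5)
buyer = allowed_downtime_minutes(99.9)

print(f"vendor budget: {vendor:.1f} min/month")          # 216.0
print(f"buyer budget:  {buyer:.1f} min/month")           # 43.2
print(f"gap absorbed:  {vendor - buyer:.1f} min/month")  # 172.8
```

The gap is the downtime the vendor may incur without breaching its contract while the buyer is already in breach of its own.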

The five-step gatekeeping framework

Effective vendor-shipped AWS gatekeeping follows five repeatable steps. Each new vendor release runs through the full cycle.

Step 1: Establish the contract surface

Before any release review, document the AWS resources the vendor has access to: which accounts, which services, which IAM roles, which network boundaries. This is the contract surface. Without it, a release review is impossible — you cannot evaluate what changed if you don't know what existed.

The contract surface document should fit on one page. It should list every AWS service the vendor has touched, the IAM principal that touched it, and the operational impact of changes to it. This document is updated quarterly, not per-release.
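One way to keep the contract surface usable during release review is to hold it as structured data alongside the one-page document. The sketch below is one possible shape, not a standard: the field names, account ID, and role names are all hypothetical. The helper answers the first review question for any release, namely whether it touches a service that is not yet on the surface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurfaceEntry:
    """One row of the contract surface (hypothetical schema)."""
    account_id: str
    service: str             # e.g. "lambda", "dynamodb"
    iam_principal: str       # the role that touches the service
    operational_impact: str  # what breaks for the buyer if this changes


# Illustrative entries only.
SURFACE = [
    SurfaceEntry("111111111111", "lambda", "role/vendor-deploy",
                 "order ingestion stops"),
    SurfaceEntry("111111111111", "dynamodb", "role/vendor-deploy",
                 "customer-facing reads throttle"),
]


def services_outside_surface(surface: list[SurfaceEntry],
                             touched: list[str]) -> list[str]:
    """Services a release touches that the contract surface does not list."""
    known = {entry.service for entry in surface}
    return sorted(set(touched) - known)


print(services_outside_surface(SURFACE, ["lambda", "sqs"]))
```

Anything this returns is a prompt to update the surface document before the release review continues.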

Step 2: Read the IaC like a vendor-skeptic

When the vendor's release CDK or Terraform diff lands, read it assuming the vendor is optimizing for ship velocity, not your production stability. Specifically look for:

IAM roles or policies broadening access
NAT Gateways added without justification
Resources added in non-default regions (cost and compliance risk)
Cross-stack dependencies without explicit declaration
Default encryption settings overridden
Public-facing resources (API Gateway, ELB) without WAF
Lambda timeouts or memory increased beyond previous limits
RDS or DynamoDB capacity changes that affect cost
Missing CloudWatch alarms on new resources
Missing tags for cost allocation

Read time per release: typically 2 to 4 hours for a meaningful change. This is the gatekeeper's most concentrated value-adding activity.
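Some of the checklist can be pre-screened mechanically so the reading hours go to judgment calls. The sketch below assumes the vendor's Terraform plan has been exported as JSON (for example with terraform show -json) and that each entry under resource_changes carries address, type, change.actions, and change.after. It checks only two items, new NAT Gateways and missing cost-allocation tags. The required tag name "CostCenter" and the sample plan are hypothetical, and treating a present "tags" key as the marker of a taggable resource is an assumption about the plan's shape.

```python
def screen_plan(plan: dict, required_tags: tuple[str, ...] = ("CostCenter",)) -> list[str]:
    """Flag newly created NAT Gateways and new resources missing required tags."""
    findings = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        if "create" not in change.get("actions", []):
            continue  # this sketch screens newly created resources only
        after = change.get("after") or {}

        if rc.get("type") == "aws_nat_gateway":
            findings.append(f"{rc['address']}: new NAT Gateway, ask for justification")

        if "tags" in after:  # assumption: taggable resources expose a tags key
            tags = after.get("tags") or {}
            missing = [t for t in required_tags if t not in tags]
            if missing:
                findings.append(f"{rc['address']}: missing tags {missing}")
    return findings


# Hand-written, minimal stand-in for a real plan export.
sample_plan = {
    "resource_changes": [
        {"address": "aws_nat_gateway.egress", "type": "aws_nat_gateway",
         "change": {"actions": ["create"],
                    "after": {"subnet_id": "subnet-EXAMPLE", "tags": None}}},
        {"address": "aws_lambda_function.ingest", "type": "aws_lambda_function",
         "change": {"actions": ["create"],
                    "after": {"timeout": 900, "tags": {"CostCenter": "vendor-platform"}}}},
        {"address": "aws_dynamodb_table.orders", "type": "aws_dynamodb_table",
         "change": {"actions": ["update"], "after": {"tags": None}}},
    ]
}

for line in screen_plan(sample_plan):
    print(line)
```

A screen like this narrows the diff; it does not replace reading it. Cross-stack dependencies and broadened IAM access in particular still need a human who knows the production context.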

Step 3: Run gated remediation conversations

Before any release lands in your account, surface the 2-3 issues that must be addressed. Frame these as a production-readiness checklist, not as blockers. The framing matters: vendors respond better to a list of conditions for shipping than to a rejection of their work.

Typical findings: change one IAM policy from "*" to specific actions, move one NAT Gateway resource, add three CloudWatch alarms, declare one missing cross-stack dependency. These are surgical edits — usually under a day of vendor work. The conversation framing keeps the vendor relationship healthy while the technical bar holds.

Step 4: Deploy with observability ahead of traffic

When the release is cleared for production, the buyer's gatekeeper does not just let the vendor flip the switch. The gatekeeper deploys the observability layer first: CloudWatch alarms, X-Ray instrumentation, and synthetic canaries that simulate real customer traffic.

The principle: observability leads, traffic follows. When traffic does land on the new release, every important metric is already being watched. If a vendor's release degrades a critical metric, the alarm fires before customers complain.
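The "observability leads" rule can be enforced as a simple gate in the release runbook. The sketch below is an illustration under stated assumptions: the alarm names are hypothetical, and the list of existing alarms is passed in as plain strings (in practice it might be collected from CloudWatch beforehand).

```python
def missing_alarms(required: list[str], existing: list[str]) -> list[str]:
    """Required alarm names that do not exist yet, in the order they were required."""
    existing_set = set(existing)
    return [name for name in required if name not in existing_set]


def ready_for_traffic(required: list[str], existing: list[str]) -> bool:
    """Traffic may shift only when every required alarm is already in place."""
    return not missing_alarms(required, existing)


# Hypothetical buyer-side signals for this release.
REQUIRED = [
    "checkout-p99-latency",
    "payment-success-rate",
    "integration-queue-depth",
]
existing_now = ["checkout-p99-latency", "vendor-lambda-errors"]

if not ready_for_traffic(REQUIRED, existing_now):
    print("Hold traffic shift; missing:", missing_alarms(REQUIRED, existing_now))
```

The required list is the buyer's, not the vendor's: it names the customer-facing signals the buyer operates against, whatever the vendor's own dashboards show.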

Step 5: Post-release retrospectives

Within fourteen days of each release, the buyer's gatekeeper holds a brief retrospective and writes it up. The write-up captures four things:

What shipped versus what was promised
What broke versus what was caught
What deserves to be added to the next release's gating checklist
What architectural pattern the vendor is drifting toward (and whether to accept or push back)

The retrospective is shared with the vendor. It signals attention. Over time, vendors ship higher-quality releases when they know the gatekeeper reviews at this depth.

When you need vendor-shipped AWS gatekeeping

The role is needed when several conditions converge:

A vendor builds and ships AWS infrastructure into your accounts on a recurring cadence (monthly is typical)
The system supports revenue-bearing or compliance-bearing operations
Your internal team lacks deep AWS architectural experience on the specific patterns the vendor uses
The cost of an outage exceeds the cost of senior buy-side coverage by a meaningful multiple

Most Fortune 100 buyers of vendor-shipped AWS meet all four conditions. Most of them don't have the role staffed.

When you don't need it

Skip the role if:

The vendor-shipped system is non-production (dev, staging, or sandbox only)
Your internal team already has senior AWS depth and bandwidth to review every release
The vendor releases quarterly or less frequently and you can absorb dedicated review cycles internally
The system's blast radius is small — single-tenant, low-revenue, easy to recover

Honesty here serves the buyer better than the consultant.

In-house, fractional, or both

Three ways to staff vendor-shipped AWS gatekeeping:

In-house senior hire

A full-time Principal Engineer or Cloud Architect with vendor-management responsibility. Best for organizations with multiple vendor-shipped systems and steady volume.

Fractional senior IC

A part-time embedded senior engineer with the specialty. Right for one or two vendor relationships where a full FTE is overscoped but the work is real.

Project audit + handoff

A defined-scope engagement that establishes the framework, runs through the first 2-3 release cycles, and trains an internal engineer to continue. Best when capability transfer is the goal.

Most Fortune 100 buyers of vendor-shipped AWS land on fractional. The role is real and ongoing, but rarely big enough to justify a Principal-tier FTE. A fractional engagement of 10-20 hours per week covers monthly release cycles, ad-hoc incident escalation, and quarterly architecture review without the overhead of full-time recruiting.

What good looks like

After six months of disciplined vendor-shipped AWS gatekeeping, a healthy engagement shows:

Zero production incidents caused by ungated vendor changes
Material reduction (often 20-40%) in AWS spend on the vendor-managed portion of the environment
Vendor release quality measurably improving — fewer remediation items per release after the gatekeeping discipline establishes the bar
Internal team comfort with the system increasing — knowledge transfer is happening as part of the gating process
Customer-facing SLA holding above the vendor's baseline SLA, because observability and recovery are now buyer-controlled

Working with David

I currently provide vendor-shipped AWS gatekeeping for a Fortune 100 client running a high-revenue commerce platform built by an external vendor. I review every monthly vendor release, typically identify several remediation items before a release reaches production-ready status, and coordinate the buyer-side remediation conversations. I am the sole senior technical resource on the buyer side of an eight-figure annual vendor relationship.

I also operate the production observability layer, hold sole 24x7 on-call coverage for the platform, and serve as the cross-team senior escalation point when downstream consumers report integration issues.

If your organization runs vendor-shipped AWS infrastructure on a revenue-critical system and your internal team lacks dedicated senior IC coverage on the buy side, that's the engagement I specialize in. Book a 15-minute discovery call to talk through whether the role fits your situation.


What is Vendor-Shipped AWS Gatekeeping?