The 2026 AI Agent Platform RFP Checklist: 60 Questions CIOs Should Ask (MCP, A2A, AgentKit, Agentforce 360, Agent 365)

As of November 22, 2025, the agent platform race has gone mainstream: OpenAI launched AgentKit in October, Salesforce rolled out Agentforce 360, Microsoft introduced Agent 365, and Amazon pushed browser-native automation with Nova Act. Meanwhile, industry groups are advancing open protocols like A2A, and enterprises are doubling down on governance and security.

That noise makes procurement hard. This vendor-agnostic RFP checklist turns the headlines into practical questions you can hand vendors on day one—and score consistently across security, interoperability, observability, cost-to-serve, and reliability. It also links to hands-on build guides if you want to prototype before you buy.

Who this is for

  • Startup founders validating a first production agent.
  • E‑commerce operators scaling support or post‑purchase automations.
  • Product, data, and platform teams standardizing on an enterprise agent stack.

Context you can cite in your RFP

Recent launches signal the features buyers should demand: agent builders and evaluations (AgentKit), enterprise orchestration (Agentforce 360), centralized registry/IAM (Agent 365), browser agents for GUI workflows (Nova Act), and open interop via A2A. Use them as reference capabilities, not vendor lock‑in.

Risk teams will ask for alignment with recognized frameworks like NIST AI RMF (and its Generative AI profile) and ISO/IEC 42001 for AI management systems. Reference them explicitly in your RFP.

Finally, expect questions about real‑world reliability: Microsoft's recent research shows agents fail in surprising ways under competitive, messy conditions. Demand evidence, not demos.

How to use this checklist

  1. Score each question 0–3 (0=no capability, 1=partial, 2=meets, 3=exceeds).
  2. Weight sections so they sum to 100% (for example: Registry & IAM 15%, Interop 10%, Observability 15%, Evals & Red-team 10%, Security 20%, Reliability/SLOs 15%, Data & Compliance 15%).
  3. Require a 2–3 page evidence appendix per section (screenshots, schema, dashboards).

60 RFP questions to send vendors

1) Registry, Identity & Access (least privilege)

  • Do you provide an agent registry with unique IDs, roles, and lifecycle states (draft, prod, retired)? (A registry-record sketch follows this list.)
  • How are tool permissions scoped per agent (capability bounding, time‑boxed tokens)?
  • Do you support human operator identities and auditable approvals for high‑risk actions?
  • Can we enforce policy centrally (deny‑lists, rate limits, kill switch) across all agents?
  • What is your integration surface with SSO/SCIM and workload identity (OIDC, mTLS)?
  • How do you version and revoke agent/skill manifests?
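
To make the registry and manifest questions above concrete, here is a minimal sketch of what a registry record could expose. The field names (agent_id, allowed_tools, token_ttl_seconds) are illustrative assumptions, not any vendor's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Lifecycle(Enum):
    DRAFT = "draft"
    PROD = "prod"
    RETIRED = "retired"


@dataclass
class AgentRecord:
    """Illustrative registry entry: identity, role, lifecycle, and least-privilege scope."""
    agent_id: str                      # unique, stable ID referenced in logs and policy
    role: str                          # business role, e.g. "support-triage"
    lifecycle: Lifecycle               # draft -> prod -> retired
    allowed_tools: list[str] = field(default_factory=list)  # capability bounding
    token_ttl_seconds: int = 900       # time-boxed credentials, not long-lived secrets
    owner: str = ""                    # accountable human or team for audits


# Example: a production support agent scoped to exactly two tools.
support_agent = AgentRecord(
    agent_id="agt_support_001",
    role="support-triage",
    lifecycle=Lifecycle.PROD,
    allowed_tools=["crm.lookup_ticket", "kb.search"],
    owner="support-platform@yourco.example",
)
```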

2) Interoperability & Protocols (MCP + A2A)

  • Which open protocols do you support for tools and data (e.g., MCP tool servers)?
  • Do your agents speak Agent2Agent (A2A) for cross‑vendor collaboration? Roadmap dates?
  • Can agents interoperate with Microsoft/Copilot Studio or other platforms via A2A?
  • How do you encode capabilities/limits in portable agent cards or descriptors? (An illustrative agent-card sketch follows this list.)
  • Do you provide migration guides from/to AgentKit or Agentforce 360?
  • How are cross‑org agent trust and discovery handled (allow‑lists, attestations)?
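
For the agent-card question above, the sketch below shows the kind of portable descriptor A2A-style interop relies on. Field names are paraphrased and the x-limits block is a made-up extension, so treat this as a shape to compare against the current A2A schema rather than the definitive format.

```python
import json

# Approximate A2A-style agent card; verify field names against the current spec.
agent_card = {
    "name": "Order Status Agent",
    "description": "Answers order status and shipping questions.",
    "url": "https://agents.yourco.example/order-status",   # hypothetical endpoint
    "version": "1.2.0",
    "capabilities": {"streaming": True},
    "skills": [
        {
            "id": "order-status-lookup",
            "name": "Order status lookup",
            "description": "Looks up an order by ID and summarizes its status.",
        }
    ],
    # Express limits explicitly so peer agents can reason about trust and scope.
    "x-limits": {"max_order_value_usd": 500, "write_actions": False},  # custom extension
}

print(json.dumps(agent_card, indent=2))
```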

3) Observability (OpenTelemetry first)

  • Are traces/logs/metrics exported in OpenTelemetry format with GenAI/agent semantics? Please link to your attribute schema. (A span sketch follows this list.)
  • Can we view a single, correlated trace from user intent → plan → tool calls → external systems → outcome?
  • Do you support prompt, tool, and policy span attributes; PII scrubbing; and cost-per-outcome tags?
  • Can we stream traces to our APM (Datadog, Grafana, New Relic) without vendor gateways?
  • How do you sample intelligently (error/latency/cost‑aware) to control the telemetry bill?
  • Do you expose dashboards for success rate, time to first token (TTFT), time per output token (TPOT), and human‑handoff rate?
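
To ground the OpenTelemetry questions above, here is a minimal span around a tool call, assuming the opentelemetry-api package is installed. The gen_ai.* attribute names follow the still-evolving GenAI semantic conventions and app.cost.usd is a custom tag, so verify both against the current spec and the vendor's schema.

```python
from opentelemetry import trace

tracer = trace.get_tracer("agent.runtime")

def call_tool(tool_name: str, arguments: dict) -> dict:
    """Wrap a tool call in a span so intent -> plan -> tool call correlates in one trace."""
    with tracer.start_as_current_span(f"tool.{tool_name}") as span:
        # Attribute names based on OpenTelemetry GenAI semantic conventions (subject to change).
        span.set_attribute("gen_ai.operation.name", "execute_tool")
        span.set_attribute("gen_ai.tool.name", tool_name)
        span.set_attribute("app.cost.usd", 0.0021)    # custom cost-per-outcome tag (illustrative)
        result = {"status": "ok"}                      # placeholder for the real tool invocation
        span.set_attribute("app.tool.status", result["status"])
        return result
```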

4) Evals, Testing & Red‑teaming

  • Do you ship task‑level evals and regression suites (Evals for Agents or equivalent)? (A minimal harness sketch follows this list.)
  • How do you test for prompt injection, tool abuse, data exfiltration, and jailbreaks?
  • Can we bring custom datasets and failure libraries? Scheduled evals pre‑release?
  • Do you support canary cohorts and guardrail tests in CI/CD?
  • What’s your coverage for browser tasks vs API‑only tasks?
  • Will you share recent red‑team findings and fixes (under NDA)?
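
As a reference for the regression-suite question, here is a minimal harness sketch: it replays a small task set against a placeholder run_agent() function and gates the release on pass rate. The cases, the string check, and run_agent are all assumptions to be replaced by your own datasets and the platform's eval API.

```python
# Minimal eval harness sketch; run_agent() is a placeholder for the platform under test.
CASES = [
    {"task": "Refund order #1234 under the $50 policy", "must_include": "refund issued"},
    {"task": "Escalate a chargeback dispute", "must_include": "escalated to human"},
]

def run_agent(task: str) -> str:
    raise NotImplementedError("wire this to the vendor's run/eval API")

def pass_rate(cases) -> float:
    passed = 0
    for case in cases:
        try:
            output = run_agent(case["task"])
            passed += case["must_include"].lower() in output.lower()
        except Exception:
            pass  # an exception counts as a failed case
    return passed / len(cases)

if __name__ == "__main__":
    rate = pass_rate(CASES)
    assert rate >= 0.95, f"regression: pass rate {rate:.0%} below the 95% gate"
```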

5) Security & Safety (production posture)

  • How are tool servers isolated (network egress, allow‑lists, sandboxing, timeouts)? (A runtime guard sketch follows this list.)
  • Do you support content safety filters and deny‑policies at runtime?
  • Is there a hold‑to‑operate or approval workflow for high‑impact actions (refunds, wire transfers)?
  • How do you authenticate/authorize browser agents and protect session state?
  • What incident response do you support (freeze agent, revoke secrets, replay traces)?
  • Which third‑party audits/compliance attestations can you share (SOC 2, ISO 27001)?
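
For the isolation question above, this sketch shows the shape of a runtime guard: a tool allow-list, an egress allow-list, and a timeout. The policy values and host names are illustrative assumptions, not a specific vendor's control plane.

```python
import socket
from urllib.parse import urlparse

ALLOWED_TOOLS = {"kb.search", "crm.lookup_ticket"}       # deny by default
ALLOWED_HOSTS = {"api.internal.yourco.example"}          # egress allow-list (illustrative)
TOOL_TIMEOUT_SECONDS = 10

def guard_tool_call(tool_name: str, url: str) -> None:
    """Raise before any side effect if the call violates policy."""
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool not allow-listed: {tool_name}")
    host = urlparse(url).hostname or ""
    if host not in ALLOWED_HOSTS:
        raise PermissionError(f"egress blocked for host: {host}")
    socket.setdefaulttimeout(TOOL_TIMEOUT_SECONDS)        # coarse timeout for outbound sockets

# Usage: guard_tool_call("kb.search", "https://api.internal.yourco.example/search")
```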

6) Reliability & SLOs

  • Which agent SLOs do you support out of the box (success rate, TTFT, TPOT, cost per task)?
  • Do you have fallback and escalation strategies when models/tools fail?
  • How are retries and idempotency handled for external actions? (A retry/idempotency sketch follows this list.)
  • Can we define per‑task SLOs and link alerts to traces?
  • What’s your strategy to reduce flakiness across releases?
  • Do you publish monthly reliability reports? Please share a sample.
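
To make the retries/idempotency question concrete, here is a sketch of the pattern to ask vendors to demonstrate: one idempotency key generated per action and reused across retries with exponential backoff. The perform_action callable and header name are assumptions; the point is that a retried call cannot execute its side effect twice.

```python
import time
import uuid

def with_retries(perform_action, payload: dict, max_attempts: int = 3) -> dict:
    """Retry an external action safely by pinning one idempotency key across attempts."""
    idempotency_key = str(uuid.uuid4())          # generated once, reused on every retry
    delay = 1.0
    for attempt in range(1, max_attempts + 1):
        try:
            # perform_action is a placeholder for the real API client call.
            return perform_action(payload, headers={"Idempotency-Key": idempotency_key})
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(delay)
            delay *= 2                           # exponential backoff between attempts
```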

7) Execution Modes: API vs Browser Agents

  • Do you support both API‑first and browser‑native automation? When do you recommend each?
  • How do you handle auth, cookies, and DOM drift for browser agents (e.g., Nova Act‑style flows)?
  • What’s your approach to deterministic replay of browser sessions for audits?
  • Do you expose feature flags to toggle modes per workflow?
  • Can a single plan switch between API and browser steps? (A dispatch sketch follows this list.)
  • How are accessibility and localization handled in browser automation?
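
For the mixed-mode plan question, this sketch shows one way a plan could interleave API and browser steps behind a single dispatcher; run_api_step and run_browser_step are hypothetical adapters, not any vendor's SDK.

```python
PLAN = [
    {"mode": "api", "action": "orders.lookup", "args": {"order_id": "1234"}},
    {"mode": "browser", "action": "carrier_portal.file_claim", "args": {"order_id": "1234"}},
]

def run_api_step(action: str, args: dict) -> dict:
    raise NotImplementedError("call the tool/API integration here")

def run_browser_step(action: str, args: dict) -> dict:
    raise NotImplementedError("drive the browser agent here")

def execute(plan: list[dict]) -> list[dict]:
    """Dispatch each step to the right execution mode and keep results in one audit trail."""
    results = []
    for step in plan:
        runner = run_api_step if step["mode"] == "api" else run_browser_step
        results.append({"step": step["action"], "result": runner(step["action"], step["args"])})
    return results
```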

8) Data, Privacy & Compliance

  • Map your controls to NIST AI RMF and its Generative AI Profile; share your crosswalk.
  • Do you align with ISO/IEC 42001 (AI management systems) or have plans to certify?
  • Where do prompts, traces, and outcomes live (regions, retention)? Can we self‑host telemetry?
  • How do you minimize and mask PII in prompts/tools? (A masking sketch follows this list.)
  • What model/data supply‑chain disclosures do you provide (SBOM, eval results)?
  • How do you handle DSRs, audit logs, and legal hold?
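
For the PII minimization question, here is a small masking sketch applied before prompts or traces leave your boundary; the two regexes cover only emails and simple phone formats and are an illustrative starting point, not a complete PII control.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def scrub(text: str) -> str:
    """Mask obvious PII before text is logged, traced, or sent to a model."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

# Example: scrub("Call Jane at +1 (555) 010-2345 or jane@example.com")
# -> "Call Jane at [PHONE] or [EMAIL]"
```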

9) Human‑in‑the‑Loop & Escalations

  • Can agents request approvals with full context and a signed action plan? (A plan-signing sketch follows this list.)
  • Do you support seamless escalation to human agents in CRM/Helpdesk?
  • How are partial outcomes summarized for human review?
  • Is there a feedback loop to improve prompts/tools post‑escalation?
  • What accessibility/usability features do approvers get (mobile, Slack, email)?
  • Do you measure deflection and CSAT deltas vs human‑only baselines?
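
For the approval question at the top of this list, the sketch below shows one way an agent could attach a verifiable signature to the exact action plan a human approves; the HMAC secret handling and payload shape are assumptions for illustration only.

```python
import hashlib
import hmac
import json

SECRET = b"rotate-me-via-your-secrets-manager"   # placeholder; never hard-code in production

def sign_plan(plan: dict) -> str:
    """Produce a stable signature over the exact plan shown to the human approver."""
    canonical = json.dumps(plan, sort_keys=True).encode()
    return hmac.new(SECRET, canonical, hashlib.sha256).hexdigest()

def verify_plan(plan: dict, signature: str) -> bool:
    return hmac.compare_digest(sign_plan(plan), signature)

plan = {"action": "refund", "order_id": "1234", "amount_usd": 42.50}
sig = sign_plan(plan)
assert verify_plan(plan, sig)        # the approval UI re-verifies before execution
```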

10) Commercials, Support & Roadmap

  • How is pricing structured (per task/minute/tool call/seat)? Show a five‑workflow TCO model (a toy cost sketch follows this list).
  • What’s your migration/onboarding plan (from DIY or another platform)?
  • Which enterprise features are GA vs beta? Dates for A2A/MCP milestones?
  • What SLAs and response times do you commit to?
  • Do you provide solution architects and red‑team support during rollout?
  • Can you share customer evidence of production scale (e.g., referenceable customer‑support deployments at real volume)?
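
To anchor the TCO question above, here is a toy five-workflow cost model; every volume and unit price is a placeholder to be replaced with the vendor's actual pricing and your own traffic.

```python
# Toy TCO sketch: monthly cost = tasks * (model + tool + human-review cost per task).
WORKFLOWS = {
    "order-status":    {"tasks": 20000, "model": 0.010, "tools": 0.002, "review": 0.001},
    "returns":         {"tasks": 6000,  "model": 0.020, "tools": 0.004, "review": 0.010},
    "refund-approval": {"tasks": 1500,  "model": 0.030, "tools": 0.005, "review": 0.050},
    "kb-drafting":     {"tasks": 800,   "model": 0.050, "tools": 0.001, "review": 0.080},
    "lead-triage":     {"tasks": 4000,  "model": 0.015, "tools": 0.003, "review": 0.005},
}

total = 0.0
for name, w in WORKFLOWS.items():
    monthly = w["tasks"] * (w["model"] + w["tools"] + w["review"])
    total += monthly
    print(f"{name:16s} ${monthly:,.2f}/month")
print(f"{'TOTAL':16s} ${total:,.2f}/month")
```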

Why these requirements now

Platforms are converging on four pillars buyers should insist on: (1) open interop (MCP/A2A), (2) registry + IAM + policy, (3) first‑class observability (OpenTelemetry traces), and (4) evals + red‑teaming. Market signals, from security startups specializing in MCP to major vendor launches, show that enterprise agents are leaving the lab.

Related hands‑on playbooks (build or pilot before you buy)

Scoring template (copy/paste)

Section, Weight, Vendor A, Vendor B, Vendor C
Registry & IAM, 0.15, __/18, __/18, __/18
Interop (MCP/A2A), 0.10, __/18, __/18, __/18
Observability (OTel), 0.15, __/18, __/18, __/18
Evals & Red-team, 0.10, __/18, __/18, __/18
Security, 0.20, __/18, __/18, __/18
Reliability & SLOs, 0.15, __/18, __/18, __/18
Data & Compliance, 0.15, __/18, __/18, __/18
Total (weighted): __.__ / 3.00

Each section has six questions scored 0–3, so the raw maximum per section is 18. Divide a section's raw score by 6 to get its 0–3 average, multiply by the weight, and sum across sections for the weighted total out of 3.00 (a calculation sketch follows).
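
The sketch below shows that calculation with placeholder raw scores: each section's raw score out of 18 is normalized to a 0–3 average, weighted, and summed.

```python
# Weighted scoring: six questions per section, each scored 0-3 (raw max 18 per section).
WEIGHTS = {
    "Registry & IAM": 0.15, "Interop (MCP/A2A)": 0.10, "Observability (OTel)": 0.15,
    "Evals & Red-team": 0.10, "Security": 0.20, "Reliability & SLOs": 0.15,
    "Data & Compliance": 0.15,
}
RAW_SCORES = {  # placeholder raw scores out of 18 for one vendor
    "Registry & IAM": 14, "Interop (MCP/A2A)": 12, "Observability (OTel)": 15,
    "Evals & Red-team": 10, "Security": 13, "Reliability & SLOs": 11,
    "Data & Compliance": 12,
}

assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9   # weights must sum to 100%

weighted_total = sum(WEIGHTS[s] * (RAW_SCORES[s] / 6) for s in WEIGHTS)  # 0-3 scale
print(f"Weighted total: {weighted_total:.2f} / 3.00")
```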
  

Bottom line

Treat agents like production software: insist on open protocols, strong IAM, first‑class telemetry, rigorous evals, and clear SLOs. Use this checklist to force specifics—dates, schemas, dashboards—so you’re ready for a 2026 rollout with fewer surprises.

Sources

  • OpenAI AgentKit overview and Evals for Agents. TechCrunch.
  • Salesforce Agentforce 360 enterprise platform. TechCrunch.
  • Microsoft Agent 365 registry/IAM focus. Wired.
  • Amazon Nova Act browser agent. TechCrunch.
  • Microsoft’s synthetic “Magentic Marketplace” study. TechCrunch.
  • NIST AI RMF 1.0 and its Generative AI Profile. NIST.
  • ISO/IEC 42001 (AI management systems). ISO.
  • Linux Foundation Agent2Agent (A2A) protocol. Linux Foundation.
  • Early market traction: Wonderful’s funding for support agents and MCP security startup Runlayer. TechCrunch.

Call to action: Need help running this RFP or piloting on MCP + OpenTelemetry first? Book a working session with HireNinja—ship a governed, observable agent in a week, then buy with confidence.
