The 2026 AI Agent Platform RFP Checklist: 60 Questions CIOs Should Ask (MCP, A2A, AgentKit, Agentforce 360, Agent 365)
As of November 22, 2025, the agent platform race has gone mainstream: OpenAI launched AgentKit in October, Salesforce rolled out Agentforce 360, Microsoft introduced Agent 365, and Amazon pushed browser-native automation with Nova Act. Meanwhile, industry groups advanced open protocols like A2A and enterprises are doubling down on governance and security. citeturn0search0turn0search3turn0news12turn0search2turn4search2
That noise makes procurement hard. This vendor-agnostic RFP checklist turns the headlines into practical questions you can hand vendors on day one—and score consistently across security, interoperability, observability, cost-to-serve, and reliability. It also links to hands-on build guides if you want to prototype before you buy.
Who this is for
- Startup founders validating a first production agent.
- E‑commerce operators scaling support or post‑purchase automations.
- Product, data, and platform teams standardizing on an enterprise agent stack.
Context you can cite in your RFP
Recent launches signal the features buyers should demand: agent builders and evaluations (AgentKit), enterprise orchestration (Agentforce 360), centralized registry/IAM (Agent 365), browser agents for GUI workflows (Nova Act), and open interop via A2A. Use them as reference capabilities, not vendor lock‑in. citeturn0search0turn0search3turn0news12turn0search2turn4search2
Risk teams will ask for alignment with recognized frameworks like NIST AI RMF (and its Generative AI profile) and ISO/IEC 42001 for AI management systems. Reference them explicitly in your RFP. citeturn1search0turn1search2turn3search0
Finally, expect questions about real‑world reliability: Microsoft’s recent research shows agents fail in surprising ways under competitive, messy conditions—so demand evidence, not demos. citeturn0search5
How to use this checklist
- Score each question 0–3 (0=no capability, 1=partial, 2=meets, 3=exceeds).
- Weight sections: Security 25%, Interop 15%, Observability 15%, Reliability/SLOs 15%, Data & Compliance 15%, Remaining 15%.
- Require a 2–3 page evidence appendix per section (screenshots, schema, dashboards).
60 RFP questions to send vendors
1) Registry, Identity & Access (least privilege)
- Do you provide an agent registry with unique IDs, roles, and lifecycle states (draft, prod, retired)?
- How are tool permissions scoped per agent (capability bounding, time‑boxed tokens)?
- Do you support human operator identities and auditable approvals for high‑risk actions?
- Can we enforce policy centrally (deny‑lists, rate limits, kill switch) across all agents?
- What is your integration surface with SSO/SCIM and workload identity (OIDC, mTLS)?
- How do you version and revoke agent/skill manifests?
2) Interoperability & Protocols (MCP + A2A)
- Which open protocols do you support for tools and data (e.g., MCP tool servers)?
- Do your agents speak Agent2Agent (A2A) for cross‑vendor collaboration? Roadmap dates? citeturn4search2
- Can agents interoperate with Microsoft/Copilot Studio or other platforms via A2A? citeturn4search1
- How do you encode capabilities/limits in portable agent cards or descriptors?
- Do you provide migration guides from/to AgentKit or Agentforce 360? citeturn0search0turn0search3
- How are cross‑org agent trust and discovery handled (allow‑lists, attestations)?
3) Observability (OpenTelemetry first)
- Are traces/logs/metrics exported in OpenTelemetry format with GenAI/agent semantics? Links to your schema please.
- Can we view a single, correlated trace from user intent → plan → tool calls → external systems → outcome?
- Do you support prompt, tool, and policy span attributes; PII scrubbing; and cost-per-outcome tags?
- Can we stream traces to our APM (Datadog, Grafana, New Relic) without vendor gateways?
- How do you sample intelligently (error/latency/cost‑aware) to control telemetry bill?
- Do you expose dashboards for success rate, TTFT, TPOT, and human‑handoff rate?
4) Evals, Testing & Red‑teaming
- Do you ship task‑level evals and regression suites (Evals for Agents or equivalent)? citeturn0search0
- How do you test for prompt injection, tool abuse, data exfiltration, and jailbreaks?
- Can we bring custom datasets and failure libraries? Scheduled evals pre‑release?
- Do you support canary cohorts and guardrail tests in CI/CD?
- What’s your coverage for browser tasks vs API‑only tasks?
- Will you share recent red‑team findings and fixes (within an NDA)?
5) Security & Safety (production posture)
- How are tool servers isolated (network egress, allow‑lists, sandboxing, timeouts)?
- Do you support content safety filters and deny‑policies at runtime?
- Is there a hold‑to‑operate or approval workflow for high‑impact actions (refunds, wire transfers)?
- How do you authenticate/authorize browser agents and protect session state?
- What incident response do you support (freeze agent, revoke secrets, replay traces)?
- Which third‑party audits/compliance attestations can you share (SOC 2, ISO 27001)?
6) Reliability & SLOs
- Which agent SLOs do you support out of the box (success rate, TTFT, TPOT, cost per task)?
- Do you have fallback and escalation strategies when models/tools fail?
- How are retries and idempotency handled for external actions?
- Can we define per‑task SLOs and link alerts to traces?
- What’s your strategy to reduce flakiness across releases?
- Do you publish monthly reliability reports? Sample please.
7) Execution Modes: API vs Browser Agents
- Do you support both API‑first and browser‑native automation? When do you recommend each?
- How do you handle auth, cookies, and DOM drift for browser agents (e.g., Nova Act‑style flows)? citeturn0search2
- What’s your approach to deterministic replay of browser sessions for audits?
- Do you expose feature flags to toggle modes per workflow?
- Can a single plan switch between API and browser steps?
- How are accessibility and localization handled in browser automation?
8) Data, Privacy & Compliance
- Map your controls to NIST AI RMF and its Generative AI Profile; share your crosswalk. citeturn1search0turn1search2
- Do you align with ISO/IEC 42001 (AI management systems) or have plans to certify? citeturn3search0
- Where do prompts, traces, and outcomes live (regions, retention)? Can we self‑host telemetry?
- How do you minimize and mask PII in prompts/tools?
- What model/data supply‑chain disclosures do you provide (SBOM, eval results)?
- How do you handle DSRs, audit logs, and legal hold?
9) Human‑in‑the‑Loop & Escalations
- Can agents request approvals with full context and a signed action plan?
- Do you support seamless escalation to human agents in CRM/Helpdesk?
- How are partial outcomes summarized for human review?
- Is there a feedback loop to improve prompts/tools post‑escalation?
- What accessibility/usability features do approvers get (mobile, Slack, email)?
- Do you measure deflection and CSAT deltas vs human‑only baselines?
10) Commercials, Support & Roadmap
- How is pricing structured (per task/minute/tool call/seat)? Show a five‑workflow TCO model.
- What’s your migration/onboarding plan (from DIY or another platform)?
- Which enterprise features are GA vs beta? Dates for A2A/MCP milestones? citeturn4search2
- What SLAs and response times do you commit to?
- Do you provide solution architects and red‑team support during rollout?
- Can you share customer evidence for production scale (e.g., funded deployments in support)? citeturn0search1
Why these requirements now
Platforms are converging on four pillars buyers should insist on: (1) open interop (MCP/A2A), (2) registry + IAM + policy, (3) first‑class observability (OpenTelemetry traces), and (4) evals + red‑teaming. Market signals—from security startups specializing in MCP to major vendor launches—show enterprise agents are leaving the lab. citeturn4search2turn0search0turn0search3turn0search4
Related hands‑on playbooks (build or pilot before you buy)
- Ship an AI Agent Registry + IAM in 7 Days (MCP, AgentKit, Agent 365, OpenTelemetry)
- Build an Internal AI Agent Control Plane in 7 Days (MCP + A2A + OpenTelemetry)
- Browser Agents vs APIs: When to Use Each — decide when you need Nova‑style browser control. citeturn0search2
- 30‑Day AI Agent Security Hardening Plan
- Red‑Team Your Customer Support AI Agent in 48 Hours
- Ship Agent SLOs That Matter
- Ship Agent Memory That Works
- 24‑Hour Checkout Recovery Agent for Shopify
- 48‑Hour Returns & Exchanges Agent (Shopify + WhatsApp)
Scoring template (copy/paste)
Section, Weight, Vendor A, Vendor B, Vendor C Registry & IAM, 0.25, __/18, __/18, __/18 Interop (MCP/A2A), 0.15, __/18, __/18, __/18 Observability (OTel), 0.15, __/18, __/18, __/18 Evals & Red‑team, 0.10, __/18, __/18, __/18 Security, 0.15, __/18, __/18, __/18 Reliability & SLOs, 0.15, __/18, __/18, __/18 Data & Compliance, 0.15, __/18, __/18, __/18 Total (weighted): __.__ / 3.00
Bottom line
Treat agents like production software: insist on open protocols, strong IAM, first‑class telemetry, rigorous evals, and clear SLOs. Use this checklist to force specifics—dates, schemas, dashboards—so you’re ready for a 2026 rollout with fewer surprises.
Sources
- OpenAI AgentKit overview and Evals for Agents. TechCrunch. citeturn0search0
- Salesforce Agentforce 360 enterprise platform. TechCrunch. citeturn0search3
- Microsoft Agent 365 registry/IAM focus. Wired. citeturn0news12
- Amazon Nova Act browser agent. TechCrunch. citeturn0search2
- Microsoft’s synthetic “Magentic Marketplace” study. TechCrunch. citeturn0search5
- NIST AI RMF 1.0 + GenAI profile. NIST; NIST. citeturn1search0turn1search2
- ISO/IEC 42001 (AI management systems). ISO. citeturn3search0
- Linux Foundation Agent2Agent (A2A) protocol. Linux Foundation. citeturn4search2
- Early market traction: Wonderful funding in support agents; MCP security startup Runlayer. TechCrunch; TechCrunch. citeturn0search1turn0search4
Call to action: Need help running this RFP or piloting on MCP + OpenTelemetry first? Book a working session with HireNinja—ship a governed, observable agent in a week, then buy with confidence.

Leave a comment