The 2026 AI Agent Platform RFP Checklist: 60 Questions CIOs Should Ask (MCP, A2A, AgentKit, Agentforce 360, Agent 365)

As of November 22, 2025, the agent platform race has gone mainstream: OpenAI launched AgentKit in October, Salesforce rolled out Agentforce 360, Microsoft introduced Agent 365, and Amazon pushed browser-native automation with Nova Act. Meanwhile, industry groups are advancing open protocols like A2A, and enterprises are doubling down on governance and security.

That noise makes procurement hard. This vendor-agnostic RFP checklist turns the headlines into practical questions you can hand vendors on day one—and score consistently across security, interoperability, observability, cost-to-serve, and reliability. It also links to hands-on build guides if you want to prototype before you buy.

Who this is for

  • Startup founders validating a first production agent.
  • E‑commerce operators scaling support or post‑purchase automations.
  • Product, data, and platform teams standardizing on an enterprise agent stack.

Context you can cite in your RFP

Recent launches signal the features buyers should demand: agent builders and evaluations (AgentKit), enterprise orchestration (Agentforce 360), centralized registry/IAM (Agent 365), browser agents for GUI workflows (Nova Act), and open interop via A2A. Use them as reference capabilities, not vendor lock‑in.

Risk teams will ask for alignment with recognized frameworks like NIST AI RMF (and its Generative AI profile) and ISO/IEC 42001 for AI management systems. Reference them explicitly in your RFP.

Finally, expect questions about real‑world reliability: Microsoft's recent research shows agents fail in surprising ways under competitive, messy conditions. Demand evidence, not demos.

How to use this checklist

  1. Score each question 0–3 (0=no capability, 1=partial, 2=meets, 3=exceeds).
  2. Weight sections so they sum to 100% (for example: Registry & IAM 15%, Interop 10%, Observability 15%, Evals & Red-team 10%, Security 20%, Reliability/SLOs 15%, Data & Compliance 15%).
  3. Require a 2–3 page evidence appendix per section (screenshots, schema, dashboards).

60 RFP questions to send vendors

1) Registry, Identity & Access (least privilege)

  • Do you provide an agent registry with unique IDs, roles, and lifecycle states (draft, prod, retired)? (A registry-record sketch follows this list.)
  • How are tool permissions scoped per agent (capability bounding, time‑boxed tokens)?
  • Do you support human operator identities and auditable approvals for high‑risk actions?
  • Can we enforce policy centrally (deny‑lists, rate limits, kill switch) across all agents?
  • What is your integration surface with SSO/SCIM and workload identity (OIDC, mTLS)?
  • How do you version and revoke agent/skill manifests?
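
To make the registry and manifest questions above concrete, here is a minimal sketch of what a registry record could expose. The field names (agent_id, allowed_tools, token_ttl_seconds) are illustrative assumptions, not any vendor's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Lifecycle(Enum):
    DRAFT = "draft"
    PROD = "prod"
    RETIRED = "retired"


@dataclass
class AgentRecord:
    """Illustrative registry entry: identity, role, lifecycle, and least-privilege scope."""
    agent_id: str                      # unique, stable ID referenced in logs and policy
    role: str                          # business role, e.g. "support-triage"
    lifecycle: Lifecycle               # draft -> prod -> retired
    allowed_tools: list[str] = field(default_factory=list)  # capability bounding
    token_ttl_seconds: int = 900       # time-boxed credentials, not long-lived secrets
    owner: str = ""                    # accountable human or team for audits


# Example: a production support agent scoped to exactly two tools.
support_agent = AgentRecord(
    agent_id="agt_support_001",
    role="support-triage",
    lifecycle=Lifecycle.PROD,
    allowed_tools=["crm.lookup_ticket", "kb.search"],
    owner="support-platform@yourco.example",
)
```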

2) Interoperability & Protocols (MCP + A2A)

  • Which open protocols do you support for tools and data (e.g., MCP tool servers)?
  • Do your agents speak Agent2Agent (A2A) for cross‑vendor collaboration? Roadmap dates?
  • Can agents interoperate with Microsoft/Copilot Studio or other platforms via A2A?
  • How do you encode capabilities/limits in portable agent cards or descriptors? (An illustrative agent-card sketch follows this list.)
  • Do you provide migration guides from/to AgentKit or Agentforce 360?
  • How are cross‑org agent trust and discovery handled (allow‑lists, attestations)?
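
For the agent-card question above, the sketch below shows the kind of portable descriptor A2A-style interop relies on. Field names are paraphrased and the x-limits block is a made-up extension, so treat this as a shape to compare against the current A2A schema rather than the definitive format.

```python
import json

# Approximate A2A-style agent card; verify field names against the current spec.
agent_card = {
    "name": "Order Status Agent",
    "description": "Answers order status and shipping questions.",
    "url": "https://agents.yourco.example/order-status",   # hypothetical endpoint
    "version": "1.2.0",
    "capabilities": {"streaming": True},
    "skills": [
        {
            "id": "order-status-lookup",
            "name": "Order status lookup",
            "description": "Looks up an order by ID and summarizes its status.",
        }
    ],
    # Express limits explicitly so peer agents can reason about trust and scope.
    "x-limits": {"max_order_value_usd": 500, "write_actions": False},  # custom extension
}

print(json.dumps(agent_card, indent=2))
```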

3) Observability (OpenTelemetry first)

  • Are traces/logs/metrics exported in OpenTelemetry format with GenAI/agent semantics? Please link to your attribute schema. (A span sketch follows this list.)
  • Can we view a single, correlated trace from user intent → plan → tool calls → external systems → outcome?
  • Do you support prompt, tool, and policy span attributes; PII scrubbing; and cost-per-outcome tags?
  • Can we stream traces to our APM (Datadog, Grafana, New Relic) without vendor gateways?
  • How do you sample intelligently (error/latency/cost‑aware) to control the telemetry bill?
  • Do you expose dashboards for success rate, time to first token (TTFT), time per output token (TPOT), and human‑handoff rate?
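
To ground the OpenTelemetry questions above, here is a minimal span around a tool call, assuming the opentelemetry-api package is installed. The gen_ai.* attribute names follow the still-evolving GenAI semantic conventions and app.cost.usd is a custom tag, so verify both against the current spec and the vendor's schema.

```python
from opentelemetry import trace

tracer = trace.get_tracer("agent.runtime")

def call_tool(tool_name: str, arguments: dict) -> dict:
    """Wrap a tool call in a span so intent -> plan -> tool call correlates in one trace."""
    with tracer.start_as_current_span(f"tool.{tool_name}") as span:
        # Attribute names based on OpenTelemetry GenAI semantic conventions (subject to change).
        span.set_attribute("gen_ai.operation.name", "execute_tool")
        span.set_attribute("gen_ai.tool.name", tool_name)
        span.set_attribute("app.cost.usd", 0.0021)    # custom cost-per-outcome tag (illustrative)
        result = {"status": "ok"}                      # placeholder for the real tool invocation
        span.set_attribute("app.tool.status", result["status"])
        return result
```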

4) Evals, Testing & Red‑teaming

  • Do you ship task‑level evals and regression suites (Evals for Agents or equivalent)? (A minimal harness sketch follows this list.)
  • How do you test for prompt injection, tool abuse, data exfiltration, and jailbreaks?
  • Can we bring custom datasets and failure libraries? Scheduled evals pre‑release?
  • Do you support canary cohorts and guardrail tests in CI/CD?
  • What’s your coverage for browser tasks vs API‑only tasks?
  • Will you share recent red‑team findings and fixes (under NDA)?
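
As a reference for the regression-suite question, here is a minimal harness sketch: it replays a small task set against a placeholder run_agent() function and gates the release on pass rate. The cases, the string check, and run_agent are all assumptions to be replaced by your own datasets and the platform's eval API.

```python
# Minimal eval harness sketch; run_agent() is a placeholder for the platform under test.
CASES = [
    {"task": "Refund order #1234 under the $50 policy", "must_include": "refund issued"},
    {"task": "Escalate a chargeback dispute", "must_include": "escalated to human"},
]

def run_agent(task: str) -> str:
    raise NotImplementedError("wire this to the vendor's run/eval API")

def pass_rate(cases) -> float:
    passed = 0
    for case in cases:
        try:
            output = run_agent(case["task"])
            passed += case["must_include"].lower() in output.lower()
        except Exception:
            pass  # an exception counts as a failed case
    return passed / len(cases)

if __name__ == "__main__":
    rate = pass_rate(CASES)
    assert rate >= 0.95, f"regression: pass rate {rate:.0%} below the 95% gate"
```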

5) Security & Safety (production posture)

  • How are tool servers isolated (network egress, allow‑lists, sandboxing, timeouts)? (A runtime guard sketch follows this list.)
  • Do you support content safety filters and deny‑policies at runtime?
  • Is there a hold‑to‑operate or approval workflow for high‑impact actions (refunds, wire transfers)?
  • How do you authenticate/authorize browser agents and protect session state?
  • What incident response do you support (freeze agent, revoke secrets, replay traces)?
  • Which third‑party audits/compliance attestations can you share (SOC 2, ISO 27001)?
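
For the isolation question above, this sketch shows the shape of a runtime guard: a tool allow-list, an egress allow-list, and a timeout. The policy values and host names are illustrative assumptions, not a specific vendor's control plane.

```python
import socket
from urllib.parse import urlparse

ALLOWED_TOOLS = {"kb.search", "crm.lookup_ticket"}       # deny by default
ALLOWED_HOSTS = {"api.internal.yourco.example"}          # egress allow-list (illustrative)
TOOL_TIMEOUT_SECONDS = 10

def guard_tool_call(tool_name: str, url: str) -> None:
    """Raise before any side effect if the call violates policy."""
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool not allow-listed: {tool_name}")
    host = urlparse(url).hostname or ""
    if host not in ALLOWED_HOSTS:
        raise PermissionError(f"egress blocked for host: {host}")
    socket.setdefaulttimeout(TOOL_TIMEOUT_SECONDS)        # coarse timeout for outbound sockets

# Usage: guard_tool_call("kb.search", "https://api.internal.yourco.example/search")
```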

6) Reliability & SLOs

  • Which agent SLOs do you support out of the box (success rate, TTFT, TPOT, cost per task)?
  • Do you have fallback and escalation strategies when models/tools fail?
  • How are retries and idempotency handled for external actions? (A retry/idempotency sketch follows this list.)
  • Can we define per‑task SLOs and link alerts to traces?
  • What’s your strategy to reduce flakiness across releases?
  • Do you publish monthly reliability reports? Please share a sample.
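
To make the retries/idempotency question concrete, here is a sketch of the pattern to ask vendors to demonstrate: one idempotency key generated per action and reused across retries with exponential backoff. The perform_action callable and header name are assumptions; the point is that a retried call cannot execute its side effect twice.

```python
import time
import uuid

def with_retries(perform_action, payload: dict, max_attempts: int = 3) -> dict:
    """Retry an external action safely by pinning one idempotency key across attempts."""
    idempotency_key = str(uuid.uuid4())          # generated once, reused on every retry
    delay = 1.0
    for attempt in range(1, max_attempts + 1):
        try:
            # perform_action is a placeholder for the real API client call.
            return perform_action(payload, headers={"Idempotency-Key": idempotency_key})
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(delay)
            delay *= 2                           # exponential backoff between attempts
```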

7) Execution Modes: API vs Browser Agents

  • Do you support both API‑first and browser‑native automation? When do you recommend each?
  • How do you handle auth, cookies, and DOM drift for browser agents (e.g., Nova Act‑style flows)?
  • What’s your approach to deterministic replay of browser sessions for audits?
  • Do you expose feature flags to toggle modes per workflow?
  • Can a single plan switch between API and browser steps? (A dispatch sketch follows this list.)
  • How are accessibility and localization handled in browser automation?
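
For the mixed-mode plan question, this sketch shows one way a plan could interleave API and browser steps behind a single dispatcher; run_api_step and run_browser_step are hypothetical adapters, not any vendor's SDK.

```python
PLAN = [
    {"mode": "api", "action": "orders.lookup", "args": {"order_id": "1234"}},
    {"mode": "browser", "action": "carrier_portal.file_claim", "args": {"order_id": "1234"}},
]

def run_api_step(action: str, args: dict) -> dict:
    raise NotImplementedError("call the tool/API integration here")

def run_browser_step(action: str, args: dict) -> dict:
    raise NotImplementedError("drive the browser agent here")

def execute(plan: list[dict]) -> list[dict]:
    """Dispatch each step to the right execution mode and keep results in one audit trail."""
    results = []
    for step in plan:
        runner = run_api_step if step["mode"] == "api" else run_browser_step
        results.append({"step": step["action"], "result": runner(step["action"], step["args"])})
    return results
```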

8) Data, Privacy & Compliance

  • Map your controls to NIST AI RMF and its Generative AI Profile; share your crosswalk.
  • Do you align with ISO/IEC 42001 (AI management systems) or have plans to certify?
  • Where do prompts, traces, and outcomes live (regions, retention)? Can we self‑host telemetry?
  • How do you minimize and mask PII in prompts/tools? (A masking sketch follows this list.)
  • What model/data supply‑chain disclosures do you provide (SBOM, eval results)?
  • How do you handle DSRs, audit logs, and legal hold?
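
For the PII minimization question, here is a small masking sketch applied before prompts or traces leave your boundary; the two regexes cover only emails and simple phone formats and are an illustrative starting point, not a complete PII control.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def scrub(text: str) -> str:
    """Mask obvious PII before text is logged, traced, or sent to a model."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

# Example: scrub("Call Jane at +1 (555) 010-2345 or jane@example.com")
# -> "Call Jane at [PHONE] or [EMAIL]"
```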

9) Human‑in‑the‑Loop & Escalations

  • Can agents request approvals with full context and a signed action plan? (A plan-signing sketch follows this list.)
  • Do you support seamless escalation to human agents in CRM/Helpdesk?
  • How are partial outcomes summarized for human review?
  • Is there a feedback loop to improve prompts/tools post‑escalation?
  • What accessibility/usability features do approvers get (mobile, Slack, email)?
  • Do you measure deflection and CSAT deltas vs human‑only baselines?
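
For the approval question at the top of this list, the sketch below shows one way an agent could attach a verifiable signature to the exact action plan a human approves; the HMAC secret handling and payload shape are assumptions for illustration only.

```python
import hashlib
import hmac
import json

SECRET = b"rotate-me-via-your-secrets-manager"   # placeholder; never hard-code in production

def sign_plan(plan: dict) -> str:
    """Produce a stable signature over the exact plan shown to the human approver."""
    canonical = json.dumps(plan, sort_keys=True).encode()
    return hmac.new(SECRET, canonical, hashlib.sha256).hexdigest()

def verify_plan(plan: dict, signature: str) -> bool:
    return hmac.compare_digest(sign_plan(plan), signature)

plan = {"action": "refund", "order_id": "1234", "amount_usd": 42.50}
sig = sign_plan(plan)
assert verify_plan(plan, sig)        # the approval UI re-verifies before execution
```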

10) Commercials, Support & Roadmap

  • How is pricing structured (per task/minute/tool call/seat)? Show a five‑workflow TCO model (a toy cost sketch follows this list).
  • What’s your migration/onboarding plan (from DIY or another platform)?
  • Which enterprise features are GA vs beta? Dates for A2A/MCP milestones?
  • What SLAs and response times do you commit to?
  • Do you provide solution architects and red‑team support during rollout?
  • Can you share customer evidence of production scale (e.g., referenceable customer‑support deployments at real volume)?
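
To anchor the TCO question above, here is a toy five-workflow cost model; every volume and unit price is a placeholder to be replaced with the vendor's actual pricing and your own traffic.

```python
# Toy TCO sketch: monthly cost = tasks * (model + tool + human-review cost per task).
WORKFLOWS = {
    "order-status":    {"tasks": 20000, "model": 0.010, "tools": 0.002, "review": 0.001},
    "returns":         {"tasks": 6000,  "model": 0.020, "tools": 0.004, "review": 0.010},
    "refund-approval": {"tasks": 1500,  "model": 0.030, "tools": 0.005, "review": 0.050},
    "kb-drafting":     {"tasks": 800,   "model": 0.050, "tools": 0.001, "review": 0.080},
    "lead-triage":     {"tasks": 4000,  "model": 0.015, "tools": 0.003, "review": 0.005},
}

total = 0.0
for name, w in WORKFLOWS.items():
    monthly = w["tasks"] * (w["model"] + w["tools"] + w["review"])
    total += monthly
    print(f"{name:16s} ${monthly:,.2f}/month")
print(f"{'TOTAL':16s} ${total:,.2f}/month")
```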

Why these requirements now

Platforms are converging on four pillars buyers should insist on: (1) open interop (MCP/A2A), (2) registry + IAM + policy, (3) first‑class observability (OpenTelemetry traces), and (4) evals + red‑teaming. Market signals, from security startups specializing in MCP to major vendor launches, show that enterprise agents are leaving the lab.

Related hands‑on playbooks (build or pilot before you buy)

Scoring template (copy/paste)

Section, Weight, Vendor A, Vendor B, Vendor C
Registry & IAM, 0.15, __/18, __/18, __/18
Interop (MCP/A2A), 0.10, __/18, __/18, __/18
Observability (OTel), 0.15, __/18, __/18, __/18
Evals & Red-team, 0.10, __/18, __/18, __/18
Security, 0.20, __/18, __/18, __/18
Reliability & SLOs, 0.15, __/18, __/18, __/18
Data & Compliance, 0.15, __/18, __/18, __/18
Total (weighted): __.__ / 3.00

Each section has six questions scored 0–3, so the raw maximum per section is 18. Divide a section's raw score by 6 to get its 0–3 average, multiply by the weight, and sum across sections for the weighted total out of 3.00 (a calculation sketch follows).
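
The sketch below shows that calculation with placeholder raw scores: each section's raw score out of 18 is normalized to a 0–3 average, weighted, and summed.

```python
# Weighted scoring: six questions per section, each scored 0-3 (raw max 18 per section).
WEIGHTS = {
    "Registry & IAM": 0.15, "Interop (MCP/A2A)": 0.10, "Observability (OTel)": 0.15,
    "Evals & Red-team": 0.10, "Security": 0.20, "Reliability & SLOs": 0.15,
    "Data & Compliance": 0.15,
}
RAW_SCORES = {  # placeholder raw scores out of 18 for one vendor
    "Registry & IAM": 14, "Interop (MCP/A2A)": 12, "Observability (OTel)": 15,
    "Evals & Red-team": 10, "Security": 13, "Reliability & SLOs": 11,
    "Data & Compliance": 12,
}

assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9   # weights must sum to 100%

weighted_total = sum(WEIGHTS[s] * (RAW_SCORES[s] / 6) for s in WEIGHTS)  # 0-3 scale
print(f"Weighted total: {weighted_total:.2f} / 3.00")
```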
  

Bottom line

Treat agents like production software: insist on open protocols, strong IAM, first‑class telemetry, rigorous evals, and clear SLOs. Use this checklist to force specifics—dates, schemas, dashboards—so you’re ready for a 2026 rollout with fewer surprises.

Sources

  • OpenAI AgentKit overview and Evals for Agents. TechCrunch.
  • Salesforce Agentforce 360 enterprise platform. TechCrunch.
  • Microsoft Agent 365 registry/IAM focus. Wired.
  • Amazon Nova Act browser agent. TechCrunch.
  • Microsoft’s synthetic “Magentic Marketplace” study. TechCrunch.
  • NIST AI RMF 1.0 and its Generative AI Profile. NIST.
  • ISO/IEC 42001 (AI management systems). ISO.
  • Linux Foundation Agent2Agent (A2A) protocol. Linux Foundation.
  • Early market traction: Wonderful’s funding for support agents and MCP security startup Runlayer. TechCrunch.

Call to action: Need help running this RFP or piloting on MCP + OpenTelemetry first? Book a working session with HireNinja—ship a governed, observable agent in a week, then buy with confidence.
