AI and the Three Pillars of Success

Learnings across our think tank | a Chatham House Rule summary

This is a summary of learnings on the three pillars of success for AI in the enterprise - Strategy, Enterprise Architecture, and Organizational Design & Change. We also reflect on an example from among our membership and draw out key learnings. Because the discussions are rich, these summaries end up dense - we are happy to discuss or provide color in a conversation - or a (non-temperature-controlled) AI-rendered audio conversation is here.

1) AI & Strategy

AI should be treated as a means to enterprise outcomes rather than an end in itself. The most durable strategies make an explicit call on where value will accrue: cost, revenue, or table-stakes capability required to compete. Evidence shows meaningful but uneven gains — e.g., ~80% faster document analysis, ~1.7× higher sales conversion, 5–10% shorter call handling, 20–30% improvements in manufacturing throughput, and ~3× acceleration in design/engineering. These signals are real, yet the spread is wide; first-mover advantage is situational, while disciplined fast-followers often capture superior economics.

Two concrete use cases illustrate where returns concentrate. First, “institutional knowledge” twins that codify brand, process, and compliance logic so teams stop re-litigating the basics and can reuse approved assets. Second, “market insight” twins that simulate focus groups in days, compressing multi-week learning loops into hours and speeding go-to-market decisions. Separately, some have invested in large-scale telemetry of work (task mining at thousands-of-instance scale) to reveal the real flow of tasks, handoffs, and delays. That empirical map then becomes the substrate for agent design and prioritization—where to start, what to automate, and which handoffs to keep human.

There is substantial inconsistency across the industry in how ROI is defined, which costs are captured (build-and-run, data, change, controls), and how returns are calculated (gross productivity vs. net P&L impact). This makes cross-company comparisons and industry insights difficult and often misleading. Broadly speaking, for each unit of cost takeout, roughly one-third should fund the build-and-run of the agent layer (including integration, monitoring, guardrails, and model/runtime costs) and about two-thirds should accrue as net savings. On returns, top-line plays such as champion–challenger pricing and marketing-mix optimization already attract strong executive sponsorship.
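
To make that rule of thumb and the gross-versus-net distinction concrete, a minimal sketch follows; the function and the figures in it are illustrative assumptions, not benchmarks.

```python
# Illustrative only: splits a gross cost-takeout estimate into agent-layer build-and-run
# funding (~1/3) and net savings (~2/3), per the rule of thumb above. Figures are hypothetical.

def split_cost_takeout(gross_takeout: float, build_and_run_share: float = 1 / 3) -> dict:
    """Return a rough allocation of a gross cost-takeout figure."""
    build_and_run = gross_takeout * build_and_run_share  # integration, monitoring, guardrails, model/runtime
    net_savings = gross_takeout - build_and_run          # what actually reaches the P&L
    return {"gross_takeout": gross_takeout, "build_and_run": build_and_run, "net_savings": net_savings}

# e.g., a $12M gross productivity claim nets out to roughly $8M after funding the agent layer
print(split_cost_takeout(12_000_000))
```

The point is simply that a productivity claim only becomes a P&L number once the agent layer's build-and-run is netted out.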

The automation-versus-augmentation debate is a false choice. High-trust deployments typically begin with augmentation—analysis and guidance—then progress to supervised action and, only after evidence and controls, to bounded autonomy. Trust, explainability, and regulatory posture must be explicit: set accuracy thresholds, disclose limits, phase authority, and manage AI like a product with a real lifecycle—pilot, certify, scale, and monitor.

2) Enterprise Architecture in the Age of AI

Architecture has returned to the center. Without clear standards and governance, AI devolves into model sprawl: dozens of embedded models, inconsistent controls, opaque costs, and rising lock-in risk. The north star is a compact set of patterns that travel everywhere—data contracts and stewardship; agent identity, authorization, and orchestration; observability with per-agent cost telemetry; a unified security framework; and security by design. Data residency and sovereignty are first-class constraints, and SaaS/cloud concentration risk should be handled deliberately with exit plans, multi-region failover, and clear responsibilities when upstream platforms fail.
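
One way to make patterns such as data contracts, stewardship, and residency travel everywhere is to express them as small, machine-readable artifacts. The sketch below is a hypothetical contract shape, not a mandated schema; every field name and value is an assumption.

```python
# A minimal data-contract sketch: ownership, schema, residency, and retention made
# explicit and machine-readable. All field names and values are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class DataContract:
    name: str                      # e.g., "customer_orders_v2"
    owner: str                     # accountable steward (team or role)
    schema: dict                   # column -> type; the agreed interface
    residency: str                 # where the data may live, e.g., "eu-only"
    retention_days: int            # how long downstream copies may be kept
    consumers: list = field(default_factory=list)  # teams that have subscribed

orders_contract = DataContract(
    name="customer_orders_v2",
    owner="order-to-cash-domain",
    schema={"order_id": "string", "amount": "decimal", "country": "string"},
    residency="eu-only",
    retention_days=365,
)
```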

Core–edge discipline matters. Keep a fit-to-standard “clean core” for systems of record—often via brownfield modernization when the economics are superior—and differentiate at the edge, where agents, APIs, and experiences evolve quickly. Standardization in the core increases agent leverage by reducing country/process variance and the number of edge-case agents needed. Pair process mining with task mining to decide what to re-engineer versus where to place agents. Favor zero-copy data patterns to minimize movement; when movement is necessary, make lineage, access, and retention explicit. Improve reuse with an API/data product catalog so teams can discover and subscribe rather than rebuild.

Data products provide the backbone for reuse and control. A practical split is origin data products lifted from sources with defined interaction mechanisms; foundation data products that are business-defined, stewarded, and discoverable; and consumer data products that are fit-for-purpose views with an intentional shelf life. Align agent ownership to the data domain so the owner of the data also owns the agent that acts on it.
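
A hypothetical rendering of that split, and of domain-aligned agent ownership, might look like the following; the type names, fields, and examples are assumptions for illustration only.

```python
# Illustrative taxonomy of data products and domain-aligned agent ownership.
# Type names, fields, and examples are assumptions for the sketch, not a reference model.
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class DataProductKind(Enum):
    ORIGIN = "origin"          # lifted from source systems via a defined interaction mechanism
    FOUNDATION = "foundation"  # business-defined, stewarded, discoverable
    CONSUMER = "consumer"      # fit-for-purpose view with an intentional shelf life

@dataclass
class DataProduct:
    name: str
    kind: DataProductKind
    domain: str                            # the owning business domain
    shelf_life_days: Optional[int] = None  # mainly relevant for CONSUMER products

@dataclass
class Agent:
    name: str
    domain: str                            # agent ownership aligned to the data domain it acts on

claims_history = DataProduct("claims_history", DataProductKind.FOUNDATION, domain="claims")
claims_agent = Agent("claims_triage_agent", domain="claims")
assert claims_agent.domain == claims_history.domain  # the owner of the data also owns the agent
```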

Energy and cost are architectural constraints. Treat interactions and tokens as metered resources and require a total cost of ownership per agent in design reviews, including build-and-run, data, change, controls, and monitoring. Small actions compound at scale, so sustainability and capacity stay in the foreground. Model choice is part of the architecture: smaller domain models reduce latency, cost, and carbon for bounded work, while larger models are reserved for genuinely open-ended reasoning. Standards should anticipate pricing volatility and vendor lock-in, from token price shifts to long-term platform commitments.
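
A back-of-the-envelope way to require TCO per agent in design reviews is to sum the fixed categories with metered model/runtime spend. The function and figures below are illustrative assumptions; they also show why a smaller domain model can dominate the economics for bounded work.

```python
# A rough total-cost-of-ownership view per agent, treating tokens as a metered resource.
# All cost categories, volumes, and prices are illustrative assumptions.

def agent_tco_per_year(build_and_run: float, data: float, change: float,
                       controls: float, monitoring: float,
                       tokens_per_interaction: int, interactions_per_year: int,
                       price_per_million_tokens: float) -> float:
    """Sum fixed cost categories with metered model/runtime spend."""
    metered = tokens_per_interaction * interactions_per_year / 1_000_000 * price_per_million_tokens
    return build_and_run + data + change + controls + monitoring + metered

# e.g., the same bounded workload on a small domain model vs. a large general model
small = agent_tco_per_year(120_000, 30_000, 20_000, 25_000, 15_000,
                           tokens_per_interaction=2_000,
                           interactions_per_year=5_000_000,
                           price_per_million_tokens=0.5)
large = agent_tco_per_year(120_000, 30_000, 20_000, 25_000, 15_000,
                           tokens_per_interaction=2_000,
                           interactions_per_year=5_000_000,
                           price_per_million_tokens=10.0)
print(f"small-model TCO: ~${small:,.0f}/yr, large-model TCO: ~${large:,.0f}/yr")
```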

Operational guardrails should be staged. Begin with insights-only use, progress to supervised execution with a time-bound human-in-the-loop probation window, and grant bounded autonomy only when evidence and controls warrant it. Maintain a company-wide agent inventory that records ownership, scope, invocation rights, audit trails, lineage, cost, residency, and decommissioning rules. An architecture forum or AI control tower certifies patterns, prevents sprawl, monitors per-agent SLOs (latency, error budgets, cost envelopes, residency/sovereignty), and is empowered to say “no” or “not yet.”
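
A minimal sketch of an inventory entry and the staged authority levels, using the attributes listed above, could look like this; the structure, names, and defaults are assumptions, not a reference implementation.

```python
# A sketch of a company-wide agent inventory entry and staged authority levels.
# Field names mirror the attributes listed above; the structure and example values are assumptions.
from dataclasses import dataclass, field
from enum import Enum

class AuthorityStage(Enum):
    INSIGHTS_ONLY = 1      # analysis and guidance only
    SUPERVISED = 2         # acts with a human-in-the-loop during a probation window
    BOUNDED_AUTONOMY = 3   # acts alone, but only within certified limits

@dataclass
class AgentInventoryEntry:
    agent_id: str
    owner: str                                               # accountable domain owner
    scope: str                                               # what the agent may do
    stage: AuthorityStage
    invocation_rights: list = field(default_factory=list)    # who or what may call it
    audit_trail_uri: str = ""                                 # where decisions are logged
    lineage: list = field(default_factory=list)               # upstream data products it reads
    monthly_cost: float = 0.0                                 # feeds per-agent cost telemetry
    residency: str = "eu-only"                                # residency/sovereignty constraint
    decommission_rule: str = "retire if unused for 90 days"   # illustrative policy

entry = AgentInventoryEntry(
    agent_id="invoice_matching_agent",
    owner="order-to-cash-domain",
    scope="match supplier invoices to purchase orders",
    stage=AuthorityStage.SUPERVISED,
)
```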

3) Organizational Design for the AI Future

Operating models are shifting from distributing tools to delivering productized, full-stack value. A practical sequence is to start in shared-services domains—order-to-cash, record-to-report, service desks—where telemetry and change control are mature, and then expand into line functions with clearly bounded outcomes. When staged this way, organizations typically see one of two benefits: either material back-office reductions over a two-year horizon (on the order of 20–30% in finance/HR, with some G&A programs targeting up to ~50%), or the ability to absorb double-digit demand growth in customer operations with flat headcount (for example, call centers planning ~15% volume increases with embedded AI). The measurement lens should be end-to-end service cost (e.g., total cost to deliver order-to-cash or claims), not just the cost of an isolated agent. Make definitions explicit up front (what costs are in scope; whether impacts are gross productivity or net P&L) so CFO-visible results are comparable quarter to quarter.
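
As a simple illustration of the end-to-end lens versus the cost of an isolated agent, consider a hypothetical order-to-cash calculation (all figures invented):

```python
# Illustrative contrast between agent-level cost and the end-to-end service cost
# recommended above as the measurement lens. All figures are hypothetical.

def cost_per_transaction(total_service_cost: float, transactions: int) -> float:
    """End-to-end cost to deliver a service (e.g., order-to-cash), per transaction."""
    return total_service_cost / transactions

baseline = cost_per_transaction(40_000_000, 2_000_000)                     # $20.00 per cycle
with_agents = cost_per_transaction(40_000_000 * 0.75 + 3_000_000, 2_000_000)
# 25% gross takeout, plus $3M agent build-and-run added back, gives the *net* per-transaction view
print(f"baseline ${baseline:.2f} -> with agents ${with_agents:.2f} per transaction")
```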

Scaling depends on explicit governance and maturity staging. An agent maturity ladder—from information to pattern-matching to action—helps set expectations for risk and coverage. Practical human-to-agent supervision spans look like 1:2–3 (information), ~1:7–8 (pattern-matching), and up to ~1:30 (action), subject to domain risk. Use a time-boxed certification period—on the order of six months with human-in-the-loop—before granting bounded autonomy, and encode limits in machine-readable policy (delegation of authority) so agents act only within owned domains and auto-escalate across boundaries. Seat an AI ethics board chaired by Legal with HR and Technology as standing members. Run a control tower that maintains an inventory of agents tied to business KPIs, risk posture, unit economics, audit trails, and cost to serve, and that can say “no” or “not yet” to prevent sprawl. Expect “shadow IT on steroids”; channel it with sanctioned promotion paths from sandbox to certified patterns so bottom-up creativity can scale safely. Hands-on executive build sessions—standing up a real agent end-to-end—help close the gap between ambition and delivery.
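
A minimal sketch of delegation of authority as machine-readable policy, assuming hypothetical agent names, domains, and thresholds, might look like this:

```python
# Delegation of authority encoded as machine-readable policy: agents act only within their
# owned domain and under a value threshold, and auto-escalate to a human otherwise.
# Agent names, domains, and thresholds are illustrative assumptions.

POLICY = {
    "accounts_payable_agent": {"domain": "accounts_payable", "max_amount": 10_000},
    "claims_triage_agent":    {"domain": "claims",           "max_amount": 2_500},
}

def authorize(agent: str, domain: str, amount: float) -> str:
    """Return 'allow' if the action is inside the agent's delegated authority, else 'escalate'."""
    grant = POLICY.get(agent)
    if grant is None or domain != grant["domain"] or amount > grant["max_amount"]:
        return "escalate"   # route to a human-in-the-loop across boundaries or above threshold
    return "allow"

assert authorize("accounts_payable_agent", "accounts_payable", 4_200) == "allow"
assert authorize("accounts_payable_agent", "claims", 4_200) == "escalate"  # outside owned domain
```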

Vendor strategy is part of operating-model design. Smaller partners often move faster on agent builds; some larger providers struggle to pass through productivity in commercial terms. Where third-party development is significant, expect friction if you insist on price reductions that reflect AI-enabled productivity, and be prepared to redesign contracts and acceptance criteria to capture end-to-end gains rather than local optimizations. Since process redesign and change management land best after process consolidation, in some cases it is still worth “lifting and shifting” into shared services even if agents are coming, simply to consolidate processes and data ahead of automation, which increases agent leverage and reduces exception handling later.

Workforce transformation is already visible. Entry-level, repetitive roles are shrinking while judgment-heavy roles expand, particularly in architecture, domain decisioning, and stewardship. Mandated training and apprenticeship programs establish a common baseline at scale; several organizations have moved thousands through foundations courses and are running ongoing cohorts. AI-native tools can deliver substantial step-changes in specific lifecycle steps—reverse-engineering and documentation work that took one to two weeks can now be executed in roughly an hour—but realizing those gains end-to-end usually requires process redesign and tighter vendor integration; otherwise improvements stall in single digits. Teams often need top-down insistence to adopt new workbenches rather than treating them as optional experiments.

Demographics and talent markets add urgency. Aging technical populations and near-term retirements pressure capacity, shifting recruiting toward experienced domain specialists rather than entry-level roles. Attracting younger talent to established brands requires visible changes: modern spaces, intentional in-office collaboration rhythms, hackathons, and reframing “boring” sectors as high-impact platforms. Some organizations have refreshed employer branding, sponsored university programs, and used gamified design work (e.g., retirement-planning experiences) to make the mission tangible. Internal communications matter too: cultivate champions, create always-on AI communities, and embed expectations directly into job descriptions and performance goals.

Team topology evolves as well. Cross-functional, iterative squads that combine domain, data, engineering, controls, and change management reduce the handoffs that doom agent programs. For back-office autonomy, rewrite delegation of authority into machine-executable policies; for operations, use agent-to-human ratios and SLAs to plan spans of control and coverage. In parallel, preserve institutional memory by creating digital twins of critical roles—scripts, playbooks, heuristics—so knowledge persists as people move or exit. Finally, be clear about where AI is not yet the answer (no reliable data sources, deeply entangled integrations, or change-control constraints): pick the right pockets first, prove the economics, and expand deliberately.

4) Case Study: Transformation at a Major Financial Services Provider

The starting point was a high ~65% cost–income ratio, multi-year delivery cycles, and a transactional split between business and technology. The answer was a platform operating model with paired business-and-technology leadership (“two in a box”), each platform sized at ~300–500 people and funded on multi-year horizons. Methods and roles were simplified: job families collapsed to a tight set; programming languages narrowed to five; a single CI/CD pipeline and golden paths enforced; and waterfall project/BA roles were replaced by product squads, integrators, and journey managers. Change and run were fused—teams own what they change.

Talent moved to a skills basis through self/peer/manager assessment tied to reskilling paths and clear thresholds. Job families were reduced from ~35 to ~11, ratios were defined (e.g., architects per team), and vendor models shifted from fixed-price to time-and-materials. A location strategy brought squads together on set office days, while a captive engineering center scaled rapidly to thousands, complemented by graduate pipelines, hackathons, and a refreshed employer brand. Diversity targets were pursued explicitly.

The operating stack modernized in parallel. An API marketplace replaced “call-a-friend” integration; duplicate interfaces were governed out; self-service into master data and shared services was designed to fit within a sprint or two; and a unified security framework replaced bespoke token/identity patterns. “Clean core vs. edge” became policy: systems of record were kept fit-to-standard (accounting-engine mindset), with differentiation at the edge. End-user computing was curtailed; work moved into shared tooling (e.g., strategic planning in one system, delivery in one backlog tool). Funding conversations shifted from currency to capacity—budget requests expressed as feature-team counts, not pounds.

Delivery outcomes changed visibly: a flagship mobile product shipped in months rather than years; features released on a two-week cadence; incidents fell; big-bang releases disappeared (blue/green normalized); and user engagement scaled materially, mirrored by a step-change in external trust scores from ~1.7 to ~4.7. Teams became fungible across domains because tooling, skills, and language standardized.

Two governing themes anchored the next wave. First, sustainability and cost as design constraints: seemingly small interactions compound at scale, so the board asked for total cost of ownership per agent and energy-aware choices (e.g., small domain models for bounded tasks, larger models reserved for open-ended reasoning). Second, pace and ambition: annual “version” upgrades of the operating model gave way to more continuous iteration, with the retrospective that earlier waves could have aimed bigger and moved faster (within legal consultation limits).

The philosophy that endures: build core capabilities instead of rebadging others’ IP to preserve data rights and institutional competence; separate systems of record from systems of engagement to keep the core standard; measure productivity with outcome-centric metrics (e.g., release cadence, incident rates, NPS/engagement) rather than lines of code; and hold to a three-year horizon for visible transformation—course-correct if it’s not showing up in customer experience, reliability, and economics.

Executive Technology Board ©

The AI & Pillars of Success Podcast