SUSTAINABILITY SEMINAR SERIES:
Efficient and Scalable Agentic AI with Heterogeneous Systems
by Tom St. John
Gimlet Labs
EVENT DETAILS
February 18, 2026
15:00 - 16:00 (Eastern Time)
ABSTRACT:
AI agents are emerging as a dominant workload across a wide range of applications and are expected to be the vehicle that delivers the benefits of AI to enterprises and consumers. Unlike conventional software or static inference, agentic workloads are dynamic and structurally complex. These agents are often directed graphs of compute and I/O operations that span multi-modal data input and conversion (e.g., speech to text), data processing and context gathering (e.g., privacy filtering, vector DB lookups), LLM inference, and tool calls. To scale AI agent usage, we need efficient and scalable deployment and agent-serving infrastructure. Today, however, the vast majority of these workloads are deployed on homogeneous, high-end, single-vendor infrastructure, which is often expensive and limits broad rollout.
To tackle this challenge, we present a system design for dynamic orchestration of AI agent workloads on heterogeneous compute infrastructure spanning CPUs and accelerators, both from different vendors and across different performance tiers within a single vendor. The system delivers several building blocks: a framework for planning and optimizing agentic AI execution graphs using cost models that account for the compute, memory, and bandwidth constraints of different hardware; an MLIR-based compilation system that can decompose AI agent execution graphs into granular operators and generate code for different hardware options; and a dynamic orchestration system that can place the granular components across a heterogeneous compute infrastructure and stitch them together while meeting an end-to-end SLA. Our design thus performs a system-level TCO optimization, and our results show that leveraging a heterogeneous infrastructure can deliver significant TCO benefits.
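To make the idea of cost-model-driven placement under an end-to-end SLA concrete, here is a minimal Python sketch. It is not the speaker's system: it assumes a simple linear chain of stages rather than a general graph, uses brute-force search instead of a real planner, and every name (`STAGES`, `plan`, the device tiers) and every latency or cost figure is hypothetical and made up for illustration.

```python
from itertools import product

# Hypothetical per-stage options: device -> (latency in ms, cost in arbitrary units).
# All stage names, device tiers, and numbers are illustrative assumptions.
STAGES = {
    "speech_to_text": {"cpu": (120, 1.0), "mid_gpu": (40, 3.0), "high_gpu": (25, 8.0)},
    "vector_lookup": {"cpu": (15, 0.5), "mid_gpu": (12, 3.0)},
    "llm_inference": {"mid_gpu": (400, 6.0), "high_gpu": (150, 20.0)},
}


def plan(stages, sla_ms):
    """Return (cost, latency_ms, placement) for the cheapest placement
    whose summed stage latency meets the SLA, or None if none does."""
    names = list(stages)
    best = None
    # Enumerate every combination of one device per stage.
    for choice in product(*(stages[name] for name in names)):
        latency = sum(stages[n][d][0] for n, d in zip(names, choice))
        cost = sum(stages[n][d][1] for n, d in zip(names, choice))
        if latency <= sla_ms and (best is None or cost < best[0]):
            best = (cost, latency, dict(zip(names, choice)))
    return best


if __name__ == "__main__":
    # A relaxed SLA admits a cheaper mix of device tiers than a tight one.
    print(plan(STAGES, sla_ms=500))
    print(plan(STAGES, sla_ms=200))
```

With these made-up figures, the relaxed 500 ms SLA is met by mixing a CPU with mid-tier accelerators, while the tighter 200 ms SLA forces the high-end tier for the expensive stages. That is the kind of trade-off a system-level TCO optimization would weigh, although a real planner would also need to model memory, bandwidth, and data movement between devices.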
SPEAKER BIO
Tom St. John is a member of technical staff at Gimlet Labs, where he leads technical efforts related to heterogeneous disaggregation of AI workloads. Prior to his current role, he served as technical lead for MTIA training performance at Meta AI and led the distributed machine learning performance optimization efforts within Tesla Autopilot. His research primarily focuses on the intersection of parallel programming models and computer architecture design, and the impact that this has on large-scale machine learning. He completed his M.S. and Ph.D. at the University of Delaware and B.S. at Rutgers University.
SEMINAR RECORDING: