Aleksei Semiglazov, Cloudflare
Running AI inference on a geographically distributed network presents challenges that traditional cloud-centric SRE practices often fail to address. Unlike centralized data centers, edge deployments involve geographically dispersed processing units, vast model catalogs, and a complex interplay of resource constraints and network variability. A critical yet often elusive objective in this domain is high processing unit utilization, which directly affects both operational cost (CapEx efficiency) and service quality (latency trade-offs). Underutilized models are a significant financial burden, while over-utilization can degrade user experience.

Aleksei Semiglazov is a Senior Systems Engineer at Cloudflare. He has over 15 years of experience across different areas and currently drives large-scale deployments of real-time AI inference infrastructure across 300+ global edge locations. With deep expertise in edge orchestration, observability, and model lifecycle management, Aleksei bridges the gap between infrastructure resilience and AI performance, and brings practical insights from building and operating mission-critical edge systems that prioritize both latency and cost-efficiency.

author = {Aleksei Semiglazov},
title = {Utilization Is the Key to Efficiency: What It Takes to Run Inference on the Geographically Distributed Network},
year = {2025},
address = {Dublin},
publisher = {USENIX Association},
month = oct
}
