
In our next edition of Voices from the Lab, we feature Sarat Ahmad, Research Software Engineer in the School of Electronic and Electrical Engineering at the University of Leeds.
Sarat’s work sits at the cutting edge of generative AI, machine learning, robotics, and next-generation wireless networks, where intelligent systems meet real-world deployment. As part of the CHEDDAR Hub, Sarat focuses on bringing advanced AI models such as large language models (LLMs), vision-language models (VLMs), and retrieval-augmented generation (RAG) systems into practical environments like O-RAN and multi-access edge computing infrastructure.
From enabling robots to make real-time decisions at the network edge to exploring how AI can support self-optimising 6G systems, Sarat’s research bridges the gap between theory and application. Sarat’s work addresses one of the biggest challenges in modern communications: how to deliver intelligence where speed, privacy, and reliability matter most.
In this conversation, Sarat shares insights into deploying multimodal AI on real infrastructure, the future of autonomous networks, and why the boundary between research and engineering is where the most meaningful innovation happens.
Can you tell us about your role as a Research Software Engineer at the University of Leeds and the work you’re doing within the CHEDDAR Hub?
As a Research Software Engineer at Leeds, I bridge AI research and practical deployment for next-generation communications and robotics. Within the CHEDDAR Hub, my work focuses on enabling intelligent autonomous capabilities through edge computing and generative AI. My work so far has involved deploying LLMs, VLMs, and RAG/GraphRAG systems on real infrastructure like O-RAN and multi-access edge computing nodes. Essentially, I translate cutting-edge AI into systems that operate under real-world constraints, managing latency, resource limitations, and accuracy requirements. This means building scalable pipelines, rigorous evaluation frameworks, and demonstrating what’s practically achievable, not just theoretically possible. My role sits at the boundary between research and engineering, and I think that boundary is exactly where the most useful work happens.
Your work sits at the intersection of machine learning, generative AI, and cloud-native systems. How do these areas come together in your research?
These three areas are genuinely interdependent in the problems I work on. Machine learning provides the perceptual foundation. Generative AI extends that into systems that understand intent and reason across modalities in ways that were not possible even a few years ago. But without the infrastructure to run these models reliably under real-world constraints, they remain research artefacts. Cloud-native systems (edge deployment, containerisation, and distributed compute) are what close the gap between what a model can do in a controlled environment and what it can do on a physical robot or a live network node. My research sits at that junction deliberately, because treating these as separate disciplines produces systems that are technically impressive but practically limited.
You’re working on applications in robotics and next-generation wireless networks like O-RAN. What real-world problems is your work helping to solve?
The core problem is that intelligent systems generate data faster than centralised infrastructure can act on it. In manufacturing, healthcare, and disaster response, decisions need to happen in milliseconds. Cloud-based processing can introduce over 200 milliseconds of latency, whereas many real-world control loops demand responses closer to 10 milliseconds. That gap is the difference between a system that is operationally viable and one that is not.
My work addresses this directly. By deploying vision-language models on 5G and 6G edge infrastructure, I demonstrate that robots can perform real-time multimodal perception, understanding their environment through vision and speech, without offloading sensitive data to remote servers. This matters practically: a humanoid robot assisting in a clinical setting or navigating a disaster zone cannot depend on wide-area network connectivity, and it cannot be streaming raw camera feeds off-site. Edge intelligence solves both problems simultaneously.
You design scalable ML pipelines that integrate multimodal sensing and edge/cloud computing. What does this look like in practice?
In practice, this means engineering end-to-end systems where multimodal data from cameras, microphones, and depth sensors flows from the robot through 5G networks to edge compute nodes for real-time AI inference. For my VLM deployment on the Unitree G1, RGB video and speech are streamed via WebRTC to an NVIDIA L4 edge node running quantised LLaMA-3.2-11B under O-RAN/MEC infrastructure. An API inference server handles image preprocessing, vision encoding, multimodal fusion, and language generation, returning responses within sub-second thresholds. Applying 4-bit NF4 quantisation reduced the memory footprint while preserving near-cloud accuracy, and latency profiling revealed autoregressive decoding as the dominant bottleneck, contributing over 85% of end-to-end latency. The pipeline is instrumented throughout, capturing latency breakdowns and accuracy metrics across both standardised benchmarks and robot-collected datasets. The goal is a system that is reproducible, evaluable under realistic conditions, and scalable across different infrastructure configurations. A future direction of my work is implementing techniques such as adaptive split inference, visual-token pruning, and speculative decoding to deliver system-level speedups.
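The core idea behind that 4-bit quantisation step can be illustrated with a small, self-contained sketch. This is a generic block-wise absmax quantiser with 16 uniform signed levels, not the actual NF4 codebook used in production libraries such as bitsandbytes, but it shows the same trade: one low-precision code per weight plus one scale per block, in exchange for a bounded reconstruction error.

```python
import numpy as np

def quantize_4bit(weights, block_size=64):
    """Illustrative block-wise 4-bit absmax quantisation.
    (Real NF4 uses a non-uniform codebook fitted to a normal
    distribution; this uniform version only sketches the idea.)"""
    blocks = weights.reshape(-1, block_size).astype(np.float32)
    # One scale per block: the largest absolute value in that block.
    scales = np.abs(blocks).max(axis=1, keepdims=True) + 1e-12
    # Map each value onto 16 signed integer levels in [-7, 7].
    codes = np.clip(np.round(blocks / scales * 7), -7, 7).astype(np.int8)
    return codes, scales

def dequantize_4bit(codes, scales):
    return codes.astype(np.float32) / 7 * scales

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 64)).astype(np.float32)  # toy "weight matrix"
codes, scales = quantize_4bit(w)
w_hat = dequantize_4bit(codes, scales).reshape(w.shape)
err = np.abs(w - w_hat).max()
# Storage: 4 bits per weight plus one fp32 scale per 64-value block,
# roughly an 8x reduction versus fp32 weights.
```

The per-block scale is what keeps the error small: each block is quantised relative to its own dynamic range, so a few large outlier weights in one block do not destroy the precision of every other block.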
You’ve transitioned from industry roles into a research-focused position. How has your industry experience shaped your research approach?
My industry experience fundamentally shaped how I approach research quality. Working across software engineering taught me that real systems fail in ways simulations never reveal, and that reproducibility, monitoring, and rigorous evaluation are not optional extras. I carry that discipline directly into my research: I deploy on physical hardware, measure end-to-end performance under realistic conditions, and build pipelines that others could actually replicate and extend. The difference is that industry optimises for shipping, whereas research optimises for understanding. Having worked in both, I try to hold both standards simultaneously, which I think is what separates work that advances knowledge from work that also advances practice.
With your background in AWS and cloud-native architecture, how do you see cloud technologies shaping the future of communications?
Cloud-native technologies are enabling the disaggregation and programmability that define next-generation networks. O-RAN’s core promise of vendor interoperability and AI-driven optimisation only becomes operationally viable through containerised deployment models that allow network functions to run across centralised cloud, edge nodes, or on-device, dynamically allocated based on latency and resource requirements. The deeper transformation is treating network infrastructure as code: programmable, version-controlled, and continuously deployable, applying the same engineering discipline that modernised software development to telecommunications infrastructure itself. The future is genuinely hybrid, where cloud-native flexibility handles orchestration and management while edge intelligence meets latency-critical execution requirements. These are not separate concerns but complementary tiers of the same system, and understanding how to architect across both is where the real engineering challenge lies.
Generative AI is rapidly evolving. How do you see it transforming network optimisation and autonomous systems?
Generative AI is moving from a tool that answers questions to one that takes actions. In network optimisation, the near-term shift is from static configuration and reactive fault management toward intent-driven, self-healing networks where LLMs interpret operator goals, navigate complex specifications like O-RAN, and autonomously implement changes. RAG architectures are already making this tractable, and my own work, where I benchmarked Vector, Graph, and Hybrid RAG pipelines for O-RAN, showed that grounding models in structured knowledge meaningfully improves factual correctness, though significant challenges around reliability and reasoning depth remain. The more significant horizon now is agentic AI, where networks of specialised models collaborate autonomously, one agent diagnosing a fault, another reconfiguring resources, another verifying the outcome, without human intervention at each step. What makes this moment interesting is that the infrastructure to support it, low-latency edge compute, programmable O-RAN, is maturing in parallel with the models themselves.
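The retrieval step at the heart of the vector-RAG pipelines mentioned above can be sketched in a few lines. Everything here is illustrative: the corpus is a handful of hypothetical O-RAN knowledge snippets, and a toy bag-of-words embedding stands in for the learned sentence-embedding model a real pipeline would use.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'. A real RAG pipeline would use a
    learned sentence-embedding model instead."""
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical knowledge snippets standing in for an O-RAN corpus.
corpus = [
    "The Near-RT RIC hosts xApps that control the RAN on 10 ms to 1 s loops.",
    "The Non-RT RIC hosts rApps for policy and optimisation above 1 s.",
    "The E2 interface connects the Near-RT RIC to CU and DU nodes.",
]
index = [(doc, embed(doc)) for doc in corpus]

def retrieve(query, k=2):
    """Return the k corpus passages most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda d: cosine(q, d[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

context = retrieve("Which RIC hosts xApps?")
# The retrieved passages are prepended to the LLM prompt, grounding
# its answer in the specification text rather than parametric memory.
```

Graph and Hybrid variants replace or augment this similarity lookup with traversal over an entity-relation graph, which is what tends to help on questions requiring multi-hop reasoning across a specification.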
What are some of the biggest challenges you face in building intelligent, resource-efficient systems?
The fundamental tension is between capability and constraint. The most capable models are computationally expensive, and the environments where you most need intelligence are precisely where resources are most limited. In my own work, deploying an 11-billion-parameter vision-language model on edge hardware required aggressive quantisation just to fit within memory bounds, and even then, generation time dominated end-to-end latency.
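The memory arithmetic behind that constraint is easy to make concrete. The sketch below counts weight storage only, deliberately ignoring activations, the KV cache, and framework overhead, all of which add to the real footprint.

```python
def model_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate weight-only memory footprint in gigabytes.
    Ignores activations, KV cache, and runtime overhead."""
    return n_params * bits_per_param / 8 / 1e9

n = 11e9  # roughly 11 billion parameters

fp16_gb = model_memory_gb(n, 16)  # ~22 GB of weights alone, which
                                  # leaves almost no headroom on a
                                  # 24 GB-class GPU such as the L4
nf4_gb = model_memory_gb(n, 4)    # ~5.5 GB, leaving room for the
                                  # KV cache and vision encoder
```

This is why 4-bit quantisation is not an optimisation but a prerequisite for this class of deployment: without it the model does not run on the target hardware at all.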
Beyond raw compute, some challenges are harder to engineer around. Vision-language models were largely trained on clean, well-lit, internet-scale imagery, but real industrial and robotic environments are cluttered, dynamic, and visually ambiguous. Model performance degrades in ways that benchmarks do not predict. There is also the question of trust: edge-deployed models make decisions autonomously in safety-critical contexts, but they can hallucinate confidently. Knowing when a model is reliable and when it is not, and building systems that fail gracefully rather than catastrophically, remains one of the genuinely unsolved problems in this space.
CHEDDAR is highly collaborative across institutions. How does this collaboration influence your work?
CHEDDAR has genuinely expanded how I think about my work. Being part of the hub means you regularly encounter researchers tackling adjacent problems from completely different starting points, and those conversations at events and workshops often surface ideas you would not reach on your own. There is also real value in the industry partnerships: engaging directly with network operators and technology companies brings clarity about what matters in deployment that is hard to get any other way. More broadly, being embedded in a national research programme with shared momentum gives the work a sense of direction and purpose that I think makes a genuine difference to how you approach it day to day.
Looking ahead, what impact do you hope your research will have on future communication systems and society?
The work I am doing sits at the early stages of something that will matter significantly in the next decade. Autonomous systems that can perceive, reason, and act in real environments, supported by networks intelligent enough to manage themselves, represent a genuine shift in how infrastructure and robotics operate. What excites me is that the foundational problems, latency, privacy, reliability, and model efficiency, are being solved right now, and the solutions are finding their way into real deployments. I hope my research contributes to accelerating that transition, particularly in high-stakes domains like industrial automation and healthcare, where the gap between what is technically possible and what is deployed remains frustratingly wide.



