Latency
Latency is the delay between a user's request and the system's response. In AI systems, latency includes model processing time plus network and infrastructure overhead.
Even accurate AI can feel unusable if latency is high, so teams optimise prompts, models, and infrastructure to keep responses fast.
A common target for a voice assistant is under 300 ms of latency, so responses feel conversational and avoid awkward pauses.
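A minimal sketch of checking end-to-end latency against a budget. The `respond` function and the 50 ms delay are stand-ins for real model and infrastructure work; the 300 ms budget comes from the voice-assistant target above.

```python
import time

LATENCY_BUDGET_MS = 300  # voice-assistant target from the text


def respond(user_request: str) -> str:
    """Stand-in for model inference plus network/infrastructure overhead."""
    time.sleep(0.05)  # simulate ~50 ms of total processing
    return f"Echo: {user_request}"


start = time.perf_counter()
reply = respond("What's the weather?")
latency_ms = (time.perf_counter() - start) * 1000

print(f"latency: {latency_ms:.0f} ms, within budget: {latency_ms < LATENCY_BUDGET_MS}")
```

`time.perf_counter()` is used rather than `time.time()` because it is a monotonic, high-resolution clock suited to measuring short intervals.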
Inference Time
Inference time is the time an AI model takes to generate an output after receiving input. It is often measured in milliseconds for real-time systems or seconds for complex generative tasks.
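A sketch of measuring inference time in milliseconds. `model_infer` is a hypothetical stand-in for a real model's forward pass; averaging over several runs smooths out timer noise, which matters when individual calls take only milliseconds.

```python
import time


def model_infer(tokens: list[str]) -> str:
    """Hypothetical stand-in for a model's forward pass."""
    time.sleep(0.01)  # pretend each call takes ~10 ms
    return " ".join(tokens).upper()


runs = 5
start = time.perf_counter()
for _ in range(runs):
    model_infer(["hello", "world"])
mean_ms = (time.perf_counter() - start) * 1000 / runs

print(f"mean inference time: {mean_ms:.1f} ms")
```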
Inference
Inference is the process of using a trained AI model to make predictions or generate outputs on new, unseen data. While training is about learning patterns, inference is about applying what the model has learned to real-world inputs.
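The training/inference split can be sketched in a few lines. Here "training" fits a single least-squares parameter to toy data, and "inference" applies that learned parameter to an input the model has never seen; the data and model are illustrative, not from the original text.

```python
# Toy training data following the pattern y = 2x
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.0, 4.0, 6.0, 8.0]

# "Training": learn the slope w of y = w * x by least squares
w = sum(x * y for x, y in zip(train_x, train_y)) / sum(x * x for x in train_x)


def predict(x: float) -> float:
    """'Inference': apply the learned parameter to new, unseen input."""
    return w * x


print(predict(10.0))  # → 20.0
```

Training is the expensive, one-off step of estimating `w`; inference is the cheap, repeated step of calling `predict` on fresh inputs.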
API (Application Programming Interface)
An API is a set of rules and protocols that allows different software applications to communicate with each other. In AI, APIs let developers integrate AI capabilities — like text generation or image analysis — into their own applications without building models from scratch.
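A sketch of what calling an AI API over HTTP typically looks like. The endpoint, model name, and payload fields here are hypothetical placeholders, not a real service; most AI APIs follow this general shape of a JSON body sent via an authenticated POST request.

```python
import json
import urllib.request

API_URL = "https://api.example.com/v1/generate"  # placeholder, not a real endpoint

payload = {
    "model": "text-model-1",  # assumed model identifier
    "prompt": "Summarise this email in one sentence.",
    "max_tokens": 60,
}
body = json.dumps(payload)

request = urllib.request.Request(
    API_URL,
    data=body.encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",  # placeholder credential
    },
    method="POST",
)
# urllib.request.urlopen(request) would send it; omitted here since the
# endpoint is fictional.
print(request.get_method(), request.full_url)
```

The application never sees the model's weights or training code; it only exchanges structured requests and responses, which is what lets developers add AI features without building models from scratch.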