Latency
Latency is the delay between a user's request and the system's response. In AI systems, latency includes model processing time plus network and infrastructure overhead.
Even accurate AI can feel unusable if latency is high, so teams optimise prompts, models, and infrastructure to keep responses fast.
A common target for a voice assistant is under 300 ms of latency, so responses feel conversational and avoid awkward pauses.
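A minimal sketch of checking end-to-end latency against a budget. The `respond` function and the 50 ms delay are stand-ins for real model and infrastructure work; the 300 ms budget comes from the voice-assistant target above.

```python
import time

LATENCY_BUDGET_MS = 300  # voice-assistant target from the text


def respond(user_request: str) -> str:
    """Stand-in for model inference plus network/infrastructure overhead."""
    time.sleep(0.05)  # simulate ~50 ms of total processing
    return f"Echo: {user_request}"


start = time.perf_counter()
reply = respond("What's the weather?")
latency_ms = (time.perf_counter() - start) * 1000

print(f"latency: {latency_ms:.0f} ms, within budget: {latency_ms < LATENCY_BUDGET_MS}")
```

`time.perf_counter()` is used rather than `time.time()` because it is a monotonic, high-resolution clock suited to measuring short intervals.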
Inference Time
Inference time is the time an AI model takes to generate an output after receiving input. It is often measured in milliseconds for real-time systems or seconds for complex generative tasks.
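A sketch of measuring inference time in milliseconds. `model_infer` is a hypothetical stand-in for a real model's forward pass; averaging over several runs smooths out timer noise, which matters when individual calls take only milliseconds.

```python
import time


def model_infer(tokens: list[str]) -> str:
    """Hypothetical stand-in for a model's forward pass."""
    time.sleep(0.01)  # pretend each call takes ~10 ms
    return " ".join(tokens).upper()


runs = 5
start = time.perf_counter()
for _ in range(runs):
    model_infer(["hello", "world"])
mean_ms = (time.perf_counter() - start) * 1000 / runs

print(f"mean inference time: {mean_ms:.1f} ms")
```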
Inference
Inference is the process of using a trained AI model to make predictions or generate outputs on new, unseen data. While training is about learning patterns, inference is about applying what the model has learned to real-world inputs.
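The training/inference split can be sketched in a few lines. Here "training" fits a single least-squares parameter to toy data, and "inference" applies that learned parameter to an input the model has never seen; the data and model are illustrative, not from the original text.

```python
# Toy training data following the pattern y = 2x
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.0, 4.0, 6.0, 8.0]

# "Training": learn the slope w of y = w * x by least squares
w = sum(x * y for x, y in zip(train_x, train_y)) / sum(x * x for x in train_x)


def predict(x: float) -> float:
    """'Inference': apply the learned parameter to new, unseen input."""
    return w * x


print(predict(10.0))  # → 20.0
```

Training is the expensive, one-off step of estimating `w`; inference is the cheap, repeated step of calling `predict` on fresh inputs.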
API (Application Programming Interface)
An API is a set of rules and protocols that allows different software applications to communicate with each other. In AI, APIs let developers integrate AI capabilities — like text generation or image analysis — into their own applications without building models from scratch.
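A sketch of what calling an AI API over HTTP typically looks like. The endpoint, model name, and payload fields here are hypothetical placeholders, not a real service; most AI APIs follow this general shape of a JSON body sent via an authenticated POST request.

```python
import json
import urllib.request

API_URL = "https://api.example.com/v1/generate"  # placeholder, not a real endpoint

payload = {
    "model": "text-model-1",  # assumed model identifier
    "prompt": "Summarise this email in one sentence.",
    "max_tokens": 60,
}
body = json.dumps(payload)

request = urllib.request.Request(
    API_URL,
    data=body.encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer YOUR_API_KEY",  # placeholder credential
    },
    method="POST",
)
# urllib.request.urlopen(request) would send it; omitted here since the
# endpoint is fictional.
print(request.get_method(), request.full_url)
```

The application never sees the model's weights or training code; it only exchanges structured requests and responses, which is what lets developers add AI features without building models from scratch.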