Quantization
Quantization is a model optimization technique that reduces the numerical precision of a model's weights and activations, for example from 16-bit floating point to 8-bit integers, to lower memory and compute requirements.
It enables faster and cheaper inference, especially on edge devices or cost-sensitive production systems.
A team quantizes a chatbot model to run on a mobile device without cloud inference.
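The core idea can be sketched in a few lines. Below is a minimal, illustrative example of symmetric post-training quantization to int8 using NumPy; the function names are hypothetical and real frameworks (PyTorch, TensorFlow Lite, ONNX Runtime) handle many more details, such as per-channel scales and activation calibration.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to int8 via one scale."""
    scale = np.max(np.abs(weights)) / 127.0  # largest magnitude maps to 127
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return q.astype(np.float32) * scale

w = np.array([0.42, -1.30, 0.07, 0.99], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# int8 storage needs 1 byte per weight instead of 4 for float32,
# at the cost of a small rounding error bounded by about scale / 2
```

The memory saving here is 4x; the round-trip error per weight is at most half a quantization step, which is why well-quantized models usually lose little accuracy.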
Inference Time
Inference time is the time an AI model takes to generate an output after receiving input. It is often measured in milliseconds for real-time systems or seconds for complex generative tasks.
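Measuring inference time is straightforward: timestamp before and after the model call. A minimal sketch, where `model_predict` is a stand-in for a real model invocation:

```python
import time

def model_predict(prompt):
    """Stand-in for a real model call; sleeps to simulate compute."""
    time.sleep(0.05)
    return "response"

start = time.perf_counter()
output = model_predict("Hello")
elapsed_ms = (time.perf_counter() - start) * 1000  # inference time in ms
```

`time.perf_counter()` is preferred over `time.time()` for interval measurement because it is monotonic and high-resolution.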
Latency
Latency is the delay between a user's request and the system's response. In AI systems, latency includes model processing time plus network and infrastructure overhead.
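The distinction matters when debugging slow systems: a fast model can still feel slow to users. A sketch of the decomposition, with purely illustrative numbers:

```python
model_time_ms = 120.0  # inference time: work done by the model itself
network_ms = 35.0      # request and response transit over the network
queueing_ms = 10.0     # waiting in the serving infrastructure

# End-to-end latency is what the user actually experiences
latency_ms = model_time_ms + network_ms + queueing_ms
```

Optimizing the model only reduces the first term; the others require infrastructure changes such as regional deployment or better request batching.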
Model
In AI, a model is the mathematical representation that a machine learning system builds from training data. It captures the patterns, relationships, and rules discovered during training and uses them to make predictions or generate outputs on new data.
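In the simplest case, the "mathematical representation" is just a set of fitted parameters. A minimal sketch using a linear fit, where the learned slope and intercept are the model:

```python
import numpy as np

# Training data: inputs and targets with a roughly linear relationship
X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# "Training" fits the model's parameters to the data
slope, intercept = np.polyfit(X, y, deg=1)

# The fitted parameters ARE the model; prediction applies them to new input
prediction = slope * 5.0 + intercept
```

Large neural networks follow the same pattern at vastly greater scale: billions of parameters instead of two, but still values learned from data and applied to new inputs.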