Attention Mechanism
An attention mechanism allows a model to focus on the most relevant parts of its input when producing an output, rather than treating all parts equally. It assigns different weights to different input elements, enabling the model to capture long-range dependencies effectively.
Attention mechanisms are the key innovation behind modern language models and have dramatically improved performance in translation, summarisation, and text generation.
When translating 'The cat sat on the mat' into French, the attention mechanism helps the model focus on 'cat' when generating the French word 'chat'.
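As a concrete illustration, here is a minimal NumPy sketch of scaled dot-product attention, the variant used inside transformers. The Q, K, and V matrices and their sizes are made-up toy values, not parameters from any real model.

```python
# A minimal sketch of scaled dot-product attention.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (seq_len, d) arrays. Returns a weighted combination of V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # similarity of each query to each key
    weights = softmax(scores, axis=-1)   # attention weights: each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))  # toy 4-token sequence
out, w = scaled_dot_product_attention(Q, K, V)
print(w.round(2))  # each row shows how strongly one position attends to the others
```

In a translation model, the queries would come from the output being generated and the keys and values from the source sentence, which is how the model 'looks back' at 'cat' when producing 'chat'.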
Transformer
A transformer is a neural network architecture that processes input data in parallel using attention mechanisms, rather than sequentially like older models. Introduced in 2017, it has become the dominant architecture for language, vision, and multimodal AI systems.
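The sketch below shows, under simplifying assumptions, what a single transformer encoder block computes: self-attention over the whole sequence at once, then a position-wise feed-forward network, each wrapped in a residual connection and layer normalisation. It uses one attention head and randomly initialised weights purely for illustration; real transformers stack many such blocks with several heads each.

```python
# A minimal sketch of one transformer encoder block (single attention head).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def transformer_block(x, Wq, Wk, Wv, W1, W2):
    # Self-attention: every position attends to every other position at once,
    # which is what lets the model process the sequence in parallel.
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    attn = softmax(scores) @ V
    x = layer_norm(x + attn)            # residual connection + normalisation
    # Position-wise feed-forward network, applied to each position independently.
    ff = np.maximum(0, x @ W1) @ W2     # two-layer ReLU MLP
    return layer_norm(x + ff)

rng = np.random.default_rng(0)
d, seq = 16, 5
x = rng.normal(size=(seq, d))
params = [rng.normal(scale=0.1, size=s)
          for s in [(d, d)] * 3 + [(d, 4 * d), (4 * d, d)]]
y = transformer_block(x, *params)
print(y.shape)  # (5, 16): same shape in and out, so blocks can be stacked
```

Because the attention step is a handful of matrix multiplications over the whole sequence, every position is processed at once rather than one token at a time, which is the parallelism the definition refers to.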
Large Language Model (LLM)
A large language model is an AI system trained on vast quantities of text data that can understand, generate, and reason about human language. Modern LLMs are typically built on the transformer architecture and contain billions of parameters, enabling them to perform a wide range of language tasks.
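To make the 'generate' part concrete, here is a minimal sketch of the autoregressive loop an LLM runs at inference time: predict a probability distribution over the next token, pick one, append it, and repeat. The toy_model function is a hypothetical stand-in for a trained transformer, and the six-word vocabulary is invented for the example.

```python
# A minimal sketch of autoregressive text generation with a toy model.
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat", "."]

def toy_model(tokens):
    """Hypothetical stand-in for a trained LLM: returns next-token probabilities."""
    rng = np.random.default_rng(len(tokens))  # deterministic toy logits
    logits = rng.normal(size=len(vocab))
    e = np.exp(logits - logits.max())
    return e / e.sum()

def generate(prompt, n_tokens=4):
    tokens = prompt.split()
    for _ in range(n_tokens):
        probs = toy_model(tokens)                    # distribution over the vocabulary
        tokens.append(vocab[int(np.argmax(probs))])  # greedy decoding: take the most likely token
    return " ".join(tokens)

print(generate("the cat"))
```

Real systems replace toy_model with a transformer containing billions of parameters and usually sample from the distribution rather than always taking the most likely token.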
Embedding
An embedding is a way of representing data — such as words, sentences, or images — as a list of numbers (a vector) in a continuous space. Items that are semantically similar end up close together in this space, allowing machines to understand relationships between concepts.
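A minimal sketch of the idea follows, assuming made-up three-dimensional vectors; real embeddings typically have hundreds or thousands of dimensions. Cosine similarity is one common way to measure closeness in the embedding space.

```python
# A minimal sketch of embeddings as vectors, compared with cosine similarity.
import numpy as np

embeddings = {
    "cat": np.array([0.90, 0.80, 0.10]),  # toy vectors, invented for illustration
    "dog": np.array([0.85, 0.75, 0.20]),
    "car": np.array([0.10, 0.20, 0.95]),
}

def cosine_similarity(a, b):
    """1.0 means same direction (very similar); near 0 means unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))  # high: related concepts
print(cosine_similarity(embeddings["cat"], embeddings["car"]))  # low: unrelated concepts
```

The 'cat'/'dog' pair scores higher than 'cat'/'car' because their vectors point in similar directions, which is exactly what 'close together in this space' means.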