A transformer is a neural network architecture that processes input data in parallel using attention mechanisms, rather than sequentially like older models. Introduced in 2017, it has become the dominant architecture for language, vision, and multimodal AI systems.
Transformers are the architecture behind virtually every major language model today — GPT, Claude, Gemini — and understanding them is key to understanding modern AI.
The transformer was introduced by Google researchers in the 2017 paper 'Attention Is All You Need', and the architecture now powers products such as Google Search, Gmail's Smart Compose, and Google Translate.
Attention Mechanism
An attention mechanism allows a model to focus on the most relevant parts of its input when producing an output, rather than treating all parts equally. It assigns different weights to different input elements, enabling the model to capture long-range dependencies efficiently.
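The weighting described above can be sketched concretely. The following is a minimal NumPy illustration of scaled dot-product attention, the core computation used by transformer attention layers; the function name, toy shapes, and random inputs are illustrative, not taken from any particular library.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Illustrative sketch: each query attends to every key."""
    d_k = Q.shape[-1]
    # Similarity of each query to each key, scaled by sqrt(d_k)
    # to keep the dot products in a stable numeric range.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax converts scores into weights that sum to 1 per query,
    # so the output is a weighted average of the value vectors.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy example: 3 input positions, 4-dimensional embeddings.
x = np.random.default_rng(0).normal(size=(3, 4))
out, w = scaled_dot_product_attention(x, x, x)  # self-attention
```

Each row of `w` shows how much one position attends to every other position; elements that are more relevant to the current query receive larger weights, which is how the model captures long-range dependencies.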
Large Language Model (LLM)
A large language model is an AI system trained on vast quantities of text data that can understand, generate, and reason about human language. LLMs use the transformer architecture and contain billions of parameters, enabling them to perform a wide range of language tasks.
GPT (Generative Pre-trained Transformer)
GPT is a family of large language models developed by OpenAI that use the transformer architecture to generate human-like text. The models are pre-trained on vast amounts of internet text and can then be adapted for tasks such as writing, coding, and analysis.