Attention is all you need (Transformer) - Model explanation (including math), Inference and Training
Umar Jamil
