Attention is all you need (Transformer) - Model explanation (including math), Inference and Training
Umar Jamil
