This isn’t just a course, it’s a complete curriculum: Stanford University’s CME295, Transformers and Large Language Models, from Fall 2025.
It’s an open course with no barriers to access: the videos are actual recordings of the on-campus lectures, and the lecture slides are also available. There is no homework, but there are two exams, a midterm and a final, with both questions and answers provided.
Content-wise, this course teaches the AI fundamentals you need for a solid background, so that you can confidently navigate the AI landscape, understand the terminology, and apply that knowledge to building AI-powered products.
It consists of 9 lectures covering everything you need to know:
- Transformers
- Tokenization, attention, and positional embeddings
- Decoding, MoE, and scaling laws
- LoRA, RLHF, and fine-tuning
- RAG, tool calling, and evaluation
- RoPE, quantization, and optimization tricks
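At the heart of the first item on that list is scaled dot-product attention, the operation the whole Transformer architecture is built around. As a taste of what the early lectures cover, here is a minimal NumPy sketch (my own illustration, not code from the course):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                    # query-key similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted mixture of values

# Toy example: 3 query tokens attending over 4 key/value tokens, dimension 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 8): one output vector per query token
```

Each output row is a convex combination of the value vectors, weighted by how similar the corresponding query is to each key.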
More specifically, the first three lectures cover topics ranging from tokenization and embeddings, Word2vec, and BERT and its variants, to prompting, in-context learning, and chain-of-thought.

Lectures 4 and 5 cover LLM training and fine-tuning: pre-training, quantization, RLHF, and DPO.
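One fine-tuning technique from the topic list, LoRA, comes down to a small amount of linear algebra: freeze the pretrained weight matrix and learn only a low-rank correction. A minimal NumPy sketch of the idea (my own illustration, not course code):

```python
import numpy as np

# LoRA (sketch): instead of updating a frozen weight W, train a low-rank
# correction B @ A, so the effective weight becomes W + B @ A.
d_out, d_in, r = 64, 64, 4               # rank r << d keeps trainable params small

rng = np.random.default_rng(1)
W = rng.normal(size=(d_out, d_in))       # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable, small random init
B = np.zeros((d_out, r))                 # trainable, zero init: adapter starts as a no-op

x = rng.normal(size=(d_in,))
y = W @ x + B @ (A @ x)                  # forward pass with the LoRA adapter

full_params = W.size                     # weights a full fine-tune would update
lora_params = A.size + B.size            # weights LoRA actually trains
print(full_params, lora_params)          # 4096 vs 512 in this toy setting
```

Because B starts at zero, the adapted model initially behaves exactly like the pretrained one, and training only touches the (much smaller) A and B matrices.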
The next three lectures cover LLM inference, LLM agents, and LLM evaluation, with topics including reasoning models, retrieval-augmented generation, function calling, and LLM-as-a-judge.
The course concludes with a look at current trends and future prospects.
So when should you take this course? If you already have a general understanding of linear algebra, basic machine learning, and Python, and you want to understand how the Transformer architecture works and keep up with ongoing LLM trends, this course is for you.
That said, even if you are not interested in how LLMs work internally and are instead a practitioner, lectures 7 (LLM Agents) and 8 (LLM Evaluation), which focus on these topical subjects, can be watched separately from the rest of the material and will answer many questions, such as:
- Retrieval-Augmented Generation (RAG) primarily solves which limitation of frozen LLMs?
- What does the Model Context Protocol (MCP) aim to standardize?
- What are the main differences between a standard chatbot and an “agent”?
- What is the A2A (Agent2Agent) protocol designed to facilitate?
- RAG and long context: why use RAG even if your model has a 1M-token context window?
- What is a tool invocation workflow?
- What does an “LLM as a Judge” typically involve?
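To make the tool-invocation question concrete, the workflow is a loop: the model either answers directly or emits a tool call, the host executes the tool, and the result is fed back for the next turn. A minimal sketch (the tool and the stand-in model here are invented for illustration, not taken from the course):

```python
# Hypothetical tool registry; real agent frameworks generalise this idea.
TOOLS = {"get_weather": lambda city: f"Sunny in {city}"}

def fake_llm(messages):
    """Stand-in for a model: requests a tool once, then gives a final answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_weather", "arguments": {"city": "Paris"}}}
    return {"content": "It is sunny in Paris."}

def run_agent(user_prompt):
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        reply = fake_llm(messages)
        call = reply.get("tool_call")
        if call is None:                    # no tool requested: final answer
            return reply["content"]
        result = TOOLS[call["name"]](**call["arguments"])  # host executes the tool
        messages.append({"role": "tool", "content": result})

print(run_agent("What's the weather in Paris?"))  # It is sunny in Paris.
```

Lecture 7 covers how real systems structure this loop, including standards such as MCP for describing the available tools to the model.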
Considering that the inner workings of LLMs can seem an intimidating topic, in retrospect I found this course very easy to follow, as the concepts and terminology are explained very well, even for someone with little prior experience of the subject.

More Information
Syllabus
YouTube Playlist

Related Articles
Triple Treat Machine Learning