New course: Transformers in Practice. You'll get a practical view of how transformer-based LLMs work, so you can reason about their behavior, diagnose problems like slow inference, and make smarter decisions about deployment. This course is built in partnership with @AMD and taught by @realSharonZhou.
You'll see how transformers generate text one token at a time, how the model decides which earlier words matter most when predicting the next one, and how techniques like quantization speed up inference on GPUs. This is not a video-only course; interactive visualizations throughout let you play with these concepts and build intuition that sticks.
Skills you'll gain:
- Understand why LLMs hallucinate and how RAG and chain-of-thought shape what they generate
- Look inside the model to see how attention and layers combine to predict the next token
- Diagnose inference bottlenecks and learn the techniques that speed up transformers on GPUs
Join and understand what's really happening inside your LLMs: