Train and deploy a simple LLM, then test it to find its weaknesses
Learning Path
This learning path is from Sebastian Raschka. I’ve been working through his book Build a Large Language Model (From Scratch). It’s excellent, though fairly advanced if you’re new to these ideas. When he refers to chapters, he means the chapters in this book.
A suggestion for an effective 11-step LLM summer study plan:
- Read* Chapters 1 and 2 on implementing the data loading pipeline (https://lnkd.in/gfruEiwm & https://lnkd.in/gyDm4h3y); a small data-loading sketch appears after this list.
- Watch Karpathy’s video on training a BPE tokenizer from scratch (https://lnkd.in/gZEsdpYc); a bare-bones BPE training sketch follows the list.
- Read Chapters 3 and 4 on implementing the model architecture; see the causal self-attention sketch below.
- Watch Karpathy’s video on pretraining the LLM.
- Read Chapter 5 on pretraining the LLM and then loading pretrained weights; a minimal pretraining-step sketch follows the list.
- Read Appendix D on adding additional bells and whistles to the training loop.
- Read Chapters 6 and 7 on finetuning the LLM.
- Read Appendix E on parameter-efficient finetuning with LoRA; see the LoRA sketch below.
- Check out Karpathy’s repo on coding the LLM in C code (https://lnkd.in/gHpt5uh9).
- Check out LitGPT to see how multi-GPU training is implemented and how different LLM architectures compare (https://lnkd.in/gzYc69c7).
- Build something cool and share it with the world.
(*“Read” = read, run the code, and attempt the exercises 😊)
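For the data loading step (Chapters 1 and 2), here is a rough sketch of the sliding-window idea: pair each fixed-length window of token IDs with the same window shifted one position to the right. The toy whitespace vocabulary and the `NextTokenDataset` name are mine for illustration; the book builds an equivalent dataset around the GPT-2 BPE tokenizer from tiktoken.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class NextTokenDataset(Dataset):
    def __init__(self, token_ids, context_length, stride):
        self.inputs, self.targets = [], []
        # Slide a fixed-size window over the token stream; the target window
        # is the input window shifted one position to the right.
        for i in range(0, len(token_ids) - context_length, stride):
            self.inputs.append(torch.tensor(token_ids[i:i + context_length]))
            self.targets.append(torch.tensor(token_ids[i + 1:i + 1 + context_length]))

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]

# Toy whitespace "tokenizer" purely to keep the example self-contained;
# the book uses the GPT-2 BPE tokenizer (tiktoken) instead.
text = "the quick brown fox jumps over the lazy dog " * 20
vocab = {tok: i for i, tok in enumerate(sorted(set(text.split())))}
token_ids = [vocab[tok] for tok in text.split()]

loader = DataLoader(NextTokenDataset(token_ids, context_length=8, stride=4),
                    batch_size=2, shuffle=True)
x, y = next(iter(loader))
print(x.shape, y.shape)  # torch.Size([2, 8]) torch.Size([2, 8])
```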
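For the BPE step, a bare-bones version of the training loop Karpathy walks through: repeatedly find the most frequent adjacent pair of token IDs and merge it into a new ID. The function names and toy input string are illustrative; real tokenizers (tiktoken, minbpe) add regex pre-splitting, special tokens, and a decoder on top of this core loop.

```python
from collections import Counter

def get_pair_counts(ids):
    # Count how often each adjacent pair of IDs occurs in the sequence.
    return Counter(zip(ids, ids[1:]))

def merge(ids, pair, new_id):
    # Replace every occurrence of `pair` with the single token `new_id`.
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    ids = list(text.encode("utf-8"))   # start from raw bytes (IDs 0-255)
    merges = {}
    for step in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break
        pair = counts.most_common(1)[0][0]
        new_id = 256 + step            # new token IDs start after the byte range
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges

print(train_bpe("low lower lowest low low", num_merges=5))
```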
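For the architecture chapters, a minimal sketch of the causal (masked) self-attention at the heart of the model, assuming a single head and omitting dropout and the output projection; the multi-head version in the book wraps this same computation.

```python
import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        self.W_q = nn.Linear(d_model, d_model, bias=False)
        self.W_k = nn.Linear(d_model, d_model, bias=False)
        self.W_v = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        q, k, v = self.W_q(x), self.W_k(x), self.W_v(x)
        scores = q @ k.transpose(-2, -1) / (x.shape[-1] ** 0.5)
        # Mask out future positions so each token attends only to itself
        # and to earlier tokens.
        seq_len = x.shape[1]
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
        return torch.softmax(scores, dim=-1) @ v

attn = CausalSelfAttention(d_model=16)
print(attn(torch.randn(2, 5, 16)).shape)  # torch.Size([2, 5, 16])
```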
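For the pretraining step, the objective boils down to cross-entropy between the model’s next-token logits and the inputs shifted by one token. The embedding-plus-linear “model” and random targets below are stand-ins so the snippet runs on its own; in the book, the full GPT model and real text batches take their place.

```python
import torch
import torch.nn as nn

vocab_size, d_model = 100, 32
# Stand-in model: embedding followed by a linear head over the vocabulary.
model = nn.Sequential(nn.Embedding(vocab_size, d_model), nn.Linear(d_model, vocab_size))
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

inputs = torch.randint(0, vocab_size, (4, 16))   # (batch, seq_len) token IDs
targets = torch.randint(0, vocab_size, (4, 16))  # in practice: inputs shifted by one token

logits = model(inputs)                           # (batch, seq_len, vocab_size)
loss = nn.functional.cross_entropy(logits.flatten(0, 1), targets.flatten())
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(loss.item())
```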
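For the LoRA step, the core idea is to freeze the pretrained linear layer and learn a low-rank additive update. The rank, alpha, and initialization below are illustrative defaults, not the book’s exact settings.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, linear, rank=8, alpha=16):
        super().__init__()
        self.linear = linear
        for p in self.linear.parameters():   # freeze the pretrained layer
            p.requires_grad_(False)
        # Low-rank factors: A projects down to `rank`, B projects back up.
        self.A = nn.Parameter(torch.randn(linear.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, linear.out_features))
        self.scale = alpha / rank

    def forward(self, x):
        return self.linear(x) + (x @ self.A @ self.B) * self.scale

layer = LoRALinear(nn.Linear(64, 64))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 1024: only the low-rank A and B matrices are trainable
```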
Objectives
Concepts