Course Description
This advanced course provides a comprehensive treatment of Large Language Models (LLMs), focusing on their architectures, application paradigms, and ethical implications. Structured over 15 weeks, it is tailored to students with a background in machine learning and natural language processing. The course features hands-on training, in-depth analysis of scholarly papers, a midterm examination, and a final project centered on the development of a practical LLM application.
Learning Goals
Upon completing this course, students will be able to:
- Explain the architectures of leading LLMs such as BERT, T5, and GPT-3.
- Apply techniques such as few-shot learning, prompt engineering, and in-context learning to LLMs.
- Investigate and address ethical concerns including bias and data privacy.
- Implement a real-world LLM application as part of the final project.
- Critically evaluate peer projects through a formal review process.
Grading
- Participation: 10%
- Midterm Exam: 25%
- Peer Reviews of Final Project: 5%
- Final Project: 60%
Final Project
The final project requires students to build a real-world application on top of a large language model. It involves data pre-processing, model training or fine-tuning, evaluation, and documentation. Students also take part in a formal peer-review process, critically evaluating their classmates' projects. The deliverables are a functional LLM application and an accompanying research paper.
Course Outline
Week 1: Introduction to Large Language Models
- Architectures: BERT, T5, GPT-3
- Readings: BERT paper, T5 paper, GPT-3 paper
Week 2: Prompting Techniques
- Few-Shot Learning, In-Context Learning (see the sketch below)
- Readings: Making Pre-trained Language Models Better Few-shot Learners, How Many Data Points is a Prompt Worth?
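To preview the week's central idea, here is a minimal sketch of in-context (few-shot) learning: labeled demonstrations are placed directly in the prompt and the model is never fine-tuned. The sentiment task and the `build_few_shot_prompt` helper are illustrative inventions, not from any assigned reading.

```python
# In-context (few-shot) learning: labeled demonstrations are concatenated
# ahead of the unlabeled query; the model's continuation is the prediction.
def build_few_shot_prompt(demonstrations, query):
    lines = [f"Review: {text}\nSentiment: {label}\n" for text, label in demonstrations]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

demos = [
    ("The plot was gripping from start to finish.", "positive"),
    ("Two hours of my life I will never get back.", "negative"),
]
print(build_few_shot_prompt(demos, "A quiet, beautifully acted film."))
# The resulting string goes to any text-completion endpoint; the first
# generated token ("positive" / "negative") is taken as the prediction.
```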
Week 3: Efficient Fine-Tuning
- Parameter-Efficient Techniques (see the sketch below)
- Readings: Prefix-Tuning, The Power of Scale for Parameter-Efficient Prompt Tuning
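As a preview of the parameter-efficient recipe from The Power of Scale for Parameter-Efficient Prompt Tuning, the sketch below freezes every pretrained weight and trains only a small matrix of soft-prompt embeddings. The tiny transformer layer stands in for a real pretrained LM; all shapes, hyperparameters, and the dummy targets are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

d_model, prompt_len, vocab, batch, seq = 64, 10, 1000, 8, 20

# Stand-ins for a pretrained LM; every one of their weights stays frozen.
embed = nn.Embedding(vocab, d_model)
body = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
head = nn.Linear(d_model, vocab)
for module in (embed, body, head):
    for p in module.parameters():
        p.requires_grad = False

# The only trainable parameters: prompt_len soft-prompt vectors.
soft_prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)
optimizer = torch.optim.Adam([soft_prompt], lr=1e-3)

tokens = torch.randint(0, vocab, (batch, seq))  # toy input batch
x = torch.cat([soft_prompt.expand(batch, -1, -1), embed(tokens)], dim=1)
logits = head(body(x))                          # (batch, prompt_len + seq, vocab)
targets = torch.randint(0, vocab, (batch, prompt_len + seq))  # dummy targets
loss = nn.functional.cross_entropy(logits.transpose(1, 2), targets)
loss.backward()
optimizer.step()  # gradient descent touches only the soft prompt
```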
Week 4: Calibration and Reasoning
- Calibration Methods, Eliciting Reasoning (chain-of-thought sketch below)
- Readings: Calibrate Before Use, Chain of Thought Prompting
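The chain-of-thought reading is easy to preview: when the in-context demonstration spells out intermediate reasoning steps, the model tends to produce its own reasoning before the final answer. The exemplar below is in the style popularized by that paper.

```python
# Chain-of-thought prompting: the demonstration includes the intermediate
# arithmetic, nudging the model to reason step by step on the new question.
cot_prompt = """\
Q: A cafeteria had 23 apples. They used 20 to make lunch and bought 6 more. How many apples do they have?
A: The cafeteria started with 23 apples. They used 20, leaving 23 - 20 = 3. They bought 6 more, so 3 + 6 = 9. The answer is 9.

Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A:"""
# A completion endpoint typically continues with step-by-step reasoning
# ending in "The answer is 11." (5 + 2 * 3 = 11).
```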
Week 5: Data in LLMs
- Data Quality and Documentation
- Readings: Documenting Large Webtext Corpora
Week 6: Industry Applications of LLMs
- Real-world Use-cases and Challenges
- Open Discussion and Guest Lecture
Week 7: Bias and Toxicity I
- Evaluation of Bias and Toxicity
- Readings: RealToxicityPrompts; OPT paper, Section 4
Week 8: Midterm Exam
Week 9: Bias and Toxicity II
- Mitigation Strategies
- Readings: Self-Diagnosis and Self-Debiasing
Week 10: Scaling LLMs
- Compute-Optimal Training (worked example below)
- Readings: Training Compute-Optimal Large Language Models
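A quick back-of-envelope check makes the week's headline result concrete. The Chinchilla paper's commonly cited approximations are training compute C ≈ 6ND FLOPs and a compute-optimal data budget of roughly 20 tokens per parameter; the numbers below are Chinchilla's own configuration.

```python
# Chinchilla rule of thumb (Hoffmann et al.): at a fixed compute budget,
# train on roughly 20 tokens per parameter; training FLOPs ~ 6 * N * D.
N = 70e9     # Chinchilla parameter count
D = 1.4e12   # Chinchilla training tokens
print(f"tokens per parameter: {D / N:.0f}")     # -> 20
print(f"training FLOPs:       {6 * N * D:.2e}") # -> 5.88e+23
```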
Week 11: Privacy Concerns
- Data Extraction Risks (perplexity-ranking sketch below)
- Readings: Extracting Training Data from Large Language Models
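One core signal from the extraction paper fits in a few lines: candidate generations are ranked by perplexity, and abnormally low perplexity (the model is unusually confident) flags likely memorized training data. The per-token log-probabilities below are made-up numbers for illustration; the full attack also compares against reference models and zlib compression.

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token log-probabilities (natural log)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical log-probs for two candidate generations:
memorized_like = [-0.10, -0.05, -0.20, -0.10]  # model is very confident
ordinary_text = [-2.3, -1.9, -2.8, -2.1]       # typical fluent text
print(perplexity(memorized_like))  # ~1.12 -> suspiciously low, flag it
print(perplexity(ordinary_text))   # ~9.7  -> unremarkable
```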
Week 12: Alternative Architectures
- Sparse Models, Retrieval-Based Models (toy retrieval sketch below)
- Readings: Switch Transformers, Improving Language Models by Retrieving from Trillions of Tokens
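The retrieval-based reading can be previewed with a toy version of the general idea: fetch nearest neighbors of the query from an external corpus and condition generation on them, rather than storing all knowledge in the weights. This sketch simplifies by concatenating neighbors into the prompt; RETRO itself retrieves with frozen BERT embeddings and attends to neighbors through dedicated cross-attention layers. The corpus, overlap score, and prompt format are all invented for illustration.

```python
# Toy nearest-neighbor retrieval: lexical (Jaccard) overlap stands in for
# the dense embedding similarity used by real retrieval-augmented models.
corpus = [
    "The Eiffel Tower is located in Paris, France.",
    "Water boils at 100 degrees Celsius at sea level.",
    "Switch Transformers route tokens to sparse expert layers.",
]

def overlap(a, b):
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def retrieve(query, k=1):
    return sorted(corpus, key=lambda doc: overlap(query, doc), reverse=True)[:k]

query = "Where is the Eiffel Tower located?"
prompt = "\n".join(retrieve(query)) + "\nQuestion: " + query + "\nAnswer:"
print(prompt)  # the retrieved sentence now grounds the model's answer
```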
Week 13: Human Feedback in LLMs
- Training with Human Feedback (reward-model loss sketch below)
- Readings: Training language models to follow instructions with human feedback
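One piece of the human-feedback pipeline is compact enough to preview: the pairwise loss used to train the reward model in the InstructGPT paper, which pushes the reward of the human-preferred response above the rejected one. The scalar rewards below are placeholders; in practice they come from a language model with a scalar output head.

```python
import torch
import torch.nn.functional as F

# Pairwise reward-model loss: -log sigmoid(r(x, y_chosen) - r(x, y_rejected)),
# averaged over a batch of human preference comparisons.
r_chosen = torch.tensor([1.2, 0.3, 2.0], requires_grad=True)    # preferred responses
r_rejected = torch.tensor([0.7, 0.9, 1.1], requires_grad=True)  # rejected responses

loss = -F.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
print(loss.item())  # shrinks as chosen responses out-score rejected ones
```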
Week 14: Code in LLMs
- Code-Based Language Models (pass@k example below)
- Readings: Evaluating Large Language Models Trained on Code
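The Codex paper's headline metric is worth previewing: pass@k, the probability that at least one of k sampled programs passes the unit tests, estimated without bias from n ≥ k samples per problem. The function below follows the numerically stable estimator given in that paper.

```python
import numpy as np

def pass_at_k(n, c, k):
    """Unbiased pass@k (Chen et al.): n samples drawn, c of them correct.
    Evaluates 1 - C(n-c, k) / C(n, k) without forming huge binomials."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

print(pass_at_k(n=200, c=10, k=1))   # 0.05 -> equals c/n when k = 1
print(pass_at_k(n=200, c=10, k=10))  # ~0.41 -> more attempts, more chances
```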
Week 15: Final Project Presentations and Recap
- Project Presentations and Peer Reviews
- Course Recap and Concluding Remarks