Course Description
This advanced course provides a comprehensive treatment of Large Language Models (LLMs), focusing on their architectures, application paradigms, and ethical implications. Structured over 15 weeks, it is tailored to students with a background in machine learning and natural language processing. The course features hands-on training, in-depth analysis of scholarly papers, a midterm examination, and a final project centered on the development of a practical LLM application.
Learning Goals
Upon completing this course, students will be able to:
- Explain the architectures of leading LLMs such as BERT, T5, and GPT-3.
- Apply techniques such as few-shot learning, prompt engineering, and in-context learning to LLMs.
- Investigate and address ethical concerns including bias and data privacy.
- Implement a real-world LLM application as part of the final project.
- Critically evaluate peer projects through a formal review process.
Grading
- Participation: 10%
- Midterm Exam: 25%
- Peer Reviews of Final Project: 5%
- Final Project: 60%
Final Project
The final project requires students to build a real-world application on top of a large language model. It involves data pre-processing, model training or fine-tuning, evaluation, and documentation. Students also take part in a formal peer-review process, critically evaluating their classmates' projects. The deliverables are a functional LLM application and an accompanying research paper.
Course Outline
Week 1: Introduction to Large Language Models
- Architectures: BERT, T5, GPT-3
- Readings: BERT paper, T5 paper, GPT-3 paper
Week 2: Prompting Techniques
- Few-Shot Learning, In-Context Learning (see the sketch below)
- Readings: Making Pre-trained Language Models Better Few-shot Learners, How Many Data Points is a Prompt Worth?
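To preview the week's central idea, here is a minimal sketch of in-context (few-shot) learning: labeled demonstrations are placed directly in the prompt and the model is never fine-tuned. The sentiment task and the `build_few_shot_prompt` helper are illustrative inventions, not from any assigned reading.

```python
# In-context (few-shot) learning: labeled demonstrations are concatenated
# ahead of the unlabeled query; the model's continuation is the prediction.
def build_few_shot_prompt(demonstrations, query):
    lines = [f"Review: {text}\nSentiment: {label}\n" for text, label in demonstrations]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

demos = [
    ("The plot was gripping from start to finish.", "positive"),
    ("Two hours of my life I will never get back.", "negative"),
]
print(build_few_shot_prompt(demos, "A quiet, beautifully acted film."))
# The resulting string goes to any text-completion endpoint; the first
# generated token ("positive" / "negative") is taken as the prediction.
```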
Week 3: Efficient Fine-Tuning
- Parameter-Efficient Techniques (see the sketch below)
- Readings: Prefix-Tuning, The Power of Scale for Parameter-Efficient Prompt Tuning
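As a preview of the parameter-efficient recipe from The Power of Scale for Parameter-Efficient Prompt Tuning, the sketch below freezes every pretrained weight and trains only a small matrix of soft-prompt embeddings. The tiny transformer layer stands in for a real pretrained LM; all shapes, hyperparameters, and the dummy targets are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

d_model, prompt_len, vocab, batch, seq = 64, 10, 1000, 8, 20

# Stand-ins for a pretrained LM; every one of their weights stays frozen.
embed = nn.Embedding(vocab, d_model)
body = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
head = nn.Linear(d_model, vocab)
for module in (embed, body, head):
    for p in module.parameters():
        p.requires_grad = False

# The only trainable parameters: prompt_len soft-prompt vectors.
soft_prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)
optimizer = torch.optim.Adam([soft_prompt], lr=1e-3)

tokens = torch.randint(0, vocab, (batch, seq))  # toy input batch
x = torch.cat([soft_prompt.expand(batch, -1, -1), embed(tokens)], dim=1)
logits = head(body(x))                          # (batch, prompt_len + seq, vocab)
targets = torch.randint(0, vocab, (batch, prompt_len + seq))  # dummy targets
loss = nn.functional.cross_entropy(logits.transpose(1, 2), targets)
loss.backward()
optimizer.step()  # gradient descent touches only the soft prompt
```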
Week 4: Calibration and Reasoning
- Calibration Methods, Eliciting Reasoning (chain-of-thought sketch below)
- Readings: Calibrate Before Use, Chain of Thought Prompting
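The chain-of-thought reading is easy to preview: when the in-context demonstration spells out intermediate reasoning steps, the model tends to produce its own reasoning before the final answer. The exemplar below is in the style popularized by that paper.

```python
# Chain-of-thought prompting: the demonstration includes the intermediate
# arithmetic, nudging the model to reason step by step on the new question.
cot_prompt = """\
Q: A cafeteria had 23 apples. They used 20 to make lunch and bought 6 more. How many apples do they have?
A: The cafeteria started with 23 apples. They used 20, leaving 23 - 20 = 3. They bought 6 more, so 3 + 6 = 9. The answer is 9.

Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A:"""
# A completion endpoint typically continues with step-by-step reasoning
# ending in "The answer is 11." (5 + 2 * 3 = 11).
```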
Week 5: Data in LLMs
- Data Quality and Documentation
- Readings: Documenting Large Webtext Corpora
Week 6: Industry Applications of LLMs
- Real-world Use-cases and Challenges
- Open Discussion and Guest Lecture
Week 7: Bias and Toxicity I
- Evaluation of Bias and Toxicity
- Readings: RealToxicityPrompts; OPT paper, Section 4
Week 8: Midterm Exam
Week 9: Bias and Toxicity II
- Mitigation Strategies
- Readings: Self-Diagnosis and Self-Debiasing
Week 10: Scaling LLMs
- Compute-Optimal Training (worked example below)
- Readings: Training Compute-Optimal Large Language Models
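A quick back-of-envelope check makes the week's headline result concrete. The Chinchilla paper's commonly cited approximations are training compute C ≈ 6ND FLOPs and a compute-optimal data budget of roughly 20 tokens per parameter; the numbers below are Chinchilla's own configuration.

```python
# Chinchilla rule of thumb (Hoffmann et al.): at a fixed compute budget,
# train on roughly 20 tokens per parameter; training FLOPs ~ 6 * N * D.
N = 70e9     # Chinchilla parameter count
D = 1.4e12   # Chinchilla training tokens
print(f"tokens per parameter: {D / N:.0f}")     # -> 20
print(f"training FLOPs:       {6 * N * D:.2e}") # -> 5.88e+23
```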
Week 11: Privacy Concerns
- Data Extraction Risks (perplexity-ranking sketch below)
- Readings: Extracting Training Data from Large Language Models
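One core signal from the extraction paper fits in a few lines: candidate generations are ranked by perplexity, and abnormally low perplexity (the model is unusually confident) flags likely memorized training data. The per-token log-probabilities below are made-up numbers for illustration; the full attack also compares against reference models and zlib compression.

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token log-probabilities (natural log)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical log-probs for two candidate generations:
memorized_like = [-0.10, -0.05, -0.20, -0.10]  # model is very confident
ordinary_text = [-2.3, -1.9, -2.8, -2.1]       # typical fluent text
print(perplexity(memorized_like))  # ~1.12 -> suspiciously low, flag it
print(perplexity(ordinary_text))   # ~9.7  -> unremarkable
```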
Week 12: Alternative Architectures
- Sparse Models, Retrieval-Based Models (toy retrieval sketch below)
- Readings: Switch Transformers, Improving Language Models by Retrieving from Trillions of Tokens
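The retrieval-based reading can be previewed with a toy version of the general idea: fetch nearest neighbors of the query from an external corpus and condition generation on them, rather than storing all knowledge in the weights. This sketch simplifies by concatenating neighbors into the prompt; RETRO itself retrieves with frozen BERT embeddings and attends to neighbors through dedicated cross-attention layers. The corpus, overlap score, and prompt format are all invented for illustration.

```python
# Toy nearest-neighbor retrieval: lexical (Jaccard) overlap stands in for
# the dense embedding similarity used by real retrieval-augmented models.
corpus = [
    "The Eiffel Tower is located in Paris, France.",
    "Water boils at 100 degrees Celsius at sea level.",
    "Switch Transformers route tokens to sparse expert layers.",
]

def overlap(a, b):
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def retrieve(query, k=1):
    return sorted(corpus, key=lambda doc: overlap(query, doc), reverse=True)[:k]

query = "Where is the Eiffel Tower located?"
prompt = "\n".join(retrieve(query)) + "\nQuestion: " + query + "\nAnswer:"
print(prompt)  # the retrieved sentence now grounds the model's answer
```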
Week 13: Human Feedback in LLMs
- Training with Human Feedback (reward-model loss sketch below)
- Readings: Training language models to follow instructions with human feedback
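One piece of the human-feedback pipeline is compact enough to preview: the pairwise loss used to train the reward model in the InstructGPT paper, which pushes the reward of the human-preferred response above the rejected one. The scalar rewards below are placeholders; in practice they come from a language model with a scalar output head.

```python
import torch
import torch.nn.functional as F

# Pairwise reward-model loss: -log sigmoid(r(x, y_chosen) - r(x, y_rejected)),
# averaged over a batch of human preference comparisons.
r_chosen = torch.tensor([1.2, 0.3, 2.0], requires_grad=True)    # preferred responses
r_rejected = torch.tensor([0.7, 0.9, 1.1], requires_grad=True)  # rejected responses

loss = -F.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
print(loss.item())  # shrinks as chosen responses out-score rejected ones
```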
Week 14: Code in LLMs
- Code-Based Language Models (pass@k example below)
- Readings: Evaluating Large Language Models Trained on Code
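The Codex paper's headline metric is worth previewing: pass@k, the probability that at least one of k sampled programs passes the unit tests, estimated without bias from n ≥ k samples per problem. The function below follows the numerically stable estimator given in that paper.

```python
import numpy as np

def pass_at_k(n, c, k):
    """Unbiased pass@k (Chen et al.): n samples drawn, c of them correct.
    Evaluates 1 - C(n-c, k) / C(n, k) without forming huge binomials."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

print(pass_at_k(n=200, c=10, k=1))   # 0.05 -> equals c/n when k = 1
print(pass_at_k(n=200, c=10, k=10))  # ~0.41 -> more attempts, more chances
```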
Week 15: Final Project Presentations and Recap
- Project Presentations and Peer Reviews
- Course Recap and Concluding Remarks