LLMs | Scaling Laws | Lec 11
LCS2

Premiered Sep 4, 2024

tl;dr: This lecture gives an in-depth discussion of the scaling laws that govern the performance and emergent abilities of LLMs, covering both the empirical evidence and the ongoing debate over the nature of these abilities.

🎓 Lecturer: Sourish Dasgupta [https://daiict.ac.in/faculty/sourish-...]
🔗 Get the Slides Here: http://lcs2.in/llm2401
📚 Suggested Readings:
Scaling Laws for Neural Language Models [https://arxiv.org/pdf/2001.08361]
Emergent Abilities of Large Language Models [https://arxiv.org/pdf/2206.07682]
Training Compute-Optimal Large Language Models [https://arxiv.org/pdf/2203.15556]
Are Emergent Abilities of Large Language Models a Mirage? [https://arxiv.org/pdf/2304.15004]

This lecture explores scaling laws in the context of Large Language Models (LLMs), focusing on how increasing model size correlates with performance improvements and the emergence of new capabilities. We'll delve into the well-known Kaplan scaling laws, which predict loss as a power-law function of model size and dataset size; the more recent Chinchilla scaling laws, which propose a more compute-efficient balance between parameters and training tokens; and a critical discussion of whether the emergent abilities of LLMs are intrinsic to scale or artefacts of the metrics used to measure them.
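If you want to play with the numbers before watching, here is a minimal sketch of the Kaplan-style power-law fits in Python. The constants are the approximate values reported in Kaplan et al. (2020); the function names and the example sizes are illustrative only and are not part of the lecture materials.

# Minimal sketch of Kaplan-style scaling-law predictions.
# Constants are approximate values from Kaplan et al. (2020); treat as illustrative.

def loss_from_params(n_params: float, n_c: float = 8.8e13, alpha_n: float = 0.076) -> float:
    """Predicted test loss as a power law in non-embedding parameter count N."""
    return (n_c / n_params) ** alpha_n

def loss_from_data(n_tokens: float, d_c: float = 5.4e13, alpha_d: float = 0.095) -> float:
    """Predicted test loss as a power law in dataset size D (tokens)."""
    return (d_c / n_tokens) ** alpha_d

# Example: compare a 1B-parameter and a 10B-parameter model under the N-only law.
for n in (1e9, 1e10):
    print(f"N = {n:.0e} params -> predicted loss ~ {loss_from_params(n):.2f}")

The Chinchilla result discussed in the lecture refines this picture by asking how to split a fixed compute budget between N and D; its headline recommendation is to scale parameters and training tokens roughly in proportion (on the order of 20 tokens per parameter), rather than growing the model much faster than the data.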
