Deep dive: model merging (part 1)
Julien Simon

Published on Mar 18, 2024

*** Part 2 is now available: "Deep dive: model merging, part 2", covering Model Breadcrumbs, Model Stock, and DELLA.

Model merging is an increasingly popular technique for adding capabilities to transformer models, or removing them, without any additional training.
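As a rough illustration of how merging can add or remove a capability, here is a minimal sketch assuming the Task Arithmetic formulation: a task vector is the fine-tuned weights minus the base weights. Adding a scaled task vector to the base adds the skill, and subtracting it removes the skill. Plain dicts of NumPy arrays stand in for real model state dicts, and the helper names, tensor names, and values are hypothetical, not mergekit's API.

```python
import numpy as np

def task_vector(base, finetuned):
    """Task vector: per-tensor difference between fine-tuned and base weights."""
    return {name: finetuned[name] - base[name] for name in base}

def apply_task_vectors(base, vectors, weights):
    """Task Arithmetic: base + sum_i w_i * tau_i.
    A positive weight adds a capability; a negative weight removes it."""
    merged = {name: tensor.copy() for name, tensor in base.items()}
    for tau, w in zip(vectors, weights):
        for name in merged:
            merged[name] += w * tau[name]
    return merged

# Hypothetical toy "models": one 2x2 weight matrix each.
# Real models would use framework state dicts with many tensors.
base = {"layer.weight": np.array([[1.0, 0.0], [0.0, 1.0]])}
math_ft = {"layer.weight": np.array([[1.5, 0.0], [0.0, 1.0]])}
toxic_ft = {"layer.weight": np.array([[1.0, 0.4], [0.0, 1.0]])}

tau_math = task_vector(base, math_ft)
tau_toxic = task_vector(base, toxic_ft)

# Add the "math" skill (+1.0) and remove the "toxic" behavior (-1.0).
merged = apply_task_vectors(base, [tau_math, tau_toxic], [1.0, -1.0])
print(merged["layer.weight"])
```

No gradient step is involved: the merged weights come purely from arithmetic on existing checkpoints, which is why no additional training is needed.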

In this video, we first introduce model merging. Then, we discuss several merging algorithms implemented in the mergekit library (https://github.com/arcee-ai): model soups, SLERP, Task Arithmetic, TIES, DARE, and Franken-merging.
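To make four of these algorithms concrete, the sketch below applies them to single flattened weight vectors, assuming NumPy arrays. It shows a uniform model soup (plain averaging), SLERP between two models, and the TIES and DARE steps applied to task vectors. It is a simplified reading of the published methods, not mergekit's implementation. The function names and parameters (`t`, `density`, `lam`, `drop_rate`) are hypothetical. The DARE output is combined here by simple averaging, which is an assumption, since DARE can also be paired with task arithmetic or TIES. Franken-merging, which stacks layers from different models rather than combining their values, is not shown.

```python
import numpy as np

def model_soup(weights):
    """Uniform model soup: element-wise average of several fine-tuned models."""
    return np.mean(np.stack(weights), axis=0)

def slerp(w0, w1, t, eps=1e-8):
    """Spherical linear interpolation between two 1-D weight vectors, 0 <= t <= 1."""
    u0 = w0 / (np.linalg.norm(w0) + eps)
    u1 = w1 / (np.linalg.norm(w1) + eps)
    theta = np.arccos(np.clip(np.dot(u0, u1), -1.0, 1.0))
    if theta < eps:  # nearly parallel vectors: fall back to linear interpolation
        return (1 - t) * w0 + t * w1
    s = np.sin(theta)
    return (np.sin((1 - t) * theta) / s) * w0 + (np.sin(t * theta) / s) * w1

def ties(base, finetuned, density=0.5, lam=1.0):
    """TIES on 1-D vectors: trim each task vector to its top-k magnitudes,
    elect a sign per parameter, then average only the values that agree."""
    taus = np.stack([w - base for w in finetuned])
    # Trim: keep the largest-magnitude `density` fraction of each task vector.
    k = max(1, int(density * taus.shape[1]))
    trimmed = np.zeros_like(taus)
    for i, tau in enumerate(taus):
        top = np.argsort(np.abs(tau))[-k:]
        trimmed[i, top] = tau[top]
    # Elect sign: the sign of the summed trimmed values for each parameter.
    sign = np.sign(trimmed.sum(axis=0))
    # Disjoint merge: average the nonzero entries whose sign matches the elected one.
    agree = (np.sign(trimmed) == sign) & (trimmed != 0)
    counts = np.maximum(agree.sum(axis=0), 1)
    merged_tau = (trimmed * agree).sum(axis=0) / counts
    return base + lam * merged_tau

def dare(base, finetuned, drop_rate=0.9, seed=0):
    """DARE: randomly drop a fraction of each task vector's entries and rescale
    the survivors by 1 / (1 - drop_rate), with 0 <= drop_rate < 1.
    The rescaled task vectors are then averaged (a simplifying assumption)."""
    rng = np.random.default_rng(seed)
    taus = []
    for w in finetuned:
        tau = w - base
        keep = rng.random(tau.shape) >= drop_rate
        taus.append(tau * keep / (1.0 - drop_rate))
    return base + np.mean(taus, axis=0)

# Hypothetical toy weights: a zero base model and two fine-tuned variants.
base = np.zeros(4)
ft_a = np.array([1.0, -2.0, 0.5, 0.0])
ft_b = np.array([3.0, 1.0, -0.5, 0.2])

print(model_soup([ft_a, ft_b]))
print(slerp(ft_a, ft_b, t=0.5))
print(ties(base, [ft_a, ft_b], density=0.5))
print(dare(base, [ft_a, ft_b], drop_rate=0.5))
```

The rescaling in DARE keeps the expected value of each task-vector entry unchanged after dropping. In TIES, parameters whose contributions disagree in sign are resolved by the elected sign instead of cancelling each other out in a plain average.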

Slides: https://fr.slideshare.net/slideshow/j...

⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos. Follow me on Medium (julsimon) or on Substack at https://julsimon.substack.com. ⭐️⭐️⭐️

00:00 Introduction
01:16 What is model merging?
07:10 Model soups
14:00 Spherical Linear Interpolation (SLERP)
20:35 Task Arithmetic
27:15 Trim, Elect Sign and Merge (TIES)
36:20 Drop and Rescale (DARE)
43:40 Franken-merging
