Create Training Data for Finetuning LLMs
APC Mastery Path

Published on Jul 2, 2024

🚀 Mastering LLM Fine-Tuning: From PDFs to JSONL Files 🚀

Welcome to APC Mastery Path! This tutorial walks through creating training data for fine-tuning Large Language Models (LLMs). We extract text from PDFs with the `marker-pdf` Python library, clean the resulting Markdown, and convert it to JSONL ready for fine-tuning.
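
The cleaning and JSONL steps could look roughly like the sketch below. It is not the code from the video: it assumes the PDFs have already been converted to `.md` files by `marker-pdf`, and the function names, the cleanup rules, and the `source`/`text` record fields are all illustrative assumptions. Adjust the record layout to whatever your fine-tuning tool expects.

```python
import json
import re
from pathlib import Path


def clean_markdown(text: str) -> str:
    """Apply a few illustrative cleanup rules to extracted Markdown."""
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", text)  # drop image references
    text = re.sub(r"[ \t]+\n", "\n", text)            # strip trailing whitespace
    text = re.sub(r"\n{3,}", "\n\n", text)            # collapse runs of blank lines
    return text.strip()


def markdown_to_records(text: str, source: str) -> list[dict]:
    """Split cleaned Markdown into one record per non-empty paragraph."""
    paragraphs = [p.strip() for p in clean_markdown(text).split("\n\n")]
    # "source" and "text" are assumed field names, not a required schema.
    return [{"source": source, "text": p} for p in paragraphs if p]


def write_jsonl(markdown_dir: Path, out_path: Path) -> int:
    """Write every *.md file in markdown_dir to one JSONL file.

    Returns the number of records written (one JSON object per line).
    """
    count = 0
    with out_path.open("w", encoding="utf-8") as out:
        for md_file in sorted(markdown_dir.glob("*.md")):
            text = md_file.read_text(encoding="utf-8")
            for record in markdown_to_records(text, md_file.name):
                out.write(json.dumps(record, ensure_ascii=False) + "\n")
                count += 1
    return count
```

For example, `write_jsonl(Path("markdown_out"), Path("train.jsonl"))` would collect every Markdown file in a (hypothetical) `markdown_out` folder into a single `train.jsonl`. Splitting on paragraphs keeps the sketch simple; a real dataset may need larger chunks or an instruction/response layout.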


🔔Agenda:
00:08 Intro
00:59 Part 1: Main concept of the solution
01:50 Part 2: Marker PDF Package Overview & Installation
05:46 Part 2-1: Single File Conversion
08:29 Part 2-2: Multiple File Conversion & Conversion to JSONL format
15:03 Part 3: Finetuning LLMs using extracted data
22:01 Outro

At APC Mastery Path, we offer bespoke mentoring and teaching packages to RICS APC candidates. Enhance your APC journey with our expert guidance and tailored support.

Don't forget to subscribe, like, and share! Let's embark on this LLM fine-tuning journey together! 🚀✨

🔗General Links & Resources:
⚫Our Website: www.apcmasterypath.co.uk
⚫All APC Mastery Path Blogposts: https://www.apcmasterypath.co.uk/blog...
⚫Personal LinkedIn Page: / mohamed-ashour-0727
⚫APC Mastery Path LinkedIn Page: / apc-mastery-path

📽️Useful videos:
⚫Finetune your LLMs on custom datasets using Unsloth:    • Finetune Your LLM on Custom Datasets ...  
⚫Deploy Open WebUI with Zero Coding Skills :    • Unlocking Local AI: Deploy Open WebUI...  

📝Prerequisites & Dependencies:
⚫NVIDIA CUDA Toolkit v12.1: https://developer.nvidia.com/cuda-12-...
⚫Windows Subsystem for Linux: https://learn.microsoft.com/en-us/win...
⚫Anaconda for Linux: https://repo.anaconda.com/archive/Ana...
⚫PyTorch: https://pytorch.org/
⚫Ollama: www.ollama.com/download
⚫Docker: https://desktop.docker.com/win/main/a...
⚫Open WebUI on GitHub: https://github.com/open-webui/open-webui

📚GitHub & Hugging Face repositories:
⚫Unsloth available LLMs: https://huggingface.co/unsloth
⚫Marker PDF on GitHub: https://github.com/VikParuchuri/marker
⚫Unsloth GitHub Repository: https://github.com/unslothai/unsloth?...

#LLM #MachineLearning #DataScience #AI #Python #PDFConversion #JSONL #MarkerPDF #FineTuning #APCMasteryPath #RICSAPC #Mentoring #Education #TechTutorials
