Fix header-level for proper formatting #266

Merged
merged 1 commit into from
Jan 7, 2025
20 changes: 10 additions & 10 deletions notebooks/en/fine_tuning_vlm_trl.ipynb
@@ -51,7 +51,7 @@
"id": "gSHmDKNFoqjC"
},
"source": [
"# 1. Install Dependencies\n",
"## 1. Install Dependencies\n",
"\n",
"Let’s start by installing the essential libraries we’ll need for fine-tuning! 🚀\n"
]
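The install cell that follows this markdown header is not part of the diff, so as context, here is a minimal sketch of the dependency install, assuming the libraries the notebook references later (TRL, Transformers, Datasets, PEFT, bitsandbytes, Accelerate, qwen-vl-utils); the exact packages and version pins may differ.

```python
# Minimal install sketch for a Jupyter cell; the package list is an assumption,
# not the notebook's exact pinned requirements.
%pip install -U trl transformers datasets peft bitsandbytes accelerate qwen-vl-utils
```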
@@ -180,7 +180,7 @@
"id": "g9QXwbJ7ovM5"
},
"source": [
"# 2. Load Dataset 📁\n",
"## 2. Load Dataset 📁\n",
"\n",
"In this section, we’ll load the [HuggingFaceM4/ChartQA](https://huggingface.co/datasets/HuggingFaceM4/ChartQA) dataset. This dataset contains chart images paired with related questions and answers, making it ideal for training on visual question answering tasks.\n",
"\n",
@@ -388,7 +388,7 @@
"id": "YY1Y_KDtoycB"
},
"source": [
"# 3. Load Model and Check Performance! 🤔\n",
"## 3. Load Model and Check Performance! 🤔\n",
"\n",
"Now that we’ve loaded the dataset, let’s start by loading the model and evaluating its performance using a sample from the dataset. We’ll be using [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct), a Vision Language Model (VLM) capable of understanding both visual data and text.\n",
"\n",
@@ -1165,7 +1165,7 @@
"id": "YIZOIVEzQqNg"
},
"source": [
"# 4. Fine-Tune the Model using TRL\n"
"## 4. Fine-Tune the Model using TRL\n"
]
},
{
@@ -1174,7 +1174,7 @@
"id": "yIrR9gP2z90z"
},
"source": [
"## 4.1 Load the Quantized Model for Training ⚙️\n",
"### 4.1 Load the Quantized Model for Training ⚙️\n",
"\n",
"Next, we’ll load the quantized model using [bitsandbytes](https://huggingface.co/docs/bitsandbytes/main/en/index). If you want to learn more about quantization, check out [this blog post](https://huggingface.co/blog/merve/quantization) or [this one](https://www.maartengrootendorst.com/blog/quantization/).\n"
]
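The quantized-loading cell is outside this hunk; below is a sketch of a 4-bit NF4 setup with bitsandbytes, where the specific config values are assumptions.

```python
# Sketch: load the model in 4-bit NF4 for QLoRA-style training (config values are illustrative).
import torch
from transformers import AutoProcessor, BitsAndBytesConfig, Qwen2VLForConditionalGeneration

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
```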
@@ -1246,7 +1246,7 @@
"id": "65wfO29isQlX"
},
"source": [
"## 4.2 Set Up QLoRA and SFTConfig 🚀\n",
"### 4.2 Set Up QLoRA and SFTConfig 🚀\n",
"\n",
"Next, we will configure [QLoRA](https://github.com/artidoro/qlora) for our training setup. QLoRA enables efficient fine-tuning of large language models while significantly reducing the memory footprint compared to traditional methods. Unlike standard LoRA, which reduces memory usage by applying a low-rank approximation, QLoRA takes it a step further by quantizing the weights of the LoRA adapters. This leads to even lower memory requirements and improved training efficiency, making it an excellent choice for optimizing our model's performance without sacrificing quality.\n",
"\n",
@@ -1361,7 +1361,7 @@
"id": "pOUrD9P-y-Kf"
},
"source": [
"## 4.3 Training the Model 🏃"
"### 4.3 Training the Model 🏃"
]
},
{
@@ -1556,7 +1556,7 @@
"id": "6yx_sGW42dN3"
},
"source": [
"# 5. Testing the Fine-Tuned Model 🔍\n",
"## 5. Testing the Fine-Tuned Model 🔍\n",
"\n",
"Now that we've successfully fine-tuned our Vision Language Model (VLM), it's time to evaluate its performance! In this section, we will test the model using examples from the ChartQA dataset to see how well it answers questions based on chart images. Let's dive in and explore the results! 🚀\n",
"\n"
@@ -1993,7 +1993,7 @@
"id": "daUMWw5xxhSc"
},
"source": [
"# 6. Compare Fine-Tuned Model vs. Base Model + Prompting 📊\n",
"## 6. Compare Fine-Tuned Model vs. Base Model + Prompting 📊\n",
"\n",
"We have explored how fine-tuning the VLM can be a valuable option for adapting it to our specific needs. Another approach to consider is directly using prompting or implementing a RAG system, which is covered in another [recipe](https://huggingface.co/learn/cookbook/multimodal_rag_using_document_retrieval_and_vlms).\n",
"\n",
@@ -2205,7 +2205,7 @@
"id": "Wgv0-sy8TLPE"
},
"source": [
"# 7. Continuing the Learning Journey 🧑‍🎓️\n",
"## 7. Continuing the Learning Journey 🧑‍🎓️\n",
"\n",
"To further enhance your understanding and skills in working with multimodal models, check out the following resources:\n",
"\n",