Commit

Merge pull request #266 from alvarobartt/fix-header-level
Fix header-level for proper formatting
stevhliu authored Jan 7, 2025
2 parents e115257 + 9c30250 commit 9c52ddf
Showing 1 changed file with 10 additions and 10 deletions.
notebooks/en/fine_tuning_vlm_trl.ipynb (20 changes: 10 additions & 10 deletions)
@@ -51,7 +51,7 @@
"id": "gSHmDKNFoqjC"
},
"source": [
"# 1. Install Dependencies\n",
"## 1. Install Dependencies\n",
"\n",
"Let’s start by installing the essential libraries we’ll need for fine-tuning! 🚀\n"
]
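
For context, the cell under this header installs the training stack. A minimal sketch of that step, with the package list assumed from the libraries referenced later in this diff (Transformers, Datasets, TRL, PEFT, bitsandbytes), might look like:

```python
# Rough sketch of the install cell; the package list is an assumption,
# not the notebook's exact pins.
%pip install -U transformers datasets trl peft bitsandbytes accelerate
```
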
@@ -180,7 +180,7 @@
"id": "g9QXwbJ7ovM5"
},
"source": [
"# 2. Load Dataset 📁\n",
"## 2. Load Dataset 📁\n",
"\n",
"In this section, we’ll load the [HuggingFaceM4/ChartQA](https://huggingface.co/datasets/HuggingFaceM4/ChartQA) dataset. This dataset contains chart images paired with related questions and answers, making it ideal for training on visual question answering tasks.\n",
"\n",
@@ -388,7 +388,7 @@
"id": "YY1Y_KDtoycB"
},
"source": [
"# 3. Load Model and Check Performance! 🤔\n",
"## 3. Load Model and Check Performance! 🤔\n",
"\n",
"Now that we’ve loaded the dataset, let’s start by loading the model and evaluating its performance using a sample from the dataset. We’ll be using [Qwen/Qwen2-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct), a Vision Language Model (VLM) capable of understanding both visual data and text.\n",
"\n",
@@ -1165,7 +1165,7 @@
"id": "YIZOIVEzQqNg"
},
"source": [
"# 4. Fine-Tune the Model using TRL\n"
"## 4. Fine-Tune the Model using TRL\n"
]
},
{
@@ -1174,7 +1174,7 @@
"id": "yIrR9gP2z90z"
},
"source": [
"## 4.1 Load the Quantized Model for Training ⚙️\n",
"### 4.1 Load the Quantized Model for Training ⚙️\n",
"\n",
"Next, we’ll load the quantized model using [bitsandbytes](https://huggingface.co/docs/bitsandbytes/main/en/index). If you want to learn more about quantization, check out [this blog post](https://huggingface.co/blog/merve/quantization) or [this one](https://www.maartengrootendorst.com/blog/quantization/).\n"
]
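
The 4-bit load described in this hunk typically goes through `BitsAndBytesConfig`. A sketch with common NF4 settings, which may differ from the notebook's exact configuration:

```python
import torch
from transformers import BitsAndBytesConfig, Qwen2VLForConditionalGeneration

# NF4 quantization with bf16 compute: a common QLoRA baseline (assumed values).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
```
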
@@ -1246,7 +1246,7 @@
"id": "65wfO29isQlX"
},
"source": [
"## 4.2 Set Up QLoRA and SFTConfig 🚀\n",
"### 4.2 Set Up QLoRA and SFTConfig 🚀\n",
"\n",
"Next, we will configure [QLoRA](https://github.com/artidoro/qlora) for our training setup. QLoRA enables efficient fine-tuning of large language models while significantly reducing the memory footprint compared to traditional methods. Unlike standard LoRA, which reduces memory usage by applying a low-rank approximation, QLoRA takes it a step further by quantizing the weights of the LoRA adapters. This leads to even lower memory requirements and improved training efficiency, making it an excellent choice for optimizing our model's performance without sacrificing quality.\n",
"\n",
@@ -1361,7 +1361,7 @@
"id": "pOUrD9P-y-Kf"
},
"source": [
"## 4.3 Training the Model 🏃"
"### 4.3 Training the Model 🏃"
]
},
{
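
Training then comes down to handing everything to `SFTTrainer`. In the sketch below, `collate_fn` stands in for a collator (not shown) that turns image/question/answer examples into processor-encoded batches:

```python
from trl import SFTTrainer

# collate_fn is a hypothetical collator that applies the chat template and the
# processor to each (image, question, answer) example; it is not defined here.
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=collate_fn,
    peft_config=peft_config,
)

trainer.train()
trainer.save_model(training_args.output_dir)
```
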
@@ -1556,7 +1556,7 @@
"id": "6yx_sGW42dN3"
},
"source": [
"# 5. Testing the Fine-Tuned Model 🔍\n",
"## 5. Testing the Fine-Tuned Model 🔍\n",
"\n",
"Now that we've successfully fine-tuned our Vision Language Model (VLM), it's time to evaluate its performance! In this section, we will test the model using examples from the ChartQA dataset to see how well it answers questions based on chart images. Let's dive in and explore the results! 🚀\n",
"\n"
@@ -1993,7 +1993,7 @@
"id": "daUMWw5xxhSc"
},
"source": [
"# 6. Compare Fine-Tuned Model vs. Base Model + Prompting 📊\n",
"## 6. Compare Fine-Tuned Model vs. Base Model + Prompting 📊\n",
"\n",
"We have explored how fine-tuning the VLM can be a valuable option for adapting it to our specific needs. Another approach to consider is directly using prompting or implementing a RAG system, which is covered in another [recipe](https://huggingface.co/learn/cookbook/multimodal_rag_using_document_retrieval_and_vlms).\n",
"\n",
@@ -2205,7 +2205,7 @@
"id": "Wgv0-sy8TLPE"
},
"source": [
"# 7. Continuing the Learning Journey 🧑‍🎓️\n",
"## 7. Continuing the Learning Journey 🧑‍🎓️\n",
"\n",
"To further enhance your understanding and skills in working with multimodal models, check out the following resources:\n",
"\n",
