Fixed headers level and added diagram with white background #267

Merged 4 commits on Jan 7, 2025
32 changes: 16 additions & 16 deletions notebooks/en/fine_tuning_smol_vlm_sft_trl.ipynb

Large diffs are not rendered by default.

24 changes: 12 additions & 12 deletions notebooks/en/fine_tuning_vlm_dpo_smolvlm_instruct.ipynb
@@ -41,7 +41,7 @@
"id": "R-7khk_xFuZZ"
},
"source": [
-"# 1. Install Dependencies\n",
+"## 1. Install Dependencies\n",
"\n",
"Let’s start by installing the essential libraries we’ll need for fine-tuning! 🚀"
]
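For readers following along outside the rendered notebook, the dependency install for a TRL-based DPO fine-tune is typically along these lines (the package list here is an assumption for illustration, not the notebook's verbatim cell):

```shell
# Assumed dependency set for DPO fine-tuning with TRL; check the notebook cell for the exact pins
pip install -q -U transformers datasets trl peft bitsandbytes accelerate
```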
@@ -232,7 +232,7 @@
"id": "t-zGbB9OGTo6"
},
"source": [
-"# 3. Fine-Tune the Model using TRL\n",
+"## 3. Fine-Tune the Model using TRL\n",
"\n"
]
},
@@ -242,7 +242,7 @@
"id": "irI99bhxzpVM"
},
"source": [
-"## 3.1 Load the Quantized Model for Training ⚙️\n",
+"### 3.1 Load the Quantized Model for Training ⚙️\n",
"\n",
"\n",
"Let's first load a quantized version of the SmolVLM-Instruct model using bitsandbytes, and let's also load the processor. We'll use [SmolVLM-Instruct](https://huggingface.co/HuggingFaceTB/SmolVLM-Instruct)."
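For context on what this cell does, a 4-bit quantized load with bitsandbytes typically looks like the following sketch; the exact arguments (quantization type, compute dtype) are assumptions and may differ from the notebook's cell:

```python
import torch
from transformers import AutoModelForVision2Seq, AutoProcessor, BitsAndBytesConfig

model_id = "HuggingFaceTB/SmolVLM-Instruct"

# 4-bit NF4 quantization keeps the weight memory small enough for a single GPU
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)
```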
@@ -297,7 +297,7 @@
"id": "AwDDBxIqGjDV"
},
"source": [
-"## 3.2 Set Up QLoRA and DPOConfig 🚀\n",
+"### 3.2 Set Up QLoRA and DPOConfig 🚀\n",
"\n",
"In this step, we’ll configure [QLoRA](https://github.com/artidoro/qlora) for our training setup. **QLoRA** is a powerful fine-tuning technique designed to reduce the memory footprint, making it possible to fine-tune large models efficiently, even on limited hardware.\n",
"\n",
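As a rough sketch of the configuration this section sets up, a QLoRA adapter plus DPO training arguments are typically declared like this; every hyperparameter value and target-module name below is an assumption for illustration, not the notebook's exact configuration:

```python
from peft import LoraConfig
from trl import DPOConfig

# LoRA adapter config (rank, alpha, and target modules are illustrative assumptions)
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# DPO training arguments (values are illustrative assumptions)
training_args = DPOConfig(
    output_dir="smolvlm-instruct-dpo",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=5e-5,
    bf16=True,
)
```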
@@ -463,7 +463,7 @@
"id": "n2eD3ZwHzl-U"
},
"source": [
-"# 4. Testing the Fine-Tuned Model 🔍\n",
+"## 4. Testing the Fine-Tuned Model 🔍\n",
"\n",
"With our Vision Language Model (VLM) fine-tuned, it’s time to evaluate its performance! In this section, we’ll test the model using examples from the [HuggingFaceH4/rlaif-v_formatted](https://huggingface.co/datasets/HuggingFaceH4/rlaif-v_formatted) dataset. Let’s dive into the results and assess how well the model aligns with the preferred responses! 🚀\n",
"\n",
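For orientation, the evaluation step described here usually boils down to a short inference loop along these lines (a sketch assuming `model` and `processor` from the earlier cells; `"example.jpg"` is a hypothetical stand-in for a dataset image):

```python
from PIL import Image

# Stand-in inputs: `model` and `processor` come from the earlier loading steps,
# and "example.jpg" is a hypothetical local image standing in for a dataset sample.
image = Image.open("example.jpg")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image."},
        ],
    },
]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```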
@@ -770,11 +770,7 @@
},
"outputs": [
{
-"output_type": "execute_result",
"data": {
-"text/plain": [
-"<IPython.lib.display.IFrame at 0x7926fa3b35e0>"
-],
"text/html": [
"\n",
" <iframe\n",
@@ -786,10 +782,14 @@
" \n",
" ></iframe>\n",
" "
+],
+"text/plain": [
+"<IPython.lib.display.IFrame at 0x7926fa3b35e0>"
+]
},
+"execution_count": 1,
"metadata": {},
-"execution_count": 1
+"output_type": "execute_result"
}
],
"source": [
@@ -804,7 +804,7 @@
"id": "Znti4_dk39av"
},
"source": [
-"# 6. Continuing the Learning Journey 🧑‍🎓️\n",
+"## 5. Continuing the Learning Journey 🧑‍🎓️\n",
"\n",
"Expand your knowledge of Vision Language Models and related tools with these resources:\n",
"\n",
@@ -838,4 +838,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
}
}
6,022 changes: 32 additions & 5,990 deletions notebooks/en/multiagent_rag_system.ipynb

Large diffs are not rendered by default.

22 changes: 11 additions & 11 deletions notebooks/en/search_and_learn.ipynb
@@ -36,7 +36,7 @@
"id": "twKCzVIg71Xa"
},
"source": [
-"# 1. Install Dependencies\n",
+"## 1. Install Dependencies\n",
"\n",
"Let’s start by installing the [search-and-learn](https://github.com/huggingface/search-and-learn) repository! 🚀 \n",
"This repo is designed to replicate the experimental results and is not a Python pip package. However, we can still use it to generate our system. To do so, we’ll need to install it from source with the following steps:"
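The install-from-source steps referred to here are typically of this shape (a sketch; consult the repository README for the exact commands):

```shell
# Assumed install-from-source flow for a repo that is not published on PyPI
git clone https://github.com/huggingface/search-and-learn
cd search-and-learn
pip install -e .
```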
@@ -130,7 +130,7 @@
"id": "wX07zCTA8MWL"
},
"source": [
-"# 2. Setup the Large Language Model (LLM) and the Process Reward Model (PRM) 💬\n",
+"## 2. Setup the Large Language Model (LLM) and the Process Reward Model (PRM) 💬\n",
"\n",
"As illustrated in the diagram, the system consists of an LLM that generates intermediate answers based on user input, a [PRM model](https://huggingface.co/papers/2211.14275) that evaluates and scores these answers, and a search strategy that uses the PRM feedback to guide the subsequent steps in the search process until reaching the final answer.\n",
"\n",
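The generate/score/select loop described in this cell can be sketched with toy stand-in functions (all names below are hypothetical placeholders for the real LLM and PRM calls):

```python
from typing import List

def generate_candidates(question: str, n: int) -> List[str]:
    # Stand-in for the LLM: propose n intermediate candidate answers
    return [f"{question} -> candidate {i}" for i in range(n)]

def prm_score(question: str, answer: str) -> float:
    # Stand-in for the PRM: assign each candidate a scalar reward
    # (here a toy heuristic so the example is self-contained)
    return float(sum(map(ord, answer)))

def search(question: str, n: int = 4) -> str:
    # The search strategy uses PRM feedback to pick the best candidate
    candidates = generate_candidates(question, n)
    scores = [prm_score(question, a) for a in candidates]
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index]
```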
@@ -395,7 +395,7 @@
"id": "xYtPn0_V_YRx"
},
"source": [
-"## 2.1 Instantiate the Question, Search Strategy, and Call the Pipeline\n",
+"### 2.1 Instantiate the Question, Search Strategy, and Call the Pipeline\n",
"\n",
"Now that we've set up the LLM and PRM, let's proceed by defining the question, selecting a search strategy to retrieve relevant information, and calling the pipeline to process the question through the models.\n",
"\n",
@@ -470,7 +470,7 @@
"id": "lsLHD_6C_15p"
},
"source": [
-"## 2.2 Display the Final Result\n",
+"### 2.2 Display the Final Result\n",
"\n",
"Once the pipeline has processed the question through the LLM and PRM, we can display the final result. This result will be the model's output after considering the intermediate answers and scoring them using the PRM.\n",
"\n",
@@ -606,7 +606,7 @@
"id": "4uCpYzAw_4o9"
},
"source": [
-"# 3. Assembling It All! 🧑‍🏭️\n",
+"## 3. Assembling It All! 🧑‍🏭️\n",
"\n",
"Now, let's create a method that encapsulates the entire pipeline. This will allow us to easily reuse the process in future applications, making it efficient and modular.\n",
"\n",
@@ -673,7 +673,7 @@
"id": "RWbOqkiKPVd2"
},
"source": [
-"## ⏳ 3.1 Comparing Thinking Time for Each Strategy\n",
+"### ⏳ 3.1 Comparing Thinking Time for Each Strategy\n",
"\n",
"Let’s compare the **thinking time** of three methods: `best_of_n`, `beam_search`, and `dvts`. Each method is evaluated using the same number of answers during the search process, measuring the time spent thinking in seconds and the number of generated tokens.\n",
"\n",
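Measuring the thinking time described here amounts to wrapping each strategy call in a wall-clock timer; a minimal stdlib sketch (with a hypothetical `dummy_strategy` standing in for a real search call) is:

```python
import time

def timed(fn, *args, **kwargs):
    # Measure the wall-clock "thinking time" of one search-strategy call
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Toy stand-in for a real search strategy
def dummy_strategy(question):
    return question.upper()

answer, seconds = timed(dummy_strategy, "what is 2+2?")
```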
@@ -694,7 +694,7 @@
"id": "2ROJwROGX8q-"
},
"source": [
-"### 1. **Best of n**\n",
+"#### 1. **Best of n**\n",
"\n",
"We’ll begin by using the `best_of_n` strategy. Here’s how to track the thinking time for this method:"
]
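Conceptually, best-of-n samples several independent answers and keeps the highest-scoring one; a self-contained toy sketch (the `gen` and `reward` callables are stand-ins for the LLM and the PRM) looks like:

```python
import random

def best_of_n(question, generate, score, n=8, seed=0):
    # Sample n independent answers and keep the one with the highest reward
    rng = random.Random(seed)
    candidates = [generate(question, rng) for _ in range(n)]
    return max(candidates, key=score)

# Toy stand-ins for the generator and the reward model
gen = lambda q, rng: rng.randint(0, 100)
reward = lambda answer: -abs(answer - 42)  # prefer answers close to 42

best = best_of_n("pick a number near 42", gen, reward, n=16)
```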
@@ -779,7 +779,7 @@
"id": "7S9AwP5lQvUN"
},
"source": [
-"### 2. **Beam Search**\n",
+"#### 2. **Beam Search**\n",
"\n",
"Now, let's try using the `beam_search` strategy."
]
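Unlike best-of-n, beam search keeps only the top-scoring partial answers at each step and extends them; a tiny self-contained sketch (the `expand` and `score` callables are toy stand-ins for token proposals and PRM scoring) is:

```python
def beam_search(start, expand, score, beam_width=2, depth=3):
    # Keep only the best `beam_width` partial sequences at each step
    beam = [start]
    for _ in range(depth):
        candidates = [seq + [tok] for seq in beam for tok in expand(seq)]
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
    return beam[0]

# Toy stand-ins: "tokens" are digits, the score prefers larger sums
expand = lambda seq: [0, 1, 2]
score = lambda seq: sum(seq)

best = beam_search([], expand, score)
```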
@@ -886,7 +886,7 @@
"id": "GxBBUd7HQzhd"
},
"source": [
-"### 3. **Diverse Verifier Tree Search (DVTS)**\n",
+"#### 3. **Diverse Verifier Tree Search (DVTS)**\n",
"\n",
"Finally, let's try the `dvts` strategy."
]
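The idea behind DVTS is to split the search budget across several independent subtrees (for diversity) and expand each one greedily by verifier score; a toy self-contained sketch (not the repo's implementation, and `expand`/`score` are stand-ins) is:

```python
def dvts(start, expand, score, n_subtrees=4, depth=3):
    # Toy sketch of Diverse Verifier Tree Search: spend the budget on several
    # independent subtrees (diversity), expand each greedily by verifier
    # score, then return the best leaf found across all subtrees.
    roots = expand(start)[:n_subtrees]
    leaves = []
    for root in roots:
        seq = start + [root]
        for _ in range(depth - 1):
            seq = max((seq + [tok] for tok in expand(seq)), key=score)
        leaves.append(seq)
    return max(leaves, key=score)

# Toy stand-ins: "tokens" are digits, the verifier score is the running sum
best = dvts([], lambda seq: [0, 1, 2], score=sum)
```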
@@ -988,7 +988,7 @@
"id": "5PM9HHwBSYWk"
},
"source": [
-"## 🙋 3.2 Testing the System with a Simple Question\n",
+"### 🙋 3.2 Testing the System with a Simple Question\n",
"\n",
"In this final example, we’ll test the system using a straightforward question to observe how it performs in simpler cases. This allows us to verify that the system works as expected even for basic queries.\n",
"\n",
@@ -1073,7 +1073,7 @@
"id": "92znAyJ0AOPY"
},
"source": [
-"# 4. Continuing the Journey and Resources 🧑‍🎓️\n",
+"## 4. Continuing the Journey and Resources 🧑‍🎓️\n",
"\n",
"If you're eager to continue exploring, be sure to check out the original experimental [blog](https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute) and all the references mentioned within it. These resources will deepen your understanding of test-time compute, its benefits, and its applications in LLMs.\n",
"\n",