Showing 1 changed file with 196 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,196 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Notebook pour l'implémentation d'une première pipeline" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Le but de ce notebook est de faire une première pipeline qui, à partir d'un ensemble typique de documents, génère la demande de financements souhaitée." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Load documents" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Here, load the documents in python" | ||
] | ||
}, | ||
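{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Minimal sketch, not a fixed choice: it assumes the input documents are plain-text or\n", | ||
"# markdown files sitting in a local 'documents/' folder (a hypothetical path); adapt the\n", | ||
"# path and add PDF/DOCX parsing if the real documents need it.\n", | ||
"from pathlib import Path\n", | ||
"\n", | ||
"DOCS_DIR = Path('documents')\n", | ||
"\n", | ||
"documents = {}\n", | ||
"for path in sorted(DOCS_DIR.glob('*')):\n", | ||
"    if path.suffix.lower() in {'.txt', '.md'}:\n", | ||
"        documents[path.name] = path.read_text(encoding='utf-8')\n", | ||
"\n", | ||
"print(f'Loaded {len(documents)} documents')" | ||
] | ||
}, | ||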
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## (Optional in the beginning) Chunk and embedd documents" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Chunking and embedding documents is a way to implement a RAG (Retrieval Augmented Generation). \n", | ||
"\n", | ||
"To learn about this concept, you can check the following links :\n", | ||
"\n", | ||
"Here are also useful resources to implement a RAG in python using langchain :\n", | ||
"\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"!! It is important to note that while RAG is a common way to provide LLMs with context, specific methods can be used for this project. For instance, maybe that all documents have an \"information about x\" section that can be directly retrieved with regex methods to provide the model with.\n", | ||
"\n", | ||
"For regex methods, you can find documentation here :\n" | ||
] | ||
}, | ||
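{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Sketch of the regex idea above: pull a named section out of a document.\n", | ||
"# The section title ('Information about x') and the layout it assumes (title, optional\n", | ||
"# colon, body ending at a blank line) are hypothetical; adjust to the real documents.\n", | ||
"import re\n", | ||
"\n", | ||
"def extract_section(text, section_title):\n", | ||
"    # Capture everything after the section title up to the next blank line or the end of text.\n", | ||
"    pattern = rf'{re.escape(section_title)}\\s*:?(.*?)(?=\\n\\s*\\n|\\Z)'\n", | ||
"    match = re.search(pattern, text, flags=re.IGNORECASE | re.DOTALL)\n", | ||
"    return match.group(1).strip() if match else None\n", | ||
"\n", | ||
"# Example usage on one loaded document (assumes `documents` from the loading sketch):\n", | ||
"# extract_section(documents['example.txt'], 'Information about x')" | ||
] | ||
}, | ||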
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Here split the document into chunks" | ||
] | ||
}, | ||
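{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Dependency-free chunking sketch: fixed-size character windows with overlap.\n", | ||
"# LangChain's text splitters (e.g. RecursiveCharacterTextSplitter) are a common alternative;\n", | ||
"# the sizes below are arbitrary starting points, not tuned values.\n", | ||
"def chunk_text(text, chunk_size=1000, overlap=200):\n", | ||
"    chunks = []\n", | ||
"    step = chunk_size - overlap\n", | ||
"    for start in range(0, len(text), step):\n", | ||
"        chunks.append(text[start:start + chunk_size])\n", | ||
"    return chunks\n", | ||
"\n", | ||
"# Assumes `documents` from the loading sketch: one flat list of (doc_name, chunk) pairs.\n", | ||
"all_chunks = [(name, chunk) for name, text in documents.items() for chunk in chunk_text(text)]\n", | ||
"print(f'{len(all_chunks)} chunks')" | ||
] | ||
}, | ||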
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Here embed those chunks" | ||
] | ||
}, | ||
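{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# One possible embedding setup: assumes the sentence-transformers package is installed and\n", | ||
"# that a small local model is acceptable (Mistral's embedding API is another option).\n", | ||
"from sentence_transformers import SentenceTransformer\n", | ||
"\n", | ||
"embedder = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')  # small multilingual model, runs on CPU\n", | ||
"\n", | ||
"# Assumes `all_chunks` from the chunking sketch above.\n", | ||
"chunk_texts = [chunk for _, chunk in all_chunks]\n", | ||
"embeddings = embedder.encode(chunk_texts)  # shape: (n_chunks, embedding_dim)\n", | ||
"print(embeddings.shape)" | ||
] | ||
}, | ||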
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# (Optional) Here you can store those embedded chunks into a vector store" | ||
] | ||
}, | ||
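{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Tiny in-memory stand-in for a vector store: cosine similarity with numpy, reusing the\n", | ||
"# `embedder`, `embeddings` and `all_chunks` names from the sketches above. A real vector\n", | ||
"# store (FAISS, Chroma, ...) can replace this once the pipeline works end to end.\n", | ||
"import numpy as np\n", | ||
"\n", | ||
"def retrieve(query, k=4):\n", | ||
"    query_vec = embedder.encode([query])[0]\n", | ||
"    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query_vec)\n", | ||
"    scores = embeddings @ query_vec / np.clip(norms, 1e-10, None)\n", | ||
"    top = np.argsort(scores)[::-1][:k]\n", | ||
"    return [all_chunks[i] for i in top]\n", | ||
"\n", | ||
"# retrieve('budget of the project')  # returns the k most similar (doc_name, chunk) pairs" | ||
] | ||
}, | ||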
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## call a large language model via an API (e.g. Mistral API call - use free tiers)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Here we're gonna call a model (and pass him the context if already implemented before)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"vscode": { | ||
"languageId": "plaintext" | ||
} | ||
}, | ||
"source": [ | ||
"Some links you can check to learn more if you don't know how it works :\n", | ||
"\n", | ||
"Langchain (one of the classic tools for this kind of task)\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"<b>To run a model locally</b>\n", | ||
"\n", | ||
"With Ollama :\n", | ||
"\n", | ||
"With huggingface : " | ||
] | ||
}, | ||
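{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Local-model sketch (assumptions: the Ollama server is running locally, the `ollama`\n", | ||
"# Python package is installed, and a model such as 'mistral' has already been pulled).\n", | ||
"import ollama\n", | ||
"\n", | ||
"response = ollama.chat(\n", | ||
"    model='mistral',\n", | ||
"    messages=[{'role': 'user', 'content': 'Say hello in one short sentence.'}],\n", | ||
")\n", | ||
"print(response['message']['content'])" | ||
] | ||
}, | ||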
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"\"\"\"\n", | ||
"Here, first write your credentials for API call (don't push it on git !! Use environment variables)\n", | ||
"or load the model in the notebook kernel if you want to use a model locally\n", | ||
"\"\"\"" | ||
] | ||
}, | ||
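{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Read the API key from the environment (e.g. exported in the shell, or kept in a .env\n", | ||
"# file that is listed in .gitignore). The variable name MISTRAL_API_KEY is an assumption.\n", | ||
"import os\n", | ||
"\n", | ||
"api_key = os.environ.get('MISTRAL_API_KEY')\n", | ||
"if not api_key:\n", | ||
"    raise RuntimeError('Set the MISTRAL_API_KEY environment variable before running the API cells.')" | ||
] | ||
}, | ||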
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"\"\"\"\n", | ||
"Then, implement API calling (langchain chain + prompt engineering)\n", | ||
"You can divide the whole process in several sub-questions if the model can't take enough context at once,\n", | ||
"or if it does not perform well enough.\n", | ||
"\"\"\"" | ||
] | ||
}, | ||
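{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Sketch of a simple LangChain chain calling the Mistral API. Assumptions: the\n", | ||
"# langchain-mistralai package is installed and MISTRAL_API_KEY is set as above;\n", | ||
"# the prompt wording and model name are only starting points.\n", | ||
"from langchain_core.prompts import ChatPromptTemplate\n", | ||
"from langchain_mistralai import ChatMistralAI\n", | ||
"\n", | ||
"llm = ChatMistralAI(model='mistral-small-latest', temperature=0)\n", | ||
"prompt = ChatPromptTemplate.from_messages([\n", | ||
"    ('system', 'You help write a funding application from the provided context.'),\n", | ||
"    ('human', 'Context:\\n{context}\\n\\nQuestion: {question}'),\n", | ||
"])\n", | ||
"chain = prompt | llm\n", | ||
"\n", | ||
"# The context could come from the retrieve() helper or from the regex extraction sketches above.\n", | ||
"answer = chain.invoke({'context': 'example context', 'question': 'Summarise the project goal.'})\n", | ||
"print(answer.content)" | ||
] | ||
}, | ||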
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## (Very very optional) Implement a langgraph to enhance generation performances with agentic behavior" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"This step should not be necessary but once everything else is set up, you can play with it.\n", | ||
"\n", | ||
"Documentation : " | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Langgraph implementation" | ||
] | ||
}, | ||
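{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Very small LangGraph sketch (assumes the langgraph package and the `chain` defined above):\n", | ||
"# a single 'draft' node that could later grow into a draft -> review loop with agentic behaviour.\n", | ||
"from typing import TypedDict\n", | ||
"\n", | ||
"from langgraph.graph import StateGraph, START, END\n", | ||
"\n", | ||
"class PipelineState(TypedDict):\n", | ||
"    context: str\n", | ||
"    question: str\n", | ||
"    draft: str\n", | ||
"\n", | ||
"def draft_answer(state):\n", | ||
"    result = chain.invoke({'context': state['context'], 'question': state['question']})\n", | ||
"    return {'draft': result.content}\n", | ||
"\n", | ||
"graph = StateGraph(PipelineState)\n", | ||
"graph.add_node('draft', draft_answer)\n", | ||
"graph.add_edge(START, 'draft')\n", | ||
"graph.add_edge('draft', END)\n", | ||
"app = graph.compile()\n", | ||
"\n", | ||
"# app.invoke({'context': 'example context', 'question': 'Summarise the project goal.', 'draft': ''})" | ||
] | ||
}, | ||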
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"language_info": { | ||
"name": "python" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |