Commit

suggestions added by reviewer- rahul
SurajBaloni committed Jan 3, 2025
1 parent 60dda12 commit 862f21e
Showing 1 changed file with 23 additions and 88 deletions.
@@ -51,7 +51,7 @@
"source": [
"As text data continues to grow rapidly, extracting meaningful insights from large amounts of information is more important than ever. Large language models (LLMs) have emerged as powerful tools for processing unstructured data, significantly enhancing the accuracy and efficiency of information extraction. One of the key tasks that can be performed using large language models is entity extraction, which involves identifying and classifying entities—such as names, organizations, locations, dates, and other specific details—within a text.\n",
"\n",
-"In this sample, we will explore how information extraction works using the Mistral language model in the `EntityRecognizer` class of the arcgis.learn API with the Cheshire fire incident reports dataset. The Cheshire fire dataset typically includes incident reports detailing fire incidents in Cheshire, covering information like locations, times, types of incidents, and response actions. This data can be valuable for analysis in understanding patterns, improving response strategies, and enhancing safety measures.\n",
+"In this sample, we will explore how information extraction works using the Mistral language model in the `EntityRecognizer` class of the arcgis.learn API with the Cheshire fire incident reports dataset. The Cheshire fire dataset includes incident reports detailing fire incidents in Cheshire, covering information like locations, times, types of incidents, and response actions. This data can be valuable for analysis in understanding patterns, improving response strategies, and enhancing safety measures.\n",
"\n",
"Key entities to extract from fire incident reports include:\n",
"- **Address**\n",
@@ -180,27 +180,6 @@
"filepath = training_data.download(file_name=training_data.name)"
]
},
-{
-"cell_type": "code",
-"execution_count": 5,
-"id": "34b2cf65-b18d-40c8-b0bf-83b65acb5b5a",
-"metadata": {},
-"outputs": [
-{
-"data": {
-"text/plain": [
-"'C:\\\\Users\\\\sur11226\\\\AppData\\\\Local\\\\Temp\\\\information_extraction_from_cheshire_fire_incident_reports_using_mistral_language_model.zip'"
-]
-},
-"execution_count": 5,
-"metadata": {},
-"output_type": "execute_result"
-}
-],
-"source": [
-"filepath"
-]
-},
{
"cell_type": "code",
"execution_count": 6,
@@ -225,21 +204,10 @@
},
{
"cell_type": "code",
-"execution_count": 8,
+"execution_count": 1,
"id": "059f765d-07c6-4354-91dd-fe4d26f188ca",
"metadata": {},
-"outputs": [
-{
-"data": {
-"text/plain": [
-"'C:\\\\Users\\\\sur11226\\\\AppData\\\\Local\\\\Temp\\\\information_extraction_from_cheshire_fire_incident_reports_using_mistral_language_model'"
-]
-},
-"execution_count": 8,
-"metadata": {},
-"output_type": "execute_result"
-}
-],
+"outputs": [],
"source": [
"os.path.splitext(filepath)[0]"
]
@@ -433,37 +401,11 @@
"source": [
"## EntityRecognizer model\n",
"\n",
-"`EntityRecognizer` model in `arcgis.learn` can be used with spaCy's [EntityRecognizer](https://spacy.io/api/entityrecognizer), [Hugging Face Transformers](https://huggingface.co/transformers/v3.0.2/index.html) or with larze language model backbones. For this sample use case we will use the Mistral model backbone to extract entities from te text.\n",
+"`EntityRecognizer` model in `arcgis.learn` can be used with [Hugging Face Transformers](https://huggingface.co/transformers/v3.0.2/index.html) or with large language model backbones. For this sample use case we will use the Mistral model backbone to extract entities from the text.\n",
"\n",
"Run the command below to see what backbones are supported for the **entity recognition** task."
]
},
-{
-"cell_type": "code",
-"execution_count": 19,
-"id": "b8d41792-0630-422d-acbb-6a315caf0107",
-"metadata": {},
-"outputs": [
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-"('mistral',)\n"
-]
-}
-],
-"source": [
-"print(EntityRecognizer.available_backbone_models(\"llm\"))"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"id": "a186b92b-5e53-4248-954a-7c08cfa66794",
-"metadata": {},
-"outputs": [],
-"source": []
-},
{
"cell_type": "code",
"execution_count": 11,
@@ -524,6 +466,21 @@
" )"
]
},
+{
+"cell_type": "markdown",
+"id": "848bb3ac-ee27-4b61-a98a-e3f24be767ac",
+"metadata": {},
+"source": [
+"The Mistral model will automatically infer the classes from the dataset. The list of inferred class names is as follows:\n",
+"\n",
+"- Address \n",
+"- Date_and_Time \n",
+"- Incident_Type \n",
+"- Number_of_Engines \n",
+"- Title \n",
+"- Time_spent_at_incident "
+]
+},
{
"cell_type": "markdown",
"id": "1bea5234-52f3-43c3-9ff3-4575efd1f44b",
@@ -537,7 +494,7 @@
"id": "a519c182-9fe5-4309-9879-3b54f233f641",
"metadata": {},
"source": [
-"The Mistral model utilizes in-context learning to generate predictions. Unlike traditional models that depend on lengthy training cycles, it can adapt using just a few examples and a prompt. By incorporating this information into the input, the Mistral model gains a better understanding and can make more accurate predictions without needing retraining."
+"The Mistral model utilizes in-context learning to generate predictions. Unlike traditional models that depend on lengthy training cycles, it can understand the task using just a few examples and a prompt. By incorporating this information into the input, the Mistral model gains a better understanding and can make more accurate predictions without needing retraining."
]
},
{
@@ -983,28 +940,6 @@
"Now we can use the trained model to extract entities from new text documents using `extract_entities()` method. This method expects the folder path of where new text document are located, or a list of text documents."
]
},
-{
-"cell_type": "code",
-"execution_count": 19,
-"id": "5cc7d161-794c-441a-a8a6-fd2eca430c2e",
-"metadata": {},
-"outputs": [
-{
-"data": {
-"text/plain": [
-"('C:\\\\Users\\\\sur11226\\\\AppData\\\\Local\\\\Temp\\\\information_extraction_from_cheshire_fire_incident_reports_using_mistral_language_model',\n",
-" '.zip')"
-]
-},
-"execution_count": 19,
-"metadata": {},
-"output_type": "execute_result"
-}
-],
-"source": [
-"os.path.splitext(filepath)"
-]
-},
{
"cell_type": "code",
"execution_count": 20,
@@ -1251,9 +1186,9 @@
],
"metadata": {
"kernelspec": {
-"display_name": "Python [conda env:conda-dl_10_Sept_1] *",
+"display_name": "Python [conda env:conda-arcgis_13_DEC_24] *",
"language": "python",
-"name": "conda-env-conda-dl_10_Sept_1-py"
+"name": "conda-env-conda-arcgis_13_DEC_24-py"
},
"language_info": {
"codemirror_mode": {
@@ -1265,7 +1200,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.11.9"
+"version": "3.11.10"
}
},
"nbformat": 4,

0 comments on commit 862f21e
