From 62915316d8bc513d2259543d0b0b2ee320bb50af Mon Sep 17 00:00:00 2001
From: Felix <65565033+fexfl@users.noreply.github.com>
Date: Sun, 12 Jan 2025 16:58:04 +0100
Subject: [PATCH] Added comments for highlighting

---
 notebook/demo.ipynb | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/notebook/demo.ipynb b/notebook/demo.ipynb
index 7c52e5e..429e6de 100644
--- a/notebook/demo.ipynb
+++ b/notebook/demo.ipynb
@@ -35,7 +35,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Below, the input files are loaded from the given `input_dir` directory. You can provide relative or absolute paths to the directory that contains your `eml` or `html` files. All files of the `eml` or `htlm` file type in that directory will be considered input files."
+    "The cell below defines a function used to display the result in the end, and highlight all named entities found in the text. It is used for demonstration purposes in this example."
    ]
   },
   {
@@ -67,6 +67,13 @@
     " return text"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Below, the input files are loaded from the given `input_dir` directory. You can provide relative or absolute paths to the directory that contains your `eml` or `html` files. All files of the `eml` or `html` file type in that directory will be considered input files."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": null,
@@ -99,7 +106,9 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "In the cell below, the emails are looped over and the text is extracted. The text is then split into sentences and the sentences are pseudonymized. The pseudonymized sentences are then joined back into a text and saved to a new file."
+    "In the cell below, the emails are looped over and the text is extracted. The text is then split into sentences and the sentences are pseudonymized. The pseudonymized sentences are then joined back into a text and saved to a new file.\n",
+    "\n",
+    "The input text is displayed and the found named entities are highlighted for demonstration. Note that emails (all words containing '@') are filtered out seperately and thus not highlighted here."
    ]
   },
   {