Named entity removal from a bank email. I want to remove and collect names, account numbers, organization names #13004

vrunm · 2023-09-24T16:08:07Z


I want to remove and collect names, account numbers, organization names, and dates for another task, and then run the rest of the email through an NLP pipeline. I tried spaCy's 'en_core_web_sm' model, but it mislabels things: amounts and account numbers are tagged as dates, and the account type or a greeting such as 'Hello Team' is sometimes tagged as an organization.

Sample email: "Hello Team, please convert account 12345678911 to business account, my contact number to reach is 909-500-6000 and address is Client academy aa-77-05-10, 12543. warm regards, vanica be denial."

Sample Code I have used:

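import spacy

nlp = spacy.load('en_core_web_sm')
doc = nlp(text)  # text holds the email body shown above
for ent in doc.ents:
    # label_ is the readable label string; label is only an integer ID
    print(ent.text, ent.label_)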

Output:
Hello Team ORG, 12345678911 Date, Business Account ORG, Vanica be Name, Denial Name, 12542 Date, 6000 cardinal

I want the output to look like this:
12345678911 account number, Business Account account type, Vanica be denial name, Client academy aa-77-05-10, 12543 address, 909-500-6000 contact number

Answered by igormorgado


igormorgado · 2023-09-24T20:13:13Z


Hello Vrunm, if you follow the spaCy 101 course you will be able to handle this case, and many others. You need to add some entity patterns, since spaCy does not know anything about account numbers. Even this phone format isn't a standard English/American one, so you need to add a pattern for it too. You can add these patterns with rule-based matching, or train a model on annotated data.

You can also follow the guides on linguistic features and rule-based matching if your cases stay close to what they cover.
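To make the rule-based idea concrete, here is a minimal sketch that uses Python's standard `re` module as a pre-pass rather than spaCy's own matcher. The patterns, the label names, and the `extract_and_redact` helper are all hypothetical and tuned only to the sample email in this thread. Real account and phone formats will need their own rules, and names and addresses are not handled here at all.

```python
import re

# Hypothetical patterns, written only against the sample email in this thread.
PATTERNS = {
    # Assumption: account numbers are exactly 11 digits.
    "ACCOUNT_NUMBER": re.compile(r"\b\d{11}\b"),
    # Assumption: contact numbers look like 909-500-6000.
    "CONTACT_NUMBER": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    # Assumption: a small, fixed list of account types.
    "ACCOUNT_TYPE": re.compile(
        r"\b(?:business|savings|current) account\b", re.IGNORECASE
    ),
}


def extract_and_redact(text):
    """Return (entities, redacted_text).

    entities is a list of (matched_text, label) pairs; redacted_text is the
    input with every match replaced by a <LABEL> placeholder.
    """
    entities = []
    redacted = text
    for label, pattern in PATTERNS.items():
        # Collect matches from the original text, then blank them out.
        for match in pattern.finditer(text):
            entities.append((match.group(), label))
        redacted = pattern.sub(f"<{label}>", redacted)
    return entities, redacted


sample = (
    "Hello Team, please convert account 12345678911 to business account, "
    "my contact number to reach is 909-500-6000 and address is "
    "Client academy aa-77-05-10, 12543. warm regards, vanica be denial."
)

entities, redacted = extract_and_redact(sample)
for matched_text, label in entities:
    print(matched_text, label)
print(redacted)
```

The redacted text can then be passed to the spaCy pipeline for the remaining steps, so the statistical model never sees the digit strings it was mislabeling as dates. The same rules could instead be expressed as spaCy token patterns, as the rule-based matching guide describes.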


vrunm · 2023-10-13T20:42:01Z


@igormorgado, I have completed the preprocessing part. Can you suggest which pretrained word2vec model I should try first? The data is bank/finance text, and we are classifying request types such as passbook print requests, statement requests, and monetary transactions.
