Named entity removal from a bank email. I want to remove and collect names, account numbers, organization names #13004
I want to remove and collect names, account numbers, organization names and dates for another task, and then, after removing the named entities, run the rest of the email through an NLP pipeline. I tried using the spaCy 'en_core_web_sm' model, but it wrongly detects information such as amounts and account numbers as dates or account types, and sometimes labels a greeting such as 'Hello team' as an organization. A sample of the mail is: "Hello Team, please convert account 12345678911 to business account, my contact number to reach is 909-500-6000 and address is Client academy aa-77-05-10, 12543. warm regards, vanica be denial." Sample code I have used:
Output: I want the output to be like this:
Replies: 2 comments
Hello Vrunm, if you follow the spaCy 101 course you will be able to handle this case and many others. You need to add some entity patterns, since spaCy does not know anything about account numbers. Even this phone format is not a valid English/American format, so you need to add a pattern for it too. You can add these patterns with rule-based matching, or train a model on annotated data. You can also follow the guides on linguistic features and rule-based matching if your cases do not stray far from what they cover.
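To make the "add patterns, then remove and collect" idea concrete, here is a minimal sketch using only Python's standard `re` module rather than spaCy itself. The two patterns and the `redact_and_collect` helper are hypothetical, written to fit the sample email only. Real account and phone formats will likely need different rules, and the same idea can be carried over to spaCy's rule-based matching.

```python
import re

# Hypothetical patterns that fit the sample email; adjust them for real data.
# PHONE is listed first so its digits are replaced before the account rule runs.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "ACCOUNT_NUMBER": re.compile(r"\b\d{9,18}\b"),
}


def redact_and_collect(text):
    """Replace each match with a [LABEL] placeholder.

    Returns the redacted text and a dict mapping each label to the
    list of strings that were removed.
    """
    found = {label: [] for label in PATTERNS}
    for label, pattern in PATTERNS.items():
        def _swap(match, label=label):
            found[label].append(match.group(0))
            return f"[{label}]"
        text = pattern.sub(_swap, text)
    return text, found


# Excerpt of the sample email from the question.
email = (
    "Hello Team, please convert account 12345678911 to business account, "
    "my contact number to reach is 909-500-6000"
)
clean, entities = redact_and_collect(email)
print(clean)
print(entities)
```

The redacted text in `clean` can then go through the rest of the NLP process, while `entities` keeps the removed values for the other task. Names and organization names are not covered here; those would still need a statistical model or extra rules.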
@igormorgado, I have completed the preprocessing part. Can you suggest which pretrained word2vec model I should try first? The data is bank finance data, where we are classifying different request types such as passbook print requests, statement requests and monetary transactions.