LSTM with attention - NMT from scratch

This model for Neural Machine Translation (English to German) was developed from scratch during my NLP specialization. I programmed the model's architecture and its training with Trax. The architecture is an encoder-decoder LSTM network with an attention mechanism. This type of model, known as seq2seq with attention, replaced traditional seq2seq models to avoid the loss and fading of information that occur when a variable-length input sequence must be compressed into a single fixed-length context vector passed from the encoder to the decoder.

I programmed the model architecture following the structure in the diagram below:

(Diagram: NMT LSTM-with-attention architecture, from the repository's drawio file)
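
A minimal sketch of how this encoder-decoder-with-attention wiring can be expressed in Trax follows. The layers used (tl.Serial, tl.Select, tl.Parallel, tl.AttentionQKV, tl.LSTM, etc.) are real Trax layers, but the hyperparameter values and the prepare_attention_input helper shown here are illustrative assumptions, not necessarily the repository's exact code:

```python
# Sketch of a seq2seq-with-attention model in Trax. Layer names are real
# Trax layers; sizes and the helper function are illustrative assumptions.
import trax.layers as tl
from trax.fastmath import numpy as jnp

def prepare_attention_input(encoder_acts, decoder_acts, inputs):
    """Build (queries, keys, values, mask) for tl.AttentionQKV.

    Keys and values come from the encoder, queries from the decoder;
    the mask hides padding tokens (id 0) in the input sentence.
    """
    keys = values = encoder_acts
    queries = decoder_acts
    mask = (inputs != 0)                  # (batch, enc_len)
    mask = mask[:, None, None, :]         # (batch, 1, 1, enc_len)
    # Broadcast the mask over the decoder (query) positions.
    mask = mask + jnp.zeros((1, 1, decoder_acts.shape[1], 1))
    return queries, keys, values, mask

def NMTAttn(vocab_size=33300, d_model=1024, n_layers=2, n_heads=4, mode='train'):
    # Encoder: embed the input tokens, then run them through LSTM layers.
    input_encoder = tl.Serial(
        tl.Embedding(vocab_size, d_model),
        [tl.LSTM(d_model) for _ in range(n_layers)],
    )
    # Pre-attention decoder: shift targets right (teacher forcing),
    # embed them, and produce the attention queries with one LSTM.
    pre_attention_decoder = tl.Serial(
        tl.ShiftRight(mode=mode),
        tl.Embedding(vocab_size, d_model),
        tl.LSTM(d_model),
    )
    return tl.Serial(
        tl.Select([0, 1, 0, 1]),   # duplicate (inputs, targets) on the stack
        tl.Parallel(input_encoder, pre_attention_decoder),
        tl.Fn('PrepareAttentionInput', prepare_attention_input, n_out=4),
        tl.Residual(tl.AttentionQKV(d_model, n_heads=n_heads, mode=mode)),
        tl.Select([0, 2]),         # drop the mask, keep (activations, targets)
        [tl.LSTM(d_model) for _ in range(n_layers)],
        tl.Dense(vocab_size),
        tl.LogSoftmax(),
    )
```

The tl.Residual wrapper adds the decoder queries back onto the attention output, so each decoder position keeps its own representation in addition to the context gathered from the encoder.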

Subsequently, the model was trained on a dataset from https://opus.nlpl.eu/, specifically a subset of medical texts containing English-to-German translations.
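
A sketch of what the Trax training setup can look like, assuming tokenized batch streams (train_batch_stream, eval_batch_stream) yielding (inputs, targets, loss weights) have already been built from the OPUS data; the stream names, learning rate, and step counts here are illustrative, not the repository's exact values:

```python
# Sketch of the Trax supervised training loop. train_batch_stream and
# eval_batch_stream are assumed to yield (inputs, targets, weights)
# batches from the tokenized OPUS corpus; hyperparameters are illustrative.
import trax
import trax.layers as tl
from trax.supervised import training

train_task = training.TrainTask(
    labeled_data=train_batch_stream,
    loss_layer=tl.CrossEntropyLoss(),
    optimizer=trax.optimizers.Adam(0.01),
    lr_schedule=trax.lr.warmup_and_rsqrt_decay(1000, 0.01),
    n_steps_per_checkpoint=10,
)

eval_task = training.EvalTask(
    labeled_data=eval_batch_stream,
    metrics=[tl.CrossEntropyLoss(), tl.Accuracy()],
)

training_loop = training.Loop(
    NMTAttn(mode='train'),
    train_task,
    eval_tasks=[eval_task],
    output_dir='output_dir',
)
training_loop.run(n_steps=1000)  # train for as many steps as needed
```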

In the decoding and sampling steps, functions were programmed to perform random sampling. With the generated samples, Minimum Bayes Risk (MBR) decoding was implemented: each sample is compared with the others using the ROUGE score, and the sample with the highest average ROUGE is selected. At the end of the notebook, some inference examples are run with temperature = 0 (greedy decoding) and temperature = 0.6.
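
A minimal sketch of the sampling-plus-MBR idea described above. Here rouge1_similarity is a simple unigram-F1 stand-in for the ROUGE comparison, and sample_fn stands for any function that draws one sampled translation as a token list (e.g. token-by-token sampling with tl.logsoftmax_sample at a given temperature); both names are illustrative assumptions:

```python
# Sketch of Minimum Bayes Risk (MBR) decoding over random samples.
# rouge1_similarity is a simple unigram-F1 stand-in for the ROUGE score;
# sample_fn is assumed to return one sampled translation as a token list.
from collections import Counter
import numpy as np

def rouge1_similarity(system, reference):
    """Unigram-overlap F1 (ROUGE-1) between two token lists."""
    sys_counts, ref_counts = Counter(system), Counter(reference)
    overlap = sum(min(count, ref_counts[token])
                  for token, count in sys_counts.items())
    if overlap == 0:
        return 0.0
    precision = overlap / len(system)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)

def mbr_decode(sample_fn, n_samples=4):
    """Draw samples, score each against the others by average ROUGE-1,
    and return the sample with the highest average similarity."""
    samples = [sample_fn() for _ in range(n_samples)]
    avg_scores = []
    for i, candidate in enumerate(samples):
        others = samples[:i] + samples[i + 1:]
        avg_scores.append(np.mean([rouge1_similarity(candidate, o)
                                   for o in others]))
    return samples[int(np.argmax(avg_scores))]
```

With temperature = 0 the sampler always returns the greedy translation, so all samples coincide and MBR adds nothing; a temperature such as 0.6 introduces the diversity among samples that MBR exploits.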

