naviddianati/BabbleBot

BabbleBot

Text synthesis bot that learns from a given text corpus using an LSTM-based deep neural network and produces new text given a "seed" prompt.

We treat the text as one long list of tokens and split it into <batch_size> contiguous sequences of equal length. This stack of long sequences is then cut along the time axis into batches, which are processed in order by a deep neural network with a stateful LSTM, so the LSTM state at the end of one batch carries over into the next. The learning task is to predict the next token in each sequence given the token at the current time step and the current state of the LSTM layer.
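
The batching scheme described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code used in this repository: the function name `make_stateful_batches` and the `num_steps` parameter (time steps per batch) are hypothetical, and leftover tokens that do not fill a full sequence are simply dropped.

```python
def make_stateful_batches(tokens, batch_size, num_steps):
    """Yield (inputs, targets) batches laid out for a stateful LSTM.

    Hypothetical helper: row i of every batch continues row i of the
    previous batch, so state carried between batches stays meaningful.
    """
    # Split the token list into batch_size contiguous sequences.
    seq_len = len(tokens) // batch_size
    rows = [tokens[i * seq_len:(i + 1) * seq_len] for i in range(batch_size)]

    # Targets are the inputs shifted by one token, so the last token of
    # each row can only ever be a target, never an input.
    for start in range(0, seq_len - 1, num_steps):
        stop = min(start + num_steps, seq_len - 1)
        inputs = [row[start:stop] for row in rows]
        targets = [row[start + 1:stop + 1] for row in rows]
        yield inputs, targets


tokens = list(range(10))
batches = list(make_stateful_batches(tokens, batch_size=2, num_steps=2))
for inputs, targets in batches:
    print(inputs, targets)
# [[0, 1], [5, 6]] [[1, 2], [6, 7]]
# [[2, 3], [7, 8]] [[3, 4], [8, 9]]
```

Note how the first row of the second batch starts at token 2, exactly where the first row of the first batch left off. That continuity is what lets a stateful LSTM keep its state across batches.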

The example below uses the "Sample European Parliament Proceedings Parallel Corpus", downloaded directly using the NLTK package:

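    import babble
    import nltk
    from nltk.corpus import europarl_raw
    
    
    def load_text_corpus():
        # Assumption: the original helper is not shown here. This is one
        # possible way to fetch the sample Europarl corpus through NLTK
        # and return its English part as a single list of words.
        nltk.download("europarl_raw")
        return list(europarl_raw.english.words())
    
    
    bot = babble.BabbleBot()
    
    # Load the text corpus as a single list of words
    text = load_text_corpus()
    
    # Parse the data and generate the encodings
    # required for training
    bot.parse_text(text)
    
    # Instantiate the model and train
    bot.learn(epochs=100)
    
    # Give the bot a seed string to get it babbling!
    seed = "Today"
    for _ in range(20):
        print(bot.babble(seed, reset=False))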
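
The "encodings required for training" are not spelled out above. A common choice for this kind of model is a mapping between tokens and integer ids, and the sketch below shows that idea only. It is an assumption about what `parse_text` might do, not a description of it, and the names `build_vocab`, `encode`, and `decode` are hypothetical.

```python
from collections import Counter


def build_vocab(tokens):
    """Map each distinct token to an integer id (hypothetical helper)."""
    counts = Counter(tokens)
    # Most frequent tokens get the smallest ids; ties are broken
    # alphabetically so the mapping is deterministic.
    id_to_token = sorted(counts, key=lambda tok: (-counts[tok], tok))
    token_to_id = {tok: i for i, tok in enumerate(id_to_token)}
    return token_to_id, id_to_token


def encode(tokens, token_to_id):
    """Turn a list of tokens into a list of integer ids."""
    return [token_to_id[tok] for tok in tokens]


def decode(ids, id_to_token):
    """Turn a list of integer ids back into tokens."""
    return [id_to_token[i] for i in ids]


words = "the cat saw the dog".split()
token_to_id, id_to_token = build_vocab(words)
ids = encode(words, token_to_id)
print(ids)  # [0, 1, 3, 0, 2]
print(decode(ids, id_to_token) == words)  # True
```

With such a mapping, the network trains on the integer ids, and the ids it predicts are decoded back into words when the bot babbles.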
