Word Embedding Techniques with Keras

This project demonstrates how to implement word embedding techniques using the Keras library in TensorFlow. The model uses an embedding layer to convert text data into dense vectors of fixed size, which can be used for natural language processing tasks.

Project Overview

The objective of this project is to:

1. Convert sentences into one-hot encoded vectors using Keras.
2. Apply padding to ensure uniform sequence length.
3. Use Keras' Embedding layer to generate word embeddings for each word in the sentences.
4. Predict and output the dense vectors for each sentence using the embedding model.

Dependencies

Make sure you have the following dependencies installed in your environment:

- Python 3.x
- TensorFlow 2.x (2.15.0 or later)
- Keras (included in TensorFlow)
- NumPy

You can install these dependencies by running the following command:

```bash
pip install tensorflow numpy
```

Dataset

The dataset consists of a few simple sentences:


  1. the glass of milk

  2. the glass of juice

  3. the cup of tea

  4. I am a good boy

  5. I am a good developer

  6. understand the meaning of words

  7. your videos are good

These sentences are processed and converted into one-hot encoded representations.

Project Structure
    

Word Embedding Creation: The text data is first preprocessed using one-hot encoding. The Keras embedding layer is then used to transform these one-hot encoded vectors into dense word embeddings.
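A minimal sketch of this step, assuming the seven sentences from the dataset section and a vocabulary size of 10,000; the variable names (`sentences`, `voc_size`, `one_hot_repr`) are placeholders and not necessarily those used in `word_embedding.py`:

```python
from tensorflow.keras.preprocessing.text import one_hot

# Example corpus from the dataset section
sentences = [
    "the glass of milk",
    "the glass of juice",
    "the cup of tea",
    "I am a good boy",
    "I am a good developer",
    "understand the meaning of words",
    "your videos are good",
]

voc_size = 10000  # assumed vocabulary size, matching the model summary below

# one_hot() hashes each word to an integer index in [1, voc_size)
one_hot_repr = [one_hot(sentence, voc_size) for sentence in sentences]
print(one_hot_repr)  # e.g. [[4217, 903, 7751, 215], ...] (actual indices depend on each word's hash)
```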

Model Definition: The model is defined using Keras' Sequential API. The embedding layer converts the input sequences into word embeddings of a specified dimension (dim=10 in this case).
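A sketch of what such a model definition could look like, assuming the Keras 2 API bundled with TensorFlow 2.15 (where `Embedding` still accepts `input_length`); the parameter values mirror the model summary below:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding

voc_size = 10000   # vocabulary size
dim = 10           # embedding dimension for each word
sent_length = 8    # padded sequence length

# A single Embedding layer maps every integer word index to a dense 10-d vector
model = Sequential()
model.add(Embedding(voc_size, dim, input_length=sent_length))
model.compile(optimizer="adam", loss="mse")
model.summary()
```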

Padding: Since the input sequences are of varying lengths, they are padded using pad_sequences() to ensure uniform input shape to the embedding layer.
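For example, with the `one_hot_repr` list from the sketch above, the padding step might look like this (pre-padding with zeros is an assumption; the script may equally use post-padding):

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

sent_length = 8  # maximum sentence length after padding

# Shorter sequences are filled with zeros so every row has exactly 8 indices
padded_docs = pad_sequences(one_hot_repr, maxlen=sent_length, padding="pre")
print(padded_docs.shape)  # (7, 8) for the seven example sentences
```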

Code Usage

  1. Clone the repository or download the project files.

     ```bash
     git clone https://github.com/your-username/word-embedding-keras.git
     cd word-embedding-keras
     ```

  2. Install the necessary dependencies.

     ```bash
     pip install tensorflow numpy
     ```

  3. Run the `word_embedding.py` script to see the word embeddings generated for your sentences:

     ```bash
     python word_embedding.py
     ```

  4. Understanding the output: the model generates a dense vector representation (embedding) for each word in the sentence. Here is an example of a predicted embedding vector:

     ```
     [[[ 0.04343951 -0.04441161 -0.00095301 ... ]]
     ```

     Each word is represented as a dense vector of size 10, which is the dimensionality specified for the embedding.
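A minimal sketch of how such an output can be obtained, assuming the `model` and `padded_docs` objects from the sketches in the project structure section (placeholder names, not necessarily those in `word_embedding.py`):

```python
# Each padded sentence of 8 word indices becomes an (8, 10) matrix of embeddings
embeddings = model.predict(padded_docs)
print(embeddings.shape)  # (7, 8, 10): 7 sentences, 8 words each, 10-dimensional vectors
print(embeddings[0])     # dense vectors for every word (including padding) in sentence 1
```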

Model Summary

The model consists of a single embedding layer:

```
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 embedding (Embedding)       (None, 8, 10)             100000
=================================================================
```

- Voc_size: 10000 (vocabulary size of 10,000 unique tokens)
- Dim: 10 (embedding vector size for each word)
- Input_length: 8 (maximum sentence length after padding)

The parameter count follows directly from these settings: voc_size × dim = 10,000 × 10 = 100,000 trainable weights.

Future Improvements

- Expand the dataset to include more complex sentences.
- Experiment with different embedding dimensions.
- Use pre-trained word embeddings such as Word2Vec, GloVe, or FastText.
- Integrate the embedding layer with a deep learning model (e.g., LSTM, GRU) for sequence modeling.

License

This project is licensed under the MIT License. See the LICENSE file for more information.
