inference with mobilenet #16

Open
wants to merge 5 commits into base: devel
1,577 changes: 1,577 additions & 0 deletions scenes/Mobilenet_train_50k.ipynb


75 changes: 74 additions & 1 deletion scenes/README.md
@@ -1 +1,74 @@
# Scene description / image captioning
**Requirements**:

- TensorFlow 2.0
- Pickle
- NumPy
- PIL

**Description**

This code defines the model using subclassed Keras models. Sequential and Functional models are data structures that represent a DAG of layers; as such, they can be safely serialized and deserialized.

A subclassed model differs in that it is not a data structure but a piece of code: the architecture of the model is defined via the body of the call method. This means that the architecture of the model cannot be safely serialized. **To load a model, you'll need to have access to the code that created it (the code of the model subclass)**. Alternatively, you could serialize this code as bytecode (e.g. via pickling), but that's unsafe and generally not portable.

For more information about these differences, see the article ["What are Symbolic and Imperative APIs in TensorFlow 2.0?"](https://medium.com/tensorflow/what-are-symbolic-and-imperative-apis-in-tensorflow-2-0-dfccecb01021).
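
A minimal sketch of the contrast (illustrative only, not code from this repository; the `(49, 1280)` shape matches the encoder input used later):

```python
import tensorflow as tf

# Functional: a DAG of layers, i.e. a serializable data structure
inputs = tf.keras.Input(shape=(49, 1280))
outputs = tf.keras.layers.Dense(256, activation='relu')(inputs)
functional_model = tf.keras.Model(inputs, outputs)   # model.save() works here

# Subclassed: the architecture lives only in Python code
class MyEncoder(tf.keras.Model):
    def __init__(self):
        super(MyEncoder, self).__init__()
        self.fc = tf.keras.layers.Dense(256)

    def call(self, x):
        return tf.nn.relu(self.fc(x))
```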

**Saving the model**:
First of all, a subclassed model that has never been used cannot be saved.

That's because a subclassed model needs to be called on some data in order to create its weights.

Until the model has been called, it does not know the shape and dtype of the input data it should be expecting, and thus cannot create its weight variables. You may remember that in a Functional model, the shape and dtype of the inputs are specified in advance (via keras.Input(...)); that's why Functional models have a state as soon as they're instantiated.
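
A minimal sketch of this behavior, reusing the `MyEncoder` sketch above (the dummy shape and checkpoint name are illustrative):

```python
model = MyEncoder()
# model.save_weights(...) here would fail: no weights exist yet
dummy = tf.zeros((1, 49, 1280))   # a dummy batch with the expected shape
_ = model(dummy)                  # the first call creates the weight variables
model.save_weights("encoder_checkpoint.h5")   # now saving works
```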

There are three approaches to saving a subclassed model. In this repository, we used [save_weights](https://www.tensorflow.org/guide/keras/save_and_serialize) to create a TensorFlow checkpoint, which will contain the value of all variables associated with the model:

1. The layers' weights
2. The optimizer's state
3. Any variables associated with stateful model metrics (if any)

To restore your model, you will need access to the code that created the model object.
We have three models to save: 1) the ImageNet (MobileNetV2) weights, 2) the CNN encoder, 3) the RNN decoder.

Since these are subclassed Keras models and not Functional or Sequential ones, I could not use model.save and model.load directly.

Instead I had to use **model.save_weights** and **model.load_weights**.

**Saving the encoder model**

model.load_weights can be called only after model.build, and model.build requires an input_shape parameter, which has to be a tuple (not a list of tuples). For the CNN encoder, the input_shape is (49, 1280).
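
This is exactly how `mobilenet_inference.py` (included below) restores the encoder:

```python
E = CNN_Encoder(embedding_dim)       # embedding_dim = 256
E.build(input_shape=(49, 1280))      # a tuple, not a list of tuples
E.load_weights("encoder_weights_50k.h5")
```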

**Saving the attention weights and the RNN decoder weights**

For our RNN decoder, **the input_shape cannot be defined, since we have multiple inputs**. The Keras docs specify no way to call model.build with multiple inputs.

So, to save the RNN decoder (see the sketch after this list):

1. Save each weight matrix in a .npy file.
2. Re-create the subclassed models, but this time use [Keras initializers](https://keras.io/initializers/) for each weight in each layer.
3. Instantiate the Encoder and Decoder classes as you normally would.
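
A minimal sketch of step 1 (this loop is not part of the repository; the file-naming pattern is inferred from what `mobilenet_inference.py` loads, and `decoder` is assumed to be the trained decoder instance):

```python
import os
import numpy as np

os.makedirs("decoder_layer_weights", exist_ok=True)
# one .npy file per weight array, keyed by layer index, layer name, and weight index
for i, layer in enumerate(decoder.layers):
    for j, w in enumerate(layer.get_weights()):
        np.save("decoder_layer_weights/layer_%s_%s_weights_%s.npy" % (i, layer.name, j), w)
```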


Original Notebook: https://colab.research.google.com/drive/12YtCH2X0pwIBBXPW0TXmeA520MyVv9AF#forceEdit=true&sandboxMode=true&scrollTo=8Q44tNQVRPFt

**What’s new?**

1. Saving the ImageNet weights in an .h5 file.
2. Saving the tokenizer in a .pkl file and loading it at inference time (see the sketch after this list).
3. Using MobileNetV2 for transfer learning instead of the Inception V3 used in the original notebook, since its speed is better.
4. Training on 50,000 images (instead of 30,000) for 20 epochs.
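
A minimal sketch of the tokenizer round-trip for item 2 (the saving side is assumed; the loading side matches `mobilenet_inference.py`, and `tokenizer` is the Keras tokenizer fitted on the training captions):

```python
import pickle

# at training time, after fitting the tokenizer:
with open('tokenizer.pickle', 'wb') as f:
    pickle.dump(tokenizer, f)

# at inference time:
with open('tokenizer.pickle', 'rb') as f:
    tokenizer = pickle.load(f)
```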

**Usage**:

If you just want to test on your own images: run mobilenet_inference.py.

If you want to train: run Mobilenet_train_50k.ipynb in Google Colaboratory.
Binary file added scenes/decoder_weights_50k.h5
Binary file added scenes/encoder_weights_50k.h5
193 changes: 193 additions & 0 deletions scenes/mobilenet_inference.py
@@ -0,0 +1,193 @@
# -*- coding: utf-8 -*-
"""mobilenet_inference.ipynb

Automatically generated by Colaboratory.

Original file is located at
https://colab.research.google.com/drive/1whK7r7_iNZYfVgfzGt22ns1lpixu5cTp
"""

# Commented out IPython magic to ensure Python compatibility.
# %tensorflow_version only exists in Colab.
# %tensorflow_version 2.x

import tensorflow as tf

import numpy as np
import pickle
from PIL import Image

print(tf.__version__)

# max length of the training caption sequences
max_length = 49

# Feel free to change these parameters according to your system's configuration

BATCH_SIZE = 64
BUFFER_SIZE = 1000
embedding_dim = 256
units = 512
len_tokenizer_word_index = 10253
vocab_size = len_tokenizer_word_index + 1
len_img_name_train = 40000
num_steps = len_img_name_train // BATCH_SIZE
# Shape of the feature map extracted from MobileNetV2 is (49, 1280)
# These two variables represent that shape
features_shape = 1280
attention_features_shape = 49

image_model = tf.keras.applications.MobileNetV2(include_top=False, weights='mobilenetv2_weights.hdf5')
new_input = image_model.input
hidden_layer = image_model.layers[-1].output
image_features_extract_model = tf.keras.Model(new_input, hidden_layer)

def evaluateTest(image):
    attention_plot = np.zeros((max_length, attention_features_shape))

    hidden = D.reset_state(batch_size=1)

    temp_input = tf.expand_dims(load_image(image)[0], 0)
    img_tensor_val = image_features_extract_model(temp_input)
    img_tensor_val = tf.reshape(img_tensor_val, (img_tensor_val.shape[0], -1, img_tensor_val.shape[3]))

    features = E(img_tensor_val)

    # restore the tokenizer that was fitted at training time
    with open('tokenizer.pickle', 'rb') as infile:
        tokenizer = pickle.load(infile)

    dec_input = tf.expand_dims([tokenizer.word_index["<start>"]], 0)
    result = []
    for i in range(max_length):
        predictions, hidden, attention_weights = D(dec_input, features, hidden)

        attention_plot[i] = tf.reshape(attention_weights, (-1,)).numpy()

        predicted_id = tf.argmax(predictions[0]).numpy()

        result.append(tokenizer.index_word[predicted_id])
        if tokenizer.index_word[predicted_id] == '<end>':
            return result, attention_plot

        dec_input = tf.expand_dims([predicted_id], 0)

    attention_plot = attention_plot[:len(result), :]
    return result, attention_plot


# Commented out IPython magic to ensure Python compatibility.
# %cd "drive/My Drive/safing_obilenet_10k"

class CNN_Encoder(tf.keras.Model):
    # The image features are already extracted by MobileNetV2;
    # this encoder passes those features through a fully connected layer
    def __init__(self, embedding_dim):
        super(CNN_Encoder, self).__init__()
        # shape after fc == (batch_size, 49, embedding_dim)
        self.fc = tf.keras.layers.Dense(embedding_dim)

    def call(self, x):
        x = self.fc(x)
        x = tf.nn.relu(x)
        return x

class BahdanauAttentionTest(tf.keras.Model):
    def __init__(self, units):
        super(BahdanauAttentionTest, self).__init__()
        C = tf.keras.initializers.Constant
        # restore the trained attention weights saved as .npy files
        w1, w2, w3, w4, w5, w6 = [np.load("decoder_layer_weights/layer_%s_%s_weights_%s.npy" % (4, "bahdanau_attention", j))
                                  for j in range(6)]
        self.W1 = tf.keras.layers.Dense(units, kernel_initializer=C(w1), bias_initializer=C(w2))
        self.W2 = tf.keras.layers.Dense(units, kernel_initializer=C(w3), bias_initializer=C(w4))
        self.V = tf.keras.layers.Dense(1, kernel_initializer=C(w5), bias_initializer=C(w6))

    def call(self, features, hidden):
        # hidden shape == (batch_size, hidden_size) -> (batch_size, 1, hidden_size)
        hidden_with_time_axis = tf.expand_dims(hidden, 1)
        score = tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time_axis))

        # attention_weights shape == (batch_size, 49, 1)
        # you get 1 at the last axis because you are applying score to self.V
        attention_weights = tf.nn.softmax(self.V(score), axis=1)

        # context_vector shape after sum == (batch_size, hidden_size)
        context_vector = attention_weights * features
        context_vector = tf.reduce_sum(context_vector, axis=1)

        return context_vector, attention_weights


class RNN_DecoderTest(tf.keras.Model):
    def __init__(self, embedding_dim, units, vocab_size):
        super(RNN_DecoderTest, self).__init__()
        self.units = units

        C = tf.keras.initializers.Constant
        # restore the trained decoder weights saved as .npy files
        w_emb = np.load("decoder_layer_weights/layer_%s_%s_weights_%s.npy" % (0, "embedding", 0))
        w_gru_1, w_gru_2, w_gru_3 = [np.load("decoder_layer_weights/layer_%s_%s_weights_%s.npy" % (1, "gru", j)) for j in range(3)]
        w1, w2 = [np.load("decoder_layer_weights/layer_%s_%s_weights_%s.npy" % (2, "dense_1", j)) for j in range(2)]
        w3, w4 = [np.load("decoder_layer_weights/layer_%s_%s_weights_%s.npy" % (3, "dense_2", j)) for j in range(2)]

        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim, embeddings_initializer=C(w_emb))
        self.gru = tf.keras.layers.GRU(self.units,
                                       return_sequences=True,
                                       return_state=True,
                                       kernel_initializer=C(w_gru_1),
                                       recurrent_initializer=C(w_gru_2),
                                       bias_initializer=C(w_gru_3))
        self.fc1 = tf.keras.layers.Dense(self.units, kernel_initializer=C(w1), bias_initializer=C(w2))
        self.fc2 = tf.keras.layers.Dense(vocab_size, kernel_initializer=C(w3), bias_initializer=C(w4))

        self.attention = BahdanauAttentionTest(self.units)

    def call(self, x, features, hidden):
        # defining attention as a separate model
        context_vector, attention_weights = self.attention(features, hidden)

        # x shape after passing through embedding == (batch_size, 1, embedding_dim)
        x = self.embedding(x)

        # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)

        # passing the concatenated vector to the GRU
        output, state = self.gru(x)

        # shape == (batch_size, max_length, hidden_size)
        x = self.fc1(output)

        # x shape == (batch_size * max_length, hidden_size)
        x = tf.reshape(x, (-1, x.shape[2]))

        # output shape == (batch_size * max_length, vocab)
        x = self.fc2(x)

        return x, state, attention_weights

    def reset_state(self, batch_size):
        return tf.zeros((batch_size, self.units))

def load_image(image_path):
    img = tf.io.read_file(image_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, (224, 224))
    img = tf.keras.applications.mobilenet_v2.preprocess_input(img)
    return img, image_path

if __name__ == '__main__':
    # rebuild the encoder and load its saved weights
    E = CNN_Encoder(embedding_dim)
    E.build(input_shape=(49, 1280))
    E.load_weights("encoder_weights_50k.h5")
    # the decoder restores its weights through the initializers above
    D = RNN_DecoderTest(embedding_dim, units, vocab_size)

    image_path = 'test.jpg'
    result, attention_plot = evaluateTest(image_path)
    print('Prediction Caption:', ' '.join(result))
    # plot_attention(image_path, result, attention_plot)
    # opening the image
    Image.open(image_path).show()



Binary file added scenes/mobilenetv2_weights.hdf5
Binary file added scenes/street3.jpg
Binary file added scenes/test.jpg
Binary file added scenes/test0.jpg
Binary file added scenes/test3.jpg
Binary file added scenes/test4.jpg
Binary file added scenes/test6.jpg
Binary file added scenes/test7.jpg
Binary file added scenes/test8.png
Binary file added scenes/tokenizer.pickle
Binary file added scenes/zag.jpg