inference with mobilenet #16

Open
wants to merge 5 commits into base: devel
1,577 changes: 1,577 additions & 0 deletions scenes/Mobilenet_train_50k.ipynb


75 changes: 74 additions & 1 deletion scenes/README.md
@@ -1 +1,74 @@
# Scene description / image captioning
**Requirements**:

- TensorFlow 2.0
- Pickle
- NumPy
- PIL

**Description**

This code defines the model using subclassed Keras models. Sequential and Functional models are data structures that represent a DAG of layers; as such, they can be safely serialized and deserialized.

A subclassed model differs in that it is not a data structure but a piece of code: the architecture of the model is defined via the body of the call method. This means that the architecture of the model cannot be safely serialized. **To load a model, you'll need to have access to the code that created it (the code of the model subclass)**. Alternatively, you could serialize this code as bytecode (e.g. via pickling), but that's unsafe and generally not portable.

For more information about these differences, see the article ["What are Symbolic and Imperative APIs in TensorFlow 2.0?"](https://medium.com/tensorflow/what-are-symbolic-and-imperative-apis-in-tensorflow-2-0-dfccecb01021).
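
A minimal sketch of the contrast (illustrative only, not code from this repository; the `(49, 1280)` shape matches the encoder input used later):

```python
import tensorflow as tf

# Functional: a DAG of layers, i.e. a serializable data structure
inputs = tf.keras.Input(shape=(49, 1280))
outputs = tf.keras.layers.Dense(256, activation='relu')(inputs)
functional_model = tf.keras.Model(inputs, outputs)   # model.save() works here

# Subclassed: the architecture lives only in Python code
class MyEncoder(tf.keras.Model):
    def __init__(self):
        super(MyEncoder, self).__init__()
        self.fc = tf.keras.layers.Dense(256)

    def call(self, x):
        return tf.nn.relu(self.fc(x))
```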

**Saving the model**:
First of all, a subclassed model that has never been used cannot be saved.

That's because a subclassed model needs to be called on some data in order to create its weights.

Until the model has been called, it does not know the shape and dtype of the input data it should be expecting, and thus cannot create its weight variables. You may remember that in a Functional model, the shape and dtype of the inputs are specified in advance (via keras.Input(...)); that's why Functional models have a state as soon as they're instantiated.
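
A minimal sketch of this behavior, reusing the `MyEncoder` sketch above (the dummy shape and checkpoint name are illustrative):

```python
model = MyEncoder()
# model.save_weights(...) here would fail: no weights exist yet
dummy = tf.zeros((1, 49, 1280))   # a dummy batch with the expected shape
_ = model(dummy)                  # the first call creates the weight variables
model.save_weights("encoder_checkpoint.h5")   # now saving works
```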

There are three approaches to saving a subclassed model. In this repository, we used [save_weights](https://www.tensorflow.org/guide/keras/save_and_serialize) to create a TensorFlow checkpoint, which will contain the value of all variables associated with the model:

1. The layers' weights
2. The optimizer's state
3. Any variables associated with stateful model metrics (if any)

To restore your model, you will need access to the code that created the model object.
We have three models to save: 1) the ImageNet (MobileNetV2) weights, 2) the CNN encoder, 3) the RNN decoder.

Since these are subclassed Keras models and not Functional or Sequential ones, I could not use model.save and model.load directly.

Instead I had to use **model.save_weights** and **model.load_weights**.

**Saving the encoder model**

model.load_weights can be called only after model.build, and model.build requires an input_shape parameter, which has to be a tuple (not a list of tuples). For the CNN encoder, the input_shape is (49, 1280).
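
This is exactly how `mobilenet_inference.py` (included below) restores the encoder:

```python
E = CNN_Encoder(embedding_dim)       # embedding_dim = 256
E.build(input_shape=(49, 1280))      # a tuple, not a list of tuples
E.load_weights("encoder_weights_50k.h5")
```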

**Saving the attention weights and the RNN decoder weights**

For our RNN decoder, **the input_shape cannot be defined, since we have multiple inputs**. The Keras docs specify no way to call model.build with multiple inputs.

So, to save the RNN decoder (see the sketch after this list):

1. Save each weight matrix in a .npy file.
2. Re-create the subclassed models, but this time use [Keras initializers](https://keras.io/initializers/) for each weight in each layer.
3. Instantiate the Encoder and Decoder classes as you normally would.
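
A minimal sketch of step 1 (this loop is not part of the repository; the file-naming pattern is inferred from what `mobilenet_inference.py` loads, and `decoder` is assumed to be the trained decoder instance):

```python
import os
import numpy as np

os.makedirs("decoder_layer_weights", exist_ok=True)
# one .npy file per weight array, keyed by layer index, layer name, and weight index
for i, layer in enumerate(decoder.layers):
    for j, w in enumerate(layer.get_weights()):
        np.save("decoder_layer_weights/layer_%s_%s_weights_%s.npy" % (i, layer.name, j), w)
```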


Original Notebook: https://colab.research.google.com/drive/12YtCH2X0pwIBBXPW0TXmeA520MyVv9AF#forceEdit=true&sandboxMode=true&scrollTo=8Q44tNQVRPFt

**What’s new?**

1. Saving the ImageNet weights in an .h5 file.
2. Saving the tokenizer in a .pkl file and loading it at inference time (see the sketch after this list).
3. Using MobileNetV2 for transfer learning instead of the Inception V3 used in the original notebook, since its speed is better.
4. Training on 50,000 images (instead of 30,000) for 20 epochs.
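
A minimal sketch of the tokenizer round-trip for item 2 (the saving side is assumed; the loading side matches `mobilenet_inference.py`, and `tokenizer` is the Keras tokenizer fitted on the training captions):

```python
import pickle

# at training time, after fitting the tokenizer:
with open('tokenizer.pickle', 'wb') as f:
    pickle.dump(tokenizer, f)

# at inference time:
with open('tokenizer.pickle', 'rb') as f:
    tokenizer = pickle.load(f)
```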

**Usage**:

If you just want to test on your own images: run mobilenet_inference.py.

If you want to train: run Mobilenet_train_50k.ipynb in Google Colaboratory.
Binary file added scenes/decoder_weights_50k.h5
Binary file added scenes/encoder_weights_50k.h5
193 changes: 193 additions & 0 deletions scenes/mobilenet_inference.py
@@ -0,0 +1,193 @@
# -*- coding: utf-8 -*-
"""mobilenet_inference.ipynb

Automatically generated by Colaboratory.

Original file is located at
https://colab.research.google.com/drive/1whK7r7_iNZYfVgfzGt22ns1lpixu5cTp
"""

# Commented out IPython magic to ensure Python compatibility.
# %tensorflow_version only exists in Colab.
# %tensorflow_version 2.x

import tensorflow as tf

import numpy as np
import pickle
from PIL import Image

print(tf.__version__)

# max length of the training caption sequences
max_length = 49

# Feel free to change these parameters according to your system's configuration

BATCH_SIZE = 64
BUFFER_SIZE = 1000
embedding_dim = 256
units = 512
len_tokenizer_word_index = 10253
vocab_size = len_tokenizer_word_index + 1
len_img_name_train = 40000
num_steps = len_img_name_train // BATCH_SIZE
# Shape of the feature map extracted from MobileNetV2 is (49, 1280)
# These two variables represent that shape
features_shape = 1280
attention_features_shape = 49

image_model = tf.keras.applications.MobileNetV2(include_top=False, weights='mobilenetv2_weights.hdf5')
new_input = image_model.input
hidden_layer = image_model.layers[-1].output
image_features_extract_model = tf.keras.Model(new_input, hidden_layer)

def evaluateTest(image):
    attention_plot = np.zeros((max_length, attention_features_shape))

    hidden = D.reset_state(batch_size=1)

    temp_input = tf.expand_dims(load_image(image)[0], 0)
    img_tensor_val = image_features_extract_model(temp_input)
    img_tensor_val = tf.reshape(img_tensor_val, (img_tensor_val.shape[0], -1, img_tensor_val.shape[3]))

    features = E(img_tensor_val)

    # restore the tokenizer that was fitted at training time
    with open('tokenizer.pickle', 'rb') as infile:
        tokenizer = pickle.load(infile)

    dec_input = tf.expand_dims([tokenizer.word_index["<start>"]], 0)
    result = []
    for i in range(max_length):
        predictions, hidden, attention_weights = D(dec_input, features, hidden)

        attention_plot[i] = tf.reshape(attention_weights, (-1,)).numpy()

        predicted_id = tf.argmax(predictions[0]).numpy()

        result.append(tokenizer.index_word[predicted_id])
        if tokenizer.index_word[predicted_id] == '<end>':
            return result, attention_plot

        dec_input = tf.expand_dims([predicted_id], 0)

    attention_plot = attention_plot[:len(result), :]
    return result, attention_plot


# Commented out IPython magic to ensure Python compatibility.
# %cd "drive/My Drive/safing_obilenet_10k"

class CNN_Encoder(tf.keras.Model):
    # The image features are already extracted by MobileNetV2;
    # this encoder passes those features through a fully connected layer
    def __init__(self, embedding_dim):
        super(CNN_Encoder, self).__init__()
        # shape after fc == (batch_size, 49, embedding_dim)
        self.fc = tf.keras.layers.Dense(embedding_dim)

    def call(self, x):
        x = self.fc(x)
        x = tf.nn.relu(x)
        return x

class BahdanauAttentionTest(tf.keras.Model):
    def __init__(self, units):
        super(BahdanauAttentionTest, self).__init__()
        C = tf.keras.initializers.Constant
        # restore the trained attention weights saved as .npy files
        w1, w2, w3, w4, w5, w6 = [np.load("decoder_layer_weights/layer_%s_%s_weights_%s.npy" % (4, "bahdanau_attention", j))
                                  for j in range(6)]
        self.W1 = tf.keras.layers.Dense(units, kernel_initializer=C(w1), bias_initializer=C(w2))
        self.W2 = tf.keras.layers.Dense(units, kernel_initializer=C(w3), bias_initializer=C(w4))
        self.V = tf.keras.layers.Dense(1, kernel_initializer=C(w5), bias_initializer=C(w6))

    def call(self, features, hidden):
        # hidden shape == (batch_size, hidden_size) -> (batch_size, 1, hidden_size)
        hidden_with_time_axis = tf.expand_dims(hidden, 1)
        score = tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time_axis))

        # attention_weights shape == (batch_size, 49, 1)
        # you get 1 at the last axis because you are applying score to self.V
        attention_weights = tf.nn.softmax(self.V(score), axis=1)

        # context_vector shape after sum == (batch_size, hidden_size)
        context_vector = attention_weights * features
        context_vector = tf.reduce_sum(context_vector, axis=1)

        return context_vector, attention_weights


class RNN_DecoderTest(tf.keras.Model):
    def __init__(self, embedding_dim, units, vocab_size):
        super(RNN_DecoderTest, self).__init__()
        self.units = units

        C = tf.keras.initializers.Constant
        # restore the trained decoder weights saved as .npy files
        w_emb = np.load("decoder_layer_weights/layer_%s_%s_weights_%s.npy" % (0, "embedding", 0))
        w_gru_1, w_gru_2, w_gru_3 = [np.load("decoder_layer_weights/layer_%s_%s_weights_%s.npy" % (1, "gru", j)) for j in range(3)]
        w1, w2 = [np.load("decoder_layer_weights/layer_%s_%s_weights_%s.npy" % (2, "dense_1", j)) for j in range(2)]
        w3, w4 = [np.load("decoder_layer_weights/layer_%s_%s_weights_%s.npy" % (3, "dense_2", j)) for j in range(2)]

        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim, embeddings_initializer=C(w_emb))
        self.gru = tf.keras.layers.GRU(self.units,
                                       return_sequences=True,
                                       return_state=True,
                                       kernel_initializer=C(w_gru_1),
                                       recurrent_initializer=C(w_gru_2),
                                       bias_initializer=C(w_gru_3))
        self.fc1 = tf.keras.layers.Dense(self.units, kernel_initializer=C(w1), bias_initializer=C(w2))
        self.fc2 = tf.keras.layers.Dense(vocab_size, kernel_initializer=C(w3), bias_initializer=C(w4))

        self.attention = BahdanauAttentionTest(self.units)

    def call(self, x, features, hidden):
        # defining attention as a separate model
        context_vector, attention_weights = self.attention(features, hidden)

        # x shape after passing through embedding == (batch_size, 1, embedding_dim)
        x = self.embedding(x)

        # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)

        # passing the concatenated vector to the GRU
        output, state = self.gru(x)

        # shape == (batch_size, max_length, hidden_size)
        x = self.fc1(output)

        # x shape == (batch_size * max_length, hidden_size)
        x = tf.reshape(x, (-1, x.shape[2]))

        # output shape == (batch_size * max_length, vocab)
        x = self.fc2(x)

        return x, state, attention_weights

    def reset_state(self, batch_size):
        return tf.zeros((batch_size, self.units))

def load_image(image_path):
    img = tf.io.read_file(image_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, (224, 224))
    img = tf.keras.applications.mobilenet_v2.preprocess_input(img)
    return img, image_path

if __name__ == '__main__':
    # rebuild the encoder and load its saved weights
    E = CNN_Encoder(embedding_dim)
    E.build(input_shape=(49, 1280))
    E.load_weights("encoder_weights_50k.h5")
    # the decoder restores its weights through the initializers above
    D = RNN_DecoderTest(embedding_dim, units, vocab_size)

    image_path = 'test.jpg'
    result, attention_plot = evaluateTest(image_path)
    print('Prediction Caption:', ' '.join(result))
    # plot_attention(image_path, result, attention_plot)
    # opening the image
    Image.open(image_path).show()



Binary file added scenes/mobilenetv2_weights.hdf5
Binary file added scenes/street3.jpg
Binary file added scenes/test.jpg
Binary file added scenes/test0.jpg
Binary file added scenes/test3.jpg
Binary file added scenes/test4.jpg
Binary file added scenes/test6.jpg
Binary file added scenes/test7.jpg
Binary file added scenes/test8.png
Binary file added scenes/tokenizer.pickle
Binary file added scenes/zag.jpg