From 1b7d70b3cff7b2d954450f72ec01bd3d948ec597 Mon Sep 17 00:00:00 2001
From: Gideon Dresdner
Date: Mon, 16 Dec 2019 13:47:25 +0100
Subject: [PATCH 1/2] small notation / latex changes

---
 boosting_bbvi_tutorial.ipynb | 46 +++++++++++++++++++++++-------------
 1 file changed, 30 insertions(+), 16 deletions(-)

diff --git a/boosting_bbvi_tutorial.ipynb b/boosting_bbvi_tutorial.ipynb
index a6fcaa5..6f2682a 100644
--- a/boosting_bbvi_tutorial.ipynb
+++ b/boosting_bbvi_tutorial.ipynb
@@ -31,26 +31,26 @@
     "\n",
     "Briefly, Variational Inference allows us to find approximations of probability densities which are intractable to compute analytically. For instance, one might have observed variables $\\textbf{x}$, latent variables $\\textbf{z}$ and a joint distribution $p(\\textbf{x}, \\textbf{z})$. One can then use Variational Inference to approximate $p(\\textbf{z}|\\textbf{x})$. To do so, one first chooses a set of tractable densities, a variational family, and then tries to find the element of this set which most closely approximates the target distribution $p(\\textbf{z}|\\textbf{x})$.\n",
     "This approximating density is found by maximizing the Evidence Lower BOund (ELBO):\n",
-    "$$ \\mathbb{E}_q[\\text{log} p(\\mathbf{x}, \\mathbf{z})] - \\mathbb{E}_q[\\text{log} q(\\mathbf{z})]$$\n",
+    "$$ \\mathbb{E}_q[\\log p(\\mathbf{x}, \\mathbf{z})] - \\mathbb{E}_q[\\log q(\\mathbf{z})]$$\n",
     "\n",
     "where $s(\\mathbf{z})$ is the approximating density.\n",
     "\n",
     "### Boosting Black Box Variational Inference \n",
     "\n",
     "In boosting black box Variational inference (BBBVI), we approximate the target density with a mixture of densities from the variational family:\n",
-    "$$q^T(\\mathbb{z}) = \\sum_{t=1}^T \\gamma_t s_t(\\mathbf{z})$$\n",
+    "$$q^t(\\mathbf{z}) = \\sum_{i=1}^t \\gamma_i s_i(\\mathbf{z})$$\n",
     "\n",
-    "$$\\text{where} \\sum_{t=1}^T \\gamma_t =1$$\n",
+    "$$\\text{where } \\sum_{i=1}^t \\gamma_i = 1$$\n",
     "\n",
     "and $s_t(\\mathbf{z})$ are elements of the variational family.\n",
     "\n",
     "The components of the approximation are selected greedily by maximising the so-called Residual ELBO (RELBO) with respect to the next component $s_{t+1}(\\mathbf{z})$:\n",
     "\n",
-    "$$\\mathbb{E}_s[\\text{log} p(\\mathbf{x},\\mathbf{z})] - \\lambda \\mathbb{E}_s[\\text{log}s(\\mathbf{z})] - \\mathbb{E}_s[\\text{log} q^t(\\mathbf{z})]$$\n",
+    "$$\\mathbb{E}_s[\\log p(\\mathbf{x},\\mathbf{z})] - \\lambda \\mathbb{E}_s[\\log s(\\mathbf{z})] - \\mathbb{E}_s[\\log q^t(\\mathbf{z})]$$\n",
     "\n",
     "Where the first two terms are the same as in the ELBO and the last term is the cross entropy between the next component $s_{t+1}(\\mathbf{z})$ and the current approximation $q^t(\\mathbf{z})$.\n",
     "\n",
-    "It's called *black box* Variational Inference because this optimization does not have to be tailored to the variational family which is being used. By setting $\\lambda$ (the regularization factor of the entropy term) to 1, standard SVI methods can be used to compute $\\mathbb{E}_s[\\text{log} p(\\mathbf{x}, \\mathbf{z})] - \\lambda \\mathbb{E}_s[\\text{log}s(\\mathbf{z})]$. See the explanation of [the section on the implementation of the RELBO](#the-relbo) below for an explanation of how we compute the term $- \\mathbb{E}_s[\\text{log} q^t(\\mathbf{z})]$. Imporantly, we do not need to make any additional assumptions about the variational family that's being used to ensure that this algorithm converges. \n",
+    "It's called *black box* Variational Inference because this optimization does not have to be tailored to the variational family which is being used. By setting $\\lambda$ (the regularization factor of the entropy term) to 1, standard SVI methods can be used to compute $\\mathbb{E}_s[\\log p(\\mathbf{x}, \\mathbf{z})] - \\lambda \\mathbb{E}_s[\\log s(\\mathbf{z})]$. See [the section on the implementation of the RELBO](#the-relbo) below for an explanation of how we compute the term $- \\mathbb{E}_s[\\log q^t(\\mathbf{z})]$. Importantly, we do not need to make any additional assumptions about the variational family that's being used to ensure that this algorithm converges. \n",
     "\n",
     "In [1], a number of different ways of finding the mixture weights $\\gamma_t$ are suggested, ranging from fixed step sizes based on the iteration to solving the optimisation problem of finding $\\gamma_t$ that will minimise the RELBO. Here, we used the fixed step size method.\n",
     "For more details on the theory behind boosting black box variational inference, please refer to [1]."
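Note: the hunks that follow show only the first line of each code cell. For orientation, a single mixture component $s_t$ in this tutorial is a Pyro guide over the latent variable, matching the `def guide(data, index):` signature visible below. The following is a minimal sketch under assumptions: the parameter names (`loc_{index}`, `scale_{index}`), the Normal family, and the latent site name `z` are illustrative, not necessarily the notebook's exact cell contents.

import torch
import pyro
import pyro.distributions as dist
from torch.distributions import constraints

def guide(data, index):
    # Tag parameters with the component index so that each greedy step t
    # fits a fresh component s_t while earlier components stay fixed.
    scale_q = pyro.param('scale_{}'.format(index), torch.tensor([1.0]),
                         constraint=constraints.positive)
    loc_q = pyro.param('loc_{}'.format(index), torch.tensor([0.0]))
    # A single sample site for the latent variable; this plays the role
    # of s_t(z). ('z' is an assumed site name.)
    pyro.sample('z', dist.Normal(loc_q, scale_q))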
\n", + "It's called *black box* Variational Inference because this optimization does not have to be tailored to the variational family which is being used. By setting $\\lambda$ (the regularization factor of the entropy term) to 1, standard SVI methods can be used to compute $\\mathbb{E}_s[\\log p(\\mathbf{x}, \\mathbf{z})] - \\lambda \\mathbb{E}_s[\\log s(\\mathbf{z})]$. See the explanation of [the section on the implementation of the RELBO](#the-relbo) below for an explanation of how we compute the term $- \\mathbb{E}_s[\\log q^t(\\mathbf{z})]$. Imporantly, we do not need to make any additional assumptions about the variational family that's being used to ensure that this algorithm converges. \n", "\n", "In [1], a number of different ways of finding the mixture weights $\\gamma_t$ are suggested, ranging from fixed step sizes based on the iteration to solving the optimisation problem of finding $\\gamma_t$ that will minimise the RELBO. Here, we used the fixed step size method.\n", "For more details on the theory behind boosting black box variational inference, please refer to [1]." @@ -79,7 +79,9 @@ { "cell_type": "code", "execution_count": 26, - "metadata": {}, + "metadata": { + "collapsed": true + }, "outputs": [], "source": [ "import os\n", @@ -117,7 +119,9 @@ { "cell_type": "code", "execution_count": 23, - "metadata": {}, + "metadata": { + "collapsed": true + }, "outputs": [], "source": [ "def model(data):\n", @@ -148,7 +152,9 @@ { "cell_type": "code", "execution_count": 6, - "metadata": {}, + "metadata": { + "collapsed": true + }, "outputs": [], "source": [ "def guide(data, index):\n", @@ -164,19 +170,21 @@ "### The RELBO \n", "\n", "We implement the RELBO as a function which can be passed to Pyro's SVI class in place of ELBO to find the approximation components $s_t(z)$. Recall that the RELBO has the following form:\n", - "$$\\mathbb{E}_s[\\text{log} p(\\mathbf{x},\\mathbf{z})] - \\lambda \\mathbb{E}_s[\\text{log}s(\\mathbf{z})] - \\mathbb{E}_s[\\text{log} q^t(\\mathbf{z})]$$\n", + "$$\\mathbb{E}_s[\\log p(\\mathbf{x},\\mathbf{z})] - \\lambda \\mathbb{E}_s[\\log s(\\mathbf{z})] - \\mathbb{E}_s[\\log q^t(\\mathbf{z})]$$\n", "\n", "Conveniently, this is very similar to the regular ELBO which allows us to reuse Pyro's existing ELBO. Specifically, we compute \n", - "$$E_s[\\text{log} p(x,z)] - \\lambda E_s[\\text{log}s]$$\n", + "$$E_s[\\log p(x,z)] - \\lambda E_s[\\log s]$$\n", "using Pyro's `Trace_ELBO` and then compute \n", - "$$ - E_s[\\text{log} q^t]$$\n", + "$$ - E_s[\\log q^t]$$\n", "using Poutine. For more information on how this works, we recommend going through the Pyro tutorials [on Poutine](https://pyro.ai/examples/effect_handlers.html) and [custom SVI objectives](https://pyro.ai/examples/custom_objectives.html)." 
From bf257e9d880a7ca7b97a19dedf50cb4c5693bba1 Mon Sep 17 00:00:00 2001
From: Gideon Dresdner
Date: Mon, 16 Dec 2019 13:51:20 +0100
Subject: [PATCH 2/2] more small latex changes

---
 boosting_bbvi_tutorial.ipynb | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/boosting_bbvi_tutorial.ipynb b/boosting_bbvi_tutorial.ipynb
index 6f2682a..d3092a5 100644
--- a/boosting_bbvi_tutorial.ipynb
+++ b/boosting_bbvi_tutorial.ipynb
@@ -173,9 +173,9 @@
     "$$\\mathbb{E}_s[\\log p(\\mathbf{x},\\mathbf{z})] - \\lambda \\mathbb{E}_s[\\log s(\\mathbf{z})] - \\mathbb{E}_s[\\log q^t(\\mathbf{z})]$$\n",
     "\n",
     "Conveniently, this is very similar to the regular ELBO which allows us to reuse Pyro's existing ELBO. Specifically, we compute \n",
-    "$$E_s[\\log p(x,z)] - \\lambda E_s[\\log s]$$\n",
+    "$$\\mathbb{E}_s[\\log p(x,z)] - \\lambda \\mathbb{E}_s[\\log s]$$\n",
     "using Pyro's `Trace_ELBO` and then compute \n",
-    "$$ - E_s[\\log q^t]$$\n",
+    "$$ - \\mathbb{E}_s[\\log q^t]$$\n",
     "using Poutine. For more information on how this works, we recommend going through the Pyro tutorials [on Poutine](https://pyro.ai/examples/effect_handlers.html) and [custom SVI objectives](https://pyro.ai/examples/custom_objectives.html)."
    ]
   },
@@ -221,7 +221,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Our implementation of the approximation $q^T(z) = \\sum_{t=1}^T \\gamma_t s_t(z)$ consists of a list of components, i.e. the guides from the greedy selection steps, and a list containing the mixture weights of the components. To sample from the approximation, we thus first sample a component according to the mixture weights. In a second step, we draw a sample from the corresponding component.\n",
+    "Our implementation of the approximation $q^t(z) = \\sum_{i=1}^t \\gamma_i s_i(z)$ consists of a list of components, i.e. the guides from the greedy selection steps, and a list containing the mixture weights of the components. To sample from the approximation, we thus first sample a component according to the mixture weights. In a second step, we draw a sample from the corresponding component.\n",
    "\n",
    "Similarly as with the guide, we use `partial(approximation, components=components, weights=weights)` to get an approximation function which has the same signature as the model."
   ]
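Note: the `approximation` function, whose first line appears in the first patch, implements exactly the two-step sampling procedure this markdown cell describes. A minimal sketch follows; the sample site name `'assignment'` is an assumption, everything else is taken from the diff.

from functools import partial

import pyro
import pyro.distributions as dist

def approximation(data, components, weights):
    # Step 1: pick a component index i with probability gamma_i.
    assignment = pyro.sample('assignment', dist.Categorical(weights))
    # Step 2: draw from the chosen component s_i (a previously fitted guide).
    return components[assignment](data)

# Partial application, as in the hunk above, yields a callable with the
# same signature as the model:
# wrapped_approximation = partial(approximation, components=components,
#                                 weights=weights)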