From 1b7d70b3cff7b2d954450f72ec01bd3d948ec597 Mon Sep 17 00:00:00 2001
From: Gideon Dresdner
Date: Mon, 16 Dec 2019 13:47:25 +0100
Subject: [PATCH 1/2] small notation / latex changes

---
 boosting_bbvi_tutorial.ipynb | 46 +++++++++++++++++++++++-------------
 1 file changed, 30 insertions(+), 16 deletions(-)

diff --git a/boosting_bbvi_tutorial.ipynb b/boosting_bbvi_tutorial.ipynb
index a6fcaa5..6f2682a 100644
--- a/boosting_bbvi_tutorial.ipynb
+++ b/boosting_bbvi_tutorial.ipynb
@@ -31,26 +31,26 @@
     "\n",
     "Briefly, Variational Inference allows us to find approximations of probability densities which are intractable to compute analytically. For instance, one might have observed variables $\\textbf{x}$, latent variables $\\textbf{z}$ and a joint distribution $p(\\textbf{x}, \\textbf{z})$. One can then use Variational Inference to approximate $p(\\textbf{z}|\\textbf{x})$. To do so, one first chooses a set of tractable densities, a variational family, and then tries to find the element of this set which most closely approximates the target distribution $p(\\textbf{z}|\\textbf{x})$.\n",
     "This approximating density is found by maximizing the Evidence Lower BOund (ELBO):\n",
-    "$$ \\mathbb{E}_q[\\text{log} p(\\mathbf{x}, \\mathbf{z})] - \\mathbb{E}_q[\\text{log} q(\\mathbf{z})]$$\n",
+    "$$ \\mathbb{E}_q[\\log p(\\mathbf{x}, \\mathbf{z})] - \\mathbb{E}_q[\\log q(\\mathbf{z})]$$\n",
     "\n",
     "where $s(\\mathbf{z})$ is the approximating density.\n",
     "\n",
     "### Boosting Black Box Variational Inference \n",
     "\n",
     "In boosting black box Variational inference (BBBVI), we approximate the target density with a mixture of densities from the variational family:\n",
-    "$$q^T(\\mathbb{z}) = \\sum_{t=1}^T \\gamma_t s_t(\\mathbf{z})$$\n",
+    "$$q^t(\\mathbf{z}) = \\sum_{i=1}^t \\gamma_i s_i(\\mathbf{z})$$\n",
     "\n",
-    "$$\\text{where} \\sum_{t=1}^T \\gamma_t =1$$\n",
+    "$$\\text{where } \\sum_{i=1}^t \\gamma_i = 1$$\n",
     "\n",
     "and $s_t(\\mathbf{z})$ are elements of the variational family.\n",
     "\n",
     "The components of the approximation are selected greedily by maximising the so-called Residual ELBO (RELBO) with respect to the next component $s_{t+1}(\\mathbf{z})$:\n",
     "\n",
-    "$$\\mathbb{E}_s[\\text{log} p(\\mathbf{x},\\mathbf{z})] - \\lambda \\mathbb{E}_s[\\text{log}s(\\mathbf{z})] - \\mathbb{E}_s[\\text{log} q^t(\\mathbf{z})]$$\n",
+    "$$\\mathbb{E}_s[\\log p(\\mathbf{x},\\mathbf{z})] - \\lambda \\mathbb{E}_s[\\log s(\\mathbf{z})] - \\mathbb{E}_s[\\log q^t(\\mathbf{z})]$$\n",
     "\n",
     "Where the first two terms are the same as in the ELBO and the last term is the cross entropy between the next component $s_{t+1}(\\mathbf{z})$ and the current approximation $q^t(\\mathbf{z})$.\n",
     "\n",
-    "It's called *black box* Variational Inference because this optimization does not have to be tailored to the variational family which is being used. By setting $\\lambda$ (the regularization factor of the entropy term) to 1, standard SVI methods can be used to compute $\\mathbb{E}_s[\\text{log} p(\\mathbf{x}, \\mathbf{z})] - \\lambda \\mathbb{E}_s[\\text{log}s(\\mathbf{z})]$. See the explanation of [the section on the implementation of the RELBO](#the-relbo) below for an explanation of how we compute the term $- \\mathbb{E}_s[\\text{log} q^t(\\mathbf{z})]$. Imporantly, we do not need to make any additional assumptions about the variational family that's being used to ensure that this algorithm converges. \n",
+    "It's called *black box* Variational Inference because this optimization does not have to be tailored to the variational family which is being used. By setting $\\lambda$ (the regularization factor of the entropy term) to 1, standard SVI methods can be used to compute $\\mathbb{E}_s[\\log p(\\mathbf{x}, \\mathbf{z})] - \\lambda \\mathbb{E}_s[\\log s(\\mathbf{z})]$. See [the section on the implementation of the RELBO](#the-relbo) below for an explanation of how we compute the term $- \\mathbb{E}_s[\\log q^t(\\mathbf{z})]$. Importantly, we do not need to make any additional assumptions about the variational family that's being used to ensure that this algorithm converges. \n",
     "\n",
     "In [1], a number of different ways of finding the mixture weights $\\gamma_t$ are suggested, ranging from fixed step sizes based on the iteration to solving the optimisation problem of finding $\\gamma_t$ that will minimise the RELBO. Here, we used the fixed step size method.\n",
     "For more details on the theory behind boosting black box variational inference, please refer to [1]."
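Note: the hunks that follow show only the first line of each code cell. For orientation, a single mixture component $s_t$ in this tutorial is a Pyro guide over the latent variable, matching the `def guide(data, index):` signature visible below. The following is a minimal sketch under assumptions: the parameter names (`loc_{index}`, `scale_{index}`), the Normal family, and the latent site name `z` are illustrative, not necessarily the notebook's exact cell contents.

import torch
import pyro
import pyro.distributions as dist
from torch.distributions import constraints

def guide(data, index):
    # Tag parameters with the component index so that each greedy step t
    # fits a fresh component s_t while earlier components stay fixed.
    scale_q = pyro.param('scale_{}'.format(index), torch.tensor([1.0]),
                         constraint=constraints.positive)
    loc_q = pyro.param('loc_{}'.format(index), torch.tensor([0.0]))
    # A single sample site for the latent variable; this plays the role
    # of s_t(z). ('z' is an assumed site name.)
    pyro.sample('z', dist.Normal(loc_q, scale_q))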
\n", + "It's called *black box* Variational Inference because this optimization does not have to be tailored to the variational family which is being used. By setting $\\lambda$ (the regularization factor of the entropy term) to 1, standard SVI methods can be used to compute $\\mathbb{E}_s[\\log p(\\mathbf{x}, \\mathbf{z})] - \\lambda \\mathbb{E}_s[\\log s(\\mathbf{z})]$. See the explanation of [the section on the implementation of the RELBO](#the-relbo) below for an explanation of how we compute the term $- \\mathbb{E}_s[\\log q^t(\\mathbf{z})]$. Imporantly, we do not need to make any additional assumptions about the variational family that's being used to ensure that this algorithm converges. \n", "\n", "In [1], a number of different ways of finding the mixture weights $\\gamma_t$ are suggested, ranging from fixed step sizes based on the iteration to solving the optimisation problem of finding $\\gamma_t$ that will minimise the RELBO. Here, we used the fixed step size method.\n", "For more details on the theory behind boosting black box variational inference, please refer to [1]." @@ -79,7 +79,9 @@ { "cell_type": "code", "execution_count": 26, - "metadata": {}, + "metadata": { + "collapsed": true + }, "outputs": [], "source": [ "import os\n", @@ -117,7 +119,9 @@ { "cell_type": "code", "execution_count": 23, - "metadata": {}, + "metadata": { + "collapsed": true + }, "outputs": [], "source": [ "def model(data):\n", @@ -148,7 +152,9 @@ { "cell_type": "code", "execution_count": 6, - "metadata": {}, + "metadata": { + "collapsed": true + }, "outputs": [], "source": [ "def guide(data, index):\n", @@ -164,19 +170,21 @@ "### The RELBO \n", "\n", "We implement the RELBO as a function which can be passed to Pyro's SVI class in place of ELBO to find the approximation components $s_t(z)$. Recall that the RELBO has the following form:\n", - "$$\\mathbb{E}_s[\\text{log} p(\\mathbf{x},\\mathbf{z})] - \\lambda \\mathbb{E}_s[\\text{log}s(\\mathbf{z})] - \\mathbb{E}_s[\\text{log} q^t(\\mathbf{z})]$$\n", + "$$\\mathbb{E}_s[\\log p(\\mathbf{x},\\mathbf{z})] - \\lambda \\mathbb{E}_s[\\log s(\\mathbf{z})] - \\mathbb{E}_s[\\log q^t(\\mathbf{z})]$$\n", "\n", "Conveniently, this is very similar to the regular ELBO which allows us to reuse Pyro's existing ELBO. Specifically, we compute \n", - "$$E_s[\\text{log} p(x,z)] - \\lambda E_s[\\text{log}s]$$\n", + "$$E_s[\\log p(x,z)] - \\lambda E_s[\\log s]$$\n", "using Pyro's `Trace_ELBO` and then compute \n", - "$$ - E_s[\\text{log} q^t]$$\n", + "$$ - E_s[\\log q^t]$$\n", "using Poutine. For more information on how this works, we recommend going through the Pyro tutorials [on Poutine](https://pyro.ai/examples/effect_handlers.html) and [custom SVI objectives](https://pyro.ai/examples/custom_objectives.html)." 
From bf257e9d880a7ca7b97a19dedf50cb4c5693bba1 Mon Sep 17 00:00:00 2001
From: Gideon Dresdner
Date: Mon, 16 Dec 2019 13:51:20 +0100
Subject: [PATCH 2/2] more small latex changes

---
 boosting_bbvi_tutorial.ipynb | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/boosting_bbvi_tutorial.ipynb b/boosting_bbvi_tutorial.ipynb
index 6f2682a..d3092a5 100644
--- a/boosting_bbvi_tutorial.ipynb
+++ b/boosting_bbvi_tutorial.ipynb
@@ -173,9 +173,9 @@
     "$$\\mathbb{E}_s[\\log p(\\mathbf{x},\\mathbf{z})] - \\lambda \\mathbb{E}_s[\\log s(\\mathbf{z})] - \\mathbb{E}_s[\\log q^t(\\mathbf{z})]$$\n",
     "\n",
     "Conveniently, this is very similar to the regular ELBO which allows us to reuse Pyro's existing ELBO. Specifically, we compute \n",
-    "$$E_s[\\log p(x,z)] - \\lambda E_s[\\log s]$$\n",
+    "$$\\mathbb{E}_s[\\log p(x,z)] - \\lambda \\mathbb{E}_s[\\log s]$$\n",
     "using Pyro's `Trace_ELBO` and then compute \n",
-    "$$ - E_s[\\log q^t]$$\n",
+    "$$ - \\mathbb{E}_s[\\log q^t]$$\n",
     "using Poutine. For more information on how this works, we recommend going through the Pyro tutorials [on Poutine](https://pyro.ai/examples/effect_handlers.html) and [custom SVI objectives](https://pyro.ai/examples/custom_objectives.html)."
    ]
   },
@@ -221,7 +221,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Our implementation of the approximation $q^T(z) = \\sum_{t=1}^T \\gamma_t s_t(z)$ consists of a list of components, i.e. the guides from the greedy selection steps, and a list containing the mixture weights of the components. To sample from the approximation, we thus first sample a component according to the mixture weights. In a second step, we draw a sample from the corresponding component.\n",
+    "Our implementation of the approximation $q^t(z) = \\sum_{i=1}^t \\gamma_i s_i(z)$ consists of a list of components, i.e. the guides from the greedy selection steps, and a list containing the mixture weights of the components. To sample from the approximation, we thus first sample a component according to the mixture weights. In a second step, we draw a sample from the corresponding component.\n",
    "\n",
    "Similarly as with the guide, we use `partial(approximation, components=components, weights=weights)` to get an approximation function which has the same signature as the model."
   ]
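Note: the `approximation` function, whose first line appears in the first patch, implements exactly the two-step sampling procedure this markdown cell describes. A minimal sketch follows; the sample site name `'assignment'` is an assumption, everything else is taken from the diff.

from functools import partial

import pyro
import pyro.distributions as dist

def approximation(data, components, weights):
    # Step 1: pick a component index i with probability gamma_i.
    assignment = pyro.sample('assignment', dist.Categorical(weights))
    # Step 2: draw from the chosen component s_i (a previously fitted guide).
    return components[assignment](data)

# Partial application, as in the hunk above, yields a callable with the
# same signature as the model:
# wrapped_approximation = partial(approximation, components=components,
#                                 weights=weights)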