#separator:tab
#html:true
#notetype column:1
#deck column:2
#tags column:5
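Basic	Part III Notes::Astrostatistics	Characteristic function of the Univariate Normal RV	\[\phi_X(t) = e^{i\mu t - \frac{1}{2} \sigma^2t^2}\] (the MGF is the same with \(it \to t\): \(M_X(t) = e^{\mu t + \frac{1}{2} \sigma^2t^2}\))	Astrostatistics PartIIINotes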
Basic	Part III Notes::Astrostatistics	Easiest way to find the distribution of the sum of two independent random variables	Use the property that the MGF (or characteristic function) of a sum of independent RVs is the product of their MGFs: \[\phi_{X+Y}(t) = \phi_X(t) \, \phi_Y(t)\]	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	Variance of the sample mean	\[{\mathrm{Var}}(\hat \mu) = \frac{\sigma^2}{n}\]	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Fisher Information Matrix	\[\bigl[\mathcal{I}(\theta)\bigr]_{i, j} =   \operatorname{E}\left[\left.     \left(\frac{\partial}{\partial\theta_i} \log f(X;\theta)\right)     \left(\frac{\partial}{\partial\theta_j} \log f(X;\theta)\right)   \,\, \right| \,\,\theta\right]\] Or, equivalently, given some regularity conditions, \[ \bigl[\mathcal{I}(\theta) \bigr]_{i, j} =   -\operatorname{E}\left[\left.     \frac{\partial^2}{\partial\theta_i\, \partial\theta_j} \log f(X;\theta)    \,\, \right| \,\, \theta\right]\,\]	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Univariate Unbiased Cramer-Rao Lower Bound	Given an unbiased estimator \(\hat \theta\) for \(\theta\), we have: \[{\mathrm{Var}}(\hat \theta) \geq \frac{1}{I(\theta)}\] (under regularity conditions, MLEs achieve this bound asymptotically)	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Multivariate Cramer-Rao Lower Bound	For any unbiased estimator \(\vec T\) of \(\vec \theta\), \[ {\mathrm{Cov}} (\vec T) - I(\theta){^{-1}} \] must be positive-semi-definite. In particular, this means that \[{\mathrm{Var}}(T_i) \geq [I(\vec\theta){^{-1}}]_{ii}\] for any index \(i\).	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	Definition of CDF	\[F(x) = P(X\leq x) = \int_{-\infty}^x f(u)\,du\]	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Characteristic Function (Statistics)	\[\varphi_X(t) = {\mathbb{E}}(e^{itX})\]	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> The Delta Method	Just a second-order Taylor approximation with a statistics coat of paint: \[{\mathbb{E}}[g(X)] \approx g({\mathbb{E}} [X]) + \frac{1}{2} g''({\mathbb{E}}[X]) {\mathrm{Var}}(X)\]	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Univariate General Cramer-Rao Lower Bound	Given any estimator \(\hat \theta\) for \(\theta\) (biased or not), we have: \[{\mathrm{Var}}(\hat \theta) \geq  \frac{\left(1 + \frac{d}{d\theta} {\mathrm{Bias}}(\hat \theta)\right)^2}{I(\theta)}\]	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	How do you solve a problem with selection effects?	Use Bayes' Theorem, conditioning on the selection function (usually a step function). With Gaussian errors, you should end up with a normal CDF as the normalizing factor.	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	State the Frequentist model of statistics (in the context of a regression)	We have some true variables we want to compute, but we can only view data that is confounded with some source of error (following a distribution)<br> <br> So the underlying (latent) variables we seek to understand are deterministic, there's just a bunch of stochastic error complicating matters.	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	State the Bayesian model of statistics (in the context of a regression)	Interpret probability as the degree of certainty in an event rather than its long-run chance. We start with some guess as to how we think the parameter is distributed, and use the data to update that prior into a more accurate posterior. <br> <br> The parameters are themselves random variables! We estimate their distribution, not just a single value.	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Monte Carlo Integration<br>(for integrating over some interval I)	\[\hat I = \frac{1}{m} \sum_{i=1}^m f(\theta_i)\] where the \(\theta_i\) are \(m\) draws from the target distribution (e.g. the posterior) and \(f(\theta)\) is some data processing to estimate some statistic (mean, variance, etc)	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	Monte Carlo integrator for the posterior mean	\[f(\theta) = \theta\]	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	Monte Carlo integrator for the posterior variance	\[f(\theta) = (\theta - {\mathbb{E}}[\theta {\,|\,} D ])^2\]	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	Monte Carlo integrator in an interval \([a,b]\)	\(f(\theta) = I_{[a, b]}(\theta)\)	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	Variance of a Monte Carlo Integrator	\[{\mathrm{Var}}(\hat I) = \frac{1}{m} {\mathrm{Var}}[f(\theta)]\] Note that this is independent of the dimension of \(\theta\)! <br>The Monte Carlo error is just the square root of this (the standard deviation)<br> <br>Note that \({\mathrm{Var}}[f(\theta)]\) can itself be estimated with the sample variance: \[\hat {\mathrm{Var}}(\{ f(\theta_i) \}) = \frac{1}{m-1} \sum_{i=1}^m(f(\theta_i) - \hat I)^2\]	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Direct Sampling method for estimating a posterior distribution	In the case where a posterior distribution can be broken down into the product of named distributions, simply draw from one distribution, feed into the next, etc, and do this to get a bunch of samples. <br> <br> Note that it's easy to get the marginals this way: just ignore all but one value!	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Kernel Density Estimation	Creates a smooth histogram from a bunch of samples \(\theta_i\): <br>\[\hat P(\theta {\,|\,} D) = \frac{1}{m} \sum_{i=1}^m  N(\theta {\,|\,} \theta_i, b_w^2)\] <br>where \(b_w\) is the bandwidth.	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	Silverman's Rule of Thumb	\[b_w = \left( \frac{4 \hat \sigma^5}{3m} \right)^{1/5}\] <br>where \(\hat \sigma\) is the estimated sample standard deviation, \(\hat \sigma^2 = \hat {\mathrm{Var}}(\{ \theta_i \})\)	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> The Metropolis Algorithm	1. Pick some \(\mu_0\)<br> 2. Choose a new \(\mu_{\text{prop}} \sim {\mathrm{N}}(\mu_{\text{prev}}, \tau^2)\)<br> Note: the jump distribution has to be symmetric: \(J(a|b) = J(b|a)\)<br> 3. Accept with probability \(\min(1, r),\,\, r = \frac{{\mathbb{P}}(\mu_{\text{prop}}{\,|\,}\vec y)}{{\mathbb{P}}(\mu_{\text{prev}}{\,|\,}\vec y)}\)<br> 4. Repeat 2 and 3 until you have enough samples.<br> <br> This works for higher dimensions as well, just draw from a multivariate Gaussian jump distribution.	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	State two post-processing strategies to properly format MCMC results	1. Get rid of burn-in: throw out about the first 20 percent of your samples<br> 2. Thinning: keep only every \(k\)-th sample to reduce autocorrelation (e.g. throw out every other sample for \(k = 2\))	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Stationary Distribution	Given some distribution \(P\) and some random process kernel \(T\): \[\sum_x P(x)T(x \to x') = P(x')\]	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Detailed Balance<br>(and what it implies)	Given some distribution \(P\) and some random process kernel \(T\): \[P(x)T(x \to x') = P(x')T(x'\to x)\] This implies that \(P\) is a stationary distribution: just sum both sides over \(x\)!	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Metropolis-Hastings	1. Pick some \(\mu_0\)<br> 2. Choose a new \(\mu_{\text{prop}} \sim J(\mu_{\text{prev}}, \tau^2)\)<br> Note: the jump distribution need not be symmetric: \(J(a|b) \neq J(b|a)\)<br> 3. Accept with probability \(\min(1, r),\,\, r = \frac{{\mathbb{P}}(\mu_{\text{prop}}{\,|\,}\vec y) / J(\mu_{\text{prop}}{\,|\,} \mu_\text{prev})}{{\mathbb{P}}(\mu_{\text{prev}}{\,|\,}\vec y)/ J(\mu_{\text{prev}}{\,|\,} \mu_\text{prop})}\)<br> 4. Repeat 2 and 3 until you have enough samples.	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Gibbs Sampling	Assumes that we can sample from the full conditional distribution of each parameter given all the others. <br>1. Pick some \(\vec \mu_0\)<br> 2. Run one Gibbs cycle: <br> for each \(i \in [d]\),  set \(\mu_{next,i} \sim {\mathbb{P}}(\mu_{i}{\,|\,} \mu_{-i}, y)\), where \(\mu_{-i}\) holds the most recent values of all the other parameters<br> 3. Repeat 2 until you have enough samples.	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Metropolis-Within-Gibbs	If we can't sample directly from a conditional distribution needed for Gibbs sampling, replace that draw with a Metropolis step targeting the same conditional.	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	How do we choose a good jump function when tuning the Metropolis algorithm?	We can use the Laplace approximation to show that the best choice of proposal covariance is \(c^2A{^{-1}}\), where \(A\) is the negative Hessian of the log-posterior at the mode and \(c \approx \frac{2.4}{\sqrt{\dim \theta}}\)	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	What acceptance rate should we aim for when using the Metropolis algorithm?	44 percent in one dimension, 23 percent in dimensions greater than 5	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	How can we compare the convergence of two different MCMC methods?	Check that your Gelman-Rubin ratio is about 1.<br> Check the autocorrelation timescale, compare effective sample sizes.	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	Explain mixed samplers and parameter blocking	Mixed samplers describe the process of choosing different means of sampling different parameters of the function we want to sample from.<br> <br> Parameter blocking refers to updating some parameters together (in blocks), especially ones that are highly correlated (to prevent zig-zagging during Gibbs, for instance).	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Precision	The inverse of variance, often \(\sigma^{-2}\) or \(\Sigma^{-1}\)	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Posterior of two multivariate normals, one determining the mean of the other<br>(same as proportional product of two normal pdfs)	Given \(N(\vec \mu_1, \Sigma_1)\) and \(N(\vec \mu_2, \Sigma_2)\), the product of their pdfs is proportional to a new \(N(\vec\mu^*, \Sigma^*)\) with \[\Sigma^* = \left(\Sigma_1^{-1} + \Sigma_2^{-1}\right){^{-1}}\] \[\vec\mu^* = \Sigma^* \left(\Sigma_1^{-1}\vec\mu_1 + \Sigma_2^{-1}\vec\mu_2\right)\] For the posterior result, just use \(\mu_2 := A\).<br> <br>The precisions just sum. The mean is the precision-weighted average.	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Shrinkage Estimator	The idea is to bias the individual estimates towards the population mean to reduce the final MSE.<br> This is equivalent to just using Hierarchical Bayes with \(\tau^2\), where you estimate \(\tau^2\) given data first, then consider that for your other estimators.	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Posterior Odds Ratio<br>(and the Bayes factor)	\[ \frac{P(M_1 {\,|\,} D)}{P(M_2 {\,|\,} D)} = \frac{P(D {\,|\,} M_1)}{P(D {\,|\,} M_2)} \frac{P(M_1)}{P(M_2)} \] \(M_1\) and \(M_2\) are our models. The first multiplicand is the Bayes factor, the second multiplicand is the prior odds of one model versus the other.	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Evidence	\[P(D {\,|\,} M_1) = \int_\Theta P(D {\,|\,} \theta, M_1)P(\theta {\,|\,} M_1) d\theta\] also called the marginal likelihood.<br> <br>Note that \(P(\theta {\,|\,} M_1)\) needs to be a proper prior!	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> The Jeffreys Scale	Let \(\alpha\) be a placeholder for the natural log of the Bayes factor.<br> \(\alpha &lt; 1\): Terrible<br> \(1 &lt; \alpha &lt; 2.5\): Weak but significant<br> \(2.5 &lt; \alpha &lt; 5\): Significant<br> \(5 &lt; \alpha\): Decisive	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Harmonic Mean Estimator	\[P(D) \approx \left[ \frac{1}{M} \sum_{i=1}^M P(D {\,|\,} \theta_i){^{-1}}  \right]{^{-1}}\] This is unstable in practice	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Savage-Dickey Ratio	When we are comparing nested models, it's possible to simplify the Bayes factor (of the nested model over \(M_1\)) to \[ \frac{P(\psi {\,|\,} D, M_1)}{P(\psi {\,|\,} M_1)} \] evaluated at \(\psi = 0\), the parameter value at which \(M_1\) reduces to the nested model.	Astrostatistics PartIIINotes
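Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Bayesian Model Averaging	Bayesian model comparison between many models, not just two candidates: weight each model's inference by its posterior probability \(P(M_k {\,|\,} D)\) rather than picking a single winner.	Astrostatistics PartIIINotes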
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Marginalization	\[P(x) = \int P(x,y)dy = \int P(x{\,|\,} y) P(y) dy = {\mathbb{E}}_{y}[P(x {\,|\,} y)] \]	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Split Multivariate Gaussian Marginals	Suppose we have a partitioned multivariate Gaussian like  \[\boldsymbol{f} = \begin{pmatrix} \boldsymbol{U} \\ \boldsymbol{V} \end{pmatrix} \sim N\left( \begin{bmatrix} \boldsymbol{\mu}_U \\ \boldsymbol{\mu}_V \end{bmatrix}, \begin{bmatrix} \boldsymbol{\Sigma}_U & \boldsymbol{\Sigma}_{UV} \\ \boldsymbol{\Sigma}_{VU} & \boldsymbol{\Sigma}_V \end{bmatrix} \right)\] Then its marginals are exactly what we'd expect: \[P(U) = \int P(U,V)dV = N(U {\,|\,} \mu_U, \Sigma_U)\] <br>\[P(V) = \int P(U,V)dU = N(V {\,|\,} \mu_V, \Sigma_V)\]	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Split Multivariate Gaussian Conditional Probabilities	\[ U {\,|\,} V \sim N({\mathbb{E}}[U {\,|\,} V], {\mathrm{Var}}[U {\,|\,} V]) \] <br>\[{\mathbb{E}} [U {\,|\,} V] = \mu_U + \Sigma_{UV} \Sigma_V{^{-1}} (V - \mu_V)\] <br>\[{\mathrm{Var}}[U {\,|\,} V] = \Sigma_U - \Sigma_{UV} \Sigma_V{^{-1}} \Sigma_{VU}\]	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	How can we construct a joint probability from \[V \sim N(V_0, \Sigma_V)\] and \[U {\,|\,} V \sim N(U_0 + XV, \Sigma_{U {\,|\,} V})\]	\[\begin{pmatrix} \boldsymbol{U} \\ \boldsymbol{V} \end{pmatrix} \sim N\left( \begin{pmatrix} \boldsymbol{U}_0 + \boldsymbol{X}\boldsymbol{V}_0 \\ \boldsymbol{V}_0 \end{pmatrix}, \begin{pmatrix} \boldsymbol{X}\boldsymbol{\Sigma}_V \boldsymbol{X}^T + \boldsymbol{\Sigma}_{U|V} & \boldsymbol{X}\boldsymbol{\Sigma}_V \\ \boldsymbol{\Sigma}_V \boldsymbol{X}^T & \boldsymbol{\Sigma}_V \end{pmatrix} \right)\]	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> MLE and Unbiased Estimator for Normal RV \(\sigma^2\)	MLE: \[\hat \sigma^2 = \frac{1}{N} \sum_{i=1}^N (x_i-\hat \mu)^2\] Unbiased Estimator: \[\hat \sigma^2 = \frac{1}{N-1} \sum_{i=1}^N (x_i-\hat \mu)^2\] where \(\hat \mu = \bar x\).	Astrostatistics PartIIINotes
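Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Gaussian Process	A distribution over functions of time such that the values \(f(t_1), \dots, f(t_n)\) at any finite set of times are jointly Gaussian, with mean function \(m\) and covariance kernel \(K\): \[f(t) \sim GP(m(t), K_{A, \tau^2}(t, t'))\]	Astrostatistics PartIIINotes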
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Squared Exponential Kernel	\[k(t,t') = A^2 \exp(-|t-t'|^2 / \tau^2)\]	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Stationary Gaussian Process	\[K(t,t') = K(t + a, t' + a)\] for all \(a\) (the kernel depends only on \(t - t'\))	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> How to draw probabilistic graphical models	Open circle / square nodes: latent parameter, unobserved data<br> Shaded nodes: Something we condition on (i.e. data)<br> Filled dot: Known constant<br> Plate: iid replications of what's inside	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Occam Factor<br>(and how to derive it)	Define \(g(\theta) = P(\theta {\,|\,} D, M)P(D {\,|\,} M) = P(D {\,|\,} \theta, M)P(\theta {\,|\,} M)\). <br> Find a MAP estimate \(\theta_0 = {\underset{\Theta}{\mathrm{argmax}}\,\,} \ln(g(\theta))\), use a second order Taylor expansion for \(\ln(g(\theta))\) then exponentiate to get: \[g(\theta) \approx g(\theta_0) \exp(-\frac{1}{2} (\theta - \theta_0){^{\mathrm{T}}} A (\theta - \theta_0))\] where \(A\) is the negative Hessian of \(\ln g\) at the MAP<br>  From there see that \[\int g(\theta)d\theta = \int P(\theta {\,|\,} D, M)P(D {\,|\,} M) d\theta  = P(D{\,|\,} M) \int P(\theta {\,|\,} D, M)d\theta = P(D {\,|\,} M)\] So, integrating the Gaussian approximation, we have: \[P(D {\,|\,} M) \approx g(\theta_0) \det(A / 2\pi)^{-1/2} =  P(D {\,|\,} \theta_0, M) P(\theta_0 {\,|\,} M) \det(A / 2\pi)^{-1/2}\] \(P(\theta_0 {\,|\,} M) \det(A / 2\pi)^{-1/2}\) is called the Occam factor.	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Laplace Approximation for the Evidence	Find a MAP estimate \(\theta_0 = \underset{\Theta}{\mathrm{argmax}}\,\, \ln(P^*(\theta \,|\, D))\), use a second order Taylor expansion for \(\ln(P^*(\theta \,|\, D))\) then exponentiate to get: \[P^*(\theta \,|\, D) \approx P^*(\theta_0 \,|\, D) \exp(-\frac{1}{2} (\theta - \theta_0)^{\mathrm{T}} A (\theta - \theta_0))\] where \(P^*\) is the unnormalized posterior (likelihood times prior) and \(A\) is the negative Hessian of \(\ln P^*\) at the MAP.<br> Integrating this Gaussian over \(\theta\) gives the evidence: \[P(D) \approx P^*(\theta_0 \,|\, D) \det(A / 2\pi)^{-1/2}\]	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Inverse of a 2x2 Matrix	For a \(2 \times 2\) matrix: \[A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\] its inverse is: \[A^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}\] <br>To invert, swap the entries on the diagonal, negate the off-diagonal, and divide everything by the determinant.	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	Goal of Nested Sampling	Given that we can evaluate the likelihood \(L(\theta) := P(D {\,|\,} \theta, M)\) and the prior \(\pi(\theta) := P(\theta {\,|\,} M)\), we want to approximate the evidence integral \[Z = \int P(D {\,|\,} \theta, M) P(\theta {\,|\,} M) d \theta = \int L(\theta)\pi(\theta) d\theta\]<br> <br>(remember the prior is diffuse, and the likelihood is peaked, which makes this problem difficult)	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Nested Sampling	1. Draw \(N_{live}\) points from the prior and compute their likelihoods; start with \(Z = 0\) and prior volume \(X_0 = 1\).<br> 2. Kill the point with the smallest likelihood, call it \(L_i^*\) for step \(i\)<br> 3. (trivial) Compute the shrinkage factor \(t_i = \frac{X_i}{X_{i-1}} \approx e^{-1/N_{live}}\) (since \(t_i \sim \mathrm{Beta}(N_{live}, 1)\), which has \({\mathbb{E}}[\ln t_i] = -1/N_{live}\))<br> 4. Accumulate evidence: \[\Delta Z = L_i^*(X_{i-1} - X_i) = L_i^* (1 - t_i) X_{i-1}\] 5. Sample a new point from the prior with likelihood greater than \(L_i^*\)<br> 6. Repeat steps 2-5 until convergence<br> 7. Add information about remaining points to the evidence: \[\Delta Z  = \bar L X_{end} = \bar L e^{-m/N_{live}} \] where \(\bar L\) is the average likelihood of remaining points, and \(m\) is the number of steps taken so far.<br> 8. Compute the weighted posterior samples: \[w_i = \frac{L_i^*(1-t_i)X_{i-1}}{Z}\]	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Variational Inference	Idea: approximate the posterior with a tractable distribution \(q\) that is 'close' to it (measured by Kullback-Leibler divergence)	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Kullback-Leibler Divergence	\[KL(q(\theta) {\,||\,}  p(\theta) ) = - \int q(\theta) \ln \left( \frac{p(\theta)}{q(\theta)} \right)d\theta\] Nonnegative and zero only when \(p(\theta) = q(\theta)\), but NOT symmetric and so not a metric!	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> ELBO<br>(and how to derive it)	Assume we want to minimize \(KL(q(\theta) {\,||\,} P(\theta {\,|\,} D))\). Break down the KL divergence to extract the \(P(D)\) piece, yielding \[KL(q(\theta) {\,||\,} P(\theta {\,|\,} D)) + {\mathbb{E}}_q[\ln (P(D {\,|\,} \theta)P(\theta)) - \ln(q(\theta))] = \ln(P(D))\] Call the expectation bit the Evidence Lower Bound (ELBO), a lower bound on the log evidence. Since \(\ln(P(D))\) does not depend on \(q\) and the KL divergence is nonnegative, minimizing the KL divergence means maximizing the ELBO (which can't ever do better than \(\ln(P(D))\))	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> The Apparent Magnitude Equation	\[m = M + \mu\] where \(m\) is the true apparent magnitude, \(M\) is the absolute magnitude, and \(\mu\) is the distance modulus \(\mu = 25 + 5 \log_{10} (d)\), with the distance \(d\) in Mpc.	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	<b>Definition:</b> Parallax Equation	\[\frac{\omega}{\mathrm{arcsec}} = \frac{\mathrm{parsec}}{d}\] where \(\omega\) is the true parallax angle and \(d\) is the distance to the star.<br> (note this uses small angle approximations, is not valid for large \(\omega\))	Astrostatistics PartIIINotes
Basic	Part III Notes::Astrostatistics	Goal of ELBO	We want to find a \(q\) that minimizes  \[KL(q(\theta) {\,||\,} P(\theta {\,|\,} D))\] (approximate our posterior with a 'close' \(q\) distribution)	Astrostatistics PartIIINotes