#separator:tab #html:true #notetype column:1 #deck column:2 #tags column:5 Basic Part III Notes::Astrostatistics Characteristic function of the Univariate Normal RV \[\phi_X(t) = e^{i\mu t - \frac{1}{2} \sigma^2t^2}\] Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Easiest way to find the distribution of the sum of two independent random variables Use the property that the characteristic function of a sum of independent RVs is the product of their characteristic functions: \[\phi_{X+Y}(t) = \phi_X(t) \phi_Y(t)\] Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Variance of the sample mean \[{\mathrm{Var}}(\hat \mu) = \frac{\sigma^2}{n}\] Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Fisher Information Matrix \[\bigl[\mathcal{I}(\theta)\bigr]_{i, j} = \operatorname{E}\left[\left. \left(\frac{\partial}{\partial\theta_i} \log f(X;\theta)\right) \left(\frac{\partial}{\partial\theta_j} \log f(X;\theta)\right) \,\, \right| \,\,\theta\right]\] Or, equivalently, given some regularity conditions, \[ \bigl[\mathcal{I}(\theta) \bigr]_{i, j} = -\operatorname{E}\left[\left. \frac{\partial^2}{\partial\theta_i\, \partial\theta_j} \log f(X;\theta) \,\, \right| \,\, \theta\right]\,\] Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Univariate Unbiased Cramer-Rao Lower Bound Given an unbiased estimator \(\hat \theta\) for \(\theta\), we have: \[{\mathrm{Var}}(\hat \theta) \geq \frac{1}{I(\theta)}\] (attained asymptotically by the MLE, under regularity conditions) Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Multivariate Cramer-Rao Lower Bound For any unbiased estimator \(\vec T\) of \(\vec \theta\), \[ {\mathrm{Cov}} (\vec T) - I(\theta){^{-1}} \] must be positive-semi-definite. In particular, this means that \[{\mathrm{Var}}(T_i) \geq [I(\vec\theta){^{-1}}]_{ii}\] for any index \(i\). 
Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition of CDF \[F(x) = P(X\leq x) = \int_{-\infty}^x f(u)\,du\] Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Characteristic Function (Statistics) \[\varphi_X(t) = {\mathbb{E}}(e^{itX})\] Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: The Delta Method Just a second-order Taylor approximation with a statistics coat of paint: \[{\mathbb{E}}[g(X)] \approx g({\mathbb{E}} [X]) + \frac{1}{2} g''({\mathbb{E}}[X]) {\mathrm{Var}}(X)\] Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Univariate General Cramer-Rao Lower Bound Given any estimator \(\hat \theta\) for \(\theta\), we have: \[{\mathrm{Var}}(\hat \theta) \geq \frac{\left(1 + \frac{d}{d\theta} {\mathrm{Bias}}(\hat \theta)\right)^2}{I(\theta)}\] Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics How do you solve a problem with selection effects? Use Bayes' Theorem, conditioning on the selection function (usually a step function). You should end up with a normal CDF as the normalizing factor. Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics State the Frequentist model of statistics (in the context of a regression) We have some true variables we want to compute, but we can only view data that is confounded with some source of error (following a distribution)

So the underlying (latent) variables we seek to understand are deterministic, there's just a bunch of stochastic error complicating matters. Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics State the Bayesian model of statistics (in the context of a regression) Interpret probability as the degree of certainty in an event rather than its long-run chance. We start with some guess as to how we think the parameter is distributed, and use the data to update that prior into a more accurate posterior.

The parameters are themselves random variables! We estimate their distribution, not just a single value. Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Monte Carlo Integration
(for integrating over some interval I) \[\hat I = \frac{1}{m} \sum_{i=1}^m f(\theta_i)\] where the \(\theta_i\) are draws from the target distribution (e.g. the posterior) and \(f(\theta)\) is the function whose expectation gives the statistic of interest (mean, variance, etc.) Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Monte Carlo integrator for the posterior mean \[f(\theta) = \theta\] Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Monte Carlo integrator for the posterior variance \[f(\theta) = (\theta - {\mathbb{E}}[\theta {\,|\,} D ])^2\] Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Monte Carlo integrator in an interval \([a,b]\) \(f(\theta) = I_{[a, b]}(\theta)\) Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Variance of a Monte Carlo Integrator \[{\mathrm{Var}}(\hat I) = \frac{1}{m} {\mathrm{Var}}[f(\theta)]\] Note that this is independent of the dimension of \(\theta\)!
The Monte Carlo error is just the square root of this (the standard deviation)
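A minimal sketch of the estimator and its Monte Carlo error, using only the Python standard library. The "posterior" draws here are a stand-in \(N(1, 2^2)\) sample chosen for illustration; the helper name `monte_carlo_estimate` is hypothetical.

```python
import math
import random

def monte_carlo_estimate(f, samples):
    """Return (I_hat, mc_error): the average of f over the samples and its standard error."""
    m = len(samples)
    values = [f(theta) for theta in samples]
    i_hat = sum(values) / m
    # Unbiased sample variance of f(theta_i); Var(I_hat) = Var[f(theta)] / m.
    var_f = sum((v - i_hat) ** 2 for v in values) / (m - 1)
    return i_hat, math.sqrt(var_f / m)

rng = random.Random(0)
# Assumption for illustration: pretend these are posterior draws, theta ~ N(1, 2^2).
draws = [rng.gauss(1.0, 2.0) for _ in range(20000)]

# Posterior mean: f(theta) = theta
post_mean, mean_err = monte_carlo_estimate(lambda t: t, draws)
# Posterior variance: f(theta) = (theta - E[theta | D])^2, with the mean plugged in
post_var, var_err = monte_carlo_estimate(lambda t: (t - post_mean) ** 2, draws)
# Probability of the interval [0, 2]: f(theta) = indicator of [0, 2]
prob_ab, prob_err = monte_carlo_estimate(lambda t: 1.0 if 0.0 <= t <= 2.0 else 0.0, draws)

print(post_mean, mean_err)
print(post_var, var_err)
print(prob_ab, prob_err)
```

The reported error shrinks like \(1/\sqrt{m}\) regardless of how many components \(\theta\) has, which is the point of the dimension remark above.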

Note that \({\mathrm{Var}}[f(\theta)]\) can be estimated by the sample variance: \[\hat {\mathrm{Var}}(\{ f(\theta_i) \}) = \frac{1}{m-1} \sum_{i=1}^m(f(\theta_i) - \hat I)^2\] Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Direct Sampling method for estimating a posterior distribution In the case where a posterior distribution can be broken down into the product of named distributions, simply draw from one distribution, feed into the next, etc, and do this to get a bunch of samples.
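As a sketch of the idea, assume a toy joint distribution that factorizes as \(P(\tau)P(\mu {\,|\,} \tau)\) with \(\tau \sim \mathrm{Gamma}(3, 1)\) and \(\mu {\,|\,} \tau \sim N(0, 1/\tau)\). This model and the name `direct_sample` are illustrative assumptions, not part of the notes.

```python
import random

def direct_sample(n, rng, shape=3.0):
    """Draw n joint samples (tau, mu) by sampling tau first, then mu given tau."""
    draws = []
    for _ in range(n):
        # Step 1: draw the precision from its named distribution, Gamma(shape, scale=1).
        tau = rng.gammavariate(shape, 1.0)
        # Step 2: feed it into the next distribution, mu | tau ~ N(0, 1/tau).
        mu = rng.gauss(0.0, (1.0 / tau) ** 0.5)
        draws.append((tau, mu))
    return draws

rng = random.Random(1)
joint = direct_sample(20000, rng)

# Marginal samples of mu: keep one coordinate of each joint draw.
mu_samples = [mu for _, mu in joint]
mu_mean = sum(mu_samples) / len(mu_samples)
mu_var = sum((x - mu_mean) ** 2 for x in mu_samples) / (len(mu_samples) - 1)
tau_mean = sum(tau for tau, _ in joint) / len(joint)
print(mu_mean, mu_var, tau_mean)
```

For this toy model the marginal variance of \(\mu\) is \({\mathbb{E}}[1/\tau] = 1/2\), which the sample variance should reproduce.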

Note that it's easy to get the marginals this way: just ignore all but one value! Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Kernel Density Estimation Creates a smooth histogram from a bunch of samples \(\theta_i\):
\[\hat P(\theta {\,|\,} D) = \frac{1}{m} \sum_{i=1}^m N(\theta {\,|\,} \theta_i, b_w^2)\]
where \(b_w\) is the bandwidth. Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Silverman's Rule of Thumb \[b_w = \left( \frac{4 \hat \sigma^5}{3m} \right)^{1/5}\]
where \(\hat \sigma\) is the estimated sample standard deviation, \(\hat \sigma^2 = \hat {\mathrm{Var}}(\{ \theta_i \})\) Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: The Metropolis Algorithm 1. Pick some \(\mu_0\)
2. Choose a new \(\mu_{\text{prop}} \sim {\mathrm{N}}(\mu_{\text{prev}}, \tau^2)\)
Note: the jump distribution has to be symmetric: \(J(a|b) = J(b|a)\)
3. Accept with probability \(\min(1, r),\,\, r = \frac{{\mathbb{P}}(\mu_{\text{prop}}{\,|\,}\vec y)}{{\mathbb{P}}(\mu_{\text{prev}}{\,|\,}\vec y)}\)
4. Repeat 2 and 3 until you have enough samples.
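The four steps can be sketched in a few lines of standard-library Python. The target below is an assumed unnormalized \(N(3, 1)\) log-posterior used only for illustration, and the function name `metropolis` and the value of \(\tau\) are hypothetical choices. Working with \(\ln r\) avoids overflow in the ratio.

```python
import math
import random

def metropolis(log_post, mu0, tau, n_steps, rng):
    """Random-walk Metropolis with a symmetric Gaussian jump N(mu_prev, tau^2)."""
    chain = [mu0]
    log_p_prev = log_post(mu0)
    n_accept = 0
    for _ in range(n_steps):
        mu_prev = chain[-1]
        mu_prop = rng.gauss(mu_prev, tau)          # step 2: propose
        log_p_prop = log_post(mu_prop)
        log_r = log_p_prop - log_p_prev            # ln of the posterior ratio r
        # Step 3: accept with probability min(1, r).
        if log_r >= 0 or rng.random() < math.exp(log_r):
            chain.append(mu_prop)
            log_p_prev = log_p_prop
            n_accept += 1
        else:
            chain.append(mu_prev)                  # rejected: repeat the previous value
    return chain, n_accept / n_steps

# Assumed target for illustration: unnormalized log-density of N(3, 1).
def log_post(mu):
    return -0.5 * (mu - 3.0) ** 2

rng = random.Random(42)
chain, accept_rate = metropolis(log_post, mu0=0.0, tau=2.4, n_steps=20000, rng=rng)
kept = chain[len(chain) // 5:]                     # drop the first 20 percent as burn-in
chain_mean = sum(kept) / len(kept)
print(chain_mean, accept_rate)
```

Only the ratio of posterior densities enters, so the normalizing constant of the posterior is never needed.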

This works for higher dimensions as well, just draw from a multivariate Gaussian jump distribution. Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics State two post-processing strategies to properly format MCMC results 1. Get rid of burn-in: throw out about the first 20 percent of your samples
2. Thinning: keep only every \(k\)-th sample (e.g. throw out every other sample) Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Stationary Distribution Given some distribution \(P\) and some random process kernel \(T\): \[\sum_x P(x)T(x \to x') = P(x')\] Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Detailed Balance
(and what it implies) Given some distribution \(P\) and some random process kernel \(T\): \[P(x)T(x \to x') = P(x')T(x'\to x)\] This implies that \(P\) is a stationary distribution: just sum both sides over \(x\)! Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Metropolis-Hastings 1. Pick some \(\mu_0\)
2. Choose a new \(\mu_{\text{prop}} \sim J(\mu_{\text{prev}}, \tau^2)\)
Note: the jump distribution need not be symmetric: \(J(a|b) \neq J(b|a)\) is allowed
3. Accept with probability \(\min(1, r),\,\, r = \frac{{\mathbb{P}}(\mu_{\text{prop}}{\,|\,}\vec y) / J(\mu_{\text{prop}}{\,|\,} \mu_\text{prev})}{{\mathbb{P}}(\mu_{\text{prev}}{\,|\,}\vec y)/ J(\mu_{\text{prev}}{\,|\,} \mu_\text{prop})}\)
4. Repeat 2 and 3 until you have enough samples. Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Gibbs Sampling Assumes that we can sample from the full conditional distribution of each parameter given all the others.
1. Pick some \(\vec \mu_0\)
2. Run one Gibbs cycle:
for each \(i \in [d]\), draw \(\mu_{next,i} \sim {\mathbb{P}}(\mu_{i}{\,|\,} \mu_{-i}, y)\), conditioning on the most recent values of the other components
3. Repeat 2 until you have enough samples. Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Metropolis-Within-Gibbs If we can't sample directly from the conditional distributions necessary to run Gibbs sampling, replace the exact conditional draw with a Metropolis step targeting that conditional Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics How do we choose a good jump function when tuning the Metropolis algorithm? We can use Laplace approximation to show that the best choice of jump covariance is \(c^2A{^{-1}}\), where \(A\) is the negative Hessian of the log-posterior at the mode and \(c \approx \frac{2.4}{\sqrt{\dim \theta}}\) Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics What acceptance rate should we aim for when using the Metropolis algorithm? 44 percent in one dimension, 23 percent in dimensions greater than 5 Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics How can we compare the convergence of two different MCMC methods? Check that your Gelman-Rubin ratio is about 1.
Check the autocorrelation timescale, compare effective sample sizes. Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Explain mixed samplers and parameter blocking Mixed samplers describe the process of choosing different means of sampling different parameters of the function we want to sample from.

Parameter blocking refers to updating some parameters together (in blocks), especially ones that are highly correlated (to prevent zig-zagging during Gibbs, for instance). Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Precision The inverse of variance, often \(\sigma^{-2}\) or \(\Sigma^{-1}\) Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Posterior of two multivariate normals, one determining the mean of the other
(same as the product of two normal pdfs, up to normalization) Given \(N(\vec \mu_1, \Sigma_1)\) and \(N(\vec \mu_2, \Sigma_2)\), the product of their pdfs is proportional to a new \(N(\mu^*, \Sigma^*)\) with \[\Sigma^* = \left(\Sigma_1^{-1} + \Sigma_2^{-1}\right){^{-1}}\] \[\mu^* = \Sigma^* \left(\Sigma_1^{-1}\vec\mu_1 + \Sigma_2^{-1}\vec\mu_2\right)\] For the posterior result, just use \(\mu_2 := A\).
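A small numerical check of these formulas in the scalar case, where \(\Sigma_1, \Sigma_2\) reduce to variances. The helper names `normal_pdf` and `combine_normals` are hypothetical; the multivariate version replaces the divisions with matrix inverses.

```python
import math

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def combine_normals(mu1, var1, mu2, var2):
    """Mean and variance of the normal proportional to N(mu1, var1) * N(mu2, var2)."""
    precision = 1.0 / var1 + 1.0 / var2            # precisions add
    var_star = 1.0 / precision
    mu_star = var_star * (mu1 / var1 + mu2 / var2) # precision-weighted average
    return mu_star, var_star

mu_star, var_star = combine_normals(0.0, 4.0, 3.0, 1.0)
print(mu_star, var_star)

# The product of the two pdfs divided by the combined pdf should be the same
# constant at every x (the product is only proportional to a normal density).
ratios = [
    normal_pdf(x, 0.0, 4.0) * normal_pdf(x, 3.0, 1.0) / normal_pdf(x, mu_star, var_star)
    for x in (-1.0, 0.5, 2.0, 4.0)
]
print(ratios)
```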

The precisions just sum. The mean is the precision-weighted average. Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Shrinkage Estimator The idea is to bias the individual estimates towards the population mean to reduce the final MSE.
This is equivalent to just using Hierarchical Bayes with \(\tau^2\), where you estimate \(\tau^2\) given data first, then consider that for your other estimators. Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Posterior Odds Ratio
(and the Bayes factor) \[ \frac{P(M_1 {\,|\,} D)}{P(M_2 {\,|\,} D)} = \frac{P(D {\,|\,} M_1)}{P(D {\,|\,} M_2)} \frac{P(M_1)}{P(M_2)} \] \(M_1\) and \(M_2\) are our models. The first multiplicand is the Bayes factor, the second multiplicand is the prior odds of one model versus the other. Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Evidence \[P(D {\,|\,} M_1) = \int_\Theta P(D {\,|\,} \theta, M_1)P(\theta {\,|\,} M_1) d\theta\] also called the marginal likelihood.

Note that \(P(\theta {\,|\,} M_1)\) needs to be a proper prior! Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: The Jeffreys Scale Let \(\alpha\) be a placeholder for the natural log of the Bayes factor.
\(\alpha < 1\): Terrible
\(1 < \alpha < 2.5\): Weak but significant
\(2.5 < \alpha < 5\): Significant
\(5 < \alpha\): Decisive Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Harmonic Mean Estimator \[P(D) \approx \left[ \frac{1}{M} \sum_{i=1}^M P(D {\,|\,} \theta_i){^{-1}} \right]{^{-1}}\] This is unstable in practice Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Savage-Dickey Ratio When we are comparing nested models, it's possible to simplify the Bayes factor to \[ \frac{P(\psi {\,|\,} D, M_1)}{P(\psi {\,|\,} M_1)} \] where \(\psi = 0\). Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Bayesian Model Averaging Bayesian model comparison between many models, not just two candidates. Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Marginalization \[P(x) = \int P(x,y)dy = \int P(x{\,|\,} y) P(y) dy = {\mathbb{E}}_{y}[P(x {\,|\,} y)] \] Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Split Multivariate Gaussian Marginals Suppose we have a partitioned multivariate Gaussian like \[\boldsymbol{f} = \begin{pmatrix} \boldsymbol{U} \\\\ \boldsymbol{V} \end{pmatrix} \sim N\left( \begin{bmatrix} \boldsymbol{\mu}_U \\\\ \boldsymbol{\mu}_V \end{bmatrix}, \begin{bmatrix} \boldsymbol{\Sigma}_U & \boldsymbol{\Sigma}_{UV} \\\\ \boldsymbol{\Sigma}_{VU} & \boldsymbol{\Sigma}_V \end{bmatrix} \right)\] Then its marginals are exactly what we'd expect: \[P(U) = \int P(U,V)dV = N(U {\,|\,} \mu_U, \Sigma_U)\]
\[P(V) = \int P(U,V)dU = N(V {\,|\,} \mu_V, \Sigma_V)\] Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Split Multivariate Gaussian Conditional Probabilities \[ U {\,|\,} V \sim N({\mathbb{E}}[U {\,|\,} V], {\mathrm{Var}}[U {\,|\,} V]) \]
\[{\mathbb{E}} [U {\,|\,} V] = \mu_U + \Sigma_{UV} \Sigma_V{^{-1}} (V - \mu_V)\]
\[{\mathrm{Var}}[U {\,|\,} V] = \Sigma_U - \Sigma_{UV} \Sigma_V{^{-1}} \Sigma_{VU}\] Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics How can we construct a joint probability from \[V \sim N(V_0, \Sigma_V)\] and \[U {\,|\,} V \sim N(U_0 + XV, \Sigma_{U {\,|\,} V})\] \[\begin{pmatrix} \boldsymbol{U} \\\\ \boldsymbol{V} \end{pmatrix} \sim N\left( \begin{pmatrix} \boldsymbol{U}_0 + \boldsymbol{X}\boldsymbol{V}_0 \\\\ \boldsymbol{V}_0 \end{pmatrix}, \begin{pmatrix} \boldsymbol{X}\boldsymbol{\Sigma}_V \boldsymbol{X}^T + \boldsymbol{\Sigma}_{U|V} & \boldsymbol{X}\boldsymbol{\Sigma}_V \\\\ \boldsymbol{\Sigma}_V \boldsymbol{X}^T & \boldsymbol{\Sigma}_V \end{pmatrix} \right)\] Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: MLE and Unbiased Estimator for Normal RV \(\sigma^2\) MLE: \[\hat \sigma^2 = \frac{1}{N}\sum_{i=1}^N (x_i-\hat \mu)^2\] Unbiased Estimator: \[\hat \sigma^2 = \frac{1}{N-1}\sum_{i=1}^N (x_i-\hat \mu)^2\] where \(\hat \mu = \bar x\). Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Gaussian Process Time-dependent Gaussian distribution \[f(t) \sim GP(m(t), K_{A, \tau^2}(t, t'))\] Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Squared Exponential Kernel \[k(t,t') = A^2 \exp(-|t-t'|^2 / \tau^2)\] Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Stationary Gaussian Process \[K(t,t') = K(t + a, t' + a)\] for all \(a\) Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: How to draw probabilistic graphical models Open circle / square nodes: latent parameter, unobserved data
Shaded nodes: Something we condition on (i.e. data)
Filled dot: Known constant
Plate: iid replications of what's inside Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Occam Factor
(and how to derive it) Define \(g(\theta) = P(\theta {\,|\,} D, M)P(D {\,|\,} M) = P(D {\,|\,} \theta, M)P(\theta {\,|\,} M)\).
Find a MAP estimate \(\theta_0 = {\underset{\Theta}{\mathrm{argmax}}\,\,} \ln(g(\theta))\), use a second order Taylor expansion for \(\ln(g(\theta))\) then exponentiate to get: \[g(\theta) \approx g(\theta_0) \exp(-\frac{1}{2} (\theta - \theta_0){^{\mathrm{T}}} A (\theta - \theta_0))\] where \(A\) is the negative Hessian of the log of \(g\) at the MAP
From there see that \[\int g(\theta)d\theta = \int P(\theta {\,|\,} D, M)P(D {\,|\,} M) d\theta = P(D{\,|\,} M) \int P(\theta {\,|\,} D, M)d\theta = P(D {\,|\,} M)\] So, integrating the Gaussian approximation, we have: \[P(D|M) \approx g(\theta_0) \det(A / 2\pi)^{-1/2} = P(D {\,|\,} \theta_0) P(\theta_0) \det(A / 2\pi)^{-1/2}\] \(P(\theta_0) \det(A / 2\pi)^{-1/2}\) is called the Occam factor. Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Laplace Approximation for the Evidence Find a MAP estimate \(\theta_0 = \underset{\Theta}{\mathrm{argmax}}\,\, \ln(P^*(\theta \,|\, D))\), use a second order Taylor expansion for \(\ln(P^*(\theta \,|\, D))\) then exponentiate to get: \[P^*(\theta \,|\, D) \approx P^*(\theta_0 \,|\, D) \exp(-\frac{1}{2} (\theta - \theta_0)^{\mathrm{T}} A (\theta - \theta_0))\] where \(A\) is the negative Hessian of \(\ln(P^*(\theta \,|\, D))\) at the MAP
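A one-dimensional sketch of the Laplace evidence estimate, \(\ln Z \approx \ln g(\theta_0) + \frac{1}{2}\ln(2\pi / A)\), where \(A\) is taken as minus the second derivative of \(\ln g\) at the MAP. The model is an assumption chosen so the answer is known exactly: one datum \(y \sim N(\theta, 1)\) with prior \(\theta \sim N(0, 2^2)\), for which the evidence is \(N(y {\,|\,} 0, 5)\). The function names and the finite-difference Newton search are illustrative choices.

```python
import math

def log_g(theta, y=1.0, sigma=1.0, prior_sd=2.0):
    """ln of likelihood times prior for the assumed toy model."""
    log_like = -0.5 * ((y - theta) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
    log_prior = -0.5 * (theta / prior_sd) ** 2 - math.log(prior_sd * math.sqrt(2 * math.pi))
    return log_like + log_prior

def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h ** 2

def laplace_log_evidence(f, theta_init, h=1e-4, n_iter=50):
    """Find the MAP by Newton's method, then apply the 1D Laplace formula."""
    theta = theta_init
    for _ in range(n_iter):
        grad = (f(theta + h) - f(theta - h)) / (2.0 * h)
        theta -= grad / second_derivative(f, theta, h)   # Newton step towards the maximum
    a = -second_derivative(f, theta, h)                  # curvature A at the MAP (positive)
    return theta, f(theta) + 0.5 * math.log(2.0 * math.pi / a)

theta_map, log_z = laplace_log_evidence(log_g, theta_init=0.0)
# Exact evidence for this conjugate model: y ~ N(0, sigma^2 + prior_sd^2) = N(0, 5).
exact_log_z = -0.5 * 1.0 ** 2 / 5.0 - 0.5 * math.log(2.0 * math.pi * 5.0)
print(theta_map, log_z, exact_log_z)
```

Because \(\ln g\) is exactly quadratic here, the approximation agrees with the exact evidence up to finite-difference error; for non-Gaussian posteriors it is only approximate.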
Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Inverse of a 2x2 Matrix For a \(2 \times 2\) matrix: \[A = \begin{pmatrix} a & b \\\\ c & d \end{pmatrix}\] its inverse is: \[A^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\\\ -c & a \end{pmatrix}\]
To invert, swap the entries on the diagonal, negate the off-diagonal entries, and divide everything by the determinant. Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Goal of Nested Sampling We need to be able to evaluate the likelihood \(L(\theta) := P(D {\,|\,} \theta, M)\) and the prior \(\pi(\theta) := P(\theta {\,|\,} M)\). We want to approximate the integral \[Z = \int P(D {\,|\,} \theta, M) P(\theta | M) d \theta = \int L(\theta)\pi(\theta) d\theta\]

(remember the prior is diffuse, and the likelihood is peaked, which makes this problem difficult) Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Nested Sampling 1. Draw \(N_{live}\) points from the prior and compute their likelihoods, start with \(Z = 0\) and \(X_0 = 1\).
2. Kill the point with the smallest likelihood, call it \(L_i^*\) for step \(i\)
3. (trivial) Compute the shrinkage factor \(t_i = \frac{X_i}{X_{i-1}} \approx e^{-1/N_{live}}\) (since \(t_i \sim \mathrm{Beta}(N_{live}, 1)\), use \({\mathbb{E}}[\ln t_i] = -1/N_{live}\))
4. Accumulate evidence: \[\Delta Z = L_i^*(X_{i-1} - X_i) = L_i^* (1 - t_i) X_{i-1}\]
5. Sample a new point from the prior with likelihood greater than \(L_i^*\)
6. Repeat steps 2-5 until convergence
7. Add information about remaining points to the evidence: \[\Delta Z = \bar L X_{end} = \bar L e^{-m/N_{live}} \] where \(\bar L\) is the average likelihood of remaining points, and \(m\) is the number of steps taken so far.
8. Compute the weighted posterior samples: \[w_i = \frac{L_i^*(1-t_i)X_{i-1}}{Z}\] Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Variational Inference Idea: approximate the posterior with a distribution \(q\) that is 'close' (measured by Kullback-Leibler divergence) Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Kullback-Leibler Divergence \[KL(q(\theta) {\,||\,} p(\theta) ) = - \int q(\theta) \ln \left( \frac{p(\theta)}{q(\theta)} \right)d\theta\] Nonnegative and zero only when \(p(\theta) = q(\theta)\), but NOT symmetric and so not a metric! Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: ELBO
(and how to derive it) Assume we want to minimize \(KL(q(\theta) {\,||\,} P(\theta {\,|\,} D))\). Break down the KL divergence to extract the \(p(D)\) piece, yielding \[KL(q(\theta) {\,||\,} P(\theta {\,|\,} D)) + {\mathbb{E}}_q[\ln (P(D {\,|\,} \theta)P(\theta)) - \ln(q(\theta))] = \ln(P(D))\] Call the expectation bit the Evidence (Log) Lower Bound (ELBO). Since \(\ln(P(D))\) does not depend on \(q\), minimizing the KL divergence is the same as maximizing the ELBO (which can never exceed \(\ln(P(D))\)) Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: The Apparent Magnitude Equation \[m = M + \mu\] where \(m\) is the true apparent magnitude, \(M\) is the absolute magnitude, and \(\mu\) is the distance modulus \(\mu = 25 + 5 \log_{10} (d)\), with \(d\) in Mpc. Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Definition: Parallax Equation \[\frac{\omega}{\mathrm{arcsec}} = \frac{\mathrm{parsec}}{d}\] where \(\omega\) is the true parallax angle and \(d\) is the distance to the star.
(note this uses the small-angle approximation and is not valid for large \(\omega\)) Astrostatistics PartIIINotes Basic Part III Notes::Astrostatistics Goal of ELBO We want to find a \(q\) that minimizes \[KL(q(\theta) {\,||\,} P(\theta {\,|\,} D))\] (approximate our posterior with a 'close' \(q\) distribution) Astrostatistics PartIIINotes