GP LSD
Least squares deconvolution¶
Least-squares deconvolution is a very simple technique for enhancing the polarimetric signals needed in the analysis of stellar magnetism. The idea is that the signal-to-noise ratio per spectral line in solar-type stars is not high enough to detect the polarimetric signal (typically circular polarization, Stokes $V$). This was originally solved by M. Semel by adding all photospheric spectral lines once written on a common velocity axis. The method relies on the fact that noise adds incoherently while the signal adds coherently, so that after adding many lines the signal emerges.
This line-addition method was superseded by the least-squares deconvolution (LSD) method, which is based on the idea that, when the spectral lines are in the weak-field (the Zeeman splitting is smaller than the line broadening) and weak-line (the line is not saturated) regime, the Stokes $V$ profiles all have the same shape: $$ V_i(v) = \alpha_i Z(v) $$
where $\alpha_i$ is a proportionality constant that is usually set to $\alpha_i=\lambda_i d_i g_i$, with $\lambda_i$ the central wavelength of the line, $d_i$ the depth of the line, $g_i$ the Landé factor of the line, and $v$ the velocity axis.
If this is the case, one can obtain the value of $Z(v_j)$ by solving the least-squares problem for velocity point $j$:
$$ \chi^2 = \sum_i \left[ \frac{V_i(v_j) - \alpha_i Z(v_j)}{\sigma_{ij}} \right]^2 $$
Gaussian process prior¶
The LSD method introduces no correlation at all between the values of $Z(v)$ at different velocities. It is interesting to introduce such a correlation because the theory states that the shape of the $Z(v)$ profile should be relatively smooth. We introduce it using a Gaussian process.
Let us consider the posterior distribution for the profile $Z(v)$ seen as a vector:
$$ p(\mathbf{Z},\alpha_Z|\mathbf{V}) \propto p(\mathbf{V}|\mathbf{Z}) p(\mathbf{Z}|\alpha_Z) p(\alpha_Z) $$where $p(\mathbf{V}|\mathbf{Z})$ is the sampling distribution or likelihood, while $p(\mathbf{Z}|\alpha_Z)$ is the prior distribution for the profile, which depends on a set of hyperparameters $\alpha_Z$. Finally, we set a prior distribution for the hyperparameters. The observations $\mathbf{V}$ represent the Stokes $V$ profiles of line $i$ at velocity $v_j$, ordered as $\mathbf{V} = [V_1(v_1),V_2(v_1),V_3(v_1),\ldots,V_1(v_2),V_2(v_2),\ldots]^T$. Likewise, the $\mathbf{Z}$ vector is given by $\mathbf{Z}=[Z(v_1),Z(v_2),\ldots]^T$.
Using this notation, we note that the model we are proposing for the Stokes $V$ profile of each line is given by the matrix equation:
$$ \mathbf{V} = \mathbf{W} \mathbf{Z} $$where the matrix $\mathbf{W}$ is written as:
$$ \left[ \begin{array}{ccccc} \alpha_1 & 0 & 0 & 0 & \cdots \\ \alpha_2 & 0 & 0 & 0 & \cdots \\ \alpha_3 & 0 & 0 & 0 & \cdots \\ \cdots & \cdots & \cdots & \cdots & \cdots\\ 0 & \alpha_1 & 0 & 0 & \cdots \\ 0 & \alpha_2 & 0 & 0 & \cdots \\ 0 & \alpha_3 & 0 & 0 & \cdots \\ \cdots & \cdots & \cdots & \cdots & \cdots\\ \end{array} \right] $$In other words, the row of $\mathbf{W}$ associated with line $i$ and velocity $v_j$ contains $\alpha_i$ in column $j$ and zeros elsewhere.
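As a side note, this block structure can also be built in one line with a Kronecker product. The following is a minimal sketch with hypothetical small dimensions, not the loop-based construction used later in the notebook:
import numpy as np
nLines, nLambda = 3, 4                     # hypothetical small dimensions
alpha = np.random.rand(nLines)             # line weights alpha_i
# each velocity bin contributes a block of nLines rows with alpha in that bin's column
W = np.kron(np.identity(nLambda), alpha[:, None])
print(W.shape)                             # (nLines*nLambda, nLambda)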
The previous posterior distribution describes a hierarchical model for the analysis of the common profile $Z(v)$. Under the assumption of uncorrelated Gaussian noise with common variance $\sigma^2$, the likelihood can be written as:
$$ p(\mathbf{V}|\mathbf{Z}) = N(\mathbf{V}|\mathbf{W} \mathbf{Z}, \sigma^2 \mathbf{I}) $$We put a Gaussian process prior on $\mathbf{Z}$, so that
$$ p(\mathbf{Z}|\alpha_Z) = N(\mathbf{Z}|0,\mathbf{K}(\alpha_Z)), $$which has zero mean and covariance matrix $\mathbf{K}(\alpha_Z)$ that depends on the set of hyperparameters $\alpha_Z$. We can take advantage of the Gaussian character of the likelihood and the prior to marginalize the value of $\mathbf{Z}$ from the model, so that:
$$ p(\alpha_Z|\mathbf{V}) \propto p(\alpha_Z) \int d\mathbf{Z}\, p(\mathbf{V}|\mathbf{Z})\, p(\mathbf{Z}|\alpha_Z) $$Using the standard properties of the Gaussian distribution, the marginal posterior for $\alpha_Z$ is proportional to another Gaussian distribution:
$$ p(\alpha_Z|\mathbf{V}) \propto p(\alpha_Z)\, N(\mathbf{V}|\mathbf{0},\sigma^2 \mathbf{I} + \mathbf{W} \mathbf{K}(\alpha_Z) \mathbf{W}^T) $$Following a Type-II maximum-likelihood strategy, we optimize the previous marginal distribution with respect to $\alpha_Z$ and use these values later to estimate the distribution of $\mathbf{Z}$. Taking logarithms and assuming a flat prior $p(\alpha_Z)$, we have to maximize the following expression (up to an additive constant):
$$ \log p(\alpha_Z|\mathbf{V}) = -\frac{1}{2} \mathbf{V}^T \left( \sigma^2 \mathbf{I} + \mathbf{W} \mathbf{K}(\alpha_Z) \mathbf{W}^T \right)^{-1} \mathbf{V} - \frac{1}{2} \log \left| \sigma^2 \mathbf{I} + \mathbf{W} \mathbf{K}(\alpha_Z) \mathbf{W}^T \right| $$Once the values of $\alpha_Z$ that maximize the marginal posterior are computed, we fix them to their ML values $\hat \alpha_Z$. Then, we compute the posterior for $\mathbf{Z}$ using the properties of the Gaussian distribution:
$$ p(\mathbf{Z}|\hat \alpha_Z,\mathbf{V}) = N(\mathbf{Z}|\mu_Z,\Sigma_Z) $$where
$$ \Sigma_Z = \left( \mathbf{K}^{-1} + \sigma^{-2} \mathbf{W}^T \mathbf{W} \right)^{-1} \\ \mu_Z = \sigma^{-2} \Sigma_Z \mathbf{W}^T \mathbf{V} $$
A simple example¶
Let us consider a simple example that mimics what one wants to solve using LSD. We start by generating a set of artificial lines.
import numpy as np
import matplotlib.pyplot as pl
import seaborn as sn
%matplotlib inline
nLines = 50                              # number of spectral lines
nLambda = 50                             # number of velocity bins
sigma = 1.0                              # noise standard deviation
epsilon = 1e-8                           # jitter for numerical stability
alpha = np.random.rand(nLines)           # line weights alpha_i
vel = np.linspace(-4.0, 4.0, nLambda)    # common velocity axis
Z = vel * np.exp(-vel**2)                # antisymmetric common profile Z(v)
pl.plot(vel, Z)
The artificial spectral lines are obtained by multiplying the coefficients $\alpha_i$ by the common profile and adding Gaussian noise.
noise = sigma * np.random.randn(nLines,nLambda)
VObs = alpha[:,None] * Z[None,:] + noise
VObsVector = np.zeros(nLines*nLambda)
loop = 0
for i in range(nLambda):
    for j in range(nLines):
        VObsVector[loop] = alpha[j] * Z[i] + noise[j,i]
        loop += 1
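Since the ordering is velocity-major (all lines at $v_1$, then all lines at $v_2$, and so on), the same vector can equivalently be obtained by transposing and flattening; a one-line alternative to the loop above:
VObsVector = VObs.T.flatten()   # identical to the explicit double loop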
fig, ax = pl.subplots(nrows=3,ncols=3)
ax = ax.flatten()
for i in range(9):
    ax[i].plot(VObs[i,:])
Let us apply standard LSD.
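Minimizing the $\chi^2$ defined above with respect to $Z(v_j)$, and taking a common noise level $\sigma_{ij}=\sigma$ as in this example, gives the weighted average $$ Z(v_j) = \frac{\sum_i \alpha_i V_i(v_j)}{\sum_i \alpha_i^2}, $$which is what the following cell computes.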
ZLSD = np.zeros(nLambda)
for i in range(nLambda):
    ZLSD[i] = np.sum(VObs[:,i] * alpha) / np.sum(alpha**2)
pl.plot(vel, ZLSD)
pl.plot(vel, Z)
Now let us apply our GP method. First we test whether the posterior for $\mathbf{Z}$ is correct using fixed values of the hyperparameters. We use a squared-exponential covariance function.
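Its elements are of the form $$ K_{mn} = a \exp\left[ -\frac{(v_m - v_n)^2}{\ell^2} \right], $$where the amplitude $a$ and the width $\ell$ play the role of the hyperparameters $\alpha_Z$ (written here with generic symbols; in this first test they are simply fixed by hand).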
K = np.zeros((nLambda, nLambda))
for i in range(nLambda):
    K[i,:] = np.exp(-(vel[i] - vel)**2 / 1.2**2)
pl.imshow(K)
W = np.zeros((nLines*nLambda, nLambda))
loop = 0
for i in range(nLambda):
    for j in range(nLines):
        W[loop,i] = alpha[j]
        loop += 1
KInv = np.linalg.inv(K + epsilon * np.identity(nLambda))   # small diagonal jitter for numerical stability
SigmaInv = KInv + 1.0 / sigma**2 * np.dot(W.T,W)
SigmaZ = np.linalg.inv(SigmaInv)
muZ = np.dot(SigmaZ,np.dot(W.T,VObsVector)) / sigma**2
variance = np.diagonal(SigmaZ)
pl.plot(vel, Z)
pl.plot(vel, muZ)
pl.fill_between(vel, muZ - np.sqrt(variance), muZ + np.sqrt(variance))
def cholInvert(A):
    L = np.linalg.cholesky(A)
    LInv = np.linalg.inv(L)
    AInv = np.dot(LInv.T, LInv)
    # log|A| = 2*sum(log(diag(L))), and diag(LInv) = 1/diag(L), hence the minus sign
    logDeterminant = -2.0 * np.sum(np.log(np.diag(LInv)))
    return AInv, logDeterminant
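This helper returns both the inverse and the log-determinant that the marginal likelihood needs. As a quick sanity check (a minimal sketch, assuming the covariance matrix K built above), one can compare it against NumPy:
A = K + np.identity(nLambda)                     # symmetric positive-definite test matrix
AInv, logD = cholInvert(A)
print(np.allclose(AInv, np.linalg.inv(A)), np.allclose(logD, np.linalg.slogdet(A)[1]))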
Now we define a class that will carry out all the calculations.
class GPLSD:
    def __init__(self, velocities, noise, linesObserved, alpha):
        self.v = velocities
        self.noise = noise
        self.nLambda = len(velocities)
        self.nLines = len(alpha)
        self.linesObserved = linesObserved
        # weight matrix W: row (velocity i, line j) has alpha[j] in column i
        self.W = np.zeros((self.nLines*self.nLambda, self.nLambda))
        loop = 0
        for i in range(self.nLambda):
            for j in range(self.nLines):
                self.W[loop,i] = alpha[j]
                loop += 1

    def covariance(self, pars):
        # pars are the logarithms of the hyperparameters
        lambdaGP, sigmaGP = np.exp(pars)
        K = np.zeros((self.nLambda, self.nLambda))
        for i in range(self.nLambda):
            K[i,:] = lambdaGP * np.exp(-(self.v[i] - self.v)**2 / sigmaGP**2)
        return K

    def marginal(self, pars):
        # log marginal posterior for the hyperparameters (flat prior)
        K = self.covariance(pars)
        Sigma = self.noise**2 * np.identity(self.nLambda*self.nLines) + np.dot(self.W, np.dot(K, self.W.T))
        SigmaInv, logD = cholInvert(Sigma)
        return -0.5 * np.dot(self.linesObserved.T, np.dot(SigmaInv, self.linesObserved)) - 0.5 * logD
gp = GPLSD(vel, sigma, VObsVector, alpha)
res=gp.marginal([1.0,0.5])
VObsVector.shape
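To finish the Type-II maximum-likelihood step described above, one would maximize gp.marginal with respect to the hyperparameters and then recompute the posterior for $\mathbf{Z}$ at the optimum. A minimal sketch of this last step (not part of the original notebook; the optimizer and starting point are arbitrary choices):
from scipy.optimize import minimize

# maximize the log marginal posterior by minimizing its negative;
# pars are the logarithms of the hyperparameters, as in GPLSD.covariance
res = minimize(lambda pars: -gp.marginal(pars), x0=np.array([1.0, 0.5]), method='Nelder-Mead')
parsML = res.x

# posterior for Z at the optimized hyperparameters, using the expressions derived above
KML = gp.covariance(parsML)
KMLInv = np.linalg.inv(KML + epsilon * np.identity(nLambda))
SigmaZML = np.linalg.inv(KMLInv + np.dot(gp.W.T, gp.W) / sigma**2)
muZML = np.dot(SigmaZML, np.dot(gp.W.T, VObsVector)) / sigma**2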