A new philosophy for inversion codes
In general, an inversion code is a computer program/algorithm that allows you to extract information about the physical parameters of the system under study from the interpretation of the observables. In Solar Physics, inversion codes have been routinely applied since the 1970s to obtain the thermodynamical and magnetic properties of different regions of the solar atmosphere from the interpretation of the Stokes parameters.
With the use of fast slit spectropolarimeters and two-dimensional filterpolarimeters, we now routinely have 2D maps of regions of the solar atmosphere observed at several points along one or several spectral lines. The interpretation of these observations has been done in the past by assuming that all pixels are completely uncorrelated and applying the inversion codes on a pixel-by-pixel basis. After this pixel-by-pixel inversion, one just plots maps of the derived quantities and hopes that these maps show some kind of smoothness, an indication that the inversion did not go astray. It has become more and more clear that full 2D inversions are the way to proceed, even when assuming local thermodynamical equilibrium (and thus neglecting any effect of horizontal radiative coupling). van Noort (2012) proposed and developed an inversion code that inverts a map as a whole by taking advantage of the fact that the point spread function (PSF) of the telescope couples the observed Stokes profiles of nearby pixels.
In this contribution, I want to go a step further and propose a strategy for real 2D inversions of spectropolarimetric data using the regularization induced by the assumption of sparsity, an idea that has proven extremely successful in recent years.
The idea is very simple. Let us assume that we have a set of model parameters that we encode in the vector $\mathbf{p}=(\mathbf{p}_1,\mathbf{p}_2,\ldots)$. Each $\mathbf{p}_i$ is a map of a given physical parameter. For instance, we may be interested in the value of the temperature, velocity and magnetic field at different heights. Instead of working directly in the real space, we move to a transformed space, where the parameters are given by:
$$ \mathbf{\hat{p}}_i = \mathbf{W}[\mathbf{p}_i], $$where $\mathbf{W}$ is a linear operator that transforms from the real space to the transformed space. This transformation is chosen such that the resulting transformed image is sparse, i.e., many elements of the transformed image are zero or very close to zero in absolute value. The wavelet transform is known to behave this way for general natural images, while the Fourier transform produces sparse transformed images when the signals are periodic.
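As a quick check of this sparsity, here is a minimal sketch using PyWavelets; the smooth Gaussian "temperature" map is just a made-up stand-in for a real parameter map. It counts what fraction of the wavelet coefficients are significantly different from zero:

import numpy as np
import pywt

# Made-up smooth 64x64 map standing in for a physical parameter (e.g., temperature)
x, y = np.meshgrid(np.linspace(-1, 1, 64), np.linspace(-1, 1, 64))
pMap = 5000.0 + 500.0 * np.exp(-(x**2 + y**2) / 0.1)

# Full 2D wavelet decomposition and fraction of coefficients above 1% of the largest one
coeffs = pywt.wavedec2(pMap, 'db1')
flat = np.concatenate([coeffs[0].ravel()] + [d.ravel() for level in coeffs[1:] for d in level])
print(np.mean(np.abs(flat) > 0.01 * np.abs(flat).max()))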
Working in the transformed space provides two obvious advantages:
- The solution to the inversion has to be sparse. If our map is $N \times N$, only a small fraction of the potential $N^2$ coefficients have to be non-zero. This introduces a strong regularization because the number of unknowns is heavily reduced. Typical values of the sparsity will be around 10-20%.
- The transform is a global operation, so modifying the value of a single coefficient of $\mathbf{\hat{p}}_i$ induces changes in all the pixels of the original space. Consequently, the information encoded in a certain pixel and extracted by the inversion process is propagated to all the coefficients of $\mathbf{\hat{p}}_i$ that we need to obtain (see the sketch after this list).
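The global character of the transform is easy to verify with another minimal sketch using PyWavelets: switching on a single coarse wavelet coefficient (the particular coefficient picked here is arbitrary) and transforming back changes every pixel of the reconstructed map.

import numpy as np
import pywt

# Start from an all-zero coefficient set and switch on one coarse-level detail coefficient
coeffs = pywt.wavedec2(np.zeros((64, 64)), 'db1')
coeffs[1][0][0, 0] = 1.0
recon = pywt.waverec2(coeffs, 'db1')
print(np.sum(np.abs(recon) > 0))   # number of real-space pixels affected by that one coefficient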
A typical inversion code works by optimizing a merit function that measures the $\ell_2$ norm of the residuals:
$$ \chi^2 = \sum_i \frac{\left[S(\lambda_i)-O(\lambda_i)\right]^2}{\sigma_i^2}, $$where $S(\lambda_i)$ refers to the synthetic Stokes profile at wavelength position $i$ and $O(\lambda_i)$ is the observed Stokes profile at this wavelength, while $\sigma_i$ is the standard deviation of the noise. The optimization is usually carried out by the direct application of the Levenberg-Marquardt (LM) algorithm, which is especially well suited to the optimization of such quadratic merit functions.
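For reference, this merit function is trivial to evaluate once the synthetic and observed profiles are available; a minimal sketch, where the profiles and the noise level are just placeholders:

import numpy as np

def chi2(synthetic, observed, sigma):
    # l2 norm of the residuals, weighted by the noise variance at every wavelength
    return np.sum((synthetic - observed)**2 / sigma**2)

# Placeholder profiles with 55 wavelength points and a noise level of 1e-3
observed = 1.0 + 1e-3 * np.random.randn(55)
synthetic = np.ones(55)
print(chi2(synthetic, observed, 1e-3))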
The problem with the LM algorithm is that it is hard to impose any sparsity constraint. The reason is that, in reality, what we want to solve in this new philosophy for inversion codes is the problem:
$$ \min_\mathbf{\hat{p}} \chi^2 = \sum_i \frac{\left[S(\lambda_i,\mathbf{\hat{p}})-O(\lambda_i)\right]^2}{\sigma_i^2}, \quad \mathrm{subject\ to} \quad \Vert \mathbf{\hat{p}} \Vert_0 \leq s. $$In other words, we want to minimize the value of the $\chi^2$ while using the smallest number of non-zero elements of $\mathbf{\hat{p}}$. Recall that the $\ell_0$ norm just counts the number of non-zero elements, with $s$ the maximum number of non-zero elements that we allow. The previous problem is exactly the one solved by compressed sensing algorithms, but in the linear case. Our case is much more complicated because the dependence of the $\chi^2$ on the values of $\mathbf{\hat{p}}$ is, in general, nonlinear. Fortunately, there are recent extensions of the Iterative Hard Thresholding greedy algorithm to nonlinear compressed sensing, and it has been demonstrated that the previous problem can be solved, under some conditions, with the following iteration:
$$ \mathbf{\hat{p}}_{t+1} = \mathcal{P}_C \left[ \mathbf{\hat{p}}_t - \frac{1}{L} \nabla \chi^2(\mathbf{\hat{p}}_t) \right]. $$In essence, this is just a gradient descent algorithm that applies a hard thresholding projection operator in each iteration. In other words, after moving the solution in the direction of the negative gradient (controlled by a step size $L$), one sets to zero all elements that do not fulfill the sparsity constraint. I see several options for the $\mathcal{P}_C$ operator (a minimal sketch of the full iteration follows the list):
- Set to zero a fixed fraction of the smallest (in absolute value) elements of $\mathbf{\hat{p}}$, for instance 90% of them.
- Set to zero all elements whose absolute value lies below a certain threshold relative to the largest element of $\mathbf{\hat{p}}$.
- Combinations of the previous two.
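Putting everything together, the iteration with the first of these projection options can be sketched as follows. This is a schematic, not the final code of this post: chi2AndGradient is a placeholder for any routine returning the merit function and its gradient with respect to the transformed parameters, and the step size L is kept fixed.

import numpy as np

def hardThreshold(x, fraction=0.1):
    # Projection operator: keep only the largest (in absolute value) fraction of the elements
    out = np.zeros_like(x)
    n = int(fraction * x.size)
    ind = np.argsort(np.abs(x))[-n:]
    out[ind] = x[ind]
    return out

def iht(chi2AndGradient, p0, L=1.0, fraction=0.1, nIter=50):
    # Nonlinear iterative hard thresholding: gradient step followed by the projection
    p = hardThreshold(p0, fraction)
    for i in range(nIter):
        value, grad = chi2AndGradient(p)
        p = hardThreshold(p - grad / L, fraction)
    return p

Note that, in the experimental code of the example below, the thresholding is applied to the gradient step rather than to the parameter vector itself, which is a further simplification of this scheme.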
The gradient of the $\chi^2$ can be easily obtained in terms of the response functions (the derivatives of the Stokes profiles with respect to the model parameters), taking also into account the transformation. Therefore:
$$ \frac{\partial \chi^2}{\partial \mathbf{\hat{p}}} = \left(\mathbf{W}^{-1}\right)^T \frac{\partial \chi^2}{\partial \mathbf{p}} = \mathbf{W} \frac{\partial \chi^2}{\partial \mathbf{p}}, $$where the last equality holds for an orthonormal transform such as the db1 wavelet used below. Some details are still missing, like the choice of the step size $L$. A possibility is to use a line search until the Wolfe conditions are met.
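For an orthonormal wavelet, this amounts to applying the forward 2D wavelet transform to the real-space gradient map of every parameter, which is exactly what the code below does. A minimal sketch, where gradRealSpace is a placeholder for a $64 \times 64$ map of derivatives obtained from the response functions:

import numpy as np
import pywt

# Placeholder real-space gradient map of one physical parameter
gradRealSpace = np.random.randn(64, 64)

# Forward transform of the gradient gives the gradient with respect to the wavelet coefficients
coeffs = pywt.wavedec2(gradRealSpace, 'db1')
gradTransformed = np.concatenate([coeffs[0].ravel()] +
                                 [d.ravel() for level in coeffs[1:] for d in level])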
The gradient descent method is known to make fast progress far from the minimum, but to be slow for the final refinement. That is precisely the reason why the Levenberg-Marquardt method is a combination of a gradient descent method and a Newton method (controlled by the Hessian). There are modifications of the gradient descent method that accelerate this final convergence, which could also be applied here.
Example
As a working example, let us take a region of $64 \times 64$ pixels observed with Hinode and focus on the Stokes I profile of the Fe I line at 630.15 nm. We want to explain this line with the following simplistic model:
$$ I(\lambda) = I_c \left[ 1- d_c \exp \left( -\frac{(\lambda-\lambda_0-\lambda_0 v/c)^2}{\Delta^2}\right) \right]. $$The parameters of our model are then, for every pixel, the continuum intensity $I_c$, the line absorption $d_c$, the Doppler velocity $v$ and the width of the line $\Delta$. We use the wavelet transform with the db1 mother wavelet and keep only 10% of the coefficients for every parameter. Here is a Python code that does the work. Note that the only difference with existing inversion codes lies in the thresholding and in the computation of the gradient, which needs to take the transformation into account.
We use PyWavelets as the module to deal with wavelet transforms. First I define a small class to transform to and from the wavelet domain using images instead of lists of coefficients, because this is clearer.
import numpy as np
import matplotlib.pyplot as pl
import pywt
import scipy.optimize as op
import pyfits as pf
import glob
import socket
from scipy.special import wofz
%matplotlib inline
#------------------------------------------
# Returns the Voigt function for an axis of wavelengths l and damping parameter a
#------------------------------------------
def fvoigt(l, a):
    z = l + 1j*a
    return wofz(z).real
class wavelet(object):
    def __init__(self, nPixel, waveletClass):
        self.nPixel = nPixel
        a = np.zeros((nPixel,nPixel))
        res = pywt.wavedec2(a, waveletClass)
        self.nLevels = len(res)

    def waveletToArray(self, coeffs):
        """
        Transform the list of tuples returned by the 2D decomposition of PyWavelet to an array ordered in the typical recursive way for wavelets
        [[cA],[cH]]
        [[cV],[cD]]
        """
        res = coeffs[0][0][:,np.newaxis]
        for i in range(1,self.nLevels):
            resH1 = np.hstack((res, coeffs[i][0]))
            resH2 = np.hstack((coeffs[i][1], coeffs[i][2]))
            res = np.vstack((resH1, resH2))
        return res

    def arrayToWavelet(self, image):
        """
        Transform an array ordered in the typical recursive way for wavelets into the list of tuples returned by the 2D decomposition of PyWavelet
        [[cA],[cH]]
        [[cV],[cD]]
        """
        arr = image
        out = []
        for i in range(0,self.nLevels-1):
            res = np.vsplit(arr, 2)
            resH1 = np.hsplit(res[0],2)
            resH2 = np.hsplit(res[1],2)
            arr = resH1[0]
            out.append((resH1[1],resH2[0],resH2[1]))
        out.append(resH1[0])
        return out[::-1]
class intenME(object):
    def __init__(self):
        self.nPix = 64
        self.noise = 1e-3
        self.wavelet = wavelet(64, 'db1')
        self.hostName = socket.gethostname()
        self.lambda0 = 6301.5e0
        self.loadObservations()
        self.parameters = {'cont': np.zeros(self.nPix**2), 'dc': np.zeros(self.nPix**2), 'vel': np.zeros(self.nPix**2), 'delta': np.zeros(self.nPix**2)}
        self.final = {'cont': np.zeros((self.nPix,self.nPix)), 'dc': np.zeros((self.nPix,self.nPix)), 'vel': np.zeros((self.nPix,self.nPix)), 'delta': np.zeros((self.nPix,self.nPix))}
        self.parNames = ['cont', 'dc', 'vel', 'delta']

    def loadObservations(self):
        if (self.hostName == 'viga'):
            root = '/scratch1/aasensio/HINODE/MAP_LITES_LEV1/'
        else:
            root = '/scratch/OBSERVATIONS/mapLitesLev1/'
        files = glob.glob(root+'*.fits')
        files.sort()
        nFiles = len(files)
        self.wavelength = np.loadtxt(root+'wavelengthHinode.txt')[0:55]
        deltaWavelength = self.wavelength[1]-self.wavelength[0]
        self.nWavelength = len(self.wavelength)
        f = files[0]
        hdu = pf.open(f)
        nSlits = hdu[0].header['NAXIS2']
        self.stI = np.zeros((self.nPix,self.nPix,self.nWavelength))
        hdu.close()
        for i in range(self.nPix):
            f = files[i]
            hdu = pf.open(f)
            for j in range(self.nPix):
                self.stI[i,j,:] = hdu[0].data[0,j,0:55].astype(float)
                #self.stI[i,j,:] /= self.stI[i,j,0]
        self.stI = self.stI.reshape((self.nPix*self.nPix,self.nWavelength)).T
        avgCont = np.mean(self.stI[0,:])
        self.stI /= avgCont
    def evalModel(self, cont, dc, vel, delta):
        # Gaussian absorption profile (Voigt function with zero damping) in Doppler units
        # The velocity in km/s is converted to cm/s (1e5) and compared with c = 3e10 cm/s
        p = (self.wavelength[:,np.newaxis] - self.lambda0 - self.lambda0 * 1e5 * vel[np.newaxis,:] / 3e10) / delta
        profile = fvoigt(p, 0.0)
        model = cont * (1.0 - dc * profile)
        # Analytical derivatives of the model with respect to cont, dc, vel and delta
        contDiff = 1.0 - dc * profile
        dcDiff = -cont * profile
        vDiff = -cont * dc * (-2.0 * p * profile) * (-1e5 * self.lambda0 / (3e10 * delta))
        deltaDiff = -cont * dc * (-2.0 * p * profile) * (-p / delta)
        return model, [contDiff, dcDiff, vDiff, deltaDiff]
    def logPosterior(self, pars):
        # Transform the wavelet coefficients of every parameter back to the real space
        for par in range(4):
            left = par * self.nPix**2
            right = (par+1) * self.nPix**2
            temp = pars[left:right]
            coeff = self.wavelet.arrayToWavelet(temp.reshape(self.nPix,self.nPix))
            self.parameters[self.parNames[par]] = pywt.waverec2(coeff, 'db1').reshape(self.nPix*self.nPix)
        model, gradModel = self.evalModel(self.parameters['cont'], self.parameters['dc'], self.parameters['vel'], self.parameters['delta'])
        chi2 = np.sum( (model-self.stI)**2 ) / (2.0*self.noise**2)
        # Gradient with respect to the wavelet coefficients: forward transform of the real-space gradient
        grad = np.zeros(4*self.nPix*self.nPix)
        for par in range(4):
            left = par * self.nPix**2
            right = (par+1) * self.nPix**2
            dChi2 = gradModel[par] * (model-self.stI) / self.noise**2
            for i in range(self.nWavelength):
                res = pywt.wavedec2(dChi2[i,:].reshape(self.nPix,self.nPix), 'db1')
                grad[left:right] += self.wavelet.waveletToArray(res).reshape(self.nPix*self.nPix)
        return chi2, grad
    def testDerivative(self):
        pars = np.ones(4*self.nPix*self.nPix)
        l1, d1 = self.logPosterior(pars)
        indices = [0,100,4096,4196,8192,8292,12288,12388]
        for which in indices:
            print pars[which]
            pars[which] += 1e-5*pars[which]
            print pars[which]
            l2, d2 = self.logPosterior(pars)
            print (l2-l1) / (1e-5*pars[which]), d1[which]
            pars = np.ones(4*self.nPix*self.nPix)
    def threshold(self, x, thr):
        # Hard thresholding: set to zero all but the largest (in absolute value) fraction thr of the elements
        ind = np.argsort(np.abs(x))
        x[ind[0:int((1-thr)*ind.size)]] = 0.0
        return x
    def plotImages(self):
        fig, ax = pl.subplots(ncols=2, nrows=2, figsize=(10,10))
        ax = ax.flatten()
        for i in range(4):
            im = ax[i].imshow(self.final[self.parNames[i]])
            ax[i].figure.colorbar(im, ax=ax[i], orientation="vertical", shrink=0.8)
            ax[i].set_title(self.parNames[i])
    def optimize(self, threshold=0.2):
        self.x = np.zeros(4*self.nPix*self.nPix)
        # Initial values, transformed to the wavelet domain
        initial = (self.stI[0,:].reshape(self.nPix,self.nPix), 0.5*np.ones((self.nPix,self.nPix)), np.zeros((self.nPix,self.nPix)), 0.08*np.ones((self.nPix,self.nPix)))
        for par in range(4):
            left = par * self.nPix**2
            right = (par+1) * self.nPix**2
            coeffs = pywt.wavedec2(initial[par], 'db1')
            self.x[left:right] = self.wavelet.waveletToArray(coeffs).reshape(self.nPix*self.nPix)
        # Gradient descent with a fixed step size and hard thresholding of the step
        logLikeOld = 1e10
        for i in range(50):
            logLike, dlogLike = self.logPosterior(self.x)
            delta = self.threshold(dlogLike * 3e-9, threshold)
            self.x -= delta
            deltaLogLike = logLike - logLikeOld
            logLikeOld = logLike
            print i, logLike / (self.nWavelength * self.nPix**2), deltaLogLike / (self.nWavelength * self.nPix**2)
    def finalTransform(self):
        for par in range(4):
            left = par * self.nPix**2
            right = (par+1) * self.nPix**2
            temp = self.x[left:right]
            coeff = self.wavelet.arrayToWavelet(temp.reshape(self.nPix,self.nPix))
            self.final[self.parNames[par]] = pywt.waverec2(coeff, 'db1')
out = intenME()
out.optimize(threshold=0.1)
out.finalTransform()
out.plotImages()