A new philosophy for inversion codes


In general, an inversion code is a computer program that extracts information about the physical parameters of a system from the interpretation of its observables. In Solar Physics, inversion codes have been applied routinely since the 1970s to derive the thermodynamical and magnetic properties of different regions of the solar atmosphere from the interpretation of the Stokes parameters.

With the advent of fast slit spectropolarimeters and two-dimensional filterpolarimeters, we now routinely obtain 2D maps of regions of the solar atmosphere observed at several points along one or several spectral lines. In the past, these observations were interpreted by assuming that all pixels are completely uncorrelated and applying the inversion codes on a pixel-by-pixel basis. After this pixel-by-pixel inversion, one just plots maps of the derived quantities and hopes that they show some kind of smoothness, an indication that the inversion did not go astray. It has become increasingly clear that full 2D inversions are the way to proceed, even when assuming local thermodynamical equilibrium (thus neglecting any effect of horizontal radiative coupling). van Noort (2012) proposed and developed an inversion code that inverts a map as a whole by taking advantage of the fact that the point spread function (PSF) of the telescope couples the observed Stokes profiles of nearby pixels.

In this contribution, I want to go a step further and propose a strategy for real 2D inversions of spectropolarimetric data based on the regularization induced by the assumption of sparsity, which has proven extremely successful in recent years.

The idea is very simple. Let us assume that we have a set of model parameters that we encode in the vector $\mathbf{p}=(\mathbf{p}_1,\mathbf{p}_2,\ldots)$. Each parameter $\mathbf{p}_i$ is a map of a given physical parameter. For instance, we may be interested in the value of the temperature, velocity and magnetic field at different heights. Instead of working directly in the real space, we move to a transformed space, where the parameters are given by:

$$ \mathbf{\hat{p}}_i = \mathbf{W}[\mathbf{p}_i], $$

where $\mathbf{W}$ is a linear operator that transforms from the real space to the transformed space. This transformation is such that the resulting transformed image is sparse, i.e., many elements of the transformed image are zero or very close to zero in absolute value. The wavelet transform is known to be of this kind for general natural images. The Fourier transform is known to produce sparse transformed images when signals are periodic.
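
As a quick numerical illustration (a minimal sketch; the synthetic map and the 99% energy criterion are illustrative choices, not part of the method), we can transform a smooth map with PyWavelets and measure what fraction of the wavelet coefficients carries essentially all of the energy:

import numpy as np
import pywt

# A smooth synthetic map, standing in for any physical parameter map
x, y = np.meshgrid(np.linspace(-1, 1, 64), np.linspace(-1, 1, 64))
img = np.exp(-4.0 * (x**2 + y**2)) + 0.1 * x

# Full 2D Haar (db1) decomposition, flattened into a single vector
coeffs = pywt.wavedec2(img, 'db1')
flat = np.concatenate([coeffs[0].ravel()] +
                      [c.ravel() for level in coeffs[1:] for c in level])

# Fraction of coefficients needed to retain 99% of the energy
energy = np.sort(flat**2)[::-1]
nKeep = np.searchsorted(np.cumsum(energy), 0.99 * energy.sum()) + 1
print(float(nKeep) / flat.size)  # a few percent for smooth maps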

Working in the transformed space provides two obvious advantages:

  • The solution to the inversion has to be sparse. If our map is $N \times N$, only a small fraction of the potential $N^2$ coefficients need to be non-zero. This introduces a strong regularization because the number of unknowns is heavily reduced. Typical sparsity values will be around 10-20%.
  • The transform is a global operation, so modifying the value of a single coefficient in $\mathbf{\hat{p}}_i$ induces changes in all the pixels of the original space. Consequently, the information encoded in a certain pixel and extracted by the inversion process is propagated to all the coefficients of $\mathbf{\hat{p}}_i$ that we need to obtain.

A typical inversion code works by optimizing a merit function that measures the $\ell_2$ norm of the residuals:

$$ \chi^2 = \sum_i \frac{\left[S(\lambda_i)-O(\lambda_i)\right]^2}{\sigma_i^2} $$

where $S(\lambda_i)$ refers to the synthetic Stokes profile at wavelength position $i$ and $O(\lambda_i)$ is the observed Stokes profile at this wavelength, while $\sigma_i$ is the standard deviation of the noise. The optimization is usually carried out by direct application of the Levenberg-Marquardt (LM) algorithm, which is specially suited to the optimization of such quadratic functions.
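
In code, this merit function is just a noise-weighted sum of squared residuals (a minimal sketch; the Gaussian profiles and the noise level are placeholders):

import numpy as np

def chi2(synthetic, observed, sigma):
	# Noise-weighted l2 norm of the residuals
	return np.sum((synthetic - observed)**2 / sigma**2)

wavelength = np.linspace(6301.0, 6302.0, 55)
observed = 1.0 - 0.5 * np.exp(-(wavelength - 6301.5)**2 / 0.01)
synthetic = 1.0 - 0.4 * np.exp(-(wavelength - 6301.5)**2 / 0.01)
print(chi2(synthetic, observed, 1e-3))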

The problem with the LM algorithm is that it is hard to impose any sparsity constraint. The reason is that, in reality, what we want to solve in this new philosophy for inversion codes is the problem:

$$ \min_{\mathbf{\hat{p}}} \chi^2 = \sum_i \frac{\left[S(\lambda_i,\mathbf{\hat{p}})-O(\lambda_i)\right]^2}{\sigma_i^2} \quad \mathrm{subject\ to} \quad \Vert \mathbf{\hat{p}} \Vert_0 \leq s $$

In other words, we want to minimize the $\chi^2$ using the smallest number of non-zero elements of $\mathbf{\hat{p}}$. Recall that the $\ell_0$ norm just counts the number of non-zero elements, with $s$ the maximum number of such non-zero elements that we allow. The previous problem is exactly the one solved by compressed sensing algorithms, but in the linear case. Our case here is much more complicated because the dependence of the $\chi^2$ on the values of $\mathbf{\hat{p}}$ is, in general, nonlinear. Fortunately, recent extensions of the Iterative Hard Thresholding (IHT) greedy algorithm to nonlinear compressed sensing have demonstrated that, under some conditions, the previous problem can be solved with the following iteration:

$$ \mathbf{\hat{p}}_{t+1} = \mathcal{P}_C \left[ \mathbf{\hat{p}}_t - \frac{1}{L} \nabla \chi^2(\mathbf{\hat{p}}_t) \right] $$

In essence, this iteration is just gradient descent combined with a hard-thresholding projection operator applied at each iteration. In other words, after moving the solution along the direction of the negative gradient (with a step size controlled by $L$), one sets to zero all elements that do not fulfill the sparsity constraint. I envision several options for the $\mathcal{P}_C$ operator (see the sketch after this list):

  • Set to zero a fixed fraction of the elements of $\mathbf{\hat{p}}$, for instance the 90% with the smallest absolute values.
  • Set to zero all elements whose absolute value falls below a certain fraction of the largest element of $\mathbf{\hat{p}}$.
  • Combinations of the previous two.
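
Here is a minimal sketch of the resulting iteration, using the first option for $\mathcal{P}_C$ (keep a fixed fraction of the largest-magnitude coefficients); the `chi2_gradient` callable and the toy quadratic problem are placeholders, not part of the method:

import numpy as np

def hard_threshold(x, keep=0.1):
	# P_C: keep only the fraction `keep` of largest-magnitude elements
	out = x.copy()
	order = np.argsort(np.abs(out))
	out[order[0:int((1.0 - keep) * out.size)]] = 0.0
	return out

def iht(p0, chi2_gradient, L, nIter=50, keep=0.1):
	# Nonlinear iterative hard thresholding: gradient step + projection
	p = hard_threshold(p0, keep)
	for _ in range(nIter):
		p = hard_threshold(p - chi2_gradient(p) / L, keep)
	return p

# Toy usage: recover a sparse vector from a quadratic chi2
target = np.zeros(256)
target[[3, 50, 200]] = [1.0, -2.0, 0.5]
grad = lambda p: 2.0 * (p - target)  # gradient of ||p - target||^2
print(iht(np.zeros(256), grad, L=4.0, keep=0.02)[[3, 50, 200]])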

The gradient of the $\chi^2$ is easily obtained in terms of the response functions (derivatives of the Stokes profiles with respect to the model parameters), once the transformation is taken into account. Applying the chain rule with $\mathbf{p} = \mathbf{W}^{-1}[\mathbf{\hat{p}}]$:

$$ \frac{\partial \chi^2}{\partial \mathbf{\hat{p}}} = \left(\mathbf{W}^{-1}\right)^\top \frac{\partial \chi^2}{\partial \mathbf{p}}, $$

which, for an orthonormal transform like the db1 wavelet used below, amounts to applying the forward transform $\mathbf{W}$ to the real-space gradient.
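
One can verify numerically that the 2D db1 transform in PyWavelets preserves inner products, so applying the forward transform to the real-space gradient (as done in the code below) indeed implements $(\mathbf{W}^{-1})^\top$ (a minimal sketch; `flatten` is just a hypothetical helper that concatenates the coefficient arrays):

import numpy as np
import pywt

def flatten(coeffs):
	# Concatenate approximation and detail coefficients into one vector
	return np.concatenate([coeffs[0].ravel()] +
	                      [c.ravel() for level in coeffs[1:] for c in level])

a = np.random.randn(64, 64)
b = np.random.randn(64, 64)
wa = flatten(pywt.wavedec2(a, 'db1'))
wb = flatten(pywt.wavedec2(b, 'db1'))

# Inner products are preserved, hence W^{-1} = W^T and (W^{-1})^T = W
print(np.allclose(np.dot(a.ravel(), b.ravel()), np.dot(wa, wb)))  # True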

Some details are still missing, like the choice of the step size $L$. A possibility is to use a line search until the Wolfe conditions are met.
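
A simpler alternative is plain Armijo backtracking, which only enforces the sufficient-decrease part of the Wolfe conditions (a minimal sketch; `chi2` and `grad` are placeholder callables):

import numpy as np

def backtracking_step(p, chi2, grad, step=1.0, beta=0.5, c1=1e-4):
	# Shrink the step until the Armijo sufficient-decrease condition holds
	f0, g = chi2(p), grad(p)
	while chi2(p - step * g) > f0 - c1 * step * np.dot(g, g):
		step *= beta
	return p - step * g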

The gradient descent method is known to approach the neighborhood of the minimum quickly, but to be slow at the final refinement. That is precisely the reason why the Levenberg-Marquardt method is a combination of a gradient descent method and a Newton method (controlled by the Hessian). There are modifications of the gradient descent method, such as accelerated schemes in the spirit of Nesterov's method, that can alleviate this slow final convergence.

Example

As a working example, let us take a region of $64 \times 64$ pixels observed with Hinode and focus on the Stokes I profile of the Fe I line at 630.15 nm. We want to explain this line with the following simplistic model:

$$ I(\lambda) = I_c \left[ 1- d_c \exp \left( -\frac{(\lambda-\lambda_0-\lambda_0 v/c)^2}{\Delta^2}\right) \right]. $$

The parameters of our model are then, for all the pixels, the continuum intensity $I_c$, the line absorption $d_c$, the Doppler velocity $v$ and the width of the line $\Delta$. We use the wavelet transform with the db1 mother function and keep only 10% of the coefficients for every parameter. Here is a Python code that does the work. Note that the only difference with respect to existing inversion codes lies in the thresholding and in the computation of the gradient, which needs to take the transformation into account.

We use PyWavelets as the module to deal with wavelet transforms. First I define a small class to transform to and from the wavelet domain using images instead of lists of coefficients, because this is clearer.

In [8]:
import numpy as np
import matplotlib.pyplot as pl
import pywt
import scipy.optimize as op
import pdb
import pyfits as pf
import glob
import socket
from scipy.special import wofz
%matplotlib inline

#------------------------------------------
# Returns the Voigt function for an axis of wavelengths l and damping parameter a
#------------------------------------------
def fvoigt(l, a):
	z = l + 1j*a
	return wofz(z).real

class wavelet(object):
	def __init__(self, nPixel, waveletClass):
		self.nPixel = nPixel
		
		a = np.zeros((nPixel,nPixel))
		res = pywt.wavedec2(a, waveletClass)
		
		self.nLevels = len(res)
				
	
	def waveletToArray(self, coeffs):
		"""
		Transform the list of tuples returned by the 2D decomposition of PyWavelet to an array ordered in the typical recursive way for wavelets
		[[cA],[cH]]
		[[cV],[cD]]
		"""
		res = coeffs[0][0][:,np.newaxis]
		for i in range(1,self.nLevels):
			resH1 = np.hstack((res, coeffs[i][0]))
			resH2 = np.hstack((coeffs[i][1], coeffs[i][2]))
			res = np.vstack((resH1, resH2))
		return res
	
	def arrayToWavelet(self, image):
		"""
		Transform an array ordered in the typical recursive way for wavelets into the list of tuples returned by the 2D decomposition of PyWavelet
		[[cA],[cH]]
		[[cV],[cD]]
		"""
		arr = image
		out = []
		for i in range(0,self.nLevels-1):
			res = np.vsplit(arr, 2)
			resH1 = np.hsplit(res[0],2)
			resH2 = np.hsplit(res[1],2)			
			arr = resH1[0]			
			out.append((resH1[1],resH2[0],resH2[1]))
		out.append(resH1[0])
		return out[::-1]

class intenME(object):
	def __init__(self):
		self.nPix = 64
		
		self.noise = 1e-3
		self.wavelet = wavelet(64, 'db1')
				
		self.hostName = socket.gethostname()
		
		self.lambda0 = 6301.5e0
		
		self.loadObservations()
		
		self.parameters = {'cont': np.zeros(self.nPix**2), 'dc': np.zeros(self.nPix**2), 'vel': np.zeros(self.nPix**2), 'delta': np.zeros(self.nPix**2)}
		
		self.final = {'cont': np.zeros((self.nPix,self.nPix)), 'dc': np.zeros((self.nPix,self.nPix)), 'vel': np.zeros((self.nPix,self.nPix)), 'delta': np.zeros((self.nPix,self.nPix))}
		
		self.parNames = ['cont', 'dc', 'vel', 'delta']
				
	def loadObservations(self):
		
		if (self.hostName == 'viga'):
			root = '/scratch1/aasensio/HINODE/MAP_LITES_LEV1/'
		else:
			root = '/scratch/OBSERVATIONS/mapLitesLev1/'
			
		files = glob.glob(root+'*.fits')
		files.sort()
		nFiles = len(files)
		self.wavelength = np.loadtxt(root+'wavelengthHinode.txt')[0:55]
		deltaWavelength = self.wavelength[1]-self.wavelength[0]
		self.nWavelength = len(self.wavelength)
		
		f = files[0]
		hdu = pf.open(f)
		nSlits = hdu[0].header['NAXIS2']
		self.stI = np.zeros((self.nPix,self.nPix,self.nWavelength))
		hdu.close()

		for i in range(self.nPix):
			f = files[i]
			hdu = pf.open(f)
			for j in range(self.nPix):
				self.stI[i,j,:] = hdu[0].data[0,j,0:55].astype(float)
				#self.stI[i,j,:] /= self.stI[i,j,0]
				
		self.stI = self.stI.reshape((self.nPix*self.nPix,self.nWavelength)).T
		
		avgCont = np.mean(self.stI[0,:])
		self.stI /= avgCont
		
	
	def evalModel(self, cont, dc, vel, delta):
		p = (self.wavelength[:,np.newaxis] - self.lambda0 - self.lambda0 * 1e5 * vel[np.newaxis,:] / 3e10) / delta
			
		profile = fvoigt(p, 0.0)
		
		model = cont * (1.0 - dc * profile)
		
		contDiff = 1.0 - dc * profile
		dcDiff = -cont * profile
		vDiff = -cont * dc * (-2.0 * p * profile) * (-1e5 * self.lambda0 / (3e10 * delta))
		deltaDiff = -cont * dc * (-2.0 * p * profile) * (-p / delta)
		
		return model, [contDiff, dcDiff, vDiff, deltaDiff]
		
	def logPosterior(self, pars):
				
		for par in range(4):
			left = par * self.nPix**2
			right = (par+1) * self.nPix**2
			temp = pars[left:right]
			coeff = self.wavelet.arrayToWavelet(temp.reshape(self.nPix,self.nPix))			
			self.parameters[self.parNames[par]] = pywt.waverec2(coeff, 'db1').reshape(self.nPix*self.nPix)
			
			
		model, gradModel = self.evalModel(self.parameters['cont'], self.parameters['dc'], self.parameters['vel'], self.parameters['delta'])
				
		chi2 = np.sum( (model-self.stI)**2 ) / (2.0*self.noise**2)
				
		grad = np.zeros(4*self.nPix*self.nPix)
		
		for par in range(4):
			left = par * self.nPix**2
			right = (par+1) * self.nPix**2
			
			dChi2 = gradModel[par] * (model-self.stI) / self.noise**2
			
			# Map the real-space gradient to the wavelet domain: for the
			# orthonormal db1 transform, (W^{-1})^T is just the forward transform
			for i in range(self.nWavelength):
				res = pywt.wavedec2(dChi2[i,:].reshape(self.nPix,self.nPix), 'db1')
				grad[left:right] += self.wavelet.waveletToArray(res).reshape(self.nPix*self.nPix)
				
		return chi2, grad
	
	def testDerivative(self):
		# Check the analytical gradient against finite differences at a few indices
		pars = np.ones(4*self.nPix*self.nPix)
		l1, d1 = self.logPosterior(pars)
		
		indices = [0,100,4096,4196,8192,8292,12288,12388]
		for which in indices:
			delta = 1e-5*pars[which]
			pars[which] += delta
			l2, d2 = self.logPosterior(pars)
			print (l2-l1) / delta, d1[which]
			pars = np.ones(4*self.nPix*self.nPix)
		
	def threshold(self, x, thr):
		# Hard thresholding: keep only the fraction thr of largest-|x| elements
		ind = np.argsort(np.abs(x))
		x[ind[0:int((1.0-thr)*ind.size)]] = 0.0
		return x
	
	def plotImages(self):
		fig, ax = pl.subplots(ncols=2, nrows=2, figsize=(10,10))
		ax = ax.flatten()
		for i in range(4):
			im = ax[i].imshow(self.final[self.parNames[i]])
			ax[i].figure.colorbar(im, ax=ax[i], orientation="vertical", shrink=0.8)
			ax[i].set_title(self.parNames[i])
				
	def optimize(self, threshold=0.2):

		self.x = np.zeros(4*self.nPix*self.nPix)
# Initial values
		initial = (self.stI[0,:].reshape(self.nPix,self.nPix), 0.5*np.ones((self.nPix,self.nPix)), np.zeros((self.nPix,self.nPix)), 0.08*np.ones((self.nPix,self.nPix)))
		for par in range(4):
			left = par * self.nPix**2
			right = (par+1) * self.nPix**2
			coeffs = pywt.wavedec2(initial[par], 'db1')
			self.x[left:right] = self.wavelet.waveletToArray(coeffs).reshape(self.nPix*self.nPix)
						
		logLikeOld = 1e10
		for i in range(50):			
			logLike, dlogLike = self.logPosterior(self.x)
			# Fixed-step gradient update, hard-thresholded so that only its
			# largest-magnitude elements modify the wavelet coefficients
			delta = self.threshold(dlogLike * 3e-9, threshold)
			self.x -= delta
			
			deltaLogLike = logLike - logLikeOld
			logLikeOld = logLike
			print i, logLike / (self.nWavelength * self.nPix**2), deltaLogLike / (self.nWavelength * self.nPix**2)
			
	
	def finalTransform(self):
		for par in range(4):
			left = par * self.nPix**2
			right = (par+1) * self.nPix**2
			temp = self.x[left:right]
			coeff = self.wavelet.arrayToWavelet(temp.reshape(self.nPix,self.nPix))			
			self.final[self.parNames[par]] = pywt.waverec2(coeff, 'db1')				
		
out = intenME()
out.optimize(threshold=0.1)

out.finalTransform()
0 3702.21719743 -40686.987348
1 2216.05119789 -1486.16599954
2 1808.27582807 -407.775369812
3 1648.25457599 -160.021252085
4 1571.58233806 -76.6722379297
5 1527.11841687 -44.4639211904
6 1496.22311226 -30.8953046116
7 1471.49987069 -24.7232415675
8 1449.85647963 -21.6433910587
9 1429.93848583 -19.9179937976
10 1411.12986189 -18.8086239462
11 1393.13594441 -17.9939174786
12 1375.80590543 -17.3300389762
13 1359.05346301 -16.7524424255
14 1342.82572507 -16.2277379372
15 1327.08636176 -15.739363305
16 1311.80629075 -15.2800710145
17 1296.96173354 -14.8445572121
18 1282.53313102 -14.4286025131
19 1268.50274801 -14.0303830158
20 1254.85496878 -13.6477792237
21 1241.57528727 -13.2796815194
22 1228.64990733 -12.9253799355
23 1216.06618019 -12.5837271355
24 1203.81306458 -12.2531156154
25 1191.88040582 -11.9326587623
26 1180.25634666 -11.6240591572
27 1168.9335039 -11.322842762
28 1157.90048288 -11.0330210179
29 1147.14936248 -10.7511204026
30 1136.67205983 -10.4773026456
31 1126.45989087 -10.212168966
32 1116.50548012 -9.954410747
33 1106.80120765 -9.70427247033
34 1097.33974671 -9.46146093416
35 1088.11388942 -9.22585729564
36 1079.11738766 -8.99650175624
37 1070.343681 -8.77370666269
38 1061.78602468 -8.55765631984
39 1053.43853958 -8.34748510322
40 1045.29595157 -8.142588004
41 1037.35195555 -7.94399601858
42 1029.60174967 -7.75020587936
43 1022.0391332 -7.56261647535
44 1014.65974542 -7.37938777975
45 1007.45809392 -7.20165150174
46 1000.42950429 -7.02858962671
47 993.569379567 -6.86012472378
48 986.872795409 -6.69658415783
49 980.335840708 -6.53695470135
In [9]:
out.plotImages()