ligaBBVAHierarchical
import numpy as np
import matplotlib.pyplot as pl
import pymc
import ipdb
import seaborn as sns
#import daft
%matplotlib inline
After a football season finishes, we can use the results of all the matches to infer the quality of each team. This was done for the Italian league in the paper 'Bayesian hierarchical model for the prediction of football results' by Gianluca Baio and Marta A. Blangiardo, and the same approach has been applied to the English Premier League in http://danielweitzenfeld.github.io/passtheroc/blog/2014/10/28/bayes-premier-league/
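The core of the Baio & Blangiardo model is that the goals scored by the home and away teams in each match follow Poisson distributions whose log-rates combine a home advantage with attack and defense strengths drawn from common (hierarchical) priors. As a minimal sketch of that generative structure (simulation only, not the inference; the team names, hyperparameter values and the exact sign convention for the defense term are choices made here for illustration):
import numpy as np

np.random.seed(1)

teams = ['A', 'B', 'C', 'D']            # hypothetical teams
nTeams = len(teams)

# hierarchical priors: attack and defense strengths share a common scale
home = 0.3                               # home advantage (log scale)
att = np.random.normal(0.0, 0.3, nTeams)
defe = np.random.normal(0.0, 0.3, nTeams)   # defensive strength, subtracted from the opponent's rate

# simulate a full round-robin of matches
for i in range(nTeams):
    for j in range(nTeams):
        if i == j:
            continue
        lambdaHome = np.exp(home + att[i] - defe[j])
        lambdaAway = np.exp(att[j] - defe[i])
        goalsHome = np.random.poisson(lambdaHome)
        goalsAway = np.random.poisson(lambdaAway)
        print("{0} {1} - {2} {3}".format(teams[i], goalsHome, goalsAway, teams[j]))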
Importance trick for hierarchical models
import numpy as np
import matplotlib.pyplot as pl
import emcee
import scipy.misc as mi
from ipdb import set_trace as stop
%matplotlib inline
Let's generate the data: linear trends whose slopes are drawn from a N(0,1) distribution, with noise added. The number of cases is N and the number of events in each case is a random number between 1 and 5.
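A minimal sketch of this generation step, assuming each event is a point on the trend sampled at a random abscissa with Gaussian noise (the variable names, the noise level and the abscissa range are choices made here, not taken from the original notebook):
import numpy as np

np.random.seed(123)

N = 20                                       # number of cases
slopes = np.random.randn(N)                  # slopes drawn from N(0,1)
noiseStd = 0.1                               # assumed noise level

x, y, case = [], [], []
for i in range(N):
    nEvents = np.random.randint(1, 6)        # between 1 and 5 events per case
    xi = np.random.rand(nEvents)             # assumed abscissas in [0,1]
    yi = slopes[i] * xi + noiseStd * np.random.randn(nEvents)
    x.append(xi)
    y.append(yi)
    case.append(i * np.ones(nEvents, dtype=int))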
GP LSD
Least squares deconvolution
Least squares deconvolution is a very simple technique used to enhance the polarimetric signals needed for the analysis of stellar magnetism. The idea is that the signal-to-noise ratio per spectral line in solar-type stars is not high enough to detect the polarimetric signal (typically circular polarization, Stokes $V$). This was originally solved by M. Semel by adding up all photospheric spectral lines once written on a common velocity axis. The method relies on the fact that the noise adds incoherently while the signal adds coherently, so that after adding many lines the signal emerges.
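The incoherent-addition argument can be checked numerically: if we average $n$ lines carrying the same weak signal but independent noise, the signal-to-noise ratio grows roughly as $\sqrt{n}$. A quick sketch of this, with a made-up profile shape and noise level:
import numpy as np

np.random.seed(42)

v = np.linspace(-50.0, 50.0, 200)               # common velocity axis [km/s]
signal = 1e-3 * np.exp(-v**2 / (2 * 8.0**2))    # assumed weak signal shared by all lines
noiseStd = 5e-3                                  # per-line noise, burying the signal

for n in [1, 10, 100, 1000]:
    lines = signal + noiseStd * np.random.randn(n, v.size)
    added = lines.mean(axis=0)
    residualNoise = np.std(added - signal)
    print("n = {0:4d}  ->  S/N of the added profile ~ {1:.1f}".format(n, signal.max() / residualNoise))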
This line-addition method was superseded by the least squares deconvolution (LSD) method, which is based on the idea that, when the spectral lines are in the weak-field regime (the Zeeman splitting is smaller than the broadening of the line) and the weak-line regime (the line is not saturated), the Stokes $V$ profiles of all lines have the same shape:
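In the usual formulation (the precise weights depend on the implementation; the expression below is the standard weak-field, weak-line scaling, written here as a reminder rather than taken from this notebook), the Stokes $V$ profile of line $i$, on a velocity axis $v$, is a scaled copy of a common mean profile $Z(v)$:

$$ V_i(v) \simeq \alpha_i \, Z(v), \qquad \alpha_i \propto g_i \, \lambda_i \, d_i, $$

where $g_i$ is the effective Landé factor, $\lambda_i$ the central wavelength and $d_i$ the depth of line $i$. LSD then recovers $Z(v)$ as the least-squares solution of the linear system formed by all the observed lines.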
Quiet Sun Model
import pymc
import numpy as np
import matplotlib.pyplot as pl
import seaborn as sn
import ipdb
%matplotlib inline
def LOS_to_LV(theta, costhetaB_LOS, sinthetaB_LOS, cosphiB_LOS, sinphiB_LOS):
    # Rotate the magnetic field angles from the line-of-sight (LOS) frame
    # to the local vertical (LV) frame, for a heliocentric angle theta
    costhetaB_LV = np.cos(theta) * costhetaB_LOS - np.sin(theta) * sinthetaB_LOS * cosphiB_LOS
    sinthetaB_LV = np.sqrt(1.0 - costhetaB_LV**2)
    cosphiB_LV = (np.cos(theta) * sinthetaB_LOS * cosphiB_LOS + costhetaB_LOS * np.sin(theta)) / sinthetaB_LV
    sinphiB_LV = sinthetaB_LOS * sinphiB_LOS / sinthetaB_LV
    return costhetaB_LV, sinthetaB_LV, cosphiB_LV, sinphiB_LV

def LV_to_LOS(theta, costhetaB_LV, sinthetaB_LV, cosphiB_LV, sinphiB_LV):
    # Inverse transformation: from the local vertical frame back to the LOS frame
    costhetaB_LOS = np.cos(theta) * costhetaB_LV + np.sin(theta) * sinthetaB_LV * cosphiB_LV
    sinthetaB_LOS = np.sqrt(1.0 - costhetaB_LOS**2)
    cosphiB_LOS = (np.cos(theta) * sinthetaB_LV * cosphiB_LV - costhetaB_LV * np.sin(theta)) / sinthetaB_LOS
    sinphiB_LOS = sinthetaB_LV * sinphiB_LV / sinthetaB_LOS
    return costhetaB_LOS, sinthetaB_LOS, cosphiB_LOS, sinphiB_LOS
nPatches = 10
nMax = 50
nPointsPatch = (nMax * np.random.rand(nPatches)).astype(int)   # integer number of points in each patch
muPatch = np.random.rand(nPatches)                              # random value of mu for each patch
for i in range(nPatches):
Change of variables for weak field
import numpy as np
import matplotlib.pyplot as pl
%matplotlib inline
# Observed longitudinal field: a 300 G field inclined 60 degrees from the LOS
BParObs = 300.0 * np.cos(60.0 * np.pi / 180.0)

B = np.linspace(0.0, 1000.0, 200)
theta = np.linspace(0.0, np.pi, 200)
h = np.linspace(-20.0, 20.0, 200)
BMat, thetaMat = np.meshgrid(B, theta)
BMatH, hMat = np.meshgrid(B, h)

# Map the unbounded variable h onto theta in (0, pi)
f = lambda x: (1.0 + np.tanh(x)) / 2.0 * np.pi

# Chi^2 surface in the original (B, theta) variables, assuming 20 G noise
like = np.exp(-(BParObs - BMat * np.cos(thetaMat))**2 / 20.0**2)
chi2 = (BParObs - BMat * np.cos(thetaMat))**2 / 20.0**2
pl.contourf(B, theta, chi2, levels=np.arange(10))

# Same surface after the change of variables theta = f(h)
likeTransformed = np.exp(-(BParObs - BMatH * np.cos(f(hMat)))**2 / 20.0**2)
chi2Transformed = (BParObs - BMatH * np.cos(f(hMat)))**2 / 20.0**2
pl.figure()
pl.contourf(B, h, chi2Transformed, levels=np.arange(10))
censoredData
import matplotlib.pyplot as pl
import scipy.special as sp
import numpy as np
%matplotlib inline
It is common in Astrophysics to have upper/lower limit detections of the quantity one is measuring. When this happens, the upper/lower limit is usually quoted as the measured value. One option to deal with these measurements is to drop them from the analysis, but they may contain important information for the solution of our problem, so it makes sense to include them in the analysis.
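As a hedged illustration of the idea (a generic sketch assuming Gaussian noise and upper limits only, not necessarily the model defined below): a detected point contributes the Gaussian density, while an upper limit contributes the probability that the measured value falls below the limit, i.e. the Gaussian CDF, which can be written with sp.erf. The names mu, sigma, values and isUpperLimit are made up for this example.
def logLikeCensored(mu, sigma, values, isUpperLimit):
    # mu, sigma    : model prediction and Gaussian noise level (assumed known)
    # values       : measured values or quoted upper limits
    # isUpperLimit : boolean array, True where 'values' is only an upper limit
    z = (values - mu) / sigma
    # detections: log of the Gaussian density
    logLDet = -0.5 * z**2 - np.log(sigma * np.sqrt(2.0 * np.pi))
    # upper limits: log of the Gaussian CDF, Phi(z) = 0.5*(1 + erf(z/sqrt(2)))
    logLLim = np.log(0.5 * (1.0 + sp.erf(z / np.sqrt(2.0))))
    return np.sum(np.where(isUpperLimit, logLLim, logLDet))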
Assume the following generative model:
A new philosophy for inversion codes
In general, an inversion code is a computer program/algorithm that allows you to extract information about the physical parameters of the system under study from the interpretation of the observables. In Solar Physics, inversion codes have been routinely applied since the '70s to obtain the thermodynamical and magnetic properties of different regions of the solar atmosphere from the interpretation of the Stokes parameters.
With the advent of fast slit spectropolarimeters and two-dimensional filter polarimeters, we now routinely have 2D maps of regions of the solar atmosphere observed at several points along one or several spectral lines. In the past, these observations have been interpreted by assuming that all pixels are completely uncorrelated and applying the inversion codes on a pixel-by-pixel basis. After this pixel-by-pixel inversion, one just plots maps of the derived quantities and hopes that they show some degree of smoothness, an indication that the inversion did not go astray.
AdaptiveSMC
import numpy as np
import matplotlib.pyplot as pl
import scipy.optimize as op
import pdb
%matplotlib inline
class adaptiveSMC(object):
    def __init__(self):
        self.M = 50            # Number of pseudo-observations for each particle
        self.N = 1000          # Number of particles
        self.alpha = 0.95      # Percentage of particles surviving
        self.theta = np.random.uniform(low=-10.0, high=10.0, size=self.N)   # Initial parameters from the prior
        self.pseudoObs = self.simulateFromModel(self.theta)                 # Simulate particles from the prior
        self.W = np.ones(self.N) / self.N
        self.epsilon = 100
        self.epsilonOld = 0
        self.epsilonTarget = 0.025
        self.ESSThreshold = self.N / 2
        self.epsilonStore = []
        self.ESSStore = []

    def simulateFromModel(self, pars):
        # Simulate M pseudo-observations for every particle
        pseudoObs = np.zeros((self.M, self.N))
        for i in range(self.M):
            u = np.random.rand(self.N)
            a = np.where(u < 0.5)
            b = np.where(u >= 0.5)
            # note that both branches currently add the same uniform noise
            pseudoObs[i, a] = pars[a] + np.random.rand(a[0].size)
            pseudoObs[i, b] = pars[b] + np.random.rand(b[0].size)
        return pseudoObs

    def aliveParticles(self, epsilon, x):
        # Fraction of particles with at least one pseudo-observation within epsilon
        npa = np.sum(np.abs(x) < epsilon, axis=0)
        return np.sum(npa > 0) / float(self.N)

    def sample(self):
        # while (self.epsilon > self.epsilonTarget):
        # Find the next tolerance level so that a fraction alpha of the alive particles survives
        self.epsilonOld = self.epsilon
        refLevel = self.alpha * self.aliveParticles(self.epsilonOld, self.pseudoObs)
        self.epsilon = op.bisect(lambda eps: self.aliveParticles(eps, self.pseudoObs) - refLevel, 0, self.epsilonOld)
        if (self.epsilon < self.epsilonTarget):
            self.epsilon = self.epsilonTarget
        self.ESS = 1.0 / np.sum(self.W**2)
        self.epsilonStore.append(self.epsilon)
        self.ESSStore.append(self.ESS)

        # Compute the associated importance weights
        d = np.abs(self.pseudoObs)
        npaOld = np.sum(d < self.epsilonOld, axis=0)
        npa = np.sum(d < self.epsilon, axis=0)
        a = np.where(npaOld > 0)
        b = np.where(npaOld == 0)
        self.W[a] = self.W[a] * npa[a] / npaOld[a]
        self.W[b] = np.zeros(b[0].size)
        self.W /= np.sum(self.W)

        # Resample if necessary
        if (np.sum(self.W**2) * self.ESSThreshold > 1):
            # effective sample size below threshold: multinomial resampling of the
            # particles and their pseudo-observations, then reset the weights
            indices = np.random.choice(self.N, size=self.N, p=self.W)
            self.theta = self.theta[indices]
            self.pseudoObs = self.pseudoObs[:, indices]
            self.W = np.ones(self.N) / self.N
smc = adaptiveSMC()
smc.sample()
RankingWithComparisons
import numpy as np
import matplotlib.pyplot as pl
%matplotlib inline
As scientists, from time to time we have to apply for positions when our current contract is close to its end. These positions are competitive: you submit a Curriculum Vitae and a sketch of the project you want to carry out during the new contract, and typically a commission meets for a few hours and tries to come up with an ordered list of all the candidates. Being part of this commission is not easy at all, and ranking the candidates, especially in the usual case in which all of them are very good, is quite tough.
A few days ago, in discussions with some colleagues after one of these lists was published, I wondered whether it would be possible to rank a list of candidates through a crowdranking process: a large set of people receives the documentation of two candidates and returns which of the two is better. My feeling was that, with a large enough set of people making these pairwise comparisons, it should be possible to recover a full ranking of the candidates.
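To get a feeling for whether this can work, here is a minimal sketch (all numbers and the use of the Bradley-Terry model are my own choices for illustration, not a description of the model developed below): we simulate noisy pairwise comparisons between candidates of unknown quality and then recover a ranking with the standard minorization-maximization fit of the Bradley-Terry strengths.
import numpy as np

np.random.seed(0)

nCandidates = 8
nComparisons = 500

# latent quality of each candidate (what we want to recover)
quality = np.random.randn(nCandidates)

# simulate pairwise judgements: i beats j with probability
# exp(q_i) / (exp(q_i) + exp(q_j))  (Bradley-Terry model)
wins = np.zeros((nCandidates, nCandidates))
for k in range(nComparisons):
    i, j = np.random.choice(nCandidates, size=2, replace=False)
    pWin = np.exp(quality[i]) / (np.exp(quality[i]) + np.exp(quality[j]))
    if (np.random.rand() < pWin):
        wins[i, j] += 1
    else:
        wins[j, i] += 1

# small pseudo-count so that a candidate with zero wins does not break the iteration
wins += 0.01
np.fill_diagonal(wins, 0.0)

# minorization-maximization iteration for the Bradley-Terry strengths
p = np.ones(nCandidates)
for it in range(200):
    for i in range(nCandidates):
        num = np.sum(wins[i, :])
        den = np.sum((wins[i, :] + wins[:, i]) / (p[i] + p))
        p[i] = num / den
    p /= np.sum(p)

print("true order     : {0}".format(np.argsort(quality)[::-1]))
print("inferred order : {0}".format(np.argsort(p)[::-1]))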
ApproximateBayesianComputation
import numpy as np
import matplotlib.pyplot as pl
%matplotlib inline
# Observed data: 10 samples from N(5, 2); the summary statistics are the mean and std
data = 5.0 + 2.0 * np.random.randn(10)
meanD = np.mean(data)
stdD = np.std(data)
def abcAcceptance(pars):
    # Accept the proposed (mean, std) if data simulated with these parameters
    # reproduce the observed summary statistics to within the tolerances
    if (pars[1] < 0):
        return False
    samples = pars[0] + pars[1] * np.random.randn(10)
    diffMean = np.abs(meanD - np.mean(samples))
    diffStd = np.abs(stdD - np.std(samples))
    if ((diffMean < 0.1) and (diffStd < 0.2)):
        return True
    else:
        return False
start = np.asarray([4.0, 2.3])
chain = np.zeros((2, 25001))
chain[:, 0] = start

# ABC-MCMC: random-walk proposals, accepted only if the simulated data are close enough
for i in range(25000):
    proposal = chain[:, i] + 0.7 * np.random.randn(2)
    if (abcAcceptance(proposal)):
        chain[:, i+1] = proposal
    else:
        chain[:, i+1] = chain[:, i]
pl.plot(chain[0,:],'.')
pl.plot(chain[1,:],'.')
print "{0} - {1}".format(np.mean(chain[0,:]),np.std(chain[0,:]))
print "{0} - {1}".format(np.mean(chain[1,:]),np.std(chain[1,:]))