ligaBBVAHierarchical

In [2]:
import numpy as np
import matplotlib.pyplot as pl
import pymc
import ipdb
import seaborn as sns
#import daft
%matplotlib inline

After a football season finishes, we can use the results of all the matches to infer the quality of each team. This has been done for the Italian league in the paper 'Bayesian hierarchical model for the prediction of football results' by Gianluca Baio and Marta A. Blangiardo. The same approach has been applied to the English Premier League in http://danielweitzenfeld.github.io/passtheroc/blog/2014/10/28/bayes-premier-league/
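A minimal sketch (PyMC2-style syntax, with fake team indices and scores; not the exact model from the paper) of what such a hierarchical attack/defence model looks like:

nTeams, nMatches = 20, 380  # illustrative sizes
homeTeam = np.random.randint(0, nTeams, nMatches)  # fake team indices
awayTeam = np.random.randint(0, nTeams, nMatches)
homeGoals = np.random.poisson(1.5, nMatches)  # fake observed scores

homeAdv = pymc.Normal('homeAdv', mu=0.0, tau=1.0)
att = pymc.Normal('att', mu=0.0, tau=1.0, size=nTeams)
defence = pymc.Normal('defence', mu=0.0, tau=1.0, size=nTeams)

@pymc.deterministic
def rateHome(homeAdv=homeAdv, att=att, defence=defence):
    # log-intensity = home advantage + attack of home team - defence of away team
    return np.exp(homeAdv + att[homeTeam] - defence[awayTeam])

obsHome = pymc.Poisson('obsHome', mu=rateHome, value=homeGoals, observed=True)
model = pymc.MCMC([homeAdv, att, defence, rateHome, obsHome])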

Read more…

Importance trick for hierarchical models

In [19]:
import numpy as np
import matplotlib.pyplot as pl
import emcee
import scipy.misc as mi
from ipdb import set_trace as stop
%matplotlib inline

Let's generate the data. They are linear trends with slopes drawn from a N(0,1) distribution, with added noise. The number of cases is N, and the number of events in each case is a random number between 1 and 5.
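Something along these lines produces data of that form (the noise level 0.1 is an assumption; the full post uses its own values):

N = 20
slopes = np.random.randn(N)  # one slope per case, drawn from N(0,1)
nEvents = np.random.randint(1, 6, size=N)  # between 1 and 5 events per case
x = [np.sort(np.random.rand(n)) for n in nEvents]
y = [slopes[i] * x[i] + 0.1 * np.random.randn(nEvents[i]) for i in range(N)]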

Read more…

GP LSD

Least squares deconvolution

Least squares deconvolution (LSD) is a very simple technique used to enhance the polarimetric signals needed for the analysis of stellar magnetism. The idea is that the signal-to-noise ratio per spectral line in solar-type stars is not high enough to detect the polarimetric signal (typically circular polarization, Stokes $V$). This was originally solved by M. Semel by adding up all photospheric spectral lines once written on a common velocity axis. The method is based on the idea that the noise adds incoherently, while the signal adds coherently, so after adding many lines the signal emerges.

This line-addition method was superseded by the least squares deconvolution (LSD) method. This method is based on the idea that, when the spectral lines are in the weak-field (the Zeeman splitting is smaller than the broadening of the line) and weak-line (the line is not saturated) regime, the Stokes $V$ profiles all have the same shape:
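For reference, the standard weak-field relation behind this statement (a known result, quoted here for context; the post's own derivation is behind the link) is

$$V(\lambda) = -4.67\times 10^{-13}\,\bar{g}\,\lambda_0^2\,B_\parallel\,\frac{\partial I(\lambda)}{\partial \lambda},$$

with $\lambda$ in Å and $B_\parallel$ the longitudinal field in G. Under the additional weak-line assumption, every line becomes a scaled version of one common profile, which is exactly what LSD exploits.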

Read more…

Quiet Sun Model

In [2]:
import pymc
import numpy as np
import matplotlib.pyplot as pl
import seaborn as sn
import ipdb
%matplotlib inline
In [ ]:
def LOS_to_LV(theta, costhetaB_LOS, sinthetaB_LOS, cosphiB_LOS, sinphiB_LOS):
    """Rotate the field orientation from the line-of-sight (LOS) frame to the
    local-vertical (LV) frame for heliocentric angle theta; the angles enter
    and leave as their cosines/sines."""
    costhetaB_LV = np.cos(theta) * costhetaB_LOS - np.sin(theta) * sinthetaB_LOS * cosphiB_LOS
    sinthetaB_LV = np.sqrt(1.0 - costhetaB_LV**2)
    cosphiB_LV = (np.cos(theta) * sinthetaB_LOS * cosphiB_LOS + costhetaB_LOS * np.sin(theta)) / sinthetaB_LV
    sinphiB_LV = sinthetaB_LOS * sinphiB_LOS / sinthetaB_LV

    return costhetaB_LV, sinthetaB_LV, cosphiB_LV, sinphiB_LV

def LV_to_LOS(theta, costhetaB_LV, sinthetaB_LV, cosphiB_LV, sinphiB_LV):
    """Inverse rotation: from the local-vertical frame back to the
    line-of-sight frame."""
    costhetaB_LOS = np.cos(theta) * costhetaB_LV + np.sin(theta) * sinthetaB_LV * cosphiB_LV
    sinthetaB_LOS = np.sqrt(1.0 - costhetaB_LOS**2)
    cosphiB_LOS = (np.cos(theta) * sinthetaB_LV * cosphiB_LV - costhetaB_LV * np.sin(theta)) / sinthetaB_LOS
    sinphiB_LOS = sinthetaB_LV * sinphiB_LV / sinthetaB_LOS

    return costhetaB_LOS, sinthetaB_LOS, cosphiB_LOS, sinphiB_LOS
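A quick round-trip check (with illustrative angle values) that the two rotations are inverses of each other:

thetaTest = 30.0 * np.pi / 180.0
thB, phB = 70.0 * np.pi / 180.0, 40.0 * np.pi / 180.0
lv = LOS_to_LV(thetaTest, np.cos(thB), np.sin(thB), np.cos(phB), np.sin(phB))
out = LV_to_LOS(thetaTest, *lv)
print np.allclose(out, (np.cos(thB), np.sin(thB), np.cos(phB), np.sin(phB)))  # True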
Generate some fake data.
In [ ]:
nPatches = 10
nMax = 50
nPointsPatch = np.random.randint(1, nMax + 1, size=nPatches)  # integer counts (the original produced floats)
muPatch = np.random.rand(nPatches)

# The loop body was left empty in the original; a minimal completion that
# draws noisy samples around each patch mean (noise level 0.1 is assumed)
data = []
for i in range(nPatches):
    data.append(muPatch[i] + 0.1 * np.random.randn(nPointsPatch[i]))

Change of variables for weak field

In [1]:
import numpy as np
import matplotlib.pyplot as pl
%matplotlib inline
In [13]:
BParObs = 300.0 * np.cos(60.0 * np.pi / 180.0)  # observed longitudinal field: 300 G inclined 60 degrees
In [14]:
B = np.linspace(0.0, 1000.0, 200)
theta = np.linspace(0.0, np.pi, 200)
h = np.linspace(-20.0, 20.0, 200)
BMat, thetaMat = np.meshgrid(B, theta)
BMat, hMat = np.meshgrid(B, h)  # same BMat as above: theta and h have the same length
f = lambda x: (1.0 + np.tanh(x)) / 2.0 * np.pi  # maps the real line onto the open interval (0, pi)
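The map f sends the unbounded variable h onto (0, π), so a sampler can move freely in h without ever hitting the boundaries of θ. If one transforms a probability density rather than just a χ² surface, the Jacobian of the transformation must also be included; a one-line sketch:

# Jacobian dtheta/dh of theta = (1 + tanh(h)) / 2 * pi, needed when
# transforming densities (not just chi^2 surfaces) to the new variable
dtheta_dh = lambda x: 0.5 * np.pi / np.cosh(x)**2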
In [15]:
like = np.exp(-(BParObs - BMat * np.cos(thetaMat))**2 / 20.0**2)
chi2 = (BParObs - BMat * np.cos(thetaMat))**2 / 20.0**2
pl.contourf(B, theta, chi2, levels=np.arange(10))  # degenerate valley along B*cos(theta) = BParObs
Out[15]:
<matplotlib.contour.QuadContourSet instance at 0xb00e844c>
In [16]:
likeTransformed = np.exp(-(BParObs - BMat * np.cos(f(hMat)))**2 / 20.0**2)
chi2Transformed = (BParObs - BMat * np.cos(f(hMat)))**2 / 20.0**2
pl.contourf(B, h, chi2Transformed, levels=np.arange(10))  # the same surface in the unbounded variable h
Out[16]:
<matplotlib.contour.QuadContourSet instance at 0xb00cb48c>

censoredData

In [2]:
import matplotlib.pyplot as pl
import scipy.special as sp
import numpy as np

%matplotlib inline

It is common in astrophysics to have upper/lower limit detections of a certain quantity that one is measuring. When this happens, it is usual to report this upper/lower limit as the measured quantity. One option for dealing with these measurements is to drop them from the analysis, but they might contain important information for the solution of our problem, so it makes sense to include them in the analysis.
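For a Gaussian noise model, the standard way to include a point censored at an upper limit is to let it contribute the cumulative probability below that limit instead of a density. A minimal sketch (the function and data layout are illustrative, not necessarily the model developed below):

def logLikeCensored(mu, sigma, detected, upperLimits):
    # Detected points contribute the usual Gaussian log-density
    logL = -0.5 * np.sum((detected - mu)**2 / sigma**2 + np.log(2.0 * np.pi * sigma**2))
    # Censored points contribute log P(x < u) = log Phi((u - mu) / sigma),
    # written in terms of the error function from scipy.special
    z = (upperLimits - mu) / (np.sqrt(2.0) * sigma)
    logL += np.sum(np.log(0.5 * (1.0 + sp.erf(z))))
    return logL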

Assume the following generative model:

Read more…

A new philosophy for inversion codes

In general, an inversion code is a computer program/algorithm that allows you to extract information about the physical parameters of the system under study from the interpretation of the observables. In Solar Physics, inversion codes have been routinely applied since the '70s to obtain the thermodynamic and magnetic properties of different regions of the solar atmosphere from the interpretation of the Stokes parameters.

With the use of fast slit spectropolarimeters and two-dimensional filterpolarimeters, we now routinely have 2D maps of regions of the solar atmosphere observed at several points along one or several spectral lines. The interpretation of these observations has been done in the past by assuming that all pixels are completely uncorrelated and applying the inversion codes on a pixel-by-pixel basis. After this pixel-by-pixel inversion, one just plots maps of the derived quantities and hopes that they show some kind of smoothness, an indication that the inversion did not go astray.

Read more…

AdaptiveSMC

In [30]:
import numpy as np
import matplotlib.pyplot as pl
import scipy.optimize as op
import pdb
%matplotlib inline
In [54]:
class adaptiveSMC(object):
    def __init__(self):
        self.M = 50  # Number of pseudo observations for each particle
        self.N = 1000  # Number of particles
        self.alpha = 0.95 # Percentage of particles surviving
        self.theta = np.random.uniform(low=-10.0, high=10.0, size=self.N)  # Initial parameters from the prior        
        
        self.pseudoObs = self.simulateFromModel(self.theta)  # Simulate particles from the prior
        
        self.W = np.ones(self.N) / self.N
        
        self.epsilon = 100
        self.epsilonOld = 0
        self.epsilonTarget = 0.025
        self.ESSThreshold = self.N / 2
        
        self.epsilonStore = []
        self.ESSStore = []
        
    def simulateFromModel(self, pars):
        """Simulate M pseudo-observations for each particle from the model."""
        pseudoObs = np.zeros((self.M, self.N))
        for i in range(self.M):
            u = np.random.rand(self.N)
            a = np.where(u < 0.5)
            b = np.where(u >= 0.5)
            # Note: both branches add U(0,1) noise, as in the original code;
            # a genuine two-component model would use different noise here
            pseudoObs[i,a] = pars[a] + np.random.rand(a[0].size)
            pseudoObs[i,b] = pars[b] + np.random.rand(b[0].size)
        return pseudoObs
    
    def aliveParticles(self, epsilon, x):
        """Fraction of particles with at least one pseudo-observation
        inside the tolerance epsilon (cast to float to avoid Python 2
        integer division, which always returned 0 here)."""
        npa = np.sum(np.abs(x) < epsilon, axis=0)
        return float(np.sum(npa > 0)) / self.N
    
    def sample(self):
#        while (self.epsilon > self.epsilonTarget):

# Find next level
        self.epsilonOld = self.epsilon
        refLevel = self.alpha * self.aliveParticles(self.epsilonOld, self.pseudoObs)
        self.epsilon = op.bisect(lambda eps: self.aliveParticles(eps, self.pseudoObs)-refLevel, 0, self.epsilonOld)
        if (self.epsilon < self.epsilonTarget):
            self.epsilon = self.epsilonTarget
        
        self.ESS = 1.0 / np.sum(self.W**2)
        self.epsilonStore.append(self.epsilon)
        self.ESSStore.append(self.ESS)
        
# Compute associated weights
        d = np.abs(self.pseudoObs)
        npaOld = np.sum(d < self.epsilonOld, axis=0)
        npa = np.sum(d < self.epsilon, axis=0)
        a = np.where(npaOld > 0)
        b = np.where(npaOld == 0)
        self.W[a] = self.W[a] * npa[a] / npaOld[a].astype(float)  # cast avoids integer division
        self.W[b] = np.zeros(b[0].size)
        self.W /= np.sum(self.W)
        
# Resample if necessary (the condition is equivalent to ESS < threshold)
        if (np.sum(self.W**2) * self.ESSThreshold > 1):
            # The body of this branch was empty in the original; standard
            # multinomial resampling is used here as a completion
            idx = np.random.choice(self.N, size=self.N, p=self.W)
            self.theta = self.theta[idx]
            self.pseudoObs = self.pseudoObs[:, idx]
            self.W = np.ones(self.N) / self.N
In [55]:
smc = adaptiveSMC()
In [56]:
smc.sample()
10.0076821269

RankingWithComparisons

In [1]:
import numpy as np
import matplotlib.pyplot as pl
%matplotlib inline

As scientists, from time to time we have to apply for positions when our contract is close to its end. These positions are competitive: you submit a curriculum vitae and a sketch of the project you want to carry out during the new contract. Typically, a commission meets for a few hours and tries to come up with an ordered list of all the candidates. Being part of this commission is not easy at all, and ranking the candidates, especially in the usual case that all of them are very good, is quite tough.

A few days ago, in discussions with some colleagues after one of these lists was published, I wondered whether it is possible to rank a list of candidates using a crowdranking process: a large set of people receives the documentation of two candidates and has to return which of the two is better. My feeling was that, if you have a large enough set of people doing these pairwise comparisons, it is possible to rank the whole list of candidates.
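A standard way to formalize this is a pairwise-comparison model such as Bradley-Terry, in which each candidate $i$ has a latent strength $s_i$ and the probability that $i$ beats $j$ is $1/(1+e^{-(s_i-s_j)})$. A minimal sketch (simulated comparisons and a plain gradient ascent, not the method developed in the full post):

# Hypothetical example: nCand candidates, nComp random pairwise comparisons
nCand, nComp = 8, 500
sTrue = np.random.randn(nCand)  # latent "quality" of each candidate
i, j = np.random.randint(0, nCand, size=(2, nComp))
keep = i != j
i, j = i[keep], j[keep]
pWin = 1.0 / (1.0 + np.exp(-(sTrue[i] - sTrue[j])))
win = (np.random.rand(i.size) < pWin).astype(float)  # 1 if i judged better

# Gradient ascent on the Bradley-Terry log-likelihood
s = np.zeros(nCand)
for it in range(2000):
    p = 1.0 / (1.0 + np.exp(-(s[i] - s[j])))
    grad = np.zeros(nCand)
    np.add.at(grad, i, win - p)  # d logL / d s_i
    np.add.at(grad, j, p - win)  # d logL / d s_j
    s += 0.05 * grad

print np.argsort(-sTrue), np.argsort(-s)  # the two rankings should roughly agree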

Read more…

ApproximateBayesianComputation

In [1]:
import numpy as np
import matplotlib.pyplot as pl
%matplotlib inline
In [14]:
data = 5.0 + 2.0 * np.random.randn(10)  # 10 samples from N(mu=5, sigma=2)
meanD = np.mean(data)
stdD = np.std(data)
In [28]:
def abcAcceptance(pars):
    """Accept parameters (mu, sigma) if data simulated with them reproduces
    the observed summary statistics (mean and standard deviation)."""
    if (pars[1] < 0):
        return False  # sigma must be positive
    samples = pars[0] + pars[1] * np.random.randn(10)
    
    diffMean = np.abs(meanD - np.mean(samples))
    diffStd = np.abs(stdD - np.std(samples))
    return (diffMean < 0.1) and (diffStd < 0.2)
In [29]:
start = np.asarray([4.0, 2.3])
chain = np.zeros((2, 25001))
chain[:,0] = start
for i in range(25000):
    # Random-walk proposal; with a flat prior and a symmetric proposal this is
    # the simplest ABC-MCMC scheme: move only if the proposal is accepted
    proposal = chain[:,i] + 0.7 * np.random.randn(2)
    
    if (abcAcceptance(proposal)):
        chain[:,i+1] = proposal
    else:
        chain[:,i+1] = chain[:,i]
In [30]:
pl.plot(chain[0,:],'.')
Out[30]:
[<matplotlib.lines.Line2D at 0x562d210>]
In [31]:
pl.plot(chain[1,:],'.')
Out[31]:
[<matplotlib.lines.Line2D at 0x5a25b90>]
In [32]:
print "{0} - {1}".format(np.mean(chain[0,:]),np.std(chain[0,:]))
5.71802874288 - 0.681588282195
In [33]:
print "{0} - {1}".format(np.mean(chain[1,:]),np.std(chain[1,:]))
2.25793396108 - 0.553244176481