Why paramnormal?
Both in numpy and scipy.stats and in the field of statistics in
general, you can refer to the location (loc) and scale (scale)
parameters of a distribution. Roughly speaking, they refer to the
position and spread of the distribution, respectively. For normal
distributions, loc refers to the mean (symbolized as \(\mu\)) and
scale refers to the standard deviation (a.k.a. \(\sigma\)).
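As a minimal sketch of that convention (the values 10 and 2, and the names `mean_value` and `std_dev`, are arbitrary choices for illustration), a frozen normal distribution in scipy.stats reports its loc as the mean and its scale as the standard deviation:

```python
from scipy import stats

# Arbitrary illustrative values, not taken from the text.
mean_value = 10.0
std_dev = 2.0

# For the normal distribution, loc is the mean and scale is the standard deviation.
normal_dist = stats.norm(loc=mean_value, scale=std_dev)

print(normal_dist.mean())  # the mean equals loc
print(normal_dist.std())   # the standard deviation equals scale
```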
The main problem that paramnormal is trying to solve is that
sometimes, creating a probability distribution using these parameters
(and others) in scipy.stats can be confusing. Also the parameters in
numpy.random can be inconsistently named (admittedly, just a minor
inconvenience).
%matplotlib inline
import numpy as np
from scipy import stats
Consider the lognormal distribution.
In probability theory, a log-normal (or lognormal) distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed. Thus, if the random variable \(X\) is log-normally distributed, then \(Y = \ln(X)\) has a normal distribution. Likewise, if \(Y\) has a normal distribution, then \(X = \exp(Y)\) has a log-normal distribution. (from wikipedia)
In numpy, you specify the “mean” and “sigma” of the underlying normal
distribution. A lot of scientific programmers will know what those
mean, but a consistent pair such as mean and standard_deviation,
loc and scale, or mu and sigma would have been a better choice.
Still, generating random numbers is pretty straightforward:
np.random.seed(0)
mu = 0
sigma = 1
N = 3
np.random.lognormal(mean=mu, sigma=sigma, size=N)
array([ 5.83603919,  1.49205924,  2.66109578])
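One way to check the definition quoted above is to take the logarithm of a large lognormal sample and confirm that it has roughly the requested mean and standard deviation. The sketch below uses numpy's `default_rng` generator; the names `mu_check` and `sigma_check`, their values, and the sample size are assumptions chosen for illustration:

```python
import numpy as np

# Arbitrary illustrative parameters of the underlying normal distribution.
mu_check = 1.5
sigma_check = 0.5

rng = np.random.default_rng(0)
samples = rng.lognormal(mean=mu_check, sigma=sigma_check, size=100_000)

# If X is lognormal, then ln(X) is normal with mean mu and standard deviation sigma.
log_samples = np.log(samples)
print(log_samples.mean())  # should be close to mu_check
print(log_samples.std())   # should be close to sigma_check
```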
In scipy, you need an additional shape parameter (s), plus the usual
loc and scale. Aside from the mystery behind what s might
be, that seems straightforward enough.
Except it’s not.
That shape parameter is actually the standard deviation (\(\sigma\))
of the underlying normal distribution. The scale should be set to
the exponentiated mean of that normal distribution
(\(e ^ \mu\)). Finally, loc actually refers to an offset
that can be applied to the entire distribution. In other words, you can
shift the whole distribution toward larger or smaller values so that it
covers, e.g., negative values.
In my field (civil/environmental engineering), variables that are often
assumed to be lognormally distributed (e.g., pollutant concentration)
can never have values less than or equal to zero. So in that sense, the
loc parameter in scipy’s lognormal distribution should nearly always
be set to zero.
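To make the effect of loc concrete, here is a small sketch comparing an unshifted lognormal distribution with one shifted by an arbitrary offset of -5 (the offset and the names `s_demo` and `mu_demo` are assumptions for illustration only):

```python
import numpy as np
from scipy import stats

# Illustrative parameters of the underlying normal distribution.
s_demo = 1.0
mu_demo = 0.0

unshifted = stats.lognorm(s_demo, loc=0, scale=np.exp(mu_demo))
shifted = stats.lognorm(s_demo, loc=-5, scale=np.exp(mu_demo))

# The median moves by exactly the offset passed as loc.
print(unshifted.median(), shifted.median())

# With loc=0 there is no probability mass at or below zero;
# with a negative loc, most of the mass here falls below zero.
print(unshifted.cdf(0.0), shifted.cdf(0.0))
```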
With that out of the way, recreating the three numbers above in scipy is done as follows:
np.random.seed(0)
stats.lognorm(sigma, loc=0, scale=np.exp(mu)).rvs(size=N)
array([ 5.83603919,  1.49205924,  2.66109578])
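The mapping can also be checked without random numbers. For a lognormal distribution with loc fixed at zero, the median is \(e^{\mu}\) and the mean is \(e^{\mu + \sigma^2 / 2}\). The sketch below compares those closed-form values with what scipy reports; `mu_chk` and `sigma_chk` are arbitrary illustrative values:

```python
import numpy as np
from scipy import stats

# Arbitrary illustrative parameters of the underlying normal distribution.
mu_chk = 0.5
sigma_chk = 0.75

# shape = sigma, loc = 0, scale = exp(mu)
dist = stats.lognorm(sigma_chk, loc=0, scale=np.exp(mu_chk))

expected_median = np.exp(mu_chk)
expected_mean = np.exp(mu_chk + sigma_chk**2 / 2)

print(np.isclose(dist.median(), expected_median))
print(np.isclose(dist.mean(), expected_mean))
```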
A new challenger appears
paramnormal really just hopes to take away some of this friction.
Consider the following:
import paramnormal
np.random.seed(0)
paramnormal.lognormal(mu=mu, sigma=sigma).rvs(size=N)
array([ 5.83603919,  1.49205924,  2.66109578])
Hopefully that’s much more readable and straightforward.
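For readers curious about what such a translation layer involves, here is a hypothetical sketch. It is not paramnormal's actual implementation, and the function name `lognormal_from_mu_sigma` and its `offset` argument are invented for illustration. It simply maps mu and sigma onto the scipy arguments described earlier:

```python
import numpy as np
from scipy import stats


def lognormal_from_mu_sigma(*, mu, sigma, offset=0.0):
    """Hypothetical helper: build a frozen scipy lognormal from mu and sigma.

    mu and sigma describe the underlying normal distribution;
    offset is passed through as scipy's loc.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # scipy's shape parameter is sigma and its scale is exp(mu).
    return stats.lognorm(sigma, loc=offset, scale=np.exp(mu))


frozen = lognormal_from_mu_sigma(mu=0.5, sigma=0.75)
print(np.isclose(frozen.median(), np.exp(0.5)))
```

Making the arguments keyword-only is a design choice in this sketch: it forces callers to spell out which value is mu and which is sigma.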
Greek-letter support
Tom Augspurger added a lovely little decorator to let you use Greek letters in the function signature.
np.random.seed(0)
paramnormal.lognormal(μ=mu, σ=sigma).rvs(size=N)
array([ 5.83603919,  1.49205924,  2.66109578])
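The general idea behind a decorator like this can be sketched as follows. This is a hypothetical illustration, not the decorator that ships with paramnormal: the names `accept_greek`, `GREEK_TO_ASCII`, and `describe` are invented here, and only two letters are mapped.

```python
import functools

# Map Greek keyword names to ASCII names. Unicode escapes are used so the
# keys are unambiguous: U+03BC is Greek small mu, U+03C3 is Greek small sigma.
GREEK_TO_ASCII = {"\u03bc": "mu", "\u03c3": "sigma"}


def accept_greek(func):
    """Hypothetical decorator: translate Greek keyword names before calling func."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        translated = {}
        for name, value in kwargs.items():
            ascii_name = GREEK_TO_ASCII.get(name, name)
            # Reject calls that pass both spellings of the same parameter.
            if ascii_name in translated:
                raise TypeError(f"got multiple values for argument {ascii_name!r}")
            translated[ascii_name] = value
        return func(*args, **translated)

    return wrapper


@accept_greek
def describe(mu, sigma):
    return f"mu={mu}, sigma={sigma}"


print(describe(μ=0, σ=1))   # Greek spelling
print(describe(mu=0, σ=1))  # mixed spelling also works
```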
Other distributions
As of now, we provide a convenient interface for the following
distributions in scipy.stats:
for d in paramnormal.dist.__all__:
    print(d)
normal
lognormal
weibull
alpha
beta
gamma
chi_squared
pareto
exponential
rice
Feel free to submit a pull request on GitHub to add new distributions.