Why paramnormal?
Both in numpy and scipy.stats, and in the field of statistics in
general, you can refer to the location (loc) and scale (scale)
parameters of a distribution. Roughly speaking, they refer to the
position and spread of the distribution, respectively. For normal
distributions, loc refers to the mean (symbolized as \(\mu\)) and
scale refers to the standard deviation (a.k.a. \(\sigma\)).
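As a quick illustration of loc and scale for a normal distribution, the sketch below freezes a scipy.stats.norm distribution and reads its mean and standard deviation back. The values 10 and 2 are arbitrary choices for the example.

```python
from scipy import stats

# Arbitrary example values for the mean and standard deviation.
mu = 10.0
sigma = 2.0

# For the normal distribution, loc is the mean and scale is the
# standard deviation.
dist = stats.norm(loc=mu, scale=sigma)

print(dist.mean())  # the location parameter
print(dist.std())   # the scale parameter
```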
The main problem that paramnormal is trying to solve is that creating
a probability distribution using these parameters (and others) in
scipy.stats can sometimes be confusing. Also, the parameters in
numpy.random can be inconsistently named (admittedly, just a minor
inconvenience).
%matplotlib inline
import numpy as np
from scipy import stats
Consider the lognormal distribution.
In probability theory, a log-normal (or lognormal) distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed. Thus, if the random variable \(X\) is log-normally distributed, then \(Y = \ln(X)\) has a normal distribution. Likewise, if \(Y\) has a normal distribution, then \(X = \exp(Y)\) has a log-normal distribution. (from wikipedia)
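The definition can be checked numerically. The sketch below, with arbitrarily chosen parameters and seed, draws lognormal samples from numpy, takes their logarithm, and compares the sample mean and standard deviation with those of the underlying normal distribution. Agreement is only approximate because the check relies on random draws.

```python
import numpy as np

# Arbitrary parameters of the underlying normal distribution.
mu = 0.5
sigma = 0.25

# A local RandomState keeps this example from touching the global seed.
rng = np.random.RandomState(42)
x = rng.lognormal(mean=mu, sigma=sigma, size=100000)

# If X is lognormal, then Y = ln(X) should be normal with mean mu
# and standard deviation sigma.
y = np.log(x)
print(y.mean(), y.std())
```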
In numpy, you specify the “mean” and “sigma” of the underlying normal
distribution. A lot of scientific programmers know what that would
mean. But mean and standard_deviation, loc and scale, or mu and sigma
would have been better choices.
Still, generating random numbers is pretty straightforward:
np.random.seed(0)
mu = 0
sigma = 1
N = 3
np.random.lognormal(mean=mu, sigma=sigma, size=N)
array([ 5.83603919, 1.49205924, 2.66109578])
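To make the naming inconsistency concrete, the sketch below draws from numpy's normal and lognormal generators with the same seed. Both calls describe the same underlying normal distribution, but one takes loc and scale while the other takes mean and sigma. The variable names are chosen just for this example.

```python
import numpy as np

seed = 0

# The normal generator names its parameters loc and scale ...
normal_draws = np.random.RandomState(seed).normal(loc=0.0, scale=1.0, size=3)

# ... while the lognormal generator calls the same two quantities
# mean and sigma.
lognormal_draws = np.random.RandomState(seed).lognormal(mean=0.0, sigma=1.0, size=3)

# With identical seeds, compare the lognormal draws with the
# exponentiated normal draws.
print(np.allclose(lognormal_draws, np.exp(normal_draws)))
```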
In scipy, you need an additional shape parameter (s), plus the usual
loc and scale. Aside from the mystery behind what s might be, that
seems straightforward enough.
Except it’s not.
That shape parameter is actually the standard deviation (\(\sigma\))
of the underlying normal distribution. The scale should be set to
the exponentiated location parameter of the underlying normal
distribution (\(e ^ \mu\)). Finally, loc actually refers to a sort of
offset that can be applied to the entire distribution. In other words,
you can translate the distribution up and down to, e.g., negative values.
In my field (civil/environmental engineering), variables that are often
assumed to be lognormally distributed (e.g., pollutant concentration)
can never have values less than or equal to zero. So in that sense, the
loc parameter in scipy’s lognormal distribution should nearly always
be set to zero.
With that out of the way, recreating the three numbers above in scipy is done as follows:
np.random.seed(0)
stats.lognorm(sigma, loc=0, scale=np.exp(mu)).rvs(size=N)
array([ 5.83603919, 1.49205924, 2.66109578])
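The parameter mapping described above can also be checked without drawing random numbers. The sketch below, using arbitrary example values, compares the CDF of scipy's lognormal with the CDF of the underlying normal evaluated at ln(x), and then shows the effect of a non-zero loc.

```python
import numpy as np
from scipy import stats

# Arbitrary parameters of the underlying normal distribution.
log_mean = 1.5
log_sd = 0.5

# scipy's shape parameter is sigma; its scale is exp(mu).
lognormal = stats.lognorm(log_sd, loc=0, scale=np.exp(log_mean))
underlying = stats.norm(loc=log_mean, scale=log_sd)

# P(X <= x) for the lognormal equals P(Y <= ln(x)) for the normal.
x = np.array([0.5, 2.0, 5.0, 20.0])
print(np.allclose(lognormal.cdf(x), underlying.cdf(np.log(x))))

# The median of the lognormal is exp(mu), i.e. the scale.
print(np.isclose(lognormal.median(), np.exp(log_mean)))

# A non-zero loc shifts the whole distribution: here its lower
# bound moves from 0 to -2, so negative values become possible.
shifted = stats.lognorm(log_sd, loc=-2.0, scale=np.exp(log_mean))
print(shifted.cdf(-2.0), shifted.cdf(0.0) > 0)
```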
A new challenger appears
paramnormal really just hopes to take away some of this friction.
Consider the following:
import paramnormal
np.random.seed(0)
paramnormal.lognormal(mu=mu, sigma=sigma).rvs(size=N)
array([ 5.83603919, 1.49205924, 2.66109578])
Hopefully that’s much more readable and straightforward.
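To give a feel for the kind of translation involved, here is a minimal sketch of a wrapper that accepts mu and sigma and forwards them to scipy.stats.lognorm. The function name lognormal_from_mu_sigma and its offset argument are hypothetical and invented for this example; this is not paramnormal's actual implementation.

```python
import numpy as np
from scipy import stats


def lognormal_from_mu_sigma(mu, sigma, offset=0):
    """Hypothetical helper: build a frozen scipy lognormal from the
    mean (mu) and standard deviation (sigma) of the underlying normal.
    """
    # Shape is sigma, scale is exp(mu), and loc is a plain offset.
    return stats.lognorm(sigma, loc=offset, scale=np.exp(mu))


wrapped = lognormal_from_mu_sigma(mu=0, sigma=1)
direct = stats.lognorm(1, loc=0, scale=np.exp(0))

# Passing the same random_state to both gives identical draws.
print(np.allclose(wrapped.rvs(size=3, random_state=0),
                  direct.rvs(size=3, random_state=0)))
```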
Greek-letter support
Tom Augspurger added a lovely little decorator to let you use Greek letters in the function signature.
np.random.seed(0)
paramnormal.lognormal(μ=mu, σ=sigma).rvs(size=N)
array([ 5.83603919, 1.49205924, 2.66109578])
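One way such a decorator could work is to rename Greek-letter keyword arguments to their ASCII spellings before calling the wrapped function. The sketch below is an assumption about the general approach, not the decorator that ships with paramnormal; the names GREEK_TO_ASCII, accept_greek, and describe are made up for the example.

```python
import functools

# Hypothetical lookup table from Greek letters to ASCII parameter names.
GREEK_TO_ASCII = {"μ": "mu", "σ": "sigma", "α": "alpha", "β": "beta"}


def accept_greek(func):
    """Hypothetical decorator: translate Greek keyword names to ASCII."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Rename known Greek keywords; leave every other keyword alone.
        translated = {GREEK_TO_ASCII.get(key, key): value
                      for key, value in kwargs.items()}
        return func(*args, **translated)
    return wrapper


@accept_greek
def describe(mu=0, sigma=1):
    return "mu={}, sigma={}".format(mu, sigma)


print(describe(μ=2, σ=0.5))
print(describe(mu=2, sigma=0.5))
```

Both calls reach describe with the same ASCII keyword names, so existing callers are unaffected by the decorator.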
Other distributions
As of now, we provide a convenient interface for the following
distributions in scipy.stats:
for d in paramnormal.dist.__all__:
    print(d)
normal
lognormal
weibull
alpha
beta
gamma
chi_squared
pareto
exponential
rice
Feel free to submit a pull request on GitHub to add new distributions.