Why paramnormal?

In numpy and scipy.stats, and in statistics in general, you can refer to the location (loc) and scale (scale) parameters of a distribution. Roughly speaking, they describe the position and spread of the distribution, respectively. For normal distributions, loc refers to the mean (symbolized as \(\mu\)) and scale refers to the standard deviation (a.k.a. \(\sigma\)).
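As a quick check of that correspondence, here is a minimal sketch for the normal distribution. The values of mu and sigma are arbitrary picks of mine, used only for illustration.

```python
import numpy as np
from scipy import stats

mu, sigma = 5.0, 2.0  # arbitrary illustrative values

# For the normal distribution, loc is the mean and scale is the standard deviation.
dist = stats.norm(loc=mu, scale=sigma)
print(dist.mean(), dist.std())

# numpy uses the same two names for its normal sampler.
rng = np.random.RandomState(0)
sample = rng.normal(loc=mu, scale=sigma, size=100_000)
print(sample.mean(), sample.std())  # close to mu and sigma, not exact
```

The frozen scipy distribution reports the mean and standard deviation exactly, while the sample statistics only approximate them.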

The main problem paramnormal tries to solve is that creating a probability distribution from these parameters (and others) in scipy.stats can be confusing. The parameters in numpy.random are also inconsistently named, though that is admittedly just a minor inconvenience.

%matplotlib inline
import numpy as np
from scipy import stats

Consider the lognormal distribution.

In probability theory, a log-normal (or lognormal) distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed. Thus, if the random variable \(X\) is log-normally distributed, then \(Y = \ln(X)\) has a normal distribution. Likewise, if \(Y\) has a normal distribution, then \(X = \exp(Y)\) has a log-normal distribution. (from Wikipedia)
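To make that definition concrete, the sketch below draws lognormal samples and checks that their logarithms behave like a normal sample. The parameter values and sample size are my own choices, and the agreement is only approximate because it rests on random draws.

```python
import numpy as np

rng = np.random.RandomState(42)
mu, sigma = 1.5, 0.5  # parameters of the underlying normal (illustrative)

x = rng.lognormal(mean=mu, sigma=sigma, size=200_000)  # X is lognormal
y = np.log(x)                                          # Y = ln(X) should be normal

print(x.min() > 0)        # lognormal values are strictly positive
print(y.mean(), y.std())  # close to mu and sigma
```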

In numpy, you specify the “mean” and “sigma” of the underlying normal distribution. A lot of scientific programmers know what that means, but the two names come from different conventions. A consistent pair such as mean and standard_deviation, loc and scale, or mu and sigma would have been a better choice.

Still, generating random numbers is pretty straightforward:

np.random.seed(0)
mu = 0
sigma = 1
N = 3
np.random.lognormal(mean=mu, sigma=sigma, size=N)
array([ 5.83603919,  1.49205924,  2.66109578])

In scipy, you need an additional shape parameter (s), plus the usual loc and scale. Aside from the mystery behind what s might be, that seems straightforward enough.

Except it’s not.

That shape parameter is actually the standard deviation (\(\sigma\)) of the underlying normal distribution. The scale should be set to the exponentiated mean of the underlying normal distribution (\(e ^ \mu\)). Finally, loc actually refers to a sort of offset that can be applied to the entire distribution. In other words, you can shift the whole distribution left or right, e.g., into negative values.

In my field (civil/environmental engineering), variables that are often assumed to be lognormally distributed (e.g., pollutant concentration) can never have values less than or equal to zero. So in that sense, the loc parameter in scipy’s lognormal distribution should nearly always be set to zero.
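The mapping described above can be sketched as a small helper. Note that scipy_lognormal and its offset argument are hypothetical names I made up for this illustration; they are not part of scipy or paramnormal. The checks rely on the standard lognormal results that the median is \(e^\mu\) and the mean is \(e^{\mu + \sigma^2 / 2}\).

```python
import numpy as np
from scipy import stats


def scipy_lognormal(mu, sigma, offset=0.0):
    """Hypothetical helper: build scipy's lognorm from mu and sigma of ln(X)."""
    return stats.lognorm(s=sigma, loc=offset, scale=np.exp(mu))


mu, sigma = 1.0, 0.5  # illustrative values
dist = scipy_lognormal(mu, sigma)

# Standard lognormal results, used here as a sanity check on the mapping.
expected_median = np.exp(mu)
expected_mean = np.exp(mu + sigma ** 2 / 2)
print(np.isclose(dist.median(), expected_median))
print(np.isclose(dist.mean(), expected_mean))

# A nonzero loc only shifts the distribution; here the median moves by -2.
shifted = scipy_lognormal(mu, sigma, offset=-2.0)
print(np.isclose(shifted.median(), expected_median - 2.0))
```

Leaving offset at its default of zero keeps every value strictly positive, which is what you want for something like a concentration.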

With that out of the way, recreating the three numbers above in scipy is done as follows:

np.random.seed(0)
stats.lognorm(sigma, loc=0, scale=np.exp(mu)).rvs(size=N)
array([ 5.83603919,  1.49205924,  2.66109578])

A new challenger appears

paramnormal really just hopes to take away some of this friction. Consider the following:

import paramnormal

np.random.seed(0)
paramnormal.lognormal(mu=mu, sigma=sigma).rvs(size=N)
array([ 5.83603919,  1.49205924,  2.66109578])

Hopefully that’s much more readable and straightforward.

Greek-letter support

Tom Augspurger added a lovely little decorator to let you use Greek letters in the function signature.

np.random.seed(0)
paramnormal.lognormal(μ=mu, σ=sigma).rvs(size=N)
array([ 5.83603919,  1.49205924,  2.66109578])
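For the curious, here is a rough sketch of how a decorator like that could work in general. This is not paramnormal's actual implementation: greek_aliases, GREEK_TO_ASCII, and describe are all hypothetical names invented for this example, and the real decorator may behave differently.

```python
import functools

# Map Greek-letter keyword names to their ASCII spellings.
# The escapes are U+03BC (mu) and U+03C3 (sigma).
GREEK_TO_ASCII = {"\u03bc": "mu", "\u03c3": "sigma"}


def greek_aliases(func):
    """Hypothetical decorator: accept Greek-letter keywords as aliases."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        translated = {}
        for name, value in kwargs.items():
            ascii_name = GREEK_TO_ASCII.get(name, name)
            if ascii_name in translated:
                # The same parameter was passed under both spellings.
                raise TypeError(f"got multiple values for argument '{ascii_name}'")
            translated[ascii_name] = value
        return func(*args, **translated)
    return wrapper


@greek_aliases
def describe(mu, sigma):
    # Stand-in for a distribution factory; it only echoes its arguments.
    return {"mu": mu, "sigma": sigma}


print(describe(μ=0, σ=1))
print(describe(mu=0, σ=1))
```

Both calls reach describe with the same ASCII keyword names, so the wrapped function never has to know about the Greek spellings.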

Other distributions

As of now, we provide a convenient interface for the following distributions in scipy.stats:

for d in paramnormal.dist.__all__:
    print(d)
normal
lognormal
weibull
alpha
beta
gamma
chi_squared
pareto
exponential
rice

Feel free to submit a pull request on GitHub to add new distributions.