Fitting distributions to data with ``paramnormal``. --------------------------------------------------- In addition to explicitly creating distributions from known parameters, ``paramnormal.[dist].fit`` provides a similar, interface to ``scipy.stats`` maximum-likelihood estimatation methods. Again, we'll demonstrate with a lognormal distribution and compare parameter estimatation with scipy. .. code:: python %matplotlib inline .. code:: python import warnings warnings.simplefilter('ignore') import numpy as np import matplotlib.pyplot as plt import seaborn import paramnormal clean_bkgd = {'axes.facecolor':'none', 'figure.facecolor':'none'} seaborn.set(style='ticks', rc=clean_bkgd) Let's start by generating a reasonably-sized random dataset and plotting a histogram. The primary method of creating a distribution from named parameters is shown below. The call to ``paramnormal.lognornal`` translates the parameter to be compatible with scipy. We then chain a call to the ``rvs`` (random variates) method of the returned scipy distribution. .. code:: python np.random.seed(0) x = paramnormal.lognormal(mu=1.75, sigma=0.75).rvs(370) Here's a histogram to illustrate the distribution. .. code:: python bins = np.logspace(-0.5, 1.75, num=25) fig, ax = plt.subplots() _ = ax.hist(x, bins=bins, normed=True) ax.set_xscale('log') ax.set_xlabel('$X$') ax.set_ylabel('Probability') seaborn.despine() fig .. image:: fitting_files/fitting_6_0.png .. image:: fitting_files/fitting_6_1.png Pretending for a moment that we didn't generate this dataset with explicit distribution parameters, how would we go about estimating them? Scipy provides a maximum-likelihood estimation for estimating parameters: .. code:: python from scipy import stats print(stats.lognorm.fit(x)) .. parsed-literal:: (0.77953564806411113, 0.23350251237627717, 5.3711822451406039) Unfortunately those parameters don't really make any sense based on what we know about our articifical dataset. That's where paramnormal comes in: .. code:: python params = paramnormal.lognormal.fit(x) print(params) .. parsed-literal:: params(mu=1.7370786294627456, sigma=0.73827268519387368, offset=0) This matches well with our understanding of the distribution. The returned ``params`` variable is a ``namedtuple`` that we can easily use to create a distribution via the ``.from_params`` methods. From there, we can create a nice plot of the probability distribution function with our histogram. .. code:: python dist = paramnormal.lognormal.from_params(params) # theoretical PDF x_hat = np.logspace(-0.5, 1.75, num=100) y_hat = dist.pdf(x_hat) bins = np.logspace(-0.5, 1.75, num=25) fig, ax = plt.subplots() _ = ax.hist(x, bins=bins, normed=True, alpha=0.375) ax.plot(x_hat, y_hat, zorder=2, color='g') ax.set_xscale('log') ax.set_xlabel('$X$') ax.set_ylabel('Probability') seaborn.despine() .. image:: fitting_files/fitting_12_0.png Recap ~~~~~ Fitting data ^^^^^^^^^^^^ .. code:: python params = paramnormal.lognormal.fit(x) print(params) .. parsed-literal:: params(mu=1.7370786294627456, sigma=0.73827268519387368, offset=0) Creating distributions ^^^^^^^^^^^^^^^^^^^^^^ The manual way: .. code:: python paramnormal.lognormal(mu=1.75, sigma=0.75, offset=0) .. parsed-literal:: From fit parameters: .. code:: python paramnormal.lognormal.from_params(params) .. parsed-literal::