Lab 03: computing empirical probabilities, empirical CDFs, and empirical PDFs#

CCNY EAS 42000/A42000, Fall 2025, 2025/10/08, Prof. Spencer Hill

SUMMARY: We use xarray to load the netCDF file containing the Central Park weather station daily data from disk, and then a combination of xarray and scipy to compute empirical probabilities, CDFs, and PDFs from that data.

Preliminaries#

Notebook magic commands#

(There are two of these. The first you should include; the second only matters if you are on a Mac computer with a Retina display.)

# Turn on rendering of matplotlib figures directly in the notebook.
%matplotlib inline  
# OPTIONAL.  Make figures higher resolution on my Macbook.
%config InlineBackend.figure_format = "retina"  

Imports#

from matplotlib import pyplot as plt  # for plotting
import scipy  # for computing empirical CDFs
import xarray as xr  # for loading data and subsequent analyses

Check that your version of scipy has the scipy.stats.ecdf function, which was introduced in 2023#

scipy.stats.ecdf
<function scipy.stats._survival.ecdf(sample: 'npt.ArrayLike | CensoredData') -> scipy.stats._survival.ECDFResult>

If the cell just above gives you an AttributeError, that means that you don’t have a recent enough version of scipy. You can resolve this by updating your installed scipy to a newer version using pip or conda. Consult HW03 for further instructions.
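
For example, here’s a sketch of the pip route, run directly from a notebook cell (the conda equivalent would be conda update scipy from a terminal; either way, restart your kernel afterwards):

# Upgrade scipy in the environment this notebook's kernel is using.
%pip install --upgrade scipy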

Load Central Park weather station data#


!pip -q install pooch

import sys, pathlib, hashlib
import xarray as xr
import pooch

DATA_URL = (
    "https://spencerahill.github.io/25f-stat-methods-course/_downloads/"
    "91803b82950d49961a65355c075439b3/central-park-station-data_1869-01-01_2023-09-30.nc"
)
HASH_HEX = "85237a4bae1202030a36f330764fd5bd0c2c4fa484b3ae34a05db49fe7721eee"
KNOWN_HASH = f"sha256:{HASH_HEX}"

DATA_DIR = pathlib.Path("/content/data" if "google.colab" in sys.modules else "../data")
DATA_DIR.mkdir(parents=True, exist_ok=True)
DATA_PATH = DATA_DIR / "central-park-station-data_1869-01-01_2023-09-30.nc"

def sha256sum(p: pathlib.Path) -> str:
    return hashlib.sha256(p.read_bytes()).hexdigest()

need_fetch = (not DATA_PATH.exists()) or (sha256sum(DATA_PATH) != HASH_HEX)

if need_fetch:
    fetched = pooch.retrieve(url=DATA_URL, known_hash=KNOWN_HASH,
                             path=DATA_DIR, fname=DATA_PATH.name)
    print(f"Downloaded and verified: {fetched}")
else:
    print(f"Verified existing file at {DATA_PATH}")
Verified existing file at ../data/central-park-station-data_1869-01-01_2023-09-30.nc
# Modify `DATA_PATH` as needed to point to where the dataset lives on your machine.
# It can be an *absolute path*, or a *relative path*, meaning relative to the 
# directory that this notebook lives in.
# One dot, `.` means "the current directory."  Two dots, `..` means "go up one directory."
ds_cp = xr.open_dataset(DATA_PATH)
ds_cp
<xarray.Dataset> Size: 5MB
Dimensions:        (time: 56520)
Coordinates:
  * time           (time) datetime64[ns] 452kB 1869-01-01 ... 2023-09-30
Data variables:
    temp_max       (time) int64 452kB ...
    temp_min       (time) int64 452kB ...
    temp_avg       (time) float64 452kB ...
    temp_anom      (time) float64 452kB ...
    heat_deg_days  (time) int64 452kB ...
    cool_deg_days  (time) int64 452kB ...
    precip         (time) float64 452kB ...
    snow_fall      (time) float64 452kB ...
    snow_depth     (time) int64 452kB ...

Empirical probabilities#

Recall: the empirical probability of an event \(E\) is simply the number of times \(E\) occurs, divided by the total number of times it could have occurred.

The built-in len function is helpful for this.
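
For instance, here’s a toy sketch of the idea with some made-up die rolls (hypothetical data, just for illustration), before we apply it to the real dataset:

rolls = [3, 6, 1, 6, 2, 4, 6, 5]  # hypothetical die rolls
sixes = [roll for roll in rolls if roll == 6]  # keep only the sixes
emp_prob_six = len(sixes) / len(rolls)  # 3 occurrences / 8 trials
print(emp_prob_six)  # 0.375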

Technical aside: jupyter question mark ? vs. built-in help function#

Recall that in a Jupyter session, entering the name of an object followed by a question mark will display information about that object.

If you’re reading this as the rendered HTML of this notebook, cells that call Jupyter’s ? feature will not actually show anything, for technical reasons. For example:

len?

But if you run this yourself in a jupyter session, you will get output, specifically:

Signature: len(obj, /)
Docstring: Return the number of items in a container.
Type:      builtin_function_or_method

As such, I’ll use the builtin help function, which gives you similar output and does render in both a running notebook and the rendered HTML.

help(len)
Help on built-in function len in module builtins:

len(obj, /)
    Return the number of items in a container.

The downsides of help compared to ? are:

  • its output is not as detailed,

  • its output is not formatted as nicely (no colors), and

  • it requires typing more characters.

Example: days with average temperature > 70F#

The numerator is simply the number of days meeting this condition.

How do we do this in python/xarray? The first step is really simple: use the “greater than” operator, >, directly on the xr.DataArray object:

ds_cp["temp_avg"] > 70  # this MASKS out values not meeting this condition
<xarray.DataArray 'temp_avg' (time: 56520)> Size: 57kB
array([False, False, False, ..., False, False, False], shape=(56520,))
Coordinates:
  * time     (time) datetime64[ns] 452kB 1869-01-01 1869-01-02 ... 2023-09-30

The resulting xr.DataArray is identical in shape to the original ds_cp["temp_avg"], but its values are all bool: they are True where the condition is satisfied and False where it is not satisfied.

As a sanity check, let’s use this new array to mask the temperatures and plot the result. You’ll see that all of the remaining values are above 70F, as intended:

ds_cp["temp_avg"].where(ds_cp["temp_avg"] > 70).plot()
[<matplotlib.lines.Line2D at 0x7f4f79f45310>]
[Figure: time series of daily average temperature, masked to days above 70 deg F]

Now let’s assign this result to its own variable for use later on:

is_temp_avg_above_70 = ds_cp["temp_avg"] > 70  

Next, we use is_temp_avg_above_70 as the input to the where method of xr.DataArray to create a mask that restricts to only the data points meeting that condition.

We also set drop=True, so that the points not meeting this condition are dropped entirely. (The default, drop=False, leaves them in place but replaces their values with np.nan, which means “Not a Number.”)

temp_avg_above_70 = ds_cp["temp_avg"].where(is_temp_avg_above_70, drop=True)
temp_avg_above_70
<xarray.DataArray 'temp_avg' (time: 13401)> Size: 107kB
array([71.5, 71.5, 74.5, ..., 75.5, 75. , 74.5], shape=(13401,))
Coordinates:
  * time     (time) datetime64[ns] 107kB 1869-05-12 1869-05-25 ... 2023-09-13
Attributes:
    units:    degrees F

From visually inspecting this output, you can see that there are 13,401 days satisfying this condition.

You could, if you wanted, then hardcode that value into a new variable. E.g. num_days_above70 = 13401.

However, we want to avoid hardcoding numerical values like this whenever possible. Instead, we want to write code that outputs that 13401 value directly.

As noted above, a really easy way of doing that is with len:

len(temp_avg_above_70)
13401

So let’s store that in its own variable:

num_days_temp_avg_above_70 = len(temp_avg_above_70)
num_days_temp_avg_above_70
13401

Alternative approach: use the count method

As is often the case in programming, there are multiple ways of accomplishing the same thing here. One equally elegant solution would be to use the builtin count method of xr.DataArray: ds_cp["temp_avg"].where(is_temp_avg_above_70).count().

Notice that there you don’t have to set drop=True: count automatically ignores masked values, meaning values that are nan.

(Hat tip to Storm H. for showing me count!)
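
For concreteness, here’s a sketch of that count approach, plus one more equivalent trick: summing the boolean array directly, since each True counts as 1 and each False as 0:

# `count` ignores the nan values that `where` leaves in place:
print(int(ds_cp["temp_avg"].where(is_temp_avg_above_70).count()))
# Equivalently, sum the boolean array itself (each True counts as 1):
print(int(is_temp_avg_above_70.sum()))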

Now we have the numerator. The denominator is much easier. It’s simply the total number of days in the dataset:

total_days = len(ds_cp["temp_avg"])
total_days
56520

Finally, we can put it all together:

# Q1: P(temp_avg > 70F)
denom = total_days
numer = num_days_temp_avg_above_70
emp_prob = numer / denom
print(f"Numerator: {numer}, denominator: {denom}, empirical probability: {emp_prob}")
Numerator: 13401, denominator: 56520, empirical probability: 0.23710191082802548

Selecting individual months of the year#

Often, including for some of the empirical probability problems on this assignment, you need to select a temporal subset of your dataset.

For example, one of the problems asks you about days in the month of July.

In this case, as is often true, the time coordinate is stored as a special type of object that you can think of as a kind of calendar.

These calendar objects have lots of handy builtin tools for slicing and dicing time in different ways: by year, by day of the week, by month, and so on.

We access these via a special dt accessor, which you get to simply by appending .dt to the time array itself: ds_cp["time"] is the time array, so it’s ds_cp["time"].dt.

Let’s use the builtin dir function to see what all is available to us from this dt accessor:

dir(ds_cp["time"].dt)
['__annotations__',
 '__class__',
 '__class_getitem__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__firstlineno__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__orig_bases__',
 '__orig_class__',
 '__parameters__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__slots__',
 '__static_attributes__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_date_field',
 '_obj',
 '_tslib_round_accessor',
 'calendar',
 'ceil',
 'date',
 'day',
 'dayofweek',
 'dayofyear',
 'days_in_month',
 'days_in_year',
 'daysinmonth',
 'decimal_year',
 'floor',
 'hour',
 'is_leap_year',
 'is_month_end',
 'is_month_start',
 'is_quarter_end',
 'is_quarter_start',
 'is_year_end',
 'is_year_start',
 'isocalendar',
 'microsecond',
 'minute',
 'month',
 'nanosecond',
 'quarter',
 'round',
 'season',
 'second',
 'strftime',
 'time',
 'week',
 'weekday',
 'weekofyear',
 'year']

We can access any of these by simply appending them to dt. For example, if we want the month:

ds_cp["time"].dt.month
<xarray.DataArray 'month' (time: 56520)> Size: 452kB
array([1, 1, 1, ..., 9, 9, 9], shape=(56520,))
Coordinates:
  * time     (time) datetime64[ns] 452kB 1869-01-01 1869-01-02 ... 2023-09-30

This is an xr.DataArray that is identical to our original time array, but instead of its values being the actual date, the values are the number of the calendar month of that particular date. So the first one, which is January 1, 1869, is 1. Any dates that are in February are 2, and so on through 12 for dates in December.

So putting that all together:

# select just July days
ds_cp.where(ds_cp["time"].dt.month == 7, drop=True)  
<xarray.Dataset> Size: 384kB
Dimensions:        (time: 4805)
Coordinates:
  * time           (time) datetime64[ns] 38kB 1869-07-01 ... 2023-07-31
Data variables:
    temp_max       (time) float64 38kB 70.0 76.0 84.0 83.0 ... 89.0 80.0 85.0
    temp_min       (time) float64 38kB 62.0 64.0 70.0 72.0 ... 70.0 66.0 66.0
    temp_avg       (time) float64 38kB 66.0 70.0 77.0 77.5 ... 79.5 73.0 75.5
    temp_anom      (time) float64 38kB -10.3 -6.4 0.4 0.7 ... 5.8 1.9 -4.6 -2.0
    heat_deg_days  (time) float64 38kB 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
    cool_deg_days  (time) float64 38kB 1.0 5.0 12.0 13.0 ... 19.0 15.0 8.0 11.0
    precip         (time) float64 38kB 0.0 0.0 0.42 0.0 0.0 ... 0.0 0.06 0.0 0.0
    snow_fall      (time) float64 38kB 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
    snow_depth     (time) float64 38kB 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0

Hint: for another one of the problems, try dt.dayofweek.
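
As a sketch of how that would look (dt.dayofweek numbers the days Monday=0 through Sunday=6), here’s how you could select, say, all Mondays:

# Keep only days that are Mondays (dayofweek == 0).
mondays = ds_cp.where(ds_cp["time"].dt.dayofweek == 0, drop=True)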

Empirical CDFs#

Let’s start by getting some info about the scipy.stats.ecdf function we’ll be using:

# cf. the note above: you can alternatively use `scipy.stats.ecdf?`
help(scipy.stats.ecdf)
Help on function ecdf in module scipy.stats._survival:

ecdf(sample: 'npt.ArrayLike | CensoredData') -> scipy.stats._survival.ECDFResult
    Empirical cumulative distribution function of a sample.

    The empirical cumulative distribution function (ECDF) is a step function
    estimate of the CDF of the distribution underlying a sample. This function
    returns objects representing both the empirical distribution function and
    its complement, the empirical survival function.

    Parameters
    ----------
    sample : 1D array_like or `scipy.stats.CensoredData`
        Besides array_like, instances of `scipy.stats.CensoredData` containing
        uncensored and right-censored observations are supported. Currently,
        other instances of `scipy.stats.CensoredData` will result in a
        ``NotImplementedError``.

    Returns
    -------
    res : `~scipy.stats._result_classes.ECDFResult`
        An object with the following attributes.

        cdf : `~scipy.stats._result_classes.EmpiricalDistributionFunction`
            An object representing the empirical cumulative distribution
            function.
        sf : `~scipy.stats._result_classes.EmpiricalDistributionFunction`
            An object representing the empirical survival function.

        The `cdf` and `sf` attributes themselves have the following attributes.

        quantiles : ndarray
            The unique values in the sample that defines the empirical CDF/SF.
        probabilities : ndarray
            The point estimates of the probabilities corresponding with
            `quantiles`.

        And the following methods:

        evaluate(x) :
            Evaluate the CDF/SF at the argument.

        plot(ax) :
            Plot the CDF/SF on the provided axes.

        confidence_interval(confidence_level=0.95) :
            Compute the confidence interval around the CDF/SF at the values in
            `quantiles`.

    Notes
    -----
    When each observation of the sample is a precise measurement, the ECDF
    steps up by ``1/len(sample)`` at each of the observations [1]_.

    When observations are lower bounds, upper bounds, or both upper and lower
    bounds, the data is said to be "censored", and `sample` may be provided as
    an instance of `scipy.stats.CensoredData`.

    For right-censored data, the ECDF is given by the Kaplan-Meier estimator
    [2]_; other forms of censoring are not supported at this time.

    Confidence intervals are computed according to the Greenwood formula or the
    more recent "Exponential Greenwood" formula as described in [4]_.

    References
    ----------
    .. [1] Conover, William Jay. Practical nonparametric statistics. Vol. 350.
           John Wiley & Sons, 1999.

    .. [2] Kaplan, Edward L., and Paul Meier. "Nonparametric estimation from
           incomplete observations." Journal of the American statistical
           association 53.282 (1958): 457-481.

    .. [3] Goel, Manish Kumar, Pardeep Khanna, and Jugal Kishore.
           "Understanding survival analysis: Kaplan-Meier estimate."
           International journal of Ayurveda research 1.4 (2010): 274.

    .. [4] Sawyer, Stanley. "The Greenwood and Exponential Greenwood Confidence
           Intervals in Survival Analysis."
           https://www.math.wustl.edu/~sawyer/handouts/greenwood.pdf

    Examples
    --------
    **Uncensored Data**

    As in the example from [1]_ page 79, five boys were selected at random from
    those in a single high school. Their one-mile run times were recorded as
    follows.

    >>> sample = [6.23, 5.58, 7.06, 6.42, 5.20]  # one-mile run times (minutes)

    The empirical distribution function, which approximates the distribution
    function of one-mile run times of the population from which the boys were
    sampled, is calculated as follows.

    >>> from scipy import stats
    >>> res = stats.ecdf(sample)
    >>> res.cdf.quantiles
    array([5.2 , 5.58, 6.23, 6.42, 7.06])
    >>> res.cdf.probabilities
    array([0.2, 0.4, 0.6, 0.8, 1. ])

    To plot the result as a step function:

    >>> import matplotlib.pyplot as plt
    >>> ax = plt.subplot()
    >>> res.cdf.plot(ax)
    >>> ax.set_xlabel('One-Mile Run Time (minutes)')
    >>> ax.set_ylabel('Empirical CDF')
    >>> plt.show()

    **Right-censored Data**

    As in the example from [1]_ page 91, the lives of ten car fanbelts were
    tested. Five tests concluded because the fanbelt being tested broke, but
    the remaining tests concluded for other reasons (e.g. the study ran out of
    funding, but the fanbelt was still functional). The mileage driven
    with the fanbelts were recorded as follows.

    >>> broken = [77, 47, 81, 56, 80]  # in thousands of miles driven
    >>> unbroken = [62, 60, 43, 71, 37]

    Precise survival times of the fanbelts that were still functional at the
    end of the tests are unknown, but they are known to exceed the values
    recorded in ``unbroken``. Therefore, these observations are said to be
    "right-censored", and the data is represented using
    `scipy.stats.CensoredData`.

    >>> sample = stats.CensoredData(uncensored=broken, right=unbroken)

    The empirical survival function is calculated as follows.

    >>> res = stats.ecdf(sample)
    >>> res.sf.quantiles
    array([37., 43., 47., 56., 60., 62., 71., 77., 80., 81.])
    >>> res.sf.probabilities
    array([1.   , 1.   , 0.875, 0.75 , 0.75 , 0.75 , 0.75 , 0.5  , 0.25 , 0.   ])

    To plot the result as a step function:

    >>> ax = plt.subplot()
    >>> res.sf.plot(ax)
    >>> ax.set_xlabel('Fanbelt Survival Time (thousands of miles)')
    >>> ax.set_ylabel('Empirical SF')
    >>> plt.show()

OK, so let’s call it on our daily maximum temperature variable:

ecdf_temp_max = scipy.stats.ecdf(ds_cp["temp_max"])
ecdf_temp_max
ECDFResult(cdf=EmpiricalDistributionFunction(quantiles=array([  0.,   2.,   4.,   6.,   7.,   8.,   9.,  10.,  11.,  12.,  13.,
        14.,  15.,  16.,  17.,  18.,  19.,  20.,  21.,  22.,  23.,  24.,
        25.,  26.,  27.,  28.,  29.,  30.,  31.,  32.,  33.,  34.,  35.,
        36.,  37.,  38.,  39.,  40.,  41.,  42.,  43.,  44.,  45.,  46.,
        47.,  48.,  49.,  50.,  51.,  52.,  53.,  54.,  55.,  56.,  57.,
        58.,  59.,  60.,  61.,  62.,  63.,  64.,  65.,  66.,  67.,  68.,
        69.,  70.,  71.,  72.,  73.,  74.,  75.,  76.,  77.,  78.,  79.,
        80.,  81.,  82.,  83.,  84.,  85.,  86.,  87.,  88.,  89.,  90.,
        91.,  92.,  93.,  94.,  95.,  96.,  97.,  98.,  99., 100., 101.,
       102., 103., 104., 106.]), probabilities=array([0.00125619, 0.00127389, 0.00130927, 0.00134466, 0.00143312,
       0.00155697, 0.00166313, 0.00189314, 0.00205237, 0.00228238,
       0.00258316, 0.00290163, 0.00334395, 0.00389243, 0.00442321,
       0.00530786, 0.00631635, 0.00805025, 0.00980184, 0.01224345,
       0.01433121, 0.0170913 , 0.02031139, 0.02430998, 0.02837933,
       0.03324487, 0.03864119, 0.04543524, 0.05276008, 0.06165959,
       0.07050602, 0.08149328, 0.09258669, 0.10546709, 0.11857749,
       0.1329264 , 0.1467799 , 0.16383581, 0.1794586 , 0.19511677,
       0.21081033, 0.22662774, 0.24189667, 0.25711253, 0.2730184 ,
       0.28816348, 0.30380396, 0.32061217, 0.33582803, 0.35053079,
       0.3653574 , 0.38110403, 0.39587757, 0.41070418, 0.42464614,
       0.43973815, 0.45368011, 0.47013447, 0.48533263, 0.50058386,
       0.51516277, 0.53053786, 0.54501062, 0.56008493, 0.57501769,
       0.59117127, 0.6059448 , 0.62344303, 0.63956122, 0.65705945,
       0.67323071, 0.69143666, 0.70872258, 0.72774239, 0.74488677,
       0.76472045, 0.78264331, 0.80493631, 0.82498231, 0.84495754,
       0.864862  , 0.8844126 , 0.90205237, 0.91884289, 0.93382873,
       0.94663836, 0.95787332, 0.96799363, 0.9757431 , 0.98198868,
       0.98733192, 0.99136589, 0.99400212, 0.99600142, 0.99738146,
       0.99837226, 0.99893843, 0.99945152, 0.99966384, 0.99985846,
       0.99992923, 0.99998231, 1.        ])), sf=EmpiricalDistributionFunction(quantiles=array([  0.,   2.,   4.,   6.,   7.,   8.,   9.,  10.,  11.,  12.,  13.,
        14.,  15.,  16.,  17.,  18.,  19.,  20.,  21.,  22.,  23.,  24.,
        25.,  26.,  27.,  28.,  29.,  30.,  31.,  32.,  33.,  34.,  35.,
        36.,  37.,  38.,  39.,  40.,  41.,  42.,  43.,  44.,  45.,  46.,
        47.,  48.,  49.,  50.,  51.,  52.,  53.,  54.,  55.,  56.,  57.,
        58.,  59.,  60.,  61.,  62.,  63.,  64.,  65.,  66.,  67.,  68.,
        69.,  70.,  71.,  72.,  73.,  74.,  75.,  76.,  77.,  78.,  79.,
        80.,  81.,  82.,  83.,  84.,  85.,  86.,  87.,  88.,  89.,  90.,
        91.,  92.,  93.,  94.,  95.,  96.,  97.,  98.,  99., 100., 101.,
       102., 103., 104., 106.]), probabilities=array([9.98743808e-01, 9.98726115e-01, 9.98690729e-01, 9.98655343e-01,
       9.98566879e-01, 9.98443029e-01, 9.98336872e-01, 9.98106865e-01,
       9.97947629e-01, 9.97717622e-01, 9.97416844e-01, 9.97098372e-01,
       9.96656051e-01, 9.96107573e-01, 9.95576787e-01, 9.94692144e-01,
       9.93683652e-01, 9.91949752e-01, 9.90198160e-01, 9.87756546e-01,
       9.85668790e-01, 9.82908705e-01, 9.79688606e-01, 9.75690021e-01,
       9.71620665e-01, 9.66755131e-01, 9.61358811e-01, 9.54564756e-01,
       9.47239915e-01, 9.38340410e-01, 9.29493984e-01, 9.18506723e-01,
       9.07413305e-01, 8.94532909e-01, 8.81422505e-01, 8.67073602e-01,
       8.53220099e-01, 8.36164190e-01, 8.20541401e-01, 8.04883227e-01,
       7.89189667e-01, 7.73372258e-01, 7.58103326e-01, 7.42887473e-01,
       7.26981599e-01, 7.11836518e-01, 6.96196037e-01, 6.79387827e-01,
       6.64171975e-01, 6.49469214e-01, 6.34642604e-01, 6.18895966e-01,
       6.04122435e-01, 5.89295824e-01, 5.75353857e-01, 5.60261854e-01,
       5.46319887e-01, 5.29865534e-01, 5.14667374e-01, 4.99416136e-01,
       4.84837226e-01, 4.69462137e-01, 4.54989384e-01, 4.39915074e-01,
       4.24982307e-01, 4.08828733e-01, 3.94055202e-01, 3.76556971e-01,
       3.60438783e-01, 3.42940552e-01, 3.26769285e-01, 3.08563340e-01,
       2.91277424e-01, 2.72257608e-01, 2.55113234e-01, 2.35279547e-01,
       2.17356688e-01, 1.95063694e-01, 1.75017693e-01, 1.55042463e-01,
       1.35138004e-01, 1.15587403e-01, 9.79476292e-02, 8.11571125e-02,
       6.61712668e-02, 5.33616419e-02, 4.21266808e-02, 3.20063694e-02,
       2.42569002e-02, 1.80113234e-02, 1.26680821e-02, 8.63411182e-03,
       5.99787686e-03, 3.99858457e-03, 2.61854211e-03, 1.62774239e-03,
       1.06157113e-03, 5.48478415e-04, 3.36164190e-04, 1.41542817e-04,
       7.07714084e-05, 1.76928521e-05, 0.00000000e+00])))

Rather than a simple numpy array or xr.DataArray, this creates an ECDFResult object.

From the output, it looks like this has two attributes, cdf and sf. Of course, cdf is most likely the one we’re after.

But since we’re not familiar, let’s use help (or the question mark) again to see what we can do with this:

help(ecdf_temp_max)
Help on ECDFResult in module scipy.stats._survival object:

class ECDFResult(builtins.object)
 |  ECDFResult(q, cdf, sf, n, d)
 |
 |  Result object returned by `scipy.stats.ecdf`
 |
 |  Attributes
 |  ----------
 |  cdf : `~scipy.stats._result_classes.EmpiricalDistributionFunction`
 |      An object representing the empirical cumulative distribution function.
 |  sf : `~scipy.stats._result_classes.EmpiricalDistributionFunction`
 |      An object representing the complement of the empirical cumulative
 |      distribution function.
 |
 |  Methods defined here:
 |
 |  __eq__(self, other)
 |      Return self==value.
 |
 |  __init__(self, q, cdf, sf, n, d)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |
 |  __replace__ = _replace(self, /, **changes) from dataclasses
 |
 |  __repr__(self)
 |      Return repr(self).
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |
 |  __dict__
 |      dictionary for instance variables
 |
 |  __weakref__
 |      list of weak references to the object
 |
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |
 |  __annotations__ = {'cdf': <class 'scipy.stats._survival.EmpiricalDistr...
 |
 |  __dataclass_fields__ = {'cdf': Field(name='cdf',type=<class 'scipy.sta...
 |
 |  __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,or...
 |
 |  __hash__ = None
 |
 |  __match_args__ = ('cdf', 'sf')

OK, so indeed, under the “Attributes” heading it clearly says that cdf represents the CDF we’re interested in. So let’s take a look:

ecdf_temp_max.cdf
EmpiricalDistributionFunction(quantiles=array([  0.,   2.,   4.,   6.,   7.,   8.,   9.,  10.,  11.,  12.,  13.,
        14.,  15.,  16.,  17.,  18.,  19.,  20.,  21.,  22.,  23.,  24.,
        25.,  26.,  27.,  28.,  29.,  30.,  31.,  32.,  33.,  34.,  35.,
        36.,  37.,  38.,  39.,  40.,  41.,  42.,  43.,  44.,  45.,  46.,
        47.,  48.,  49.,  50.,  51.,  52.,  53.,  54.,  55.,  56.,  57.,
        58.,  59.,  60.,  61.,  62.,  63.,  64.,  65.,  66.,  67.,  68.,
        69.,  70.,  71.,  72.,  73.,  74.,  75.,  76.,  77.,  78.,  79.,
        80.,  81.,  82.,  83.,  84.,  85.,  86.,  87.,  88.,  89.,  90.,
        91.,  92.,  93.,  94.,  95.,  96.,  97.,  98.,  99., 100., 101.,
       102., 103., 104., 106.]), probabilities=array([0.00125619, 0.00127389, 0.00130927, 0.00134466, 0.00143312,
       0.00155697, 0.00166313, 0.00189314, 0.00205237, 0.00228238,
       0.00258316, 0.00290163, 0.00334395, 0.00389243, 0.00442321,
       0.00530786, 0.00631635, 0.00805025, 0.00980184, 0.01224345,
       0.01433121, 0.0170913 , 0.02031139, 0.02430998, 0.02837933,
       0.03324487, 0.03864119, 0.04543524, 0.05276008, 0.06165959,
       0.07050602, 0.08149328, 0.09258669, 0.10546709, 0.11857749,
       0.1329264 , 0.1467799 , 0.16383581, 0.1794586 , 0.19511677,
       0.21081033, 0.22662774, 0.24189667, 0.25711253, 0.2730184 ,
       0.28816348, 0.30380396, 0.32061217, 0.33582803, 0.35053079,
       0.3653574 , 0.38110403, 0.39587757, 0.41070418, 0.42464614,
       0.43973815, 0.45368011, 0.47013447, 0.48533263, 0.50058386,
       0.51516277, 0.53053786, 0.54501062, 0.56008493, 0.57501769,
       0.59117127, 0.6059448 , 0.62344303, 0.63956122, 0.65705945,
       0.67323071, 0.69143666, 0.70872258, 0.72774239, 0.74488677,
       0.76472045, 0.78264331, 0.80493631, 0.82498231, 0.84495754,
       0.864862  , 0.8844126 , 0.90205237, 0.91884289, 0.93382873,
       0.94663836, 0.95787332, 0.96799363, 0.9757431 , 0.98198868,
       0.98733192, 0.99136589, 0.99400212, 0.99600142, 0.99738146,
       0.99837226, 0.99893843, 0.99945152, 0.99966384, 0.99985846,
       0.99992923, 0.99998231, 1.        ]))

So this, in turn, has as attributes two arrays, quantiles and probabilities.

We can use help or the question mark yet again to get more information about those:

help(ecdf_temp_max.cdf)
Help on EmpiricalDistributionFunction in module scipy.stats._survival object:

class EmpiricalDistributionFunction(builtins.object)
 |  EmpiricalDistributionFunction(q, p, n, d, kind)
 |
 |  An empirical distribution function produced by `scipy.stats.ecdf`
 |
 |  Attributes
 |  ----------
 |  quantiles : ndarray
 |      The unique values of the sample from which the
 |      `EmpiricalDistributionFunction` was estimated.
 |  probabilities : ndarray
 |      The point estimates of the cumulative distribution function (CDF) or
 |      its complement, the survival function (SF), corresponding with
 |      `quantiles`.
 |
 |  Methods defined here:
 |
 |  __eq__(self, other)
 |      Return self==value.
 |
 |  __init__(self, q, p, n, d, kind)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |
 |  __replace__ = _replace(self, /, **changes) from dataclasses
 |
 |  __repr__(self)
 |      Return repr(self).
 |
 |  confidence_interval(self, confidence_level=0.95, *, method='linear')
 |      Compute a confidence interval around the CDF/SF point estimate
 |
 |      Parameters
 |      ----------
 |      confidence_level : float, default: 0.95
 |          Confidence level for the computed confidence interval
 |
 |      method : str, {"linear", "log-log"}
 |          Method used to compute the confidence interval. Options are
 |          "linear" for the conventional Greenwood confidence interval
 |          (default)  and "log-log" for the "exponential Greenwood",
 |          log-negative-log-transformed confidence interval.
 |
 |      Returns
 |      -------
 |      ci : ``ConfidenceInterval``
 |          An object with attributes ``low`` and ``high``, instances of
 |          `~scipy.stats._result_classes.EmpiricalDistributionFunction` that
 |          represent the lower and upper bounds (respectively) of the
 |          confidence interval.
 |
 |      Notes
 |      -----
 |      Confidence intervals are computed according to the Greenwood formula
 |      (``method='linear'``) or the more recent "exponential Greenwood"
 |      formula (``method='log-log'``) as described in [1]_. The conventional
 |      Greenwood formula can result in lower confidence limits less than 0
 |      and upper confidence limits greater than 1; these are clipped to the
 |      unit interval. NaNs may be produced by either method; these are
 |      features of the formulas.
 |
 |      References
 |      ----------
 |      .. [1] Sawyer, Stanley. "The Greenwood and Exponential Greenwood
 |             Confidence Intervals in Survival Analysis."
 |             https://www.math.wustl.edu/~sawyer/handouts/greenwood.pdf
 |
 |  evaluate(self, x)
 |      Evaluate the empirical CDF/SF function at the input.
 |
 |      Parameters
 |      ----------
 |      x : ndarray
 |          Argument to the CDF/SF
 |
 |      Returns
 |      -------
 |      y : ndarray
 |          The CDF/SF evaluated at the input
 |
 |  plot(self, ax=None, **matplotlib_kwargs)
 |      Plot the empirical distribution function
 |
 |      Available only if ``matplotlib`` is installed.
 |
 |      Parameters
 |      ----------
 |      ax : matplotlib.axes.Axes
 |          Axes object to draw the plot onto, otherwise uses the current Axes.
 |
 |      **matplotlib_kwargs : dict, optional
 |          Keyword arguments passed directly to `matplotlib.axes.Axes.step`.
 |          Unless overridden, ``where='post'``.
 |
 |      Returns
 |      -------
 |      lines : list of `matplotlib.lines.Line2D`
 |          Objects representing the plotted data
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |
 |  __dict__
 |      dictionary for instance variables
 |
 |  __weakref__
 |      list of weak references to the object
 |
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |
 |  __annotations__ = {'_d': <class 'numpy.ndarray'>, '_kind': <class 'str...
 |
 |  __dataclass_fields__ = {'_d': Field(name='_d',type=<class 'numpy.ndarr...
 |
 |  __dataclass_params__ = _DataclassParams(init=True,repr=True,eq=True,or...
 |
 |  __hash__ = None
 |
 |  __match_args__ = ('quantiles', 'probabilities', '_n', '_d', '_sf', '_k...

So we see that:

  • quantiles is a numpy array (that’s what ndarray means) storing the values of the dataset, in this case daily maximum temperature, at which the empirical CDF has been calculated

  • probabilities is a numpy array storing the value of the CDF at each of those quantiles. (The “survival function,” or SF, is simply one minus the CDF, which we aren’t interested in here.)
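
One more thing worth noting from the help output above: the evaluate method gives us the CDF at any value we choose, which ties back nicely to empirical probabilities. A quick sketch, using 70F as an arbitrary threshold:

# P(temp_max <= 70 F), read directly off the empirical CDF:
print(ecdf_temp_max.cdf.evaluate(70))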

Alright, let’s finally plot that:

fig, ax = plt.subplots()  # create the overall figure and the set of axes we'll plot on.
ecdf_temp_max.cdf.plot(ax=ax)  # plot the CDF onto the `ax` object we just created.
ax.set_xlabel("daily max temp [deg F]")  # label the x axis
ax.set_ylabel("empirical CDF")  # label the y axis
Text(0, 0.5, 'empirical CDF')
[Figure: empirical CDF of daily max temperature, drawn as a step function]

Alternative: use ax.plot

Here, again, there are multiple ways of accomplishing the same thing. Above, we used the builtin plot method of the ecdf_temp_max.cdf object. We could instead have used matplotlib functions: ax.plot(ecdf_temp_max.cdf.quantiles, ecdf_temp_max.cdf.probabilities) gives essentially the same thing, except drawn as a connected line rather than a step function. See the sketch below.
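
Here’s a sketch of that matplotlib version, passing drawstyle="steps-post" so the result is drawn as a step function like the builtin plot method produces:

fig, ax = plt.subplots()
# Step up at each quantile, mirroring the ECDF's builtin plot method.
ax.plot(ecdf_temp_max.cdf.quantiles, ecdf_temp_max.cdf.probabilities,
        drawstyle="steps-post")
ax.set_xlabel("daily max temp [deg F]")
ax.set_ylabel("empirical CDF")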

Empirical PDFs#

Recall that an empirical probability density function is really just a histogram, crucially with the count in each bin normalized: divided by the bin’s width and by the total number of data points, so that the total area under the histogram equals 1.

We can get this most directly using matplotlib’s hist function and setting density=True:

plt.hist(ds_cp["temp_max"], bins=15, density=True)
plt.xlabel("daily max temp [deg F]")
plt.ylabel(r"probability density [(deg F)$^{-1}$]")
Text(0, 0.5, 'probability density [(deg F)$^{-1}$]')
[Figure: empirical PDF (density-normalized histogram) of daily max temperature]
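
To see that normalization explicitly, here’s a sketch using numpy’s histogram function, which is what plt.hist calls under the hood: with density=True, the bin heights times the bin widths should sum to 1.

import numpy as np

# Same 15 bins as above, but returning the densities and bin edges directly.
density, bin_edges = np.histogram(ds_cp["temp_max"], bins=15, density=True)
bin_widths = np.diff(bin_edges)  # width of each bin
print((density * bin_widths).sum())  # total area under the histogram: 1.0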