```python
# --- DATA BOOTSTRAP (pooch + SHA-256 verification) ---

!pip -q install pooch

import sys, pathlib, hashlib
import xarray as xr
import pooch

# Public file and expected checksum
DATA_URL = (
    "https://spencerahill.github.io/25f-stat-methods-course/_downloads/"
    "91803b82950d49961a65355c075439b3/central-park-station-data_1869-01-01_2023-09-30.nc"
)
DATA_HASH = "sha256:85237a4bae1202030a36f330764fd5bd0c2c4fa484b3ae34a05db49fe7721eee"

# Local storage: /content/data in Colab, ../data otherwise
DATA_DIR = pathlib.Path("/content/data" if "google.colab" in sys.modules else "../data")
DATA_DIR.mkdir(parents=True, exist_ok=True)
DATA_PATH = DATA_DIR / "central-park-station-data_1869-01-01_2023-09-30.nc"

def sha256sum(p: pathlib.Path) -> str:
    return hashlib.sha256(p.read_bytes()).hexdigest()

if DATA_PATH.exists():
    # Strict guard: verify existing file before skipping download
    if sha256sum(DATA_PATH) == DATA_HASH.split(":", 1)[1]:
        print(f"Verified existing file at {DATA_PATH}")
    else:
        print("Existing file failed checksum — refetching...")
        DATA_PATH.unlink()
        fetched = pooch.retrieve(
            url=DATA_URL,
            known_hash=DATA_HASH,
            fname=DATA_PATH.name,  # pin the on-disk name so it matches DATA_PATH
            path=DATA_DIR,
        )
        print(f"Downloaded and verified: {fetched}")
else:
    fetched = pooch.retrieve(
        url=DATA_URL,
        known_hash=DATA_HASH,
        fname=DATA_PATH.name,  # without fname, pooch saves under a hashed filename, not DATA_PATH
        path=DATA_DIR,
    )
    print(f"Downloaded and verified: {fetched}")

# Open with xarray
ds = xr.open_dataset(DATA_PATH)
ds
```
```text
Verified existing file at ../data/central-park-station-data_1869-01-01_2023-09-30.nc
<xarray.Dataset> Size: 5MB
Dimensions:        (time: 56520)
Coordinates:
  * time           (time) datetime64[ns] 452kB 1869-01-01 ... 2023-09-30
Data variables:
    temp_max       (time) int64 452kB ...
    temp_min       (time) int64 452kB ...
    temp_avg       (time) float64 452kB ...
    temp_anom      (time) float64 452kB ...
    heat_deg_days  (time) int64 452kB ...
    cool_deg_days  (time) int64 452kB ...
    precip         (time) float64 452kB ...
    snow_fall      (time) float64 452kB ...
    snow_depth     (time) int64 452kB ...
```

# HW04: Fitting normal distributions for each calendar month and fitting a GEV to annual maxima

## Introduction

In this assignment, you’ll fit normal distributions to the daily maximum temperature for each calendar month, and then fit a generalized extreme value (GEV) distribution to annual block maxima of the diurnal temperature range, all using the Central Park weather station dataset.

## Your specific tasks

### Normal distribution fits throughout the year

For each of the 12 calendar months, January through December, do the following (a sketch follows the list):

- [ ] Fit a normal distribution to the daily maximum temperature from the Central Park weather station dataset, for all days in that month across all years.

- [ ] Plot the histogram (with `density=True`) for that month, and overlay the curve of the fitted PDF.
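
Here is a minimal sketch of the per-month fit-and-plot loop, assuming the `temp_max` variable shown in the dataset printout above and the `ds` object opened in the bootstrap cell; the 3x4 panel grid and bin count are illustrative choices, not requirements:

```python
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats

# One panel per calendar month; groupby("time.month") yields (month, subset) pairs.
fig, axes = plt.subplots(3, 4, figsize=(16, 9), sharex=True)
for (month, da), ax in zip(ds["temp_max"].groupby("time.month"), axes.flat):
    vals = da.values
    # For scipy.stats.norm, fit() returns the MLE location and scale (mean, std).
    mu, sigma = scipy.stats.norm.fit(vals)
    ax.hist(vals, bins=30, density=True, alpha=0.5)
    x = np.linspace(vals.min(), vals.max(), 200)
    ax.plot(x, scipy.stats.norm.pdf(x, mu, sigma))
    ax.set_title(f"Month {month}")
```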

After you’ve done that, make two more plots: one for the mean, one for the standard deviation. For the mean:

- [ ] Plot the sample mean for each calendar month as a function of month: the x-axis is month, numbered 1-12, and the y-axis is the mean.

- [ ] On the same axes, plot the mean from the fitted normal distribution.

For the standard deviation, do the same:

- [ ] Plot the sample standard deviation for each calendar month as a function of month: the x-axis is month, numbered 1-12, and the y-axis is the standard deviation.

- [ ] On the same axes, plot the standard deviation from the fitted normal distribution.

Put these panels right next to each other in one single `pyplot.Figure` object: use `plt.subplots` for this.
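
One possible layout is sketched below; the fitted parameters are recomputed here so the cell stands alone, and the marker styles are arbitrary choices:

```python
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats

months = np.arange(1, 13)
# Sample statistics per calendar month, straight from the data.
sample_means = ds["temp_max"].groupby("time.month").mean()
sample_stds = ds["temp_max"].groupby("time.month").std()
# Fitted (mean, std) per month from scipy.stats.norm.fit.
fits = [scipy.stats.norm.fit(da.values) for _, da in ds["temp_max"].groupby("time.month")]
fitted_means = [mu for mu, _ in fits]
fitted_stds = [sigma for _, sigma in fits]

fig, (ax_mean, ax_std) = plt.subplots(1, 2, figsize=(10, 4))
ax_mean.plot(months, sample_means, marker="o", label="sample")
ax_mean.plot(months, fitted_means, marker="x", label="fitted")
ax_mean.set(xlabel="month", ylabel="mean")
ax_mean.legend()
ax_std.plot(months, sample_stds, marker="o", label="sample")
ax_std.plot(months, fitted_stds, marker="x", label="fitted")
ax_std.set(xlabel="month", ylabel="standard deviation")
ax_std.legend()
```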

Last, once you’ve done this, plot the histogram for all days across the whole year. You’ll see that, as we saw in one of the lectures, this has a double-peaked structure. Based on your histograms and fitted normal distributions for each of the 12 calendar months, explain in a few sentences how this double-peaked structure for the whole year comes about.

Note that you don’t need to appeal to physical processes or arguments; make your argument solely in terms of how the sample means and standard deviations vary across the twelve months. (Hint: consider what the all-days distribution would look like if the standard deviation were constant across months and the mean shifted smoothly up and down. Would that get you two peaks or not?)
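
One way to visualize the hint, assuming the `fitted_means` and `fitted_stds` lists from the previous sketch: overlay the twelve fitted monthly PDFs, each weighted by roughly 1/12 (months differ slightly in length), on the all-days histogram, so you can see how the mixture of shifted normals adds up:

```python
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats

all_vals = ds["temp_max"].values
fig, ax = plt.subplots()
ax.hist(all_vals, bins=50, density=True, alpha=0.5, label="all days")
x = np.linspace(all_vals.min(), all_vals.max(), 300)
for mu, sigma in zip(fitted_means, fitted_stds):  # from the previous sketch
    # Each month contributes ~1/12 of the days, hence the 1/12 weighting.
    ax.plot(x, scipy.stats.norm.pdf(x, mu, sigma) / 12, color="gray", alpha=0.6)
ax.set_xlabel("daily max temperature")
ax.legend()
```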

### Block maxima and other metrics of extremes for diurnal temperature range

Compute the diurnal temperature range by taking the daily maximum temperature minus the daily minimum temperature.

Then, compute the following metrics of extreme values for this new variable (a computational sketch follows below):

- block max: the single largest value in each calendar year

- an exceedance count: the number of days in each year exceeding the climatological 95th percentile, meaning the 95th percentile computed using all days across all years

- the exceedance count again, but using the 99.9th percentile

- the 99th percentile value computed within each individual year

Compare these different metrics of extremes. Describe in a few sentences the extent to which they behave similarly vs. differ from one another. This is an important part of extreme value analysis: making sure that your results don’t depend sensitively on the specific definition of “extreme” or the specific threshold.
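
Here is a minimal sketch of how these metrics might be computed with xarray’s `groupby`, assuming the `temp_max` and `temp_min` variable names from the dataset printout above:

```python
# Diurnal temperature range: daily max minus daily min.
dtr = ds["temp_max"] - ds["temp_min"]

# Block max: single largest value in each calendar year.
block_max = dtr.groupby("time.year").max()

# Climatological thresholds, computed from all days across all years.
p95 = float(dtr.quantile(0.95))
p999 = float(dtr.quantile(0.999))

# Exceedance counts: number of days per year above each threshold.
count_p95 = (dtr > p95).groupby("time.year").sum()
count_p999 = (dtr > p999).groupby("time.year").sum()

# 99th percentile computed within each individual year.
p99_per_year = dtr.groupby("time.year").quantile(0.99)
```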

### GEV fit for the block maxima

Use `scipy.stats.genextreme` to fit a GEV to the block maxima computed just above for the diurnal temperature range. Plot the normalized histogram of these block maxima and overlay the fitted GEV curve. Describe your impressions of the goodness of fit based on visual inspection of this plot.
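
A minimal sketch of the fit and overlay, using the `block_max` array from the sketch above; note that `genextreme.fit` returns the shape parameter `c` (in scipy’s sign convention) along with location and scale:

```python
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats

vals = block_max.values  # annual block maxima from the previous sketch
c, loc, scale = scipy.stats.genextreme.fit(vals)

fig, ax = plt.subplots()
ax.hist(vals, bins=20, density=True, alpha=0.5, label="annual block maxima")
x = np.linspace(vals.min(), vals.max(), 200)
ax.plot(x, scipy.stats.genextreme.pdf(x, c, loc=loc, scale=scale), label="fitted GEV")
ax.set_xlabel("diurnal temperature range block max")
ax.legend()
```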

## How to submit

### Submission URL

A Google Form will be posted soon.

DEADLINE: Wednesday 10/22

## Extra credit

Each extra credit option below earns you up to an extra 5% on this assignment.

(The extra credit options will be posted soon.)