Final project requirements#

Introduction#

Overview#

In the final project, you will apply some of the statistical methods learned throughout the course to a real-world problem in Earth and Atmospheric Sciences of your choosing. In consultation with the professor, you will select a topic of interest; then you will conduct data analysis, write a final report (in the form of a self-contained, fully executable Jupyter Notebook), and present your findings to the class.

The grade will be based on the final report and presentation, both equally weighted. The presentations will be split over two class periods, tentatively scheduled for Monday, December 4th and Wednesday, December 6th.

Topic and dataset#

By now, each of you has already met with the professor, agreed upon a scientific topic, and selected one or more datasets to analyze.

Final report#

Due date: Friday, December 15th, by 11:59pm ET#

Overall scope#

There are two goals of the final report:

  1. (Most important) demonstrate your mastery of select concepts (listed below) we have learned in class, by applying them to your dataset in a way that is in service of the scientific question you seek to answer.

  2. Use those calculations to actually make some progress toward answering your chosen scientific question.

Note, however, that you will not be directly graded on the scientific outcomes. It could be that your analyses generate statistically unclear signals, even though you did them correctly. That still is progress: it’s often just as important to know what things are not related as what things are related.

Report format: a Jupyter Notebook#

You will write the final report as a Jupyter notebook file, with your text as Markdown cells and your analysis code all included as code cells. See sections further below for style and formatting guidelines.

What you will submit: the notebook itself plus a freshly executed version of your notebook exported to HTML#

The steps for submission are:

  1. Create your report as a Jupyter Notebook.

  2. Once your report is 100% ready, reset your Jupyter kernel and re-run the whole notebook from start to finish. (There is an option under the “Kernel” drop-down that does this for you: “Restart Kernel and run all cells”)

  3. At that point, export it to an HTML file using Jupyter’s built-in exporting features.

  4. Upload both the original notebook file (.ipynb) and the HTML export (.html) following the link provided on Blackboard.

Warning

Points WILL be deducted if the cell blocks in your notebook output do not show that they were executed all in order immediately following a reset of your kernel. In other words, the number to the left of the first code cell MUST be [1], the next code block must be [2], etc. etc. until the very end of the notebook.

The reason for this is to ensure that your results aren’t accidentally affected by cells having been run out of order (first some cells earlier in your notebook, then others later, and so on), which otherwise happens all the time when working with Jupyter notebooks. The only way to be certain this isn’t the case is to follow step 2 above.
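If you would like to double-check the cell numbering yourself before uploading, the sketch below shows one possible way. It is not part of the grading process. It relies on the fact that a .ipynb file is JSON in which each code cell stores an `execution_count`. The function names, the hand-made `example` notebook, and the filename in the final comment are all hypothetical.

```python
import json


def execution_counts(notebook):
    """Return the execution counts of the non-empty code cells, in order."""
    counts = []
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue
        # "source" may be a string or a list of strings; join handles both.
        if not "".join(cell["source"]).strip():
            continue  # empty code cells are never executed, so skip them
        counts.append(cell.get("execution_count"))
    return counts


def ran_in_order(notebook):
    """True if the code cells are numbered 1, 2, 3, ... with no gaps."""
    counts = execution_counts(notebook)
    return counts == list(range(1, len(counts) + 1))


# A tiny hand-made notebook (hypothetical) standing in for a real file.
example = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "execution_count": 1, "source": ["x = 1"]},
        {"cell_type": "code", "execution_count": 2, "source": ["x + 1"]},
    ]
}
print(ran_in_order(example))  # prints True

# For a real notebook (the filename is a placeholder):
# with open("final_report.ipynb") as f:
#     print(ran_in_order(json.load(f)))
```

Save the notebook after the fresh run and before checking, since the counts are read from the file on disk.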

Notebook contents#

Like the homework assignments, the notebook will be a mixture of Markdown cells containing mostly text and code cells that perform your analyses and generate your plots.

Notebook structure: same as a normal written report would be#

Though it is written as an interactive Jupyter Notebook, your final report should be organized and read as if it were a standard scientific report written in, say, Microsoft Word or LaTeX. This means it should:

  • be organized into labeled and numbered sections and subsections (Use hash-signs in markdown cells for this: starting a line with # makes it a top-level heading; ## 2nd level heading, ### 3rd level, etc.)

  • be written in complete English sentences, organized into paragraphs and sections.

  • be free of grammatical, spelling, or other similar errors.

Perhaps a useful way to think about it is: imagine that the code blocks were stripped out, and only their outputs kept. The resulting document should look and read basically like an old-school printed final report would look and read.

Code blocks: excess code, style, commenting, etc.#

Your code should be pruned down to only the lines required to generate the values you report and plots that you generate. All other blocks or lines of code should be removed! Otherwise it makes understanding and ultimately grading your code much more difficult.

Adding explanatory comments or adhering to style conventions is less important. Of course, you are encouraged to include helpful comments as appropriate and to use style-enforcement tools such as black (which you can configure to run automatically on your whole notebook via the jupyterlab_code_formatter tool).

Required calculations to incorporate#

Note

This list is now final.

The following calculations must be executed, presented, and described:

From Intro and numeracy:

  • thorough explanation of the dataset: source, time span, location, physical quantities being observed, instruments used to measure them

  • any other salient metadata: e.g. changes in instrumentation, change in location, calibration issues, spatial coverage

From Descriptive statistics:

  • mean and median of at least one key quantity

  • range, IQR, sample variance, sample standard deviation of at least one key quantity

  • skewness and kurtosis of at least one key quantity
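As a rough illustration of the descriptive statistics listed above, here is a minimal sketch using NumPy and SciPy. The `temps` array is synthetic and the variable names are hypothetical; in your report you would replace it with a key quantity from your own dataset.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for one year of daily values (an assumption).
rng = np.random.default_rng(0)
temps = rng.normal(loc=15.0, scale=5.0, size=365)

q25, q75 = np.percentile(temps, [25, 75])

summary = {
    "mean": np.mean(temps),
    "median": np.median(temps),
    "range": temps.max() - temps.min(),
    "iqr": q75 - q25,
    "variance": np.var(temps, ddof=1),   # ddof=1 gives the sample variance
    "std": np.std(temps, ddof=1),        # sample standard deviation
    "skewness": stats.skew(temps),
    "excess_kurtosis": stats.kurtosis(temps),  # 0 for a normal distribution
}

for name, value in summary.items():
    print(f"{name:>16s}: {value:8.3f}")
```

Note that `np.var` and `np.std` default to `ddof=0` (the population versions), so `ddof=1` has to be requested explicitly, and that `stats.kurtosis` returns excess kurtosis by default.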

From Data visualization:

  • at least one each of: histogram, boxplot, scatterplot, and timeseries (unless the data are not defined in time)

From Probability theory:

  • at least two empirical unconditional probabilities

  • at least two empirical conditional probabilities
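One way to estimate empirical probabilities is as relative frequencies of boolean events, sketched below. The wind and rain data are synthetic, and the 8 m/s threshold is an arbitrary assumption made only for illustration.

```python
import numpy as np

# Synthetic daily data (assumptions): wind speed in m/s and a rain flag
# that is made more likely on windy days.
rng = np.random.default_rng(1)
n_days = 1000
wind = rng.gamma(shape=2.0, scale=3.0, size=n_days)
rain = rng.random(n_days) < np.where(wind > 8.0, 0.6, 0.2)

windy = wind > 8.0

# Unconditional probabilities: fraction of all days on which the event occurs.
p_rain = rain.mean()
p_windy = windy.mean()

# Conditional probabilities: restrict to the conditioning event first.
p_rain_given_windy = rain[windy].mean()
p_rain_given_calm = rain[~windy].mean()

print(f"P(rain)         = {p_rain:.3f}")
print(f"P(windy)        = {p_windy:.3f}")
print(f"P(rain | windy) = {p_rain_given_windy:.3f}")
print(f"P(rain | calm)  = {p_rain_given_calm:.3f}")
```

A useful sanity check is the law of total probability: P(rain) should equal P(rain | windy) P(windy) + P(rain | calm) P(calm).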

From Probability distributions:

  • empirical PDFs and CDFs

  • a fitted normal distribution
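The sketch below shows one possible way to build an empirical PDF and CDF and to fit a normal distribution, using NumPy and `scipy.stats.norm`. The sample `x` is synthetic and the bin count of 20 is an arbitrary assumption.

```python
import numpy as np
from scipy import stats

# Synthetic sample standing in for a key variable (an assumption).
rng = np.random.default_rng(2)
x = rng.normal(loc=10.0, scale=2.0, size=500)

# Empirical PDF: a histogram normalized so that it integrates to one.
density, edges = np.histogram(x, bins=20, density=True)

# Empirical CDF: fraction of the sample less than or equal to each value.
x_sorted = np.sort(x)
ecdf = np.arange(1, x.size + 1) / x.size

# Fitted normal: maximum-likelihood estimates of the mean and standard deviation.
mu, sigma = stats.norm.fit(x)
fitted_cdf = stats.norm.cdf(x_sorted, loc=mu, scale=sigma)

# Largest vertical gap between the empirical and fitted CDFs.
max_gap = np.max(np.abs(ecdf - fitted_cdf))
print(f"mu = {mu:.2f}, sigma = {sigma:.2f}, max CDF gap = {max_gap:.3f}")
```

Plotting `density` against the bin centers alongside `stats.norm.pdf`, and `ecdf` alongside `fitted_cdf`, gives a quick visual check of how well the normal distribution describes the data.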

From Hypothesis testing:

  • at least one \(t\) test of differences in means between two samples
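A two-sample \(t\) test can be run with `scipy.stats.ttest_ind`, as sketched below. The two samples are synthetic, and choosing Welch's version (`equal_var=False`, which does not assume equal variances) is an assumption made here; use whichever variant matches your data and what was covered in class.

```python
import numpy as np
from scipy import stats

# Two synthetic samples (assumptions), e.g. a variable in an early and a late period.
rng = np.random.default_rng(3)
early = rng.normal(loc=14.0, scale=1.0, size=30)
late = rng.normal(loc=15.0, scale=1.0, size=30)

# Welch's t test of the null hypothesis that the two population means are equal.
result = stats.ttest_ind(late, early, equal_var=False)
t_stat, p_value = result.statistic, result.pvalue

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

In your report, state the null and alternative hypotheses and the significance level before interpreting the p-value.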

From Linear regression:

  • at least two correlation coefficients

  • at least one variable modeled by linear regression on another variable
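Here is a minimal sketch of the correlation and regression items using only NumPy. The three variables are synthetic and their names are hypothetical; `np.polyfit` with `deg=1` is one of several ways to fit a straight line by least squares.

```python
import numpy as np

# Synthetic variables (assumptions): a predictor, a response that depends on
# it linearly plus noise, and a third variable unrelated to either.
rng = np.random.default_rng(4)
n = 200
sst = rng.normal(27.0, 1.0, n)
precip = 2.0 + 0.5 * sst + rng.normal(0.0, 0.5, n)
pressure = rng.normal(1010.0, 5.0, n)

# Pearson correlation coefficients.
r_sst_precip = np.corrcoef(sst, precip)[0, 1]
r_sst_pressure = np.corrcoef(sst, pressure)[0, 1]

# Least-squares fit of precip = intercept + slope * sst.
slope, intercept = np.polyfit(sst, precip, deg=1)
fitted = intercept + slope * sst

# Fraction of the variance in precip explained by the regression.
ss_res = np.sum((precip - fitted) ** 2)
ss_tot = np.sum((precip - precip.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"r(sst, precip)   = {r_sst_precip:.3f}")
print(f"r(sst, pressure) = {r_sst_pressure:.3f}")
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r_squared:.3f}")
```

For a simple linear regression with an intercept, \(R^2\) equals the square of the correlation coefficient, which makes a handy consistency check.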

From Time series:

  • discussion of how the decomposition into deterministic and random components of a time series would be applied to at least one key variable, even though you don’t have to actually perform that decomposition

  • calculation and discussion of the autocorrelation function of at least one key variable
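The autocorrelation function can be computed directly from its definition, as in the sketch below. The `autocorrelation` helper is hypothetical, and the test series is a synthetic first-order autoregressive process with coefficient 0.7, chosen only so that the result has a known shape.

```python
import numpy as np


def autocorrelation(x, max_lag):
    """Sample autocorrelation of x at lags 0, 1, ..., max_lag."""
    x = np.asarray(x, dtype=float)
    anomalies = x - x.mean()
    denom = np.sum(anomalies ** 2)
    return np.array([
        np.sum(anomalies[: x.size - lag] * anomalies[lag:]) / denom
        for lag in range(max_lag + 1)
    ])


# Synthetic AR(1) series (an assumption): each value is 0.7 times the
# previous one plus white noise.
rng = np.random.default_rng(5)
n = 500
noise = rng.normal(size=n)
series = np.zeros(n)
for t in range(1, n):
    series[t] = 0.7 * series[t - 1] + noise[t]

acf = autocorrelation(series, max_lag=20)
print(np.round(acf[:5], 2))
```

For an AR(1) process the autocorrelation should decay roughly geometrically with lag; how quickly your own variable decorrelates is worth discussing in the report.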

From Spectral analysis:

  • a periodogram computed, plotted, and discussed for at least one key variable

  • a running average applied to your periodogram
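One possible way to compute a periodogram and smooth it with a running average is sketched below using NumPy's FFT. The series is synthetic (a sine wave with a period of 32 time steps plus noise), and the normalization and the 5-point window are assumptions; follow the conventions used in class if they differ.

```python
import numpy as np

# Synthetic series (an assumption): a period-32 oscillation plus white noise.
rng = np.random.default_rng(6)
n = 512
t = np.arange(n)
series = np.sin(2 * np.pi * t / 32) + 0.5 * rng.normal(size=n)

# Periodogram: squared magnitude of the Fourier transform of the anomalies.
anomalies = series - series.mean()
freqs = np.fft.rfftfreq(n, d=1.0)  # cycles per time step
power = np.abs(np.fft.rfft(anomalies)) ** 2 / n

# Running average over 5 neighboring frequencies.
window = 5
kernel = np.ones(window) / window
smoothed = np.convolve(power, kernel, mode="same")

peak_freq = freqs[np.argmax(power)]
print(f"peak at {peak_freq:.5f} cycles per step, period {1 / peak_freq:.1f} steps")
```

The running average reduces the noisiness of the raw periodogram at the cost of frequency resolution. With `mode="same"`, the first and last few smoothed values are biased low because the window runs past the ends of the array.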

Grading#

(credit: copied nearly verbatim from Teaching Statistics: A Bag of Tricks by Andrew Gelman and Deborah Nolan)

Rubric#

The table below is a competency matrix for this report. The first column describes each critical task for the assignment, and the 2nd, 3rd, and 4th columns respectively describe what work in that task would constitute Needing Improvement, Basic Competency, and Surpassed Expectations.

| Critical task | Needs Improvement | Basic | Surpassed |
| --- | --- | --- | --- |
| Computation. Perform computations necessary for the data analysis. | Computations contain errors and extraneous code. | Computations correct but contain extraneous/unnecessary code. | Computations correct, clear, and properly labeled. |
| Analysis. Choose and carry out analysis appropriate for data and context. | Choice of analysis is overly simplistic, irrelevant, inappropriate for the data, or missing key component. | Analysis appropriate, but incomplete, and important features and assumptions not made explicit. | Analysis appropriate, complete, advanced, relevant, and informative. |
| Synthesis. Identify key features of the analysis, and interpret results in context. | Conclusions are missing, incorrect, or not based on analysis. | Conclusions reasonable, but partially correct or partially complete. | Relevant conclusions explicitly connected to analysis and context. |
| Visual. Communicate findings graphically clearly, precisely, and concisely. | Inappropriate choice of plots; poorly labeled plots; plots missing. | Plots convey information correctly but lack context for interpretation. | Plots convey information correctly with adequate and appropriate reference information. |
| Written. Communicate findings in writing clearly, precisely, and concisely. | Explanation is illogical, incorrect, or incoherent. | Explanation is partially correct but incomplete or unconvincing. | Explanation is correct, complete, and convincing. |

Assigning points#

Basic competency in all five categories results in 85 points. Two points are added for each task in the Surpassed category, for up to three tasks. If a fourth task meets Surpassed, add an additional four points. If a fifth task meets Surpassed, an additional five points.

Similarly, two points are deducted for each competency in the Needs Improvement category, for up to three tasks. If a fourth task meets Needs Improvement, deduct an additional four points. If a fifth task meets Needs Improvement, deduct an additional five points.

As such, the maximum possible score is 100 (85 + 6 + 4 + 5), and the minimum possible score is 70 (85 − 6 − 4 − 5).
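The point rules above can be written out as a short calculation. This is only a sketch of the arithmetic as stated in this section, not the grading tool itself, and the function names are hypothetical.

```python
def adjustment(n_tasks):
    """Points for n_tasks in one category: 2 each for the first three
    tasks, 4 more for a fourth, and 5 more for a fifth."""
    steps = [2, 2, 2, 4, 5]
    return sum(steps[:n_tasks])


def report_score(n_surpassed, n_needs_improvement):
    """Score from the number of tasks rated Surpassed and Needs Improvement.

    Any remaining tasks (out of five) are assumed to be rated Basic.
    """
    if n_surpassed < 0 or n_needs_improvement < 0:
        raise ValueError("counts cannot be negative")
    if n_surpassed + n_needs_improvement > 5:
        raise ValueError("there are only five tasks")
    return 85 + adjustment(n_surpassed) - adjustment(n_needs_improvement)


print(report_score(0, 0))  # all five tasks Basic
print(report_score(5, 0))  # all five tasks Surpassed
print(report_score(0, 5))  # all five tasks Needs Improvement
print(report_score(2, 1))  # a mixed case
```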

In-class presentation#

Each student will present a “conference-style” oral presentation to the class summarizing their final project. “Conference style” means that it follows the format of a standard oral presentation typical of major conferences in Earth Sciences such as the American Geophysical Union Fall Meeting and the American Meteorological Society Annual Meeting.

Deadlines#

You must submit your final slides before midnight the night prior to your presentation. So that’s:

  • Sunday, December 3rd by 11:59pm if you’re presenting on 12/4

  • Tuesday, December 5th by 11:59pm if you’re presenting on 12/6

Specific submission instructions are posted on the course Blackboard.

The professor will download the submitted slides to his computer the morning before class, and everyone will use the same computer to present (rather than each person trying to connect their own computer to the A/V system one after the other).

Format#

Conference style means the following:

  • Total duration: 10 minutes

  • Presentation: 8 minutes

  • Questions from the audience: 2 minutes

(The more standard conference length is 15 minutes, 12 for the talk and then 3 for Q&A, but to get the whole class in we have to do a shorter style. Also, since COVID, more conferences are doing shorter talks for various reasons.)

Logistics#

The presentations will be split across two class days:

  • Monday, December 4th (6 students)

  • Wednesday, December 6th (7 students)

You must attend BOTH days to receive full credit, because you will be submitting a written question or comment on every other student’s presentation. These questions will be graded, as described below.

Presentation requirements#

You must include slides. These can be in Powerpoint, Keynote, Google Slides, or anything else. You will submit these and they will be evaluated on their own in addition to your actual delivery of the presentation. Guidelines below offer recommendations, but ultimately there are no hard requirements on the actual content of your slides.

Instructions for how to submit your slides have been posted to the course Blackboard.

Guidelines#

Note

Everything in this section is meant to be helpful, but none of it is strictly required. You can deviate from e.g. the slide template if you feel like your presentation will be better served by a different structure.

Presentation scope#

Whereas the written report for this project is meant to be fairly exhaustive, where you document all the important analyses that you performed, an oral presentation has to be more targeted. Eight minutes will fly by! So ask yourself: if you had to pick just one thing that you want to convey about your project, what would it be? Then build your talk around that.

Narrative#

Human beings are storytellers. We understand and retain things best when they are presented as a coherent narrative, with a beginning, middle, and end. (Actually, we retain them best of all when they are put to music, but I won’t ask you to sing your presentation.)

This approach of creating a narrative can be contrasted with what’s unfortunately more typical in scientific presentations (and teaching writ large): a “data dump” listing one fact after the other without linking things together.

So before you start hacking away at slides, a useful first step is to write down the thesis statement of your presentation: what’s the 1 sentence summary of what you want to convey? From there, you can craft a narrative with an intro that motivates that problem (“beginning”), a main, middle section that actually conveys it (“middle”), and an end that synthesizes the individual things you presented (“end”) and leaves the audience wanting more.

Presentation structure#

Tell the audience what you’re going to say, say it; then tell them what you’ve said.

—Dale Carnegie

(source: https://www.genardmethod.com/blog/bid/192061/how-to-open-a-presentation-tell-em-what-you-re-going-to-say)

A good rule of thumb is 1 minute per slide on average over your whole presentation. So for an 8 minute talk (not 10! the last 2 are for Q&A), you should aim for 8 slides. This suggests the following template:

  1. Title slide: who are you, what’s the overall topic you’ll be presenting

  2. Motivation: what is the big-picture topic you’re addressing, and why is it important and/or interesting? (I.e. why should we care?)

  3. Introducing your project (what, in broad strokes, did you do to address the topic?) and talk outline (“tell the audience what you’re going to say”): \(\leq\)1 sentence summary of each of your \(\leq\)3 main points. (In a longer talk, these would be split into separate slides.)

  4. Main point 1

  5. More on main point 1

  6. Main point 2

  7. More on main point 2 (or maybe a 3rd main point if you have it.)

  8. Recap (“tell them what you’ve said”) and Discussion (Where is this going / could this go from here? What are the implications?) (In a longer talk, these would be split into separate slides.)

Slides#

Some guidelines:

  • Less is more.

  • Make each slide do one thing, not multiple things.

  • Make each slide’s title a complete sentence that summarizes the main point you want the slide to convey

  • Some text is helpful, but usually people include too much. Boil it down to the essentials.

  • Plots: describe in words every single image you include. This is for accessibility, but also because otherwise the audience won’t be able to t

  • Make all text, plot labels, plotted symbols, and images big enough that everyone in the room can read them.

  • Give your slides some breathing room: a slide that’s totally full with

  • Dispense with “slidejunk”: you don’t need slide numbers, logos, the date, etc. on every slide. (Except for a logo of your institution on the title slide, you don’t even need these anywhere!)

  • If the professor’s own slides for this class fail to meet these recommendations sometimes, well, “Do as I say, not as I do” ;)

Delivering the presentation#

  • Try not to worry! Public speaking can be intimidating, but especially in this setting everyone, the professor and the other students alike, is there to support you and learn from you.

  • Practice the talk at least once ahead of time with a timer. Make sure that you’re within the 8 minute time limit. Nobody likes a talk that goes way beyond its allotted time; it’s rude to the audience and the other presenters.

  • Don’t be afraid of silence. For the audience, it’s actually a huge relief when a speaker takes a few seconds between slides or to take a sip of water. It helps the audience take a second to gather their thoughts.

Answering questions#

  • It can be helpful for everybody, yourself included, to repeat the question back to the person in your own words, for two reasons: (1) you make sure everyone in the audience heard it, and (2) you make sure that you interpreted the question correctly.

  • Once you’ve confirmed you understand what they’re asking, take a second (or a few)! There’s no need to answer as soon as the last word is out of their lips.

  • If you don’t know the answer to a question, that’s OK! Take a few seconds to think hard about it, and then just give it your best shot.

Grading, your presentation#

Your presentation will be graded based on the following:

  • Narrative quality: do you tell a single, coherent “scientific story” with a clear beginning, middle, and end? Or do you try to pack in too many different things?

  • Slide quality: does each slide convey a message in service of your story? Is there enough text, plots, etc. on each slide to convey that message? Is there too much on each slide for the audience to digest? Are the fonts big enough?

  • Length: did you complete your slides within the 8 minute time limit? Or did you go over? (To ensure we get through them all, you’ll get 2-minute and 1-minute warnings, and at the 8 minute mark I’ll ask you to wrap up essentially right away, regardless of how far you’ve gotten.)

  • Answering questions: do you make a good-faith attempt to understand and address each question? Do your answers cohere with what you presented?

Grading, participation#

You will be required to submit in writing one question and one piece of constructive criticism for each classmate’s presentation.

  • Question: what in the presentation would you like to know more about? Or something you didn’t understand that you’d like clarified?

  • Constructive criticism: this could be positive—something the presenter did well. Or, if delivered respectfully and fairly, something you’d recommend changing or that didn’t quite land for you.

Afterwards, all students will be provided the ANONYMIZED questions and constructive criticism from their classmates.

Instructions for how to submit this feedback have been posted to the course Blackboard.