Crafting an Effective Report

Materials for class on 2024-12-03

Agenda

Today we’ll focus on:

  • Sample Reports
  • Effective Reporting

More Sample Reports

Poll

What’s one feature that you find very effective in the reports you looked at?

Effective Report Ingredients

A data analysis report should be a stand-alone document that answers, or at least addresses, a specific set of questions. The reader should be able to understand it without reading other materials.

How much detail you include about how you arrived at your answers depends on the audience. If you are sharing a report with someone who doesn’t know R, you would probably hide all of the R code chunks. If you’re sharing with a fellow researcher or analyst, however, it can be very useful to show the code as well as the outputs. Code folding in HTML reports can give the best of both worlds. For example:

```{r}
#| code-fold: true
#| code-summary: "Show the code"
#| warning: false

library(ggplot2)
library(hexSticker)
library(palmerpenguins)

# Overlapping histograms of flipper length, one color per species,
# with axes and legend stripped so the plot works as a logo.
flipper_hist <- ggplot(data = penguins, aes(x = flipper_length_mm)) +
  coord_cartesian(xlim = c(180, 226), ylim = c(0,22)) +
  geom_histogram(aes(fill = species),
                 alpha = 0.5,
                 position = "identity") +
  scale_fill_manual(values = c("#820263","#f4f3f6","#2E294E")) +
  theme_void() +
  theme(legend.position = "none")

# Place the histogram inside a hex sticker with the course name.
s <- sticker(flipper_hist,
             package="LING 343",
             p_size=20,
             p_color = "#f4f3f6",
             s_x=1,
             s_y=.75,
             s_width=1,
             s_height=1,
             h_fill="#D90368",
             h_color="#2E294E"
             )

# Draw the finished sticker in the report.
plot(s)
```

Look familiar?

One way to check that your report is readable is to fold all of your code and see if it still makes sense! You can hide the code for your whole document by specifying echo: false as an execution option in your top YAML header (see cell output documentation).
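As a minimal sketch of what that header could look like (the title here is a made-up placeholder, and your own header will likely have more fields):

```yaml
---
title: "Penguin flipper lengths"  # hypothetical title, for illustration only
format: html
execute:
  echo: false   # hide the code of every chunk in the rendered report
---
```

An individual chunk can still override this with its own #| echo: true line. If you would rather fold the code than hide it, setting code-fold: true under the html format is the document-wide counterpart of the chunk option used above.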

Everyone has a different style in what they write in the markdown text versus what they write in code comments, but if you assume code folding or code hiding, that should give some clues as to what should go where.

Allison Horst illustration of literate programming with Rmarkdown

Given this notion of a data analysis report, an effective one will require consideration of:

  1. the target audience (what they are looking for, what they can understand)
  2. the goal of the report
  3. access to high-quality data

There is some variation, but generally reports should include at minimum:

  1. an introduction/overview

  2. a description of the relevant data and variables, and their source

  3. well-organized sections with questions/answers, like scientific paper sections:

    • question
    • methods/tests
    • results (clearly labelled and annotated figures and tables)
    • discussion (prose explanation of interpretations)

  4. conclusions/general discussion

To be reproducible, a report should also include:

  • all packages loaded explicitly and visibly (ideally with version numbers)
  • paths that do not depend on setwd() or the local machine’s file system (don’t have Jenny Bryan set your computer on fire)
  • inline code, so that reported numbers stay accurate even when the data change
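Here is a rough sketch of those three habits in one place. It sticks to base R and the built-in iris data so it runs anywhere; the folder and file names are hypothetical, and in a real project you might prefer here::here() from the here package over file.path().

```r
# Load every package visibly at the top of the report.
# (utils and stats ship with R; they stand in for your real packages here.)
library(utils)
library(stats)

# Record the versions used, so a reader can match your setup.
r_version <- R.version.string
utils_version <- as.character(packageVersion("utils"))

# Build paths relative to the project root instead of calling setwd().
# "data" and "measurements.csv" are hypothetical names for illustration.
data_path <- file.path("data", "measurements.csv")

# Compute the numbers you report from the data itself.
n_obs <- nrow(iris)
mean_sepal <- round(mean(iris$Sepal.Length), 2)

# In the report text, inline code would then read something like:
#   We measured `r n_obs` flowers with a mean sepal length of `r mean_sepal` cm.
# The sprintf() call below builds the same sentence in plain R.
summary_sentence <- sprintf(
  "We measured %d flowers with a mean sepal length of %.2f cm.",
  n_obs, mean_sepal
)
print(summary_sentence)
```

Because the sentence is built from n_obs and mean_sepal rather than typed by hand, it updates itself if the data change. Calling sessionInfo() at the end of a report is another common way to record all package versions at once.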

Allison Horst illustration of project workflow vs. setwd()

To make the report fully reproducible, you usually also need to share these additional resources alongside it, or link to them clearly:

  • the runnable source code (.qmd, .Rmd, R script, etc.)
  • all data analyzed in a non-proprietary format
  • any additional functions used (sourced) that are not in available packages
  • a project or other container that contains these resources at accessible paths
  • properly named files/objects that work across operating systems (no spaces or special characters)
  • ideally, code that is literate (using tools like Quarto or Rmarkdown) and readable (adhering to a style guide such as the Tidyverse Style Guide)
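If you want a quick way to check file names before sharing a project, something like the following could work. The helper is_portable_name() is a hypothetical function written for this example, not part of any package, and the rule it applies (letters, digits, dots, underscores, and hyphens only) is one reasonable reading of "no spaces or special characters".

```r
# Hypothetical helper: TRUE when a file name contains only ASCII letters,
# digits, dots, underscores, and hyphens (so no spaces or special characters).
is_portable_name <- function(path) {
  grepl("^[A-Za-z0-9._-]+$", basename(path), perl = TRUE)
}

# Example file names, made up for illustration.
candidates <- c("01_clean-data.R", "final report (v2).qmd")
portable <- is_portable_name(candidates)
print(data.frame(file = candidates, portable = portable))
```

In a real project you could pass it the result of list.files(recursive = TRUE) to scan everything at once.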