Why R Markdown for reporting
R scripts are great for doing analysis, but they do not automatically produce a polished, shareable document. R Markdown lets you combine narrative text (what you did and why), code (how you did it), and results (tables, numbers, plots) in one file that can be re-run end-to-end. This makes your work reproducible: if the data changes or you refine a step, you can re-knit the report and update everything consistently.
An R Markdown file has three main parts: a YAML header (document settings), narrative text (written in Markdown), and code chunks (R code that runs during knitting). You “knit” the document to produce an output such as HTML (most common for sharing) and optionally PDF.
R Markdown file anatomy
1) YAML header: output and metadata
The YAML header is the block at the top between --- lines. It controls the output format and document-level settings.
---
title: "Analysis Report"
author: "Your Name"
date: "2026-01-16"
output:
  html_document:
    theme: readable
    toc: true
    toc_depth: 2
    number_sections: true
---
Common YAML options you will use often:
- output: choose html_document, pdf_document, or multiple outputs.
- toc: table of contents for navigation.
- number_sections: makes long reports easier to reference.
- theme: basic styling for HTML output.
To render both HTML and PDF from the same source file, list both outputs:
---
title: "Analysis Report"
output:
  html_document:
    toc: true
  pdf_document:
    toc: true
---
Note: PDF output requires a LaTeX installation. If you cannot render PDF yet, focus on HTML first.
2) Narrative text: explain the analysis
Between code chunks, write short, reader-focused paragraphs. A good report answers: What is the question? What data is used? What steps were taken? What do the results mean? Keep text close to the results it describes so the reader does not have to scroll back and forth.
3) Code chunks: run code and show results
Code chunks look like this:
```{r}
# R code here
```
Chunks can be named to make debugging easier and to help you navigate:
```{r load-data}
# load data here
```
Inline results: put computed values in sentences
Inline code lets you insert a computed value directly into your narrative. This is useful for key numbers you want to highlight without manually copying them.
Example (written in the R Markdown document):
The dataset contains `r nrow(df)` rows and `r ncol(df)` columns.
During knitting, R runs each inline expression and replaces it with its value. Inline results reduce copy/paste errors and keep the report consistent when the data changes.
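The substitution itself is ordinary R evaluation. The plain R sketch below runs in the console, not in an R Markdown file, and builds by hand the sentence that knitting would produce. It assumes the built-in mtcars data frame as a stand-in for `df`. The paste0() call only imitates the splicing that knitr does.

```r
# Stand-in for `df`: the built-in mtcars data frame (an assumption for illustration)
df <- mtcars

# The values that the inline expressions `r nrow(df)` and `r ncol(df)` evaluate to
n_rows <- nrow(df)
n_cols <- ncol(df)

# Knitting splices these values into the sentence; paste0() mimics that step
sentence <- paste0("The dataset contains ", n_rows, " rows and ", n_cols, " columns.")
print(sentence)
```

If the data gained rows, re-knitting would update the sentence automatically. A hand-typed count would silently go stale.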
Controlling what appears: chunk options
Chunk options control whether code is shown, whether messages/warnings appear, and how figures are displayed. Options are written inside the chunk header.
Key options you will use constantly
- echo: show (TRUE) or hide (FALSE) the code.
- message: show/hide messages (often produced when loading packages).
- warning: show/hide warnings.
- include: run the code but hide both code and output (useful for setup).
- fig.width, fig.height: control plot size.
- fig.cap: add a figure caption.
Examples:
```{r setup, include=FALSE}
# runs but does not print anything
```
```{r load-packages, echo=FALSE, message=FALSE, warning=FALSE}
library(dplyr)
library(ggplot2)
```
```{r plot, fig.width=7, fig.height=4, fig.cap="Distribution of values"}
# plotting code
```
Set global options once (recommended)
Instead of repeating options in every chunk, set global defaults in a setup chunk near the top. This keeps your report consistent and reduces clutter.
```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  message = FALSE,
  warning = FALSE,
  fig.width = 7,
  fig.height = 4
)
```
You can still override these defaults in an individual chunk's header when needed.
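The precedence rule can be modeled with base R's modifyList(): a chunk's own options win, and everything else falls back to the global defaults. This is only an analogy for illustration, not knitr's internal code. The global values mirror the setup chunk above. The chunk-level values are hypothetical.

```r
# Global defaults, as set with knitr::opts_chunk$set() in the setup chunk
global_opts <- list(echo = TRUE, message = FALSE, warning = FALSE,
                    fig.width = 7, fig.height = 4)

# Hypothetical options written in one chunk header, e.g. {r my-plot, echo=FALSE, fig.height=6}
chunk_opts <- list(echo = FALSE, fig.height = 6)

# Chunk-level values replace the matching global ones; all other options are inherited
effective <- modifyList(global_opts, chunk_opts)
str(effective)
```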
Organize sections to mirror the analysis pipeline
A reader-friendly report follows the same sequence you would use in a clean analysis workflow. Use headings to create a predictable structure. A practical template is:
- Objective: what you are trying to learn/measure.
- Data: what dataset is used and what key fields mean.
- Preparation: cleaning, filtering, derived variables.
- Analysis: summaries, comparisons, models (if any).
- Results: tables and plots with interpretation.
- Notes: assumptions, limitations, next steps (only if needed).
Keep heavy code in chunks, but keep the narrative focused on decisions and interpretation. If a chunk is long, consider splitting it into smaller chunks that match the section’s intent (e.g., one chunk for creating a summary table, another for plotting).
Structured reporting assignment: build and knit a complete report
This assignment walks you through turning an analysis workflow into a reproducible HTML report. You will import a dataset, transform it, produce at least one table and one plot, and knit to HTML (optionally PDF) with a clean layout.
Assignment goal and dataset
Goal: create a short report that summarizes flight delays by carrier and visualizes average arrival delay.
Dataset: nycflights13::flights (built-in package dataset). This avoids file download issues while still demonstrating a realistic reporting workflow.
Step 1: Create a new R Markdown document
In RStudio: File → New File → R Markdown… Choose HTML output. Save the file as flight-delay-report.Rmd.
Replace the default content with the following structure. Start with a YAML header that enables a table of contents and readable styling:
---
title: "Flight Delay Report"
output:
  html_document:
    toc: true
    toc_depth: 2
    theme: readable
    number_sections: true
---
Step 2: Add a setup chunk with global options
At the top of the document (after the YAML), add a setup chunk. This keeps the report clean by hiding package startup messages and warnings, and standardizes figure size.
```{r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,
  message = FALSE,
  warning = FALSE,
  fig.width = 7,
  fig.height = 4
)
```
Step 3: Write an Objective section with inline results placeholders
Add a short objective section. You will add inline results later, once the objects they refer to exist.
## Objective
This report summarizes flight delays and compares average arrival delay across carriers.
Key questions:
- How many flights are included after filtering?
- Which carriers have the highest average arrival delay?
- What does the distribution of average delays look like?
Step 4: Load packages and data in a dedicated chunk
Create a chunk that loads packages and the dataset. Keep it early so the rest of the report can rely on these objects.
```{r load-data}
library(dplyr)
library(ggplot2)
library(nycflights13)
flights_raw <- nycflights13::flights
```
Step 5: Data preparation section (filter and transform)
Create a section that prepares the data. For reporting, be explicit about filtering rules so the reader knows what is included.
## Data preparation
We focus on flights with non-missing arrival delay and carrier information.
```{r prepare-data}
flights_clean <- flights_raw %>%
  filter(!is.na(arr_delay), !is.na(carrier)) %>%
  mutate(
    delayed = arr_delay > 0
  )
n_flights <- nrow(flights_clean)
```
After filtering, the analysis includes `r n_flights` flights.
This demonstrates a common reporting pattern: compute a value in a chunk, then reference it inline in the text.
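A related habit is to report how many rows a filter removed, not only how many remain. The sketch below applies the same rule in base R to a small hypothetical data frame, so it runs without nycflights13. In the report itself, the equivalent count would be nrow(flights_raw) - n_flights.

```r
# Hypothetical stand-in for flights_raw; NA marks a missing arrival delay
flights_toy <- data.frame(
  carrier   = c("AA", "AA", "UA", "UA", "DL"),
  arr_delay = c(12, NA, -5, 30, NA)
)

# Same filtering rule as the prepare-data chunk, written in base R
keep <- !is.na(flights_toy$arr_delay) & !is.na(flights_toy$carrier)
flights_toy_clean <- flights_toy[keep, ]

# Both counts are worth referencing inline so readers see what was excluded
n_kept    <- nrow(flights_toy_clean)
n_removed <- nrow(flights_toy) - n_kept
```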
Step 6: Create a summary table (at least one table)
Build a carrier-level summary table. Keep the table focused: include counts and a few interpretable metrics.
## Summary table
```{r carrier-summary}
carrier_summary <- flights_clean %>%
  group_by(carrier) %>%
  summarise(
    flights = n(),
    avg_arr_delay = mean(arr_delay),
    median_arr_delay = median(arr_delay),
    pct_delayed = mean(delayed) * 100,
    .groups = "drop"
  ) %>%
  arrange(desc(avg_arr_delay))
# Print a reader-friendly subset: top 10 by average delay
head(carrier_summary, 10)
```
The carrier with the highest average arrival delay in this filtered dataset is `r carrier_summary$carrier[1]`, with an average delay of `r round(carrier_summary$avg_arr_delay[1], 1)` minutes.
If you want a cleaner-looking HTML table, you can use a formatting helper. The following uses knitr::kable(), which is lightweight and works well in HTML:
```{r carrier-table, echo=FALSE}
knitr::kable(
  head(carrier_summary, 10),
  digits = 1,
  caption = "Top 10 carriers by average arrival delay"
)
```
Notice echo=FALSE: the reader sees the table, not the code that generated it.
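Two details of this summary are worth checking on a small example. First, pct_delayed relies on mean() of a logical vector. Second, the table reports both the mean and the median. The base R sketch below uses hypothetical delays for a single carrier.

```r
# Hypothetical arrival delays (minutes) for one carrier
arr_delay <- c(-4, 0, 15, 32, -10, 7, 0, 51)

# Same derived column as in prepare-data: TRUE when the flight arrived late
delayed <- arr_delay > 0

# mean() of a logical vector is the share of TRUE values (TRUE counts as 1, FALSE as 0);
# multiplying by 100 turns that share into a percentage
pct_delayed <- mean(delayed) * 100

avg_arr_delay    <- mean(arr_delay)
median_arr_delay <- median(arr_delay)
```

Here the mean (about 11.4 minutes) sits well above the median (3.5 minutes) because of two long delays. This is one reason to report both.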
Step 7: Create a plot (at least one plot) with clear labeling
Make a plot that compares carriers. A simple and effective choice is a horizontal bar chart of average arrival delay for the top carriers by delay (or by flight count). This example plots the top 10 by average delay.
## Plot: average arrival delay by carrier
```{r delay-plot, echo=FALSE, fig.cap="Average arrival delay (minutes) for the 10 carriers with the highest average delay."}
plot_data <- carrier_summary %>%
  slice_max(avg_arr_delay, n = 10) %>%
  mutate(carrier = reorder(carrier, avg_arr_delay))
ggplot(plot_data, aes(x = avg_arr_delay, y = carrier)) +
  geom_col(fill = "steelblue") +
  labs(
    x = "Average arrival delay (minutes)",
    y = "Carrier",
    title = "Average arrival delay by carrier"
  )
```
Because the chunk uses echo=FALSE, the report stays readable while still being reproducible (the code runs during knitting).
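The reorder() call controls the bar order. On a discrete y axis, ggplot2 places the first factor level at the bottom. Sorting the levels by average delay therefore puts the most-delayed carrier at the top. Without reorder(), carriers would appear in alphabetical order. The base R sketch below uses hypothetical carrier codes and values to show the resulting level order.

```r
# Hypothetical carrier codes and average delays, deliberately unsorted
carrier <- c("C1", "C2", "C3", "C4")
avg_arr_delay <- c(4, 22, -10, 16)

# reorder() returns a factor whose levels are sorted (ascending) by the mean
# of avg_arr_delay within each carrier; here each carrier has a single value
carrier_f <- reorder(carrier, avg_arr_delay)
levels(carrier_f)
```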
Step 8: Add a short Results narrative tied to outputs
Write a few sentences interpreting the table and plot. Keep interpretation close to the outputs and reference computed values inline where helpful.
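One formatting detail applies to inline numbers. round() returns a number, so a trailing zero disappears when it is printed (12.0 appears as 12). If you want a fixed number of decimals in the text, format the value as a string with sprintf(). A base R sketch with hypothetical values:

```r
# Hypothetical overall average that rounds to a whole number
overall_avg <- 12.04

round(overall_avg, 1)         # the number 12, which prints as 12: the ".0" is lost
sprintf("%.1f", overall_avg)  # "12.0": always exactly one decimal place

# Percentages: format the number first; "%%" produces a literal % sign
pct_late <- 100 * mean(c(TRUE, FALSE, TRUE, TRUE))
sprintf("%.1f%%", pct_late)   # "75.0%"
```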
## Results
Average arrival delay differs across carriers, ranging from `r round(min(carrier_summary$avg_arr_delay), 1)` to `r round(max(carrier_summary$avg_arr_delay), 1)` minutes.
Across all included flights, the overall average arrival delay is `r round(mean(flights_clean$arr_delay), 1)` minutes, and `r round(mean(flights_clean$delayed) * 100, 1)`% of flights arrive late (arrival delay > 0).
Step 9: Knit to HTML (and optionally PDF)
To generate the shareable report:
- Click the Knit button in RStudio to produce HTML.
- The HTML file will be created in the same folder as the .Rmd file.
- If you added pdf_document to the YAML, choose Knit → Knit to PDF (requires LaTeX).
When the report is knitted, R runs chunks from top to bottom in a fresh session. If your report only works when you run chunks out of order in the console, it is not yet fully reproducible. A quick check is to restart R and knit again.
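The following is a rough model of why this check matters. Knitting behaves like evaluating the chunks in document order inside an environment that starts empty. The hypothetical knit_like() helper below is a simplified base R illustration, not how knitr is implemented. It shows that the in-order sequence works while a reordered one fails.

```r
# Each "chunk" is an unevaluated expression, listed in document order
chunks <- list(
  quote(x <- c(3, 8, 1)),
  quote(avg <- mean(x))
)

# Hypothetical helper: evaluate chunks in order inside a fresh environment.
# Its parent is baseenv(), so objects in your global workspace are not visible.
knit_like <- function(chunks) {
  env <- new.env(parent = baseenv())
  for (chunk in chunks) eval(chunk, envir = env)
  env
}

# Document order works
result <- knit_like(chunks)
result$avg

# Reversed order fails: `x` does not exist yet when mean(x) runs
out_of_order <- tryCatch(
  knit_like(rev(chunks)),
  error = function(e) conditionMessage(e)
)
```

The environment's parent is baseenv(), so objects you created interactively in the global environment are also invisible to it. This mirrors the separate R session that the Knit button uses.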
Common workflow tips for clean, reproducible reports
Keep setup separate and quiet
Use a single setup chunk with include=FALSE for global options. Load packages in a dedicated chunk with message=FALSE so the report does not show startup text.
Name chunks and keep them short
Chunk names like load-data, prepare-data, carrier-summary, and delay-plot make it easier to debug knitting errors. Short chunks also make it easier to align code with the narrative sections.
Use chunk options intentionally
- Use echo=FALSE for presentation chunks (tables/plots) when the code is not needed for the reader.
- Use warning=TRUE temporarily if you are diagnosing an issue; then turn warnings off again if they are expected and not actionable.
- Use include=FALSE for chunks that set objects but should not display anything.
Make the report self-contained
Assume the reader only sees the knitted output. Define all objects inside the document, avoid relying on manually created objects in your global environment, and prefer relative paths if you do read external files.
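Here is a minimal sketch of the relative-path habit, assuming a hypothetical data/flights.csv stored next to the .Rmd. A temporary directory stands in for the project folder so the code runs anywhere. The setwd() call imitates the fact that knitting uses the .Rmd's folder as the working directory by default.

```r
# Hypothetical project layout: the .Rmd sits next to a data/ folder.
# A temporary directory stands in for that project folder here.
project_dir <- tempfile("report-project-")
dir.create(file.path(project_dir, "data"), recursive = TRUE)
write.csv(
  data.frame(carrier = c("AA", "UA"), arr_delay = c(5, -3)),
  file.path(project_dir, "data", "flights.csv"),
  row.names = FALSE
)

# Imitate knitting from the project folder, read with a relative path,
# then restore the original working directory
old_wd <- setwd(project_dir)
flights_local <- read.csv(file.path("data", "flights.csv"))
setwd(old_wd)
```

The path names no drive or home directory. The same line therefore works on a colleague's machine, as long as the data/ folder travels with the .Rmd.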