Overview
Teaching: 30 min
Exercises: 10 min
Questions
What tools will we be using?
How can we use these tools to improve reproducibility?
Objectives
Learn to use R, RStudio and RMarkdown.
R + RStudio
R?
- There are a number of other great programming tools out there that can also be used to improve the reproducibility of your analysis
- The key is to use some type of language that will allow you to automate and document your analysis
- Once you master one language you’ll probably find it easier to learn another
R
- You could just type into the command prompt…
  - … but that doesn’t help much with documentation.
  - … but that doesn’t help much with automation.
R
- Gives you a bunch of really cool features that we’ll explore throughout the workshop.
Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and (often) sample data. (From: http://r-pkgs.had.co.nz)
We will use the ggplot2 package for plots and dplyr for data wrangling in this session. Both packages come as part of the tidyverse suite of packages.
If you have not yet done so, install this package by running the following in the Console:
install.packages("tidyverse")
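Before installing, it can help to check whether the package is already present, so that re-running a setup script does not reinstall it every time. The sketch below is one possible way to do that; `missing_packages()` is a hypothetical helper written for this illustration, not part of the workshop materials or of any package.

```r
# Hypothetical helper (an assumption for illustration): return the names of
# the packages in `pkgs` that are not installed on this machine.
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

to_install <- missing_packages(c("tidyverse"))

if (length(to_install) > 0) {
  # Nothing is installed automatically here; this only reports what is missing.
  message("Still to install: ", paste(to_install, collapse = ", "))
} else {
  message("tidyverse is installed; load it with library(tidyverse)")
}
```

If the message reports that tidyverse is missing, run the `install.packages("tidyverse")` line above in the Console.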
Goals of the demo:
- … (RMarkdown)
- The focus is not on the R commands, but rather on getting the big picture of how using R in this way facilitates reproducible analyses
Open intro-template.Rmd
Click on Knit HTML to compile the document
Important features:
- RMarkdown syntax
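To make the structure of an RMarkdown document more concrete, the sketch below writes a tiny `.Rmd` file from R and compiles it from the Console. The file name `minimal-example.Rmd` and its contents are invented for this illustration and are not the workshop's intro-template.Rmd. It assumes that calling `rmarkdown::render()` is a reasonable stand-in for clicking the Knit button, and it only attempts the compile step if the rmarkdown package and pandoc are available.

```r
# Build the three-backtick chunk delimiter programmatically so it can be
# written into the file without confusing this code listing.
fence <- strrep("`", 3)

# A minimal RMarkdown document: YAML header, a line of prose, one R chunk.
rmd_lines <- c(
  "---",
  "title: \"Minimal example\"",
  "output: html_document",
  "---",
  "",
  "Some prose describing the analysis.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",   # `cars` is a dataset that ships with R
  fence
)

# Hypothetical file name, written to a temporary directory.
rmd_path <- file.path(tempdir(), "minimal-example.Rmd")
writeLines(rmd_lines, rmd_path)

# Compile to HTML only when the required tools are present.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_path <- rmarkdown::render(rmd_path, quiet = TRUE)
  message("Rendered: ", html_path)
}
```

The prose, the code, and its output end up in a single HTML file, which is what makes the document a record of the analysis as well as a way to run it.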
Great news!? We just received some more data, in bits and pieces of course:
gapminder-7080.csv
gapminder-90plus.csv
Let’s walk through generating new plots for the 1970s and 1980s, and for the 1990s onward (these new analyses are already in the intro-tutorial.Rmd document).
Note that all code required to accomplish these tasks is also in the template, so you do not need to come up with the R code yourself. Knit the document to combine the datasets, and you’ll see that the code required for recreating the plots is the same as above. That’s the beauty of RMarkdown!
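As a rough illustration of why the code does not need to change when new data arrive, the sketch below combines two files and applies one summary function to the result. It is not the code from the template: the column names (`country`, `year`, `lifeExp`), the values, and the helper names `read_gapminder()` and `mean_life_exp()` are assumptions, and small stand-in files are written to a temporary directory so the example runs without the real data. Base R is used here to keep the sketch free of package dependencies.

```r
# Stand-in files (made-up values) named after the two new data files.
data_dir <- tempdir()
write.csv(
  data.frame(country = "Exampleland", year = c(1972, 1982), lifeExp = c(60, 63)),
  file.path(data_dir, "gapminder-7080.csv"), row.names = FALSE
)
write.csv(
  data.frame(country = "Exampleland", year = c(1992, 2002), lifeExp = c(66, 69)),
  file.path(data_dir, "gapminder-90plus.csv"), row.names = FALSE
)

# Hypothetical helper: read any number of files and stack them row-wise.
read_gapminder <- function(paths) {
  do.call(rbind, lapply(paths, read.csv, stringsAsFactors = FALSE))
}

new_files <- file.path(data_dir, c("gapminder-7080.csv", "gapminder-90plus.csv"))
gap_new <- read_gapminder(new_files)

# Hypothetical helper: the same summary code works on any slice of the data.
mean_life_exp <- function(df) {
  aggregate(lifeExp ~ year, data = df, FUN = mean)
}

mean_life_exp(gap_new)
```

Because the analysis lives in functions and a script rather than in manual steps, adding another file only means adding its path to `new_files`.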
… RMarkdown and providing data sources, or simply providing the generated HTML if just a summary of the analysis is needed
Reproducibility checklist
- Serves as a tool to help you think about the reproducibility of your data analysis.
- Many of the questions can be thought of as having a yes/no answer.
- A better approach is to treat the questions as open-ended, with the real question being, “What can I do to improve the status of my project on this bullet point?”
- With that in mind, you’ll never get 100% of the bullets right for your project, but you’ll always be improving.
Key Points
R, RStudio and RMarkdown allow for powerful reproducible research.