Getting Started with Reproducible Research…

…that’s good enough

Goal: Make Research Reproducible

What?

The Turing Way project illustration by Scriberia. DOI.

Different kinds of “reproducible”

DOI.

Today, we’re focusing on “reproducible”: the same analysis run on the same data gives the same result.

Goal: Make Research Reproducible

Why?

Goal: Make Research Reproducible

How?

Lots of ways!

Barriers to Reproducibility

If reproducibility is such a great thing to ensure in research, why are we here in 2024 learning about it (possibly for the first time)?

A slide outlining some of the barriers to reproducible research from Kirstie Whitaker’s talk about The Turing Way at csv,conf,v4 in May 2019. DOI

Reproducibility is hard

Fortunately, reproducibility is not “all-or-nothing.”

Even some reproducibility is better than none!

Today, let’s focus on “good enough”: the least amount of work needed to make a research project respectably reproducible.

What we’ll cover

  1. 🔁 Version control
  2. 🪪 Licensing
  3. 🌅 Environments

🔁 Version Control

Version control is a workflow where the entire history of a set of documents is preserved.

Why do we want our work under version control?

Why version control

Tracking project history. DOI.

  • Provenance
  • Version history
  • Hide older versions without losing them
  • Distributed work
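Git, the version control system behind GitHub, gives every saved version of a file an ID computed from the file’s contents. That is how the version history above tells versions apart: change one character and the ID changes. Below is a minimal Python sketch of how Git computes that ID for a single file (a “blob”), assuming Git’s default SHA-1 object format. The two file versions are made-up examples.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the ID Git assigns to a file's contents (a "blob").

    Git prefixes the raw bytes with a header b"blob <size>\\0" and takes
    the SHA-1 of the result (default SHA-1 object format assumed).
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two versions of the same (hypothetical) analysis script.
v1 = b"threshold = 0.05\n"
v2 = b"threshold = 0.01\n"

print(git_blob_hash(v1))
print(git_blob_hash(v2))
# A one-character edit produces a completely different ID.
print(git_blob_hash(v1) == git_blob_hash(v2))
```

For any file in a repo, `git hash-object <file>` should print the same ID this function computes from the file’s bytes.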

Create an account on GitHub

Create a repo

Fill out the form:

  • Repo name: Something pithy that describes the project you’re working on.
  • Description: A one-line summary of what you’re using this repo for.
  • Public / Private: We suggest “public” by default, but keeping it “private” during development and making it “public” during and after review also makes sense.
  • Add a README: This is where you’ll go into detail about what your repo is doing. Definitely add this!
  • Add .gitignore: A bit more technical; this file lists what Git should not track. GitHub offers templates for common languages, so pick the one that matches your work.
  • Choose a license: Absolutely yes; we’ll go into more detail in the next section.

Create a folder structure

  • This is a suggestion!
  • Use what is relevant, ignore what is not
  • (PSA: Don’t put full datasets under version control. Sample data is great! Or add data/ to .gitignore.)
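The suggestions above can be scripted so every new project starts the same way. Here is a minimal Python sketch that creates a project folder layout and makes sure data/ is listed in .gitignore. The folder names (data, data_sample, code, results, docs) are hypothetical; keep what is relevant and ignore the rest.

```python
from pathlib import Path

# Hypothetical folder names; keep what is relevant and drop the rest.
FOLDERS = ["data", "data_sample", "code", "results", "docs"]

# Keep full datasets out of version control; data_sample/ is still tracked.
IGNORE_RULES = ["data/"]


def scaffold(project_root: str) -> Path:
    """Create the project folders and add any missing rules to .gitignore."""
    root = Path(project_root)
    for name in FOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)

    gitignore = root / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""
    existing = text.splitlines()
    missing = [rule for rule in IGNORE_RULES if rule not in existing]
    if missing:
        if text and not text.endswith("\n"):
            text += "\n"  # don't glue the new rule onto the last line
        text += "".join(rule + "\n" for rule in missing)
        gitignore.write_text(text)
    return root
```

Calling `scaffold("my-project")` (a placeholder path) is safe to repeat, because it never adds a duplicate rule. Git does not track empty folders, so they only show up in the repo once they contain a file.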

Add everything to version control

Version control workflow

One branch. DOI.

Many branches. DOI.

🪪 Licensing

Licensing is how we spell out others’ rights to use, modify, or build on our work.

Why is it important to give our work (code, data, content) a license?

Choosing a license

How does one pick a license?

Fortunately, there’s a handy flowchart!

https://choosealicense.com/

Adding a license file to your repo

You can do this right when you create the repo!

🌅 Environments

Making your environment reproducible means configuring your workspace (code, software, programs, even the operating system) so that it matches the environment in which the research was originally done.

Why is it important to have reproducible environments?
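A first step toward a reproducible environment is simply writing down what it contains. Below is a minimal sketch that uses only Python’s standard library to record the Python version, operating system, and installed package versions as pinned `name==version` lines, similar to what `pip freeze` prints. It captures *what* was installed, not the installed software itself; the tools below cover that.

```python
import json
import platform
import sys
from importlib import metadata


def capture_environment() -> dict:
    """Record the Python version, OS, and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that lack a name
            # Keep the first match found along sys.path.
            packages.setdefault(name, dist.version)
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }


def as_requirements(env: dict) -> str:
    """Render the packages as pinned "name==version" lines (pip style)."""
    return "\n".join(f"{name}=={version}" for name, version in env["packages"].items())


if __name__ == "__main__":
    json.dump(capture_environment(), sys.stdout, indent=2)
```

Saving this JSON (or the `as_requirements` text) alongside your results records which versions produced them.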

How to “capture” an environment

Computational environments. DOI.

  1. Virtual machines
  2. Docker / containers
  3. Conda / Mamba
  4. Binder

Virtual machines

VirtualBox or Vagrant

Containers

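Containers are often pitched as “lightweight VMs”: they bundle software with its dependencies but share the host’s operating system kernel instead of running a full guest OS.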

Build on millions of pre-built container images on Docker Hub, or define your own with a Dockerfile.
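Here is what a minimal Dockerfile for a Python analysis might look like, wrapped in a small Python helper that writes it into a project folder. This is a sketch under hypothetical assumptions: dependencies are pinned in requirements.txt, the analysis runs with `python code/analysis.py`, and the base image is the official `python:3.11-slim` image from Docker Hub.

```python
from pathlib import Path

# Hypothetical project assumptions: dependencies are pinned in requirements.txt,
# and the analysis is run with `python code/analysis.py`.
DOCKERFILE_TEMPLATE = """\
# Start from a pre-built image on Docker Hub; pin a specific tag.
FROM {base_image}

WORKDIR /project

# Install dependencies before copying the code so Docker can cache this step.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the project into the image.
COPY . .

CMD ["python", "{entry_point}"]
"""


def write_dockerfile(project_root: str,
                     base_image: str = "python:3.11-slim",
                     entry_point: str = "code/analysis.py") -> Path:
    """Write a minimal Dockerfile into project_root and return its path."""
    path = Path(project_root) / "Dockerfile"
    path.write_text(DOCKERFILE_TEMPLATE.format(base_image=base_image,
                                               entry_point=entry_point))
    return path


if __name__ == "__main__":
    print(DOCKERFILE_TEMPLATE.format(base_image="python:3.11-slim",
                                     entry_point="code/analysis.py"))
```

From the project folder, `docker build -t my-analysis .` builds the image and `docker run --rm my-analysis` runs it (`my-analysis` is a placeholder name).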

Package management systems

Conda and Mamba are the best-known package managers in the Python world

Use a YAML file (conventionally environment.yml) to specify the environment
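An environment.yml lists the environment’s name, the channels to install from, and its dependencies. Packages available only on PyPI go in a nested `pip:` list. Below is a minimal Python sketch that renders such a file as plain text, so it needs no YAML library. The project name, channel, and packages in the example are hypothetical.

```python
import re


def render_environment_yml(name, channels, dependencies, pip_packages=()):
    """Render a minimal conda/mamba environment.yml as text.

    Assumes simple package specs (e.g. "numpy" or "python=3.11") that need no
    YAML quoting. PyPI-only packages go in a nested "pip:" list, which
    requires pip itself as a conda dependency.
    """
    deps = list(dependencies)
    base_names = {re.split(r"[=<>!\s]", d, maxsplit=1)[0] for d in deps}
    if pip_packages and "pip" not in base_names:
        deps.append("pip")

    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dep}" for dep in deps]
    if pip_packages:
        lines.append("  - pip:")
        lines += [f"    - {pkg}" for pkg in pip_packages]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical project: adjust the name, channels, and packages.
    print(render_environment_yml(
        name="my-project",
        channels=["conda-forge"],
        dependencies=["python=3.11", "numpy", "pandas"],
        pip_packages=["some-pypi-only-package"],
    ), end="")
```

`conda env create -f environment.yml` (or `mamba env create -f environment.yml`) rebuilds the environment from the file. `conda env export --from-history` drafts one from an environment you already have.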

Binder

Turns a public repository into an interactive notebook environment, all through the web browser

Just specify the environment, and Binder does the rest!
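“Specify the environment” means committing configuration files that Binder’s build tool, repo2docker, knows how to read. It looks in a binder/ (or .binder/) folder if one exists, otherwise in the repository root. Below is a minimal Python sketch that reports which known files Binder would find. The file list is deliberately partial; the repo2docker documentation has the full list and precedence rules.

```python
from pathlib import Path

# A partial list of configuration files that repo2docker (the tool behind
# Binder) understands. Note: if a Dockerfile is present, repo2docker uses it
# and ignores the other configuration files.
BINDER_CONFIG_FILES = [
    "environment.yml",
    "requirements.txt",
    "install.R",
    "apt.txt",
    "runtime.txt",
    "postBuild",
    "Dockerfile",
]


def binder_config_dir(repo_root: str) -> Path:
    """Pick the folder Binder reads: binder/, then .binder/, else the root.

    If such a folder exists, config files elsewhere are ignored. (Having both
    binder/ and .binder/ is not allowed; this sketch simply prefers binder/.)
    """
    root = Path(repo_root)
    for candidate in (root / "binder", root / ".binder"):
        if candidate.is_dir():
            return candidate
    return root


def binder_config_files(repo_root: str) -> list:
    """List which of the known config files are present where Binder looks."""
    config_dir = binder_config_dir(repo_root)
    return [name for name in BINDER_CONFIG_FILES if (config_dir / name).is_file()]
```

Once the files are in place, pasting the repository URL into https://mybinder.org builds and launches the environment.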

So much more

Continuous integration, code testing, open access, study preregistration.

Links to learn more are at the end of the slides.

Thank you!

Any questions?

Take this survey before leaving today!

Next up

  • Grab some ☕️ and 🍩
  • Dr. Kyle Johnsen on collaborative exercises around reproducible research practices

Resources