Parallel programming

and dealing with large data…
Author
Affiliation

Department of Biostatistics, Johns Hopkins

Published

December 4, 2025

Pre-lecture activities

Tip

In advance of class, please install

  • future: provides a unified, consistent framework for parallel computing in R

In addition, please read through

Note

How much should I prepare before class?

You should have future installed and be familiar with its three basic functions: plan(), future(), and value().

We will learn more about these functions in class.
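
As a rough preview of how the three functions fit together, here is a minimal sketch. The helper slow_square() and the choice of two multisession workers are illustrative assumptions and are not taken from the course material; plan(sequential) would also work if you prefer to run everything in the current R session.

```r
library(future)

# plan() chooses how futures are resolved.
# Assumption: two background R sessions on the local machine.
plan(multisession, workers = 2)

# Hypothetical stand-in for an expensive computation
slow_square <- function(x) {
  Sys.sleep(0.5)  # pretend this takes a while
  x^2
}

# future() starts evaluating the expression and returns right away
f1 <- future(slow_square(3))
f2 <- future(slow_square(4))

# value() waits for each future to finish and returns its result
results <- c(value(f1), value(f2))
print(results)

# Return to sequential evaluation and shut down the background workers
plan(sequential)
```

Because both futures are created before either value() call, the two computations can run at the same time on separate workers.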

Lecture

Acknowledgements

Material for this lecture was borrowed and adapted from

Learning objectives


At the end of this lesson you will:

  • Understand the basics of parallel computing
  • Become familiar with basic functions in the future package
  • Recognize different file formats for working with large data that is not stored locally
  • Implement three ways to work with large data:
    1. “sample and model”
    2. “chunk and pull”
    3. “push compute to data”

Slides

Class activity

For the rest of the time in class, you and your team will work on the final project. Stephanie will walk around to answer questions and is happy to help in any way!

Post-lecture

If you would like more practice using the future package, there are two tutorials, developed by Henrik Bengtsson for the UseR! 2024 conference, that you can work through on your own: