Parallel programming and dealing with large data…
Pre-lecture activities
Tip
In advance of class, please install the future package (for example, with install.packages("future")). It provides a unified parallel framework in R.
In addition, please read through
- Strategies for dealing with large data
- https://www.futureverse.org/packages-overview.html (just the future R package)
Note
How much should I prepare before class?
You should have future installed and be familiar with its three basic functions: plan(), future(), and value().
We will learn more about these functions in class.
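As a preview, here is a minimal sketch of how the three functions fit together. It assumes the future package is installed and that your machine can start two background R sessions; the choice of the "multisession" backend, two workers, and the toy computations are illustrative only, not the setup used in class. plan() picks how futures are resolved, future() launches an expression and returns immediately, and value() waits for and returns the result.

```r
# Minimal sketch of plan(), future(), and value().
# Assumes the future package is installed: install.packages("future")
library(future)

# plan(): resolve futures in separate background R sessions.
# Two workers is an arbitrary choice for this illustration.
plan(multisession, workers = 2)

# future(): start evaluating each expression and return right away
# with a Future object, so the main session is not blocked.
f_sum <- future({
  sum(1:100)
})
f_fact <- future({
  prod(1:10)
})
# Record which R process actually evaluated this future.
f_pid <- future(Sys.getpid())

# value(): block until each future is resolved, then return its result.
sum_result <- value(f_sum)   # 5050
fact_result <- value(f_fact) # 3628800
worker_pid <- value(f_pid)

cat("sum:", sum_result, "\n")
cat("factorial:", fact_result, "\n")
cat("computed in a different R process:", worker_pid != Sys.getpid(), "\n")

# Switch back to sequential evaluation, which also shuts down the workers.
plan(sequential)
```

Because the code that creates and reads futures does not depend on the backend, switching to plan(sequential) runs the same script without any parallelism, which can be handy for debugging.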
Lecture
Acknowledgements
Material for this lecture was borrowed and adapted from
Learning objectives
At the end of this lesson you will:
- Understand the basics of parallel computing
- Become familiar with basic functions in the future package
- Recognize different file formats for working with large data that is not stored locally
- Implement three ways to work with large data:
- “sample and model”
- “chunk and pull”
- “push compute to data”
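To make one of these strategies concrete, here is a minimal base R sketch of "chunk and pull": the data live on disk as separate chunks, each chunk is read into memory on its own and reduced to a small summary, and only the summaries are recombined. Everything in it is hypothetical and chosen for illustration: the simulated sales data, the chunk_dir location, the one-CSV-per-region layout, and the summarise_chunk() helper. It is not the specific workflow used in class.

```r
# Hypothetical "large" data set, stored on disk as one CSV chunk per region.
# In practice the chunks would already exist; we simulate them here.
chunk_dir <- file.path(tempdir(), "sales_chunks")
dir.create(chunk_dir, showWarnings = FALSE)

set.seed(1)
regions <- c("north", "south", "east", "west")
for (r in regions) {
  chunk <- data.frame(region = r, amount = round(runif(1000, 0, 100), 2))
  write.csv(chunk, file.path(chunk_dir, paste0(r, ".csv")), row.names = FALSE)
}

# Hypothetical helper: pull ONE chunk into memory and reduce it to a
# one-row summary, so the full data set is never held in memory at once.
summarise_chunk <- function(path) {
  chunk <- read.csv(path)
  data.frame(region = as.character(chunk$region[1]),
             n = nrow(chunk),
             total = sum(chunk$amount))
}

# Chunk and pull: process the chunks one at a time...
chunk_files <- list.files(chunk_dir, pattern = "\\.csv$", full.names = TRUE)
summaries <- lapply(chunk_files, summarise_chunk)

# ...then recombine the small per-chunk summaries into one result.
result <- do.call(rbind, summaries)
print(result)
```

Because each chunk is summarized independently, the lapply() call is a natural place for parallelism: one could instead create one future() per chunk file and collect the summaries with value().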
Slides
Class activity
For the rest of the time in class, you and your team will work on the final project. Stephanie will walk around to answer questions and is happy to help in any way!
Post-lecture
If you would like more practice using the future package, there are two tutorials, developed by Henrik Bengtsson for the useR! 2024 conference, that you can work through on your own: