R Packages for Scraping Data
Opens URLs or Interacts with APIs
- httr - Wrapper for working with web APIs; built on the curl package (early versions wrapped RCurl).
- jsonlite - Parses JSON-formatted data and interacts with web APIs. Started out as a fork of RJSONIO but has since been completely rewritten.
- RCurl - Fetches URLs and parses the results.
- RSelenium - Interface to the Selenium WebDriver API.
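A minimal jsonlite sketch, assuming the response body of an API call is already available as a string. The payload below is made up for illustration. In practice the text might come from something like `httr::content(httr::GET(url), as = "text")`, which is left out here so the example runs offline.

```r
library(jsonlite)

# Hypothetical API payload: an array of JSON records.
payload <- '[{"player": "A", "points": 10}, {"player": "B", "points": 7}]'

# fromJSON() simplifies an array of records into a data.frame by default.
stats <- fromJSON(payload)
str(stats)

# toJSON() goes the other way, e.g. to build a request body from a data frame.
body <- toJSON(stats)
print(body)
```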
Parses HTML/XML
- rvest - Scrapes HTML pages from the web for use in R. Inspired by the Python libraries Beautiful Soup and RoboBrowser.
- XML and XML2R - Parse and generate XML pages
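A small rvest sketch, assuming rvest 1.0 or later (for `html_element()`). An inline HTML string stands in for a downloaded page so the code runs offline. `read_html()` also accepts a URL. The page content is invented for illustration.

```r
library(rvest)

# Hypothetical page; read_html() treats a string containing "<" as literal HTML.
html <- '<html><body>
  <h1>Box score</h1>
  <table>
    <tr><th>player</th><th>points</th></tr>
    <tr><td>A</td><td>10</td></tr>
    <tr><td>B</td><td>7</td></tr>
  </table>
</body></html>'

page <- read_html(html)

# Select nodes with CSS selectors, then extract text or whole tables.
title <- html_text(html_element(page, "h1"))
box <- html_table(html_element(page, "table"))  # <th> row becomes the header
print(title)
print(box)
```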
Reads in specific files
read.table("myTextFile.txt", header = TRUE) # reads in text files
read.csv("myCSVFile.csv", header = TRUE, sep = ",") # reads in CSV files
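A self-contained sketch of the two base R readers. It writes small example files to a temporary directory first, so the file names here are generated rather than the ones above, and the data is made up.

```r
# Write two small example files so the calls run anywhere.
txt_path <- tempfile(fileext = ".txt")
csv_path <- tempfile(fileext = ".csv")
writeLines(c("player points", "A 10", "B 7"), txt_path)
writeLines(c("player,points", "A,10", "B,7"), csv_path)

# read.table() splits on whitespace by default.
txt_data <- read.table(txt_path, header = TRUE)

# For read.csv(), header = TRUE and sep = "," are already the defaults.
csv_data <- read.csv(csv_path)

str(txt_data)
str(csv_data)
```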
- readxl - Reads in .xls and .xlsx files.
> library(readxl)
> excel_sheets("myExcelFile.xlsx") # lists all sheets in the file
> read_excel("myExcelFile.xlsx", sheet = 1, col_names = TRUE, skip = 0) # reads in sheet 1 of the file
By default, blank cells are converted to missing values (NA).
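A runnable variant of the readxl calls, assuming the example workbook `datasets.xlsx` that ships with readxl (located via `readxl_example()`) in place of `myExcelFile.xlsx`. The `na` argument is shown to illustrate how additional placeholder strings can also be treated as missing.

```r
library(readxl)

# Path to an example workbook bundled with the readxl package.
path <- readxl_example("datasets.xlsx")

# List every sheet in the workbook.
sheets <- excel_sheets(path)
print(sheets)

# Read a sheet by name (or by position, e.g. sheet = 1).
iris_xl <- read_excel(path, sheet = "iris", col_names = TRUE, skip = 0)

# na = "" is the default; extra strings such as "NA" can also be mapped to NA.
iris_na <- read_excel(path, sheet = 1, na = c("", "NA"))
print(head(iris_xl))
```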
R packages to scrape data
- bbscrapeR - R package for collecting NBA play-by-play, shot location, and some SportVU data.