Scraping Data

Opens URLs or interacts with APIs

httr - Wrapper for working with web APIs over HTTP. Originally built on RCurl, it now uses the curl package.
jsonlite - Parses JSON-formatted data and interacts with web APIs. Started out as a fork of RJSONIO but has since been completely rewritten.
RCurl - Fetches URLs and parses the results.
RSelenium - R interface to the Selenium WebDriver API.
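
As a minimal sketch of the JSON-handling step, the snippet below parses a payload with jsonlite. The payload is a made-up literal standing in for an API response body (in practice it might come from something like httr::content(resp, as = "text")), and the field names are illustrative assumptions, not taken from any real API.

```r
library(jsonlite)

# Hypothetical API response body; field names are made up for illustration.
payload <- '[
  {"player": "A", "points": 21, "team": "X"},
  {"player": "B", "points": 17, "team": "Y"}
]'

# fromJSON() simplifies an array of records into a data frame by default.
scores <- fromJSON(payload)
print(scores)

# Convert the data frame back to JSON, e.g. to send as a request body.
json_out <- toJSON(scores, auto_unbox = TRUE)
print(json_out)
```

Working from a literal string keeps the example runnable offline; swapping in a real response body should only require replacing payload.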

Parses HTML/XML

rvest - Parses HTML pages from the web so the data can be used in R. Inspired by the Python libraries Beautiful Soup and RoboBrowser.
XML and XML2R - Parse and generate XML documents.
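
The following is a small sketch of HTML parsing with rvest, assuming rvest 1.0.0 or later (for html_element() and html_text2()). The HTML fragment is made up for illustration and stands in for a downloaded page; with a real site, read_html() would take a URL instead.

```r
library(rvest)

# Made-up HTML fragment standing in for a downloaded page.
page <- read_html('
  <html><body>
    <h1>Box Score</h1>
    <table id="scores">
      <tr><th>player</th><th>points</th></tr>
      <tr><td>A</td><td>21</td></tr>
      <tr><td>B</td><td>17</td></tr>
    </table>
    <a href="/game/1">Game 1</a>
  </body></html>
')

# Select nodes with CSS selectors, then extract text, attributes, or tables.
title  <- html_text2(html_element(page, "h1"))
link   <- html_attr(html_element(page, "a"), "href")
scores <- html_table(html_element(page, "#scores"))

print(title)
print(link)
print(scores)
```

CSS selectors are used here because they tend to be shorter than XPath for simple lookups; html_element() also accepts an xpath argument where that fits better.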

Reads in specific files

read.table("myTextFile.txt", header = TRUE)  # reads in text files
read.csv("myCSVFile.csv", header = TRUE, sep = ",")  # reads in CSV files
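
A self-contained sketch of reading a delimited file with base R follows. It first writes a small made-up CSV to a temporary file (the column names and values are assumptions for illustration), then reads it back.

```r
# Write a small made-up CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("player,points", "A,21", "B,", "C,17"), csv_path)

# read.csv() is read.table() with header = TRUE and sep = "," as defaults.
scores <- read.csv(csv_path, stringsAsFactors = FALSE)
print(scores)

# The empty numeric field for player B is read as NA, so drop it when averaging.
avg_points <- mean(scores$points, na.rm = TRUE)
print(avg_points)
```

Setting stringsAsFactors = FALSE explicitly keeps text columns as character vectors on R versions before 4.0, where the default was TRUE.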

readxl - Reads in .xls and .xlsx files.
> library(readxl)
> excel_sheets("myExcelFile.xlsx")  # lists all sheets in file
> read_excel("myExcelFile.xlsx", sheet = 1, col_names = TRUE, skip = 0)  # reads in sheet 1 in file
By default, blank cells are converted to missing values.
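
For a version that runs without a workbook of your own, the sketch below uses the sample file that ships with readxl, located via readxl_example(). The choice of the "iris" sheet and the 10-row limit are illustrative assumptions.

```r
library(readxl)

# readxl_example() returns the path to a sample workbook bundled with readxl.
xlsx_path <- readxl_example("datasets.xlsx")

# List the sheet names in the workbook.
sheets <- excel_sheets(xlsx_path)
print(sheets)

# Read one sheet by name, keeping only the first 10 data rows.
iris_head <- read_excel(xlsx_path, sheet = "iris", n_max = 10)
print(iris_head)
```

Referring to a sheet by name rather than position tends to be more robust if sheets are later reordered.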

R packages to scrape data

bbscrapeR - R package for collecting NBA play-by-play, shot location, and some SportVU data.

Resources