Introduction to the command-line

Introduction to the command-line for data analysis
module 1
week 2
command-line
Author
Affiliation

Department of Biostatistics, Johns Hopkins

Published

November 1, 2022

Pre-lecture materials

Read ahead

Before class, you can prepare by reading the following materials:

  1. Software Carpentry: The Unix Shell
  2. R Squared Academy

Acknowledgements

Material for this lecture was borrowed and adapted from

Learning objectives

At the end of this lesson you will:

  • Understand what a command shell is and why you would use one.
  • Explain how the shell relates to the keyboard, the screen, the operating system, and users’ programs.
  • Explain when and why command-line interfaces should be used instead of graphical interfaces.
  • Create, copy, move, rename, and delete files and folders.
  • Print and sort file contents.
  • Search for regular expressions in files.
  • Execute R commands and scripts in the command line.
Tip

You can practice your command-line skills with the Command Challenge

Introduction

When we interact with computers, we often do so with a keyboard and mouse, touch screen interfaces, or speech recognition systems.

The most widely used way to interact with personal computers is called a graphical user interface (GUI). With a GUI, we give instructions by clicking a mouse and using menu-driven interactions.

The problem with relying only on GUIs is that, while their visual nature makes them intuitive to learn, this way of delivering instructions to a computer scales very poorly.

Example

Imagine the following task: for a literature search, you have to

  1. Copy the third line of one thousand text files in one thousand different directories
  2. Paste the lines into a separate single file.

Using a GUI, you would not only be clicking at your desk for several hours, but you could potentially also commit an error in the process of completing this repetitive task.

This is where we take advantage of the Unix shell.

The Unix shell is both

  1. A command-line interface (CLI)
  2. A scripting language

This allows such repetitive tasks to be done automatically and fast. Using the shell, the task in the literature example can be accomplished in seconds.
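
As a rough sketch of how the literature example could be scripted, the block below builds a small stand-in (5 directories instead of 1000) and collects the third line of each file. The directory and file names are hypothetical, and sed is a standard text tool that is not otherwise covered in this lesson.

```bash
# Build a small stand-in for the literature example: 5 directories
# (instead of 1000), each holding one text file. All names are hypothetical.
workdir="$(mktemp -d)"
for i in 1 2 3 4 5; do
  mkdir -p "$workdir/paper_$i"
  printf 'title %s\nauthor %s\nabstract %s\nbody %s\n' "$i" "$i" "$i" "$i" \
    > "$workdir/paper_$i/notes.txt"
done

# Copy the third line of every file into one combined file.
# sed -n '3p' prints only line 3 of its input.
for f in "$workdir"/paper_*/notes.txt; do
  sed -n '3p' "$f"
done > "$workdir/third_lines.txt"

cat "$workdir/third_lines.txt"
```

The same loop works unchanged for one thousand directories, which is the sense in which the shell scales where a GUI does not.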

The Shell

The shell is a program (or environment) where users type commands and those commands are executed.

Another way of thinking about it is that a shell provides an interface between the user and the UNIX system.

Example types of shells
  • Bash (Bourne Again SHell). Bash is the most popular Unix shell; its name reflects that it is derived from a shell written by Stephen Bourne. Bash is the default shell on most modern implementations of Unix and in most packages that provide Unix-like tools for Windows.
  • Zsh (Z SHell). Zsh is an extended Bourne-style shell that is largely compatible with bash and adds features such as plug-in support, better customization, theme support, and spelling correction. Zsh is the default shell for macOS and Kali Linux.

The grammar of a shell allows you to combine existing tools into powerful pipelines and handle large volumes of data automatically.

Benefits:

  • Sequences of commands can be written into a script, improving the reproducibility of workflows.
  • The command line is often the easiest way to interact with remote machines and supercomputers.
  • Familiarity with the shell is nearly essential for running a variety of specialized tools and resources, including high-performance computing systems.

Let’s get started.

Where to find the shell

  • If you are using Windows, note that Windows does not use bash by default. Instead, you will need to install one of several Windows-specific tools (like Git for Windows or PowerShell) to allow this kind of text-based interaction with your operating system.
  • If you are using macOS, the shell runs inside an application called ‘Terminal’. You can open the Terminal application directly, and a terminal also appears in a tab next to the R console in the RStudio IDE.

Demo
  • Let’s open up the Terminal application and also show you where the Terminal is within RStudio.
  • Next, let’s show how to open up multiple terminals and close all terminals.
The Unix shell setup

You can follow these directions for setting up your shell for Windows, macOS, and Linux operating systems:

Opening the shell

When the shell is first opened, you are presented with a prompt, indicating that the shell is waiting for input.

Bash
$

The shell typically uses $ as the prompt, but may use a different symbol (for the purposes of the rest of the lecture, I will omit the $).

Important
  1. When typing commands in the shell, do not type the $, only the commands that follow it.
  2. After you type a command, you have to press the Enter key to execute it.

The prompt is followed by a text cursor, a character that indicates the position where your typing will appear.

Shell basics

So let’s try our first command, ls, which is short for “list”. In R, we already know how to do this with the list.files() function in base R:

```{r}
list.files()
```
 [1] "analysis.R"           "combined_names.txt"   "index.qmd"           
 [4] "index.rmarkdown"      "package_names.txt"    "r_release.txt"       
 [7] "release_names.txt"    "secret_directory"     "soccer_directory"    
[10] "team_standings_3.csv" "team_standings.csv"  

This command will list the contents of the current directory where the lecture is located. In RStudio, we can write a bash code block like this:

```{bash}
ls
```

and the executed code block is this:

ls
analysis.R
combined_names.txt
index.qmd
index.rmarkdown
package_names.txt
r_release.txt
release_names.txt
secret_directory
soccer_directory
team_standings.csv
team_standings_3.csv
Note

If the shell can’t find a program whose name is the command you typed, it will print an error message such as:

ks
Error in running command bash

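This might happen if the command was mistyped or if the program corresponding to that command is not installed. (The message above is what the rendered code block reports; in an interactive bash session you would typically see bash: ks: command not found.)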

Next, let’s learn to display

  • basic information about the user
  • the current date & time
  • the calendar
  • and clear the screen

| Command | Description | R command |
|---------|-------------|-----------|
| whoami | Who is the user? | Sys.info() / whoami::whoami() |
| date | Get date, time and timezone | Sys.time() |
| cal | Display calendar | |
| clear | Clear the screen | Ctrl + L |

whoami prints the user id (i.e. the name of the user who runs the command). Use it to verify which user you are logged in as.

whoami
stephaniehicks

date will display or change the value of the system’s time and date information.

date
Thu Nov  3 00:17:02 EDT 2022

cal will display a formatted calendar and clear will clear all text on the screen and display a new prompt.

cal
   November 2022      
Su Mo Tu We Th Fr Sa  
       1  2  3  4  5  
 6  7  8  9 10 11 12  
13 14 15 16 17 18 19  
20 21 22 23 24 25 26  
27 28 29 30           
                      
Pro-tip

To clear the R console and the shell, we use Ctrl + L.

Getting help

Before we proceed further, let us learn to view the documentation/manual pages of the commands.

| Command | Description |
|---------|-------------|
| nameofcommand -h | Display a short help message for nameofcommand (only supported by some commands) |
| man nameofcommand | Display manual pages (i.e. man) for the nameofcommand command |
| whatis | Single line description of a command |

man is used to view the system’s reference manual.

man date 
DATE(1)                          General Commands Manual                          DATE(1)

NAME
     date – display or set date and time

SYNOPSIS
     date [-jnRu] [-r seconds | filename] [-v [+|-]val[ymwdHMS]] ... [+output_fmt]
     date [-ju] [[[mm]dd]HH]MM[[cc]yy][.ss]
     date [-jRu] -f input_fmt new_date [+output_fmt]
     date [-jnu] [-I[FMT]] [-f input_fmt] [-r ...] [-v ...] [new_date]

DESCRIPTION
     When invoked without arguments, the date utility displays the current date and time.
Try it out

Let’s explore the manual pages of date in the command line to show you what that looks like.

  • We will figure out which argument prints the number of seconds since the Unix epoch (00:00:00 UTC on 1 January 1970).
  • We will figure out which argument displays the date in UTC.
## try it out 
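
One possible answer to this exercise, offered as a sketch: the +%s output format and the -u option are both available in the date command on macOS and Linux, but it is worth confirming them with man date on your own system. The variable names are only for illustration.

```bash
# Seconds since the Unix epoch (00:00:00 UTC on 1 January 1970)
epoch_seconds="$(date +%s)"
echo "$epoch_seconds"

# Current date and time in UTC rather than the local time zone
utc_now="$(date -u)"
echo "$utc_now"
```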
Pro-tip

For most commands (but not all!), NAMEOFCOMMAND -h or NAMEOFCOMMAND --help will bring up a small guide to command options.

For example, python -h or python --help bring up:

usage: python [option] ... [-c cmd | -m mod | file | -] [arg] ...
Options and arguments (and corresponding environment variables):
-b     : issue warnings about str(bytes_instance), str(bytearray_instance)
         and comparing bytes/bytearray with str. (-bb: issue errors)
-B     : don't write .pyc files on import; also PYTHONDONTWRITEBYTECODE=x
-c cmd : program passed in as string (terminates option list)
-d     : turn on parser debugging output (for experts only, only works on
         debug builds); also PYTHONDEBUG=x
-E     : ignore PYTHON* environment variables (such as PYTHONPATH)
-h     : print this help message and exit (also --help)
-i     : inspect interactively after running script; forces a prompt even
         if stdin does not appear to be a terminal; also PYTHONINSPECT=x

Move around files

Change working directory

Let us focus a bit more on changing the working directory. The table below shows commands for changing the working directory to

  • up one level
  • previous working directory
  • home directory
  • and root directory

| Command | Description |
|---------|-------------|
| cd DIRECTORY | Navigate into the directory named DIRECTORY |
| cd .. | Go up one level |
| cd - | Go to previous working directory |
| cd ~ | Change directory to home directory |
| cd / | Change directory to root directory |

cd ..
ls 
2022-10-27-build-website
2022-11-01-command-line-part-1
2022-11-03-command-line-part-2
2022-11-08-version-control-part-1
2022-11-10-version-control-part-2
2022-11-15-object-oriented-programming
2022-11-17-r-pkg-dev-part-1
2022-11-22-r-pkg-dev-part-2
2022-11-29-purrr-fun-programming
2022-12-01-pkgdown-pkg-website
2022-12-01-targets-proj-workflows
2022-12-06-ggplot2-adv
2022-12-08-r2d3-data-viz
2022-12-13-flexdashboard
2022-12-15-shinydashboard
2022-12-20-dealing-with-large-data
2022-12-20-parallel-computing
2022-12-20-profiling-r-code
2022-12-22-interacting-with-apis
2022-12-22-relational-databases
_metadata.yml

Going up two levels instead lists the top-level files in my folder containing all the files for this website:

cd ../..
ls
_freeze
_post_template.qmd
_quarto.yml
_site
data
images
index.qmd
jhustatprogramming2022.Rproj
lectures.qmd
posts
profile.jpg
projects
projects.qmd
resources.qmd
schedule.qmd
styles.css
syllabus.qmd

And these are all the files in my home directory on my computer:

cd ~ 
ls 
Applications
Creative Cloud Files
Desktop
Documents
Downloads
Dropbox
Library
Movies
Music
Pictures
Public
miniforge3
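
To practice these cd commands without touching real files, here is a minimal sketch that builds a throwaway directory tree. The directory names are hypothetical; PWD is the shell variable that holds the current working directory.

```bash
# Build a throwaway directory tree to practice in (names are hypothetical)
practice_dir="$(mktemp -d)"
mkdir -p "$practice_dir/project/data"

cd "$practice_dir/project/data"   # navigate into a directory
cd ..                             # go up one level, into project
up_one="$(basename "$PWD")"

cd - > /dev/null                  # back to the previous directory (data)
previous="$(basename "$PWD")"

echo "$up_one"     # project
echo "$previous"   # data
```

cd - prints the directory it switches to, which is why its output is sent to /dev/null here.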

List directory contents

ls will list the contents of a directory. Using different arguments, we can

  • list hidden files
  • view file permissions, ownership, size & modification date
  • sort by size & modification date

| Command | Description |
|---------|-------------|
| ls | List directory contents |
| ls -l | List files in long format, one per line |
| ls -a | List all files including hidden files |
| ls -la | Display file permissions, ownership, size & modification date for all files, including hidden files |
| ls -lh | Long format list with size displayed in human readable format |
| ls -lS | Long format list sorted by size, largest first |
| ls -ltr | Long format list sorted by modification date, oldest first |

List files in long format, one per line

cd ../..
ls -l
total 232
drwxr-xr-x   7 stephaniehicks  staff    224 Oct 21 22:40 _freeze
-rw-r--r--@  1 stephaniehicks  staff    976 Oct 17 11:43 _post_template.qmd
-rw-r--r--   1 stephaniehicks  staff    827 Oct 23 11:13 _quarto.yml
drwxr-xr-x  16 stephaniehicks  staff    512 Nov  3 00:16 _site
drwxr-xr-x   6 stephaniehicks  staff    192 Oct 26 23:55 data
drwxr-xr-x  13 stephaniehicks  staff    416 Oct 29 00:49 images
-rw-r--r--   1 stephaniehicks  staff   2200 Oct 24 21:26 index.qmd
-rw-r--r--   1 stephaniehicks  staff    205 Nov  1 21:04 jhustatprogramming2022.Rproj
-rw-r--r--   1 stephaniehicks  staff    189 Aug 11 21:03 lectures.qmd
drwxr-xr-x  23 stephaniehicks  staff    736 Oct 20 21:01 posts
-rw-r--r--   1 stephaniehicks  staff  60521 Aug 11 21:03 profile.jpg
drwxr-xr-x   6 stephaniehicks  staff    192 Oct 21 21:17 projects
-rw-r--r--   1 stephaniehicks  staff    191 Oct 21 21:08 projects.qmd
-rw-r--r--   1 stephaniehicks  staff    501 Aug 11 21:03 resources.qmd
-rw-r--r--   1 stephaniehicks  staff   3288 Oct 27 18:19 schedule.qmd
-rw-r--r--   1 stephaniehicks  staff     17 Aug 11 21:03 styles.css
-rw-r--r--   1 stephaniehicks  staff  18755 Oct 24 10:54 syllabus.qmd

Hidden files

Next, let’s talk about hidden (or invisible) files. These are everywhere on modern operating systems.

When a programmer needs to have a file or folder, but does not want to show it to the user, they prefix the file name with a single period (.). The operating system then hides these files from the user.

But now you can see these invisible files using the command line. Just use the -a flag (short for “all”) for the ls command to have it show you all the files that are there:

cd ../..
ls -a
.
..
.Rproj.user
.git
.github
.gitignore
.quarto
_freeze
_post_template.qmd
_quarto.yml
_site
data
images
index.qmd
jhustatprogramming2022.Rproj
lectures.qmd
posts
profile.jpg
projects
projects.qmd
resources.qmd
schedule.qmd
styles.css
syllabus.qmd

Yes, we have lots of hidden files and folders in our course repository: .git, .github, .gitignore, .quarto, etc.

These are normal files — you can move them, rename them, or open them like any other — they are just hidden by default.
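
A small sketch of the difference between ls and ls -a, using a throwaway directory so nothing in your own folders is affected. The file names are made up for illustration.

```bash
# Throwaway directory with one visible and one hidden file (hypothetical names)
demo_dir="$(mktemp -d)"
touch "$demo_dir/notes.txt" "$demo_dir/.secret_settings"

visible="$(ls "$demo_dir")"         # the hidden file is not listed
everything="$(ls -a "$demo_dir")"   # also lists ., .., and .secret_settings

echo "$visible"
echo "$everything"
```

The entries . and .. that ls -a shows stand for the directory itself and its parent.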

Next, we can display file permissions, ownership, size & modification date for all files, including the hidden ones

cd ../..
ls -la
total 240
drwxr-xr-x  24 stephaniehicks  staff    768 Nov  3 00:16 .
drwxr-xr-x@ 13 stephaniehicks  staff    416 Oct 16 09:32 ..
drwxr-xr-x   4 stephaniehicks  staff    128 Aug  9 21:31 .Rproj.user
drwxr-xr-x  14 stephaniehicks  staff    448 Nov  1 21:04 .git
drwxr-xr-x   3 stephaniehicks  staff     96 Aug 11 21:04 .github
-rw-r--r--   1 stephaniehicks  staff     58 Aug 11 21:03 .gitignore
drwxr-xr-x   8 stephaniehicks  staff    256 Nov  3 00:15 .quarto
drwxr-xr-x   7 stephaniehicks  staff    224 Oct 21 22:40 _freeze
-rw-r--r--@  1 stephaniehicks  staff    976 Oct 17 11:43 _post_template.qmd
-rw-r--r--   1 stephaniehicks  staff    827 Oct 23 11:13 _quarto.yml
drwxr-xr-x  16 stephaniehicks  staff    512 Nov  3 00:16 _site
drwxr-xr-x   6 stephaniehicks  staff    192 Oct 26 23:55 data
drwxr-xr-x  13 stephaniehicks  staff    416 Oct 29 00:49 images
-rw-r--r--   1 stephaniehicks  staff   2200 Oct 24 21:26 index.qmd
-rw-r--r--   1 stephaniehicks  staff    205 Nov  1 21:04 jhustatprogramming2022.Rproj
-rw-r--r--   1 stephaniehicks  staff    189 Aug 11 21:03 lectures.qmd
drwxr-xr-x  23 stephaniehicks  staff    736 Oct 20 21:01 posts
-rw-r--r--   1 stephaniehicks  staff  60521 Aug 11 21:03 profile.jpg
drwxr-xr-x   6 stephaniehicks  staff    192 Oct 21 21:17 projects
-rw-r--r--   1 stephaniehicks  staff    191 Oct 21 21:08 projects.qmd
-rw-r--r--   1 stephaniehicks  staff    501 Aug 11 21:03 resources.qmd
-rw-r--r--   1 stephaniehicks  staff   3288 Oct 27 18:19 schedule.qmd
-rw-r--r--   1 stephaniehicks  staff     17 Aug 11 21:03 styles.css
-rw-r--r--   1 stephaniehicks  staff  18755 Oct 24 10:54 syllabus.qmd

Display size in human readable format

cd ../..
ls -lh
total 232
drwxr-xr-x   7 stephaniehicks  staff   224B Oct 21 22:40 _freeze
-rw-r--r--@  1 stephaniehicks  staff   976B Oct 17 11:43 _post_template.qmd
-rw-r--r--   1 stephaniehicks  staff   827B Oct 23 11:13 _quarto.yml
drwxr-xr-x  16 stephaniehicks  staff   512B Nov  3 00:16 _site
drwxr-xr-x   6 stephaniehicks  staff   192B Oct 26 23:55 data
drwxr-xr-x  13 stephaniehicks  staff   416B Oct 29 00:49 images
-rw-r--r--   1 stephaniehicks  staff   2.1K Oct 24 21:26 index.qmd
-rw-r--r--   1 stephaniehicks  staff   205B Nov  1 21:04 jhustatprogramming2022.Rproj
-rw-r--r--   1 stephaniehicks  staff   189B Aug 11 21:03 lectures.qmd
drwxr-xr-x  23 stephaniehicks  staff   736B Oct 20 21:01 posts
-rw-r--r--   1 stephaniehicks  staff    59K Aug 11 21:03 profile.jpg
drwxr-xr-x   6 stephaniehicks  staff   192B Oct 21 21:17 projects
-rw-r--r--   1 stephaniehicks  staff   191B Oct 21 21:08 projects.qmd
-rw-r--r--   1 stephaniehicks  staff   501B Aug 11 21:03 resources.qmd
-rw-r--r--   1 stephaniehicks  staff   3.2K Oct 27 18:19 schedule.qmd
-rw-r--r--   1 stephaniehicks  staff    17B Aug 11 21:03 styles.css
-rw-r--r--   1 stephaniehicks  staff    18K Oct 24 10:54 syllabus.qmd

Wildcards

A wildcard is a special character, such as the asterisk (*), that matches any sequence of characters in part of a filename.

For example, to list all the .txt files in a folder (but only the .txt files), you can type:

ls *.txt
combined_names.txt
package_names.txt
r_release.txt
release_names.txt

Or if you wanted to see any file in the directory that has an “r” in its name

ls *r*
index.rmarkdown
r_release.txt
release_names.txt

secret_directory:
team_standings.csv

soccer_directory:
team_standings.csv

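Notice that when a matching name is a directory (here secret_directory and soccer_directory), ls lists its contents. Wildcards are an extremely powerful tool, and one you will likely use a lot.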
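
Here is a minimal sketch of wildcard matching in a throwaway directory, with made-up file names. It also uses the -d option of ls, which lists a matching directory by name rather than listing its contents.

```bash
demo_dir="$(mktemp -d)"
cd "$demo_dir"
touch team_a.csv team_b.csv report.txt
mkdir results_dir

csv_files="$(ls *.csv)"    # names ending in .csv
r_names="$(ls -d *r*)"     # names containing an "r"; -d keeps directories as names

echo "$csv_files"
echo "$r_names"
```

The shell expands the wildcard into the list of matching names before ls runs, so the same patterns work with other commands such as cp or rm.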

Question

Let’s try to write a command that matches all files whose names start with “team”

### try it out

Create, copy, rename, delete files

In this section, we will explore commands for file management including:

  • create new file/change timestamps
  • copying files
  • renaming/moving files
  • deleting files
  • comparing files

| Command | Description | R commands |
|---------|-------------|------------|
| touch | Create empty file(s)/change timestamp | file.create() |
| cp | Copy files and folders | file.copy() |
| mv | Rename/move file | file.rename() |
| rm | Remove/delete file | file.remove() |
| diff | Compare files | |

Create new file

touch modifies a file’s timestamps, which record when the file was last accessed or changed. A file has the following timestamps:

  • access time (the last time the file was read)
  • modification time (the last time the contents of the file were changed)
  • change time (the last time the file’s metadata was changed)

If the file does not exist, touch will create an empty file with that name.

Example

Let us use touch to create a new file secret_analysis.R.

touch secret_analysis.R
ls
analysis.R
combined_names.txt
index.qmd
index.rmarkdown
package_names.txt
r_release.txt
release_names.txt
secret_analysis.R
secret_directory
soccer_directory
team_standings.csv
team_standings_3.csv

Copy files and folders

cp makes copies of files and directories.

Note

By default, cp will overwrite existing files without prompting for confirmation, so be cautious while copying files or folders.

Example

Let us create a copy of team_standings.csv file and name it as team_standings_2.csv in the same folder.

cp team_standings.csv team_standings_2.csv
ls
analysis.R
combined_names.txt
index.qmd
index.rmarkdown
package_names.txt
r_release.txt
release_names.txt
secret_analysis.R
secret_directory
soccer_directory
team_standings.csv
team_standings_2.csv
team_standings_3.csv

To copy folders, use the -r option (short for recursive), which copies a directory together with everything inside it.

cp -r secret_directory secret_directory_2
ls secret*
secret_analysis.R

secret_directory:
team_standings.csv

secret_directory_2:
team_standings.csv
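
To see the overwrite behavior of cp safely, here is a small sketch in a throwaway directory. The file contents and the backup.csv, season, and season_copy names are made up for illustration.

```bash
demo_dir="$(mktemp -d)"
cd "$demo_dir"
echo "new standings" > team_standings.csv
echo "old backup"    > backup.csv

cp team_standings.csv backup.csv   # overwrites backup.csv with no warning
cat backup.csv                     # prints: new standings

mkdir -p season/week1
cp team_standings.csv season/week1/
cp -r season season_copy           # copies the folder and everything in it
ls season_copy/week1
```

If you would rather be asked first, cp -i prompts for confirmation before overwriting an existing file.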

Move and rename files

mv moves and renames files and directories. Using different options, we can ensure

  • files are not overwritten
  • user is prompted for confirmation before overwriting files
  • details of files being moved are displayed

| Command | Description |
|---------|-------------|
| mv | Move or rename files/directories |
| mv -f | Do not prompt for confirmation before overwriting files |
| mv -i | Prompt for confirmation before overwriting files |
| mv -n | Do not overwrite existing files |
| mv -v | Move files in verbose mode |

Let us move/rename the team_standings_2.csv file to team_standings_3.csv in verbose mode.

mv -v team_standings_2.csv team_standings_3.csv
ls team*
team_standings_2.csv -> team_standings_3.csv
team_standings.csv
team_standings_3.csv

We see that there is no more file called team_standings_2.csv as it’s now been renamed!
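
As a sketch of the -n option from the table above, the block below tries to move one file onto another that already exists. The file names are hypothetical, and the exit status of mv -n when it skips a move differs between implementations, which is why it is ignored here.

```bash
demo_dir="$(mktemp -d)"
cd "$demo_dir"
echo "draft" > draft.csv
echo "final" > final.csv

# -n refuses to overwrite final.csv; '|| true' ignores the exit status,
# which differs between mv implementations when a move is skipped
mv -n draft.csv final.csv || true
cat final.csv              # still prints: final

mv draft.csv archive.csv   # plain rename to a name that is not taken
ls
```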

Remove/delete files

The rm command is used to delete/remove files & folders. Using additional options, we can

  • remove directories & sub-directories
  • forcibly remove directories
  • interactively remove multiple files
  • display information about files removed/deleted

| Command | Description |
|---------|-------------|
| rm | Remove files/directories |
| rm -r | Recursively remove a directory & all its subdirectories |
| rm -rf | Forcibly remove directory without prompting for confirmation or showing error messages |
| rm -i | Interactively remove multiple files, with a prompt before every removal |
| rm -v | Remove files in verbose mode, printing a message for each removed file |

Let’s remove the secret_analysis.R file that we created earlier with the touch command.

rm secret_analysis.R
ls
analysis.R
combined_names.txt
index.qmd
index.rmarkdown
package_names.txt
r_release.txt
release_names.txt
secret_directory
secret_directory_2
soccer_directory
team_standings.csv
team_standings_3.csv

To remove a folder (and all of its contents), we need to use recursive deletion with -r

rm -r secret_directory_2
ls
analysis.R
combined_names.txt
index.qmd
index.rmarkdown
package_names.txt
r_release.txt
release_names.txt
secret_directory
soccer_directory
team_standings.csv
team_standings_3.csv
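
The table at the start of this section also lists diff, which is not demonstrated above. Here is a minimal sketch with two made-up files that differ on their second line.

```bash
demo_dir="$(mktemp -d)"
cd "$demo_dir"
printf 'Arsenal\nChelsea\nEverton\n'   > standings_old.txt
printf 'Arsenal\nLiverpool\nEverton\n' > standings_new.txt

# diff prints the lines that differ.
# It exits with status 0 when the files are identical and 1 when they differ.
if diff standings_old.txt standings_new.txt; then
  result="identical"
else
  result="different"
fi
echo "$result"
```

In the output, lines starting with < come from the first file and lines starting with > come from the second.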

Input and output

In this section, we will explore commands that will

  • display messages
  • print file contents
  • sort file contents
  • count length of file

| Command | Description |
|---------|-------------|
| echo | Display messages |
| cat | Print contents of a file |
| head | Prints first ten lines of a file by default |
| tail | Prints last ten lines of a file by default |
| more | Open a file for interactive reading, scrolling and searching |
| less | Open a file for interactive reading, scrolling and searching |
| sort | Sort a file in ascending order |
| wc | Count length (words or lines) in a file |
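
head and tail appear in the table above; as a quick sketch, the block below uses them with the -n option to choose how many lines to print. The file is generated with seq, a standard tool not otherwise covered in this lesson, and its name is made up.

```bash
demo_dir="$(mktemp -d)"
seq 1 20 > "$demo_dir/numbers.txt"   # 20 lines: 1, 2, ..., 20

first_three="$(head -n 3 "$demo_dir/numbers.txt")"   # lines 1 to 3
last_two="$(tail -n 2 "$demo_dir/numbers.txt")"      # lines 19 and 20

echo "$first_three"
echo "$last_two"
```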

Display messages

The echo command prints text to the terminal.

It can be used for writing or appending messages to a file as well.

| Command | Description |
|---------|-------------|
| echo | Display messages |
| echo -n | Print message without trailing new line |
| echo > file | Write message to a file |
| echo -e | Enable interpretation of special characters |

Let us start with a simple example. We will print the text “Funny-Looking Kid” to the terminal. It is the release name for R version 4.2.1.

echo Funny-looking Kid
Funny-looking Kid

If we wanted to redirect that output from printing to the terminal and write to a file, we use the redirection (>) operator.

echo Funny-looking Kid > r_release.txt
cat r_release.txt
Funny-looking Kid
Redirection operator

To send a command’s output to a file instead of printing it to the terminal, use the > operator like so: command > [file]. The output of the command on the left side is written to the file on the right side.
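
Since echo can both write and append messages to a file, here is a small sketch of the difference between > (overwrite) and >> (append). The output file name is hypothetical; the messages are release names from this lesson’s example files.

```bash
demo_dir="$(mktemp -d)"
out_file="$demo_dir/release_demo.txt"

echo "Great Pumpkin" > "$out_file"         # creates the file
echo "Kite Eating Tree" > "$out_file"      # overwrites the previous contents
echo "Funny-looking Kid" >> "$out_file"    # appends a second line

cat "$out_file"
```

Because > replaces the whole file each time, only the last overwrite and the appended line remain.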

The PATH variable

An important feature of the command line is the PATH variable.

I won’t get into all the details about the PATH variable, but having a basic understanding will likely prove useful if you ever have to troubleshoot problems in the future.

  • Have you ever wondered how the command-line knows what to do when you type a command like python or ls?
  • How does it know what program to run, especially on a computer that might have multiple installations of a program like Python?

The answer is that your system has a list of folders stored in an “environment variable” called PATH.

When you run a command (like python or ls), it goes through those folders in order until it finds an executable file with the name of the command you typed.

Then, when it finds that file, it executes that program and stops looking.

You can see the value of the PATH variable on your computer by typing

echo $PATH
/opt/homebrew/Caskroom/miniforge/base/bin:/opt/homebrew/Caskroom/miniforge/base/condabin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin:/opt/X11/bin:/opt/homebrew/bin:/opt/homebrew/sbin:/Users/stephaniehicks/Applications/quarto/bin:/usr/texbin:/Applications/RStudio.app/Contents/MacOS/quarto/bin:/Applications/RStudio.app/Contents/MacOS

That means that when I type python, my computer will first look in the folder /opt/homebrew/Caskroom/miniforge/base/bin to see if there is a file named python it can run. If it can’t find one there, it moves on to the next folder in the list.
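
The PATH value is easier to read with one folder per line. As a sketch, the block below swaps each colon for a newline using tr, a standard tool not otherwise covered in this lesson.

```bash
# Print each PATH entry on its own line, in the order the shell searches them
echo "$PATH" | tr ':' '\n'

# The first entry is the folder that is searched first
first_entry="$(echo "$PATH" | tr ':' '\n' | head -n 1)"
echo "$first_entry"
```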

Why is this useful

In a perfect world, you will never have to worry about your PATH variable, but there are a couple situations where knowing about your PATH variable can be helpful. In particular:

  1. If you downloaded a program, but you cannot run it from the command line, that probably means that its location is not in the PATH variable.
  2. If you find that when you type a command like python, the command line is not running the version of python you want it to run, that’s probably because a different version of python appears earlier in the PATH variable (since the command line will stop looking through these folders as soon as it finds a match).
Note

You can diagnose this problem by typing which COMMANDNAME, which will print the full path of the program that runs when you type COMMANDNAME.

which python
/opt/homebrew/Caskroom/miniforge/base/bin/python
which ls
/bin/ls
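
To illustrate how the order of folders in PATH decides which program runs, here is a sketch that adds a throwaway folder to the front of PATH. The script name hello_path_demo is made up, and the block uses command -v, the shell’s built-in way of reporting where a command is found, in place of which.

```bash
demo_dir="$(mktemp -d)"

# A tiny executable script with a made-up name
printf '#!/bin/sh\necho "hello from demo"\n' > "$demo_dir/hello_path_demo"
chmod +x "$demo_dir/hello_path_demo"

# Put the new folder at the front of PATH for this shell session only
PATH="$demo_dir:$PATH"

found_at="$(command -v hello_path_demo)"
echo "$found_at"     # full path inside the throwaway folder
hello_path_demo      # runs the script without typing its full path
```

The change to PATH lasts only for the current shell session; a new terminal starts with the original value.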

Sort files

The sort command will sort the contents of a text file, line by line. Using additional options, we can

  • sort a file in ascending/descending order
  • ignore case while sorting
  • use numeric order for sorting
  • preserve only unique lines while sorting
Tip

Using the sort command, the contents can be sorted numerically and alphabetically. By default (the exact order can vary with your system’s locale settings), the rules for sorting are:

  • lines starting with a number will appear before lines starting with a letter.
  • lines starting with a letter that appears earlier in the alphabet will appear before lines starting with a letter that appears later in the alphabet.
  • lines starting with a lowercase letter will appear before lines starting with the same letter in uppercase.

Using additional options, the rules for sorting can be changed. We list the options in the below table.

| Command | Description |
|---------|-------------|
| sort | Sort lines of text files |
| sort -r | Sort a file in descending order |
| sort --ignore-case | Ignore case while sorting |
| sort -n | Use numeric order for sorting |
| sort -u | Preserve only unique lines while sorting |

Here we sort the lines of combined_names.txt in descending alphabetical order

sort -r combined_names.txt
You Stupid Darkness
World-Famous Astronaut
Wooden Christmas Tree
Warm Puppy
Very, Very Secure Dishes
Very Secure Dishes
Unsuffered Consequences
Trick or Treat
Supposedly Educational
Spring Dance
Sock it to Me
Smooth Sidewalk
Single Candle
Sincere Pumpkin Patch
Short Summer
Security Blanket
Roasted Marshmallows
Pumpkin Helmet
Masked Marvel
Kite Eating Tree
Great Pumpkin
Good Sport
Gift-Getting Season
Funny-looking Kid
Full of Ingredients
Frisbee Sailing
Fire Safety
Easter Beagle
December Snowflakes
Bug in Your Hair
Another Canoe
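
The -n and -u options from the table are easiest to understand on numbers. Here is a minimal sketch with a made-up file in which the value 9 appears twice.

```bash
demo_dir="$(mktemp -d)"
printf '10\n9\n100\n9\n' > "$demo_dir/numbers.txt"

default_order="$(sort "$demo_dir/numbers.txt")"          # compares character by character
numeric_order="$(sort -n "$demo_dir/numbers.txt")"       # compares numeric values
unique_numeric="$(sort -n -u "$demo_dir/numbers.txt")"   # drops the repeated 9

echo "$default_order"
echo "$numeric_order"
echo "$unique_numeric"
```

Without -n, the line 100 sorts before 9 because the character 1 comes before the character 9.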

Count length of file

wc (word count) will print newline, word, and byte counts for file(s).

wc combined_names.txt
      31      75     564 combined_names.txt
wc -l combined_names.txt
      31 combined_names.txt
wc -w combined_names.txt
      75 combined_names.txt
wc -c combined_names.txt
     564 combined_names.txt

If more than one file is specified, wc will also print a line with the totals.

wc combined_names.txt package_names.txt
      31      75     564 combined_names.txt
     108     216    1498 package_names.txt
     139     291    2062 total
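
When you only need the number, for example to store it in a variable, one option is to feed the file to wc through standard input so that no file name is printed. This is a sketch with a made-up file that holds three of the release names used in this lesson.

```bash
demo_dir="$(mktemp -d)"
printf 'Great Pumpkin\nKite Eating Tree\nFunny-looking Kid\n' > "$demo_dir/names.txt"

# Reading from standard input makes wc print only the number.
# tr -d ' ' strips the padding spaces that some wc versions add.
line_count="$(wc -l < "$demo_dir/names.txt" | tr -d ' ')"
word_count="$(wc -w < "$demo_dir/names.txt" | tr -d ' ')"

echo "$line_count lines, $word_count words"
```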

Search and regular expressions

In this section, we will explore commands that will

  • search for a given string in a file
  • find files using names
  • search for binary executable files

| Command | Description |
|---------|-------------|
| grep | Search for a given string in a file |
| find | Find files using filenames |
| which | Search for binary executable files |

Search for a string in a file

The grep command is used for pattern matching. Along with additional options, it can be used to

  • match pattern in input text
  • ignore case
  • search recursively through directories
  • print filename and line number for each match
  • invert match for excluding specific strings

grep (short for global regular expression print) processes text line by line and prints any lines that match a specified pattern.

It is a powerful tool for matching a regular expression against text in a file, multiple files, or a stream of input.

| Command | Description |
|---------|-------------|
| grep | Matches pattern in input text |
| grep -i | Ignore case |
| grep -RI | Search recursively through directories, skipping binary files |
| grep -E | Use extended regular expression |
| grep -Hn | Print file name & corresponding line number for each match |
| grep -v | Invert match for excluding specific strings |

First, we will search for packages that include the letter “R” in a list of R package names (package_names.txt).

grep R package_names.txt
14. RJDBC
30. logNormReg
27. gLRTH
35. fermicatsR
42. OptimaRegion
61. PropScrRand
25. RPyGeo
47. SMARTp
24. SCRT
56. MARSS
85. edfReader
32. SPEDInstabR
98. SmallCountRounding

If you are familiar with regular expressions, you can do cool things like search for an “r” (upper or lower case, because of the -i option) followed by a white space, using the \s character class for white spaces.

grep -i 'r\s' release_names.txt
December Snowflakes
Easter Beagle
Trick or Treat
Bug in Your Hair
Another Canoe
Short Summer
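
A few more regular expression patterns, as a sketch: the ^ and $ anchors match the start and end of a line, and -E enables extended syntax such as | for alternatives. The file pkgs.txt is made up for this example and holds five package names taken from package_names.txt.

```bash
demo_dir="$(mktemp -d)"
printf 'RJDBC\nmlflow\nedfReader\nRPyGeo\npls\n' > "$demo_dir/pkgs.txt"

starts_with_r="$(grep '^R' "$demo_dir/pkgs.txt")"           # ^ anchors to the line start
ends_in_s_or_w="$(grep -E '(s|w)$' "$demo_dir/pkgs.txt")"   # -E allows | alternation
numbered="$(grep -n -i 'reader' "$demo_dir/pkgs.txt")"      # line number, any case

echo "$starts_with_r"
echo "$ends_in_s_or_w"
echo "$numbered"
```

Quoting the pattern keeps the shell from interpreting characters such as |, $ and parentheses before grep sees them.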

If there is more than one file to search, use the -H option to print the filename for each match.

grep -H F r_release.txt package_names.txt
r_release.txt:Funny-looking Kid
package_names.txt:69. FField
package_names.txt:78. sybilccFBA

And here are the file name and line number for each match

grep -Hn F r_release.txt package_names.txt
r_release.txt:1:Funny-looking Kid
package_names.txt:82:69. FField
package_names.txt:93:78. sybilccFBA

And here we invert the match with -v to exclude lines that contain the string “R” (in either case, because of -i)

grep -vi R r_release.txt package_names.txt
r_release.txt:Funny-looking Kid
package_names.txt:36. mlflow
package_names.txt:10. aweek
package_names.txt:31. BIGDAWG
package_names.txt:22. vqtl
package_names.txt:29. sspline
package_names.txt:39. mev
package_names.txt:66. SuppDists
package_names.txt:15. MIAmaxent
package_names.txt:31. BIGDAWG
package_names.txt:29. sspline
package_names.txt:60. Eagle
package_names.txt:83. WPKDE
package_names.txt:11. hdnom
package_names.txt:26. blink
package_names.txt:18. gazepath
package_names.txt:52. ClimMobTools
package_names.txt:44. expstudies
package_names.txt:65. mined
package_names.txt:81. mgcViz
package_names.txt:45. solitude
package_names.txt:9. pAnalysis
package_names.txt:65. mined
package_names.txt:94. ICAOD
package_names.txt:48. geoknife
package_names.txt:45. solitude
package_names.txt:67. tictactoe
package_names.txt:46. cbsem
package_names.txt:93. PathSelectMP
package_names.txt:96. poisbinom
package_names.txt:17. ASIP
package_names.txt:5. pls
package_names.txt:84. BIOMASS
package_names.txt:59. AdMit
package_names.txt:77. SetMethods
package_names.txt:53. MVB
package_names.txt:2. odk
package_names.txt:86. mongolite
package_names.txt:4. TIMP
package_names.txt:97. AnalyzeTS
package_names.txt:87. WGScan
package_names.txt:63. dagitty
package_names.txt:69. FField
package_names.txt:13. MaXact
package_names.txt:73. VineCopula
package_names.txt:7. bayesbio
package_names.txt:34. ibd
package_names.txt:8. MVTests
package_names.txt:19. mcmcabn
package_names.txt:43. accept
package_names.txt:78. sybilccFBA
package_names.txt:62. lue
package_names.txt:100. addhaz
package_names.txt:37. CombinePValue
package_names.txt:1. cyclocomp
package_names.txt:54. OxyBS
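
The table at the start of this section also lists find, which is not demonstrated above. Here is a minimal sketch that searches a throwaway directory tree by file name; the layout loosely mirrors the lecture folder but is created only for this example.

```bash
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/secret_directory" "$demo_dir/soccer_directory"
touch "$demo_dir/team_standings.csv" \
      "$demo_dir/secret_directory/team_standings.csv" \
      "$demo_dir/notes.txt"

cd "$demo_dir"
# Search the current directory (.) and all subdirectories for names matching *.csv.
# The quotes stop the shell from expanding the wildcard before find sees it.
csv_paths="$(find . -name '*.csv' | sort)"
echo "$csv_paths"
```

Unlike ls with a wildcard, find descends into every subdirectory, so it also reports the copy inside secret_directory.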

System info

In this section, we will explore commands that will allow us to

  • display information about the system
  • display file system disk space usage
  • exit the terminal
  • run commands as a superuser
  • shut down the system

| Command | Description |
|---------|-------------|
| uname | Display important information about the system |
| df | Display file system disk space usage |
| exit | Exit the terminal |
| sudo | Run command as superuser |
| shutdown | Shut down the system |

For example, we can display the file system disk usage

df
Filesystem     512-blocks      Used  Available Capacity iused       ifree %iused  Mounted on
/dev/disk3s1s1 3896910480  17251896 3303851808     1%  348619  4292631783    0%   /
devfs                 396       396          0   100%     686           0  100%   /dev
/dev/disk3s6   3896910480        40 3303851808     1%       0 16519259040    0%   /System/Volumes/VM
/dev/disk3s2   3896910480   9161032 3303851808     1%     905 16519259040    0%   /System/Volumes/Preboot
/dev/disk3s4   3896910480     22024 3303851808     1%      45 16519259040    0%   /System/Volumes/Update
/dev/disk1s2      1024000     12328     985144     2%       1     4925720    0%   /System/Volumes/xarts
/dev/disk1s1      1024000     12504     985144     2%      27     4925720    0%   /System/Volumes/iSCPreboot
/dev/disk1s3      1024000      4304     985144     1%      52     4925720    0%   /System/Volumes/Hardware
/dev/disk3s5   3896910480 564544896 3303851808    15% 1585826 16519259040    0%   /System/Volumes/Data
map auto_home           0         0          0   100%       0           0  100%   /System/Volumes/Data/home
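
The default df output above is reported in 512-byte blocks, which is hard to read. As a minimal sketch (assuming a macOS or Linux shell, where both commands accept these flags), the following shows uname and a more readable form of df:

```shell
# Kernel name, release, and hardware architecture of the machine.
uname -a

# Disk usage in human-readable units (K, M, G) instead of 512-byte blocks.
df -h

# Restrict the report to the file system that holds the current directory.
df -h .
```

The exact columns and values will differ from machine to machine, so the output is not reproduced here.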

R in the shell

In this section, we will learn to execute R commands and scripts in the command line using:

  • R -e
  • Rscript -e
  • R CMD BATCH

The -e option allows us to specify R expression(s).

R -e will launch R and then execute the code specified within quotes.

  • Use a semicolon to separate multiple expressions, as shown below.
  • You will be able to run the commands below only if you can launch R from the command line.
  • Windows users need to ensure that R is added to the PATH environment variable.
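
One way to check whether R can be launched from the command line is to ask the shell where it finds the executables. This is a small sketch, assuming a POSIX-style shell (for example bash or zsh); the loop variable tool is just an illustrative name:

```shell
# command -v prints the path of an executable if the shell can find it on PATH.
for tool in R Rscript; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool found at: $(command -v "$tool")"
  else
    echo "$tool not found on PATH"
  fi
done
```

If either tool is reported as not found, the R installation directory likely needs to be added to PATH before the commands in this section will work.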
R -e "head(mtcars); tail(mtcars)"

R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: aarch64-apple-darwin21.6.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]

> head(mtcars); tail(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
                mpg cyl  disp  hp drat    wt qsec vs am gear carb
Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.6  1  1    4    2
> 
> 

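Rscript -e will run the code without starting an interactive R session, so only the output is printed (no startup banner and no echoed commands).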

Rscript -e "head(mtcars); tail(mtcars)"
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
                mpg cyl  disp  hp drat    wt qsec vs am gear carb
Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.7  0  1    5    2
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.9  1  1    5    2
Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.5  0  1    5    4
Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.5  0  1    5    6
Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.6  0  1    5    8
Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.6  1  1    4    2

We can also use Rscript to execute an R script. In the example below, we execute the code in the file analysis.R (which just prints the head of mtcars).

cat analysis.R
head(mtcars)
Rscript analysis.R
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
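
The third option listed at the start of this section, R CMD BATCH, runs a script non-interactively and writes the echoed commands and their output to a file instead of the screen. The following is a minimal sketch, assuming R is on the PATH and that we are working in a scratch directory; it recreates the one-line analysis.R from above so the example is self-contained. By default the output file takes the script name with the extension .Rout.

```shell
# Recreate the one-line script used above (assumption: a scratch directory).
printf 'head(mtcars)\n' > analysis.R

if command -v R >/dev/null 2>&1; then
  # Run the script in batch mode; --no-save skips writing the workspace on exit.
  # Output is written to analysis.Rout by default.
  R CMD BATCH --no-save analysis.R
  # Inspect the echoed commands and their output.
  cat analysis.Rout
else
  echo "R was not found on the PATH" >&2
fi
```

Writing to a file is convenient for long-running jobs, because the log can be inspected after the run has finished.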

Post-lecture materials

Summary

  • The shell is a text-based application for viewing, handling, and manipulating files
  • It is often referred to, loosely, by the following names
    • CLI (Command Line Interface)
    • Terminal
    • Bash (Bourne Again Shell)
  • Use Rscript -e or R -e to execute R expressions, and Rscript to run R scripts, from the command line
  • RStudio includes a Terminal (from version 1.1.383)
  • Execute commands from a shell script in RStudio using Ctrl + Enter
  • RMarkdown and Quarto support bash, sh and awk

Final Questions

Here are some post-lecture questions to help you think about the material discussed.

Questions
  1. Explore the help files of tar and gzip commands for compressing files.
  2. Move around the computer, get used to moving in and out of directories, and see how different file types appear in the Unix shell. Be sure to use the pwd and cd commands, and the different flags for the ls command.
  3. Practice using “Tab for Auto-complete” in the shell to autocomplete commands or file names.
  4. Practice your command line knowledge with Command Challenge.

Additional Resources