Working with Data in Python
To read or write to a file in the current working directory, you use the function open()
to “open” the file.
fin = open('myfile.txt')
With the mode w
as a second parameters, you can write to the file.
fout = open('myoutput.txt', 'w')
Note: If the file already exists, using open()
will clear out everything in the file before writing to it. If the file doesn’t exist, the function will create a new file.
Reading and writing files
As you have seen, everything is an object in Python. When you define files such as fin
and fout
above, these are file objects have an associated set of methods.
Reading files
fin = open('myfile.txt') fin
Read in files one line at a time
If you want to read in a file one line at a time, use the methods function readline()
. The syntax is the name of the file, dot (or period) and readline()
. It will read in the characters from a file until it reaches a new line and returns the result as a string. To read in the next line, run the line again.
fin = open('myfile.txt') fin.readline()
You can also combine these functions
open('myfile.txt').readline()
Current line position in file
The method tell()
reports the current position of the open file. When you open a file, the current position is the beginning of the file (position 0).
fin.tell()
Read in file at a specific position
To start reading the file at a different position than the current position, use the seek()
method. The second argument describes how to interpret the first argument (number of bytes) .
Position | Meaning |
---|---|
0 |
move to absolute position (from start of file) |
1 |
move to a relative position (from current position) |
2 |
move to a position relative to the end of the file. |
fin.seek(5, 0)
If you want read in each line of a file and print each line to the screen, you can use a for
loop:
fin = open('myfile.txt') for line in fin: word = line.strip() print word
Keyboard input
To accept input from the keyboard, use the keyword raw_input
(in Python 2) or input
(in Python 3).
raw_input("How tall are you?")
Writing to files
To write to a file in Python, the argument must be a string. If the argument is not a string, use the function str()
to convert the argument to a string.
The methods functions write()
and close()
will write data to a file and close the file when finished writing to it. The syntax is the name of the file, a dot (or period), and the function write()
.
fin.close() fin.closed
The method closed
will test if the file has been closed returning a True or False.
Writing to files one line at a time
To write to a file one line at a time, just recall write
again and again.
fout.write("This is my first line.") fout.write("This is my second line.") fout.close()
The last line tell Python you are done writing to the file, so it knows to close the file. Once a file has been closed
with and as
There is a short cut to opening and closing files using the with
and as
keywords. The syntax is
with open("file.txt", "w") as textfile: textfile.write("Success!") # Read or write to the file
Paths and directories
To print the current working directory, use the function getcwd()
from the module os
.
import os cwd = os.getcwd() print cwd
To find the absolute path to a file use path.abspath()
:
os.path.abspath('myfile.text')
To check if a file exists, use path.exists()
:
os.path.exists('myfile.text')
To split the file and return a tuple containing the path and file name, use path.split()
. To split the file and return a tuple containing the file name and the file extension, use path.splitext()
.
(pathdir, nameofile) = os.path.split('~/pathtofile/myfile.txt') (nameofile, extension) = os.path.splitext('myfile.text')
To join a path and the name of a file, use path.join()
os.path.join("nameofnewpath", "myfile.txt")
To check if the argument is a file or a directory, use path.isfile()
and path.isdir()
:
os.path.isdir('myfile.text') os.path.isfile('myfile.text')
To list the files in a directory use path.listdir()
:
os.path.listdir(cwd)
To loop through the files in a directory, combine a for
loop with os.listdir()
:
[os.path.join("~/nameofnewpath", elem) for elem in os.listdir(cwd)]
Errors
- If a module doesn’t exist and you try to import it, Python will raise an
ImportError
exception. - If a file doesn’t exist, if you don’t have permission to access a file or if you try to open a directory to read in, you will get an
IOError
exception.- Tip: use functions like
os.path.exists()
andos.path.isfile()
to test before reading in a file - Tip: use the
try
andexcept
statements (similar to anif
statement).
- Tip: use functions like
try: fin = open('myfile.txt') except IOError: print "This file does not exist." print "File opened."
The line except IOError
catches when the file does not exist and executes your own code.
The glob module and listing files in directories
The glob module is useful when listing directories, but you do not have the name of the files (just a wildcard e.g. *.txt). The glob()
function will match all the *.txt files in a directory and return the them as a list
import glob glob.glob('*.txt')
Databases
A database is a specific type of file with a purpose of storing data. For example, if you have created a dictionary with keys and values, you can create and update the dictionary database file.
anydbm, pickle and shelve
The anydbm
module is interfaces and updates database files. The two methods functions to open and close the databases anydbm.open()
and anydbmclose()
. To work with this module, the database must have keys and values as strings!
To open a database (and create it if it doesn’t already exist):
import anydbm db = anydbm.open('mydatabase.db', 'c')
where the ‘c’ means to create the database if it doesn’t exist. While interfacing with the database, if you add a key-value pair (i.e. item), then the module anydbm
will update the database file. After you are finished with the database, close it with the methods function close()
.
db.close()
Pickling a database
The module anydbm
ONLY works with databases that have keys and values as strings. If your keys or values are not strings, you can use the pickle
module to translate any object into a string which can be then stored in a database using the pickle.dumps()
function. When you want to access the database, the module will translate the string back into the original objects using the pickle.loads()
function.
import pickle
The methods function pickle.dumps()
takes in any object and returns the string representation and pickle.loads()
will translate the string object back to the original representation:
x = tuple('yolo') xstring = pickle.dumps(x) xoriginal = pickle.loads(xstring) print xoriginal
Note: pickling and unpickling the object is the same copying the object. It will have the same value, but it not the same object.
Shelve module
This method of “pickling” the objects from the pickle
module combined with the anydbm
module has been implemented in the shelve
module. To open and close files using this module use shelve.open()
and shelve.close()
which uses the pickle
module to open and store the objects.
import shelve shelve.open('mydatabase.db')
To interact with this database, use the shelve()
function
newVar = shelve['mykey'] shelve['mykey'] = newVar shelve.close('mydatabase.db')
glob and listing files in directories
The glob module is useful when listing directories, but you do not have the name of the files (just a wildcard e.g. *.txt). The glob()
function will match all the *.txt files in a directory and return the them as a list
import glob glob.glob('*.txt')