Parsing CSV Files with Python's DictReader

I had an interview today (spoiler: I didn't get an offer), and one of the rounds involved refactoring some poorly written Python code.

In this code was a function that parsed a CSV file and returned all the rows with columns matching some arbitrary input values. The code used a csv.reader object to parse the input file and looped through it several times to search the columns.

I was allowed to look up documentation and use StackOverflow during this interview, which led me to discover the csv.DictReader class. This class was key to my refactor of the code.

DictReader

The DictReader class creates a reader object that yields each row of the CSV as a dictionary (an OrderedDict in Python 3.6 and 3.7, a regular dict from Python 3.8 onward). It works by reading in the first line of the CSV and using each comma-separated value in this line as a dictionary key. The columns in each subsequent row then behave like dictionary values and can be accessed with the appropriate key.

If the first row of your CSV does not contain your column names, you can pass a fieldnames parameter into the DictReader's constructor to assign the dictionary keys manually.
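As a minimal sketch of the fieldnames parameter, here is a headerless version of the data. The inline sample and the column list are made up for illustration, and io.StringIO stands in for a real file so the snippet runs on its own.

```python
import csv
import io

# Hypothetical headerless data: video rows with no header line on top.
headerless = io.StringIO(
    "200019,Drift Day,205,10,6\n"
    "200088,Block Party,751,0,10\n"
)

# Column names supplied by hand, in the same order as the columns in the file.
columns = ["id", "title", "total_plays", "total_comments", "total_likes"]

reader = csv.DictReader(headerless, fieldnames=columns)
rows = list(reader)

# With fieldnames given, the first line is treated as data, not as a header.
print(rows[0]["title"])
print(len(rows))
```

Note that if you pass fieldnames for a file that does have a header row, that header row comes back as the first "data" row, so you would need to skip it yourself.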

Example

Let's say you have a CSV file called videos.csv with some basic information about videos. It looks something like this:

id,title,total_plays,total_comments,total_likes
200019,Drift Day,205,10,6
200088,Block Party,751,0,10
200108,Charlotte Gainsbourg,1964,2,23
200111,5 Memorial Weekend Vignettes,689,24,21
200197,Best Day Ever,119,3,8

You'll notice the first row in this file is not a video but the column headers for the rest of the file. This is typical of most CSV files. In order to read this file in as a dictionary, you would use code similar to the following:

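import csv

# newline='' lets the csv module handle line endings itself, as the csv docs recommend
with open('videos.csv', newline='') as csvfile:
    # The first row of the file supplies the dictionary keys
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row['id'])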

Running the code above will loop through and print the id field of each row of the file. You can replace 'id' with any of the other column headers such as 'title' or 'total_plays' for similar results.
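Column names also make it easy to filter rows. Here is a rough sketch of the kind of lookup the interview function performed, rebuilt on top of DictReader. The original interview code isn't shown in this post, so the name rows_matching, its keyword-argument signature, and the inline sample data are all assumptions made for illustration.

```python
import csv
import io

# A few rows in the same shape as videos.csv, inlined so the snippet is self-contained.
SAMPLE = """id,title,total_plays,total_comments,total_likes
200019,Drift Day,205,10,6
200088,Block Party,751,0,10
200197,Best Day Ever,119,3,8
"""


def rows_matching(csvfile, **criteria):
    """Return every row whose named columns equal the given values.

    Hypothetical helper: DictReader yields all values as strings,
    so each expected value is converted with str() before comparing.
    """
    reader = csv.DictReader(csvfile)
    return [
        row
        for row in reader
        if all(row[column] == str(value) for column, value in criteria.items())
    ]


# One pass over the file, no positional indexes to remember.
matches = rows_matching(io.StringIO(SAMPLE), total_comments=0)
print([row["title"] for row in matches])
```

Because every value comes back as a string, numeric columns such as total_plays need an explicit int() before you do arithmetic on them. This sketch also raises a KeyError if you ask for a column the file doesn't have, which may or may not be what you want.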

Conclusions

So why is this useful? Well, I personally find using strings as keys much more readable than working with indexes. print(row['total_likes']) tells you a lot more about what is being displayed than print(row[4]), especially when you don't have the CSV file open in front of you.
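To make the comparison concrete, here is a small side-by-side sketch of the two readers pulling the same column. The inline sample data and the variable names are made up for this example.

```python
import csv
import io

# Two rows in the same shape as videos.csv, inlined for a runnable example.
SAMPLE = (
    "id,title,total_plays,total_comments,total_likes\n"
    "200019,Drift Day,205,10,6\n"
    "200088,Block Party,751,0,10\n"
)

# csv.reader yields plain lists, so the header row has to be skipped by hand
# and the column is picked out by position.
plain = csv.reader(io.StringIO(SAMPLE))
next(plain)
likes_by_index = [row[4] for row in plain]

# csv.DictReader consumes the header itself and lets you ask for the column by name.
keyed = csv.DictReader(io.StringIO(SAMPLE))
likes_by_name = [row["total_likes"] for row in keyed]

print(likes_by_index == likes_by_name)
```

Both approaches return the same values, but the second keeps working if someone reorders the columns in the file.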

I hope this post was able to show you something new about Python. As always, thanks for reading and feel free to follow me on Twitter @brodan_ to keep up with future blog posts.