Lecture 30

The CSV module

MCS 260 Fall 2020
David Dumas

Reminders

  • Project 3 is due Friday
  • Quiz 10 due Today
  • Worksheet 11 available
  • Nov 3: No discussions
  • Nov 5: Discussion converted to TA office hours
  • CSV

    CSV is a format for storing tabular data in a text file. It is often used to export data from spreadsheets.

    Basic format: Each line contains some values, separated by commas. There is no universally accepted way to indicate types (e.g. string vs int).

    
            fullname,midterm,final,hwk_avg
            Maureen Singh,82.0,91.5,94.0
            Yousuf Shaw,78.0,89.0,96.5
        

    Often, the first line contains column headers.

    Reading CSV

    The function csv.reader(f) takes a file object f and returns an iterable that yields a tuple for each row.

    csv.reader will also accept an iterable of strings instead of a file object.

    Common annoyance: csv.reader forces you to remember the number of the column you want.

    Note: Should always pass newline="" to open() when opening to read/write CSV.

    Reading CSV (cont'd)

    If the CSV file has has headers, a better option is csv.DictReader(f) which yields rows as dictionaries, using column headers as keys.

    csv.DictReader does not return the header row.

    Writing CSV

    csv.writer(f) takes a file object and returns a writer object, which has a useful method:

    • .writerow(L) — Write the items in iterable L to a row in the file. Must be in the proper order!

    Note: Should always pass newline="" to open() when opening to read/write CSV.

    Writing CSV (cont'd)

    csv.DictWriter(f,fieldnames=L) specifies an iterable L of field names, and returns a writer object that expects rows as dictionaries. Useful methods:

    • .writeheader() — Write the field names to a header row.
    • .writerow(d) — Write the values from dictionary d to a line of the output file (but only the ones corresponding to keys that are field names).

    Other features

    The CSV reader and writer functions can use a separator other than a comma, e.g. specify delimiter="\t" to read or write tab separated values (TSV).

    Some CSV files put values in quotes so that the separator character can appear in the value, e.g.

    
    fullname,occupation
    Octavia Spencer,"actor,author"
    "Bond, James Bond","spy"
    

    The csv module supports this convention.

    References

    Revision history

    • 2020-11-03 Add note about newline="" when opening to read/write CSV.
    • 2020-11-02 Add data portal links
    • 2020-11-01 Initial publication