Lecture 23

CSV and JSON

MCS 275 Spring 2021
David Dumas

Lecture 23: CSV and JSON

Course bulletins:

  • Worksheet solutions coming soon.
  • Project 3 pitch in Monday's lecture.

Notebook

The (small) notebook of sample code from this lecture is here.

Install Pillow

Next week: Manipulating images with the Python package Pillow. To prepare, please

python3 -m pip install pillow

Or substitute the correct interpreter name for your platform.

If you have trouble, check the install instructions and let us know if you don't find a solution there.

Modules

Python has a number of built-in modules to support reading and writing special file formats. We'll cover two of these today:

  • csv for Comma Separated Value files
  • json for Javascript Object Notation files

CSV

Text-based format for tabular data. Fundamentally based on rows and columns.

Used for exchanging data with spreadsheet and database programs.

Untyped. Up to reader to figure out string/float/int/etc.


        District,Fin-Sub,Chrgbl Fin No,PO Name,Unit Name,Property Address,County,City,ST,ZIP Code,Property Status,Ownership,FDB ID (All),AMS Locale Key (All),FDB Facility Type (All),FDB Facility Subtype (All),Building Ownership Description,Land Desc,Space Certified Indicator,Bldg Occu Date,Int Sq Ft
        Greater Boston,431120-G01,431120,BARRINGTON,MAIN OFFICE,200 MIDDLE HWY,BRISTOL,BARRINGTON,RI,02806-9998,Active,Owned,1354095,V25837,Post Office,Main Post Office,"USPS Building, Transferred","USPS Land, Transferred",Yes,6/1/1974,"6,769"
        Greater Boston,432360-G01,432360,COVENTRY,MAIN OFFICE,1550 NOOSENECK HILL RD,KENT,COVENTRY,RI,02816-9998,Active,Owned,1359450,V25859,Post Office,Administrative Post Office (APO),"USPS Building, Const. by USPS","USPS Land, Not Prev. Leased",Yes,7/1/1990,"13,764"
        Greater Boston,434480-G01,434480,HARRISVILLE,MAIN OFFICE,131 HARRISVILLE MAIN ST,PROVIDENCE,HARRISVILLE,RI,02830-9998,Active,Owned,1366338,V25891,Post Office,Administrative Post Office (APO),"USPS Building, Transferred","USPS Land, Transferred",No,12/1/1951,"1,413"
        Greater Boston,436020-G01,436020,NEWPORT,MAIN OFFICE,320 THAMES ST STE 1,NEWPORT,NEWPORT,RI,02840-9998,Active,Owned,1375017,V25919,Post Office,Main Post Office,"USPS Building, Transferred","USPS Land, Transferred",Yes,8/1/1917,"35,637"
        Greater Boston,436090-G02,436090,NORTH KINGSTOWN,MAIN OFFICE,7715 POST RD,WASHINGTON,NORTH KINGSTOWN,RI,02852-9998,Active,Owned,1375354,V25921,Post Office,Administrative Post Office (APO),"USPS Building, Not Prev. Leased","USPS Land, Not Prev. Leased",Yes,10/1/1997,"32,629"
        Greater Boston,436580-G02,436580,PASCOAG,MAIN OFFICE,35 BRIDGE WAY,PROVIDENCE,PASCOAG,RI,02859-3132,Active,Owned,1376912,V25928,Post Office,Main Post Office,"USPS Building, Transferred","USPS Land, Transferred",No,6/1/1953,"3,290"
        Greater Boston,436723-G01,436723,PAWTUCKET,CUMBERLAND BR.,2055 DIAMOND HILL RD,PROVIDENCE,CUMBERLAND,RI,02864-9998,Active,Owned,1434572,V25862,Post Office,Branch,"USPS Building, Const. by USPS","USPS Land, Not Prev. Leased",Yes,3/1/1985,"9,452"
        Greater Boston,436720-G03,436720,PAWTUCKET,DARLINGTON,30 MONTICELLO RD,PROVIDENCE,PAWTUCKET,RI,02861-3810,Active,Owned,1360314,V25864,Post Office,Station,"USPS Building, Not Prev. Leased","USPS Land, Not Prev. Leased",Yes,,"16,137"
        Greater Boston,436720-G01,436720,PAWTUCKET,MAIN OFFICE,40 MONTGOMERY ST,PROVIDENCE,PAWTUCKET,RI,02860-9998,Active,Owned,1377003,V25929,Post Office,Main Post Office,"USPS Building, Transferred","USPS Land, Transferred",Yes,9/1/1933,"49,535"
        Greater Boston,436720-G01,436720,PAWTUCKET,MAIN OFFICE,40 MONTGOMERY ST,PROVIDENCE,PAWTUCKET,RI,02860-9998,Active,Owned,1434718,V28273,Post Office,Finance Station - No Delivery,"USPS Building, Transferred","USPS Land, Transferred",Yes,9/1/1933,"49,535"
        Greater Boston,436860-G01,436860,PORTSMOUTH,MAIN OFFICE,95 CHASE RD,NEWPORT,PORTSMOUTH,RI,02871-9998,Active,Owned,1378183,V25933,Post Office,Main Post Office,"USPS Building, Const. by USPS","USPS Land, Not Prev. Leased",Yes,8/1/1990,"12,048"
        Greater Boston,437140-G07,437140,PROVIDENCE,CORLISS PK. STA & VMF,55 CORLISS ST RM 100,PROVIDENCE,PROVIDENCE,RI,02904-9722,Active,Owned,1434581,V25907,Post Office,Carrier Annex (ANX),Unknown,Unknown,Yes,7/1/1980,"39,677"
        Greater Boston,437140-G07,437140,PROVIDENCE,CORLISS PK. STA & VMF,55 CORLISS ST RM 100,PROVIDENCE,PROVIDENCE,RI,02904-9722,Active,Owned,1434685,V27719,Vehicle Maintenance,Vehicle Maintenance Facility (VMF),Unknown,Unknown,Yes,7/1/1980,"39,677"
        Greater Boston,437178-G01,437178,PROVIDENCE,EAST PROVIDENCE BR.,17 GROVE AVE,PROVIDENCE,EAST PROVIDENCE,RI,02914-4506,Active,Owned,1434641,V27459,Post Office,Finance Branch,"USPS Building, Const. by USPS","USPS Land, Not Prev. Leased",Yes,6/1/1971,"6,085"
        Greater Boston,437166-G01,437166,PROVIDENCE,JOHNSTON BRANCH,1530 ATWOOD AVE,PROVIDENCE,JOHNSTON,RI,02919-9998,Active,Owned,1434577,V25901,Post Office,Branch,"USPS Building, Not Prev. Leased","USPS Land, Not Prev. Leased",Yes,,"17,034"
        Greater Boston,437170-G01,437170,PROVIDENCE,OLNEYVILLE STA,100 HARTFORD AVE,PROVIDENCE,PROVIDENCE,RI,02909-9998,Active,Owned,1376101,V25927,Post Office,Station,"USPS Building, Transferred","USPS Land, Transferred",Yes,6/1/1967,"10,733"
        Greater Boston,437141-G08,437141,PROVIDENCE,P&DC,24 CORLISS ST RM 100,PROVIDENCE,PROVIDENCE,RI,02904-2477,Active,Owned,1434586,V25936,Post Office,Main Post Office,"USPS Building, Not Prev. Leased","USPS Land, Not Prev. Leased",No,1/1/1961,"384,113"
        Greater Boston,437141-G08,437141,PROVIDENCE,P&DC,24 CORLISS ST RM 100,PROVIDENCE,PROVIDENCE,RI,02904-2477,Active,Owned,1441076,V25935,Mail Processing,Processing and Distribution Center/Facility (PDC/PDF),"USPS Building, Not Prev. Leased","USPS Land, Not Prev. Leased",No,1/1/1961,"384,113"
        Greater Boston,437141-G08,437141,PROVIDENCE,P&DC,24 CORLISS ST RM 100,PROVIDENCE,PROVIDENCE,RI,02904-2477,Active,Owned,1444535,8938,Administrative Office,Postal Inspection Service Field Division (USPIS),"USPS Building, Not Prev. Leased","USPS Land, Not Prev. Leased",No,1/1/1961,"384,113"
        Greater Boston,438260-G07,438260,WAKEFIELD,MAIN OFFICE,551 KINGSTOWN RD,WASHINGTON,WAKEFIELD,RI,02879-9998,Active,Owned,1386177,V25967,Post Office,Main Post Office,"USPS Building, Not Prev. Leased","USPS Land, Not Prev. Leased",Yes,11/1/1997,"31,358"
        Greater Boston,438260-G01,438260,WAKEFIELD,NARRAGANSETT BR.,15 MEMORIAL SQ,WASHINGTON,NARRAGANSETT,RI,02882-3391,Active,Owned,1434584,V25916,Post Office,Finance Branch,"USPS Building, Transferred","USPS Land, Transferred",Yes,6/1/1916,"6,408"
        Greater Boston,438540-G01,438540,WARREN,MAIN OFFICE,53 CHILD ST,BRISTOL,WARREN,RI,02885-9998,Active,Owned,1386428,V25968,Post Office,Main Post Office,"USPS Building, Transferred","USPS Land, Transferred",Yes,8/1/1931,"9,484"
        Greater Boston,438680-G01,438680,WARWICK,APPONAUG STATION,3205 POST RD,KENT,WARWICK,RI,02886-7157,Active,Owned,1353208,V27496,Post Office,Finance Station,"USPS Building, Transferred","USPS Land, Transferred",Yes,10/1/1941,"6,097"
        Greater Boston,438680-G03,438680,WARWICK,POST OFFICE ANNEX-STORAGE,3205 POST RD REAR,KENT,WARWICK,RI,02886-7141,Active,Owned,,,,,"USPS Building, Prev. Leased","USPS Land, Prev. Leased",Yes,10/1/1957,"10,131"
        Greater Boston,439380-G02,439380,WEST WARWICK,MAIN OFFICE,100 WASHINGTON ST,KENT,WEST WARWICK,RI,02893-9998,Active,Owned,1386937,V25977,Post Office,Main Post Office,"USPS Building, Const. by USPS","USPS Land, Not Prev. Leased",Yes,,"17,482"
        Greater Boston,439100-G01,439100,WESTERLY,HIGH STREET STATION,5 HIGH ST,WASHINGTON,WESTERLY,RI,02891-1878,Active,Owned,1366914,V25892,Post Office,Station,"USPS Building, Transferred","USPS Land, Transferred",No,6/1/1914,"8,448"
        Greater Boston,439100-G02,439100,WESTERLY,MAIN OFFICE,110 TOM HARVEY RD,WASHINGTON,WESTERLY,RI,02891-9998,Active,Owned,1387262,V25978,Post Office,Administrative Post Office (APO),"USPS Building, Const. by USPS","USPS Land, Not Prev. Leased",Yes,12/1/1988,"14,236"
        Greater Boston,439800-G02,439800,WOONSOCKET,MAIN OFFICE,127 SOCIAL ST STE 1,PROVIDENCE,WOONSOCKET,RI,02895-9998,Active,Owned,1388287,V25983,Post Office,Administrative Post Office (APO),"USPS Building, Const. by USPS","USPS Land, Not Prev. Leased",No,3/1/1976,"34,476"
        Greater Boston,439800-G02,439800,WOONSOCKET,MAIN OFFICE,127 SOCIAL ST STE 1,PROVIDENCE,WOONSOCKET,RI,02895-9998,Active,Owned,1448921,11464,Administrative Office,Office of the Inspector General Office (OIG),"USPS Building, Const. by USPS","USPS Land, Not Prev. Leased",No,3/1/1976,"34,476"
    

Source: USPS

Key CSV features

  • May or may not have header row
  • Quotes used around field values that may contain commas.

Reading CSV


        rdr = csv.reader(fobj)
        for row in rdr:  # reader objects are iterable
            print("First column of this row:",row[0])
            print("Second column of this row:",row[1])
    

Note: Should always pass newline="" to open() when opening to read/write CSV.

Reading CSV


        rdr = csv.DictReader(fobj) # file MUST have header row
        for row in rdr:  # rows will be dicts
            print(row["name"])
            print(row["project2_score"])
    

Note: Should always pass newline="" to open() when opening to read/write CSV.

Writing CSV


            w = csv.writer(fobj)
            # Write a header row
            w.writerow(["course","instructor"])
            # Write data rows
            w.writerow(["MCS 260","Dumas"])
            w.writerow(["MCS 275","Dumas"])
        

Disadvantage: Easy to get the order of columns wrong, or make index mistakes.

Writing CSV

        
            # Set the column order
            w = csv.DictWriter(fobj, fieldnames=["course","instructor"])
            # Write the header row
            w.writeheader()
            # Write data rows
            w.writerow({"instructor":"Dumas","course":"MATH 445"})
            w.writerow({"course":"MCS 481"})
        

More verbose code, but easier to read and maintain. Data order need not match column order. Missing keys handled gracefully.

JSON

JSON stands for JavaScript object notation. It is a text-based format for hierarchical data. Has types:

  • string — must use double quotes.
  • number — float, int, other? Up to reader.
  • boolean — lower case names true, false.
  • null — like Python None.
  • array — like Python list. Brackets and commas.
  • object — like Python dict. Curly braces, colons, and commas. Keys must be strings.

    {
        "date": "2020-08-31T16:29:04.122000",
        "id": "LANDSAT/LC08/C01/T1_SR/LC08_022031_20200831",
        "resource": {
          "dataset": "LANDSAT/LC08/C01/T1_SR",
          "planet": "earth"
        },
        "service_version": "v5000",
        "url": "https://earthengine.googleapis.com/v1alpha/projects/earthengine-legacy/thumbnails/20ad17a51405caa2fd00e6e5ca951b97-3265d85bb795aad645d4e937d4ac556f:getPixels"
    }

Source: NASA

Key JSON features

  • Does not require data to be tabular.
  • Has excellent standardization and cross-language support.
  • Most HTTP APIs (e.g. data portals) return JSON.
  • Semi-readable for humans.

Reading JSON


        val = json.load(fobj) # read from file
        val = json.loads(s)   # read from string
    

The object returned can be hard to use if you don't have documentation for the layout of the file. But since it has keys and values, it is at least explorable.

Writing JSON


        val = { "date": "yesterday", 
                "primes": [2,3,5,7,11],
                "awesome": True
              }
        json.dump(val,fobj) # save exactly one object to file
        s = json.dumps(val) # make JSON string
    

Conversion table for Python → JSON

  • dictobject
  • list or tuplearray
  • int or floatnumber
  • boolboolean
  • Nonenull

References

Revision history

  • 2021-03-11 Notebook link
  • 2021-03-05 Initial publication