A document from MCS 275 Spring 2021, instructor David Dumas. You can also get the notebook file.

Quiz 9 Solutions

MCS 275 Spring 2021 - David Dumas

Instructions:

Deadline

This quiz must be submitted in Gradescope by 12:00pm CST on Tuesday, March 16, 2021.

Topic

The course topics corresponding to this quiz are tree traversals, set, defaultdict, CSV, and JSON.

Resources you are allowed to consult

Quizzes are INDIVIDUAL, closed book, and only allow access to specified resources. For this quiz you can access:

Point distribution

There are two problems on this quiz, numbered 2 and 3. The point breakdown is:

Points Item
3 autograder
4 problem 2
4 problem 3
11 total

Problem 2: CSV missing value detector

Write a program that takes one command line argument, which is the filename of an existing CSV file that has a header row. The program should read the CSV file and determine whether any of the values in its rows are equal to the empty string. For each such "missing value" found, the program should print a message in this format:

Missing value in column Quiz9Score on line 58.

The line numbers should be 1-based, so line 1 refers to the header row (which you can assume does not have any missing values), line 2 refers to the first row of actual data, etc.

For example, if the input CSV file is

Moon,Planet,YearDiscovered
Phobos,Mars,1877
Callisto,Jupiter,1610
,Saturn,2019
Eros,"",

then the expected output is

Missing value in column Moon on line 4.
Missing value in column Planet on line 5.
Missing value in column YearDiscovered on line 5.

Upload this program as quiz9prob2.py.

In [ ]:
# MCS 275 Quiz 9 Prob 2
# Jennifer Vaccaro
# I completed this work myself, in accordance with the rules in the syllabus.
"""Program that finds the missing fields in a csv defined as a command line argument"""

import sys
import csv

# Open the file and create a csv DictReader object
fname = sys.argv[1]
with open(fname, "rt", newline="") as f:
    rdr = csv.DictReader(f)
    fields = rdr.fieldnames

    # Iterate through the data rows in the csv file
    for i,row in enumerate(rdr):
        for field in fields:
            # If the value for a field is missing, then print a message
            if row[field] == "":
                print("Missing data in row {} and column {}".format(i+2,field)) 
                # i+2 gives the correct row number

Problem 3: Indecomposables

Suppose S is a set of strings. Let's say that a string in S is decomposable if it can be written as A+B where A and B are both in S.

For example, in the set {"racecar","carrot","car","race","rot","cog","c","a","r"}, the string "racecar" is decomposabe because it can be written as "race"+"car" and "carrot" is decomposable because it can be written as "car"+"rot", but none of the other elements of the set are decomposable.

Write a function indecomposables(S) which accepts one argument, a set object S whose elements are strings, and which returns the set of all elements of S that are not decomposable.

Use only sets in your function; do not use lists, tuples, or dictionaries. Also, calling the function should not modify S.

Put this function in a file quiz9prob3.py and upload it.

Solution 1: Cutting

Try all the ways of cutting an element into pieces. If any of them give A,B that are already in S, we know that element is decomposable. Keep elements that are not found to be decomposable.

In [9]:
#MCS 275 Quiz 9 Problem 3
# Jennifer Vaccaro
# I completed this work myself, in accordance with the rules in the syllabus.

def is_decomposable(S, x):
    """Given a set S and an element x of S, returns true is x is decomposable in S"""
    # Generate all of the substrings such that A+B = x
    for i in range(len(x)): # Forbidden construction, but clearest way in this case.
        # If both A and B in S, then x is decomposable
        if x[:i] in S and x[i:] in S:
            return True
    # If no such pairs A+B=x are both in S, then x indecomposable
    return False

def indecomposables(S):
    """Given a set S, returns the subset of S that is NOT decomposable"""
    # Create a new empty set
    S_indecom = set()
    for x in S:
        # If x is not decomposable, then add it to the new set
        if not is_decomposable(S,x):
            S_indecom.add(x)
    return S_indecom

Solution 2: Joining

This alternative method makes a copy of S, and then looks at each string of the form A+B for A,B in S. Each such sum is removed from the copy, if present. Then the remaining elements are known to be indecomposable.

In [10]:
#MCS 275 Quiz 9 Problem 3
# David Dumas

def indecomposables2(S):
    """Return the set of elements in S that are not expressible as a+b for a,b in S"""
    T = set(S) # creates a copy of S; 
               # ( set() accepts any iterable and creates a new set from it. )
    for a in S:
        for b in S:
            T.discard(a+b)
    return T

Comparing the solutions

Since we've shown two rather different ways to solve the problem---cutting vs joining---why might you choose one or the other?

This isn't part of the quiz, but let's do a timing test to see how they behave in two extreme cases:

  1. S contains many elements, all short strings
  2. S contains a few elements, all very long
In [24]:
import random
import time

def rand_word(n):
    """Make a random string of length n from letters a-z"""
    return "".join(random.choices("abcdefghijklmnopqrstuvwxyz",k=n))

def rand_short_word():
    """Random short string"""
    return rand_word(10)

def rand_long_word():
    """Random long string"""
    return rand_word(100_000)

print("Generating the test data...")

S_short_many = set()
for _ in range(10_000):
    S_short_many.add(rand_short_word())
    
# Now let's make sure it has at least one decomposable in it
# (not strictly needed, but why not make sure the function
# `indecomposables` actually removes something!)
x = S_short_many.pop()
y = S_short_many.pop()
S_short_many.add(x+y)
S_short_many.add(x)
S_short_many.add(y)

S_long_few = set()
for _ in range(20):
    S_long_few.add(rand_long_word())

# Now let's make sure it has at least one decomposable in it
x = S_long_few.pop()
y = S_long_few.pop()
S_long_few.add(x+y)
S_long_few.add(x)
S_long_few.add(y)


t_start = time.time()
T = indecomposables(S_short_many)
t = time.time() - t_start
print("cutting-based, many short strings: {:.3f} s".format(t))

t_start = time.time()
T = indecomposables2(S_short_many)
t = time.time() - t_start
print("joining-based, many short strings: {:.3f} s".format(t))

t_start = time.time()
T = indecomposables(S_long_few)
t = time.time() - t_start
print("cutting-based, few long strings: {:.3f} s".format(t))

t_start = time.time()
T = indecomposables2(S_long_few)
t = time.time() - t_start
print("joining-based, few long strings: {:.3f} s".format(t))
Generating the test data...
cutting-based, many short strings: 0.011 s
joining-based, many short strings: 11.222 s
cutting-based, few long strings: 43.461 s
joining-based, few long strings: 0.039 s

Each solution is fast in one case, and slow in the other!

When there are many short strings (say, 10,000 of them of length 10), the joining-based approach is extremely slow because it looks at every possible sum A+B (of which there are 100,000,000). The cutting-based approach only needs to consider the 9 ways to split each of the 10,000 words.

When there are just a few long strings (say, 20 of them, each of length 100,000) the comparison is reversed. The joining-based approach only looks at the 20x20=400 ways you can add two elements of the set. But the cutting-based approach needs to consider all 99,999 ways to split each of the 20 elements.

Epilogue: A slicker way to write the joining solution

There is a shorter but equivalent way to write the joining-based solution. You wouldn't be expected to come up with this one, as it uses a feature we didn't discuss in MCS 275, but I wanted to show it as a taste of a nice feature you might want to learn about later. Python's generator expressions provide anonymous lazy iterables that can be used to avoid the nested for loops and explicit copying of S in the previous solution.

In [11]:
#MCS 275 Quiz 9 Problem 3
# David Dumas

def indecomposables2slick(S):
    """Return the set of elements in S that are not expressible as a+b for a,b in S"""
    return S.difference( a+b for a in S for b in S )