A document from MCS 275 Spring 2021, instructor David Dumas. You can also get the notebook file.

Project 3

MCS 275 Spring 2021 - David Dumas

Instructions:

Deadline

This project must be submitted to Gradescope by 6:00pm CDT on Friday, March 19, 2021.

Collaboration policy & academic honesty

This project must be completed individually. Seeking or giving aid on this assignment is prohibited; doing so constitutes academic misconduct which can have serious consequences. The only resources you are allowed to consult are the ones listed below. It is very important to follow these rules. If you are unsure about whether something is allowed, ask. The course syllabus contains more information about the course and university policies regarding academic honesty.

Resources you are allowed to consult

  • All documents and videos posted to the course web page (including lecture slides, lecture videos, the Python tour)
  • Textbooks listed in the "textbooks" section of the course home page.

What to do if you're stuck

Contact the instructor or TA by email, or visit office hours.

One-line summary of the project

You'll make a module that implements a new tree-based data structure, and use that module to improve an existing program.

Conceptual overview

Chromaticities

In this project, a key concept will be that of a chromaticity or chroma. We define this as an ordered triple of nonnegative integers that add up to 255. Here are some examples of chromaticities:

  • (0,0,255)
  • (208,31,16)
  • (85,85,85)

And here are some triples of integers that are NOT chromaticities:

  • (200,100,8) is not a chromaticity, because the sum is different from 255
  • (181,78,-4) is not a chromaticity, because one of the components is negative

The three entries in a chromaticity have names: The first is called r or the red component. Similarly the second component is g or the green component, and the last component is b or the blue component.

Why?

Most computer display technologies (like the screen you are probably looking at now) create colors by mixing varying amounts of red, green, and blue light. Often, there are 256 possible levels of brightness for each of the three colors, indicated by an integer between 0 (no light of that color) and 255 (as much light of that color as possible). This means that a color that can be displayed on your computer screen is typically encoded as a triple of integers between 0 and 255, e.g.

  • (255,0,0) is bright pure red
  • (0,0,0) is black
  • (255,255,255) is white
  • (128,128,128) is "50% gray"
  • (255,200,0) is orange-yellow
  • (137,112,219) is a medium purple
  • (0,64,0) is very dark green

As you can see, some colors are bright, and some are dim. Generally, higher numbers mean higher brightness.

So, in this way, a chromaticity represents a color that has a fixed, medium brightness level. Working with chromaticities is a way to consider color variations without allowing differences in brightness alone. There are many shades of pure gray, but only one "pure gray chromaticity", namely (85,85,85).

The interpretation of chromaticity in terms of colors is not something you need to directly use in order to complete the project successfully, but it may be helpful to know the motivation behind the terminology.

Chromaticity Trees

Having introduced chromaticities, now we'll describe a tree-based structure for storing them.

Key observation: If you have two chromaticities that are not equal---let's call them c1=(r1,g1,b1) and c2=(r2,g2,b2)---then at least one of the following must hold:

  • r1 > r2, in which case we say c1 is more red than c2
  • g1 > g2, in which case we say c1 is more green than c2
  • b1 > b2, in which case we say c1 is more blue than c2

This is a special feature of chromaticities, which is not true for triples of integers in general. That is, if you don't fix the sum of the components, it's easy to make triples c1 and c2 where c1 has all of its components less than or equal to c2 (e.g. c1=(5,5,5) and c2=(10,10,10)), so that none of the conditions above holds. But for chromaticities, the fixed sum of components means that c1 will always be more red, more green, or more blue than c2.

It's also worth noting that the three conditions (more red, more green, more blue) are not mutually exclusive. For example:

  • (30,120,105) is more green than (55,100,100), AND
  • (30,120,105) is more blue than (55,100,100)

These three ways that chromaticities might compare leads to a natural way to make an object like a binary search tree to store a set of distinct chromaticies. Each node will store a chromaticity, and will have up to three children: a red child, a green child, and a blue child. These can have their own children, etc., so that below each node there is an entire red subtree, green subtree, etc.. Crucially, such a tree structure can be built so as to satisfy the following

CHROMATICITY TREE DEFINITION:

  1. Every chromaticity in the red subtree of a node is more red than that node, and
  2. Every chromaticity in the green subtree of a node is more green than that node, and
  3. Every chromaticity in the blue subtree of a node is more blue than that node.

In making pictures of such trees, I follow the convention of drawing the child nodes in the left-to-right order red, green, blue, and marking the edges with corresponding colors.

Here is an example of a chromaticity tree:

chromatree example 1

Here is another example of a chromaticity tree:

chromatree example 2

(It might be worth noting that the two examples shown above contain exactly the same chromaticities, and differ only in the position of one node. There are actually several different positions where (30,120,105) can fit into the tree, and these figures show two of them. This is important for the algorithm that searches such a tree!)

Here is an example of something that is not a chromaticity tree:

chromatree non-example

The problem with the tree above is highlighted in the diagram below. There is a node whose red subtree contains a chromaticity that is not more red than the node. The two red components that compare in the wrong way are highlighted.

chromatree non-example explained

What you need to do

First, read this project description document.

Before you start working, download the starter pack.

Then, complete these tasks:

  • Task 1: Write a module chroma that is described below, implementing a chromaticity tree that supports insertion, search, and some other operations, and where each node also stores a string that is the "name" of the chromaticity.
  • Task 2: Modify the program lookup.py contained in the starter pack so that it uses a chromaticity tree instead of a list.

The sections below provide more detail on each task.

The final product of your work will be very similar to the module bst we built in class, and the main thing we're testing is your ability to determine what needs to be different in the chromaticity tree setting.

WHEN YOU ARE DONE, chroma.py (which you write from scratch) and lookup.py (which you've modified from the starter pack version) ARE THE ONLY FILES YOU SHOULD SUBMIT TO GRADESCOPE.

Task 1: chroma module

This is the biggest part of the project. The module chroma doesn't exist yet, but everything it needs to do is documented below. Write the module to match this documentation. The module will be considered correct if it does exactly what is written in this section, and if it follows the rules in the section "Other requirements" below.

Of all the classes and methods you'll write, two are substantially more challenging than the others, and form the core of this project. They are marked "Tricky!".

class Chroma

Purpose: Store a chromaticity and perform comparisons.

Required attributes:

  • r, g, b : The three components (integers in the range 0..255). IMPORTANT: No other class is allowed to directly access these components. Other classes may only call the methods described below, which provide all the necessary features for comparing chromaticities. If you find yourself writing .r or .g or .b next to a variable of type Chroma, and outside of the Chroma class itself, then it is an error.

Methods:

  • __init__(self,r,g,b) : Saves the arguments as attributes. Raises ValueError if any are negative, or if the sum is not equal to 255.
  • __eq__(self,other) : Takes another Chroma object and returns True if this Chroma and that one have the same r, g, and b components. This special method provides overloading of the operator ==, so that expressions like a == b and a != b will work if a and b are Chroma objects.
  • more_red_than(self,other) : Takes another Chroma object and returns True if self.r is greater than other.r, otherwise returns False.
  • more_green_than(self,other) : Takes another Chroma object and returns True if self.g is greater than other.g, otherwise returns False.
  • more_blue_than(self,other) : Takes another Chroma object and returns True if self.b is greater than other.b, otherwise returns False.

class ChromaNode

Purpose: A node in a tree that has up to three children, a key that is a Chroma, and a value name that is a string. Such nodes will be used to build a tree that stores a mapping from chromaticities to names.

Required attributes:

  • key : A Chroma object
  • name : A string
  • red : The red child ChromeNode, or None
  • green : The green child ChromeNode, or None
  • blue : The blue child ChromeNode, or None
  • parent : The parent ChromeNode, or None

Methods:

  • __init__(self,key,name,red=None,green=None,blue=None,parent=None) : Saves all arguments as attributes of the same names.
  • set_red(self,other) : Take another ChromaNode, other, and make it the red child of this node. Make this node the parent of other.
  • set_green(self,other) : Take another ChromaNode, other, and make it the green child of this node. Make this node the parent of other.
  • set_blue(self,other) : Take another ChromaNode, other, and make it the blue child of this node. Make this node the parent of other.

class ChromaTree

Purpose: Maintain a tree of ChromaNode objects that satisfies the Chromaticity Tree Condition.

Required attributes:

  • root : The root of the tree structure maintained by the class; either a ChromaNode or None.

Methods:

  • __init__(self) : Set attribute root to None.
  • insert(self,node,dir=0) : Find a valid place for node in the chromaticity tree and add it as a child of an existing node. See the section Insertion algorithm notes below for important information about how this must be done, and in particular for a description of the argument dir. Tricky!
  • search(self,key) : Given a Chroma object key, search for a node with that key. If found, return the ChromaNode object. If no such node exists, return None. See the section Search algorithm notes below for important hints about how to do this. Tricky!
  • __len__(self) : Return the total number of nodes in the tree. Must be space efficient, in the sense that it should not make a copy of all the nodes in the tree, or any data structure of size proportional to the number of nodes. (For example, copying all of the nodes into a list and then taking the length of the list would not be space efficient.)
  • depth(self) : Return the length of the longest descending path in the tree that starts at self.root. If the tree is empty or has only a root node, return 0.

These methods may call other methods (or themselves). It shouldn't be necessary for you to create functions outside of the class that these methods call. Calling built-in functions is OK.

class ChromaNameMapping

Purpose: Maintain a mutable collection of key-value pairs, where keys are Chroma objects and values are strings (names of the chromaticities), using a ChromaTree as the backing data structure.

Required attributes:

  • T : The ChromaTree that it uses to store the data.

Methods:

  • __init__(self) : Sets self.T to an empty ChromaTree
  • __setitem__(self,key,name) : Make and add a new node to self.T to record the fact that the chromaticity key is supposed to map to the name name.
  • __getitem__(self,key) : Look up the name associated to a chromaticity, i.e. search self.T for a node with key key. If one is found, return the name attribute of that node. If no such node is found, raise KeyError.
  • __len__(self) : Return the number of nodes in self.T.

Note that the special methods __getitem__ and __setitem__ mean you can add and retrieve items from this mapping in the same way you'd use a Python dictionary, e.g.

In [ ]:
M = ChromaNameMapping()
M[Chroma(85,85,85)] = "gray"  # calls __setitem__
print(M[Chroma(85,85,85)]) # calls __getitem__.  Will print "gray"
print(M[Chroma(255,0,0)])  # KeyError

Insertion algorithm notes

Inserting a node into a chromatree is similar to inserting a node in a binary search tree, with one important difference: When comparing the new key to the current node's key and choosing which subtree to descend into, you sometimes have more than one choice. (The images at the beginning of this document showed this; they show two different ways we could insert (30,120,105) into an existing chromaticity tree.)

Because of this issue, the order in which you check whether or not to descend into the red, green, and blue subtrees matters.

If dir is 0, the default, then the order in which directions are checked must be red-green-blue. That means, for example, that a new key that is both more red and more green than the current node will go into the red subtree if dir is 0, since red is checked before green.

If dir is 1, then they must be checked in the order blue-green-red instead. Thus, for example, a new key that is both more red and more green than the current node will go into the green subtree if dir is 1, since green is checked before red.

(If we wanted the insert method to make a decision on its own, it would probably be best to flip a coin on each call and use the answer to decide between dir=0 and dir=1. But randomness would make the results differ between runs, making autograding more difficult, so we've opted for this deterministic strategy involving an extra argument.)

Search algorithm notes

It's up to you to figure out how to write a search algorithm for this structure, but this is tricky enough that you should probably write the steps out by hand and manually test the search for keys in some example chromaticity trees. Once you believe your method works, you can move on to writing code.

For example, you should probably test your algorithm to make sure it can find (30,120,105) in both of the example chromaticity trees at the top of this document.

Note that the search method does not have an argument analogous to dir in the insert method. That's deliberate: It needs to be able to find a key regardless of what strategy was used to insert it!

Task 2: Improve lookup.py

This section describes Task 2, in which you take a program that does a specific thing, and turn it into a program that does the same thing more efficiently using the chroma module.

The existing program

The starter pack contains a program lookup.py which accepts one command line argument, which is the name of a CSV file that we'll call the chromaticity dictionary. That file has a header row

red,green,blue,name

followed by data rows that contain the components of chromaticities and the names associated to them. Here is an example of a possible input file:

red,green,blue,name
255,0,0,red
0,255,0,green
0,0,255,blue
85,85,85,gray
138,76,41,"reddish brown"

The program reads this file and stores all of the chromaticities and names. It then waits for keyboard input, expecting three integers separated by spaces. If the integers entered form a chromaticity, the program searches for its name in the chromaticity dictionary. If it is found, the name is printed in the terminal. If it is not found, the string "<not found>" is printed. If the integers do not form a chromaticity, the string "<invalid>" is printed. The process repeats (waiting for another chromaticity to look up), until a blank line is entered by the user, at which point the program exits.

Therefore, here is a possible session of using the lookup program with the dictionary above.

 input: 0 0 255
output: blue
 input: 0 18 0
output: <invalid>
 input: 138 76 41
output: reddish brown
 input: 128 127 0
output: <not found>
 input: 85 85 85
output: gray
 input: (a blank line)
output: (program exits)

To be clear, the labels "input:" and "output:" above do not actually appear. They are there to indicate which lines correspond to keyboard input, and which ones are output printed by the program.

The program lookup.py works, but it uses a data structure poorly adapted to this purpose (a list). For a large chromaticity dictionary, it will be very slow. Your task is to fix that by using the ChromaNameMapping data structure in place of a list.

Required upgrade

Modify lookup.py so that:

  • It turns the chromaticities it reads from the file into Chroma objects, rather than just tuples
  • It stores the Chroma-name pairs in a ChromaNameMapping object, rather than a list

For most orders of the input chromaticity dictionary, this modified version of the program will be must faster for looking up chromaticity names.

Your should test your modified lookup.py program. A good way to do this would be to copy the original (slow, list-based) lookup program to lookup_orig.py (or some other name), edit lookup.py, and then test to make sure the two programs give identical output.

Important: The starter program contains comments that attempt to help you, pointing out various things about how it works. You will be modifying the program, and so when you are done, some of the original comments will no longer be accurate. Do not submit a program that contains comments that are inaccurate because of your modifications. Doing so will result in a significant point penalty in the code review step.

(In general, modifying code and forgetting to modify its comments or documentation causes a lot of confusion, frustration, and wasted time in software development. Because of this, I want to provide a strong incentive to avoid this common pitfall.)

Other requirements

This section contains rules your code must follow, in addition to it needing to do everything documented in the previous section.

Coding standards

Like everything you submit for credit, your code must follow the rules in the course coding standards document.

r,g,b are private attributes of Chroma

This is mentioned in the module documentation but it is important, so this is a reminder: Any code that needs to compare two Chroma objects to see if one is more red than the other (or more green, more blue) should use the methods .more_red_than(...) etc. to do so. Testing equality of Chroma objects should use the overloaded == and != operators via the __eq__ method. It is not acceptable to access the attributes r, g and b directly, except within methods of Chroma itself.

(Many other languages that support object-oriented programming provide ways to enforce privacy of class attributes, so that it would be literally impossible for other classes to access these attributes. However, Python doesn't have any enforcement mechanism like this.)

No inheritance

This assignment does not use inheritance. None of the classes you will write should be explicit subclasses of any other class.

No commented-out code

Comments may only be used for explanatory text. If at some point you have a file that contains code that you want to prevent from running, you should simply remove that code before submitting it. Do not submit files that contain lines of code that have been disabled by turning them into comments.

No extra code in chroma.py

The file chroma.py should define the necessary classes, and when imported as a module, it should not do anything else.

You can put test code into chroma.py if you like, but it should be inside a block starting

if __name__=="__main__":

so that it only runs when chroma.py is the main program, not when it is imported.

Epilogue: What you'd do in the real world

Reading this section is optional. Having lookup.py use a list for a mapping like this is ridiculously inefficient, and isn't something you should ever do. But for this project you should imagine we're thinking about a world where Python's dict type doesn't exist, and where we want to build an efficient mapping class adapted to the specific key type this program deals with (chromaticities).

And while the tree data structure you build in this project might have certain advantages over dictionaries if we wanted to do more complex operations on the set of chromaticities (e.g. find all chromaticities close to a given one), the reality is that Python's dict will be faster for simple lookup operations. That's mostly because it has been heavily optimized by the language developers, and parts of it are written in C rather than Python.

Revision History

  • 2021-03-08 Correct name of time zone for the deadline. It is due at 6pm on a Friday, as usual, but daylight saving time starts before this project's deadline, so the timezone is correctly called "CDT".
  • 2021-03-07 Initial publication