# Lecture 22

## set and defaultdict

MCS 275 Spring 2022
David Dumas

Course bulletins:

• Project 3 (due 18 March) coming soon.

## Plan

• Wrap up trees unit
• Start language features unit

## IntegerSet timing

integerset.py has been updated with a script that times adding elements and testing membership for 20,000 integers.

## Traversals

Last time we introduced the preorder, postorder, and inorder traversals of a binary tree.

The trees module now has methods for each of these.
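
As a reminder of what the three traversals compute, here is a minimal sketch. The `Node` class and the three functions are hypothetical stand-ins written for this example; they are not the API of the course's trees module, which may name and structure things differently.

```python
class Node:
    """Minimal binary tree node (hypothetical, for illustration only)."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def preorder(node):
    """Visit the node, then its left subtree, then its right subtree."""
    if node is None:
        return []
    return [node.key] + preorder(node.left) + preorder(node.right)

def inorder(node):
    """Visit the left subtree, then the node, then the right subtree."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)

def postorder(node):
    """Visit the left subtree, then the right subtree, then the node."""
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.key]

#       5
#      / \
#     3   8
#    / \
#   1   4
root = Node(5, Node(3, Node(1), Node(4)), Node(8))
print(preorder(root))   # [5, 3, 1, 4, 8]
print(inorder(root))    # [1, 3, 4, 5, 8]
print(postorder(root))  # [1, 4, 3, 8, 5]
```

Because this example tree is a BST, its inorder traversal lists the keys in increasing order.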

## Uniquely describing a tree

Many different binary trees can have the same inorder traversal.

Many different binary trees can have the same preorder traversal.

And yet:

Theorem: A binary tree T is uniquely determined by its inorder and preorder traversals.
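
One way to see why the theorem is plausible is to rebuild the tree from the two traversals. The sketch below assumes all node values are distinct (the lookup with `index` relies on that) and represents a tree as nested tuples `(key, left, right)`, a representation chosen only for this example. The function name `rebuild` is likewise hypothetical.

```python
def rebuild(preorder, inorder):
    """Rebuild a binary tree from its preorder and inorder traversals.

    Returns nested tuples (key, left, right), or None for an empty tree.
    Assumes all keys are distinct.
    """
    if not preorder:
        return None
    root = preorder[0]        # preorder always visits the root first
    k = inorder.index(root)   # inorder puts the k left-subtree keys before the root
    left = rebuild(preorder[1:k + 1], inorder[:k])
    right = rebuild(preorder[k + 1:], inorder[k + 1:])
    return (root, left, right)

tree = rebuild([5, 3, 1, 4, 8], [1, 3, 4, 5, 8])
print(tree)
# (5, (3, (1, None, None), (4, None, None)), (8, None, None))
```

At every step the traversals leave no choice: the root is forced by the preorder list, and the split into left and right subtrees is forced by the root's position in the inorder list.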

## Last words on binary trees

• BSTs make a lot of data accessible in a few "hops" from the root.
• They are a good choice for mutable data structures involving search operations.
• Deletion of a node is an important feature we didn't implement. (Take MCS 360!)
• Unbalanced trees are less efficient: in the worst case the tree degenerates into a chain, and a search must visit every node.

MCS 360 usually covers rebalancing operations.

## Set

Python's built-in type set represents an unordered collection of distinct objects.

You can put an object in a set if (and only if) it's allowed as a key of a dict, which means it must be hashable. For built-in types that usually just means immutable.

Allowed: bool, int, float, str, tuple (provided its elements are themselves allowed)

Not allowed: list, set, dict
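
A quick check of these rules, as a sketch: adding a disallowed object raises `TypeError`. The last line uses the built-in `frozenset`, an immutable variant of `set` that is not otherwise discussed in these notes.

```python
S = set()
S.add(3)
S.add("three")
S.add((1, 2))          # a tuple of ints is fine

try:
    S.add([1, 2])      # list is mutable, so it is rejected
except TypeError as e:
    print("TypeError:", e)

try:
    S.add((1, [2]))    # a tuple containing a list is rejected too
except TypeError as e:
    print("TypeError:", e)

S.add(frozenset({1, 2}))  # an immutable set can be an element of a set
print(len(S))  # 4
```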

## Set usage

S = { 4, 8, 15, 16, 23, 42 } # Set literal
S = set()     # New empty set
S.add(5)      # Now S is {5}
S.add(10)     # Now S is {5, 10}
8 in S        # False
5 in S        # True
S.remove(1)   # Raises KeyError
S.remove(5)   # Now S is {10}
S.pop()       # Remove and return one element (unclear which!)
for x in S:   # sets are iterable (but no control over order)
    print(x)


## Set operations

Binary operations returning new sets:


S | S2  # Evaluates to union of sets
S & S2  # Evaluates to intersection of sets
S.union(iterable)        # Like | but allows any iterable
S.intersection(iterable) # Like & but allows any iterable
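
A small worked example of these operations, with sample values chosen only for illustration. It also shows the difference between the operators and the methods: `|` and `&` require both operands to be sets, while the methods accept any iterable.

```python
S = {4, 8, 15, 16, 23, 42}
S2 = {8, 16, 32}

u = S | S2                 # union
i = S & S2                 # intersection
print(u == {4, 8, 15, 16, 23, 32, 42})  # True
print(i == {8, 16})                     # True

# The method forms accept lists, ranges, strings, ...
m = S.intersection(range(10, 20))
print(m == {15, 16})                    # True

# ... but the operator forms do not.
try:
    S | [1, 2]
except TypeError as e:
    print("TypeError:", e)

print(S == {4, 8, 15, 16, 23, 42})      # True: S itself was never modified
```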


## Set mutations

Operations that modify a set S based on contents of another collection.


# adds elements of iterable to S
S.update(iterable)

# remove anything from S that is NOT in the iterable
S.intersection_update(iterable)

# remove anything from S that is in the iterable
S.difference_update(iterable)
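
The three methods above can be traced on a small set. The values are arbitrary and chosen only for illustration; each call changes S in place and returns None.

```python
S = {1, 2, 3, 4}

S.update([4, 5, 6])                  # add 5 and 6 (4 is already present)
print(S == {1, 2, 3, 4, 5, 6})       # True

S.intersection_update(range(2, 6))   # keep only elements among 2, 3, 4, 5
print(S == {2, 3, 4, 5})             # True

S.difference_update([3, 4, 100])     # drop 3 and 4; 100 is absent, no error
print(S == {2, 5})                   # True
```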


set has lots of other features that are described in the documentation.

Python's set is basically a dictionary without values.

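For large collections, membership tests on a set are much faster than on a list: a set lookup takes roughly constant time on average, while a list must be scanned item by item.

A set is appropriate whenever order is not important and items cannot appear multiple times.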
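
The speed difference is easy to measure. This is a rough sketch using the standard `timeit` module, not the timing script from integerset.py. The collection size of 20,000 is borrowed from that test and the repeat count of 200 is arbitrary. Actual times depend on the machine, so no output is shown.

```python
import timeit

n = 20_000
data_list = list(range(n))
data_set = set(data_list)
missing = -1   # not present, so the list search must scan every item

# Time 200 membership tests against each container.
list_time = timeit.timeit(lambda: missing in data_list, number=200)
set_time = timeit.timeit(lambda: missing in data_set, number=200)

print(f"list: {list_time:.6f} s for 200 membership tests")
print(f"set:  {set_time:.6f} s for 200 membership tests")
```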

## Histogram

You want to know how many times each character appears in a string.

hist = dict()
for c in s:
    hist[c] += 1

This won't work. Why?
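
Running the loop shows the problem: `hist[c] += 1` must read `hist[c]` before it can add 1, and the first time a character is seen there is nothing to read. The sketch below uses a sample string chosen for illustration, and then shows one possible fix with a plain dict using `dict.get`.

```python
s = "mississippi"   # sample input, chosen for this example

hist = dict()
try:
    for c in s:
        hist[c] += 1          # reads hist[c] first, but the key is missing
except KeyError as e:
    print("KeyError for key", e)

# One fix with a plain dict: supply 0 when the key is missing.
hist = dict()
for c in s:
    hist[c] = hist.get(c, 0) + 1
print(hist)  # {'m': 1, 'i': 4, 's': 4, 'p': 2}
```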

## defaultdict

The standard library module collections contains a class defaultdict that works like a dictionary, except that when a key is requested that doesn't exist, it creates that key and assigns it a default value.

import collections
hist = collections.defaultdict(int)
for c in s:
    hist[c] += 1

This works!

The defaultdict constructor's first argument is default_factory, a function (or any other callable) that takes no arguments.

default_factory is called to make default values for keys when needed.

Common examples with built-in factories:

defaultdict(list)  # default value [] as returned by list()
defaultdict(int)   # default value 0, as returned by int()
defaultdict(float) # default value 0.0, as returned by float()
defaultdict(str)   # default value "", as returned by str()
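
Two further uses, sketched with made-up data: grouping items with `defaultdict(list)`, and supplying a custom factory. The word list, the names, and the starting value 100 are all hypothetical.

```python
from collections import defaultdict

# Group words by first letter; each missing key starts as an empty list.
words = ["apple", "avocado", "banana", "blueberry", "cherry"]
by_letter = defaultdict(list)
for w in words:
    by_letter[w[0]].append(w)
print(dict(by_letter))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# Any zero-argument callable works as a factory, including a lambda.
scores = defaultdict(lambda: 100)
scores["alice"] -= 5
print(scores["alice"])   # 95

# Testing membership with "in" does not create a key...
print("bob" in scores)   # False
# ...but looking a key up with [] does.
print(scores["bob"])     # 100
print("bob" in scores)   # True
```

The last three lines show a side effect worth remembering: merely reading a missing key with square brackets stores the default value in the dictionary.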

### Revision history

• 2022-03-02 Initial publication