Lecture 34

Requesting URLs in Python

MCS 260 Fall 2021
David Dumas

Reminders

  • Homework 12 will be posted tomorrow
  • Read Project 4 description
  • Project 4 proposals due by Nov 17
    • But please give yourself more work time by doing it sooner if possible
  • Project 4 is due on Fri Dec 3 at 6:00pm Central
  • No office hours or synchronous lecture Friday
    • I'll post a lecture video and slide set

Internet Layer Cake

  • Application: Retrieve http://example.com/
  • Transport: Transmit GET / to 93.184.216.34
  • Network: Deliver this packet to 93.184.216.34
  • Link: Send this ethernet frame to the router
  • Physical: Change voltages on these wires...

Today

We'll discuss making Application-level network requests in Python.

We focus specifically on retrieving data (documents, etc.) from a Uniform Resource Locator or URL.

The urllib module in Python supports this. It is primarily focused on HTTP, HTTPS, and local files.

HTTP request types

HTTP allows many types of requests. For example:

  • GET — Ask for the resource. Most common.
  • POST — Submit data to the resource.
  • PUT — Submit data that should replace the resource.

Today we'll only use GET.
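
To make the request types concrete, here is a minimal sketch using urllib.request.Request objects, which record the method a request would use. Nothing is sent over the network here. The target example.com and the form data name=David are placeholders chosen for illustration.

```python
import urllib.request

# Building a Request does not contact the network; it only describes
# what would be sent if it were passed to urllib.request.urlopen.
get_req = urllib.request.Request("http://example.com/")

# Supplying data (which must be bytes) makes POST the default method.
form = b"name=David"
post_req = urllib.request.Request("http://example.com/", data=form)

# Any other method, such as PUT, can be requested explicitly.
put_req = urllib.request.Request("http://example.com/", data=form, method="PUT")

for req in (get_req, post_req, put_req):
    print(req.get_method(), req.full_url)
```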

HTTP response

A response consists of a numeric status code, some headers (key: value pairs, one per line), then a payload.

E.g. if you GET a web page, the HTML will be in the payload.

There are lots of codes; first digit gives category:

  • 2xx — success
  • 3xx — what you want is somewhere else
  • 4xx — error (server thinks it's your fault)
  • 5xx — error (server's fault)
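
The first-digit rule can be sketched as a small helper. The function name status_category and the wording of its descriptions are made up for this illustration; they are not part of urllib.

```python
def status_category(code):
    """Return a short description of the category of an HTTP status code."""
    categories = {
        2: "success",
        3: "redirection (what you want is somewhere else)",
        4: "client error (server thinks it's your fault)",
        5: "server error (server's fault)",
    }
    # Integer division by 100 isolates the first digit of a 3-digit code.
    return categories.get(code // 100, "other")


for code in (200, 301, 404, 500):
    print(code, status_category(code))
```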

Parts of an HTTP response

Response to GET http://example.com/

Basic urllib usage

Import urllib.request to get the most convenient functions for loading URLs.

Call urllib.request.urlopen(url) to open the URL url using GET. It returns a response object.

Response objects behave like read-only files, and should be closed with .close().

If a 4xx or 5xx response is received, or if contacting the host fails, a urllib.error.URLError exception is raised. (For 4xx and 5xx responses it is specifically urllib.error.HTTPError, a subclass of URLError.)
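
A minimal sketch of this pattern follows. The helper name fetch_bytes and the choice to return None on failure are assumptions made for this example; a real program might prefer to let the exception propagate.

```python
import urllib.error
import urllib.request


def fetch_bytes(url):
    """Return the payload at url as bytes, or None if the request fails."""
    try:
        # The with statement closes the response object automatically,
        # just as it would for a file.
        with urllib.request.urlopen(url) as res:
            return res.read()
    except urllib.error.URLError as err:
        # This also catches urllib.error.HTTPError (a subclass of
        # URLError), which is what 4xx and 5xx responses raise.
        print("Request failed:", err)
        return None
```

For example, fetch_bytes("http://example.com/") would attempt to return the HTML of that page as a bytes object.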

Response objects

An HTTP response object res has:

  • res.status — the status code
  • res.geturl() — returns the final URL (maybe not the one requested, if redirection used)
  • res.read() — returns the payload as a bytes object
  • res.headers — dict-like object storing the HTTP headers (not HTML header!)
  • res.headers.get_content_charset() — Return payload encoding, if known
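
These attributes can be gathered into one dictionary for inspection, as in the sketch below. The function names summarize_response and summarize_url and the dictionary keys are invented for this example. It assumes a recent Python 3 (3.9 or later), where every response object returned by urlopen has a status attribute.

```python
import urllib.request


def summarize_response(res):
    """Collect the main attributes of a response object into a dictionary."""
    return {
        "status": res.status,                           # e.g. 200 for HTTP success
        "final_url": res.geturl(),                      # after any redirection
        "content_type": res.headers.get("Content-Type"),
        "charset": res.headers.get_content_charset(),   # None if not specified
        "payload_length": len(res.read()),              # read() returns bytes
    }


def summarize_url(url):
    """Open url with GET and return a summary of the response."""
    with urllib.request.urlopen(url) as res:
        return summarize_response(res)
```

Calling summarize_url("http://example.com/") and printing the result is a quick way to see what a server sent back.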

Bytes and strings

Often the payload is meant to be a string, but you will always receive it as bytes.

To recover that string from the bytes object returned by res.read(), you need to call the .decode(...) method, e.g.


        enc = res.headers.get_content_charset() or "utf-8"  # None if unknown; assume UTF-8
        response_string = res.read().decode(enc)            # bytes -> str
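
The relationship between bytes and strings can be explored without any network access. In the sketch below, the helper decode_payload is a made-up name, and falling back to UTF-8 when no encoding is known is an assumption that suits most modern web content but is not guaranteed to be right.

```python
text = "café"

# Encoding turns a str into bytes, which is the form in which a payload
# travels over the network and is returned by res.read().
payload = text.encode("utf-8")
print(payload)                    # prints b'caf\xc3\xa9'

# Decoding with the same encoding recovers the original string.
print(payload.decode("utf-8"))    # prints café

# Decoding with the wrong encoding may "succeed" but give garbled text.
print(payload.decode("latin-1"))  # prints cafÃ©


def decode_payload(payload, charset=None):
    """Decode bytes to str, assuming UTF-8 when no charset is known."""
    return payload.decode(charset or "utf-8")
```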
    

APIs

An application programming interface or API is a structured way for computer programs to talk to each other.

APIs often use the network, and often use HTTP.

Some are available freely to anyone.

Using an API

urllib.request.urlopen is a great way to fetch data from HTTP APIs.

Example for today: a free dice-rolling JSON API* by Steve Brazier at roll.diceapi.com.

Examples:

  • http://roll.diceapi.com/json/d6 — roll one six-sided die
  • http://roll.diceapi.com/json/3d6 — roll three six-sided dice
  • http://roll.diceapi.com/json/4d12 — roll four twelve-sided dice

* This API could disappear at any moment. It worked on November 9, 2021.
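
Here is a rough sketch of how a program might use this API. The URL pattern comes from the examples above, but the JSON layout (a "dice" list whose entries have a "value" key) is an assumption that should be checked against a real response. The function names and the sample payload are hypothetical.

```python
import json
import urllib.request


def dice_url(count, sides):
    """Build a URL like http://roll.diceapi.com/json/3d6."""
    # A single die is written d6 rather than 1d6, matching the examples.
    prefix = "" if count == 1 else str(count)
    return "http://roll.diceapi.com/json/{}d{}".format(prefix, sides)


def parse_rolls(payload):
    """Extract the list of rolled values from a JSON payload (bytes).

    ASSUMPTION: the payload looks like
        {"success": true, "dice": [{"value": 4, "type": "d6"}, ...]}
    Adjust the keys if the real response differs.
    """
    data = json.loads(payload.decode("utf-8"))
    return [die["value"] for die in data["dice"]]


def roll(count, sides):
    """Contact the API and return a list of rolled values."""
    with urllib.request.urlopen(dice_url(count, sides)) as res:
        return parse_rolls(res.read())


# Hypothetical payload, so the parsing step can be tried offline.
sample = b'{"success": true, "dice": [{"value": 4, "type": "d6"}, {"value": 1, "type": "d6"}]}'
print(parse_rolls(sample))
```

Keeping the parsing separate from the network request makes it possible to test the parsing even if the API is unavailable.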

URL parameters

HTTP GET requests can send an associative array of parameters. For example, to send the dictionary {"name":"David","apple":"McIntosh"} to http://example.com/ the URL would be


            http://example.com/?name=David&apple=McIntosh
            

The parameter list begins with ? and has & between name=value pairs. It gets tricky when values or names have spaces, but urllib.parse.urlencode can convert a dictionary to a suitable string.
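
A short sketch of urllib.parse.urlencode in action, using the dictionary from above. The second dictionary, with spaces and special characters, is invented to show the escaping.

```python
import urllib.parse

params = {"name": "David", "apple": "McIntosh"}
query = urllib.parse.urlencode(params)
print(query)                           # prints name=David&apple=McIntosh

url = "http://example.com/?" + query
print(url)

# Spaces and other special characters are escaped automatically,
# in both the names and the values.
tricky = urllib.parse.urlencode({"full name": "David Dumas", "q": "a&b=c"})
print(tricky)                          # prints full+name=David+Dumas&q=a%26b%3Dc
```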

Cat Facts

The domain cat-fact.herokuapp.com hosts an API* created by CS undergrad student Alex Wohlbruck for retrieving facts about cats (and other animals). E.g.

  • https://cat-fact.herokuapp.com/facts/random?amount=2 — two random facts about cats
  • https://cat-fact.herokuapp.com/facts/random?animal_type=dog&amount=1 — one random fact about dogs

* This API could disappear at any moment. It worked on November 10, 2020.
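
The example URLs above can be generated from Python values, as in this sketch. The helper name cat_facts_url is made up for illustration, and only the two parameters shown above (amount and animal_type) are assumed to exist. The format of the JSON that the API returns is not described here, so parsing is left out.

```python
import urllib.parse

CAT_FACTS_RANDOM = "https://cat-fact.herokuapp.com/facts/random"


def cat_facts_url(amount, animal_type=None):
    """Build a URL requesting the given number of random facts."""
    params = {}
    if animal_type is not None:
        params["animal_type"] = animal_type
    params["amount"] = amount          # urlencode converts the int to text
    return CAT_FACTS_RANDOM + "?" + urllib.parse.urlencode(params)


print(cat_facts_url(2))
print(cat_facts_url(1, animal_type="dog"))

# The reverse direction: recover the parameters from a URL.
query = urllib.parse.urlsplit(cat_facts_url(1, animal_type="dog")).query
print(urllib.parse.parse_qs(query))    # values come back as lists of strings
```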

Revision history

  • 2021-11-10 Initial publication