Lecture 30

Databases

MCS 275 Spring 2021
David Dumas

Lecture 30: Databases

Course bulletins:

  • Project 3 grades expected by end of day on Wednesday.
  • Worksheet 11 to be posted today.

Database week

This week we'll learn about databases, a language for writing database queries (SQL), and the use of the SQLite database in Python.

We'll only scratch the surface; you can spend a semester on this topic. E.g. CS 480 covers database design in detail, and MCS 565 covers theoretical aspects at the graduate level.

Databases

A database is a storage and retrieval system for structured data, usually in a persistent storage medium.

This is such a general definition that it sounds similar to other concepts like a data structure. What's the difference?

A database is defined by the way it can be used (storage and retrieval). In contrast, a data structure (like list, dictionary, BST) refers to the way data is arranged.

E.G. CSV?

A CSV file could be seen as a database in this broad sense. But to retrieve, modify, and save data in such a file, you need to do all the work yourself.

The term database is most often used when the storage and retrieval is handled by a separate piece of software—a database management system (DBMS).

When you use a DBMS, you let it handle the storage details. You interact with the data using a set of commands the DBMS supports.

E.G. CSV module?

Python's CSV module could almost be seen as a DBMS, but one that makes changing data very cumbersome. How do you update one row of data in a CSV?

Read file, change data, delete file, write the new CSV? (Fancier file IO tricks with .seek()?)

Most DBMS would have a single command to update a record.

Typical DBMS features

  • Find and return data based on criteria
  • Insert new data or make new collections
  • Update (change) existing data
  • Distribute storage across multiple servers
  • Receive requests and return data over a network

DBMS command usually express intent (e.g. find X, change Y) rather than implementation details (e.g. the loops, data structures, algorithms, IO operations).

Relational databases

A relational database organizes data into rectangular tables, called relations. It is the most common type.

Each column of a table has a name (e.g. "year") and a type (e.g. 16-bit unsigned integer).

A row in a table, called a tuple in DB terminology, is the basic unit of information you work with. The values in a single row refer to different attributes of some item entity (e.g. a book, an event, a user, a song, ...).

Table example

A table in a relational database might be used to store tasks in a to-do list application.

task_iddescriptiont_addedt_duedone
integerstringfloatfloatboolean
964Enjoy spring break2247True
971Write Lecture 302247True
973Write Lecture 314653False
978Write Project 44666False
408Finish novel3nullFalse

RDBMS

Some widely-used relational database management systems (RDBMS):

  • Open source
    • PostgreSQL
    • MySQL
    • SQLite
  • Proprietary
    • Microsoft SQL Server
    • Oracle
    • IBM DB2

UIC examples

UIC stores your course registration and final grades in an Oracle database.

Blackboard stores all course-related data (e.g. grade center) in a Microsoft SQL Server Database.

UIC's web site (uic.edu) uses Wordpress, which stores site content in a MySQL database.

I store some research data in a PostgreSQL database.

Aside: (not R)DBMS

MongoDB, Cassandra, Redis, Couchbase are popular examples of database management systems that use a non-relational structure.

For example, Redis is a key-value store, like a persistent version of Python's dict (with lots of additional features). It is often used for highly concurrent, latency-sensitive applications.

SQL

Structured Query Lanuage or SQL is a language for making requests (queries) to a RDBMS.

(Pronounced "ess kyoo ell" or "sequel".)

Think of SQL as a data request programming language. Most RDBMS use SQL exclusively, so if you want to ask a RDBMS to do something, you need to write it in SQL.

Learning SQL is learning a new programming language. We'll only cover the basics in MCS 275.

Summary so far

"Database" has a rather general meaning, but outside of theoretical CS it usually refers to a relational database management system (RDBMS) where you access and store tabular data using a special language called SQL.

Key SQL commands

  • SELECT finds rows in a table that match given conditions and returns them
  • INSERT adds a new row to a table
  • UPDATE modifies rows in a table (e.g. change the data in a single existing row)
  • CREATE TABLE adds a new table
  • DROP TABLE deletes a table

SQLITE

SQLite is an open source RDBMS that stores an entire database in a single file.

It consists of a standalone program where you can run SQL commands in a REPL environment, as well as libraries for most popular programming languages.

It was created by D. Richard Hipp in 2000 and is in the public domain. It is incredibly widely used (e.g. every Android or iOS mobile device, every copy of Windows 10, Chrome, Safari, Firefox). Many car stereos use it!

Lite?

The "lite" part of SQLite refers to its limited scope. It doesn't handle network requests or distributing data across multiple files, disks, or computers. But for many purposes it is a good choice.

If you want a database server that many programs will interact with concurrently, SQLite is not a good choice.

And any time you consider inventing your own file format to store a program's data, you should think carefully about whether CSV, JSON, or a SQLite database might be better.

SQLite in MCS 275

SQLite is the RDBMS we cover in MCS 275 because the sqlite3 module is part of the Python standard library, and it can be used with a minimal amount of setup. A single file contains all the data.

In comparison, MySQL or PostgreSQL require installing and configuring both server and client software, creating user accounts, distributing credentials, having active network connections, etc..

Next time

We'll start working with SQLite databases in Python (and in the SQLite shell), focusing on learning basic SQL commands.

References

Revision history

  • 2021-03-29 Initial publication
  • 2021-04-07 Add info about database courses at UIC