Sets

Python Assets

2022-04-04

Sets are unordered collections of unique objects. In Python a set is a built-in data type, like other more common collections, such as lists, tuples and dictionaries. Sets are widely used in logic and mathematics. In Python we can take advantage of their properties to create shorter, more efficient, and more readable code.

To create a set we put its elements between braces:

s = {1, 2, 3, 4}

We can have elements of multiple data types within a single set:

s = {True, 3.14, None, False, "Hello world", (1, 2)}

However, the elements of a set must be hashable, which rules out mutable built-in objects such as lists, dictionaries, and even other sets.

	`>>> s = {[1, 2]}`
	`Traceback (most recent call last):`
	`...`
	`TypeError: unhashable type: 'list'`
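
A quick way to see which objects are accepted is to try building a set from each of them. The following is a minimal sketch; the names valid, rejected and candidate are only illustrative.

```python
# Tuples and frozensets are hashable, so they are valid set members.
valid = {(1, 2), frozenset({3, 4}), "text"}

# Lists, dictionaries and sets are unhashable and get rejected.
rejected = []
for candidate in ([1, 2], {"a": 1}, {1, 2}):
    try:
        {candidate}  # Attempt to build a one-element set.
    except TypeError:
        rejected.append(type(candidate).__name__)

print(rejected)  # ['list', 'dict', 'set']
```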

Note that sets and dictionaries are both written with braces, so an empty pair of braces is ambiguous. Python resolves the ambiguity in favor of dictionaries:

s = {}

This assignment creates an empty dictionary. To generate an empty set, we need to use the built-in set() function:

s = set() # Create an empty set.
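
If in doubt, we can check what each expression actually produces. A small sketch (the variable names are arbitrary):

```python
empty_dict = {}
empty_set = set()

print(type(empty_dict).__name__)  # dict
print(type(empty_set).__name__)   # set
print(len(empty_set))             # 0
```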

In the same way we can obtain a set from any iterable object:

	`s1 = set([1, 2, 3, 4])`
	`s2 = set(range(10))`

A set can be converted to a list and vice versa. When a list is converted to a set, repeated elements collapse into one (remember, set members are unique).

	`>>> list({1, 2, 3, 4})`
	`[1, 2, 3, 4]`
	`>>> set([1, 2, 2, 3, 4])`
	`{1, 2, 3, 4}`
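
This makes set() a handy way to remove duplicates from a list, with one caveat: the order of the result is not guaranteed. The sketch below compares it with dict.fromkeys(), which is one possible alternative when the original order matters (dictionaries preserve insertion order since Python 3.7). The sample data is made up.

```python
names = ["ana", "bob", "ana", "carl", "bob"]

# Removes duplicates, but the resulting order is not guaranteed.
unique_unordered = list(set(names))

# Removes duplicates and keeps the order of first appearance.
unique_ordered = list(dict.fromkeys(names))

print(sorted(unique_unordered))  # ['ana', 'bob', 'carl']
print(unique_ordered)            # ['ana', 'bob', 'carl']
```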

Managing Elements

Sets are mutable objects. By using the add() and discard() / remove() methods we can insert and remove elements.

	`>>> s = {1, 2, 3, 4}`
	`>>> s.add(5)`
	`>>> s.discard(2)`
	`>>> s.remove(4)`
	`>>> s`
	`{1, 3, 5}`

Both discard() and remove() serve the same purpose, but remove() raises a KeyError when the argument is not in the set, while discard() silently does nothing.
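
The difference is easy to see by asking both methods to delete an element that is not there. A minimal sketch:

```python
s = {1, 3, 5}

s.discard(99)  # 99 is not in the set: nothing happens.

try:
    s.remove(99)  # Same argument, but remove() raises.
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 99

print(s == {1, 3, 5})  # True: the set is unchanged.
```

As a rule of thumb, remove() fits when a missing element signals a bug, and discard() when its absence is perfectly fine.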

To determine if an element belongs to a set, we use the in keyword.

	`>>> 2 in {1, 2, 3}`
	`True`
	`>>> 4 in {1, 2, 3}`
	`False`

The clear() method removes all the elements.

	`>>> s = {1, 2, 3, 4}`
	`>>> s.clear()`
	`>>> s`
	`set()`

The pop() method removes and returns an arbitrary element (there is no "first" or "last" element to pick, since sets are not ordered). Thus, the following loop prints and removes the members of a set one by one.

	`while s:`
	`    print(s.pop())`

pop() raises a KeyError when the set is empty.
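
Here is a sketch that drains a set with pop() and then tries one call too many. The list removed is only there to record what came out; the order in which the elements are popped depends on the implementation, so we sort before printing.

```python
s = {"a", "b", "c"}
removed = []

while s:
    removed.append(s.pop())  # Each call removes one arbitrary element.

print(sorted(removed))  # ['a', 'b', 'c']

try:
    s.pop()  # The set is empty at this point.
except KeyError:
    print("cannot pop from an empty set")
```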

To get the number of elements we use the well-known len() function:

	`>>> len({1, 2, 3, 4})`
	`4`

Main Operations

Some of the most interesting properties of sets lie in their main operations: union, intersection and difference.

The union is performed with the | character and returns a set containing the elements found in at least one of the two sets involved in the operation.

	`>>> a = {1, 2, 3, 4}`
	`>>> b = {3, 4, 5, 6}`
	`>>> a | b`
	`{1, 2, 3, 4, 5, 6}`

The intersection works analogously, but with the & operator, and returns a new set with the elements found in both sets.

	`>>> a & b`
	`{3, 4}`

The difference is computed with the - operator and returns a new set containing the elements of a that are not in b.

	`>>> a = {1, 2, 3, 4}`
	`>>> b = {2, 3}`
	`>>> a - b`
	`{1, 4}`
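
Each of these operators also has a method counterpart: union(), intersection() and difference(). One practical difference is that the methods accept any iterable, whereas the operators require a set on both sides. A short sketch:

```python
a = {1, 2, 3, 4}

# The methods accept any iterable (a list, a range, a tuple...).
print(a.union([4, 5]) == {1, 2, 3, 4, 5})      # True
print(a.intersection(range(3, 10)) == {3, 4})  # True
print(a.difference((1, 2)) == {3, 4})          # True

# The operators, on the other hand, only work between sets.
try:
    a | [4, 5]
except TypeError:
    print("| needs a set on both sides")
```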

Two sets are equal if and only if they contain the same elements (this is known as extensionality):

	`>>> {1, 2, 3} == {3, 2, 1}`
	`True`
	`>>> {1, 2, 3} == {4, 5, 6}`
	`False`

Other Operations

B is said to be a subset of A when all the elements of the former also belong to the latter. Python can determine this relationship via the issubset() method.

	`>>> a = {1, 2, 3, 4}`
	`>>> b = {2, 3}`
	`>>> b.issubset(a)`
	`True`

Conversely, A is said to be a superset of B.

	`>>> a.issuperset(b)`
	`True`

The definition of these two relations leads us to conclude that every set is at the same time a subset and a superset of itself.

	`>>> a = {1, 2, 3, 4}`
	`>>> a.issubset(a)`
	`True`
	`>>> a.issuperset(a)`
	`True`
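
The same relations can be written with comparison operators: <= for subset and >= for superset. The strict versions, < and >, additionally require the two sets to be different (a proper subset or superset). A brief sketch:

```python
a = {1, 2, 3, 4}
b = {2, 3}

print(b <= a)  # True, same as b.issubset(a)
print(a >= b)  # True, same as a.issuperset(b)
print(a <= a)  # True, every set is a subset of itself
print(a < a)   # False, but not a proper subset of itself
print(b < a)   # True, b is a proper subset of a
```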

The symmetric difference returns a new set which contains those elements which belong to one of the two sets that participate in the operation but not to both. It could be understood as an exclusive union.

	`>>> a = {1, 2, 3, 4}`
	`>>> b = {3, 4, 5, 6}`
	`>>> a.symmetric_difference(b)`
	`{1, 2, 5, 6}`

Given this definition, it follows that the operation is commutative, that is, the order of the operands does not matter:

	`>>> b.symmetric_difference(a)`
	`{1, 2, 5, 6}`
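
The symmetric difference is also available through the ^ operator, and it can be expressed in terms of the main operations as the union minus the intersection. A small sketch checking both facts:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

print(a ^ b == a.symmetric_difference(b))  # True
print(a ^ b == (a | b) - (a & b))          # True
print(a ^ b == b ^ a)                      # True
```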

Finally, two sets are said to be disjoint if they have no elements in common.

	`>>> a = {1, 2, 3}`
	`>>> b = {3, 4, 5}`
	`>>> c = {5, 6, 7}`
	`>>> a.isdisjoint(b)  # Not disjoint: they have at least one element (3) in common.`
	`False`
	`>>> a.isdisjoint(c)  # a and c have no common elements.`
	`True`

In other words, two sets are disjoint if their intersection is the empty set, so it can be illustrated as follows:

	`>>> def isdisjoint(a, b):`
	`...     return a & b == set()`
	`...`
	`>>> isdisjoint(a, b)`
	`False`
	`>>> isdisjoint(a, c)`
	`True`

Immutable Sets

frozenset is an implementation similar to set, but immutable. That is, it shares all the set operations provided in this post except for those that involve altering its elements (add(), discard(), etc.). The difference is analogous to that between a list and a tuple.

	`>>> a = frozenset({1, 2, 3})`
	`>>> b = frozenset({3, 4, 5})`
	`>>> a & b`
	`frozenset({3})`
	`>>> a | b`
	`frozenset({1, 2, 3, 4, 5})`
	`>>> a.isdisjoint(b)`
	`False`

This makes it possible, for example, to use sets as keys in dictionaries:

	`>>> a = {1, 2, 3}`
	`>>> b = frozenset(a)`
	`>>> {a: 1}`
	`Traceback (most recent call last):`
	`...`
	`TypeError: unhashable type: 'set'`
	`>>> {b: 1}`
	`{frozenset({1, 2, 3}): 1}`
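
For the same reason, a frozenset can be a member of another set, which gives us a "set of sets". A minimal sketch (the name groups is illustrative):

```python
# Two of the three frozensets are equal, so only two members remain.
groups = {frozenset({1, 2}), frozenset({2, 1}), frozenset({3})}

print(len(groups))                  # 2
print(frozenset({3}) in groups)     # True
print(frozenset({1, 2}) == {1, 2})  # True: equality only looks at the elements
```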

Examples

What real use cases do sets have? Consider a program that asks the user to enter several integers and outputs those which are prime.

	`# Request user input.`
	`numbers = input("Enter numbers separated by spaces: ")`
	`# Convert to a list of integers.`
	`numbers = [int(n) for n in numbers.split(" ")]`

Now, using the get_prime_numbers() function designed in a previous post to get prime numbers, the traditional solution (using lists) would look something like this:

	`numbers = input("Enter numbers separated by spaces: ")`
	`numbers = [int(n) for n in numbers.split(" ")]`
	`# Compute the primes once, not once per element.`
	`primes = get_prime_numbers(max(numbers))`
	`prime_numbers = [n for n in numbers if n in primes]`

However, if we work with sets, the solution is even shorter and more efficient:

	`numbers = input("Enter numbers separated by spaces: ")`
	`# We use a set comprehension instead.`
	`numbers = {int(n) for n in numbers.split(" ")}`
	`# Then just apply the intersection.`
	`prime_numbers = numbers & get_prime_numbers(max(numbers))`

For this to work we need to make sure that the get_prime_numbers() function returns a set by replacing the last line with a set comprehension (instead of a list comprehension):

	`def get_prime_numbers(max_number):`
	`    numbers = [True, True] + [True] * (max_number-1)`
	`    last_prime_number = 2`
	`    i = last_prime_number`
	`    while last_prime_number**2 <= max_number:`
	`        i += last_prime_number`
	`        while i <= max_number:`
	`            numbers[i] = False`
	`            i += last_prime_number`
	`        j = last_prime_number + 1`
	`        while j < max_number:`
	`            if numbers[j]:`
	`                last_prime_number = j`
	`                break`
	`            j += 1`
	`        i = last_prime_number`
	`    return {i + 2 for i, not_crossed in enumerate(numbers[2:]) if not_crossed}`
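
To try the whole idea without typing anything, here is a self-contained sketch. Two things in it are assumptions made for the example: primes_up_to() is a hypothetical, simplified stand-in for get_prime_numbers() (it is not the function from the previous post), and the hard-coded string raw takes the place of the input() call.

```python
def primes_up_to(limit):
    """Hypothetical stand-in: return the set of primes <= limit."""
    if limit < 2:
        return set()
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # Cross out every multiple of p, starting at p squared.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
    return {n for n, flag in enumerate(is_prime) if flag}


raw = "7 10 13 10 4 2"  # Stands in for the user's input.
numbers = {int(n) for n in raw.split()}

# The intersection keeps only the entered numbers that are prime.
prime_numbers = numbers & primes_up_to(max(numbers))

print(sorted(prime_numbers))  # [2, 7, 13]
```

Note that the repeated 10 in the input is dropped as soon as the numbers are collected into a set.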
