When to use classes in Python? When your functions take the same arguments

May 2021 ∙ five minute read

Are you having trouble figuring out when to use classes or how to organize them?

Have you repeatedly searched for "when to use classes in Python", read all the articles and watched all the talks, and still don't know whether you should be using classes in any given situation?

Have you read discussions that, for all you know, may be right, but are so academic you can't parse the jargon?

Have you read articles that all treat the "obvious" cases, leaving you with no clear answer when you try to apply them to your own code?

My experience is that, unfortunately, the best way to learn this is to look at lots of examples.

Most guidelines tend to be either too vague if you don't already know enough about the subject, or so specific that they only tell you things you already know.

This is one of those things that seems obvious and intuitive once you get it, but it isn't, and it's quite difficult to explain properly.

So, instead of prescribing a general approach, let's look at:

  • one specific case where you may want to use classes
  • examples from real-world code
  • some considerations you should keep in mind

The heuristic #

If you have functions that take the same set of arguments, consider using a class.

That's it.

In its most basic form, a class groups data with the functions that operate on that data; it doesn't have to represent a real ("business") object, it can be an abstract object that exists only to make things easier to use and understand.


As Wikipedia puts it, "A heuristic is a practical way to solve a problem. It is better than chance, but does not always work. A person develops a heuristic by using intelligence, experience, and common sense."

So, this is not the correct thing to do all the time, or even most of the time.

Instead, I hope that this and other heuristics can help build the right intuition for people on their way from "I know the class syntax, now what?" to "proper" object-oriented design.

Example: HighlightedString #

My feed reader library supports searching articles. The results include article snippets, and which parts of the snippet actually matched.

To highlight the matches (say, on a web page), we write a function that takes a string and a list of slices (see note 1), and adds before/after markers to the parts inside the slices:

>>> value = 'water on mars'
>>> highlights = [slice(9, 13)]
>>> apply_highlights(value, highlights, '<b>', '</b>')
'water on <b>mars</b>'

While writing it, we pull part of the logic into a helper that splits the string such that highlights always have odd indices. We don't have to, but it's easier to reason about problems one at a time.

>>> list(split_highlights(value, highlights))
['water on ', 'mars', '']
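For reference, here is a minimal sketch of how these two functions could look. It is not the library's actual code; it assumes the highlights are already valid and non-overlapping, and the sorting by start is my own addition.

```python
def split_highlights(value, highlights):
    """Yield alternating plain / highlighted pieces of value.

    Highlighted pieces always land on odd indices. Assumes the
    highlights are valid and non-overlapping (no validation here).
    """
    start = 0
    for highlight in sorted(highlights, key=lambda s: s.start):
        yield value[start:highlight.start]  # plain text before the match
        yield value[highlight]              # the match itself
        start = highlight.stop
    yield value[start:]                     # trailing plain text, may be ''


def apply_highlights(value, highlights, before, after):
    """Wrap each highlighted piece in before/after markers."""
    parts = []
    for index, part in enumerate(split_highlights(value, highlights)):
        if index % 2 == 1:  # odd index: a highlighted piece
            part = before + part + after
        parts.append(part)
    return ''.join(parts)
```

Note that both functions take the same value and highlights arguments, which is exactly the situation the heuristic is about.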

To make things easier, we only allow non-overlapping slices with non-negative start/stop and no step. We pull this logic into another function that raises an exception for bad slices.

>>> validate_highlights(value, highlights)  # no exception
>>> validate_highlights(value, [slice(6, 10), slice(9, 13)])
Traceback (most recent call last):
ValueError: highlights must not overlap: slice(6, 10, None), slice(9, 13, None)
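A sketch of what such a validation function might do, under the rules stated above. The exact checks and most of the messages are my assumptions, not necessarily what the real implementation uses.

```python
def validate_highlights(value, highlights):
    """Raise ValueError if any highlight is not a simple, in-order slice."""
    for highlight in highlights:
        if highlight.start is None or highlight.stop is None:
            raise ValueError(f"start and stop must not be None: {highlight}")
        if highlight.step is not None:
            raise ValueError(f"step must be None: {highlight}")
        if highlight.start < 0 or highlight.stop < 0:
            raise ValueError(f"start and stop must not be negative: {highlight}")
        if highlight.start > highlight.stop:
            raise ValueError(f"start must not be greater than stop: {highlight}")
        if highlight.stop > len(value):
            raise ValueError(f"stop must not exceed the string length: {highlight}")

    # Compare each highlight with the next one, in order of start.
    ordered = sorted(highlights, key=lambda s: s.start)
    for previous, current in zip(ordered, ordered[1:]):
        if previous.stop > current.start:
            raise ValueError(f"highlights must not overlap: {previous}, {current}")
```

This is a third function taking the same value and highlights pair.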

Quiz: Which function should call validate_highlights()? Both? The user?

Instead of separate functions, we can write a HighlightedString class with:

  • value and highlights as attributes
  • apply() and split() as methods
  • the validation happening in __init__

>>> string = HighlightedString('water on mars', [slice(9, 13)])
>>> string.value
'water on mars'
>>> string.highlights
(slice(9, 13, None),)
>>> string.apply('<b>', '</b>')
'water on <b>mars</b>'
>>> list(string.split())
['water on ', 'mars', '']
>>> HighlightedString('water on mars', [slice(13, 9)])
Traceback (most recent call last):
ValueError: invalid highlight: start must be not be greater than stop: slice(13, 9, None)

This essentially bundles data and behavior.
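To make the shape concrete, here is a simplified sketch of such a class, written as a plain class with the validation in __init__. It is an illustration under my own assumptions (the checks and error messages in particular), not the real implementation.

```python
class HighlightedString:
    """A string plus the slices of it that should be highlighted."""

    def __init__(self, value, highlights=()):
        highlights = tuple(highlights)
        # Validate once, up front, so the methods can assume a valid object.
        for highlight in highlights:
            if highlight.start is None or highlight.stop is None:
                raise ValueError(f"invalid highlight: start and stop must not be None: {highlight}")
            if highlight.step is not None:
                raise ValueError(f"invalid highlight: step must be None: {highlight}")
            if highlight.start < 0 or highlight.stop < 0:
                raise ValueError(f"invalid highlight: start and stop must not be negative: {highlight}")
            if highlight.start > highlight.stop:
                raise ValueError(f"invalid highlight: start must not be greater than stop: {highlight}")
        ordered = sorted(highlights, key=lambda s: s.start)
        for previous, current in zip(ordered, ordered[1:]):
            if previous.stop > current.start:
                raise ValueError(f"highlights must not overlap: {previous}, {current}")
        self.value = value
        self.highlights = highlights

    def split(self):
        """Yield pieces of the string; highlighted ones have odd indices."""
        start = 0
        for highlight in sorted(self.highlights, key=lambda s: s.start):
            yield self.value[start:highlight.start]
            yield self.value[highlight]
            start = highlight.stop
        yield self.value[start:]

    def apply(self, before, after):
        """Return the string with markers around each highlighted piece."""
        return ''.join(
            before + part + after if index % 2 else part
            for index, part in enumerate(self.split())
        )
```

The quiz above now has an easy answer: nobody has to remember to call the validation, because an instance that exists has already passed it.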

You may ask: I can do any number of things with a string and some slices, why this behavior specifically? Because, in this context, this behavior is generally useful.

Besides being shorter to use, a class:

  • shows intent: this isn't just a string and some slices, it's a highlighted string
  • makes it easier to discover what actions are possible (help(), code completion)
  • makes code cleaner; __init__ validation ensures invalid objects cannot exist; thus, the methods don't have to validate anything themselves

Caveat: attribute changes are confusing #

Let's say we pass a highlighted string to a function that writes the results in a text file, and after that we do some other stuff with it.

What would you think if this happened?

>>> string.apply('<b>', '</b>')
'water on <b>mars</b>'
>>> render_results_page('output.txt', titles=[string])
>>> string.apply('<b>', '</b>')
'<b>water</b> on mars'

You may think it's quite unexpected; I know I would. Either intentionally or by mistake, render_results_page() seems to have changed our highlights, when it was supposed to just render the results.
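Here is one way this could come about, as a contrived sketch. MutableHighlightedString is a deliberately mutable stand-in I made up for illustration, and this render_results_page() is hypothetical too; it writes to a file-like object rather than a path to keep the example self-contained.

```python
import io


class MutableHighlightedString:
    """Deliberately mutable stand-in, for illustration only."""

    def __init__(self, value, highlights):
        self.value = value
        self.highlights = list(highlights)  # a list, so it can be changed in place

    def apply(self, before, after):
        parts, start = [], 0
        for highlight in self.highlights:
            parts += [self.value[start:highlight.start], before, self.value[highlight], after]
            start = highlight.stop
        parts.append(self.value[start:])
        return ''.join(parts)


def render_results_page(file, titles):
    for title in titles:
        # Bug: "temporarily" emphasizing the first word modifies the
        # caller's object instead of working on a copy.
        title.highlights[:] = [slice(0, 5)]
        file.write(title.apply('*', '*') + '\n')


string = MutableHighlightedString('water on mars', [slice(9, 13)])
before_render = string.apply('<b>', '</b>')  # 'water on <b>mars</b>'
render_results_page(io.StringIO(), titles=[string])
after_render = string.apply('<b>', '</b>')   # '<b>water</b> on mars'
```

Nothing in the call to render_results_page() hints that the argument will be different afterwards, which is what makes this kind of bug hard to spot.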

That's OK, mistakes happen. But how can we prevent it from happening in the future?

Solution: make the class immutable #

Well, in the real implementation, this mistake can't happen.

HighlightedString is a frozen dataclass, so its attributes are read-only; also, highlights is stored as a tuple, which is immutable as well:

>>> string.highlights = [slice(0, 5)]
Traceback (most recent call last):
dataclasses.FrozenInstanceError: cannot assign to field 'highlights'
>>> string.highlights[:] = [slice(0, 5)]
Traceback (most recent call last):
TypeError: 'tuple' object does not support item assignment
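A minimal sketch of the idea, leaving out the methods and the validation. The field defaults and the use of __post_init__ are my assumptions about how one might write it, not a copy of the real class.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class HighlightedString:
    value: str = ''
    highlights: tuple = ()

    def __post_init__(self):
        # A frozen dataclass blocks normal attribute assignment, even
        # here, so the list-to-tuple conversion uses object.__setattr__.
        object.__setattr__(self, 'highlights', tuple(self.highlights))
        # Validation of the highlights would go here as well.
```

With frozen=True, both rebinding an attribute and changing the highlights in place fail loudly instead of silently altering an object someone else is still using.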

You can find this pattern in werkzeug.datastructures, which contains HTTP-flavored subclasses of common Python objects. For example, Accept (see note 2) is an immutable list:

>>> accept = Accept([('image/png', 1)])
>>> accept[0]
('image/png', 1)
>>> accept.append(('image/gif', 1))
Traceback (most recent call last):
TypeError: 'Accept' objects are immutable
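The general technique can be sketched as a list subclass whose mutating methods all raise. This is only an illustration of the idea with a made-up ImmutableList name, not Werkzeug's actual code.

```python
class ImmutableList(list):
    """A list that can be read like any other, but not changed."""

    def _immutable(self, *args, **kwargs):
        raise TypeError(f"{type(self).__name__!r} objects are immutable")

    # Point every method that changes the list at the same error.
    append = extend = insert = pop = remove = _immutable
    clear = sort = reverse = _immutable
    __setitem__ = __delitem__ = __iadd__ = __imul__ = _immutable
```

Reading (indexing, iteration, len()) is inherited from list unchanged; only the methods that would change the contents are replaced.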

Try it out #

If you're doing something and you think you need a class, do it and see how it looks. If you think it's better, keep it; otherwise, revert the change. You can always switch in either direction later.

If you got it right the first time, great! If not, by having to fix it you'll learn something, and next time you'll know better.

Also, don't beat yourself up.

Sure, there are nice libraries out there that use classes in just the right way, after spending lots of time to find the right abstraction. But abstraction is difficult and time-consuming, and in everyday code, good enough is just that: good enough. You don't need to go to the extreme.


Update: I wrote an article about exceptions to this heuristic (that is, when functions with the same arguments don't necessarily make a class).

That's it for now.


  1. A slice is an object Python uses internally for the extended indexing syntax; thing[9:13] and thing[slice(9, 13)] are equivalent.

  2. You may have used Accept yourself: the request.accept_* attributes on Flask's request global are all Accept instances.
