When your functions take the same arguments, consider using a class: counter-examples

2021-06-18 ∙ six minute read

In a previous article, I talked about this heuristic for using classes in Python:

If you have functions that take the same set of arguments, consider using a class.

Thing is, heuristics don't always work.

To make the most out of them, it helps to know what the exceptions are.

So, let's look at a few real-world examples where functions taking the same arguments don't necessarily make a class.

Counter-example: two sets of arguments

Consider the following scenario:

We have a feed reader web application. It shows a list of feeds and a list of entries (articles), filtered in various ways.

Because we want to do the same thing from the command line, we pull database-specific logic into functions in a separate module. The functions take a database connection and other arguments, query the database, and return the results.

def get_entries(db, feed=None, read=None, important=None): ...
def get_entry_counts(db, feed=None, read=None, important=None): ...
def search_entries(db, query, feed=None, read=None, important=None): ...
def get_feeds(db): ...

The main usage pattern is: at the start of the program, connect to the database; depending on user input, repeatedly call the functions with the same connection, but different options.
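As a rough illustration of that usage pattern, here is a minimal sketch of two of the functions. It assumes SQLite and a made-up schema (a `feeds` table and an `entries` table with `feed`, `title`, `read` and `important` columns); the real queries would look different.

```python
import sqlite3


def get_feeds(db):
    # Assumed schema: a `feeds` table with a `url` column.
    return [row[0] for row in db.execute("SELECT url FROM feeds ORDER BY url")]


def get_entries(db, feed=None, read=None, important=None):
    # Assumed schema: entries(feed, title, read, important).
    # Only the options that were actually given become WHERE conditions.
    query = "SELECT title FROM entries WHERE 1=1"
    params = []
    for column, value in [("feed", feed), ("read", read), ("important", important)]:
        if value is not None:
            query += f" AND {column} = ?"
            params.append(value)
    query += " ORDER BY title"
    return [row[0] for row in db.execute(query, params)]


# Connect once, at the start of the program...
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE feeds (url TEXT)")
db.execute(
    "CREATE TABLE entries (feed TEXT, title TEXT, read INTEGER, important INTEGER)"
)
db.executemany("INSERT INTO feeds VALUES (?)", [("a",), ("b",)])
db.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?)",
    [("a", "one", 1, 0), ("a", "two", 0, 1), ("b", "three", 0, 0)],
)

# ...then call the functions repeatedly: same connection, different options.
feeds = get_feeds(db)
unread = get_entries(db, read=False)
important_in_a = get_entries(db, feed="a", important=True)
```

Note how `db` is the one argument that never changes between calls, while the filters change every time.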


Taking the heuristic to the extreme, we end up with this:

class Storage:

    def __init__(self, db, feed=None, read=None, important=None):
        self._db = db
        self._feed = feed
        self._read = read
        self._important = important

    def get_entries(self): ...
    def get_entry_counts(self): ...
    def search_entries(self, query): ...
    def get_feeds(self): ...

This is not very useful: every time we change the options, we need to create a new Storage object (or worse, have a single one and change its attributes). Also, get_feeds() doesn't even use them – but somehow leaving it out seems just as bad.
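To make the awkwardness concrete, here is a small sketch of what calling code ends up looking like. The "database" is just a list of dicts standing in for a real connection; that, and the entry fields, are assumptions made for the sake of a runnable example.

```python
class Storage:
    """The 'extreme' version: connection and filters all live on the instance."""

    def __init__(self, db, feed=None, read=None, important=None):
        self._db = db
        self._feed = feed
        self._read = read
        self._important = important

    def get_entries(self):
        # Stand-in for a real query: here `db` is a plain list of dicts.
        return [
            e
            for e in self._db
            if (self._feed is None or e["feed"] == self._feed)
            and (self._read is None or e["read"] == self._read)
            and (self._important is None or e["important"] == self._important)
        ]


db = [
    {"feed": "a", "title": "one", "read": True, "important": False},
    {"feed": "a", "title": "two", "read": False, "important": True},
    {"feed": "b", "title": "three", "read": False, "important": False},
]

# Every change of options means building another object around the same db.
unread = Storage(db, read=False).get_entries()
unread_in_a = Storage(db, feed="a", read=False).get_entries()
```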

What's missing is a bit of nuance: there isn't one set of arguments, there are two, and one of them changes more often than the other.

Let's take care of the obvious one first.

The database connection changes least often, so it makes sense to keep it on the storage, and pass a storage object around:

class Storage:

    def __init__(self, db):
        self._db = db

    def get_entries(self, feed=None, read=None, important=None): ...
    def get_entry_counts(self, feed=None, read=None, important=None): ...
    def search_entries(self, query, feed=None, read=None, important=None): ...
    def get_feeds(self): ...

The most important benefit of this is that it abstracts the database from the code using it, allowing you to have more than one kind of storage.

Want to store entries as files on disk? Write a FileStorage class that reads them from there. Want to test your application with various combinations of made-up entries? Write a MockStorage class that keeps the entries in a list, in memory. Whoever calls get_entries() or search_entries() doesn't have to know or care where the entries are coming from or how the search is implemented.
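For example, an in-memory storage might look something like the sketch below. Representing entries as dicts with `feed`, `title`, `read` and `important` keys is an assumption, as are the `_filter()` helper and the naive title search; the point is only that the caller never touches a database.

```python
class MockStorage:
    """In-memory stand-in exposing the same method names as Storage."""

    def __init__(self, entries):
        self._entries = list(entries)

    def _filter(self, feed=None, read=None, important=None):
        # Hypothetical helper: skip entries that fail any option that was given.
        for entry in self._entries:
            if feed is not None and entry["feed"] != feed:
                continue
            if read is not None and entry["read"] != read:
                continue
            if important is not None and entry["important"] != important:
                continue
            yield entry

    def get_entries(self, feed=None, read=None, important=None):
        return list(self._filter(feed, read, important))

    def search_entries(self, query, feed=None, read=None, important=None):
        # Naive substring match on the title; a real storage could do better.
        return [
            e
            for e in self._filter(feed, read, important)
            if query.lower() in e["title"].lower()
        ]

    def get_feeds(self):
        return sorted({e["feed"] for e in self._entries})


def unread_titles(storage):
    # Caller code depends only on the method, not on where entries live.
    return [e["title"] for e in storage.get_entries(read=False)]


storage = MockStorage(
    [
        {"feed": "a", "title": "Hello", "read": True, "important": False},
        {"feed": "b", "title": "World", "read": False, "important": True},
    ]
)
titles = unread_titles(storage)
```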

This is the data access object design pattern. In object-oriented programming terminology, a DAO provides an abstract interface that encapsulates a persistence mechanism.


OK, the above looks just about right to me – I wouldn't really change anything else.

Some arguments are still repeating, but it's useful repetition: once a user learns to filter entries with one method, they can do it with any of them. Also, people use different arguments at different times; from their perspective, it's not really repetition.

And anyway, we're already using a class...

Counter-example: data classes

Let's add more requirements.

There's more functionality beyond storing things, and we have multiple users for that as well (web app, CLI, someone using our code as a library). So we leave Storage to do only storage, and wrap it in a Reader object that has a storage:

class Reader:

    def __init__(self, storage):
        self._storage = storage

    def get_entries(self, feed=None, read=None, important=None):
        return self._storage.get_entries(feed=feed, read=read, important=important)

    ...

    def update_feeds(self):
        # calls various storage methods multiple times:
        # get feeds to be retrieved from storage,
        # store new/modified entries
        ...

Now, the main caller of Storage.get_entries() is Reader.get_entries(). Furthermore, the filter arguments are rarely used directly by storage methods; most of the time, they're passed to helper functions:

class Storage:

    def get_entries(self, feed=None, read=None, important=None):
        query = make_get_entries_query(feed=feed, read=read, important=important)
        ...

Problem: When we add a new entry filter option, we have to change the Reader methods, the Storage methods, and the helpers. And it's likely we'll do so in the future.

Solution: Group the arguments in a class that contains only data.

from typing import NamedTuple, Optional

class EntryFilterOptions(NamedTuple):
    feed: Optional[str] = None
    read: Optional[bool] = None
    important: Optional[bool] = None

class Storage:

    ...

    def get_entries(self, filter_options):
        query = make_get_entries_query(filter_options)
        ...

    def get_entry_counts(self, filter_options): ...
    def search_entries(self, query, filter_options): ...
    def get_feeds(self): ...

Now, regardless of how much they're passed around, there are only two places where it matters what the options are:

  • in a Reader method, which builds the EntryFilterOptions object
  • where they get used, either a helper or a Storage method
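The first of those two places might look roughly like this. It is a sketch rather than the actual library code: `RecordingStorage` is a hypothetical stand-in that only remembers what it was given, so we can see what crosses the boundary.

```python
from typing import NamedTuple, Optional


class EntryFilterOptions(NamedTuple):
    feed: Optional[str] = None
    read: Optional[bool] = None
    important: Optional[bool] = None


class RecordingStorage:
    """Hypothetical stand-in that just remembers the options it receives."""

    def __init__(self):
        self.calls = []

    def get_entries(self, filter_options):
        self.calls.append(filter_options)
        return []


class Reader:

    def __init__(self, storage):
        self._storage = storage

    def get_entries(self, feed=None, read=None, important=None):
        # The one place where individual options are named on the way in;
        # from here on, they travel as a single object.
        filter_options = EntryFilterOptions(feed=feed, read=read, important=important)
        return self._storage.get_entries(filter_options)


storage = RecordingStorage()
reader = Reader(storage)
reader.get_entries(read=False)
```

With this shape, a new filter option touches the Reader signature, EntryFilterOptions, and whatever finally consumes it, but nothing in between.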

Note that while we're using the Python class syntax, EntryFilterOptions is not a class in the traditional object-oriented programming sense, since it has no behavior.[1] Sometimes, these are known as "passive data structures" or "plain old data".

A plain class or a dataclass would have been a decent choice as well; why I chose a named tuple is a discussion for another article.

I used type hints because it's a cheap way of documenting the options, but you don't have to, not even for dataclasses.
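For comparison, here is a sketch of what the same options object could look like with looser typing. The name `EntryFilterOptionsData` is made up to keep the two variants apart. Note that a dataclass still needs an annotation for a name to be treated as a field, but the annotation can be as uninformative as `Any`.

```python
from collections import namedtuple
from dataclasses import dataclass
from typing import Any

# No annotations at all; `defaults` applies to the rightmost fields (here, all three).
EntryFilterOptions = namedtuple(
    "EntryFilterOptions",
    ["feed", "read", "important"],
    defaults=(None, None, None),
)


# Dataclass fields are found through annotations, but they don't have to say much.
@dataclass(frozen=True)
class EntryFilterOptionsData:
    feed: Any = None
    read: Any = None
    important: Any = None


options = EntryFilterOptions(read=True)
data_options = EntryFilterOptionsData(read=True)
```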

The example above is a simplified version of the code in my feed reader library. In the real world, EntryFilterOptions has more options (with more on the way), and the Reader and Storage get_entries() are a bit more complicated.

Another real-world example of this pattern is Requests:


That's pretty much it for now – hang around for some extra stuff, though ;)

I hope I managed to add more nuance to the original article, and that you're now at least a little bit better equipped to use classes. Keep in mind that this is more an art than a science, and that you can always change your mind later.

Learned something new today? Share this with others, it really helps!


Bonus: other alternatives

Still here? Cool!

Let's look at some of the other options I considered, and why I didn't go that way.

Why not a dict?

Instead of defining a whole new class, we could've used a dict:

{'feed': ..., 'read': ..., 'important': ...}

But this has a number of drawbacks:

  • Dicts are not type-checked. TypedDict helps, but doesn't prevent using the wrong keys at runtime.
  • Dicts break code completion. TypedDict may help with smarter tools like PyCharm, but doesn't in interactive mode or IPython.
  • Dicts are mutable. For our use case, immutability is a plus: the options don't have much reason to change, so it's useful to disallow it.
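A quick sketch of the first and last points, using a deliberately misspelled key (`raed`) as the example mistake:

```python
from typing import NamedTuple, Optional


class EntryFilterOptions(NamedTuple):
    feed: Optional[str] = None
    read: Optional[bool] = None
    important: Optional[bool] = None


# A misspelled key in a dict is accepted without complaint...
dict_options = {"feed": None, "raed": True, "important": None}

# ...while the named tuple rejects an unknown field as soon as it is built.
try:
    EntryFilterOptions(raed=True)
except TypeError as e:
    typo_error = e

options = EntryFilterOptions(read=True)

# Anyone holding a reference can change a dict; a named tuple refuses.
dict_options["important"] = True
try:
    options.important = True
except AttributeError as e:
    mutation_error = e
```

The dict version fails later, if at all, somewhere far from where the typo was made.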

Why not **kwargs?

Why not pass **kwargs directly to EntryFilterOptions?

class Reader:
    def get_entries(self, **kwargs):
        return self._storage.get_entries(EntryFilterOptions(**kwargs))

Because:

  • It also breaks code completion.
  • It makes the code less self-documenting: you don't know what arguments get_entries() takes, even if you read the source. Presumably, they're in the docstring, but not everybody writes one all the time.
  • If we introduce another options object (say, for pagination), we still have to write code to split the kwargs between the two.
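To give an idea of what that splitting code might look like, here is a sketch. `PaginationOptions`, its fields, and `split_kwargs()` are all hypothetical, invented for this example.

```python
from typing import NamedTuple, Optional


class EntryFilterOptions(NamedTuple):
    feed: Optional[str] = None
    read: Optional[bool] = None
    important: Optional[bool] = None


class PaginationOptions(NamedTuple):
    # Hypothetical second options object; the field names are made up.
    limit: Optional[int] = None
    starting_after: Optional[str] = None


def split_kwargs(kwargs):
    """Sort keyword arguments into the two options objects by field name."""
    filter_kwargs = {}
    pagination_kwargs = {}
    for name, value in kwargs.items():
        if name in EntryFilterOptions._fields:
            filter_kwargs[name] = value
        elif name in PaginationOptions._fields:
            pagination_kwargs[name] = value
        else:
            # With **kwargs, rejecting unknown names becomes our job too.
            raise TypeError(f"unexpected keyword argument {name!r}")
    return EntryFilterOptions(**filter_kwargs), PaginationOptions(**pagination_kwargs)


filter_options, pagination = split_kwargs({"read": False, "limit": 10})
```

With explicit keyword arguments in the method signature, none of this bookkeeping is needed.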

Why not EntryFilterOptions?

Why not take an EntryFilterOptions directly, then?

from reader import make_reader, EntryFilterOptions
reader = make_reader(...)
options = EntryFilterOptions(read=True)
entries = reader.get_entries(options)

Because it makes things verbose for the user: they have to import EntryFilterOptions, and build and pass one to get_entries() for every call. That's not very friendly.

The Reader and Storage method signatures differ because they're used differently:

  • Reader methods are mostly called by external users in many ways
  • Storage methods are mostly called by internal users (Reader) in a few ways

  1. Ted Kaminski discusses this distinction in more detail in "Data, objects, and how we're railroaded into poor design".

