namedtuple has been around since forever, and over time, its convenience saw it used far outside its originally intended purpose.
With dataclasses now covering part of those use cases, what should one use named tuples for?
In this article, we take a look at exactly that, with a few examples from real code.
What are named tuples used for?
namedtuple has been in the standard library since Python 2.6, and allows building tuple subclasses whose fields are also accessible by attribute lookup.
>>> from collections import namedtuple
>>> Point = namedtuple('Point', 'x y')
In general, this is useful when wrapping structured data; from the docs:
Named tuples are especially useful for assigning field names to result tuples returned by the csv or sqlite3 modules.
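As a minimal sketch of the csv case from the docs quote, the header row can be used to build the named tuple type, and each following row converted with `_make()`. The CSV data and the `Person` name are made up for illustration.

```python
import csv
import io
from collections import namedtuple

# Hypothetical CSV data; any file-like object would work the same way.
data = io.StringIO("name,age\nAlice,30\nBob,25\n")

reader = csv.reader(data)
# Build the named tuple type from the header row.
Person = namedtuple('Person', next(reader))

# Each remaining row is a list of strings; _make() turns it into a Person.
people = [Person._make(row) for row in reader]

for person in people:
    print(person.name, person.age)
```

Note that csv yields strings, so `person.age` is `'30'`, not `30`; converting types is left to the caller.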
Because of how easy they are to define, named tuples have also been used for:
- quick-and-dirty temporary data structures, more readable than plain tuples and regular classes (you get constructor keyword arguments and a readable repr for free)
- hashable instances (to use as dict keys or set members, or as arguments to functions decorated with e.g. functools.lru_cache)
- immutable instances
dataclasses was added in Python 3.7, and allows writing regular classes just as easily, by generating the required special methods. With frozen instances, it even covers hashable and immutable instances.
Before dataclasses, named tuples were used for the last three use cases because there were no other good alternatives in the standard library – you can do it with a normal class definition, but you have to write all the special methods by hand.
In case you've never used them, here's a comparison.
>>> from typing import NamedTuple
>>> class Point(NamedTuple):
...     x: int
...     y: int
...
>>> p = Point(1, y=2)
>>> p
Point(x=1, y=2)
>>> p.x
1
>>> p[0]
1
>>> list(p)
[1, 2]
>>> from dataclasses import dataclass
>>> @dataclass
... class Point:
...     x: int
...     y: int
...
>>> p = Point(1, y=2)
>>> p
Point(x=1, y=2)
>>> p.x
1
>>> p[0]
Traceback (most recent call last):
  ...
TypeError: 'Point' object is not subscriptable
>>> list(p)
Traceback (most recent call last):
  ...
TypeError: 'Point' object is not iterable
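The frozen variant mentioned earlier can be sketched like this; `FrozenPoint` and `labels` are illustrative names, not from any library. With `frozen=True` and the default `eq=True`, dataclasses generates a `__hash__()` and blocks attribute assignment.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class FrozenPoint:
    x: int
    y: int

p = FrozenPoint(1, 2)

# Equal instances hash the same, so they work as dict keys or set members.
labels = {p: 'start'}
print(labels[FrozenPoint(1, 2)])

# Assigning to a field raises FrozenInstanceError (an AttributeError subclass).
try:
    p.x = 10
except FrozenInstanceError:
    print("can't modify a frozen instance")
```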
The problems with named tuples
- The instances are always iterable; this can make it difficult to add fields, because adding a new field will break code that uses unpacking.
- Also, if used as a return value in a backwards-compatible API, it means the result must remain iterable/indexable forever, even if you later stop using namedtuple.
- Instances can be accidentally compared with any other tuple.
- There's no mutable version (in the standard library).
- Fields can't be combined by inheritance.
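Two of these problems are easy to reproduce. The sketch below uses made-up types (`Point`, `Size`, `Point3`) and a made-up `norm_squared()` consumer, purely for illustration.

```python
from collections import namedtuple

# Hypothetical types, for illustration only.
Point = namedtuple('Point', 'x y')
Size = namedtuple('Size', 'width height')

# Unrelated named tuples (and plain tuples) compare equal
# if they happen to hold the same values.
print(Point(1, 2) == Size(1, 2))   # True
print(Point(1, 2) == (1, 2))       # True

# Code written against the two-field version...
def norm_squared(point):
    x, y = point
    return x * x + y * y

# ...breaks with a ValueError once a field is added.
Point3 = namedtuple('Point3', 'x y z')
try:
    norm_squared(Point3(1, 2, 3))
except ValueError as e:
    print('unpacking failed:', e)
```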
What are named tuples still good for?
With the drawbacks mentioned above, and with dataclasses covering a lot of their (maybe unintended) use cases, are named tuples good for anything anymore?
As you'd expect, the answer is yes.
The data is naturally a tuple
Named tuples remain perfect for their originally intended purpose: ordered, structured data.
- rows returned by a database query
- the result of parsing a binary file format
- pairs of things, like HTTP headers (a dict is not always appropriate, since the same header can appear more than once, and the order does matter in some cases)
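For the binary format case, one common pattern is to feed the output of `struct.unpack()` straight into a named tuple. This is a sketch under assumptions: the `Header` layout and field names are invented, and the bytes are packed in place instead of read from a file.

```python
import struct
from collections import namedtuple

# Hypothetical record layout: little-endian uint16 id, uint16 flags, uint32 length.
Header = namedtuple('Header', 'id flags length')
HEADER_FORMAT = '<HHI'

# Stand-in for bytes read from a file.
raw = struct.pack(HEADER_FORMAT, 7, 1, 1024)

# unpack() returns a plain tuple in field order; _make() names the fields.
header = Header._make(struct.unpack(HEADER_FORMAT, raw))
print(header)   # Header(id=7, flags=1, length=1024)
```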
Pairs of things are interesting, because both unpacking and attribute access are valid usage patterns.
For example, for my feed reader library I use a named tuple to model the result of a feed update, a (feed URL, update details or exception) pair.
This makes it easier to make sense of what a value means in interactive sessions or when debugging; compare the named and unnamed versions:
>>> result = next(reader.update_feeds_iter())
>>> result
UpdateResult(url='http://antirez.com/rss', value=None)
>>> tuple(result)
('http://antirez.com/rss', None)
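To show both usage patterns side by side, here is a rough sketch with a stand-in `UpdateResult`; it only mirrors the two fields visible in the repr above and is not the library's actual definition. The URLs are placeholders.

```python
from typing import NamedTuple, Optional

class UpdateResult(NamedTuple):
    # Hypothetical stand-in, not the library's actual definition.
    url: str
    value: Optional[Exception]

results = [
    UpdateResult('http://example.com/feed', None),
    UpdateResult('http://example.com/broken', ValueError('bad feed')),
]

# Unpacking reads well when looping over pairs...
failed = [url for url, value in results if value is not None]

# ...while attribute access reads well when holding a single result.
first = results[0]
print(first.url, first.value)
print(failed)
```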
You're already using a tuple
You're already using a tuple, and want to make new code more readable: a namedtuple gets you this, while guaranteeing you won't break old code.
Some people argue that wherever you return a non-trivial tuple, you should be returning a namedtuple instead. I tend to agree.
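A minimal sketch of such a migration, assuming a made-up `min_max()` function that used to return a plain `(low, high)` tuple:

```python
from collections import namedtuple

# Before: def min_max(values): return min(values), max(values)

MinMax = namedtuple('MinMax', 'low high')

def min_max(values):
    return MinMax(min(values), max(values))

# Old callers that unpack or index keep working unchanged.
low, high = min_max([3, 1, 2])
assert (low, high) == (1, 3)
assert min_max([3, 1, 2])[1] == 3

# New callers can use names instead of positions.
result = min_max([3, 1, 2])
print(result.low, result.high)
```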
You want consumers that do unpacking to fail
In some cases, you want consumers that do unpacking to fail.
For example, in my feed reader library, I use a named tuple to group arguments related to filtering, because there's a lot of them, and they get passed around quite a bit before being used (I cover why in more detail here).
I know all arguments should always be handled, so I use unpacking specifically because I want the code to fail when a new one is added – if I used attribute access, the code would silently succeed. This is no substitute for tests, but the early warning is nice, especially in a larger code base.
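The idea can be sketched as follows; `FilterOptions`, its fields, and `build_query()` are hypothetical and much simpler than anything in the actual library.

```python
from typing import NamedTuple, Optional

class FilterOptions(NamedTuple):
    # Hypothetical fields, not the library's actual ones.
    feed_url: Optional[str] = None
    read: Optional[bool] = None

def build_query(options):
    # Unpacking on purpose: adding a field to FilterOptions makes this
    # line raise ValueError until the new field is handled here too.
    feed_url, read = options
    clauses = []
    if feed_url is not None:
        clauses.append('feed = :feed_url')
    if read is not None:
        clauses.append('read = :read')
    return ' AND '.join(clauses)

print(build_query(FilterOptions(read=True)))   # read = :read
```

Had `build_query()` used `options.feed_url` and `options.read` instead, a third field would be silently ignored.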
Memory and speed
Last, but not least, named tuples are useful if you care about memory or speed; they are much smaller and faster than the equivalent (data)class. In most cases, the difference doesn't matter, but it can become noticeable if you create millions of instances.
Setting __slots__ helps with memory, but doesn't really help with speed.
Here's a quick comparison:
| | cls(1, 2) | obj.a | hash(obj) | size | total size |
|---|---|---|---|---|---|
| dataclass + slots | 709.1 | 45.5 | 342.5 | 48 | 104 |
| dataobject + gc | 150.3 | 43.1 | 104.1 | 48 | 48 |
cls(1, 2), obj.a, and hash(obj) are timings for that expression; size and total size are in bytes.
I ran this with 64-bit CPython 3.8 on macOS; Linux looks roughly the same.
When increasing the number of fields, obj.a remains constant, while the other timings increase proportionally.
The slots dataclass is always 8 bytes smaller than the namedtuple.
For the dataobject rows I used recordclass, which provides dataclass/namedtuple-equivalent types. The version without gc doesn't participate in cyclic garbage collection, so it shouldn't be used for recursive data structures.
The library still has some rough edges, though: the documentation is a bit confusing, and I had to use the (yet unreleased) 0.15 version to get it working; also, note the wrong total size (it may be a Pympler bug). Nevertheless, the numbers are pretty compelling, and if you have this problem, it's definitely worth a look.
The class definitions:
For the slots dataclass, __slots__ must be set explicitly; this was fixed in Python 3.10, where the dataclass decorator gained a slots=True parameter.
from typing import NamedTuple
from dataclasses import dataclass
from recordclass import dataobject

class NT(NamedTuple):
    a: int
    b: int

@dataclass(frozen=True)
class DC:
    a: int
    b: int

@dataclass(frozen=True)
class DS:
    a: int
    b: int
    __slots__ = ('a', 'b')

class DO(dataobject):
    a: int
    b: int
    __options__ = dict(readonly=True, fast_new=True)

class DG(dataobject):
    a: int
    b: int
    __options__ = dict(readonly=True, fast_new=True, gc=True)
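For the standard library classes, a comparison of this kind could be reproduced with something along these lines. This is only a rough sketch and not necessarily how the numbers above were obtained: using `timeit` and `sys.getsizeof()` is an assumption, the `measure()` helper is made up, and the results will vary by machine and Python version.

```python
import sys
import timeit
from dataclasses import dataclass
from typing import NamedTuple

class NT(NamedTuple):
    a: int
    b: int

@dataclass(frozen=True)
class DS:
    __slots__ = ('a', 'b')
    a: int
    b: int

def measure(cls, number=100_000):
    """Return rough per-call timings (in nanoseconds) and the instance size."""
    obj = cls(1, 2)
    namespace = {'cls': cls, 'obj': obj}

    def per_call_ns(stmt):
        total = timeit.timeit(stmt, globals=namespace, number=number)
        return total / number * 1e9

    return {
        'cls(1, 2)': per_call_ns('cls(1, 2)'),
        'obj.a': per_call_ns('obj.a'),
        'hash(obj)': per_call_ns('hash(obj)'),
        # Size of the instance itself, excluding the objects it refers to.
        'size': sys.getsizeof(obj),
    }

for cls in (NT, DS):
    print(cls.__name__, measure(cls))
```

Note that `sys.getsizeof()` only covers the instance itself; a total that includes the field values needs a recursive tool such as Pympler.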
That's it for now. :)