Dataclasses without type annotations

2021-03-23

The dataclasses standard library module reduces the boilerplate of writing classes by generating special methods like __init__ and __repr__.

I've noticed a small (but vocal) minority of people who:

  • would like to use dataclasses, but feel they are forced to use type annotations to do so; and more generally, that choosing to opt out of type hints means they are restricted from using specific orthogonal language features
  • perceive dataclasses' use of type annotations as a sign of type annotations becoming compulsory in the future

Now, I know most of these people are probably just looking for something to be angry about – this is the internet, after all.

But if you really want to use dataclasses, you can, static typing or not. Here:

>>> from dataclasses import dataclass
>>> @dataclass
... class Data:
...     one: ...
...     two: ... = 2
...
>>> Data(1)
Data(one=1, two=2)

I'll say it again: dataclasses do not require type annotations. Despite what most examples show, they only require variable annotations.
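To make the distinction concrete, here is a minimal sketch of what "requires variable annotations" means in practice. The attribute name three is made up for illustration; the point is that the annotation decides whether something is a field, and its value is irrelevant.

```python
from dataclasses import dataclass, fields

@dataclass
class Data:
    one: ...      # annotated: becomes a field
    two: ... = 2  # annotated, with a default: also a field
    three = 3     # no annotation: a plain class attribute, ignored by dataclass

print([f.name for f in fields(Data)])
# ['one', 'two']
print(Data(1))
# Data(one=1, two=2)
print(Data(1).three)
# 3
```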

If you'd like to know why, how to make the best of it, and what this means about Python in general, read on!

2021-04-01 update: The decorator from the "If you really don't like variable annotations" section below is now available on PyPI: typeless-dataclasses.

Dataclasses were added in Python 3.7, with a backport available for 3.6. If you need to support earlier versions, or want more powerful features like validators, converters, and __slots__, check out attrs.

A bit of language lawyering #

First, let's define some terms, straight from the Python glossary:

annotation
A label associated with a variable, a class attribute or a function parameter or return value, used by convention as a type hint.
variable annotation
An annotation of a variable or a class attribute.
type hint
An annotation that specifies the expected type for a variable, a class attribute, or a function parameter or return value. Type hints are optional and are not enforced by Python [...].
PEP 526
(my definition) A Python enhancement proposal titled "Syntax for Variable Annotations" that specifies two different things: the syntax for adding annotations to variables, and how to use that syntax with PEP 484, "Type Hints".

In practice, annotation is used somewhat interchangeably with type [hint] annotation. There's this example from the beginning of the dataclasses module documentation:

The member variables to use in these generated methods are defined using PEP 526 type annotations.

It's also true that most of the examples are using PEP 484 annotations.

However, the dataclass() decorator documentation clearly says that the types specified in the annotation are ignored:

The dataclass() decorator examines the class to find fields. A field is defined as class variable that has a type annotation. With two exceptions described below, nothing in dataclass() examines the type specified in the variable annotation.

Which has some very interesting implications: you can use anything as the annotation.

Don't believe me? Here it is from the author himself, again:

That is, use any type you want. If you're not using a static type checker, no one is going to care what type you use.

Really,

from dataclasses import dataclass

@dataclass
class Literally:
    one: "anything can go in here"
    two: sum(map(ord, "as long as"))
    three: lambda: "it can be evaluated"

# Now, I've noticed a tendency for this program to get rather silly.

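# Needs `from __future__ import annotations` (PEP 563) as the first
# statement of the module; it did not become the default in Python 3.10.
@dataclass
class Hell:
    one: "with PEP 563 postponed evaluation"
    two: it(s=not even.evaluated)
    three: it[just].has(to=be) * syntactically + valid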

# Right! Stop that! It's SILLY!
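For a less silly check, here is a small sketch showing that dataclass() stores the annotation as-is in Field.type without interpreting it. The "two exceptions" the documentation mentions are typing.ClassVar and dataclasses.InitVar; the attribute names shared and scale are made up for illustration.

```python
from dataclasses import InitVar, dataclass, fields
from typing import ClassVar

@dataclass
class Data:
    one: "not a type at all"
    two: 42 = 2
    shared: ClassVar = "class-level"  # exception 1: excluded from the fields
    scale: InitVar = 1                # exception 2: passed to __post_init__ only

    def __post_init__(self, scale):
        self.two *= scale

# The annotation is kept verbatim; nothing checks that it is a type.
print([(f.name, f.type) for f in fields(Data)])
# [('one', 'not a type at all'), ('two', 42)]
print(Data(1, scale=10))
# Data(one=1, two=20)
```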

If not type hints, then what? #

Now that we've seen that type hints are not required, let's look at some decent ways of using dataclasses without them.[1]

Partial types #

My favorite approach is to use a built-in, string or literal that roughly matches the type of the attribute, to make the intent more obvious to human readers. I've found myself doing this naturally, and it's what prompted this article in the first place.

It's quite convenient when you come back to the code after a few months :)

@dataclass
class Data:
    one: set
    two: 'dict(int -> str)'
    three: ['one', 'two', ...]

Documentation #

Speaking of showing intent: if you're not using some other convention for attribute documentation, annotations seem like a good place for short docstrings. I doubt any documentation generators support this, but it's still fine for scripts.

@dataclass
class Data:
    one: "the first thing"
    two: "the second thing; an integer" = 2
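Even without a documentation generator, the strings are easy to get back at runtime. A minimal sketch, using only dataclasses.fields():

```python
from dataclasses import dataclass, fields

@dataclass
class Data:
    one: "the first thing"
    two: "the second thing; an integer" = 2

# Each Field keeps the annotation in its .type attribute.
for f in fields(Data):
    print(f"{f.name}: {f.type}")
# one: the first thing
# two: the second thing; an integer
```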

Ellipsis #

The Ellipsis literal is a nice way of saying "I don't care about this value":

@dataclass
class Data:
    one: ...
    two: ... = 2

Being type checker friendly #

If you still want the dataclass to work with type checking, while not bothering with types yourself, you can use Any:

from typing import Any

@dataclass
class Data:
    one: Any
    two: Any = 2

Or, if you don't like the extra import, use object:

@dataclass
class Data:
    one: object
    two: object = 2

This works because everything in Python is an object (figuratively and literally).
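The two are not quite equivalent for the type checker, though. As a rough sketch (the class names Loose and Strict are made up, and the checker behavior described in the comments is what a checker such as mypy is expected to do, not something this code verifies): Any turns checking off for the value, while object accepts any value but allows almost no operations on it until you narrow the type.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Loose:
    value: Any

@dataclass
class Strict:
    value: object

loose = Loose("text")
strict = Strict("text")

# Any: a checker lets you do anything with the value.
print(loose.value.upper())

# object: a checker is expected to reject strict.value.upper() here,
# since object has no such method; narrowing with isinstance() fixes that.
if isinstance(strict.value, str):
    print(strict.value.upper())
```

At runtime both behave the same; the difference only matters to the checker.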

Aside: named tuples #

Not directly related to dataclasses, but all of the above work with the typed version of namedtuple as well:

from typing import NamedTuple

class Data(NamedTuple):
    one: ...
    two: dict = 2

Will this not break stuff? #

No.

If the documentation states that dataclass() ignores annotation values, it will stay like that for the foreseeable future; standard library deprecations aren't taken lightly.

Also, all of the major typing PEPs (484, 526, 563) clearly state that:

Python will remain a dynamically typed language, and the authors have no desire to ever make type hints mandatory, even by convention.

PEP 563 does imply that the type hinting use of annotations will become standard in the future, but that's only relevant if you care about typing.

If you really don't like variable annotations #

... I made a decorator that makes them optional:

from dataclasses import dataclass, field

@dataclass
@typeless
class Data:
    one = field()
    two = field(default=2)

Compare with attrs:

import attr

@attr.s
class Data:
    one = attr.ib()
    two = attr.ib(default=2)

It is less than 30 lines of code, and works by adding annotations programmatically:

import typing
import inspect
import dataclasses

def typeless(cls):
    if not hasattr(cls, '__annotations__'):
        cls.__annotations__ = {}

    for name, thing in cls.__dict__.items():
        if name.startswith('__') and name.endswith('__'):
            continue  # skip dunders like __module__ and __doc__
        if not isattribute(thing):
            continue  # skip methods, properties, and other descriptors

        if isinstance(thing, dataclasses.Field):
            annotation = typing.Any  # field() attributes become fields
        else:
            annotation = typing.ClassVar[typing.Any]  # the rest stay class attributes

        # annotations already present on the class win
        cls.__annotations__.setdefault(name, annotation)

    return cls

def isattribute(thing):
    return not any(p(thing) for p in [
        inspect.isroutine,
        inspect.ismethoddescriptor,
        inspect.isdatadescriptor,
    ])

It's silly, but it works!
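Stripped of the attribute filtering, the core trick can be sketched by hand: set __annotations__ on the class yourself, then apply dataclass() as a plain function. This is only an illustration of the mechanism, not a replacement for the decorator.

```python
from dataclasses import dataclass, field

class Data:
    one = field()
    two = field(default=2)

# Add the annotations after the fact; the values do not matter,
# only the names and their order.
Data.__annotations__ = {"one": ..., "two": ...}
Data = dataclass(Data)

print(Data(1))
# Data(one=1, two=2)
```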

2021-04-01 update: This is now available on PyPI: typeless-dataclasses.

We are all consenting adults #

There's a saying in the Python world, probably as pervasive as The Zen of Python itself, that you may be unaware of if you haven't read older articles or discussions on python-dev: we are all consenting adults.

It was initially used to refer to Python's attitude towards private class attributes (that is, nothing's really private), but it also applies to things like monkey patching, code generation, and more:

[...] No class or class instance can keep you away from all what's inside (this makes introspection possible and powerful). Python trusts you. It says "hey, if you want to go poking around in dark places, I'm gonna trust that you've got a good reason and you're not making trouble."

After all, we're all consenting adults here.

As long as you're OK with the consequences, you can do whatever you please; no one's stopping you. Of course, it is the responsible, adult thing to learn what those are – "know the rules so you can break them effectively" kind of thing.

Yes, if you're working on a team, you might have to gather consensus and persuade people (or, if you can't, go with the existing consensus), but isn't that how a healthy team works anyway?


Type annotations are (and will continue to be) a thing, and dataclasses exist in the context of that; it would be silly to not converge on something, and not have clear guidance for beginners.

But if you go and read the documentation, there is a clear alternative right there. If you are experienced enough to have opinions about things, you are probably experienced enough to understand the alternatives and make your own choices.

Python trusts you :)

  1. You may have seen some of the examples below in very nice Reddit comments like this one (it appears in other threads as well, where the author's patience wasn't really deserved; I'm deliberately not linking to those).