Python sentinel objects, type hints, and PEP 661

PEP 661 "Sentinel Values" recently drew attention to the sentinel object pattern.¹

While by no means new², this time the pattern appears in the context of typing, so it's worth taking a look at how the two interact.

Contents:

What's a sentinel, and why do I need one?
- Real world examples
- Non-private sentinels
What's this got to do with typing?
What's with PEP 661?
- How does this affect me?
- Is this worth a PEP?

What's a sentinel, and why do I need one?

The PEP 661 abstract summarizes it best:

Unique placeholder values, widely known as "sentinel values", are useful in Python programs for several things, such as default values for function arguments where None is a valid input value.

The simplest use case I can think of is a function that returns a default value only if explicitly provided, otherwise raises an exception.

The next() built-in function is a good example:

next(iterator[, default])

Retrieve the next item from the iterator by calling its __next__() method. If default is given, it is returned if the iterator is exhausted, otherwise StopIteration is raised.

Given this definition, let's try to re-implement it.

next() essentially has two signatures³:

next(iterator) -> item or raise exception
next(iterator, default) -> item or default

There are two main ways to write a function that supports both:

- next(*args, **kwargs); you have to extract iterator and default from args and kwargs, and raise TypeError if there are too many / too few / unexpected arguments
- next(iterator, default=None); Python checks the arguments, you just need to check if default is None

To me, the second seems easier to implement than the first.
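For comparison, here is a rough sketch of the first approach. The name next_args is made up for illustration, and the error messages only approximate what the built-in would say.

```python
def next_args(*args, **kwargs):
    """Hypothetical next() clone that parses its own arguments."""
    if kwargs:
        # Like the built-in, accept positional arguments only.
        raise TypeError("next_args() takes no keyword arguments")
    if not 1 <= len(args) <= 2:
        raise TypeError(
            f"next_args() expected 1 or 2 arguments, got {len(args)}"
        )

    iterator = args[0]
    try:
        return iterator.__next__()
    except StopIteration:
        if len(args) == 2:
            # A default was passed explicitly, even if it is None.
            return args[1]
        raise
```

It needs no sentinel, but the signature no longer tells the reader anything, and all the argument checking is on us.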

But the second version has a problem: for some users, None is a valid default – how can next() distinguish between raise-exception-None and default-value-None?
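A small sketch of the ambiguity, using a made-up next_none() that treats None as "no default given":

```python
def next_none(iterator, default=None):
    """Hypothetical next() clone that treats None as 'no default'."""
    try:
        return iterator.__next__()
    except StopIteration:
        if default is None:
            raise
        return default


exhausted = iter([])

# The caller explicitly asks for None as the fallback ...
try:
    result = next_none(exhausted, None)
except StopIteration:
    # ... but the function cannot tell, so it raises anyway.
    result = "raised StopIteration"
```

Any other default (0, an empty string, and so on) works as expected; only None is silently off limits.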

In your own code, you may be able to guarantee None is never a valid value, making this a non-issue.

In a library, however, you don't want to restrict users in this way, since you usually can't foresee all their use cases. Even if you did choose to restrict valid values like this, you'd have to document it, and the users would have to learn about it, and always remember the exception.⁴

Here's where a private, internal-use only sentinel object helps:

_missing = object()

def next(iterator, default=_missing):
    try:
        return iterator.__next__()
    except StopIteration:
        if default is _missing:
            raise
        return default

Example output:

>>> it = iter([1])
>>> print(next(it, None))
1
>>> print(next(it, None))
None
>>> print(next(it))
Traceback (most recent call last):
  ...
StopIteration

Now, next() knows that default=_missing means raise exception, and default=None is just a regular default value to be returned.

You can think of _missing as another None, for when the actual None is already taken – a "higher-order" None. Because it's private to the module, users can never (accidentally) use it as a default value, and never have to know about it.

Tip

For a more in-depth explanation of sentinel objects and related patterns, see The Sentinel Object Pattern by Brandon Rhodes.

Real world examples

The real next() doesn't actually use sentinel values, because it's implemented in C, and things are sometimes different there.

But there are plenty of examples in pure-Python code:

- The dataclasses module has two. The docs even explain what a sentinel is:

  [...] the MISSING value is a sentinel object used to detect if the default and default_factory parameters are provided. This sentinel is used because None is a valid value for default. No code should directly use the MISSING value.

  (The other one is used in the __init__ of the generated classes to show a default value comes from a factory.)
- attrs also has two. One of them (analogous to dataclasses.MISSING) is even included in the API documentation.
- Werkzeug has one.
- I have one in my feed reader library (originally stolen from Werkzeug). I use it for methods like get_feed(feed[, default]), which either raises FeedNotFoundError or returns default.
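To see the dataclasses sentinel at work, here is a small sketch. The Feed class is invented for illustration; fields() and MISSING are the real dataclasses names. A field without a default reports MISSING, while a field that defaults to None reports None.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Optional


@dataclass
class Feed:  # hypothetical example class
    url: str                      # no default at all
    title: Optional[str] = None   # None is the actual default


# Map each field name to whatever dataclasses recorded as its default.
defaults = {f.name: f.default for f in fields(Feed)}

# Only an identity check against MISSING can tell "no default"
# apart from "the default is None".
has_default = {name: value is not MISSING for name, value in defaults.items()}
```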

Non-private sentinels

I said earlier that sentinels are private; that's not always the case.

If the sentinel is the default argument of a public method or function, it may be a good idea to expose / document it, to facilitate inheritance and function wrappers.⁵ attrs is a good example of this.

(If you don't expose it, people can still extend your code by using their own sentinel, and then calling either form of your function.)
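A sketch of that workaround, under stated assumptions: get_feed() here is a simplified stand-in that takes a plain dict and raises KeyError, not the real method from my library, and get_feed_logged() is a made-up wrapper.

```python
# Stand-in for a library function whose sentinel is private
# (hypothetical; any function with an optional default would do).
_lib_missing = object()

def get_feed(feeds, url, default=_lib_missing):
    try:
        return feeds[url]
    except KeyError:
        if default is _lib_missing:
            raise
        return default


# The wrapper lives in user code and cannot see _lib_missing,
# so it defines its own sentinel and picks which form to call.
_my_missing = object()
lookups = []

def get_feed_logged(feeds, url, default=_my_missing):
    lookups.append(url)
    if default is _my_missing:
        return get_feed(feeds, url)        # one-argument form: may raise
    return get_feed(feeds, url, default)   # two-argument form: returns default
```

The cost is the extra branch in every wrapper; exposing the sentinel would let the wrapper pass default straight through.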

What's this got to do with typing?

Let's try to add type hints to our hand-rolled next():

from typing import overload, TypeVar, Union, Iterator

T = TypeVar('T')
U = TypeVar('U')


# We define MissingType in one of two ways:

class MissingType: pass
# MissingType = object

# The second one is equivalent to the original
# `_missing = object()`, but the alias allows us
# to keep the same type annotations.

_missing = MissingType()


# As mentioned before, next() is actually two functions;
# typing.overload allows us to express this.
#
# One that returns an item or raises an exception:

@overload
def next(iterator: Iterator[T]) -> T: ...

# ... and one that takes a default value (of some type U),
# and returns either an item, or that default value
# (of the *same* type U):

@overload
def next(iterator: Iterator[T], default: U) -> Union[T, U]: ...

# The implementation takes all the arguments,
# and returns a union of all the types:

def next(
    iterator: Iterator[T],
    default: Union[MissingType, U] = _missing
) -> Union[T, U]:
    try:
        return iterator.__next__()
    except StopIteration:

        # "if default is _missing" is idiomatic here,
        # but Mypy doesn't understand it
        # ("var is None" is a special case).
        # It does understand isinstance(), though:
        # https://mypy.readthedocs.io/en/stable/casts.html#casts

        if isinstance(default, MissingType):
            # If MissingType is `object`, this is always true,
            # since all types are a subclass of `object`.
            raise

        return default

The isinstance() thing at the end is why a plain object() sentinel doesn't work – you can't (easily) get Mypy to treat your own "constants" the way it does a built-in constant like None, and the sentinel doesn't have a distinct type.

Also, if you use the MissingType = object version, Mypy complains:

next.py:37: error: Overloaded function implementation cannot produce return type of signature 2

If you're wondering if the good version actually worked, here's what Mypy says:

it = iter([1, 2])

one = next(it)
reveal_type(one)
# next.py:62: note: Revealed type is 'builtins.int*'

two = next(it, 'a string')
reveal_type(two)
# next.py:66: note: Revealed type is 'Union[builtins.int*, builtins.str*]'

What's with PEP 661?

There are many sentinel implementations out there; there are 15 different ones in the standard library alone.

Many of them have at least one of these issues:

- non-descriptive / too long repr() (e.g. <object object at 0x7f99a355fc20>)
- don't pickle correctly (e.g. after unpickling you get a different, new object)
- don't work well with typing
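The first two issues are easy to reproduce with a plain object() sentinel:

```python
import pickle

_missing = object()

# The repr says nothing about what the object is for
# (the address differs from run to run).
text = repr(_missing)

# Pickling round-trips, but yields a brand new object,
# so identity checks against the original sentinel fail.
restored = pickle.loads(pickle.dumps(_missing))
still_sentinel = restored is _missing
```

Here still_sentinel ends up False, which means any "default is _missing" check silently stops working for values that went through pickle.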

Thus, PEP 661 "suggests adding a utility for defining sentinel values, to be used in the stdlib and made publicly available as part of the stdlib". It looks like this:

>>> NotGiven = sentinel('NotGiven')
>>> NotGiven
<NotGiven>
>>> MISSING = sentinel('MISSING', repr='mymodule.MISSING')
>>> MISSING
mymodule.MISSING

This utility would address all the known issues, saving developers (mostly, stdlib and third party library authors) from reinventing the wheel (again).
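To give a feel for what such a utility has to do, here is a rough sketch. It is not the PEP's reference implementation: the Sentinel class and its registry are assumptions made for illustration, it only covers the repr and pickling issues, and it does nothing for typing.

```python
class Sentinel:
    """Hypothetical helper; NOT the PEP 661 reference implementation."""

    _registry = {}

    def __new__(cls, name, repr=None):
        # Return the existing instance for a name, so there is
        # only ever one sentinel object per name in a process.
        try:
            return cls._registry[name]
        except KeyError:
            self = super().__new__(cls)
            self._name = name
            self._repr = repr if repr is not None else f"<{name}>"
            cls._registry[name] = self
            return self

    def __repr__(self):
        return self._repr

    def __reduce__(self):
        # Unpickling calls Sentinel(name, repr), which hits the registry
        # and hands back the original object instead of a copy.
        return (type(self), (self._name, self._repr))


NotGiven = Sentinel("NotGiven")
MISSING = Sentinel("MISSING", repr="mymodule.MISSING")
```

Even this toy version takes a couple dozen lines to get right, which is roughly the argument for having one shared implementation.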

How does this affect me?

Not at all.

If the PEP gets accepted and implemented, you'll be able to create an issue-free sentinel with one line of code.

Of course, you can keep using your own sentinel objects if you want to; the PEP doesn't even propose to change the existing sentinels in the standard library.

Is this worth a PEP?

PEPs exist to support discussions in cases where the "correct" way to go isn't obvious, consensus or coordination are required, or the changes have a big blast radius. A lot of PEPs get abandoned or rejected (that's fine, it's how the process is supposed to work).

PEP 661 seems to fall under the "requires consensus" category; it follows a community poll where although the top pick was "do nothing", most voters went for "do something" (but with no clear agreement on what that should be).

The poll introduction states:

This is a minor detail, so ISTM most important that we reach a reasonable decision quickly, even if that decision is that nothing should be done.

It's worth remembering that doing nothing is always an option. :)

If you're into this kind of thing, I highly recommend going through the poll thread and the (ongoing) PEP discussion thread – usually, these discussions are API design master classes.

That's all I have for now.


¹ The PEP is still in draft status as of 2021-06-10.
² Here's a 2008 article about it.
³ While Python doesn't support overloading, sometimes it's useful to think about functions in this way.
⁴ The same applies to using some other "common" value, for example, a "<NotGiven>" string sentinel.

For immutable values like strings, it's probably worse. Because of optimizations like interning, strings constructed at different times may actually result in the same object. The data model specifically allows for this to happen (emphasis mine):

Types affect almost all aspects of object behavior. Even the importance of object identity is affected in some sense: for immutable types, operations that compute new values may actually return a reference to any existing object with the same type and value, while for mutable objects this is not allowed.

⁵ Thanks to u/energybased for reminding me of this!