yaml: could not determine a constructor for the tag

February 2022 ∙ four minute read

So you're trying to read some YAML using PyYAML, and get an exception like this:

>>> yaml.safe_load("!!python/tuple [0,0]")
Traceback (most recent call last):
  ...
yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/tuple'
  in "<unicode string>", line 1, column 1:
    !!python/tuple [0,0]
    ^

... or like this:

>>> yaml.safe_load("!GetAZs us-east-1")
Traceback (most recent call last):
  ...
yaml.constructor.ConstructorError: could not determine a constructor for the tag '!GetAZs'
  in "<unicode string>", line 1, column 1:
    !GetAZs us-east-1
    ^

What does it mean? #

First, a bit of background.

On top of basic types (strings, integers, sequences, and so on), YAML can represent native and user-defined data structures. To denote the type of a node, you mark it with an explicit tag. Even basic types end up with a tag; the following are all equivalent:

>>> yaml.safe_load("[implicit]")
['implicit']
>>> yaml.safe_load("!!seq [global, shorthand]")
['global', 'shorthand']
>>> yaml.safe_load("!<tag:yaml.org,2002:seq> [global, full]")
['global', 'full']

The errors above both mean that the loader encountered an explicit tag, but doesn't know how to construct objects with that tag.
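If you want to see which tag a node ends up with, you can ask PyYAML to stop after parsing and before constructing any objects. Here's a minimal sketch using yaml.compose(); the root_tag() helper is a name made up for this example:

```python
import yaml


def root_tag(text):
    """Return the tag of the document's root node, without constructing it."""
    # compose() builds the node graph only, so unknown tags don't raise here.
    return yaml.compose(text, Loader=yaml.SafeLoader).tag


seq_tags = {
    root_tag("[implicit]"),
    root_tag("!!seq [global, shorthand]"),
    root_tag("!<tag:yaml.org,2002:seq> [global, full]"),
}
tuple_tag = root_tag("!!python/tuple [0,0]")
getazs_tag = root_tag("!GetAZs us-east-1")
```

All three spellings of the sequence resolve to the same tag, and the two tags from the error messages show up exactly as the loader reports them.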

Why does this happen? #

!!python/tuple is a language-specific tag corresponding to a Python native data structure (a tuple). However, safe_load() resolves only basic YAML tags, known to be safe for untrusted input.

!GetAZs is an application-specific tag (in this case, specific to AWS CloudFormation). There's no way for PyYAML to know about it without being told explicitly.

This is by design – from the spec:

That said, tag resolution is specific to the application. YAML processors should therefore provide a mechanism allowing the application to override and expand these default tag resolution rules.
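You can check which tags a loader knows about by peeking at its constructor registry. This is only a sketch: yaml_constructors is the class-level dict PyYAML keeps registered constructors in, which is closer to an implementation detail than to a documented API, so treat it as a debugging aid.

```python
import yaml

TUPLE_TAG = "tag:yaml.org,2002:python/tuple"
SEQ_TAG = "tag:yaml.org,2002:seq"

# Basic YAML tags are registered on the safe loader ...
safe_knows_seq = SEQ_TAG in yaml.SafeLoader.yaml_constructors
# ... but the Python-specific tuple tag is not.
safe_knows_tuple = TUPLE_TAG in yaml.SafeLoader.yaml_constructors
# The full loader does register it.
full_knows_tuple = TUPLE_TAG in yaml.FullLoader.yaml_constructors
# Nobody knows about application-specific tags until told.
safe_knows_getazs = "!GetAZs" in yaml.SafeLoader.yaml_constructors
```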

What now? #

Python-specific tags #

For Python-specific tags, you can use full_load(), which resolves all tags except those known to be unsafe; this includes all the tags listed here.
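For instance, the tuple from the first error loads fine this way (a quick sketch, assuming PyYAML 5.1 or newer, where full_load() is available):

```python
import yaml

# full_load() is shorthand for yaml.load(..., Loader=yaml.FullLoader).
point = yaml.full_load("!!python/tuple [0,0]")
same_point = yaml.load("!!python/tuple [0,0]", Loader=yaml.FullLoader)
```

Both calls return the Python tuple (0, 0).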

You could also use unsafe_load(), but most of the time it's not what you want:

Warning

yaml.unsafe_load() is unsafe for untrusted data, because it allows running arbitrary code. Consider using safe_load() or full_load() instead.

For example, you can do this:

>>> yaml.unsafe_load("!!python/object/new:os.system [echo WOOSH. YOU HAVE been compromised]")
WOOSH. YOU HAVE been compromised
0

There were a bunch of CVEs about it.
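By contrast, safe_load() refuses the same document instead of running anything. A small sketch; the variable names are just for illustration:

```python
import yaml

payload = "!!python/object/new:os.system [echo WOOSH. YOU HAVE been compromised]"

try:
    yaml.safe_load(payload)
    rejected = False
    reason = None
except yaml.constructor.ConstructorError as e:
    # The safe loader has no constructor for python/object/new,
    # so nothing gets executed.
    rejected = True
    reason = e.problem
```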

Application-specific tags #

For application-specific tags, you can define a constructor for the tag:

from dataclasses import dataclass

import yaml

@dataclass
class GetAZs:
    region: str

class Loader(yaml.SafeLoader):
    pass

def construct_GetAZs(loader, node):
    # The node is a scalar, so take its value as a string.
    return GetAZs(loader.construct_scalar(node))

Loader.add_constructor('!GetAZs', construct_GetAZs)

Here, we're wrapping the value in a dataclass, to indicate it isn't just a simple string.

Note

We are subclassing SafeLoader because calling add_constructor() on SafeLoader itself would modify it in place, for everyone, which isn't necessarily great; imagine getting a GetAZs from safe_load(), when you were expecting only built-in types.

To use it, pass the loader class to load():

>>> yaml.load("!GetAZs us-east-1", Loader=Loader)
GetAZs(region='us-east-1')
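To convince yourself that the subclass keeps the constructor to itself, you can check that plain safe_load() still rejects the tag. This is a self-contained sketch that repeats the definitions from above:

```python
from dataclasses import dataclass

import yaml


@dataclass
class GetAZs:
    region: str


class Loader(yaml.SafeLoader):
    """Private loader; constructors registered on it stay local to it."""


def construct_GetAZs(loader, node):
    return GetAZs(loader.construct_scalar(node))


Loader.add_constructor('!GetAZs', construct_GetAZs)

# The subclass knows the tag ...
custom_result = yaml.load("!GetAZs us-east-1", Loader=Loader)

# ... but safe_load() is unaffected and still raises.
try:
    yaml.safe_load("!GetAZs us-east-1")
    safe_load_rejected = False
except yaml.constructor.ConstructorError:
    safe_load_rejected = True
```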

Of course, you don't have to store the value; you can do something with it – after all, that's what CloudFormation does:

KNOWN_AZS = {
    'us-east-1': ['us-east-1a', 'us-east-1b', 'us-east-1c', 'us-east-1d', 'us-east-1e'],
    'eu-west-1': ['eu-west-1a', 'eu-west-1b', 'eu-west-1c'],
}

def construct_GetAZs(loader, node):
    value = loader.construct_scalar(node)
    if value not in KNOWN_AZS:
        raise yaml.constructor.ConstructorError(
            None, None, f"GetAZs got unknown region {value!r}", node.start_mark
        )
    return KNOWN_AZS[value]

Loader.add_constructor('!GetAZs', construct_GetAZs)

>>> yaml.load("!GetAZs us-east-1", Loader=Loader)
['us-east-1a', 'us-east-1b', 'us-east-1c', 'us-east-1d', 'us-east-1e']
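Tags aren't limited to scalars. Here's a hedged sketch of a constructor for a tag that wraps a sequence, loosely modeled on CloudFormation's !Join; it only handles plain strings, which is a simplification of what CloudFormation itself accepts, and the JoinLoader name is made up for the example.

```python
import yaml


class JoinLoader(yaml.SafeLoader):
    pass


def construct_Join(loader, node):
    # deep=True builds nested collections right away; without it, PyYAML
    # fills nested lists lazily and they would still be empty at this point.
    delimiter, values = loader.construct_sequence(node, deep=True)
    return delimiter.join(values)


JoinLoader.add_constructor('!Join', construct_Join)

joined = yaml.load('!Join [",", [a, b, c]]', Loader=JoinLoader)
```

For mapping nodes, loader.construct_mapping(node, deep=True) plays the same role.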

But I don't know the tags in advance #

For the above to work, you need to register constructors for each expected tag.

But sometimes you don't know the tags in advance, or there are too many of them, or you just want to access the data, without caring what it means (for example, because you just want to change a little thing and write it back out).

PyYAML allows you to register a catch-all constructor for unknown tags... but you still need to implement some sort of generic wrapper to go with it.

Luckily, I've already written a whole article on how to do that, complete with code:

>>> yaml.load("!GetAZs us-east-1", Loader=Loader)
Tagged('!GetAZs', 'us-east-1')

It works with arbitrarily nested YAML:

>>> value = yaml.load("""
... Properties:
...   ImageId: !FindInMap [RegionMap, !Ref 'AWS::Region', HVM64]
... """, Loader=Loader)
>>> value
{
    'Properties': {
        'ImageId': Tagged(
            '!FindInMap',
            ['RegionMap', Tagged('!Ref', 'AWS::Region'), 'HVM64']
        )
    }
}

... allows you to ignore tags most of the time:

>>> value['Properties']['ImageId'][-1] = 'HVMG2'

... and can output tagged YAML too:

>>> print(yaml.dump(value, Dumper=Dumper))
Properties:
  ImageId: !FindInMap
  - RegionMap
  - !Ref 'AWS::Region'
  - HVMG2

Check it out: Dealing with YAML with arbitrary tags in Python.
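If you only need the gist, a stripped-down catch-all might look like the sketch below. It is not the Tagged wrapper from that article: UnknownTag and CatchAllLoader are names made up here, the wrapper doesn't behave like the value it wraps, and there's no matching Dumper for writing tags back out. It relies on PyYAML treating a constructor registered under the tag None as the fallback for tags with no constructor of their own.

```python
from dataclasses import dataclass

import yaml


@dataclass
class UnknownTag:
    tag: str
    value: object


class CatchAllLoader(yaml.SafeLoader):
    pass


def construct_unknown(loader, node):
    # Pick the construct_* method that matches the kind of node.
    if isinstance(node, yaml.ScalarNode):
        value = loader.construct_scalar(node)
    elif isinstance(node, yaml.SequenceNode):
        value = loader.construct_sequence(node, deep=True)
    else:
        value = loader.construct_mapping(node, deep=True)
    return UnknownTag(node.tag, value)


# Registering under None makes this the fallback for unregistered tags.
CatchAllLoader.add_constructor(None, construct_unknown)

scalar = yaml.load("!GetAZs us-east-1", Loader=CatchAllLoader)
nested = yaml.load(
    "ImageId: !FindInMap [RegionMap, !Ref 'AWS::Region', HVM64]",
    Loader=CatchAllLoader,
)
```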


That's it for now.
