yaml: while constructing a mapping found unhashable key
February 2023 ∙ five minute read
So you're trying to read some YAML using PyYAML, and get an exception like this:
>>> yaml.safe_load("""\
... [0, 0]: top-left
... [1, 1]: bottom-right
... """)
Traceback (most recent call last):
  ...
yaml.constructor.ConstructorError: while constructing a mapping
found unhashable key
  in "<unicode string>", line 1, column 1:
    [0, 0]: top-left
    ^
What does it mean? #
The error message is pretty self-explanatory, but let's unpack it a bit.
First, it happened during construction – that is, while converting the generic representation of the YAML document to native data structures; in this case, converting a mapping to a Python dict.
The problem is that a key of the mapping, [0, 0], is not hashable:
An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.
Most immutable built-ins are hashable; mutable containers (such as lists) are not; immutable containers (such as tuples) are hashable only if their elements are.
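To make the rule concrete, here is a small sketch that probes a few built-in values with hash(). The is_hashable() helper is a name made up for this illustration; it is not part of PyYAML or the standard library.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


samples = {
    "int": 0,
    "str": "top-left",
    "tuple of ints": (0, 0),
    "frozenset": frozenset({0, 1}),
    "list": [0, 0],
    "dict": {0: 1},
    # Immutable container, but one of its elements is mutable.
    "tuple containing a list": ([0], 0),
}

results = {name: is_hashable(value) for name, value in samples.items()}

for name, ok in results.items():
    print(f"{name}: {'hashable' if ok else 'unhashable'}")
```

The last sample is the interesting one: the tuple itself is immutable, but hashing it requires hashing the list inside it, which fails.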
Why does this happen? #
This is not a limitation of YAML itself; quoting the spec:
The content of a mapping node is an unordered set of key/value node pairs, with the restriction that each of the keys is unique. YAML places no further restrictions on the nodes. In particular, keys may be arbitrary nodes, the same node may be used as the value of several key/value pairs and a mapping could even contain itself as a key or a value.
We can load everything up to the representation:
>>> yaml.compose("[0, 0]: top-left")
MappingNode(
    tag='tag:yaml.org,2002:map',
    value=[
        (
            SequenceNode(
                tag='tag:yaml.org,2002:seq',
                value=[
                    ScalarNode(tag='tag:yaml.org,2002:int', value='0'),
                    ScalarNode(tag='tag:yaml.org,2002:int', value='0'),
                ],
            ),
            ScalarNode(tag='tag:yaml.org,2002:str', value='top-left'),
        )
    ],
)
The limitation comes from how dicts are implemented, specifically:
Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.
>>> {[0, 0]: "top-left"}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
If we use a (hashable) tuple instead, it works:
>>> {(0, 0): "top-left"}
{(0, 0): 'top-left'}
What now? #
Use the representation #
Depending on your needs, the representation might be enough. But probably not...
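As a rough sketch of what working with the representation could look like, the snippet below walks the node graph returned by yaml.compose() and builds tuple keys by hand. It assumes every key is a sequence of integer scalars, which holds for the example input but not in general.

```python
import yaml

document = """\
[0, 0]: top-left
[1, 1]: bottom-right
"""

# compose() stops at the representation: nodes, not native Python objects.
root = yaml.compose(document)

corners = {}
# A MappingNode's value is a list of (key node, value node) pairs.
for key_node, value_node in root.value:
    # Assumption: each key is a sequence of integer scalars.
    key = tuple(int(item.value) for item in key_node.value)
    corners[key] = value_node.value

print(corners)
```

This skips the constructor entirely, so you are on your own for converting scalars, resolving tags, and handling anything that is not shaped like the example.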
Use full_load() and Python-specific tags #
If you control the input, and you're OK with language-specific tags, use full_load(); it resolves all tags except those known to be unsafe, including all the Python-specific tags listed in the PyYAML documentation.
>>> yaml.full_load("""\
... !!python/tuple [0, 0]: top-left
... !!python/tuple [1, 1]: bottom-right
... """)
{(0, 0): 'top-left', (1, 1): 'bottom-right'}
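For contrast, a quick sketch of what happens if the same tagged input goes through safe_load(): SafeLoader has no constructor for the Python-specific tag, so it refuses the document instead of building a tuple.

```python
import yaml

document = "!!python/tuple [0, 0]: top-left"

# full_load() knows the Python-specific tuple tag.
loaded = yaml.full_load(document)
print(loaded)

# safe_load() only handles the standard YAML tags, so it rejects this one.
try:
    yaml.safe_load(document)
except yaml.constructor.ConstructorError as exc:
    rejected = True
    print("safe_load failed:", exc.problem)
else:
    rejected = False
```

So the tags only help if whoever reads the file is also willing to use full_load().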
You could also use unsafe_load(), but most of the time it's not what you want:
Warning
yaml.unsafe_load() is unsafe for untrusted data, because it allows running arbitrary code. Consider using safe_load() or full_load() instead.
For example, you can do this:
>>> yaml.unsafe_load("!!python/object/new:os.system [echo WOOSH. YOU HAVE been compromised]")
WOOSH. YOU HAVE been compromised
0
There were a bunch of CVEs about it.
But, I don't control the input #
If you don't control the input, you can use a custom constructor to convert the keys to something hashable. Here, we convert list keys to tuples:
import yaml

class Loader(yaml.SafeLoader):
    pass

def construct_mapping(self, node):
    # Build keys and values eagerly, so list keys are fully populated.
    pairs = self.construct_pairs(node, deep=True)
    try:
        return dict(pairs)
    except TypeError:
        # At least one key is unhashable; convert list keys to tuples.
        rv = {}
        for key, value in pairs:
            if isinstance(key, list):
                key = tuple(key)
            rv[key] = value
        return rv

Loader.construct_mapping = construct_mapping
Loader.add_constructor('tag:yaml.org,2002:map', Loader.construct_mapping)
>>> yaml.load("""\
... [0, 0]: top-left
... [1, 1]: bottom-right
... """, Loader=Loader)
{(0, 0): 'top-left', (1, 1): 'bottom-right'}
We subclass SafeLoader to account for a PyYAML quirk: calling add_constructor() directly would modify it in-place, for everyone, which isn't necessarily great. We still override construct_mapping so that other constructors wanting to make a mapping get to use our version.
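A small sketch of why the subclass matters: after registering a constructor on a subclass, plain safe_load() still behaves as before. TupleKeyLoader is a hypothetical name used only here, and the constructor is a condensed variant of the one above.

```python
import yaml


class TupleKeyLoader(yaml.SafeLoader):
    pass


def construct_mapping(loader, node):
    pairs = loader.construct_pairs(node, deep=True)
    # Convert list keys to tuples; leave every other key alone.
    return {tuple(k) if isinstance(k, list) else k: v for k, v in pairs}


# Registered on the subclass only; SafeLoader's own table is untouched.
TupleKeyLoader.add_constructor("tag:yaml.org,2002:map", construct_mapping)

document = "[0, 0]: top-left"
loaded = yaml.load(document, Loader=TupleKeyLoader)
print(loaded)

try:
    yaml.safe_load(document)
except yaml.constructor.ConstructorError as exc:
    safe_load_unaffected = True
    print("safe_load still fails:", exc.problem)
else:
    safe_load_unaffected = False
```

Had the constructor been added to yaml.SafeLoader itself, every safe_load() call in the process would silently start returning tuple keys.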
Alas, converting keys in the constructor is quite limited, because each new key type needs to be handled explicitly. Dict keys, for example, still fail (though we might be able to convert them to a frozenset of their items()):
>>> yaml.load("{0: 1}: top-left", Loader=Loader)
Traceback (most recent call last):
  ...
    rv[key] = value
TypeError: unhashable type: 'dict'
Nested keys don't work either; we'd need to convert them recursively ourselves:
>>> yaml.load("[[0]]: top-left", Loader=Loader)
Traceback (most recent call last):
  ...
    rv[key] = value
TypeError: unhashable type: 'list'
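One way to cover both cases is a recursive conversion. The sketch below is an assumption about how that might look, not code from PyYAML: freeze() and FreezingLoader are hypothetical names, lists become tuples, and dicts become a frozenset of their (frozen) items.

```python
import yaml


def freeze(obj):
    """Recursively convert unhashable containers to hashable stand-ins."""
    if isinstance(obj, list):
        return tuple(freeze(item) for item in obj)
    if isinstance(obj, dict):
        return frozenset((freeze(k), freeze(v)) for k, v in obj.items())
    if isinstance(obj, set):
        return frozenset(freeze(item) for item in obj)
    return obj


class FreezingLoader(yaml.SafeLoader):
    def construct_mapping(self, node, deep=False):
        # deep=True so nested keys are fully built before we freeze them.
        pairs = self.construct_pairs(node, deep=True)
        # Only keys are frozen; values keep their usual types.
        return {freeze(key): value for key, value in pairs}


FreezingLoader.add_constructor(
    "tag:yaml.org,2002:map", FreezingLoader.construct_mapping
)

nested = yaml.load("[[0]]: top-left", Loader=FreezingLoader)
dict_key = yaml.load("{0: 1}: top-left", Loader=FreezingLoader)
print(nested)
print(dict_key)
```

The catch is that the conversion is lossy: once a dict key has become a frozenset, nothing records that it used to be a mapping.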
But, I don't know the key types in advance #
A decent trade-off is to just let the mapping devolve into a list of pairs:
def construct_mapping(self, node):
    pairs = self.construct_pairs(node)
    try:
        return dict(pairs)
    except TypeError:
        return pairs
>>> yaml.load("""\
... [0, 0]: top-left
... [1, 1]: bottom-right
... """, Loader=Loader)
[([0, 0], 'top-left'), ([1, 1], 'bottom-right')]
>>> yaml.load("{0: 1}: top-left", Loader=Loader)
[({0: 1}, 'top-left')]
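The cost of this trade-off is that consumers have to cope with either shape. A minimal sketch of a lookup helper that accepts both (lookup() is a made-up name for illustration):

```python
def lookup(data, key, default=None):
    """Find `key` in a dict or in a list of (key, value) pairs."""
    if isinstance(data, dict):
        try:
            return data.get(key, default)
        except TypeError:
            # An unhashable key cannot be in a dict in the first place.
            return default
    # List of pairs: fall back to a linear scan using equality.
    for candidate, value in data:
        if candidate == key:
            return value
    return default


# The shapes a mapping can take with the loader above.
as_pairs = [([0, 0], "top-left"), ([1, 1], "bottom-right")]
as_dict = {"name": "grid"}

print(lookup(as_pairs, [1, 1]))
print(lookup(as_pairs, [2, 2], default="unknown"))
print(lookup(as_dict, "name"))
```

Lookups in the pairs form are linear rather than constant time, which is usually fine for configuration-sized data.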
But, I need to round-trip the data #
This works, until you need to round-trip the data, or emit this kind of YAML yourself.
Luckily, I've already written a whole article on how to do that, complete with code; the trick is to mark the list of pairs in some way:
>>> value = yaml.load("""\
... [0, 0]: top-left
... [1, 1]: bottom-right
... """, Loader=Loader)
>>> value
Pairs([([0, 0], 'top-left'), ([1, 1], 'bottom-right')])
... so you can represent it back into a mapping:
>>> print(yaml.dump(value, Dumper=Dumper))
? - 0
  - 0
: top-left
? - 1
  - 1
: bottom-right
(The fancy ? syntax indicates a complex mapping key, but that's just another way of writing the original input.)
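For a feel of how the marking could work, here is a minimal sketch. It is not necessarily the implementation from the linked article: PairsLoader, PairsDumper and represent_pairs are hypothetical names, and it relies on the assumption that represent_mapping() accepts a list of pairs as well as a dict, which appears to hold in current PyYAML versions.

```python
import yaml


class Pairs(list):
    """A list of (key, value) pairs that stands in for a mapping."""

    def __repr__(self):
        return f"{type(self).__name__}({super().__repr__()})"


class PairsLoader(yaml.SafeLoader):
    def construct_mapping(self, node, deep=False):
        pairs = self.construct_pairs(node, deep=True)
        try:
            return dict(pairs)
        except TypeError:
            # Mark the fallback so the dumper can recognize it later.
            return Pairs(pairs)


class PairsDumper(yaml.SafeDumper):
    pass


def represent_pairs(dumper, data):
    # Assumption: represent_mapping() iterates a list of pairs as-is.
    return dumper.represent_mapping("tag:yaml.org,2002:map", data)


PairsLoader.add_constructor(
    "tag:yaml.org,2002:map", PairsLoader.construct_mapping
)
PairsDumper.add_representer(Pairs, represent_pairs)

document = "[0, 0]: top-left\n[1, 1]: bottom-right\n"
value = yaml.load(document, Loader=PairsLoader)
dumped = yaml.dump(value, Dumper=PairsDumper)
reloaded = yaml.load(dumped, Loader=PairsLoader)

print(value)
print(dumped)
```

Because Pairs is a distinct type, plain lists of two-item tuples elsewhere in the data are still dumped as sequences, and only the marked ones go back to being mappings.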
That's it for now.