So you're trying to compute a hash using hashlib
and get an exception like this:
>>> import hashlib
>>> hashlib.md5(2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object supporting the buffer API required
... or like this:
>>> hashlib.md5('two')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Unicode-objects must be encoded before hashing
What does it mean?
The first clue is in these two bits from the docs:
You can now feed this object with bytes-like objects (normally bytes) using the update() method.
Feeding string objects into update() is not supported, as hashes work on bytes, not on characters.
Now, "object supporting the buffer API required" is a more precise way of saying "the object is not bytes-like". That is, it cannot export a series of bytes through the buffer interface, a way for Python objects to provide access to their underlying binary data.
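To get a feel for what "bytes-like" means in practice, here is a minimal sketch. The helper name is_bytes_like is made up for illustration, and it assumes that a successful memoryview() call is a good enough proxy for "supports the buffer API":

```python
def is_bytes_like(obj) -> bool:
    """Hypothetical helper: True if obj exports its data via the buffer API."""
    try:
        # memoryview() only accepts objects that support the buffer protocol
        memoryview(obj)
    except TypeError:
        return False
    return True


for candidate in (b"two", bytearray(b"two"), "two", 2):
    print(type(candidate).__name__, is_bytes_like(candidate))
```

bytes and bytearray pass the check; str and int do not, which matches the two exceptions above.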
In the code above, the constructor passes
the initial data to update().
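As a quick check of how the constructor and update() relate, the sketch below hashes the same bytes both ways and compares the results. The choice of UTF-8 for encoding the string is an assumption; any encoding works as long as you always use the same one:

```python
import hashlib

# A str has to be encoded to bytes before it can be hashed.
data = "two".encode("utf-8")

# Passing the data to the constructor...
one_step = hashlib.md5(data)

# ...gives the same result as creating an empty hash and calling update().
two_step = hashlib.md5()
two_step.update(data)

same_digest = one_step.hexdigest() == two_step.hexdigest()
print(same_digest)
```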
Why does this happen?
update() refuses to take anything other than bytes because
there are many different ways of converting arbitrary objects to bytes
(and some can't even be meaningfully converted
– for example, file objects or sockets).
Let's look at the initial example, where we're trying to get the hash of an int.
One way of converting an int to bytes is
to get its string representation,
and convert that into bytes;
utf-8 encoding should be acceptable:
>>> x = 2
>>> repr(x)
'2'
>>> repr(x).encode()
b'2'
Alternatively, we can use
to_bytes() to convert it directly;
to do it, we must specify an explicit byte length and order:
>>> x.to_bytes(2, 'big')
b'\x00\x02'
>>> x.to_bytes(2, 'little')
b'\x02\x00'
>>> x.to_bytes(4, 'big')
b'\x00\x00\x00\x02'
>>> x.to_bytes(4, 'little')
b'\x02\x00\x00\x00'
The struct module allows doing the same thing for C structs
composed of bools, bytes, integers and floats, with varied representations:
>>> import struct
>>> struct.pack('>i', x)
b'\x00\x00\x00\x02'
>>> struct.pack('<i', x)
b'\x02\x00\x00\x00'
>>> struct.pack('>q', x)
b'\x00\x00\x00\x00\x00\x00\x00\x02'
>>> struct.pack('<q', x)
b'\x02\x00\x00\x00\x00\x00\x00\x00'
As you can see, we get different bytes depending on the method used. Obviously, the hash also differs:
>>> values = [
...     repr(x).encode('utf-8'),
...     x.to_bytes(2, 'big'),
...     x.to_bytes(2, 'little'),
...     x.to_bytes(4, 'big'),
...     x.to_bytes(4, 'little'),
...     struct.pack('>i', x),
...     struct.pack('<i', x),
... ]
>>> for value in values:
...     print(hashlib.md5(value).hexdigest())
...
c81e728d9d4c2f636f067f89cc14862c
7209a1ce16f85bd1cbd287134ff5cbb6
11870cb56df12527e588f2ef967232e8
f11177d2ec63d995fb4ac628e0d782df
f2dd0dedb2c260419ece4a9e03b2e828
f11177d2ec63d995fb4ac628e0d782df
f2dd0dedb2c260419ece4a9e03b2e828
In general, you have to pick a standard way of converting things to bytes.
If you only want to hash integers, you can pick one of the methods above.
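One way to make sure the same method is used everywhere is to wrap it in a single function. This is only a sketch: the name hash_int and the choice of a signed 64-bit big-endian representation are assumptions for illustration, not a recommendation over the other methods.

```python
import hashlib
import struct


def hash_int(n: int) -> str:
    """Hypothetical helper: MD5 of n packed as a signed 64-bit big-endian int."""
    # '>q' means big-endian, 8 bytes, signed; struct.error is raised
    # if n does not fit in 64 bits.
    return hashlib.md5(struct.pack(">q", n)).hexdigest()


# The same convention, spelled with int.to_bytes(), gives the same hash.
matches = hash_int(2) == hashlib.md5((2).to_bytes(8, "big", signed=True)).hexdigest()
print(matches)
```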
If you go with to_bytes(),
the byte size has to fit the biggest number you expect;
for example, 255 is the biggest number you can express with 1 byte;
you need 2 bytes for 256:
>>> (255).to_bytes(1, 'big')
b'\xff'
>>> (256).to_bytes(1, 'big')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: int too big to convert
>>> (256).to_bytes(2, 'big')
b'\x01\x00'
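If you don't know the biggest number in advance, one option is to derive the length from the number itself with int.bit_length(). The helper below is a hypothetical sketch; it assumes a signed big-endian encoding so that negative numbers work too, and it picks a length that always fits (occasionally one byte more than strictly needed, as for -128):

```python
def int_to_bytes(n: int) -> bytes:
    """Hypothetical helper: signed big-endian bytes, sized to fit n."""
    # bit_length() ignores the sign, so reserve one extra bit for it
    # and round up to a whole number of bytes (at least 1).
    length = (n.bit_length() + 8) // 8
    return n.to_bytes(length, "big", signed=True)


for n in (0, 2, 255, 256, -1, 2**64):
    encoded = int_to_bytes(n)
    # Decoding with the same convention gives the number back.
    assert int.from_bytes(encoded, "big", signed=True) == n
    print(n, encoded)
```

Because the length depends only on the value, equal integers always produce equal bytes, which is what matters for hashing.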
If you want to hash arbitrary objects, you have to find a standard way of converting them to bytes for each type you need to support, recursively. I've written an article about doing this for (almost) arbitrary objects.
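To give a rough idea of what "recursively" means here, below is a minimal sketch for a handful of built-in types. It is not the approach from the article mentioned above; the names canonical_bytes and stable_hash, the one-letter type tags, and the 8-byte length prefix are all assumptions made up for this example. Note that it deliberately treats lists and tuples as the same thing.

```python
import hashlib


def canonical_bytes(obj) -> bytes:
    """Hypothetical sketch: type-tagged, length-prefixed bytes for obj."""
    if obj is None:
        tag, body = b"N", b""
    elif isinstance(obj, bool):  # must come before int: bool subclasses int
        tag, body = b"B", (b"\x01" if obj else b"\x00")
    elif isinstance(obj, int):
        tag, body = b"I", str(obj).encode("ascii")
    elif isinstance(obj, str):
        tag, body = b"S", obj.encode("utf-8")
    elif isinstance(obj, bytes):
        tag, body = b"Y", obj
    elif isinstance(obj, (list, tuple)):
        # Recurse into each item; order is significant.
        tag, body = b"L", b"".join(canonical_bytes(item) for item in obj)
    elif isinstance(obj, dict):
        # Sort by encoded key so that insertion order does not matter.
        items = sorted(
            (canonical_bytes(k), canonical_bytes(v)) for k, v in obj.items()
        )
        tag, body = b"D", b"".join(k + v for k, v in items)
    else:
        raise TypeError(f"don't know how to convert {type(obj).__name__}")
    # The length prefix keeps adjacent values from running together,
    # e.g. ["ab", "c"] vs. ["a", "bc"].
    return tag + len(body).to_bytes(8, "big") + body


def stable_hash(obj) -> str:
    return hashlib.md5(canonical_bytes(obj)).hexdigest()


print(stable_hash({"a": 1, "b": [2, 3]}) == stable_hash({"b": [2, 3], "a": 1}))
print(stable_hash(1) == stable_hash("1"))
```

The first comparison is true because dict entries are sorted before hashing; the second is false because the type tag keeps the int 1 and the string "1" apart.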
In particular, note that
repr(...).encode() will only work
if the result of the object's
__repr__ method has all the data you need,
in a predictable order, and nothing that changes between equal objects
(including across processes etc.).
>>> class C:
...     def __init__(self, n):
...         self.n = n
...     def __eq__(self, other):
...         if isinstance(other, type(self)):
...             return self.n == other.n
...         return False
...
>>> a = C(2)
>>> b = C(2)
>>> a == b
True
>>> repr(a) == repr(b)
False
>>> repr(a)
'<__main__.C object at 0x7f8890132df0>'
>>> repr(b)
'<__main__.C object at 0x7f88901be580>'
Although a and b are equal, their reprs differ:
C doesn't define __repr__, so
it inherits the default one from object,
which just returns the type name and memory address of the object.
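If you control the class, one possible fix is to give it a __repr__ that depends only on the data used for equality. The sketch below is one way to do that, under the assumption that n itself has a stable repr; the exact repr format is made up for this example.

```python
import hashlib


class C:
    def __init__(self, n):
        self.n = n

    def __eq__(self, other):
        if isinstance(other, type(self)):
            return self.n == other.n
        return False

    def __repr__(self):
        # Built only from the data compared in __eq__: no memory address.
        return f"{type(self).__name__}({self.n!r})"


a = C(2)
b = C(2)
print(repr(a))  # C(2)

# Equal objects now have equal reprs, and therefore equal hashes.
hash_a = hashlib.md5(repr(a).encode("utf-8")).hexdigest()
hash_b = hashlib.md5(repr(b).encode("utf-8")).hexdigest()
print(hash_a == hash_b)
```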
That's it for now. If you found this useful, consider sharing it wherever you share things, or drop me a line :)