DynamoDB crash course: part 3 – design patterns

Previously

This is the last part of a series covering core DynamoDB concepts. The goal is to help you understand idiomatic usage and trade-offs in under an hour.

In the first part, I summarized DynamoDB's main proposition to its users like so:

data modeling complexity is always preferable to complexity coming from infrastructure maintenance, availability, and scalability

Today, we're looking at the design patterns that help manage this complexity: making the most of DynamoDB's data model and features, and working around its limits.

Contents

Composite keys
Single table design
GSI overloading
Partition key sharding
Sparse indexes
Base table indexes
Optimistic locking

Composite keys

Composite (aka synthetic) keys underpin most other patterns.

The idea is simple: keys don't have to be natural attributes of your data; they can be composed of other attributes to enable specific access patterns. This works for both table and index keys.

How do you compose keys? By string concatenation, of course! Careful with numbers, though: they need zero-padding to be useful in sort keys, because strings compare lexicographically.
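A minimal sketch of the padding issue; the track numbers and the width of 6 digits are assumptions for illustration, and negative numbers would need a different encoding (e.g. an offset) that this doesn't attempt:

```python
def composite_key(*parts: str) -> str:
    """Join key parts with '#', the separator used throughout this post."""
    return "#".join(parts)


def padded(n: int, width: int = 6) -> str:
    """Zero-pad a non-negative integer so that string order matches numeric order."""
    if n < 0:
        raise ValueError("negative numbers need an offset or another encoding")
    return f"{n:0{width}d}"


track_numbers = [9, 10, 100]  # hypothetical track numbers on one album

# Unpadded: strings compare character by character, so "10" sorts before "9".
unpadded = sorted(composite_key("Leaving Home", str(n)) for n in track_numbers)

# Padded: every number has the same width, so lexicographic order is numeric order.
padded_keys = sorted(composite_key("Leaving Home", padded(n)) for n in track_numbers)

print(unpadded)     # ['Leaving Home#10', 'Leaving Home#100', 'Leaving Home#9']
print(padded_keys)  # ['Leaving Home#000009', 'Leaving Home#000010', 'Leaving Home#000100']
```

The width has to cover the largest value you'll ever store, since changing it later means rewriting every key.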

Example

To sort lexicographically by more than one attribute, you group them in a sort key, e.g. {Album}#{Song}.

Or, in single table design, you distinguish between item types by prefixing keys with the type, e.g. album#{Album}.

Or, in partition key sharding, you spread the load on a GSI partition by splitting one partition key into multiple ones, e.g. {Genre}#{shard}.

But denormalization has its trade-offs. For sort key {Album}#{Song}, should Album and Song also be separate attributes? If yes, you need to keep them consistent with the key (in practice, ensure they never change), but you can use them in indexes (e.g. a GSI with Album as partition key). If no, items can't become inconsistent, but you need to parse the key to get them.
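To make the trade-off concrete, here's a sketch of both options; `song_item`, `parse_sk` and `is_consistent` are hypothetical helpers, and `sk` is assumed to be the sort key attribute name:

```python
SEP = "#"


def song_item(album: str, song: str, duplicate: bool) -> dict:
    """Build the key part of a song item, optionally duplicating the key parts."""
    item = {"sk": f"{album}{SEP}{song}"}
    if duplicate:
        # Option 1: also store the parts, so they can be used as index keys.
        item["Album"] = album
        item["Song"] = song
    return item


def parse_sk(sk: str) -> tuple:
    """Option 2: recover the parts from the key itself.

    Splitting only once means song titles may contain '#',
    but album titles must not.
    """
    album, song = sk.split(SEP, 1)
    return album, song


def is_consistent(item: dict) -> bool:
    """With option 1, the duplicated attributes must keep matching the key."""
    return parse_sk(item["sk"]) == (item["Album"], item["Song"])
```

For example, `parse_sk("Leaving Home#Air Song")` returns `("Leaving Home", "Air Song")`; the separator choice is exactly where parsing gets fragile.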

This was inconvenient enough that DynamoDB finally added support for multi-attribute keys to GSIs in 2025 (although not inconvenient enough to add it to tables as well).


Single table design

The AWS guidance is to use as few tables as possible:

As a general rule, you should maintain as few tables as possible in a DynamoDB application. [...] A single table with inverted indexes can usually enable simple queries to create and retrieve the complex hierarchical data structures required by your application.

This culminates in single table design, where you put all entities in the same table, and tell them apart based on the key format, usually using a prefix. With this pattern, one DynamoDB table corresponds to a whole relational database.

The simplest approach is to put items related to a top-level entity on the same partition. The main benefit is that joins with the top-level entity become trivial. Another is that you can sometimes get different entity types in a single query, which can be both faster and cheaper (fewer queries, and small items pack into fewer capacity units).

Example

You can group items related to an Artist on the same partition, with sort keys like artist, album#{Album}, and song#{Album}#{Song}.

# table Music (partition key: Artist, sort key: sk)
Solar Fields: !btree
  'album#Leaving Home': { Genre: Electronic }
  'artist': { Variations: [ Solarfields ] }
  'song#Leaving Home#Air Song': { Duration: 741 }
  'song#Leaving Home#Monogram': { Duration: 944 }

Besides getting items of a single type, you can also get artist details and albums in a single query (sk BETWEEN "album#" AND "artist").
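Here's a toy in-memory model of that partition (not the DynamoDB API) showing why the BETWEEN condition returns exactly the album and artist items: sort keys are kept in order, and the condition selects a contiguous, inclusive range. Python compares strings by code point, which matches the UTF-8 byte order DynamoDB uses for string sort keys.

```python
from bisect import bisect_left, bisect_right

# One item collection (partition "Solar Fields"): sort key -> attributes.
partition = {
    "album#Leaving Home": {"Genre": "Electronic"},
    "artist": {"Variations": ["Solarfields"]},
    "song#Leaving Home#Air Song": {"Duration": 741},
    "song#Leaving Home#Monogram": {"Duration": 944},
}


def query_between(partition: dict, low: str, high: str) -> list:
    """Mimic a key condition `sk BETWEEN low AND high` (inclusive at both ends)."""
    keys = sorted(partition)  # DynamoDB keeps items in sort key order
    start = bisect_left(keys, low)
    end = bisect_right(keys, high)
    return [(sk, partition[sk]) for sk in keys[start:end]]


for sk, item in query_between(partition, "album#", "artist"):
    print(sk, item)
# album#Leaving Home {'Genre': 'Electronic'}
# artist {'Variations': ['Solarfields']}
```

With boto3, the same key condition would be `Key("Artist").eq("Solar Fields") & Key("sk").between("album#", "artist")`, using `Key` from `boto3.dynamodb.conditions`.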

But choose wisely – queries can have only one sort key condition, so you can't also get album details and songs in a single query with this schema; sort keys {Album} and {Album}#{Song} would do it, at the expense of the first query.
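A sketch of the alternative schema, again as a toy in-memory model: a `begins_with` on the album title returns the album item and its songs together. One caveat the sketch handles: the prefix also matches other albums whose titles start the same way (e.g. a hypothetical "Leaving Home Again"), and since a filter expression can't reference key attributes, that check has to happen in your code.

```python
# Alternative schema: sort key "{Album}" for albums, "{Album}#{Song}" for songs.
partition = {
    "artist": {"Variations": ["Solarfields"]},
    "Leaving Home": {"Genre": "Electronic"},
    "Leaving Home#Air Song": {"Duration": 741},
    "Leaving Home#Monogram": {"Duration": 944},
}


def album_with_songs(partition: dict, album: str) -> list:
    """Mimic `begins_with(sk, album)`, then drop other albums sharing the prefix."""
    hits = [(sk, partition[sk]) for sk in sorted(partition) if sk.startswith(album)]
    return [(sk, item) for sk, item in hits if sk == album or sk.startswith(album + "#")]


print([sk for sk, _ in album_with_songs(partition, "Leaving Home")])
# ['Leaving Home', 'Leaving Home#Air Song', 'Leaving Home#Monogram']
```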

Sometimes, it can be useful to put some sub-entities on dedicated partitions, accepting that joins will have to be done in code.

Example

In the example above, a popular artist with lots of songs can lead to:

throttling due to partition throughput limits
slow list songs for artist due to sequential paginated queries

Perhaps it's better to put the songs in each album on separate partitions:

partition key artist#{Artist}, sort key artist or album#{Album}
partition key song#{Artist}#{Album}, sort key {Song}

# table Music (partition key: pk, sort key: sk)
'artist#Solar Fields': !btree
  'album#Leaving Home': { Genre: Electronic }
  'artist': { Variations: [ Solarfields ] }
'song#Solar Fields#Leaving Home': !btree
  'Air Song': { Duration: 741 }
  'Monogram': { Duration: 944 }

This spreads the load onto multiple partitions, which should fix throttling.

The downside is that list songs for artist is now a two-step operation: first one query for the albums, then one query per album for the songs. The upside is that the per-album queries can be done in parallel, which wasn't possible before.
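A minimal sketch of the two-step list songs for artist over a toy in-memory table; `query` is a stand-in for DynamoDB's Query (pagination left out), and `max_workers=8` is an arbitrary choice. The per-album queries run concurrently on a thread pool:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy table: partition key -> {sort key: attributes}.
TABLE = {
    "artist#Solar Fields": {
        "album#Leaving Home": {"Genre": "Electronic"},
        "artist": {"Variations": ["Solarfields"]},
    },
    "song#Solar Fields#Leaving Home": {
        "Air Song": {"Duration": 741},
        "Monogram": {"Duration": 944},
    },
}


def query(pk: str, prefix: str = "") -> list:
    """Stand-in for a Query: one partition's items in sort key order, filtered by begins_with."""
    partition = TABLE.get(pk, {})
    return [(sk, partition[sk]) for sk in sorted(partition) if sk.startswith(prefix)]


def list_songs_for_artist(artist: str) -> list:
    # Step 1: one query for the artist's albums.
    albums = [sk.removeprefix("album#") for sk, _ in query(f"artist#{artist}", "album#")]
    # Step 2: one query per album, run concurrently.
    with ThreadPoolExecutor(max_workers=8) as pool:
        per_album = pool.map(lambda album: query(f"song#{artist}#{album}"), albums)
        return [song for songs in per_album for song, _ in songs]


print(list_songs_for_artist("Solar Fields"))  # ['Air Song', 'Monogram']
```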

A consequence of this design is that you need a GSI to list items of a specific type (otherwise, you have to do a full table scan). Of note, exceeding the GSI partition throughput limit will cause write throttling on the base table; in the absence of a natural high-cardinality GSI partition key, sharding or some other composite key can help.

A final benefit of using a single table is better utilization with provisioned mode: usage gets averaged across entities and tends to be smoother, and spikes can share the same spare capacity.


GSI overloading

GSI overloading is just single table design for indexes: you put different values in the GSI key attributes, depending on item type. This way you can index more attributes than the quota of 20 GSIs per table would otherwise allow, and it can be cheaper too since, as with tables, fewer indexes make better use of spare provisioned capacity.

Example

For a table that contains both artist and album items, a single GSI can be used for entirely different purposes:

artist: partition key artist#{Country} – list artists by country
album: partition key album#{Genre} – list albums by genre

# table Music (partition key: Artist, sort key: sk)
2 Bit Pie: !btree
  'album#2 Pie Island': { gsi1pk: 'album#Electronic' }
  'artist': { gsi1pk: 'artist#United Kingdom' }
Ishome: !btree
  'album#Confession': { gsi1pk: 'album#Electronic' }
  'artist': { gsi1pk: 'artist#Russia' }

# GSI GSI1 (partition key: gsi1pk, sort key: Artist)
'artist#United Kingdom': !btree
  2 Bit Pie: { sk: 'artist' }
'artist#Russia': !btree
  Ishome: { sk: 'artist' }
'album#Electronic': !btree
  2 Bit Pie: { sk: 'album#2 Pie Island' }
  Ishome: { sk: 'album#Confession' }
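A sketch of how the overloaded key could be derived at write time; `gsi1pk` and `build_gsi1` are hypothetical helpers, and the items are assumed to carry Country and Genre attributes (omitted above). Grouping in memory stands in for what DynamoDB does when it maintains GSI1:

```python
from collections import defaultdict
from typing import Optional


def gsi1pk(item: dict) -> Optional[str]:
    """Overloaded GSI partition key: its meaning depends on the item type."""
    if item["sk"] == "artist":
        return f"artist#{item['Country']}"  # list artists by country
    if item["sk"].startswith("album#"):
        return f"album#{item['Genre']}"     # list albums by genre
    return None                             # other item types stay out of GSI1


def build_gsi1(items: list) -> dict:
    """Group items by gsi1pk and sort each group by Artist (GSI1's sort key)."""
    index = defaultdict(list)
    for item in items:
        key = gsi1pk(item)
        if key is not None:
            index[key].append(item)
    return {pk: sorted(group, key=lambda i: i["Artist"]) for pk, group in index.items()}


items = [
    {"Artist": "2 Bit Pie", "sk": "album#2 Pie Island", "Genre": "Electronic"},
    {"Artist": "2 Bit Pie", "sk": "artist", "Country": "United Kingdom"},
    {"Artist": "Ishome", "sk": "album#Confession", "Genre": "Electronic"},
    {"Artist": "Ishome", "sk": "artist", "Country": "Russia"},
]

for pk, group in build_gsi1(items).items():
    print(pk, [i["Artist"] for i in group])
# album#Electronic ['2 Bit Pie', 'Ishome']
# artist#United Kingdom ['2 Bit Pie']
# artist#Russia ['Ishome']
```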

See also

GSI overloading

Partition key sharding

Sometimes, a partition key composed of multiple natural attributes is not enough to spread the load evenly across partitions; you can deal with this by putting items with the same natural attributes on multiple partitions.

So, what partition key should you use? One option is to use a random suffix from a known range; this allows you to list items for a natural attribute value by doing multiple queries, one for each suffix.

Example

For a table of songs, using Album as the partition key won't work, since not all songs are released on an album; Artist always has a value, but some artists have hundreds or even thousands of songs, which can lead to throttling.

Instead, we can use {Artist}#{randrange(10)} as partition key, which allows ten times as many items before we reach throughput limits. To list an artist's songs:

for shard in range(10):
    for item in dynamodb.query(f"{artist}#{shard}"):
        yield item

A downside of random suffixes is that you can't get a specific item, because you don't know what its suffix is. A better option is to calculate the suffix from an attribute that you do know, for example using its hash modulo N.

Example

With partition key {Artist}#{hash(Song) % 10}, we can get a song like this:

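from hashlib import sha256

def hash(s):
    # stable across processes, unlike Python's built-in hash() for strings
    return int.from_bytes(sha256(s.encode()).digest(), "big")

shard = hash(song_title) % 10
dynamodb.get_item(f"{artist}#{shard}", song_title)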
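Putting it together, a runnable sketch over a toy in-memory table (a dict of partitions, not a DynamoDB client): writes and single-item reads compute the shard from the song title, while listing still fans out over all ten shards. `stable_hash` plays the role of `hash` above; `N_SHARDS`, `put_song`, `get_song`, and `list_songs` are hypothetical names.

```python
from collections import defaultdict
from hashlib import sha256

N_SHARDS = 10
table = defaultdict(dict)  # toy table: partition key -> {sort key (song title): item}


def stable_hash(s: str) -> int:
    # Same role as hash() above; Python's built-in hash() for str varies between processes.
    return int.from_bytes(sha256(s.encode()).digest(), "big")


def partition_key(artist: str, song: str) -> str:
    return f"{artist}#{stable_hash(song) % N_SHARDS}"


def put_song(artist: str, song: str, item: dict) -> None:
    table[partition_key(artist, song)][song] = item


def get_song(artist: str, song: str):
    # The shard is recomputed from the title, so a single lookup suffices.
    return table.get(partition_key(artist, song), {}).get(song)


def list_songs(artist: str):
    # Listing still needs one query per shard.
    for shard in range(N_SHARDS):
        yield from sorted(table.get(f"{artist}#{shard}", {}))


put_song("Solar Fields", "Air Song", {"Duration": 741})
put_song("Solar Fields", "Monogram", {"Duration": 944})
print(get_song("Solar Fields", "Monogram"))  # {'Duration': 944}
```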

Often, you need to list items by a low-cardinality attribute, so sharding may be even more important for GSIs.

Example

Assuming dedicated album items, you can list all the albums by giving them all the same GSI partition key value, albums, but this will definitely cause throttling.

To avoid it, you can use GSI partition key album#{hash(Album) % 100} if you don't care about the order, or something like album#{Album[:2].lower()} if you do (but likely more sophistication is needed: "th" will be a very common album title prefix, and some album titles don't contain letters at all).

Even if throttling is not an issue (e.g. single infrequent reader), sharding allows you to query multiple partitions in parallel, which can speed up getting the entire result set.
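A sketch of that scatter-gather read over a toy sharded GSI: each worker pages through one shard sequentially (following a LastEvaluatedKey-like cursor), while the shards are read in parallel. The 4 shards, tiny page size, and album titles are made up for illustration:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

N_SHARDS = 4
PAGE_SIZE = 2  # tiny, so that pagination actually happens

# Toy sharded GSI: partition key "album#{shard}" -> album titles in sort key order.
GSI = {
    f"album#{shard}": sorted(f"Album {shard}-{n}" for n in range(5))
    for shard in range(N_SHARDS)
}


def query_page(pk: str, start_after: Optional[str]):
    """Stand-in for one Query call: a page of results plus a LastEvaluatedKey-like cursor."""
    keys = [k for k in GSI.get(pk, []) if start_after is None or k > start_after]
    page = keys[:PAGE_SIZE]
    cursor = page[-1] if len(keys) > PAGE_SIZE else None
    return page, cursor


def read_shard(pk: str) -> list:
    """Page through one shard sequentially, following the cursor."""
    items, cursor = [], None
    while True:
        page, cursor = query_page(pk, cursor)
        items.extend(page)
        if cursor is None:
            return items


def list_all_albums() -> list:
    """Read all shards in parallel; results are grouped by shard, not globally sorted."""
    shard_keys = [f"album#{shard}" for shard in range(N_SHARDS)]
    with ThreadPoolExecutor(max_workers=N_SHARDS) as pool:
        return [album for shard in pool.map(read_shard, shard_keys) for album in shard]


print(len(list_all_albums()))  # 20
```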

So, how many shards should you have? That depends on how many items there are, how large they are, and how often you access them. It is also a trade-off: too many shards means additional queries and latency, while too few means you still overload the partitions sometimes.
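For a rough starting point, you can compare your peak load with the per-partition throughput limits (1,000 write units and 3,000 read units per second). The sketch below assumes evenly spread keys, single-item writes, and strongly consistent single-item reads; `target_utilization` is a hypothetical safety margin, not a DynamoDB setting:

```python
import math

# Per-partition throughput limits.
PARTITION_MAX_WCU = 1_000  # write units per second
PARTITION_MAX_RCU = 3_000  # read units per second


def estimate_shards(writes_per_sec, reads_per_sec, item_kb, target_utilization=0.5):
    """Rough shard count keeping each shard under target_utilization of the limits.

    Assumes 1 WCU per started KB written and 1 RCU per started 4 KB read
    (strongly consistent, one item at a time).
    """
    wcu = writes_per_sec * math.ceil(item_kb)
    rcu = reads_per_sec * math.ceil(item_kb / 4)
    needed = max(
        wcu / (PARTITION_MAX_WCU * target_utilization),
        rcu / (PARTITION_MAX_RCU * target_utilization),
    )
    return max(1, math.ceil(needed))
```

For example, 3,000 writes per second of 0.5 KB items at 50% target utilization comes out to `estimate_shards(3000, 0, 0.5) == 6` shards; treat it as a lower bound to revisit with real traffic, not a sizing rule.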

Importantly, increasing the number of shards is non-trivial. For tables, you usually need to rebalance the items in place. For indexes, it's cleaner to move to a new index, or if you just need to list items by type, you can put all new items on new shards.

Regardless, you have to support it in code, do a backfill, and orchestrate the migration, which all become more complex if downtime and inconsistencies are not acceptable (e.g. if you expose a pagination token based on LastEvaluatedKey, you may want to support both versions during the switch).


Sparse indexes

An item with missing index partition/sort key attributes won't appear in the index, and you won't pay for it. This can be used deliberately to query a subset of the items in the table, like those of a specific type or in a specific state.

Example

Assuming dedicated album items, an alternative way to list all the albums is to have a GSI with {Album} as partition key and just scan the entire index (the index partition key has to be a dedicated attribute that only album items have, so that only they appear in the index).

Or, you can use a dedicated GSI with CoverOf as primary key to list cover songs.
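The inclusion rule is easy to model: an item gets an index entry only if it has every key attribute of the index. A toy sketch with hypothetical song items, where only the cover has a CoverOf attribute:

```python
def index_entries(items: list, key_attrs: list) -> list:
    """Items that get an entry in a GSI keyed on key_attrs: only those having all of them."""
    return [item for item in items if all(attr in item for attr in key_attrs)]


songs = [  # hypothetical song items; only the second one is a cover
    {"pk": "song#Artist A#Album 1", "sk": "Song 1"},
    {"pk": "song#Artist A#Album 1", "sk": "Song 2", "CoverOf": "Artist B#Song 2"},
    {"pk": "song#Artist C#Album 3", "sk": "Song 3"},
]

covers = index_entries(songs, ["CoverOf"])
print([item["sk"] for item in covers])  # ['Song 2']
```

Scanning such an index reads (and bills for) only the covers, not the whole table.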

See also

Sparse indexes

Base table indexes

In some cases, GSIs won't cut it: maybe you need a strongly consistent index, or need to model a many-to-one relationship (a GSI maps each base table item to at most one index item).

Instead, you can maintain an index in the base table by having additional index items associated with the main item; to guarantee atomic updates, use transactions. You then go from the main item to the index items via a main item attribute, and from an index item (found by its partition key) back to the main item via its sort key, which encodes the main item's key.

Example

Songs have different identifiers in external systems, such as ISRC, ISWC, or MBID. To query songs by multiple external ids, you'd structure your database like this:

song
- partition key song#{Artist}#{Album}
- sort key {Song}
- external_{type}: id, ...
external ids
- partition key external#{type}#{id}
- sort key song#{Artist}#{Album}#{Song}
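A sketch of the write side, assuming attribute names pk/sk, a table named Music, and placeholder ids (all hypothetical): it builds the action list for a single TransactWriteItems call in the low-level attribute-value format, so the song and its index items are written atomically. It only covers creating a song; changing an external id would also need a Delete action for the old index item, which isn't shown.

```python
def song_key(artist: str, album: str, song: str) -> dict:
    """Main item key, in DynamoDB's low-level attribute-value format."""
    return {"pk": {"S": f"song#{artist}#{album}"}, "sk": {"S": song}}


def create_song_actions(table: str, artist: str, album: str, song: str, external_ids: dict) -> list:
    """TransactItems for writing a song and its index items atomically."""
    main_item = song_key(artist, album, song)
    for id_type, value in external_ids.items():
        main_item[f"external_{id_type}"] = {"S": value}  # main item -> index items
    actions = [{"Put": {"TableName": table, "Item": main_item}}]
    for id_type, value in external_ids.items():
        index_item = {
            "pk": {"S": f"external#{id_type}#{value}"},    # lookup by external id
            "sk": {"S": f"song#{artist}#{album}#{song}"},  # points back to the main item
        }
        actions.append({"Put": {"TableName": table, "Item": index_item}})
    return actions


def main_key_from_index_sk(sk: str) -> dict:
    """Recover the main item key; assumes artist and album names contain no '#'."""
    _, artist, album, song = sk.split("#", 3)
    return song_key(artist, album, song)


actions = create_song_actions(
    "Music", "Solar Fields", "Leaving Home", "Air Song",
    {"ISRC": "isrc-123", "MBID": "mbid-456"},  # placeholder ids
)
print(len(actions))  # 3
```

With boto3, you'd pass the result to `client.transact_write_items(TransactItems=actions)`.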

(Alternatively, you could have one sparse index per external id type, but then you lose strong consistency, and risk running out of GSIs).

Note that modeling one-to-many relationships isn't this involved, since it fits neatly into the related-items-same-partition variant of single table design.

See also

Working with item collections (modeling one-to-many relationships)
Many-to-many relationships

Optimistic locking

Optimistic locking is a concurrency control method useful when conflicts are rare: instead of acquiring a lock before making changes, you check whether someone else changed the data right before committing, as part of an atomic operation.

In DynamoDB, that operation is a conditional write; items get an integer version attribute, and every time you want to update an item, you:

read the item, including the version
increment the version and modify the item
update the item, using a condition expression to ensure the stored version still matches the one you read
1. if successful, you're done
2. else, start over from the beginning
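A minimal sketch of that loop against a toy in-memory table whose conditional put is atomic, standing in for DynamoDB's conditional writes; `ToyTable`, `ConditionalCheckFailed`, and `update_with_retry` are hypothetical names. In boto3, step 3 would be an `update_item` or `put_item` call with a `ConditionExpression` such as `version = :expected`, failing with a `ConditionalCheckFailedException` error code when the check doesn't pass.

```python
import threading


class ConditionalCheckFailed(Exception):
    """Stand-in for DynamoDB's ConditionalCheckFailedException."""


class ToyTable:
    """In-memory table whose conditional put is atomic, like a conditional write."""

    def __init__(self, items: dict):
        self._items = {key: dict(item) for key, item in items.items()}
        self._lock = threading.Lock()  # models per-item write atomicity only

    def get(self, key: str) -> dict:
        with self._lock:
            return dict(self._items[key])

    def put_if_version(self, key: str, item: dict, expected_version: int) -> None:
        # Same idea as ConditionExpression="version = :expected".
        with self._lock:
            if self._items[key]["version"] != expected_version:
                raise ConditionalCheckFailed(key)
            self._items[key] = dict(item)


def update_with_retry(table: ToyTable, key: str, change, max_attempts: int = 10) -> dict:
    for _ in range(max_attempts):
        item = table.get(key)               # 1. read the item, including the version
        expected = item["version"]
        new_item = change(item)             # 2. modify the item ...
        new_item["version"] = expected + 1  #    ... and increment the version
        try:
            table.put_if_version(key, new_item, expected)  # 3. conditional write
            return new_item                 # 3.1 success: done
        except ConditionalCheckFailed:
            continue                        # 3.2 conflict: start over
    raise RuntimeError(f"gave up on {key!r} after {max_attempts} conflicts")


def add_play(item: dict) -> dict:
    item["plays"] += 1  # hypothetical counter attribute
    return item
```

Here `max_attempts` only bounds the loop; under heavy contention you'd likely also want some backoff between attempts.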

You can also do this in transactions to update groups of related items, like in the base table index pattern above, with only the main item needing a version.

The upside of optimistic locking is that it is faster on average, since updates usually succeed on the first try; to get fewer conflicts, use strongly consistent reads (an eventually consistent read may return a stale version, which guarantees a failed write).

The downside is that it requires explicit support – it must be possible to start over from the beginning, which complicates logic, especially if you need to interact with other systems besides updating the item (e.g. to send a notification).

See also

Implementing version control via optimistic locking (Python example)
Optimistic locking with version number (Java example)

Anyway, that's it for now.

See also

For more details and examples, check out the official documentation:

Data modeling
Data modeling schemas (worked examples)

Learned something new today? Share it with others, it really helps!

This is part of a series: