DynamoDB crash course: part 3 – design patterns
May 2026 ∙ nine minute read
This is the last part of a series covering core DynamoDB concepts. The goal is to help you understand idiomatic usage and trade-offs in under an hour.
In the first part, I summarized DynamoDB's main proposition to its users like so:
data modeling complexity is always preferable to complexity coming from infrastructure maintenance, availability, and scalability
Today, we're looking at the design patterns that help manage this complexity, make the most of DynamoDB's data model and features, and work around its limits.
Composite keys
Composite (aka synthetic) keys underpin most other patterns.
The idea is simple: keys don't have to be natural attributes of your data, they can be composed of other attributes that enable specific access patterns. This works both with table and index keys.
How do you compose keys? By string concatenation, of course – I did say low level! (Careful with numbers though, they need padding to be useful in sort keys.)
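To see why padding matters, here is a minimal sketch in plain Python (no DynamoDB involved) that composes sort keys from an album and a track number. The sort_key helper and the three-digit width are assumptions for illustration; pick a width that covers your largest value.

```python
def sort_key(album, track, width=3):
    """Compose '{Album}#{track}' with the track number zero-padded."""
    return f"{album}#{track:0{width}d}"

tracks = [1, 2, 10]

# Without padding, string order differs from numeric order.
unpadded = sorted(f"Leaving Home#{n}" for n in tracks)
print(unpadded)  # ['Leaving Home#1', 'Leaving Home#10', 'Leaving Home#2']

# With padding, the two orders agree.
padded = sorted(sort_key("Leaving Home", n) for n in tracks)
print(padded)  # ['Leaving Home#001', 'Leaving Home#002', 'Leaving Home#010']
```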
Example
To sort lexicographically by more than one attribute,
you group them in a sort key, e.g. {Album}#{Song}.
Or, in single table design,
you distinguish between item types
by prefixing keys with the type,
e.g. album#{Album}.
Or, in partition key sharding,
you spread the load on a GSI partition by splitting one partition key
into multiple ones, e.g. {Genre}#{shard}.
But denormalization has its trade-offs.
For sort key {Album}#{Song},
should Album and Song also be separate attributes?
If yes,
you need to ensure they never change,
but you can use them in indexes
(e.g. a GSI with Album as partition key).
If no,
the item cannot become inconsistent,
but you always need to parse the key.
This was inconvenient enough that DynamoDB finally added multi-attribute key support to GSIs in 2025 (although not inconvenient enough to also add it to tables).
Single table design
The AWS guidance is to use as few tables as possible:
As a general rule, you should maintain as few tables as possible in a DynamoDB application. [...] A single table with inverted indexes can usually enable simple queries to create and retrieve the complex hierarchical data structures required by your application.
This culminates in single table design, where you put all entities in the same table, and tell them apart based on the key format, usually using a prefix. With this pattern, one DynamoDB table corresponds to a whole relational database.
The easiest way is to put items related to a top-level entity on the same partition. The main benefit is that joins with the top-level entity become trivial. A second one is that you can sometimes get different entity types in a single query, which can be both faster and cheaper (fewer queries; small items pack into fewer capacity units).
Example
You can group items related to an Artist on the same partition,
with sort keys like
artist, album#{Album}, and song#{Album}#{Song}.
# table Music (partition key: Artist, sort key: sk)
Solar Fields: !btree
  'album#Leaving Home': { Genre: Electronic }
  'artist': { Variations: [ Solarfields ] }
  'song#Leaving Home#Air Song': { Duration: 741 }
  'song#Leaving Home#Monogram': { Duration: 944 }
Besides getting items of a single type,
you can also get artist details and albums in a single query
(sk BETWEEN "album#" AND "artist").
But choose wisely
– queries can have only one sort key condition,
so you can't also get album details and songs
in a single query with this schema;
sort keys {Album} and {Album}#{Song} would do it,
at the expense of the first query.
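The effect of sort key order on what one query can return can be modeled in plain Python. This is only a sketch of the ordering, not the DynamoDB API: the between and begins_with helpers are hypothetical stand-ins for the corresponding sort key conditions, and the data is the Solar Fields partition from the example.

```python
from bisect import bisect_left, bisect_right

# One partition (Artist = "Solar Fields"), as sort key -> attributes.
PARTITION = {
    "album#Leaving Home": {"Genre": "Electronic"},
    "artist": {"Variations": ["Solarfields"]},
    "song#Leaving Home#Air Song": {"Duration": 741},
    "song#Leaving Home#Monogram": {"Duration": 944},
}

def between(partition, low, high):
    """Sort keys with low <= sk <= high, in order (models sk BETWEEN low AND high)."""
    keys = sorted(partition)
    return keys[bisect_left(keys, low):bisect_right(keys, high)]

def begins_with(partition, prefix):
    """Sort keys starting with prefix, in order (models begins_with(sk, prefix))."""
    return [sk for sk in sorted(partition) if sk.startswith(prefix)]

# Artist details and albums are adjacent, so one range covers both.
print(between(PARTITION, "album#", "artist"))
# ['album#Leaving Home', 'artist']

# Songs of one album are adjacent too, but 'artist' sorts between
# the album item and its songs, so no single range selects just those.
print(begins_with(PARTITION, "song#Leaving Home#"))
# ['song#Leaving Home#Air Song', 'song#Leaving Home#Monogram']
```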
Sometimes, it can be useful to put some sub-entities on dedicated partitions, accepting that joins will have to be done in code.
Example
In the example above, a popular artist with lots of songs can lead to:
- throttling due to partition throughput limits
- slow list songs for artist due to sequential paginated queries
Perhaps it's better to put songs in each album on separate partitions:
- partition key artist#{Artist}, sort key artist or album#{Album}
- partition key song#{Artist}#{Album}, sort key {Song}
# table Music (partition key: pk, sort key: sk)
'artist#Solar Fields': !btree
  'album#Leaving Home': { Genre: Electronic }
  'artist': { Variations: [ Solarfields ] }
'song#Solar Fields#Leaving Home': !btree
  'Air Song': { Duration: 741 }
  'Monogram': { Duration: 944 }
This spreads the load onto multiple partitions, which should fix throttling.
The downside is that list songs for artist is now a two-step operation: first one query for the albums, then one query per album for the songs. The upside is that the per-album queries can be done in parallel, which wasn't possible before.
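The two-step listing could look roughly like the sketch below. It assumes client is a boto3 DynamoDB client (boto3.client("dynamodb")), which can be shared between threads, and that the table uses the pk and sk attributes from the example; query_partition and list_artist_songs are hypothetical helper names, and max_workers is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor

TABLE = "Music"  # table name from the example

def query_partition(client, pk, sk_prefix=None):
    """Return all items in one partition, following LastEvaluatedKey pagination."""
    kwargs = {
        "TableName": TABLE,
        "KeyConditionExpression": "pk = :pk",
        "ExpressionAttributeValues": {":pk": {"S": pk}},
    }
    if sk_prefix is not None:
        kwargs["KeyConditionExpression"] += " AND begins_with(sk, :prefix)"
        kwargs["ExpressionAttributeValues"][":prefix"] = {"S": sk_prefix}
    items = []
    while True:
        page = client.query(**kwargs)
        items.extend(page["Items"])
        if "LastEvaluatedKey" not in page:
            return items
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]

def list_artist_songs(client, artist, max_workers=8):
    """List an artist's songs: one query for the albums, then one per album."""
    # Step 1: album items live on the artist partition.
    albums = [
        item["sk"]["S"][len("album#"):]
        for item in query_partition(client, f"artist#{artist}", "album#")
    ]
    # Step 2: each album's songs live on their own partition; query in parallel.
    partitions = [f"song#{artist}#{album}" for album in albums]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda pk: query_partition(client, pk), partitions)
        return [song for songs in results for song in songs]
```

Results come back grouped by album, in the order the albums were listed. Note that the items use the low-level client format, e.g. {"sk": {"S": "Monogram"}}.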
A consequence of this design is that you need a GSI to list items of a specific type (otherwise, you have to do a full table scan). Of note, exceeding the GSI partition throughput limit will cause write throttling on the base table; in the absence of a natural high-cardinality GSI partition key, sharding or some other composite key can help.
A final benefit of using a single table is better utilization with provisioned mode: usage gets averaged across entities and tends to be smoother, and spikes can share the same spare capacity.
GSI overloading
GSI overloading is just single table design for indexes – you put different values in the GSI key attributes, depending on item type. This way you can index more attributes than the 20 GSIs per table quota, and it can be cheaper too, since fewer indexes make better use of spare provisioned capacity.
Example
For a table that contains both artist and album items, a single GSI can be used for entirely different purposes:
- artist: partition key artist#{Country} – list artists by country
- album: partition key album#{Genre} – list albums by genre
# table Music (partition key: Artist, sort key: sk)
2 Bit Pie: !btree
  'album#2 Pie Island': { gsi1pk: 'album#Electronic' }
  'artist': { gsi1pk: 'artist#United Kingdom' }
Ishome: !btree
  'album#Confession': { gsi1pk: 'album#Electronic' }
  'artist': { gsi1pk: 'artist#Russia' }

# GSI GSI1 (partition key: gsi1pk, sort key: Artist)
'artist#United Kingdom': !btree
  2 Bit Pie: { sk: 'artist' }
'artist#Russia': !btree
  Ishome: { sk: 'artist' }
'album#Electronic': !btree
  2 Bit Pie: { sk: 'album#2 Pie Island' }
  Ishome: { sk: 'album#Confession' }
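On the write path, overloading usually comes down to one function that derives the index key from the item type. A minimal sketch, assuming artist items carry a Country attribute and album items a Genre attribute; the gsi1pk and with_index_key helpers are hypothetical:

```python
def gsi1pk(item):
    """Return the overloaded GSI1 partition key for an item, or None if not indexed."""
    sk = item["sk"]
    if sk == "artist":
        return f"artist#{item['Country']}"
    if sk.startswith("album#"):
        return f"album#{item['Genre']}"
    return None  # other item types stay out of the index

def with_index_key(item):
    """Return a copy of the item with gsi1pk set, when the item type is indexed."""
    key = gsi1pk(item)
    return item if key is None else {**item, "gsi1pk": key}

print(with_index_key({"Artist": "Ishome", "sk": "artist", "Country": "Russia"}))
# {'Artist': 'Ishome', 'sk': 'artist', 'Country': 'Russia', 'gsi1pk': 'artist#Russia'}
```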
Partition key sharding
Sometimes, a partition key composed of multiple natural attributes is not enough to spread the load evenly across partitions; you can deal with this by putting items with the same natural attributes on multiple partitions.
So, what partition key should you use? One option is to use a random suffix from a known range; this still allows you to list items for a natural attribute value by doing multiple queries, one for each suffix.
Example
For a table of songs, using Album as the partition key won't work, since not all songs are released on an album; Artist always has a value, but some artists have hundreds or even thousands of songs, which can lead to throttling.
Instead, we can use {Artist}#{randrange(10)} as partition key,
which allows ten times as many items
before we reach throughput limits.
To list an artist's songs:
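def list_songs(artist):
    # dynamodb.query() stands for a helper returning the items in one partition
    for shard in range(10):
        for item in dynamodb.query(f"{artist}#{shard}"):
            yield item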
A downside of random suffixes is that you can't get a specific item, because you don't know what its suffix is. A better option is to calculate the suffix from an attribute that you do know, for example using its hash modulo N.
Example
With partition key {Artist}#{hash(Song) % 10},
we can get a song like this:
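from hashlib import sha256

def hash(s):
    # stable across processes, unlike the built-in hash() for strings
    return int.from_bytes(sha256(s.encode()).digest(), "big")

shard = hash(song_title) % 10
dynamodb.get_item(f"{artist}#{shard}", song_title)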
A lot of times you need to list items by a low-cardinality attribute, so sharding may be even more important for GSIs.
Example
Assuming dedicated album items,
you can list all the albums by putting them
in a single GSI partition with the key albums,
but this will definitely cause throttling.
To avoid it,
you can use GSI partition key album#{hash(Album) % 100}
if you don't care about the order,
or something like album#{Album[:2].lower()} if you do
(but likely more sophistication is needed –
th will be a very common album title prefix,
and some album titles don't contain letters at all).
Even if throttling is not an issue (e.g. single infrequent reader), sharding allows you to query multiple partitions in parallel, which can speed up getting the entire result set.
So, how many shards should you have? That depends on the number and size of the items and on how often you access them, and it is also a trade-off – too many shards means additional queries and latency, too few shards means you still overload the partitions sometimes.
Importantly, increasing the number of shards is non-trivial. For tables, you usually need to rebalance the items in place. For indexes, it's cleaner to move to a new index, or if you just need to list items by type, you can put all new items on new shards.
Regardless, you have to support it in code, do a backfill, and orchestrate the migration, which all become more complex if downtime and inconsistencies are not acceptable (e.g. if you expose a pagination token based on LastEvaluatedKey, you may want to support both versions during the switch).
Sparse indexes
An item with missing index partition/sort key attributes won't appear in the index, and you won't pay for it. This can be used deliberately to query a subset of the items in the table, like those of a specific type or in a specific state.
Example
Assuming dedicated album items,
an alternative way to list all the albums
is to have a GSI with {Album} as partition key,
and just scan the entire index
(the partition key has to be a dedicated attribute
that only albums have,
so that only album items appear in the index).
Or, you can use a dedicated GSI with CoverOf as partition key to list cover songs.
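Which items end up in a sparse index can be modeled in a few lines of plain Python. This is a sketch of the rule only, not of the DynamoDB API: index_view is a hypothetical helper, and the Album attribute is assumed to be set only on album items.

```python
def index_view(items, partition_key, sort_key=None):
    """Items that would appear in an index: those that have all its key attributes."""
    required = [partition_key] + ([sort_key] if sort_key else [])
    return [item for item in items if all(attr in item for attr in required)]

items = [
    {"Artist": "2 Bit Pie", "sk": "artist"},
    {"Artist": "2 Bit Pie", "sk": "album#2 Pie Island", "Album": "2 Pie Island"},
    {"Artist": "Ishome", "sk": "artist"},
    {"Artist": "Ishome", "sk": "album#Confession", "Album": "Confession"},
]

# Only the album items carry the Album attribute, so only they are indexed.
albums = index_view(items, "Album")
print([item["Album"] for item in albums])  # ['2 Pie Island', 'Confession']
```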
Base table indexes
In some cases, GSIs won't cut it – maybe you need a strongly consistent index, or need to model a many-to-one relationship (indexes map one item in the base table to one item in the index).
Instead, you can maintain an index in the base table by having additional index items associated with the main item; to guarantee atomic updates, use transactions. You then go from the main item to the index items via a main item attribute, and from the index items to the main item via their partition key.
Example
Songs have different identifiers in external systems, such as ISRC, ISWC, or MBID. To query songs by multiple external ids, you'd structure your database like this:
- song
  - partition key song#{Artist}#{Album}
  - sort key {Song}
  - attributes external_{type}: id, ...
- external ids
  - partition key external#{type}#{id}
  - sort key song#{Artist}#{Album}#{Song}
(Alternatively, you could have one sparse index per external id type, but then you lose strong consistency, and risk running out of GSIs).
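Writing the song and its index items atomically might look like the sketch below, which builds the TransactItems list for the low-level client's transact_write_items call. The pk and sk attribute names, the build_song_puts helper, and the choice to reject already existing items are assumptions for illustration.

```python
def build_song_puts(table, artist, album, song, external_ids):
    """Build TransactItems that put a song plus one index item per external id."""
    song_pk = f"song#{artist}#{album}"

    # Main item: carries the external ids as attributes.
    main = {"pk": {"S": song_pk}, "sk": {"S": song}}
    for id_type, value in external_ids.items():
        main[f"external_{id_type}"] = {"S": value}

    # Index items: one per external id, pointing back at the song.
    index_items = [
        {
            "pk": {"S": f"external#{id_type}#{value}"},
            "sk": {"S": f"{song_pk}#{song}"},
        }
        for id_type, value in external_ids.items()
    ]

    return [
        {
            "Put": {
                "TableName": table,
                "Item": item,
                # Fail the whole transaction if any of the items already exists.
                "ConditionExpression": "attribute_not_exists(pk)",
            }
        }
        for item in [main] + index_items
    ]
```

You would then pass the result to client.transact_write_items(TransactItems=...), where client is a boto3 DynamoDB client; either all items are written or none are.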
Note that modeling one-to-many relationships isn't this involved, since it fits neatly into the related-items-same-partition variant of single table design.
See also
- Working with item collections (modeling one-to-many relationships)
- Many-to-many relationships
Optimistic locking
Optimistic locking is a concurrency control method useful when conflicts are rare: instead of acquiring a lock to make changes, you check whether someone else changed the data right before committing, as part of an atomic operation.
In DynamoDB, that operation is a conditional write; items get an integer version attribute, and every time you want to update an item, you:
- read the item, including the version
- increment the version and modify the item
- update the item, using a condition expression to ensure the version matches
- if successful, you're done
- else, start over from the beginning
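The steps above can be sketched as a retry loop. To keep the example self-contained, it runs against an in-memory stand-in rather than DynamoDB: FakeTable, put_if_version, and ConditionFailed are hypothetical. With DynamoDB, the conditional write would be an update or put with a condition expression comparing the version, and a failed condition surfaces as ConditionalCheckFailedException.

```python
import copy

class ConditionFailed(Exception):
    """Stand-in for DynamoDB's ConditionalCheckFailedException."""

class FakeTable:
    """In-memory stand-in for a table; models only what this sketch needs."""

    def __init__(self):
        self.items = {}

    def get(self, key):
        return copy.deepcopy(self.items[key])

    def put_if_version(self, key, item, expected_version):
        # Models a conditional write with the condition "version = :expected".
        if self.items[key]["version"] != expected_version:
            raise ConditionFailed(key)
        self.items[key] = copy.deepcopy(item)

def update_with_retry(table, key, modify, max_attempts=5):
    """Apply modify(item) under optimistic locking, retrying on conflict."""
    for _ in range(max_attempts):
        item = table.get(key)            # read the item, including the version
        expected = item["version"]
        modify(item)                     # modify the item...
        item["version"] = expected + 1   # ...and increment the version
        try:
            table.put_if_version(key, item, expected)  # conditional write
            return item                  # success, done
        except ConditionFailed:
            continue                     # someone else won, start over
    raise RuntimeError(f"gave up after {max_attempts} attempts")
```

The max_attempts cap is a design choice, not part of the pattern; without it, a very contended item could keep a writer retrying indefinitely.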
You can also do this in transactions to update groups of related items, like in the base table index pattern above, with only the main item needing a version.
The upside of optimistic locking is that it is faster on average, since updates usually succeed on the first try; for fewer conflicts, use strongly consistent reads.
The downside is that it requires explicit support – it must be possible to start over from the beginning, which complicates logic, especially if you need to interact with other systems besides updating the item (e.g. to send a notification).
See also
- Implementing version control via optimistic locking (Python example)
- Optimistic locking with version number (Java example)
Anyway, that's it for now.
See also
For more details and examples, check out the official documentation:
- Data modeling
- Data modeling schemas (worked examples)