Introduction
When designing databases, one common decision is how to uniquely identify rows in a table. UUIDs (Universally Unique Identifiers) are a popular option because they can be generated anywhere with a negligible chance of collision, but they can introduce significant performance and storage challenges, especially at scale. In this post, we’ll break down what UUIDs are, the performance issues they can cause, and the alternatives that avoid those issues. We’ll also explore when UUIDs are the right choice for your database.
What are UUIDs?
A UUID (Universally Unique Identifier) is a 128-bit value used to uniquely identify objects or records in computer systems. There are multiple versions of UUIDs, but the most commonly used is UUIDv4, in which 122 of the 128 bits are random and the remaining 6 bits encode the version and variant. Here’s an example of a UUIDv4:
e942bbe9-afdc-4c62-a438-4efee77954b3
              ^
In the above example, you can tell it’s UUIDv4 because the digit ‘4’ is the 13th hexadecimal digit (the first character of the third group, marked with the caret). This digit encodes the version of the UUID.
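As a quick check of the version digit, here is a minimal sketch that uses Python's standard `uuid` module to parse the example identifier. The variable names are only illustrative.

```python
import uuid

# The example identifier from the text above.
example = uuid.UUID("e942bbe9-afdc-4c62-a438-4efee77954b3")

# In the canonical string, index 14 is the first character of the third
# group, i.e. the 13th hex digit once the two hyphens are skipped.
version_char = str(example)[14]

print(example.version)         # version field decoded by the uuid module
print(version_char)            # the same digit read directly from the string
print(len(example.bytes) * 8)  # size of the identifier in bits
```

The same check works for other versions, since the version always occupies that position in the canonical string form.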
UUIDs are particularly useful when you need globally unique identifiers that can be generated without coordination between systems, making them ideal for distributed applications. However, using UUIDs as database keys, particularly for primary keys, comes with some performance downsides.
Problem 1 — Insert Performance
Most relational databases use B+ trees to organize indexes for fast lookups. Every insert has to place the new key at its sorted position in the tree, and when the target page is already full, that page is split and the tree is rebalanced.
With an auto-incrementing integer key, each new key is larger than the last, so inserts always land at the right-hand edge of the index, on a page that is usually already in memory. UUIDv4 values are random, so the insertion points are scattered across the whole B+ tree, causing frequent page splits, partially filled pages, and poor cache locality.
As your database grows to millions of rows and the index no longer fits in memory, this scattered write pattern can degrade insert performance dramatically.
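To make the scattering concrete, here is a toy model rather than a real B+ tree. It assumes a plain sorted Python list can stand in for the key order of an index, and measures how often a new key lands at the very end (the cheap, append-like case). The function name and row count are made up for illustration.

```python
import bisect
import random

def tail_insert_fraction(keys):
    """Insert keys one by one into a sorted list and return the fraction
    of inserts that landed at the end (the append-only case)."""
    sorted_keys = []
    tail_inserts = 0
    for key in keys:
        pos = bisect.bisect_right(sorted_keys, key)
        if pos == len(sorted_keys):
            tail_inserts += 1
        sorted_keys.insert(pos, key)
    return tail_inserts / len(keys)

N = 2000
rng = random.Random(42)  # fixed seed so the run is repeatable

sequential_keys = list(range(1, N + 1))               # auto-increment style
random_keys = [rng.getrandbits(128) for _ in range(N)]  # UUIDv4-style values

sequential_fraction = tail_insert_fraction(sequential_keys)
random_fraction = tail_insert_fraction(random_keys)

print(f"sequential keys: {sequential_fraction:.1%} of inserts at the end")
print(f"random keys:     {random_fraction:.1%} of inserts at the end")
```

Sequential keys always append, while random keys almost never do. A real database engine adds page splits and caching on top of this, so treat the numbers as an intuition aid, not a benchmark.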
Problem 2 — Higher Storage Costs
UUIDs are larger than traditional auto-incrementing integer keys. Here’s how they compare:
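- Auto-incrementing integers: 32 bits per value (64 bits if you use a big-integer type).
- UUIDs: 128 bits per value.
That’s 4x more storage per key than a 32-bit integer, and the cost repeats in every foreign key column and, in many engines, every secondary index that references the primary key. Additionally, many systems store UUIDs in their human-readable text form, which consumes even more space: the 36-character string needs at least 288 bits, and can reach up to 688 bits per value (roughly 20x more than a 32-bit integer).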
When considering the overall storage cost, using UUIDs for a table with millions of rows can significantly increase the database size.
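The arithmetic can be sketched in a few lines. This counts key bytes only and ignores row headers, page overhead, and index structure, so real tables will differ. The 64-bit integer row and the row count are assumptions added for comparison.

```python
import uuid

ROWS = 1_000_000  # assumed table size for the illustration
sample = uuid.uuid4()

# Bytes needed to hold one key in each representation.
sizes_bytes = {
    "32-bit integer": 4,
    "64-bit integer": 8,
    "UUID, 16 raw bytes": len(sample.bytes),
    "UUID, 36-char text": len(str(sample).encode("ascii")),
}

baseline = sizes_bytes["32-bit integer"]
for label, size in sizes_bytes.items():
    total_mb = size * ROWS / 1_000_000
    print(f"{label:20s} {size:3d} B/key  {total_mb:5.1f} MB  {size / baseline:4.1f}x")
```

Storing UUIDs in a binary or native UUID column, where the database offers one, avoids the extra cost of the text form.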
Performance Simulation: UUIDs vs. Auto-Incrementing Integers
Consider two tables, one with UUID keys and the other with auto-incrementing integer keys. Each table contains 1 million rows:
- Total table size: The UUID table is approximately 2.3x larger than the integer table.
- ID field size: A single UUID field requires 9.3x more storage than an integer field.
- ID column size: The UUID column is 3.5x larger than the integer column.
These storage differences matter beyond disk cost: larger keys mean fewer entries fit in each index page and in memory, which can slow queries on large datasets.
Alternatives to UUIDs
While UUIDs are widely used, there are alternatives that offer better performance and efficiency in certain use cases:
1. UUIDv7
- What it is: UUIDv7 is a time-ordered version of UUID. Its first 48 bits are a Unix timestamp in milliseconds and most of the remaining bits are random, so identifiers generated later sort after earlier ones.
- Why it’s better: Because new values sort near the end of the index, inserts behave much like sequential keys, which leads to better indexing performance than the fully random UUIDv4. The structure still provides global uniqueness but without the major insert performance penalties of UUIDv4.
- Best for: Use cases where global uniqueness is needed, but you want to avoid random inserts that slow down indexing.
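The UUIDv7 bit layout can be sketched by hand. This is an illustrative generator following the field layout in RFC 9562 (48-bit millisecond timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits), not a production implementation. The function name `uuid7_sketch` is hypothetical, and the sketch makes no ordering guarantee for IDs created within the same millisecond.

```python
import os
import time
import uuid

def uuid7_sketch(unix_ms=None):
    """Build a UUIDv7-style value: timestamp first, randomness after."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                           # top 12 bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80     # 48-bit timestamp
    value |= 0x7 << 76                            # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                           # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)

earlier = uuid7_sketch(unix_ms=1_700_000_000_000)
later = uuid7_sketch(unix_ms=1_700_000_000_001)
print(earlier)
print(later)
print(earlier < later)  # the timestamp prefix decides the order
```

Because the timestamp occupies the most significant bits, sorting by the UUID value is the same as sorting by creation time at millisecond granularity.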
2. Auto-Incrementing Integers
- What it is: Auto-incrementing integers are sequential numbers generated automatically by the database for each new row.
- Why it’s better: The sequential nature of auto-incrementing integers makes them highly efficient for inserts and indexing. They use minimal storage (32 bits per value, or 64 bits for a big-integer type) and provide excellent query performance.
- Best for: Small to medium-sized databases or single-system applications where global uniqueness isn’t required.
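As a small illustration, the sketch below lets the database assign sequential keys. SQLite is used only because it ships with Python. The `users` table and its columns are hypothetical, and SQLite's integer storage width differs from the fixed 32-bit figure quoted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "  id INTEGER PRIMARY KEY AUTOINCREMENT,"
    "  name TEXT NOT NULL"
    ")"
)

# The application never supplies an id; the database hands out the next one.
assigned_ids = []
for name in ["ada", "grace", "linus"]:
    cursor = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    assigned_ids.append(cursor.lastrowid)
conn.commit()

print(assigned_ids)
conn.close()
```

The catch is that the counter lives in one database, which is why this approach fits single-system applications best.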
3. ULID (Universally Unique Lexicographically Sortable Identifier)
- What it is: ULID is a 128-bit identifier, usually written as a 26-character Crockford Base32 string, that is globally unique like a UUID but also lexicographically sortable.
- Why it’s better: ULID uses a 48-bit millisecond timestamp prefix followed by 80 bits of randomness. This allows for efficient insertion in sorted order while still providing global uniqueness.
- Best for: Distributed systems where globally unique IDs are needed but sequential ordering is important for performance.
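The ULID layout can be sketched in the same spirit. This is an illustration under the assumptions stated in the bullets above (48-bit timestamp, 80 random bits, Crockford Base32 text form), with a hypothetical function name. It omits the same-millisecond monotonic behaviour that full ULID libraries typically provide.

```python
import os
import time

# Crockford Base32 alphabet: digits and uppercase letters without I, L, O, U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def ulid_sketch(unix_ms=None):
    """Encode a 48-bit timestamp plus 80 random bits as 26 Base32 chars."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    randomness = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = ((unix_ms & ((1 << 48) - 1)) << 80) | randomness

    chars = []
    for _ in range(26):                  # 26 chars x 5 bits covers 128 bits
        chars.append(CROCKFORD[value & 31])
        value >>= 5
    return "".join(reversed(chars))

first = ulid_sketch(unix_ms=1_700_000_000_000)
second = ulid_sketch(unix_ms=1_700_000_000_001)
print(first)
print(second)
print(first < second)  # plain string comparison follows creation time
```

The alphabet is in ascending ASCII order, which is what makes ordinary string sorting agree with timestamp order.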
When to Use UUIDs
Despite the drawbacks, UUIDs are still the best option in some cases, especially when global uniqueness is required. Here are a few scenarios where UUIDs shine:
1. Distributed Systems
In distributed applications where multiple nodes or systems are generating IDs independently, UUIDs ensure global uniqueness without coordination.
2. Merging Data from Multiple Sources
If you have a use case where data from multiple databases or systems needs to be merged, UUIDs prevent collisions between records.
3. External Exposure
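Random UUIDs such as UUIDv4 are much harder to guess or enumerate than sequential integers, which makes them a safer choice when IDs appear in URLs or APIs. Keep in mind that time-based variants like UUIDv7 reveal when a record was created, and that an unguessable ID is not a substitute for authorization checks.
However, if you’re building a single-system application or one that doesn’t require merging data from different sources, using auto-incrementing integers or a time-ordered identifier like UUIDv7 or ULID is often the better choice.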
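The merging scenario above can be sketched with two in-memory "databases". Dictionaries keyed by ID stand in for tables, and the helper names and sample rows are made up for illustration.

```python
import uuid

def rows_with_int_ids(names):
    """Each source numbers its own rows 1, 2, 3, ... independently."""
    return {i: name for i, name in enumerate(names, start=1)}

def rows_with_uuid_ids(names):
    """Each source generates random UUIDs with no coordination."""
    return {uuid.uuid4(): name for name in names}

source_a = ["ada", "grace"]
source_b = ["linus", "margaret"]

# Merging by key: equal integer ids from different sources overwrite each other.
merged_int = {**rows_with_int_ids(source_a), **rows_with_int_ids(source_b)}
merged_uuid = {**rows_with_uuid_ids(source_a), **rows_with_uuid_ids(source_b)}

print(len(merged_int))   # rows that survived the integer-keyed merge
print(len(merged_uuid))  # rows that survived the UUID-keyed merge
```

With integer keys, the merge needs a remapping step to avoid losing rows. With UUIDs, the rows can be combined as they are.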
Conclusion
While UUIDs provide global uniqueness and are essential in distributed systems, they come with significant performance and storage drawbacks when used in databases — particularly with UUIDv4 due to its random nature. Consider alternatives like UUIDv7 or ULID when you need ordering and better insert performance. For simpler applications, auto-incrementing integers remain a tried-and-true choice.
Ultimately, the decision to use UUIDs depends on your system’s needs. If global uniqueness is crucial, UUIDs (especially UUIDv7) are hard to beat. But if performance and storage are your priorities, consider other options that fit your use case.