
Introduction Readyset is designed to serve queries from cached views with sub-millisecond latency. This post focuses on the cold path—cases where a cache miss forces execution against the base tables. In these scenarios, Readyset must evaluate the query from scratch, including materializing intermediate results. The focus here is on straddled joins, where filtering predicates apply to both sides of the join in addition to the ON clause. Example: SELECT u.id, u.name, o.order_id, o.total FROM u

Marcelo Altmann
2025-08-18 · 5 min read
Introduction
Readyset is designed to serve queries from cached views with sub-millisecond latency. This post focuses on the cold path—cases where a cache miss forces execution against the base tables. In these scenarios, Readyset must evaluate the query from scratch, including materializing intermediate results. The focus here is on straddled joins, where filtering predicates apply to both sides of the join in addition to the ON clause.
Example:
SELECT u.id, u.name, o.order_id, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.email = 'some@provide.com'
AND o.status = 'SHIPPED';
In a previous optimization, we transitioned from nested loop joins to hash joins (see https://readyset.io/blog/introducing-hash-join-algorithm), eliminating the need to repeatedly scan one side of the join. This brought significant performance improvements for many workloads, however, some queries remained problematic.
Straddled joins are common in production workloads; for example, filtering users by some property on the left table and orders by a status flag on the right table, then joining on user_id. While our hash join algorithm improved over nested loops, it was still inefficient in these cases, especially when one side’s predicate had low cardinality (like a boolean flag or status column).
This post explains why the old execution strategy struggled, and how our new optimization using Index Condition Pushdown (ICP) changes the execution model to make straddled joins much more efficient.
Previous algorithm
During evaluation of straddled joins on cache misses, we observed significant latency in up-queries. To identify the underlying bottleneck, we profiled query execution and analyzed the resulting flamegraphs
Approximately 30% of the query execution time was attributed to data decompression. We initially suspected the RocksDB compression algorithm as the bottleneck and proceeded to benchmark alternative compression methods to validate this hypothesis.
Switching to ZSTD did not improve performance. Decompression remained a dominant contributor to query execution time. As the next step, we disabled compression in RocksDB entirely to isolate its impact. This came with a space tradeoff but was necessary to confirm whether compression was the root cause:
Disabling compression didn’t eliminate the problem, it merely shifted the bottleneck. The system began spending the majority of execution time on disk reads, as evident from increased I/O activity on ext4.
This made it clear that compression wasn’t the issue; rather, the excessive amount of data being read from disk was the primary cause. Isolated iostat output for the query confirmed this:
Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
nvme1n1 10068.00 55728.00 0.00 0.00 0.08 5.54 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.79 81.00
The NVMe device was handling approximately 10K IOPS, with the majority of reads around 5KB in size. Despite their small size, the device reached ~81% utilization while executing a single join query, indicating I/O saturation.
Upon examining the query pattern, we identified that one side of the join had a low-cardinality index, causing the engine to scan nearly the entire table on each execution. Due to internal constraints (such as maintaining key provenance and supporting incremental view maintenance) the join engine was independently evaluating both sides of the join, regardless of selectivity.
Execution example
SELECT u.id, u.name, o.order_id, o.totalFROM users uJOIN orders o ON u.id = o.user_idWHERE u.email = 'some@provide.com' AND o.status = 'SHIPPED';
Scenario:
Old Execution Strategy: Hash Joins
The old algorithm relied on the hash join approach:
Why This Is Inefficient:
New Execution Strategy: Index Condition Pushdown
The new algorithm issues an initial upquery to one side of the join, then combines the resulting join key with the original predicates on the other side. This composite condition is pushed down to RocksDB, allowing index-based retrieval of only the rows required to satisfy the join. This eliminates unnecessary data reads and avoids full-table scans.
Step‑by‑Step:
WHERE o.status = 'SHIPPED'
AND o.user_id = 123;
To quantify the performance improvement brought by the new Index Condition Pushdown strategy, we ran a controlled benchmark comparing the old hash join algorithm against the new execution model on the same hardware and dataset.
Under the previous strategy, Readyset executed straddled joins using independent filtering on both sides, followed by full materialization and hash-based probing. This approach was highly inefficient when one side had low-cardinality filters.
Results:
The system was CPU- and IO-bound, spending excessive time reading and decompressing unnecessary rows from disk; most of which were discarded after join evaluation.
With the ICP-enabled join model, Readyset instead defers right-side lookups until left-side predicates are evaluated, allowing the use of compound indexes (e.g. (user_id, status)) to fetch only relevant rows. This minimizes materialization and disk reads.
Results:
This represents a >450x throughput improvement and >450x latency reduction. The join strategy is now highly cache- and index-efficient, with the storage engine only returning matching rows based on precise key lookups.
While Readyset excels in delivering low-latency results via caching, optimizing the cold path is critical for consistent performance during cache misses. By rethinking how straddled joins are executed and leveraging Index Condition Pushdown, we’ve significantly improved real-world performance for these workloads.
Modern applications demand instant performance, even under unpredictable load. Readyset helps you eliminate slow queries, stabilize latency, and scale confidently.
Revolutionize your database performance with Readyset
Serve requests at sub-millisecond latencies with the modern database scaling and query caching system for MySQL and PostgreSQL.
Join our newsletter
Stay updated with the latest news, insights, and developments from Readyset — straight to your inbox.