Optimizing Straddled Joins in Readyset: From Hash Joins to Index Condition Pushdown

Introduction Readyset is designed to serve queries from cached views with sub-millisecond latency. This post focuses on the cold path—cases where a cache miss forces execution against the base tables. In these scenarios, Readyset must evaluate the query from scratch, including materializing intermediate results. The focus here is on straddled joins, where filtering predicates apply to both sides of the join in addition to the ON clause. Example: SELECT u.id, u.name, o.order_id, o.total FROM u

Marcelo Altmann

2025-08-18 · 5 min read

Introduction

Readyset is designed to serve queries from cached views with sub-millisecond latency. This post focuses on the cold path—cases where a cache miss forces execution against the base tables. In these scenarios, Readyset must evaluate the query from scratch, including materializing intermediate results. The focus here is on straddled joins, where filtering predicates apply to both sides of the join in addition to the ON clause.

Example:

SELECT u.id, u.name, o.order_id, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.email = 'some@provide.com'
 AND o.status = 'SHIPPED';

In a previous optimization, we transitioned from nested loop joins to hash joins (see https://readyset.io/blog/introducing-hash-join-algorithm), eliminating the need to repeatedly scan one side of the join. This brought significant performance improvements for many workloads, however, some queries remained problematic.

Straddled joins are common in production workloads; for example, filtering users by some property on the left table and orders by a status flag on the right table, then joining on user_id. While our hash join algorithm improved over nested loops, it was still inefficient in these cases, especially when one side’s predicate had low cardinality (like a boolean flag or status column).

This post explains why the old execution strategy struggled, and how our new optimization using Index Condition Pushdown (ICP) changes the execution model to make straddled joins much more efficient.

Previous algorithm

During evaluation of straddled joins on cache misses, we observed significant latency in up-queries. To identify the underlying bottleneck, we profiled query execution and analyzed the resulting flamegraphs

Approximately 30% of the query execution time was attributed to data decompression. We initially suspected the RocksDB compression algorithm as the bottleneck and proceeded to benchmark alternative compression methods to validate this hypothesis.

Switching to ZSTD did not improve performance. Decompression remained a dominant contributor to query execution time. As the next step, we disabled compression in RocksDB entirely to isolate its impact. This came with a space tradeoff but was necessary to confirm whether compression was the root cause:

Disabling compression didn’t eliminate the problem, it merely shifted the bottleneck. The system began spending the majority of execution time on disk reads, as evident from increased I/O activity on ext4.

This made it clear that compression wasn’t the issue; rather, the excessive amount of data being read from disk was the primary cause. Isolated iostat output for the query confirmed this:

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme1n1       10068.00  55728.00     0.00   0.00    0.08     5.54    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.79  81.00

The NVMe device was handling approximately 10K IOPS, with the majority of reads around 5KB in size. Despite their small size, the device reached ~81% utilization while executing a single join query, indicating I/O saturation.

Upon examining the query pattern, we identified that one side of the join had a low-cardinality index, causing the engine to scan nearly the entire table on each execution. Due to internal constraints (such as maintaining key provenance and supporting incremental view maintenance) the join engine was independently evaluating both sides of the join, regardless of selectivity.

Execution example

SELECT u.id, u.name, o.order_id, o.totalFROM users uJOIN orders o ON u.id = o.user_idWHERE u.email = 'some@provide.com' AND o.status = 'SHIPPED';

Scenario:

Users table: The predicate u.email = 'some@provide.com' is highly selective, returning exactly one row.
Orders table: The predicate o.status = 'SHIPPED' is very low selectivity — ~99% of all rows match.

Old Execution Strategy: Hash Joins

The old algorithm relied on the hash join approach:

Lookup both sides independently:
1. users: filter by email → 1 row.
2. orders: filter by status → ~99% of the table.
Materialize both result sets:
1. Millions of orders rows, plus 1 users row.
Build and probe a hash table:
1. Hash on o.user_id.
2. Probe with the user row.
3. Discard nearly all rows after the join.

Why This Is Inefficient:

High I/O: We had to read essentially the entire orders table.
High memory usage: Millions of rows materialized only to discard them.
Wasteful CPU work: The join considered far more rows than necessary.

New Execution Strategy: Index Condition Pushdown

The new algorithm issues an initial upquery to one side of the join, then combines the resulting join key with the original predicates on the other side. This composite condition is pushed down to RocksDB, allowing index-based retrieval of only the rows required to satisfy the join. This eliminates unnecessary data reads and avoids full-table scans.

Step‑by‑Step:

Apply the left predicate first: u.email = 'some@provide.com' → 1 row (u.id = 123).
Group rows by join key: collect distinct values {123}.
For each join key, build a right‑side lookup:

WHERE o.status = 'SHIPPED'
AND o.user_id = 123;

Leverage storage engine indexes: with an index on (user_id, status), fetch only matching rows.
Build the result set incrementally: combine left row(s) with right rows per join key, no full materialization needed.

Benchmark: Measuring the Impact

To quantify the performance improvement brought by the new Index Condition Pushdown strategy, we ran a controlled benchmark comparing the old hash join algorithm against the new execution model on the same hardware and dataset.

Old Algorithm: Hash Join

Under the previous strategy, Readyset executed straddled joins using independent filtering on both sides, followed by full materialization and hash-based probing. This approach was highly inefficient when one side had low-cardinality filters.

Results:

Throughput: 7.0 events/s
Latency (avg): 2,284 ms
95th percentile latency: 4,129 ms
Max latency: 6,831 ms
Total events processed: 4,203 over 10 minutes

The system was CPU- and IO-bound, spending excessive time reading and decompressing unnecessary rows from disk; most of which were discarded after join evaluation.

New Algorithm: Index Condition Pushdown

With the ICP-enabled join model, Readyset instead defers right-side lookups until left-side predicates are evaluated, allowing the use of compound indexes (e.g. (user_id, status)) to fetch only relevant rows. This minimizes materialization and disk reads.

Results:

Throughput: 3,214.4 events/s
Latency (avg): 4.98 ms
95th percentile latency: 11.87 ms
Max latency: 3,467.2 ms
Total events processed: 1,928,645 over 10 minutes

This represents a >450x throughput improvement and >450x latency reduction. The join strategy is now highly cache- and index-efficient, with the storage engine only returning matching rows based on precise key lookups.

Conclusion

While Readyset excels in delivering low-latency results via caching, optimizing the cold path is critical for consistent performance during cache misses. By rethinking how straddled joins are executed and leveraging Index Condition Pushdown, we’ve significantly improved real-world performance for these workloads.

Want to see Readyset in action?

Book a demo and see how Readyset can accelerate your database.

Introducing rdst: The Client-Side SQL Performance Toolkit

Readyset has always been about making database scaling effortless. Today, we’re extending that mission from the server to the developer’s terminal with rdst, a production-native, AI-powered CLI that helps teams write better queries, diagnose performance issues, and close the loop on caching, all before bad queries ever reach production. The Database Scaling Problem Is a Query Problem Every engineering team eventually hits the same wall: the database can’t keep up. Latency spikes. CPU climbs.

Readyset

2026-04-28·11 min read

Uncategorized

How Readyset Rewrites Your SQL: Inside the Query Transformation Pipeline

Why Query Rewriting Matters for Dataflow Most databases work the same way: a query arrives, the engine builds an execution plan, scans tables, joins rows, filters, aggregates, and returns the result. Every time the query runs, the work repeats from scratch. This pull-based model has served relational databases for decades, but it carries an inherent cost: read latency is proportional to the complexity of the query and the size of the data it touches. Readyset takes a fundamentally different ap

Vassili Zarouba

2026-04-28·17 min read

Uncategorized

Why LLMs Write Incorrect SQL (and What That Means for Your Database)

Most LLM-generated SQL doesn't fail. It runs and returns results, and that's exactly what makes it dangerous. The errors don't surface until they're already in your data. AI-assisted development has made it easier than ever to generate SQL quickly. The problem is that quick and correct are not the same thing, and with databases, the gap between the two has consequences that compound quietly over time. SQL that runs is not the same as SQL that works. LLMs are good at producing SQL that looks ri

Readyset

2026-04-23·5 min read

Still scaling the hard way?

Modern applications demand instant performance, even under unpredictable load. Readyset helps you eliminate slow queries, stabilize latency, and scale confidently.

Revolutionize your database performance with Readyset

Serve requests at sub-millisecond latencies with the modern database scaling and query caching system for MySQL and PostgreSQL.

Product

Readyset Cloud Readyset Private Readyset Community QueryPilot rdst CLI Pricing

Resources

Blog Case Studies Company Documentation

Legal

Privacy Policy Terms of Service

Join our newsletter

Stay updated with the latest news, insights, and developments from Readyset — straight to your inbox.