Gautam Gopinadhan
2024-11-18 · 13 min read
Scaling databases is no small task, and in the rush to find fast, affordable solutions, teams often overlook a key technique: query caching. By intelligently caching database queries, you can sidestep many of the typical challenges and costs that come with sharding, complex materialized views, and resource-intensive indexes. Yet it’s surprising how often query caching is dismissed as a solution. Let’s dive into why it might be the most cost-effective way to achieve the scale you’re after.
When a database receives a SELECT query, it follows a structured process to retrieve and format the requested data. Internally, this process involves multiple stages of parsing, planning, and executing, each aimed at making data retrieval as efficient as possible. Here’s an overview:
For large or complex datasets, these steps can be time-consuming. While databases are designed to optimize each phase, the repeated execution of identical queries—each requiring parsing, planning, and data retrieval—leads to considerable redundancy. Each time the same query is executed, the database repeats every step, consuming CPU, memory, and disk I/O resources—even when the data hasn’t changed.
Fig 1. Typical phases when processing a SELECT Query.
In an ideal scenario, a caching solution would store and reuse query results for repeated queries, avoiding this redundancy. However, data is often subject to frequent updates, albeit typically at a lower rate than read requests. This presents a challenge: cache query results efficiently without serving outdated data.
By recognizing this disparity between read frequency and data change frequency, a well-designed caching layer can sidestep redundant query processing, serving cached results when possible and only recalculating when necessary. This is where query caching can offer significant performance gains over the standard approach.
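As a rough illustration of the idea, the sketch below caches SELECT results keyed by query text and parameters, and discards them on any write. The QueryCache class, the users table, and the clear-everything invalidation rule are assumptions made for this example, with Python’s built-in sqlite3 module standing in for the database. A production cache would invalidate far more selectively.

```python
import sqlite3


class QueryCache:
    """Toy read-through cache: results keyed by (sql, params), dropped on writes."""

    def __init__(self, conn):
        self.conn = conn
        self.results = {}   # (sql, params) -> list of rows
        self.hits = 0
        self.misses = 0

    def read(self, sql, params=()):
        key = (sql, params)
        if key in self.results:
            self.hits += 1
            return self.results[key]
        self.misses += 1
        rows = self.conn.execute(sql, params).fetchall()
        self.results[key] = rows
        return rows

    def write(self, sql, params=()):
        self.conn.execute(sql, params)
        self.conn.commit()
        # Coarse invalidation (an assumption of this sketch): any write
        # drops every cached result, whether or not it was affected.
        self.results.clear()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
cache = QueryCache(conn)
cache.write("INSERT INTO users (id, name) VALUES (?, ?)", (1, "ada"))

query = "SELECT name FROM users WHERE id = ?"
first = cache.read(query, (1,))    # miss: runs against the database
second = cache.read(query, (1,))   # hit: served from memory
cache.write("UPDATE users SET name = ? WHERE id = ?", ("grace", 1))
third = cache.read(query, (1,))    # miss again: the write invalidated the entry
```

The second read never reaches the database, while the read after the update does. Reads usually far outnumber writes, so most requests can take the first path.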
Indexes are the standard go-to for speeding up query performance. Adding indexes, along with analyzing and rewriting queries, should be the first stop for improving performance. Indexes allow the database to locate rows faster without scanning the entire table. Still, they may not suffice in many cases:
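To see what an index changes, the following sketch asks SQLite for its query plan before and after creating one. The orders table and the idx_orders_customer index are invented for the example, and the exact plan wording varies between SQLite versions, so the code prints the plans rather than assuming their text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable step.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


lookup = "SELECT total FROM orders WHERE customer_id = 42"
plan_before = plan(lookup)   # no usable index yet: the table is scanned
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(lookup)    # the planner can now search via the new index

print(plan_before)
print(plan_after)
```

Even with the index in place, every execution still goes through parsing, planning, and an index lookup, and the index itself has to be maintained on each write.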
Yes, databases typically implement a basic level of caching to avoid redundant I/O, using strategies like buffer pools or query result caches. However, these caches are often limited to a single node, don’t persist across sessions or application layers, and, most importantly, do not provide good isolation. When new workloads are introduced, they can easily trample existing cache contents and cause universal slowdowns through cache thrashing. These caches also tend to invalidate quickly, so frequent queries from high-demand applications still end up hitting the database repeatedly. This approach doesn’t scale well in highly dynamic, read-intensive applications.
Materialized views offer a way to store the results of a complex query for repeated use, essentially acting as a “snapshot” of data. But there are limitations:
Materialized views can be useful, but their limitations make them challenging for many scaling scenarios where data freshness and low-latency access are critical.
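The freshness problem can be sketched in a few lines. SQLite has no materialized views, so this example emulates one with a table built from a query. The sales table, the sales_by_region snapshot, and the refresh_snapshot helper are all hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)", [("eu", 10), ("eu", 5), ("us", 7)]
)

SUMMARY = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"


def refresh_snapshot():
    # Full recomputation: the whole query is rerun and the old snapshot replaced.
    conn.execute("DROP TABLE IF EXISTS sales_by_region")
    conn.execute("CREATE TABLE sales_by_region AS " + SUMMARY)


def read_snapshot():
    return conn.execute("SELECT * FROM sales_by_region ORDER BY region").fetchall()


refresh_snapshot()
conn.execute("INSERT INTO sales VALUES ('us', 3)")
stale = read_snapshot()   # still reflects the data as of the last refresh
refresh_snapshot()
fresh = read_snapshot()   # the new row is visible only after a refresh
```

Between refreshes, readers see the old totals, and each refresh pays the full cost of the underlying query again.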
When databases become too large or slow to handle, sharding—splitting data across multiple databases—is a typical strategy. However, it introduces new considerations:
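A minimal sketch of hash-based shard routing shows where the extra work comes from. The shard count, the dictionaries standing in for databases, and the helper names are assumptions for illustration only.

```python
import hashlib

NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]  # each dict stands in for one database


def shard_for(user_id):
    # Stable hash so a key always maps to the same shard across processes.
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def put(user_id, row):
    shards[shard_for(user_id)][user_id] = row


def get(user_id):
    # Lookups by the shard key touch exactly one shard.
    return shards[shard_for(user_id)].get(user_id)


def count_where(predicate):
    # Anything not keyed by user_id has to visit every shard and merge results.
    return sum(1 for shard in shards for row in shard.values() if predicate(row))


for uid in range(100):
    put(uid, {"id": uid, "country": "de" if uid % 4 == 0 else "us"})

german_users = count_where(lambda row: row["country"] == "de")
```

Lookups by shard key stay cheap, but any query on another column fans out to all shards, and the routing logic now lives in the application.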
Redis and Memcached are popular for key-value caching outside the database, and they’re great solutions for in-memory key-value lookups. When repurposed for caching query results, however, they can introduce their own pains:
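The usual pattern with an external key-value cache is cache-aside with a time-to-live. The sketch below uses an in-process TTLCache class as a stand-in for such a cache (it is not the Redis or Memcached client API) and a fake clock so the behavior is deterministic. All names are hypothetical.

```python
import time


class TTLCache:
    """In-process stand-in for an external key-value cache with expiring entries."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None or entry[0] <= self.clock():
            return None
        return entry[1]

    def set(self, key, value):
        self.entries[key] = (self.clock() + self.ttl, value)


database = {"user:1": "ada"}   # stand-in for the source of truth
now = [0.0]                    # fake clock so the example is deterministic
cache = TTLCache(ttl_seconds=30, clock=lambda: now[0])


def get_user(key):
    # Cache-aside: the application checks the cache, then falls back to the database.
    value = cache.get(key)
    if value is None:
        value = database[key]
        cache.set(key, value)
    return value


before = get_user("user:1")       # miss: loads "ada" and caches it
database["user:1"] = "grace"      # the write goes to the database only
during_ttl = get_user("user:1")   # still "ada": the cache was never told
now[0] = 31.0
after_ttl = get_user("user:1")    # entry expired, so the fresh value is loaded
```

The application owns both the lookup logic and the invalidation policy here, and a stale value is served until the entry expires or is explicitly removed.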
One common approach to scaling read-heavy workloads is to set up read replicas—additional database instances that replicate data from the primary database in near-real-time. By distributing read queries across these replicas, the load on the primary database decreases, and response times improve as the workload is shared. However, while read replicas are widely used, they come with notable limitations:
In summary, while read replicas are useful for scaling reads, they are not a one-size-fits-all solution. They add cost and complexity and may not address every application’s performance challenges. Query caching provides a more targeted solution by directly reducing the query load on the database without adding the operational and financial burdens that come with multiple replicas.
Readyset’s Smart Query Caching is built on a unique architecture that directly addresses performance and scalability issues inherent in traditional databases, offering a more efficient and cost-effective solution to scale read-heavy workloads. With Readyset, you’re not just adding yet another database replica or restructuring your data for sharding; instead, you’re adopting a system explicitly optimized for high-throughput, low-latency read performance.
Readyset operates as an independent layer that can be set up as a stand-in read replica or as a caching layer that selectively syncs with your primary database via WAL or binlog replication. This means Readyset can handle read-heavy workloads autonomously, processing and serving frequent queries independently of the primary database. With Readyset, only the most load-intensive queries need to be offloaded, achieving response times that can be 10x to 100x faster than a typical database configuration. Because Readyset is a standalone cache that lives independently from the source database, you also have the option to deploy Readyset closer to the client with geo-located instances, working around even speed-of-light delays in database access.
Fig 2. Readyset deployed as a proxy, intercepting all queries. Queries that are cached are serviced by Readyset, while all other queries are proxied through to the origin database.
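The routing decision in this deployment can be sketched as follows. The CachingProxy class, its method names, and the example queries are hypothetical and chosen for illustration. This is not Readyset’s implementation, and it leaves out cache invalidation entirely.

```python
class CachingProxy:
    """Hypothetical router: cached queries are answered locally, the rest are proxied."""

    def __init__(self, upstream):
        self.upstream = upstream   # callable(sql, params) -> rows from the origin
        self.cached = {}           # sql text -> {params: rows}
        self.proxied = []          # sql text of every statement passed through

    def create_cache(self, sql):
        # Opt a specific query in; everything else keeps going to the origin.
        self.cached[sql] = {}

    def execute(self, sql, params=()):
        results = self.cached.get(sql)
        if results is None:
            self.proxied.append(sql)
            return self.upstream(sql, params)
        if params not in results:
            results[params] = self.upstream(sql, params)   # first use fills the entry
        return results[params]


origin_calls = []


def origin(sql, params=()):
    # Stand-in for the origin database: records the call and returns a canned row.
    origin_calls.append(sql)
    return [(sql, params)]


proxy = CachingProxy(origin)
hot = "SELECT * FROM stories WHERE id = ?"
cold = "SELECT * FROM audit_log WHERE id = ?"
proxy.create_cache(hot)

proxy.execute(hot, (1,))
proxy.execute(hot, (1,))    # answered by the proxy, the origin is not contacted
proxy.execute(cold, (1,))
proxy.execute(cold, (1,))   # not cached, so proxied both times
```

The application keeps a single connection target, and which queries are cached becomes a decision made at the proxy rather than in application code.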
Readyset integrates closely with industry-standard Query Routing software like ProxySQL, so existing systems already using ProxySQL can deploy Readyset into production with almost no effort.
Fig 3. Readyset can be deployed behind a query routing layer, like ProxySQL, akin to read replicas, so that only selected queries are forwarded to Readyset.
From a resource perspective, a single Readyset instance provides the equivalent throughput of multiple read replicas, significantly reducing the need for resource-hungry replicas to handle demand spikes.

Fig 4. Readyset will dramatically cut query latency, often to the sub-millisecond range.
At the core of Readyset’s architecture is Dataflow technology, an advanced approach that enables dynamic data dependency tracking and real-time cache updates:
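The general technique behind this kind of system is incremental view maintenance: rather than recomputing a result, the cached answer is adjusted by each change as it arrives. The sketch below is a generic illustration of that idea for a grouped count, not Readyset’s actual dataflow engine. The CountByKey class and the change tuples are hypothetical.

```python
from collections import defaultdict


class CountByKey:
    """Maintains the result of a COUNT(*) ... GROUP BY key query from row deltas."""

    def __init__(self):
        self.counts = defaultdict(int)

    def on_insert(self, key):
        self.counts[key] += 1

    def on_delete(self, key):
        self.counts[key] -= 1
        if self.counts[key] == 0:
            del self.counts[key]   # drop empty groups, as the query would

    def lookup(self, key):
        return self.counts.get(key, 0)


votes_per_story = CountByKey()

# Each tuple stands in for one row change arriving from the replication stream.
changes = [
    ("insert", "story-1"),
    ("insert", "story-1"),
    ("insert", "story-2"),
    ("delete", "story-1"),
]
for op, story in changes:
    if op == "insert":
        votes_per_story.on_insert(story)
    else:
        votes_per_story.on_delete(story)
```

Each change costs a constant amount of work regardless of table size, and a read is a dictionary lookup rather than a scan and aggregation.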
Readyset’s Partial Materialization offers a more targeted approach to caching than materialized views by caching only the high-demand keys within a query’s result:
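One way to picture this is a view that holds results only for keys that have actually been requested, computing missing keys on demand and evicting the least recently used ones. The PartialView class below is a hypothetical sketch under those assumptions. Readyset’s own eviction and fill mechanics may differ.

```python
from collections import OrderedDict


class PartialView:
    """Keeps results only for requested keys, up to a fixed capacity."""

    def __init__(self, compute, capacity):
        self.compute = compute      # callable(key) -> result, run on a miss
        self.capacity = capacity
        self.rows = OrderedDict()   # key -> result, ordered by recency of use
        self.computed = 0

    def lookup(self, key):
        if key in self.rows:
            self.rows.move_to_end(key)   # mark as most recently used
            return self.rows[key]
        self.computed += 1
        value = self.compute(key)
        self.rows[key] = value
        if len(self.rows) > self.capacity:
            self.rows.popitem(last=False)   # evict the least recently used key
        return value


posts_by_author = {"ann": 3, "bob": 1, "cy": 7}   # stand-in for the base tables
view = PartialView(compute=lambda author: posts_by_author[author], capacity=2)

view.lookup("ann")
view.lookup("ann")   # served from the view, no recomputation
view.lookup("bob")
view.lookup("cy")    # capacity exceeded: "ann" is the least recently used
```

Memory use is bounded by the number of hot keys rather than by the size of the full result set, which is the main contrast with a fully materialized view.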
Readyset’s unique approach makes it exceptionally cost-effective:
Readyset’s Query Caching is an eventually consistent system similar to Read Replicas and in-memory caches. For applications requiring strict transactional consistency, such as those needing to “read your writes”, Query Caching may not be suitable as the latest writes are not guaranteed to be reflected in the cache. However, Readyset has a built-in mechanism to proxy these reads to the primary database when needed, ensuring the latest data is served in these infrequent cases.
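For illustration, one generic way to get read-your-writes behavior on top of an eventually consistent cache is to send a session’s reads to the primary for a short window after that session writes. The SessionRouter class, the pin window, and the fake clock below are hypothetical. This is an application-side sketch and does not describe Readyset’s built-in mechanism.

```python
class SessionRouter:
    """Hypothetical router that pins a session's reads to the primary after a write."""

    def __init__(self, pin_seconds, clock):
        self.pin_seconds = pin_seconds
        self.clock = clock
        self.last_write = {}   # session id -> time of its most recent write

    def record_write(self, session_id):
        self.last_write[session_id] = self.clock()

    def target_for_read(self, session_id):
        wrote_at = self.last_write.get(session_id)
        if wrote_at is not None and self.clock() - wrote_at < self.pin_seconds:
            return "primary"
        return "cache"


now = [0.0]   # fake clock so the example is deterministic
router = SessionRouter(pin_seconds=5, clock=lambda: now[0])

router.record_write("session-a")
writer_target = router.target_for_read("session-a")   # recently wrote: primary
other_target = router.target_for_read("session-b")    # never wrote: cache
now[0] = 6.0
later_target = router.target_for_read("session-a")    # window elapsed: cache
```

Only the sessions that just wrote pay the cost of a primary read. The pin window is a tuning assumption that should exceed the expected replication delay.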
With a focus on read-heavy workloads, Readyset is ideal for applications where high performance, scalability, and cost-efficiency are key. Readyset supports a wide range of SQL constructs, and queries generated by most applications will be supported by Readyset. Some complex SQL operations may not yet be fully supported, and we prioritize adding capabilities based on user feedback to ensure broad compatibility.
For most applications, Readyset delivers robust scalability without the complexities associated with sharding or additional caching infrastructure, making it a highly effective choice for scaling read-intensive workloads.
In the race to scale databases, companies often jump straight to complex, resource-intensive solutions like sharding, heavy indexing, or traditional caching systems, believing these are the only viable paths to handle surging read demands. But as we’ve explored, query caching—and specifically, Readyset’s unique approach with Smart Query Caching—offers an innovative and cost-effective alternative.
By leveraging Dataflow technology and Partial Materialization, Readyset doesn’t just cache data indiscriminately; it allows users to adapt to usage patterns intelligently, selectively caching and updating only what’s necessary. This approach sidesteps many of the typical bottlenecks that slow down traditional databases under high load. More than that, it’s a solution designed for real-world conditions where applications need fresh data, scalability, and low operational overhead.
In environments where high read performance is essential and costs must be contained, Readyset provides a clear path forward. Its architecture brings efficiency without requiring extensive infrastructure changes, integrating seamlessly into existing systems with zero code modifications. For teams looking to maximize database performance without sacrificing budget, Readyset’s query caching can unlock a level of scalability that might otherwise seem out of reach.
By rethinking how we approach caching and scaling, Readyset opens up a new realm of possibilities, making it a tool engineers and businesses should no longer overlook.