
Readyset is a cache layer that sits between the application and the database, acting as a cache for supported queries and proxying unsupported queries upstream. This means that any query Readyset supports is theoretically supported by the upstream database, but not necessarily the other way around. What if that is no longer the case? What if Readyset can support queries that the upstream database can’t? What if you could keep your relational database and add extra functionality without any plug

Mohamed Abdeen
2025-11-25 · 5 min read
Readyset is a cache layer that sits between the application and the database, acting as a cache for supported queries and proxying unsupported queries upstream. This means that any query Readyset supports is theoretically supported by the upstream database, but not necessarily the other way around.
What if that is no longer the case? What if Readyset can support queries that the upstream database can’t? What if you could keep your relational database and add extra functionality without any plugins or extensions?
Readyset’s streaming Dataflow engine works well for systems that periodically poll for results or require quick updates, such as monitoring and near-real-time time-series workloads. While these queries can technically be served by upstream relational databases, their execution time is often too long to be practical.
This month, we introduced a Readyset-specific built-in function, which doesn’t exist in upstream databases, called Bucket.
The Bucket function helps you group time based data into regular intervals, for example by hour, day, or month, so you can easily see trends over time such as daily sales totals or hourly user activity.
It takes two arguments: the first is a timestamp or datetime column, and the second is an interval. The interval is a string that follows the format <positive integer> <unit>[s], where the unit can be year, month, day, hour, minute, or second. You can then use the result in a GROUP BY clause with an aggregate function to summarize your data, similar to how you might analyze events in a time-series database.
SELECT
BUCKET(event_time, '1 hour') AS bucket_hour,
SUM(value)
FROM events
GROUP BY bucket_hour
You can also take it a step further and take advantage of Readyset’s parameterization, like so:
CREATE CACHE ALWAYS FROM
SELECT
BUCKET(event_time, '30 minutes') AS bucket_30min,
MIN(value)
FROM events
WHERE event_time >= ?
AND event_time < ?
GROUP BY bucket_30min
then query by binding literals to the placeholders/parameters:
CREATE CACHE ALWAYS FROM
SELECT
BUCKET(event_time, '30 minutes') AS bucket_30min,
MIN(value)
FROM events
WHERE event_time >= '2023-09-01 23:59:59'
AND event_time < '2023-09-03 00:00:00'
GROUP BY bucket_30min
Let’s take a look at the NYC Yellow Taxi Trips dataset for the 4 years from 2020 to 2023, this is roughly 134 million records after some cleaning (like removing any rows after the date 31 Dec 2023). We will only import a few selected columns into a postgresql-15 database table.
CREATE TABLE IF NOT EXISTS nyc_taxi_trips (
pickup_datetime TIMESTAMP,
dropoff_datetime TIMESTAMP,
passenger_count BIGINT,
trip_distance DOUBLE PRECISION,
fare_amount DOUBLE PRECISION
);
While we will only use two of these columns, we still want the rows to be reasonably sized for the purpose of this demo. We run this demo on a local machine with 55 free GBs of memory.
Keep in mind that Bucket is not supported in PostgreSQL, so we will try to simulate it using PostgreSQL built-in functions:
SELECT
to_timestamp(floor(extract(epoch FROM pickup_datetime) / 1800) * 1800) AS bucket_30min,
MAX(fare_amount)
FROM nyc_taxi_trips
GROUP BY bucket_30min
ORDER BY bucket_30min DESC
LIMIT 10;
Giving us the output:
| bucket_30min | max |
|---|---|
| 2023-12-31 23:30:00+00 | 250 |
| 2023-12-31 23:00:00+00 | 295.6 |
| 2023-12-31 22:30:00+00 | 185 |
| 2023-12-31 22:00:00+00 | 400 |
| 2023-12-31 21:30:00+00 | 1087.3 |
| 2023-12-31 21:00:00+00 | 260 |
| 2023-12-31 20:30:00+00 | 220 |
| 2023-12-31 20:00:00+00 | 155.6 |
| 2023-12-31 19:30:00+00 | 109 |
| 2023-12-31 19:00:00+00 | 282.24 |
This takes 20.46 seconds to execute on average.
Now, let’s create a cache in Readyset with the new Bucket function and query it:
CREATE CACHE ALWAYS FROM SELECT
BUCKET(pickup_datetime, '30 minutes') as bucket_30min,
MAX(fare_amount)
FROM nyc_taxi_trips
GROUP BY bucket_30min
ORDER BY bucket_30min DESC
LIMIT 10;
Giving us the exact output:
| bucket_30min | max("public"."nyc_taxi_trips"."fare_amount") |
|---|---|
| 2023-12-31 23:30:00 | 250 |
| 2023-12-31 23:00:00 | 295.6 |
| 2023-12-31 22:30:00 | 185 |
| 2023-12-31 22:00:00 | 400 |
| 2023-12-31 21:30:00 | 1087.3 |
| 2023-12-31 21:00:00 | 260 |
| 2023-12-31 20:30:00 | 220 |
| 2023-12-31 20:00:00 | 155.6 |
| 2023-12-31 19:30:00 | 109 |
| 2023-12-31 19:00:00 | 282.24 |
The cache takes only a few minutes to build. Querying it returns results in 5.6 milliseconds, with the cache read itself taking just 0.396 milliseconds! That’s almost 365× faster, with no additional expression nesting.
Let’s also insert twenty thousand rows into the top two buckets and see how fast Readyset can keep the cache updated. We’ll use a simple script to generate data, making sure the generated fare_amount exceeds the previous maximum. After doing this, we see that Readyset replicates the changes almost instantly and still produces results in a familiar 5.6 milliseconds, with the cache read taking 0.468 milliseconds.
| bucket_30min | max("public"."nyc_taxi_trips"."fare_amount") |
|---|---|
| 2023-12-31 23:30:00 | 499.99 |
| 2023-12-31 23:00:00 | 499.91 |
| 2023-12-31 22:30:00 | 185 |
| 2023-12-31 22:00:00 | 400 |
| 2023-12-31 21:30:00 | 1087.3 |
| 2023-12-31 21:00:00 | 260 |
| 2023-12-31 20:30:00 | 220 |
| 2023-12-31 20:00:00 | 155.6 |
| 2023-12-31 19:30:00 | 109 |
| 2023-12-31 19:00:00 | 282.24 |
When working with this query pattern, keep in mind that it will be fully materialized, meaning all rows are stored in memory. Depending on the size of your dataset and the predicates used, this can increase memory usage during cache creation.
To optimize memory, you can create a filtered view that includes only the most relevant time period or subset of data. Using this view as the input to the cache helps Readyset focus on active data, reducing memory requirements and improving efficiency.
We’re not building a full-fledged time-series database, and that’s by design. While this pattern might look familiar if you’ve worked with time-series systems, we’re not aiming to compete with purpose-built solutions that handle both storage and execution.
Instead, our goal with this new pattern is to give you a practical alternative for handling time-series–like workloads without needing to introduce a new database into your infrastructure. If you’ve ever avoided building certain features because they seemed too complex or required a specialized database, Readyset can now help you unlock that functionality using your existing relational database.
You manage the storage. We’ll take care of making the queries fast. Let us know what other time-series functionality you’d like to see in Readyset.
With features like Bucket, Readyset isn’t just speeding up your existing queries—it’s enabling new ones your database can’t easily handle. It’s more than a cache layer; it’s a tool that extends your application’s capabilities. More power, better performance, zero database changes—just plug it in and go.
Modern applications demand instant performance, even under unpredictable load. Readyset helps you eliminate slow queries, stabilize latency, and scale confidently.
Revolutionize your database performance with Readyset
Serve requests at sub-millisecond latencies with the modern database scaling and query caching system for MySQL and PostgreSQL.
Join our newsletter
Stay updated with the latest news, insights, and developments from Readyset — straight to your inbox.