Readyset Adds New Query Capabilities With the Bucket Function

Readyset is a cache layer that sits between the application and the database, acting as a cache for supported queries and proxying unsupported queries upstream. This means that any query Readyset supports is theoretically supported by the upstream database, but not necessarily the other way around. What if that is no longer the case? What if Readyset can support queries that the upstream database can’t? What if you could keep your relational database and add extra functionality without any plug

Mohamed Abdeen

2025-11-25 · 5 min read

Readyset is a cache layer that sits between the application and the database, acting as a cache for supported queries and proxying unsupported queries upstream. This means that any query Readyset supports is theoretically supported by the upstream database, but not necessarily the other way around.

What if that is no longer the case? What if Readyset can support queries that the upstream database can’t? What if you could keep your relational database and add extra functionality without any plugins or extensions?

Readyset’s streaming Dataflow engine works well for systems that periodically poll for results or require quick updates, such as monitoring and near-real-time time-series workloads. While these queries can technically be served by upstream relational databases, their execution time is often too long to be practical.

A New Query Pattern Enabled by Readyset’s Bucket Function

This month, we introduced a Readyset-specific built-in function, which doesn’t exist in upstream databases, called Bucket.

The Bucket function helps you group time based data into regular intervals, for example by hour, day, or month, so you can easily see trends over time such as daily sales totals or hourly user activity.

It takes two arguments: the first is a timestamp or datetime column, and the second is an interval. The interval is a string that follows the format <positive integer> <unit>[s], where the unit can be year, month, day, hour, minute, or second. You can then use the result in a GROUP BY clause with an aggregate function to summarize your data, similar to how you might analyze events in a time-series database.

SELECT
BUCKET(event_time, '1 hour') AS bucket_hour,
SUM(value)
FROM events
GROUP BY bucket_hour

You can also take it a step further and take advantage of Readyset’s parameterization, like so:

CREATE CACHE ALWAYS FROM
SELECT 
BUCKET(event_time, '30 minutes') AS bucket_30min,
MIN(value)
FROM events
WHERE event_time >= ?
AND event_time < ?
GROUP BY bucket_30min

then query by binding literals to the placeholders/parameters:

CREATE CACHE ALWAYS FROM
SELECT 
BUCKET(event_time, '30 minutes') AS bucket_30min,
MIN(value)
FROM events
WHERE event_time >= '2023-09-01 23:59:59' 
AND event_time <  '2023-09-03 00:00:00'
GROUP BY bucket_30min

Examples and Considerations:

Let’s take a look at the NYC Yellow Taxi Trips dataset for the 4 years from 2020 to 2023, this is roughly 134 million records after some cleaning (like removing any rows after the date 31 Dec 2023). We will only import a few selected columns into a postgresql-15 database table.

CREATE TABLE IF NOT EXISTS nyc_taxi_trips (
    pickup_datetime TIMESTAMP,
    dropoff_datetime TIMESTAMP,
    passenger_count BIGINT,
    trip_distance DOUBLE PRECISION,
    fare_amount DOUBLE PRECISION
);

While we will only use two of these columns, we still want the rows to be reasonably sized for the purpose of this demo. We run this demo on a local machine with 55 free GBs of memory.

Keep in mind that Bucket is not supported in PostgreSQL, so we will try to simulate it using PostgreSQL built-in functions:

SELECT
    to_timestamp(floor(extract(epoch FROM pickup_datetime) / 1800) * 1800) AS bucket_30min,
    MAX(fare_amount)
FROM nyc_taxi_trips
GROUP BY bucket_30min
ORDER BY bucket_30min DESC
LIMIT 10;

Giving us the output:

bucket_30min	max
2023-12-31 23:30:00+00	250
2023-12-31 23:00:00+00	295.6
2023-12-31 22:30:00+00	185
2023-12-31 22:00:00+00	400
2023-12-31 21:30:00+00	1087.3
2023-12-31 21:00:00+00	260
2023-12-31 20:30:00+00	220
2023-12-31 20:00:00+00	155.6
2023-12-31 19:30:00+00	109
2023-12-31 19:00:00+00	282.24

This takes 20.46 seconds to execute on average.

Now, let’s create a cache in Readyset with the new Bucket function and query it:

CREATE CACHE ALWAYS FROM SELECT
    BUCKET(pickup_datetime, '30 minutes') as bucket_30min,
    MAX(fare_amount)
FROM nyc_taxi_trips
GROUP BY bucket_30min
ORDER BY bucket_30min DESC
LIMIT 10;

Giving us the exact output:

bucket_30min	max("public"."nyc_taxi_trips"."fare_amount")
2023-12-31 23:30:00	250
2023-12-31 23:00:00	295.6
2023-12-31 22:30:00	185
2023-12-31 22:00:00	400
2023-12-31 21:30:00	1087.3
2023-12-31 21:00:00	260
2023-12-31 20:30:00	220
2023-12-31 20:00:00	155.6
2023-12-31 19:30:00	109
2023-12-31 19:00:00	282.24

The cache takes only a few minutes to build. Querying it returns results in 5.6 milliseconds, with the cache read itself taking just 0.396 milliseconds! That’s almost 365× faster, with no additional expression nesting.

Let’s also insert twenty thousand rows into the top two buckets and see how fast Readyset can keep the cache updated. We’ll use a simple script to generate data, making sure the generated fare_amount exceeds the previous maximum. After doing this, we see that Readyset replicates the changes almost instantly and still produces results in a familiar 5.6 milliseconds, with the cache read taking 0.468 milliseconds.

bucket_30min	max("public"."nyc_taxi_trips"."fare_amount")
2023-12-31 23:30:00	499.99
2023-12-31 23:00:00	499.91
2023-12-31 22:30:00	185
2023-12-31 22:00:00	400
2023-12-31 21:30:00	1087.3
2023-12-31 21:00:00	260
2023-12-31 20:30:00	220
2023-12-31 20:00:00	155.6
2023-12-31 19:30:00	109
2023-12-31 19:00:00	282.24

When working with this query pattern, keep in mind that it will be fully materialized, meaning all rows are stored in memory. Depending on the size of your dataset and the predicates used, this can increase memory usage during cache creation.

To optimize memory, you can create a filtered view that includes only the most relevant time period or subset of data. Using this view as the input to the cache helps Readyset focus on active data, reducing memory requirements and improving efficiency.

Readyset’s Approach to Time-Series–Like Workloads

We’re not building a full-fledged time-series database, and that’s by design. While this pattern might look familiar if you’ve worked with time-series systems, we’re not aiming to compete with purpose-built solutions that handle both storage and execution.

Instead, our goal with this new pattern is to give you a practical alternative for handling time-series–like workloads without needing to introduce a new database into your infrastructure. If you’ve ever avoided building certain features because they seemed too complex or required a specialized database, Readyset can now help you unlock that functionality using your existing relational database.

You manage the storage. We’ll take care of making the queries fast. Let us know what other time-series functionality you’d like to see in Readyset.

Conclusion

With features like Bucket, Readyset isn’t just speeding up your existing queries—it’s enabling new ones your database can’t easily handle. It’s more than a cache layer; it’s a tool that extends your application’s capabilities. More power, better performance, zero database changes—just plug it in and go.

Want to see Readyset in action?

Book a demo and see how Readyset can accelerate your database.

Introducing rdst: The Client-Side SQL Performance Toolkit

Readyset has always been about making database scaling effortless. Today, we’re extending that mission from the server to the developer’s terminal with rdst, a production-native, AI-powered CLI that helps teams write better queries, diagnose performance issues, and close the loop on caching, all before bad queries ever reach production. The Database Scaling Problem Is a Query Problem Every engineering team eventually hits the same wall: the database can’t keep up. Latency spikes. CPU climbs.

Readyset

2026-04-28·11 min read

Uncategorized

How Readyset Rewrites Your SQL: Inside the Query Transformation Pipeline

Why Query Rewriting Matters for Dataflow Most databases work the same way: a query arrives, the engine builds an execution plan, scans tables, joins rows, filters, aggregates, and returns the result. Every time the query runs, the work repeats from scratch. This pull-based model has served relational databases for decades, but it carries an inherent cost: read latency is proportional to the complexity of the query and the size of the data it touches. Readyset takes a fundamentally different ap

Vassili Zarouba

2026-04-28·17 min read

Uncategorized

Why LLMs Write Incorrect SQL (and What That Means for Your Database)

Most LLM-generated SQL doesn't fail. It runs and returns results, and that's exactly what makes it dangerous. The errors don't surface until they're already in your data. AI-assisted development has made it easier than ever to generate SQL quickly. The problem is that quick and correct are not the same thing, and with databases, the gap between the two has consequences that compound quietly over time. SQL that runs is not the same as SQL that works. LLMs are good at producing SQL that looks ri

Readyset

2026-04-23·5 min read

Still scaling the hard way?

Modern applications demand instant performance, even under unpredictable load. Readyset helps you eliminate slow queries, stabilize latency, and scale confidently.

Revolutionize your database performance with Readyset

Serve requests at sub-millisecond latencies with the modern database scaling and query caching system for MySQL and PostgreSQL.

Product

Readyset Cloud Readyset Private Readyset Community QueryPilot rdst CLI Pricing

Resources

Blog Case Studies Company Documentation

Legal

Privacy Policy Terms of Service

Join our newsletter

Stay updated with the latest news, insights, and developments from Readyset — straight to your inbox.