PostgreSQL Magic: How to Use generate_series Like a Pro
Imagine you are developing an e-commerce platform and you want to implement analytics functionality. How would you write a query that would return the number of orders per each hour per given day?
As people eagerly await Black Friday sales all year, online sales experience a massive surge during this time. Have you ever wondered how to calculate the hours of peak activity on such a busy day?
In this blog post, I’ll introduce a powerful PostgreSQL feature that makes writing such time-based queries as simple as a walk in the park. Let’s dive in and explore!
Imagine you are developing e-commerce platform and your task is implement analytics functionality that would calculate the peak hours per given period — day, for example.
Given a table — orders with columns: order_id, customer_id, order_date, total_amount, order_status, payment_method, shipping_address.
To calculate the peak hours, we should group by orders by hours. It seems to be easy. Let’s write the query and see the result.
select order_date, count(order_id)
from orders
group by order_date ;
Ooops, that is not what we want. The output lists all the orders grouped by TIMESTAMP. Even if a small difference in milliseconds, the rows will be different. Our primary goal is to group all the orders by hours per day.
Let’s rewrite the query to get the desired output. To do so, we have to extract the hour without milliseconds and group by hours. Seems easy, right?
To extract exact hour from a DATE / TIMESTAMP, we can use date_trunc(text, TIMESTAMP)
functions.
SELECT NOW(), DATE_TRUNC('HOUR', NOW());
As we can see in the example above, we can extract — hour
from date / timestamp.
As we have learnt how to extract hour, lets re-write the initial query to group by hours
.
select date_trunc('hour', order_date) as hour,
count(order_id) as count
from orders
group by hour
order by hour ;
Well, we nearly achieved what we wanted initially. We get orders grouped by hours and we can easily find the peak hours for our analytics.
However, there is a moment to consider. If we look through hours carefully in the output, we find that — 2024–11–20 07:00:00.000000
is missing. There are no orders happened in that period.
But what if we want to list orders for every hour
. In case when no orders are made in that period, we should list 0
.
How can we do that? Well, we should generate all the values — hours and then join that with the orders. To generate a list of values we can use — generate_series
function in Postgres. Let’s learn how it works.
Generate_Series
-- Syntax
generate_series(
start: timestamp,
end: timestamp,
interval: interval
);
Demo:
Let’s generate all the hours per day as a series.
SELECT generate_series(
'2024-11-20 00:00:00'::timestamp,
'2024-11-20 23:59:59'::timestamp,
'1 hour'::interval
);
Using this Postgres Function
, we can generate any series starting from any time ending at any time by specifying interval values. Let’s generate a series of last 7 days.
SELECT generate_series(current_date - interval '7 days',
current_date,
'1 day');
-- start: 7 days ago
-- end: current_date
-- interval: 1 day
To achieve our initial goal — grouping orders by each hour — we need to address the final piece of the puzzle.
To join our series of values (hours)
with the orders
table, we should treat the hours
as a table. To accomplish this, we need to use a CTE
.
CTE
CTE (Common Table Expression) is a temporary result set that you can reference within a SQL SELECT
, INSERT
, UPDATE
, or DELETE
statement. CTEs can be used to simplify complex queries, break them into readable parts, or make them reusable within the same statement.
This is exactly what we need. To turn the generated series into as a result set and join with another table (result set).
-- SYNTAX
WITH cte_name AS (
- Your SQL query here
)
SELECT * FROM cte_name;
Let’s use CTE with our previous query and see the output.
WITH hours as (SELECT generate_series(
'2024-11-20 00:00:00'::timestamp,
'2024-11-20 23:59:59'::timestamp,
'1 hour'::interval
) time)
SELECT *
FROM hours;
Finally, when we have explored how to use CTE
and Generate_Series
, let’s put everything together.
Steps:
Generate Series — all the hours per day
Use
CTE
to turnhours series
into Result SetJoin with
orders
table
WITH hours as (SELECT generate_series(
'2024-11-20 00:00:00'::timestamp,
'2024-11-20 23:59:59'::timestamp,
'1 hour'::interval
) time)
SELECT hours.time, count(order_id)
FROM hours LEFT JOIN orders ON order_date BETWEEN hours.time AND hours.time + interval '1 hour'
GROUP BY hours.time
ORDER BY hours.time
As shown in the image above, the output includes all the hours of the given day and displays the number of orders. This time, we successfully included periods with zero orders as well. Achieving this was our initial goal. We have successfully implemented our analytics functionality, allowing us to identify peak hours and periods of zero activity.
In this post, we explored how to leverage PostgreSQL features — generate_series
and CTEs—to simplify and write efficient queries. What has been your experience with CTEs and generate_series
? Feel free to share your thoughts in the comments!
If you enjoyed this post, don’t forget to like, subscribe, and stay tuned for more in future posts. See you next time!