Scenario
Say, We have a table named “ARTICLES” that captures daily view counts for various articles from 1st June 2023 to 10th June 2023. Each entry in the table represents a date, article title, and the number of views recorded on that day. You can find the source data and code file in my GitHub Repository,
Here is the sample data,
--sample data from table ARTICLES
SELECT
*
FROM
ARTICLES
LIMIT 10;
The most convenient and straightforward approach to compute both moving average and running total in SQL is by utilising Window Functions. To revisit the concepts of Window Functions and Aggregate Functions, you can read the detailed explanation available here:
Calculating Running Total using Window Function
Continuing to our demo, imagine you need to find out a cumulative sum of the total views on article “Aggregate Functions in SQL” by the end of each day,
--total number for views for "Aggregate Functions in SQL" by end of each day
SELECT
`DATE`,
ARTICLE_TITLE,
NO_OF_VIEWS,
SUM(NO_OF_VIEWS) OVER (ORDER BY `DATE`
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RUNNING_TOTAL
FROM
ARTICLES
WHERE
ARTICLE_TITLE = "Aggregate Functions in SQL";
Here, we have used OVER() clause in the above query, it is essential as it identifies a function as a Window Function and its purpose is to define a specific group of rows (a window) on which the Window Function will perform its calculations. But wait, that’s not all.
In the code above we have also used a FRAME clause as,
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
What is it? Essentially, the Window Function relies on either ROW or RANGE to determine which values should be considered for the calculation within the partition, by specifying the starting and ending points of the selected subset.
So here, the FRAME clause specifies the size of the frame – the value of the current row and the values of all the rows above the current row – on which the SUM(NO_OF_VIEWS) needs to be performed. It keeps adding the value of “NO_OF_VIEWS” as it goes along, giving us a running total for each row ordered by DATE.
In the example above, if we omit the RANGE clause, the result will remain unchanged. Can you guess why? Whenever we use ORDER BY clause in a Window Function, the default frame is, ‘RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW’. However, personally, I find it beneficial to include the RANGE or ROW clause as it enhances clarity and understanding. Inevitably, there will be a point in the future when someone else inherits your code. It is always advisable to write your code in a manner that makes it easier for others to understand and work with it.
You can read more about FRAME clause here,
Now let’s do the same analysis for all the articles – find out a cumulative sum of the total views of each article by the end of each day,
--running total
SELECT
`DATE`,
ARTICLE_TITLE,
NO_OF_VIEWS,
SUM(NO_OF_VIEWS) OVER (PARTITION BY ARTICLE_TITLE
ORDER BY `DATE`
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS RUNNING_TOTAL
FROM
ARTICLES;
Here, we divided the data into partitions based on the ARTICLE_TITLE. Then, we performed a SUM(NO_OF_VIEWS) calculation for each partition as defined by the FRAME clause.
For a better understanding, please refer to the image provided below. It illustrates the calculations performed for a single partition. The same logic and calculations are applicable to all other partitions as well.