Sometimes, we are not really interested in the specific row or column of data. Rather, we are interested in the summation or count of a column of data. This is where “Aggregation Functions” like SUM and COUNT in SQL are used.
Firstly, we have the SUM Function, which allows us to return the sum of a column.
To further illustrate, let’s look at the same world_data dataset.
To query the total gdp of the countries in the dataset, we will write the following query:
The output will be :
Next, we have the COUNT Function, which allows us to count the number of items in a column.
To illustrate, let’s consider the case where we want to find the total number of countries in the dataset.
We will write the following query:
The output will be :
This may seem simple and quite meaningless, however in a practical sense, the use of these aggregation functions plays a crucial role. Take the case of inventory tracking for example. Using the count function allows the user to keep track of how many orders are being made for a specific product, on a daily basis with a query such as :
SELECT product_name, COUNT(orders), date
WHERE product_name = ‘Toy1’
AND date = ‘2018-10-12’
*Think : Can you make the date dynamic? For instance, when we query on a daily basis, we will need to change the date. Can you remove the need for this by editing the code? Google for more SQL functions that can help you achieve this. There are multiple ways to do this though, there is no “one size fits all answer”.
Also, the 2 examples shown in this Topic involves only querying for 1 field (ie columns such as name, population). When querying for multiple columns involving aggregation functions, the “GROUP BY” Function will usually be involved. This will be explained in Topic 4.
In a practical sense, we often do not query only for a specific column. Take the case of a online store. Other than Order Volume, other relevant fields such as Views and Clicks of a product will also be queried to give us a clearer picture of a product’s performance.
An example of a query that is more likely to be used in practice would be :
SELECT Product_Category, SUM(product_clicks), SUM(product_views), SUM(product_orders)
GROUP BY Product_Category
ORDER BY SUM(product_orders) DESC
This query will allow us to analyse the performance of different product categories based on their overall orders,views and clicks.