How can you aggregate data using multiple sheets with the QUERY
function in Google Sheets?
To aggregate data sourced from multiple sheets, create a data set using the set notation {} by referencing each sheet then within the query
statement of the QUERY
function reference columns using ColX
(with X
being the index number of the column, starting at 1).
As the QUERY
function contains three parameters, the first parameter data
can be a range or a data set created using the {}
notation. It’s important to know when creating your custom data set whether the additional ranges from multiple sheets will be added as additional columns or concatenated as additional rows.
Let’s explore in a little more detail what each data type looks like.
Additional Columns Approach
If the sheets contained within your spreadsheet contain data that can be added as additional columns then your data
parameter in the QUERY
function would simply look something like this:
If you are in a locale where the comma is used as the decimal point, you would need to change the separator in the set notation to the backslash character, like so:
An example of how this looks is where, for example, all sheets in column B referenced the amount of sales made in the month for each sales staff person and each sheet represents a month.
Check the Columns
sheet to see how adding additional columns appears as a data set.
To perform an aggregate on this data set, you would just sum each column and would reference this like so in your SELECT
statement:
The result of this formula is seen in the Agg
sheet:
If you wanted to change the name of the aggregate columns, you could add the LABEL
in the query
statement and give more appropriate names for the headers in your result.
Concatenating Rows Approach
The other approach when aggregating data from multiple sheets is to concatenate the data from each sheet to create one long data set. This works well when each sheet has the same column headers, or if it doesn’t that your concatenation will be to common columns.
To do this your data
parameter in the QUERY
function needs to change to the following:
The result of the function above is displayed in the sheet labelled Rows
:
As you can see from the data displayed in the Rows
sheet each data contained within a sheet is appended to the data set created using the set notation.
To perform aggregation you would use an element in each row to group the data by and then apply an aggregate function on the grouped data.
In the Agg
sheet for this example the formula groups by the Employees
and uses the aggregate formula SUM()
to total their amount of sales, this formula looks as follows:
Notice with this formula that the same data is used in the data
parameter of the QUERY
function, but the SELECT
statement uses ColX
notation to reference the actual columns in the data set. The GROUP BY
clause helps to perform the aggregation on each Employee
.
You can see this result in the sheet labelled Agg1
below:
Notice in the Agg1
sheet that we have a little problem: the headers from the other sheets are somewhat being added and therefore used in the aggregate result. How can you exclude this?
There are two ways: amend the ranges in the data set to start from row 2, making the formula like this:
The problem with this approach is that you don’t have any names for the column headers and would mean using labels for the headers, making the query
statement a little longer with:
This produces the result as displayed in the Agg2
sheet:
Another alternative is to use a WHERE
clause on the data set. To remove the header from being inserted into the new data set, you could exclude it by using a WHERE
condition.
One example could be to write a WHERE
condition where the field is not empty:
As you can see from the above code the WHERE Col2 IS NOT NULL
is used to eliminate from the data set where column headers are imported, and this produces the result as seen in the Agg3
sheet below:
Either approach when using the concatenation technique works well, you just need to ensure you know what you’re excluding with the WHERE
clause and how to exclude properly.
Concatenating Rows Doesn’t Work
If you find when creating your new concatenated rows data set and see what appears to be only one sheet visible (the first sheet in your data set notation), be aware that if your data sets span entire columns, the empty rows are also added into the new data set.
For example, the data
set capture of QUERY({{Sheet1!A:B, Sheet2!A:B...
fetches all the data in Sheet1
and specifically all of the content in columns A
& B
including the empty cells! Therefore, when you’re looking at your concatenated data set, you might only see the data from Sheet1
but not the data from Sheet2
.
It is there, you just need to scroll down.
An approach to removing this potential confusion from happening is to add a WHERE
clause like WHERE Col1 IS NOT NULL
. This will help to filter out the empty cells from your new concatenated data set.
Or an alternative is to just remove the rows of empty cells from each sheet (which is what I did with the example above so as to not need extra code in the query
statement).
Summary
Aggregating data from multiple sheets using the QUERY
function is relatively simple and straightforward. Once you know how to create your own data set using the set notation {}
it then becomes a matter of how you would like to cobble your data together: creating new columns for each sheet, or concatenating rows of each sheet.
Each approach will have its own unique method for performing the aggregate functions needed on your data sets. Remember, though, that each approach requires the reference of columns in the query
to use the ColX
syntax, where X
is an index number of the column you wish to use in your query (starting at 1).