How can you aggregate data using multiple sheets with the
QUERY
function in Google Sheets?
To aggregate data sourced from multiple sheets, create a data set using the set notation {} by referencing each sheet then within the
query
statement of the
QUERY
function reference columns using
ColX
(with
X
being the index number of the column, starting at 1).
As the
QUERY
function contains three parameters, the first parameter
data
can be a range or a data set created using the
{}
notation. It’s important to know when creating your custom data set whether the additional ranges from multiple sheets will be added as
additional columns
or concatenated as
additional rows
.
Let’s explore in a little more detail what each data type looks like.
Additional Columns Approach
If the sheets contained within your spreadsheet contain data that can be added as additional columns then your
data
parameter in the
QUERY
function would simply look something like this:
=QUERY({Sheet1!B:B, Sheet2!B:B, Sheet3!B:B}, "SELECT *")
If you are in a locale where the comma is used as the decimal point, you would need to change the separator in the set notation to the backslash character, like so:
=QUERY({Sheet1!B:B \ Sheet2!B:B \ Sheet3!B:B}, "SELECT *")
An example of how this looks is where, for example, all sheets in column B referenced the amount of sales made in the month for each sales staff person and each sheet represents a month.
Check the
Columns
sheet to see how adding additional columns appears as a data set.
To perform an aggregate on this data set, you would just sum each column and would reference this like so in your
SELECT
statement:
=QUERY({Sheet1!B:B, Sheet2!B:B, Sheet3!B:B}, "SELECT SUM(Col1), SUM(Col2), SUM(Col3)")
The result of this formula is seen in the
Agg
sheet:
If you wanted to
change the name of the aggregate columns
, you could add the
LABEL
in the
query
statement and give more appropriate names for the headers in your result.
Concatenating Rows Approach
The other approach when aggregating data from multiple sheets is to concatenate the data from each sheet to create one long data set. This works well when each sheet has the same column headers , or if it doesn’t that your concatenation will be to common columns.
To do this your
data
parameter in the
QUERY
function needs to change to the following:
=QUERY({{Sheet1!A:B};{Sheet2!A:B};{Sheet3!A:B}}, "SELECT *")
The result of the function above is displayed in the sheet labelled
Rows
:
As you can see from the data displayed in the
Rows
sheet each data contained within a sheet is appended to the data set created using the set notation.
To perform aggregation you would use an element in each row to group the data by and then apply an aggregate function on the grouped data.
In the
Agg
sheet for this example the formula groups by the
Employees
and uses the aggregate formula
SUM()
to total their amount of sales, this formula looks as follows:
=QUERY({{Sheet1!A:B};{Sheet2!A:B};{Sheet3!A:B}},"SELECT Col1, SUM(Col2) GROUP BY Col1")
Notice with this formula that the same data is used in the
data
parameter of the
QUERY
function, but the
SELECT
statement uses
ColX
notation to reference the actual columns in the data set. The
GROUP BY
clause helps to perform the aggregation on each
Employee
.
You can see this result in the sheet labelled
Agg1
below:
Notice in the
Agg1
sheet that we have a little problem: the headers from the other sheets are somewhat being added and therefore used in the aggregate result. How can you exclude this?
There are two ways: amend the ranges in the data set to start from row 2, making the formula like this:
=QUERY({{Sheet1!A2:B};{Sheet2!A2:B};{Sheet3!A2:B}},"SELECT Col1, SUM(Col2) GROUP BY Col1")
The problem with this approach is that you don’t have any names for the column headers and would mean using
labels for the headers
, making the
query
statement a little longer with:
=QUERY({{Sheet1!A2:B};{Sheet2!A2:B};{Sheet3!A2:B}},"SELECT Col1, SUM(Col2) GROUP BY Col1 LABEL Col1 'Employee', SUM(Col2) 'Total Sales'")
This produces the result as displayed in the
Agg2
sheet:
Another alternative is to use a
WHERE
clause on the data set
. To remove the header from being inserted into the new data set, you could exclude it by using a
WHERE
condition.
One example could be to write a
WHERE
condition where the field is not empty:
=QUERY({{Sheet1!A:B};{Sheet2!A:B};{Sheet3!A:B}},"SELECT Col1, SUM(Col2) WHERE Col2 IS NOT NULL GROUP BY Col1 LABEL Col1 'Employees', SUM(Col2) 'Total Sales'")
As you can see from the above code the
WHERE Col2 IS NOT NULL
is used to eliminate from the data set where column headers are imported, and this produces the result as seen in the
Agg3
sheet below:
Either approach when using the concatenation technique works well, you just need to ensure you know what you’re excluding with the
WHERE
clause and how to exclude properly.
Concatenating Rows Doesn’t Work
If you find when creating your new concatenated rows data set and see what appears to be only one sheet visible (the first sheet in your data set notation), be aware that if your data sets span entire columns, the empty rows are also added into the new data set.
For example, the
data
set capture of
QUERY({{Sheet1!A:B, Sheet2!A:B...
fetches all the data in
Sheet1
and specifically all of the content in columns
A
&
B
including the empty cells
! Therefore, when you’re looking at your concatenated data set, you might only see the data from
Sheet1
but not the data from
Sheet2
.
It is there, you just need to scroll down.
An approach to removing this potential confusion from happening is to add a
WHERE
clause like
WHERE Col1 IS NOT NULL
. This will help to filter out the empty cells from your new concatenated data set.
Or an alternative is to just remove the rows of empty cells from each sheet (which is what I did with the example above so as to not need extra code in the
query
statement).
Summary
Aggregating data from multiple sheets using the
QUERY
function is relatively simple and straightforward. Once you know how to create your own data set using the set notation
{}
it then becomes a matter of
how
you would like to cobble your data together: creating new columns for each sheet, or concatenating rows of each sheet.
Each approach will have its own unique method for performing the aggregate functions needed on your data sets. Remember, though, that each approach requires the reference of columns in the
query
to use the
ColX
syntax, where
X
is an index number of the column you wish to use in your query (starting at 1).