Categories
Cloud Apps

4 Simple Steps to Merge 2 Columns Into 1 With QUERY Function

If you have two or more columns of data and you want to merge these columns into one column then using the Google Sheets QUERY formula may be one way you can achieve this.

Here’s an in depth explanation of how to achieve this:

Step 1 – Prep your data

If you data doesn’t contain any spaces then you’re good to go, if though your data does contain spaces then you will need to define a character at the end of each cell to get this to work.

Step 2 – Understanding the QUERY Function

Let’s have two simple data sets, one which does not contain spaces in its data set, and another which does:

Simple data set with QUERY no header
Simple data set showing QUERY function with no header
Data set with spaces & QUERY with no header
Simple data set containing spaces showing QUERY function with no header

With both data sets we have highlighted in grey the returned values from the QUERY function.

QUERY parameters

The QUERY function is a powerful formula in Google Sheets helps to treat data contained in tabular format like a database table which we can perform basic SQL queries on.

This function contains three parameters:

  1. The data set.
  2. The SELECT command.
  3. The number of headers in your data set.

It’s the third parameter of the QUERY function that will help us in being able to concatenate the data within each column.

At the moment, from above, we have set the third parameter to 0 which means our data set does not return any headers.

But what if I want to define the first row as being the header?

Then we would simply change this third parameter in the QUERY function to 1 – representing the first row. When we perform this change to our formula we see no change to our data, however, the returned values have now defined that first row as a header area.

Define header row in QUERY function
Defining the first row as header

QUERY Header Parameter

What if we were to increase the number represented in our QUERY function? How is the result set returned if we define the header parameter as 2?

Here’s how it looks for our data sets:

Data set QUERY with 2 header rows
Query function with 2 header rows defined

Did you notice what the QUERY function does when we define more than one header row?

It concatenates the header rows and inserts a space between each appended element. Therefore, the first row containing A and the second row containing B produce the header A B.

(Can you foresee the issues we’re going to run into when we merge our columnar data which contains spaces already?)

Data set with spaces 2 header rows
Query function with 2 header rows defined

So if we wanted to get all the elements within our column wrapped into one row we would just need to change that third parameter to the maximum height of our columns. We could use the ROWS() function to capture this number:

Use QUERY Function to Merge Column Into 1 Row
Capture all rows into the header

Step 3 – JOIN columns

As the columns have been merged into one header row, we now want to stitch these column headers together.

This is achieved by wrapping the QUERY function in a JOIN function, like so:

JOIN QUERY data set header rows
JOIN query header rows together to combine into one cell

Notice here we used the first parameter of the JOIN function as a space to mimic the behaviour of what the QUERY data did to our data too.

Do be aware the JOIN function does have a limit of 50,000 characters. If you have a lot of data to concatenate you might want to break it up to smaller data sets.

What if our data contains spaces?

As you can see this strategy isn’t going to work too well as our result now contains too many spaces between data cells, and looks as follows:

JOIN data from QUERY for data containing spaces
The problem with this strategy is data containing spaces

Therefore, we’re going to need to improvise on the spaces in our current data set.

What we’re going to do is to modify our original data by substituting spaces for an oblique character that isn’t used in our data. For example, my current data set with names isn’t likely to have a tilde character ~.

Therefore, what we want to be able to do is to change our imported data set in the QUERY function so that we are operating on data that doesn’t contain any spaces – we want to be able to reserve these spaces for the explicit purpose of being able to split our data back into cells again.

One such way I achieved this was by changing our original formula:

QUERY(A1:B3,,ROWS(A:A))

To this:

QUERY(ARRAYFORMULA(SUBSTITUTE(A1:B3," ","~")),,ROWS(A:A))

What has happened with our first QUERY parameter?

Instead of referencing the original data set we modify the data set to substitute spaces for our oblique tilde ~ character.

The ARRAYFORMULA provides us the functionality of being able to operate on each cell within the range, and all we want to do is apply the SUBSTITUTE formula to each cell – if it has a space character replace it with the tilde character.

The result of our change should produce something like this:

Query ArrayFormula Substitute Spaces
Modify each cell in our data set by using ARRAYFORMULA and SUBSTITUTE

We’ve now achieved our purpose of reserving the space characters for another specific purpose which we’ll see next.

Step 4 – Transform back to one column

Our data is in one cell, split by the space character.

To stretch this back out into one single column all we need to do is SPLIT the data by the special space character and then TRANSPOSE it to leave the data as a column, otherwise it will remain in row fashion if we don’t.

Here’s what happens to our first data set when we SPLIT first:

=SPLIT(JOIN(" ", QUERY(A1:B4,,ROWS(A:A)))," ")
Split data after JOIN and QUERY
SPLIT data after JOIN & QUERY functions

Then our concluding formula for the first data set:

=transpose(split(join(" ", query(A1:B4,,rows(A:A)))," "))
Transpose data after split
TRANSPOSE data after SPLIT, JOIN & QUERY

What about our data set with spaces?

A little more work is going to be required with our original data set which contains spaces.

If we copy the above formulas into our working “space” formula here’s the result:

=TRANSPOSE(SPLIT(JOIN(" ",QUERY(ARRAYFORMULA(SUBSITUTE(A1:B3," ","~")),,ROWS(A:A)))," "))
Transpose data after split
How our ‘spaced’ data looks applying the same TRANSPOSE and SPLIT formulas

As you can see we need to remove at the end each of those tilde characters.

How do you think we could do this?

If you were thinking ARRAYFORMULA and SUBSTITUTE again – well done! You’re acquiring a new skill today.

Here’s our concluding formula:

=ARRAYFORMULA(SUBSTITUTE(TRANSPOSE(SPLIT(JOIN(" ",QUERY(ARRAYFORMULA(SUBSTITUTE(A1:B3," ","~")),,ROWS(A:A)))," ")),"~"," "))
ArrayFormula Substitute, Split, Join & Query Functions
Final formula for concatenating multiple columns into one using QUERY function.

Conclusion

So there you have it!

Within this article you’ve been able to learn the following function skills when using Google Sheets:

  • What the third parameter of the QUERY formula does.
  • How we can modify each cell within a range by using ARRAYFORMULA – specifically using the SUBSTITUTE formula.
  • How to handle different data sets with the QUERY function, especially if they contain space characters.
  • How to JOIN our QUERY data to form a single cell of data (careful of the 50,000 character limit).
  • How to SPLIT our data set back into cells.
  • How to TRANSPOSE our data set back into columns.

Well done if you’ve been able to follow along and understand these complex topics.

Categories
Cloud Apps

Easily Merge Multiple Columns Into 1 with Query & ArrayFormula

In today’s post I’d like to explain how I used the following formula to help merge multiple columned data into a different columned arrangement as the purpose of the result was to export the original data set into a CSV file for import into NetSuite.

Here’s the formula we will dissect:

=ARRAYFORMULA(SPLIT(ARRAYFORMULA(SUBSTITUTE(TRANSPOSE(SPLIT(JOIN(" ",QUERY(DataConcat!A:L,,ROWS(DataConcat!A:A)))," ")),"~"," ")),";",TRUE,FALSE))

The problem

I had exported the following salary data from our budgeting software which contained how much each staff person would be receiving each month throughout the financial year.

An example of the raw data looked something like this:

EmployeeJan-2020Feb-2020
123 Smith, Bob2,0002,200
456 Doe, Jane3,0003,300
789 Doe, John 200

The original spreadsheet had all months of the year and is labelled as Data:

Original Data set Employee Salary per Month
Employee salaries per month

I had to import this data into NetSuite using the CSV Import process, which required changing the structure a little bit, so that the resulting CSV looked like so:

Employee IDEmployee NameFrom DateTo DateAmount
123Smith, Bob1/01/202031/01/20202,000
123Smith, Bob1/02/202029/02/20202,200
456Doe, Jane1/01/202031/01/20203,000
456Doe, Jane1/02/202029/02/20203,300
789Doe, John1/02/202029/02/2020200

As you can see with the requirements of the task, we need to be able to expand the data set to rows (as it’s currently laid out in columns), then merge the data back into one columnar set. Once we have achieved this we then use the handy File > Download > Comma-separated values (current sheet) option in Google Sheets.

Solution: Step 1 – Expansion

The first rule when manipulating data is to leave the original data set intact.

Therefore, we create a fresh new sheet and label this whatever name is appropriate for your needs. In this example we will name it appropriate to what we are doing: DataExpand.

In our expansion we want to achieve all necessary CSV fields required. This means we need to fetch the ID number from the original Employee field, and to copy the remaining data into another field for the Employee Name. Then we split the header column in our original sheet to a data range. Followed lastly with the amount in each date range.

Our first sheet, should look a little something like this:

Expand Employee Salary data
Perform the CSV requirement per month

Here are the details as displayed in the photo above:

  • Cells $B$1 and $G$1 contained the =COLUMN() reference to each of the months on the original data set. This just helps with creating an easy index function that uses that reference which can make it easier copying the same columnar set 12 times.
  • Row 1 this contains the following formula to extract the ID number from the Employee field. Here we check to make sure we have data in the original Data sheet, if we do we want to perform a regular expression where we just want all the digits, therefore, we use the regular expression of d+:
=if(len(Data!A2),REGEXEXTRACT(Data!A2,"d+"),"")
  • From $B$2 down we then use another regex formula to extract those digits and replace them with nothing. We also want to check if the employee received any salaries or wages, if they didn’t we don’t want to add them into the data set.
=if(index(Data!$1:$1000,row(),B$1),trim(regexreplace(Data!$A2,"d+","")),"")
  • From $C$2 down we just insert the ID that is needed for each row if we have a confirmed value in the previous column.
=if(len(B2),$A2,"")
  • For $D$2 down if we have a value in $B$2 then we get the value of the salaries for that respective month. We use the handy index function here to get that value:
=if(len(B2),index(Data!$1:$1000,row(),B$1),"")
  • For $E$2 down if we have a value in $B$2 then we get the column heading of the respective month using, once again, the index function:
=if(len(B2),index(Data!$1:$1000,1,B$1),"")
  • Lastly, for $F$2 down we get the value in the previous cell, if something exists, and calculate what the last day of the month will be for that cell:
=if(len(E2),date(year(E2),month(E2)+1,0),"")

As we have been mindful of the formulas used for each budget month, we now just need to copy cells $B$2:$F$1 and then paste this 12 times across. We could make it even easier by, after our first paste in cells $G$2:$K$4 changing cell $G$1 to be a formula: =$B$1 + 1:

Employee Salaries change header to formula
Change header row on first paste to formula

Then we copy this newly pasted range with formula edits across 10 more times.

Solution: Step 2 – Concatenation

The purpose of the next sheet is to zip up our columnar collections. This is a relatively simple step, however, there a couple of things to be wary of:

  1. Use TEXTJOIN when zipping up the columnar data rather than the JOIN or CONCATENATE functions. The primary reason for this is where there is no data in a cell TEXTJOIN creates the blank join, whereas the other functions neglect it. As an example, if we had the following table:
 ABC
1MissingPuzzlePiece
2Missing Piece

When using =JOIN(",",A1:C1) it produces the result Missing,Puzzle,Piece for the first row and =JOIN(",",A2:C2) will produce Missing,Piece for the second row.

Whereas using =TEXTJOIN(",",FALSE,A1:C1) produces the result Missing,Puzzle,Piece for the first row and =TEXTJOIN(",", FALSE, A2:C2) will produce Missing,,Piece for the second row.

As the TEXTJOIN function allows for the flexibility of the concatenation of your columnar data it’s import the data remain as it is represented elsewhere if there happens to be blank cells in your data.

  1. The other important aspect with this formula is that we want to replace the space characters within our cells as the space character will be used to join and split data back into columns again later. Therefore, it’s important at this step (if your data also contains spaces) to replace the space character with an oblique character that is not used anywhere in your data set.

I am going to replace my space characters within my data to the tilde character ~.

Therefore, I have created a new sheet, which I’ve labelled DataConcat and cell $A$1 has the formula:

=if(len(DataExpand!B2),SUBSTITUTE(TEXTJOIN(";",FALSE,DataExpand!B2:F2)," ","~"),"")

Be mindful when transferring the data across to adjacent columns that you don’t copy paste or drag across the formula, as each column here needs to reference the collection in DataExpand. For example, in the next adjacent cell $B$1 the formula is:

=if(len(DataExpand!G2),substitute(textjoin(";",FALSE,DataExpand!G2:K2)," ","~"),"")

You will need to do this 12 times for each month.

Substitute TEXTJOIN on Columnar data
Combine your data ready for merge using TEXTJOIN and SUBSTITUTE (if needed)

Solution: Step 3 – Merge

So here’s where we merge all the columns from the DataConcat sheet in Step 2 above into one big column collection.

=ARRAYFORMULA(SPLIT(ARRAYFORMULA(SUBSTITUTE(TRANSPOSE(SPLIT(JOIN(" ",QUERY(DataConcat!A:L,,ROWS(DataConcat!A:A)))," ")),"~"," ")),";",TRUE,FALSE))

Wow, what a massive formula!

As with mathematics, we have to start with the inner most functions and work our way out. So let’s break up each component within this formula and explain what’s happening:

QUERY function

QUERY(DataConcat!A:L,,ROWS(DataConcat!A:A))

Here we use the QUERY function to capture the data we want to operate on (the first parameter), being all the columns in the DataConcat sheet, and we then want to combine these cells into one row. We achieve this by setting the number of headers (the third parameter) to the maximum number of rows in the DataConcat sheet. This means every row in the DataConcat sheet will be zipped up into one row!

The way the QUERY function zips everything up is by applying the space character to each new cell. This was why it was important for us to remove the space characters from our own data set before we applied this function.

If you’d like to know more about the QUERY function I wrote up a more detailed post here explaining this type of use.

MERGE – Using JOIN, SPLIT & TRANSPOSE functions

TRANSPOSE(SPLIT(JOIN(" ", QUERY(...)), " "))

The combination of these three functions is to merge the single row returned by the QUERY function into one column.

First we JOIN the rows together using the special space character which helps to delimit each individual row. Then we SPLIT by the space character which produces the rows across individual cells horizontally. Finally, the TRANSPOSE function translates the horizontal cells into vertical formation – a single column.

Unfortunately the JOIN function does have a character length limitation of 50,000 characters. You may need to break up your requirements or look at other means if this ends up being an impediment.

EXPANSION – Using ARRAYFORMULA

The last step is expand our singular column so that it can be represented as needed for CSV export. If we’re expanding data sets we need to use the ARRAYFORMULA function, as demonstrated here:

ARRAYFORMULA(SPLIT(SUBSTITUTE(transpose(...), "~", " "), ";", TRUE, FALSE))

The ARRAYFORMULA provides us with the ability to loop through each of the cells in our data set and to apply individual functions to them. In our case we want to:

  1. SUBSTITUTE back the oblique tilde ~ character with its original space character.
  2. SPLIT back the data we concatenated in the DataConcat sheet. However, we need to be mindful there may be instances where we have blank cells, and we want the SPLIT function to return the blank cells, therefore, we need to change our fourth parameter in the SPLIT function to FALSE (default is TRUE). By switching this to FALSE it turns consecutive delimiters like ;; into three separate cells, rather than just one cell.

With these functions all combined they produce the wonderful result intended, like so:

ArrayFormula Split Substitute Expansion
Expansion of our singular column using ARRAYFORMULA, SUBSTITUTE and SPLIT

Conclusion

Within this post we were able to learn how we can extrapolate data from one form to another using several intermediary steps and complex Google Sheet functions.

Hopefully this post has been helpful with learning:

  • How to organise your steps by leaving the original data set (use multiple sheets if needed).
  • How to use the QUERY function and especially its third headers parameter.
  • Why you would use the TEXTJOIN function over JOIN.
  • The limitations of the JOIN function.
  • How to expand your data set using ARRAYFORMULA with any function when needing to loop through individual cells in the result.
  • How to set the SPLIT function’s fourth parameter so that consecutive empty cells aren’t treated as one.