Redshift: How to alter column in 3 clicks

Noaa Cohn  •  9 min read  • 3 Jun 2017

TL;DR There’s no simple, native way to alter a column’s data type in Redshift, BigQuery, or Snowflake. So, we built one that lets you alter a column in 3 clicks within the Alooma Platform. Read on to learn how we, as a product team, got there.

The pain point

As a data-pipeline-as-a-service provider, we see all kinds of data from all kinds of customers. What we’ve seen time and time again is that, as the old adage goes, "change is the only constant". Organizational data evolves, and schemas evolve right along with it. Flexibility is one of our core tenets, and we give our customers many ways to stay agile as their data changes. But there was always one place things fell short - data type changes.

Data changed... now what?

Data warehouses like Redshift, BigQuery, and Snowflake don’t natively support an alter column operation to let a user change a column’s data type once it's been created. We’ve seen users deal with it in many ways, from moving the data to a new column to running complex scripts to add-drop the column in place.

To address this, we introduced the alter column MVP: supporting alter column requests manually by having one of our Data Solution Engineers run a script that creates a new column with the desired data type, copies the data over, and deletes the old column. As MVPs go, it was super successful - more customers asked for it, and we learned the most common use cases (hello, longer varchar!).
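For a varchar widening, for example, that kind of hand-rolled workaround looks roughly like the sketch below (the table and column names are hypothetical, and this is not necessarily the exact script our engineers ran):

-- Widen a varchar column by swapping it out for a wider one.
ALTER TABLE public.orders ADD COLUMN note_wide VARCHAR(2048);
UPDATE public.orders SET note_wide = note;
ALTER TABLE public.orders DROP COLUMN note;
ALTER TABLE public.orders RENAME COLUMN note_wide TO note;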

Hacking alter column into an alpha

The alter column MVP upgrade arrived through one of our Hackathons. During this 12-hour, junk-food-fueled coding binge, a few of our team members knocked out a first round of "alter column", automated and UI-enabled in the product. By changing the length of a field’s varchar setting in the Alooma Mapper screen, they could indicate the alter intent, and a modification of the manual script was teed up in the background. Despite stiff competition from other awesome Hackathon projects, the Alter Column team won by a landslide.

We tasted the freedom of automatically changing a column’s data type in a data warehouse. How could we ever go back?! We knew what we had to do, and the official alter column feature was born.

Time to get real and get it on the roadmap

After the hackathon, the product gears continued to turn. We prioritized some quick wins such as name and varchar length changes. Next, we’ll add expansive changes from a narrow type like integer to a more inclusive type like varchar. After that, we’ll tackle restrictive type changes from a broad type like varchar to a narrower one like timestamp.

Before we got too deep, we wanted to make sure it made sense to our users. Altering a column type is a big deal, since it can be long-running and it can fail if the data doesn’t match. We also know that when it comes to production environments, changing data structures is a serious operation. It’s important that users feel safe and in control, as they do in the rest of the system. Naturally, a usability test would help us make sure.

So, we put together a prototype covering the different scenarios and put it in front of some customers. We heard a couple of very consistent responses for varchar length and expansive changes:

  • "This is cool! I want this."

  • "I get what is about to happen. You’re going to change the data type of my column in the backend."

However, for restrictive changes, we got more varied reactions:

  • "Wait… what actually happens in the backend? This could easily fail. What happens then? I’m hesitant to continue."

  • "You probably create a new column, copy everything over and delete the old one. If something doesn’t match, it fails. Got it."

It was clear that transparency in the process was key: users care about what is actually happening in the backend, and about what happens if it fails. They needed to know that everything would run in a single transaction, and that if the change failed for any reason, we would tell them why and leave their data unchanged. So, we made sure to add explanatory indicators to the design of restrictive type changes, and beefed up our error messaging.

It’s alive! A new and simple way to alter column

Today, we are proud to announce our newest capability: alter column.

By selecting the type dropdown, choosing a new type, and hitting Apply in Alooma’s Mapper, our users can now perform an alter column operation in 3 clicks.

We automatically detect the set of changes on the already-mapped fields for the event type, and execute the alter column commands.

If the column type remained the same but the column name changed, we simply rename the column.
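A rename on its own maps to a single statement, roughly along these lines (using the same placeholder notation as the backend example further down):

ALTER TABLE "${schema}"."${table_name}"
  RENAME COLUMN "${col_name}" TO "${new_col_name}";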

If the type on the mapped field changed and we support that type change, such as going from varchar(1024) to varchar(2048) or from integer to bigint, then we:

  • Rename the old column with a temporary name.

  • Add a new column with the desired type and either the original name or a different one, if requested.

  • Load data from the old column to the new column.

  • Drop the old column.

All of the changes happen in a single transaction. If any step in the sequence fails, the entire set of changes is cancelled. This ensures there are no side effects from an alter that might not succeed - such as changing varchars to timestamps, where not all the data can be cast.

In the backend, it looks something like this:

-- 1. Rename the existing column to a temporary name.
ALTER TABLE "${schema}"."${table_name}"
  RENAME COLUMN "${col_name}" TO "${col_name}_${timestamp}";

-- 2. Add the new column with the desired type, then backfill it from the old column.
ALTER TABLE "${schema}"."${table_name}"
  ADD COLUMN "${new_col_name}" ${type};

UPDATE "${schema}"."${table_name}" SET "${new_col_name}" =
  "${col_name}_${timestamp}"::${type} WHERE "${new_col_name}" IS NULL;

-- 3. Drop the old column.
ALTER TABLE "${schema}"."${table_name}"
  DROP COLUMN "${col_name}_${timestamp}" CASCADE;
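To make that concrete, here is roughly what a varchar(1024)-to-varchar(2048) change might expand to once the placeholders are filled in and the statements are wrapped in their transaction (the table name, column name, and timestamp suffix are made up for illustration):

BEGIN;

ALTER TABLE "public"."users"
  RENAME COLUMN "email" TO "email_1496476800";

ALTER TABLE "public"."users"
  ADD COLUMN "email" VARCHAR(2048);

UPDATE "public"."users" SET "email" =
  "email_1496476800"::VARCHAR(2048) WHERE "email" IS NULL;

ALTER TABLE "public"."users"
  DROP COLUMN "email_1496476800" CASCADE;

COMMIT;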

If you are a Redshift customer, you can alter column names and varchar length right from the Alooma Mapper (and, of course, programmatically via alooma.py). We are planning to expand the supported type changes, and to add output support for BigQuery and Snowflake, in upcoming releases.

Let us know what you think by commenting below. If you’d like to see this in action, contact us.

