How can CSV data be loaded into Amazon Redshift?

byAlooma Team
Updated Sep 27, 2018

Loading a CSV to Redshift is a pretty straightforward process, however some caveats do exist, especially when it comes to error-handling and keeping performance in mind.

The basic steps for loading CSV data into Redshift are:

  1. Load the CSV file/s to S3.
  2. Use the copy command to load the data from S3 to Redshift.

There are a few ways to use the copy command to load data from S3 into Redshift. I prefer the manifest file method: you simply create a manifest file that contains which CSV file/s to load to redshift and upload the manifest file to S3 together with the csv file/s.

I also recommend for performance reasons to gzip all the files you upload to S3.

Once everything is in S3, you run a copy command in your preferred sql client:

COPY <schema-name>.<table-name> (<ordered-list-of-columns>) FROM '<manifest-file-s3-url>' CREDENTIALS 'aws_access_key_id=<key>;aws_secret_access_key=<secret-key>' GZIP MANIFEST;

Of course, if you choose not use gzip, just omit the GZIP keyword from the command.

You should note that the CSV files should be without the CSV header (that is supplied in the query itself).

If any failures occur, you can look them up in the stl_errors table.

If you want an out-of-the-box solution for scale, error-handling, frequent schema changes or real-time availability of your data, I would recommend taking a look at Alooma's CSV to Redshift integration for a more robust solution.

Like what you read? Share on
Data IntegrationAmazon Redshift

Further reading

What is Striim?
Alooma Team • Updated Jul 26, 2018
What is Microsoft Integration Services?
Alooma Team • Updated May 3, 2018
What is IBM Infosphere Information Server?
Alooma Team • Updated May 10, 2018
What is Oracle Data Integrator (ODI)?
Alooma Team • Updated Apr 3, 2018