How can CSV data be loaded into Amazon Redshift?
Loading a CSV file into Redshift is a fairly straightforward process; however, there are some caveats, especially around error handling and performance.
The basic steps for loading CSV data into Redshift are:
- Upload the CSV file(s) to S3.
- Run the COPY command to load the data from S3 into Redshift.
There are a few ways to use the COPY command to load data from S3 into Redshift. I prefer the manifest file method: create a manifest file that lists which CSV file(s) to load into Redshift, and upload the manifest file to S3 together with the CSV file(s).
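As a sketch, a manifest is a small JSON file listing the S3 URLs of the files to load; the bucket and file names below are hypothetical, and the structure follows Redshift's manifest format:

```python
import json

# Hypothetical bucket and file names; replace with your own.
# "mandatory": True makes the COPY fail if that file is missing.
manifest = {
    "entries": [
        {"url": "s3://my-bucket/data/part1.csv.gz", "mandatory": True},
        {"url": "s3://my-bucket/data/part2.csv.gz", "mandatory": True},
    ]
}

# Write the manifest locally; upload it to S3 alongside the CSV files.
with open("load.manifest", "w") as f:
    json.dump(manifest, f, indent=2)
```

The manifest then becomes the FROM target of the COPY command, so Redshift loads exactly the files you listed and nothing else.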
For performance reasons, I also recommend gzipping all the files you upload to S3.
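Compression can be done with the standard library before upload; a minimal sketch, with hypothetical file names and sample data so it runs standalone:

```python
import gzip
import shutil

# Create a small sample CSV (hypothetical data) so the sketch is self-contained.
with open("part1.csv", "w") as f:
    f.write("1,alice\n2,bob\n")

# Compress it before uploading to S3; Redshift's COPY ... GZIP reads it directly.
with open("part1.csv", "rb") as src, gzip.open("part1.csv.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)
```

Smaller files mean less data transferred to S3 and less for Redshift to read during the load.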
Once everything is in S3, run a COPY command in your preferred SQL client:
COPY <schema-name>.<table-name> (<ordered-list-of-columns>) FROM '<manifest-file-s3-url>' CREDENTIALS 'aws_access_key_id=<key>;aws_secret_access_key=<secret-key>' GZIP MANIFEST;
Of course, if you choose not to use gzip, just omit the GZIP keyword from the command.
Note that the CSV files should not include a header row; the column list is supplied in the COPY command itself.
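The COPY statement above can also be assembled programmatically. This is only a sketch: the schema, table, and column names are hypothetical, and the credential placeholders are left exactly as in the command above for you to fill in:

```python
def build_copy_sql(schema, table, columns, manifest_url, gzipped=True):
    # Assemble a Redshift COPY statement; credentials stay as placeholders.
    cols = ", ".join(columns)
    options = "GZIP MANIFEST" if gzipped else "MANIFEST"
    return (
        f"COPY {schema}.{table} ({cols}) "
        f"FROM '{manifest_url}' "
        "CREDENTIALS 'aws_access_key_id=<key>;aws_secret_access_key=<secret-key>' "
        f"{options};"
    )

# Hypothetical example values.
sql = build_copy_sql("public", "users", ["id", "name"],
                     "s3://my-bucket/load.manifest")
print(sql)
```

Generating the statement this way keeps the column order in the SQL in sync with the column order in your CSV files, which is where mismatches usually creep in.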
If any failures occur, you can look them up in the STL_LOAD_ERRORS system table.
If you want an out-of-the-box solution for scale, error handling, frequent schema changes, or real-time availability of your data, I would recommend taking a look at Alooma's CSV to Redshift integration for a more robust solution.