Connecting to S3

Abstract

How to configure the connection to your S3 bucket and specify how the data should be stored.

Once you've granted Alooma access to S3, you'll need to specify the S3 information to configure the output.

  1. Click on the Output node on the Plumbing page.

  2. If it's not already selected, choose the Settings tab.

  3. Fill out the connection information:

    • AWS Access Key ID and AWS Secret Access Key

    • AWS Region (the region where your S3 bucket is located)

    • Bucket name (the S3 bucket to write data to) — this MUST match the value you entered in the policy you added

    • Top-level Folder (the default path within the bucket to write data to) 

  4. Specify the default partition strategy. Data files can be aggregated into time-based directories at the granularity you specify (year, month, day, or hour), and Alooma will create the necessary level of partitioning. So, if you specify "day", the URL will look something like this: s3://bucket/topLevelFolder/eventType/2018/9/30/data_file.csv. But if you specify "year", the URL will look something like this: s3://bucket/topLevelFolder/eventType/2018/data_file.csv

    If you want to base your partitioning on something other than time, you can create a custom partition via the Code Engine as described below.

  5. Choose the partitioning format: whether to use value-based paths (s3://bucket/topLevelFolder/eventType/2018/9/30/) or key-value based paths (s3://bucket/topLevelFolder/eventType/year=2018/month=9/day=30/). A sketch of how these paths are assembled appears after this list.

  6. Select the output format used to upload: CSV, JSON, PARQUET, or AVRO.

  7. Save the changes.
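
To make the effect of the partition strategy (step 4) and the partitioning format (step 5) concrete, here is a minimal sketch of how a destination path could be assembled. This is only an illustration of the layout described above, not Alooma's own code; the function name and parameters are hypothetical.

from datetime import datetime

def build_partition_path(bucket, top_level_folder, event_type, ts,
                         granularity='day', key_value=False):
    # Illustrates the directory layout described in steps 4 and 5.
    # granularity: one of 'year', 'month', 'day', 'hour'
    # key_value=False -> value-based paths (.../2018/9/30/)
    # key_value=True  -> key-value based paths (.../year=2018/month=9/day=30/)
    levels = [('year', ts.year), ('month', ts.month),
              ('day', ts.day), ('hour', ts.hour)]
    depth = ['year', 'month', 'day', 'hour'].index(granularity) + 1
    parts = ['{}={}'.format(name, value) if key_value else str(value)
             for name, value in levels[:depth]]
    return 's3://{}/{}/{}/{}/'.format(bucket, top_level_folder, event_type,
                                      '/'.join(parts))

# Day granularity with value-based paths prints:
# s3://bucket/topLevelFolder/eventType/2018/9/30/
print(build_partition_path('bucket', 'topLevelFolder', 'eventType',
                           datetime(2018, 9, 30), 'day'))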

Note

If your S3 connection is unsuccessful, you will see a message in the Notification pane of the Dashboard screen.

And that's it! You've created your S3 output and Alooma can now start loading your data. The next step is to create some inputs.

Creating a custom partition

You can add the following transform to the Code Engine to specify a custom directory structure or partition for your S3 output (replace the value inside the <>s with your partition name).

event['_metadata']['output_hint']['partition'] = '<custom partitions>'

Incoming data will now appear under: s3://bucket/topLevelFolder/eventType/<custom partitions> instead of: s3://bucket/topLevelFolder/eventType/year=2018/month=12/etc.

A common use case might be partitioning by a different date than the date that Alooma is partitioning by, or partitioning by different value(s) altogether (or even a combination). In this example, we use the geographical values (country and city) from the event to specify the partitions.

event['_metadata']['output_hint']['partition'] = 'country={country}/city={city}'.format(
    country=event['country'], city=event['city'])
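
For the date-based case, a sketch like the following could be added to the Code Engine in the same way. It assumes the event carries an ISO-8601 timestamp in a field named created_at (a hypothetical name; substitute whichever field holds the date you want to partition by).

from datetime import datetime

# 'created_at' is a hypothetical field name holding an ISO-8601 timestamp,
# e.g. '2018-09-30T12:34:56'. Adjust the field name and format to your data.
created = datetime.strptime(event['created_at'][:19], '%Y-%m-%dT%H:%M:%S')
event['_metadata']['output_hint']['partition'] = 'year={y}/month={m}/day={d}'.format(
    y=created.year, m=created.month, d=created.day)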
