How do I handle unstructured data?

by Maytal Shamir
Updated Dec 27, 2017

When I think about handling unstructured data collected from many sources, such as emails, videos, audio files, web pages, and social media messages, three stages come to mind: collecting, storing, and analyzing.

1. Collection

Your first step is setting up your data sources across all domains. Whether you are collecting clickstream data, advertising data, usage data, operational feeds, CRM records, or anything else, think about which client is best suited to the task.

2. Storage

Persisting your data in the cloud or on premises allows you to query and analyze it. Fortunately, we’re living in the future, and we have some really great tools that don’t require a strict schema, like MongoDB, Elasticsearch, or Cassandra. In MongoDB, for example, you can store documents by a unique identifier and access them later using this ID.
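
As a minimal sketch of the store-by-ID idea, the snippet below uses a plain in-memory Python class as a stand-in for a document database. The class name DocumentStore and its methods are hypothetical and are not MongoDB's API; with a real MongoDB server and the pymongo driver, the corresponding calls would be insert_one and find_one.

```python
import uuid


class DocumentStore:
    """Hypothetical in-memory stand-in for a schemaless document store."""

    def __init__(self):
        self._docs = {}

    def insert(self, document):
        """Store a copy of the document under a new unique id and return the id."""
        doc_id = uuid.uuid4().hex
        self._docs[doc_id] = dict(document)
        return doc_id

    def get(self, doc_id):
        """Return the document stored under doc_id, or None if it is missing."""
        return self._docs.get(doc_id)


store = DocumentStore()

# Documents in the same store do not need to share fields.
email_id = store.insert({"type": "email", "subject": "Invoice", "body": "See attached."})
tweet_id = store.insert({"type": "tweet", "text": "Hello world", "hashtags": ["data"]})

print(store.get(email_id)["subject"])
print(store.get(tweet_id)["hashtags"])
```

The point of the sketch is that the two documents have different fields, and each one is still retrievable through nothing more than the ID returned at insert time.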

3. Analysis

After processing and storing your data, you will usually want to access it again and perhaps learn something from it. To do that, you’ll typically have to organize your data a bit and adjust it for analytics.

Where I work (Alooma), for example, we translate our customers’ data (structured or unstructured) into JSON objects that we stream to a cloud data warehouse in real time. As data passes through the stream, Alooma lets you provide Python code to organize and prepare the data for analytics on top of the data warehouse.
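
To illustrate what such a preparation step might look like, here is a minimal sketch of a Python function that flattens a nested JSON-like event into flat, column-friendly fields. The function names, field names, and sample event are made up for illustration and are not taken from Alooma's actual interface.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into a single level, joining keys with underscores."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


def transform(event):
    """Hypothetical per-event hook: tidy one event before it is loaded."""
    row = flatten(event)
    # Normalize a free-text field so it groups cleanly in analytics queries.
    if isinstance(row.get("user_email"), str):
        row["user_email"] = row["user_email"].strip().lower()
    return row


raw = '{"user": {"email": " Ana@Example.com ", "plan": "pro"}, "action": "login"}'
row = transform(json.loads(raw))
print(row)
```

Flattening nested keys and normalizing values per event is one simple way to make loosely structured records easier to query as table columns.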

This end-to-end process gives a sense of “structure” to your “unstructured” data and lets you both store it and extract value from it.

Published at Quora. See Original Question here
