Last Wednesday, May 27th, over 50 data plumbers sat down together to discuss different aspects of the data pipelining process. Most of the TLV Data Plumbers are data engineers, responsible for the daily, reliable transfer of gigabytes of data from various sources to various data stores: Hive, Spark, Redshift and even Elasticsearch.
Our first speaker, alooma's CTO Yair Weinberger, discussed the false promise of "Exactly-Once Processing". In his talk, Yair explained why there is no such thing as "Exactly-Once Processing", and how idempotency can only get you close when implementing states in Apache Storm / Trident.
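To make the idempotency idea concrete, here is a minimal Python sketch of our own (it is not taken from Yair's slides, and it is not Storm code). It assumes a scheme in the spirit of Trident's transactional state: every batch carries a transaction id, and the state remembers the id of the last batch it applied. The names `TransactionalCounter`, `apply_batch` and `naive_apply` are hypothetical.

```python
class TransactionalCounter:
    """Per-key counter that remembers the id of the last batch applied to each key."""

    def __init__(self):
        # key -> (txid of the last applied batch, current count)
        self._state = {}

    def apply_batch(self, txid, key, increment):
        last_txid, count = self._state.get(key, (None, 0))
        if last_txid == txid:
            # Same batch delivered again (a replay): skip it, the update is idempotent.
            return count
        count += increment
        # Assumption: the txid and the count are stored together in one atomic write.
        self._state[key] = (txid, count)
        return count

    def get(self, key):
        return self._state.get(key, (None, 0))[1]


def naive_apply(counts, key, increment):
    """Non-idempotent update: a replayed batch is counted twice."""
    counts[key] = counts.get(key, 0) + increment
    return counts[key]


if __name__ == "__main__":
    counter = TransactionalCounter()
    counter.apply_batch(txid=1, key="clicks", increment=5)
    counter.apply_batch(txid=1, key="clicks", increment=5)  # replay after a failure
    counter.apply_batch(txid=2, key="clicks", increment=3)
    print(counter.get("clicks"))

    counts = {}
    naive_apply(counts, "clicks", 5)
    naive_apply(counts, "clicks", 5)  # the same replay, now double counted
    print(counts["clicks"])
```

Note how much the sketch takes for granted: a replayed batch must have exactly the same contents, batches must be applied in order, and the id must be written atomically with the value. Any side effect outside this store is not deduplicated at all, which is one way to see why idempotency gets you close rather than all the way.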
Our second speaker, Gregory Bondar, head of the Wix Data Services team, described how they built a platform for running SQL queries on their real-time data stream. The platform enables analysts all over Wix to extract insights from users' actions and even respond to them in a matter of seconds. Gregory and Igal Shilman presented how they built their amazing platform on top of Apache Storm and Esper.
Our final speaker, all the way from Silicon Valley, Cloudera's Gwen Shapira, described different ways to load data from Kafka to Hadoop. Apparently there are many options, with advantages and disadvantages to each. In addition to choosing the right cornerstones for your data pipeline, Gwen explained why pairing your data with a schema is not just a recommended practice - it's crucial when working with NoSQL systems in the so-called schema-less data stream world.
For our next meetup, we would love to hear your thoughts! What would you like to hear? Do you have an interesting data architecture to present? Feel free to contact us directly, or comment here on our blog!
- alooma: Exactly once processing by Yair Weinberger
- Wix: SQL on Storm platform by Gregory Bondar
- Cloudera: Kafka and Hadoop by Gwen Shapira
The TLV Data Plumbers group is focused on data infrastructures for collecting, processing and loading data. In our meetups, we'll make an effort to bring you the latest and greatest architectures of data applications, with technologies like Apache Storm, Kafka, Spark and more.