Apache: Big Data North America 2017 will be held at the Intercontinental Miami in Miami, Florida. 

Register now for the event taking place May 16-18, 2017. 
Wednesday, May 17 • 12:15pm - 1:05pm
Building Streaming Data Pipelines with Stateful Operations - Chandni Singh, Simplifi.it


A few streaming platforms provide an exactly-once processing guarantee. They do so by checkpointing the state of the functional units (operators) that make up the streaming pipeline. Many real-world big data pipelines are composed of operators that maintain a large, ever-growing state, yet periodically checkpointing an operator's entire state is practical only when that state is small. To solve this problem, I created Managed State for the Apache Apex project, an incrementally checkpointed key-value data structure. Additionally, the community has developed a layer on top of Managed State (Spillable Datastructures) that allows us to incrementally checkpoint a variety of common data structures. This presentation will cover the challenges of implementing fault-tolerant incremental checkpointing in Managed State.
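To make the idea of incremental checkpointing concrete, here is a minimal Python sketch. It is not the Managed State API or its implementation: the class name `IncrementalKVState` and its methods are hypothetical, and the sketch assumes that state fits in memory and that checkpoints are plain dictionaries rather than files on durable storage. It only shows the core idea that each checkpoint holds just the keys changed since the previous one, and that recovery replays those deltas in order.

```python
from typing import Dict, List, Optional


class IncrementalKVState:
    """Toy key-value state that checkpoints only keys changed since the last checkpoint.

    Hypothetical illustration; not the Apache Apex Managed State API.
    """

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}   # full current state
        self._dirty: Dict[str, int] = {}  # changes since the last checkpoint

    def put(self, key: str, value: int) -> None:
        self._data[key] = value
        self._dirty[key] = value

    def get(self, key: str) -> Optional[int]:
        return self._data.get(key)

    def checkpoint(self) -> Dict[str, int]:
        """Return the delta since the previous checkpoint and reset dirty tracking."""
        delta, self._dirty = self._dirty, {}
        return delta

    @classmethod
    def restore(cls, deltas: List[Dict[str, int]]) -> "IncrementalKVState":
        """Rebuild state by replaying deltas, oldest first."""
        state = cls()
        for delta in deltas:
            state._data.update(delta)  # later deltas overwrite earlier values
        return state


state = IncrementalKVState()
state.put("a", 1)
state.put("b", 2)
delta1 = state.checkpoint()  # first checkpoint: both keys are new

state.put("a", 10)
delta2 = state.checkpoint()  # second checkpoint: only the changed key "a"

recovered = IncrementalKVState.restore([delta1, delta2])
```

The cost of each checkpoint here grows with the number of keys changed since the last one, not with the total size of the state. A real system would also need to handle deletions, durable storage, and compaction of old deltas, which this sketch leaves out.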


Chandni Singh

I’m a software engineer who likes to build fault-tolerant, scalable distributed frameworks and applications. I am a PMC member and committer of the Apache Apex project, have worked with a few other distributed platforms, and have co-founded a company which creates big data...

Wednesday May 17, 2017 12:15pm - 1:05pm EDT