Apache: Big Data North America 2017 will be held at the Intercontinental Miami in Miami, Florida. 

Register now for the event taking place May 16-18, 2017. 
Tuesday, May 16 • 12:05pm - 12:55pm
Continuous Applications with Apache Spark 2.0 - Peyman Mohajerian, Databricks


Most streaming engines focus on performing computations on a stream: for example, one can map a stream to run a function on each record, reduce it to aggregate events by time, etc. However, as we worked with users, we found that virtually no use case of streaming engines involved only performing computations on a stream. Instead, stream processing happens as part of a larger application, which we’ll call a continuous application.
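As a rough illustration of the two stream operations mentioned above (a per-record map and an aggregation of events by time), here is a minimal sketch in plain Python. It does not use the Spark API; the helper names `map_stream` and `count_by_window`, the sample events, and the 10-second tumbling window are all assumptions made for this example.

```python
from collections import defaultdict


def map_stream(records, fn):
    """Lazily apply fn to each record, as a streaming map would."""
    for record in records:
        yield fn(record)


def count_by_window(events, window_seconds):
    """Count events per tumbling window, keyed by window start time."""
    counts = defaultdict(int)
    for timestamp, _payload in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)


# Hypothetical events: (timestamp in seconds, payload).
events = [(1, "a"), (4, "b"), (12, "c"), (17, "d"), (25, "e")]

# Map: upper-case each payload, keeping the timestamp.
upper = list(map_stream(events, lambda e: (e[0], e[1].upper())))

# Reduce by time: count events in 10-second windows.
window_counts = count_by_window(upper, 10)
```

A real streaming engine would run these steps incrementally over unbounded input. The sketch covers only the computation itself, not the serving, storage, or batch integration that a continuous application also has to handle.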

Online machine learning and serving real-time data are examples that show streaming computations are part of larger applications that include serving, storage, or batch jobs. Unfortunately, in current systems, streaming computations run on their own, in an engine focused just on streaming. This leaves developers responsible for the complex tasks of interacting with external systems (e.g., managing transactions) and making their results consistent with the rest of the application (e.g., batch jobs). This is what we’d like to solve with continuous applications.


Peyman Mohajerian

Peyman is a Solution Architect at Databricks in the Southern California region. Prior to Databricks he had numerous consulting roles working for MapR and Teradata as a Big Data Engineer in the areas of data architecture, analytics and data science. Prior to Teradata at Fox Filmed Entertainment...

Tuesday May 16, 2017 12:05pm - 12:55pm EDT