In this talk, we will present practical case studies of large-scale stream processing applications at LinkedIn. The applications discussed will include:
- LinkedIn’s real-time communication platform that delivers relevant content at massive scale to our 450M members.
- The LinkedIn feed, which processes billions of events each day and keeps track of what members have viewed in their news feed.
We will present the hard scalability problems we had to solve in each of these applications and the techniques we used to address them. These problems include scaling event ingestion, partitioned processing, high-performance data access, and efficient remote I/O. We will explain how we leveraged and improved Apache Samza to address these problems, and how we scaled to process over a trillion events every day.
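
As a rough illustration of the partitioned processing mentioned above, the sketch below routes each event to a partition by hashing its member ID, so that all events for one member land on the same partition and per-member state can stay local to it. This is not Samza code and not LinkedIn's implementation; `NUM_PARTITIONS`, `partition_for`, `process`, and the sample events are hypothetical names chosen for the example.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count, chosen for illustration


def partition_for(member_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a stable partition using a deterministic hash (CRC32)."""
    return zlib.crc32(member_id.encode("utf-8")) % num_partitions


def process(events):
    """Route (member_id, item_id) events by member and count views per member.

    State is kept per partition, so each partition could in principle be
    handled by an independent worker without coordination.
    """
    state = defaultdict(lambda: defaultdict(int))  # partition -> member -> views
    for member_id, _item_id in events:
        state[partition_for(member_id)][member_id] += 1
    return state


# Hypothetical feed-view events: (member_id, item_id)
events = [("alice", "post-1"), ("bob", "post-2"), ("alice", "post-3")]
state = process(events)
```

Because the hash is deterministic, both of the sample member's events are counted on the same partition, which is the property that lets per-key state be kept without cross-partition lookups.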