Apache: Big Data North America 2017 will be held at the Intercontinental Miami in Miami, Florida. 

Register now for the event taking place May 16-18, 2017. 
Wednesday, May 17 • 12:15pm - 1:05pm
Pilot Hadoop Towards 2500 Nodes and Cluster Redundancy - Stuart Pook, Criteo

Hadoop has become a critical part of Criteo's operations. What started out as a proof of concept has turned into two in-house bare-metal clusters of over 2200 nodes. Hadoop holds the data required for billing and, perhaps even more importantly, the data used to create the machine learning models that participate in real-time bidding for online advertising; Hadoop recomputes these models every 6 hours. Two clusters do not necessarily mean a redundant system, so Criteo must plan for any of the disasters that can destroy a cluster. This talk describes how Criteo built its second cluster in a new datacenter and how to do it better next time. It also explains how a small team is able to run and expand these clusters. More importantly, the talk describes how a redundant data and compute solution at this scale must function, what Criteo has already done to create this solution, and what remains undone.

Stuart Pook

Senior DevOps Engineer, Criteo
Stuart loves storage (130 PB at Criteo) and is part of Criteo's Lake team that runs some small and two rather large Hadoop clusters. He also loves automation with Chef because configuring more than 2200 Hadoop nodes by hand is just too slow. Before discovering Hadoop he developed...

Wednesday May 17, 2017 12:15pm - 1:05pm EDT