Petabyte-Scale Text Processing with Spark

Aleksey Slyusarenko

Grammarly, Ukraine

Research Engineer with 4 years of machine learning and NLP experience and 10 years of computational algorithm development. Winner of programming and math competitions (Google Code Jam: 11th place overall; IMC: 1st prize).

Speaker's activity

Petabyte-Scale Text Processing with Spark

May 20th

14:30-15:15

Talk

Russian

At Grammarly, we have long used Amazon EMR with Hadoop and Pig for our big data processing. However, we were excited about the improvements that the maturing Apache Spark offers over Hadoop and Pig, so we set about getting Spark to work with our petabyte-scale text data set. This talk describes the challenges we faced along the way and the scalable, working Spark setup we arrived at as a result.
