Big data chronicles

Technical blog on big data technologies

Saturday, October 31, 2020

Watermark architecture proposal for Spark Structured Streaming framework

🕥 7 min. The multiple aggregations problem with Spark Structured Streaming framework Developing the translation layer (called runner) fro...

Wednesday, July 8, 2020

Export metrics from Apache Beam pipelines

🕥 12 min. This blog post is about part of this talk that I gave at ApacheCon 2018 about universal metrics in Apache Beam. M...

Friday, June 12, 2020

Nexmark: benchmark and CI tool for Apache Beam

🕥 10 min. This blog post is about the subject of this talk I gave at ApacheCon 2017. While the talk focuses on building Nexmark for...

Monday, April 6, 2020

Code callouts on blogger

🕥 3 min. Why a code callout component? I spent quite some time making my previous article readable. This article contains a long...

Tuesday, March 17, 2020

How to create a custom Spark Encoder in ... java

🕥 13 min. What is a Spark Encoder? An Encoder is a wrapper class that specifies how to serialize and deserialize data with the Spark ...

Friday, February 7, 2020

Understand Apache Beam runners: focus on the Spark runner

🕥 5 min. Previously on Apache Beam runners 😀 In the previous article, we had a brief overview of what an Apache Beam runner is...

About Me

Etienne Chauchot: Focused on Big Data technologies. As an open source fan, I contribute to Apache projects such as Apache Flink, Apache Beam and Apache Spark. I'm an Apache Beam and Flink committer and a Beam PMC member. I'm also an Apache Software Foundation member.
