Big data chronicles
Technical blog on big data technologies
Saturday, October 31, 2020
Watermark architecture proposal for Spark Structured Streaming framework
🕥 7 min. The multiple aggregations problem with Spark Structured Streaming framework Developing the translation layer (called runner) fro...
Wednesday, July 8, 2020
Export metrics from Apache Beam pipelines
🕥 12 min. This blog post covers part of this talk that I gave at ApacheCon 2018 about universal metrics in Apache Beam. M...
Friday, June 12, 2020
Nexmark: benchmark and CI tool for Apache Beam
🕥 10 min. This blog post covers the subject of this talk I gave at ApacheCon 2017. While the talk focuses on building Nexmark for...
Monday, April 6, 2020
Code callouts on blogger
🕥 3 min. Why a code callout component? I spent quite some time making my previous article readable. This article contains a long...
Tuesday, March 17, 2020
How to create a custom Spark Encoder in ... java
🕥 13 min. What is a Spark Encoder? An Encoder is a wrapper class that specifies how to serialize and deserialize data with the Spark ...
Friday, February 7, 2020
Understand Apache Beam runners: focus on the Spark runner
🕥 5 min. Previously on Apache Beam runners 😀 In the previous article, we had a brief overview of what an Apache Beam runner is...