Course Outline
Introduction
- Apache Beam vs MapReduce, Spark Streaming, Kafka Streams, Storm and Flink
Installing and Configuring Apache Beam
Overview of Apache Beam Features and Architecture
- Beam Model, SDKs, Beam Pipeline Runners
- Distributed processing back-ends
Understanding the Apache Beam Programming Model
- How a pipeline is executed
Running a sample pipeline
- Preparing a WordCount pipeline
- Executing the Pipeline locally
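
As a rough illustration of what a WordCount pipeline computes, the two core stages (split lines into words, then count per word) can be sketched in plain Python. This is not the Apache Beam API: the function name `word_count` and the word-matching pattern are assumptions made for this sketch, and a real Beam pipeline would express the same stages as transforms executed by a runner.

```python
import re
from collections import Counter


def word_count(lines):
    """Count words across an iterable of text lines (hypothetical helper)."""
    # Stage 1: split every line into lowercase words (a flat-map style step).
    words = (
        word.lower()
        for line in lines
        for word in re.findall(r"[A-Za-z']+", line)
    )
    # Stage 2: group identical words and sum their occurrences.
    return dict(Counter(words))


if __name__ == "__main__":
    sample = ["the quick brown fox", "The lazy dog"]
    for word, count in sorted(word_count(sample).items()):
        print(f"{word}: {count}")
```

Keeping the split and count steps separate mirrors how a pipeline is usually planned: one element-wise step followed by one aggregation.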
Designing a Pipeline
- Planning the structure, choosing the transforms, and determining the input and output methods
Creating the Pipeline
- Writing the driver program and defining the pipeline
- Using Apache Beam classes
- Data sets, transforms, I/O, data encoding, etc.
Executing the Pipeline
- Executing the pipeline locally, on remote machines, and on a public cloud
- Choosing a runner
- Runner-specific configurations
Testing and Debugging Apache Beam
- Using type hints to emulate static typing
- Managing Python Pipeline Dependencies
Processing Bounded and Unbounded Datasets
- Windowing and Triggers
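
The idea behind fixed (tumbling) windows can be sketched without any library: each timestamped event is assigned to the one fixed-size interval that contains its timestamp, and aggregation then happens per interval. The code below is a plain-Python sketch under that assumption, not Beam's windowing API; `fixed_window_counts` and the `(timestamp, value)` event shape are hypothetical names chosen for illustration, and triggers are not modelled.

```python
from collections import defaultdict


def fixed_window_counts(events, window_size):
    """Count events per fixed window.

    events: iterable of (timestamp_seconds, value) pairs (assumed shape).
    Returns a dict mapping (window_start, window_end) to the event count,
    where each window covers the half-open interval [start, end).
    """
    counts = defaultdict(int)
    for timestamp, _value in events:
        # Round the timestamp down to the start of its window.
        start = (timestamp // window_size) * window_size
        counts[(start, start + window_size)] += 1
    return dict(counts)


if __name__ == "__main__":
    events = [(0, "a"), (59, "b"), (60, "c"), (125, "d")]
    for window, count in sorted(fixed_window_counts(events, 60).items()):
        print(window, count)
```

Because the assignment depends only on each event's own timestamp, the same logic applies to a bounded list and to an unbounded stream processed one event at a time.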
Making Your Pipelines Reusable and Maintainable
Creating New Data Sources and Sinks
- Apache Beam Source and Sink API
Integrating Apache Beam with other Big Data Systems
- Apache Hadoop, Apache Spark, Apache Kafka
Troubleshooting
Summary and Conclusion
Requirements
- Experience with Python programming
- Experience with the Linux command line
Audience
- Developers
Testimonials (4)
Sufficient hands-on practice; the trainer is knowledgeable.
Chris Tan
Course - A Practical Introduction to Stream Processing
The explanations were very good. Some questions could have been avoided if those points had been covered at the start of each topic, but the trainer clearly had solid command of and experience with the subject.
Alan Jaime Rodríguez García - BANCO DE MEXICO
Course - Stream Processing with Kafka Streams
Very little; I found it very difficult, especially because I joined late and missed the first sessions.
Rolando García - OIT para México y Cuba
Course - Apache NiFi for Administrators
The instructor's presentation