Google Cloud Courses and Certifications

Serverless Data Processing with Dataflow

Google Cloud's Serverless Data Processing with Dataflow course is a training program designed for big data professionals who want to deepen their Dataflow skills and improve their data processing applications. The course starts with the basics, showing how Apache Beam and Dataflow work together to meet data processing needs without the risk of single-vendor lock-in. It then focuses on pipeline development, explaining how to turn business logic into data processing applications that can run on Dataflow. The final part is dedicated to operations, covering the critical aspects of running a data application on Dataflow: monitoring, troubleshooting, testing, and reliability. Along the way, participants become familiar with technologies such as the Beam Portability Framework, the Shuffle and Streaming Engine services, Flexible Resource Scheduling, and IAM. Finally, the course helps prepare for the Google Professional Data Engineer certification exam.
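
To give a sense of the course's central idea, one Beam pipeline that runs on many runners, here is a minimal sketch using the Beam Python SDK. The project, region, and bucket names are placeholders, not course material:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# The same Beam pipeline runs locally or on Dataflow; only the
# options change. Project, region, and bucket are placeholders.
options = PipelineOptions(
    runner="DataflowRunner",        # swap for "DirectRunner" to run locally
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
)

with beam.Pipeline(options=options) as p:
    (p
     | beam.Create(["hello", "dataflow"])
     | beam.Map(str.upper)
     | beam.Map(print))
```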

Course Objectives

Below is a summary of the main objectives of the Serverless Data Processing with Dataflow course:

  1. Understand the integration and use of Apache Beam with Dataflow.
  2. Develop scalable, serverless data pipelines.
  3. Apply operational practices for managing Dataflow applications.
  4. Use technologies such as Beam Portability Framework and Flexible Resource Scheduling.
  5. Prepare for the Google Professional Data Engineer Certification exam.
  6. Implement best practices for data pipeline design and optimization.
  7. Monitor and troubleshoot Dataflow jobs using Google Cloud’s tools.
  8. Leverage Dataflow’s autoscaling and fault-tolerance features.

Course Certification

This course helps you prepare to take the Google Cloud Certified Professional Data Engineer exam.

Course Outline

Module 1: Introduction

  • Course Introduction
  • Beam and Dataflow Refresher

Module 2: Separating Compute and Storage with Dataflow

  • Dataflow
  • Beam Portability
  • Runner v2
  • Container Environments
  • Cross-Language Transforms

Module 3: Dataflow, Cloud Operations

  • Dataflow Shuffle Service
  • Dataflow Streaming Engine
  • Flexible Resource Scheduling

Module 4: IAM, Quotas, and Permissions

  • IAM
  • Quota

Module 5: Security

  • Data Locality
  • Shared VPC
  • Private IPs
  • CMEK

Module 6: Beam Concepts Review

  • Beam Basics
  • Utility Transforms
  • DoFn Lifecycle
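
As a refresher of the Beam basics this module reviews, here is a minimal word-count-style pipeline in the Beam Python SDK (a sketch for orientation, not course material):

```python
import apache_beam as beam

# Beam basics: a pipeline is a graph of PTransforms applied to PCollections.
with beam.Pipeline() as p:
    (p
     | "Read" >> beam.Create(["to be or not to be"])
     | "Split" >> beam.FlatMap(str.split)      # one element per word
     | "Pair" >> beam.Map(lambda w: (w, 1))
     | "Count" >> beam.CombinePerKey(sum)      # utility transform: per-key combine
     | "Print" >> beam.Map(print))
```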

Module 7: Windows, Watermarks, Triggers

  • Windows
  • Watermarks
  • Triggers
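
To illustrate how these three concepts combine, here is a minimal sketch in the Beam Python SDK: fixed 60-second windows, a watermark-based trigger with late firings, and accumulating panes. All values are illustrative:

```python
import apache_beam as beam
from apache_beam import window
from apache_beam.transforms.trigger import (AfterWatermark,
                                            AfterProcessingTime,
                                            AccumulationMode)

with beam.Pipeline() as p:
    (p
     | beam.Create([("user", 1), ("user", 2)])
     | beam.WindowInto(
           window.FixedWindows(60),  # 60-second event-time windows
           # Fire at the watermark, then again for late data.
           trigger=AfterWatermark(late=AfterProcessingTime(30)),
           accumulation_mode=AccumulationMode.ACCUMULATING,
           allowed_lateness=120)
     | beam.CombinePerKey(sum)
     | beam.Map(print))
```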

Module 8: Sources and Sinks

  • Sources and Sinks
  • Text IO and File IO
  • BigQuery IO
  • PubSub IO
  • Kafka IO
  • Bigtable IO
  • Avro IO
  • Splittable DoFn
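
For a flavor of these IO connectors, here is a hedged sketch in the Beam Python SDK that streams from Pub/Sub into BigQuery; the topic, table, and schema are hypothetical:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)  # Pub/Sub is an unbounded source

with beam.Pipeline(options=options) as p:
    (p
     | beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
     | beam.Map(lambda msg: {"raw": msg.decode("utf-8")})
     | beam.io.WriteToBigQuery(
           "my-project:my_dataset.events",       # hypothetical table
           schema="raw:STRING",
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
```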

Module 9: Schemas

  • Beam Schemas
  • Code Examples
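
In the Python SDK, Beam schemas are typically expressed as typed NamedTuples registered with a RowCoder. A minimal sketch, with illustrative types and fields:

```python
import typing
import apache_beam as beam

class Purchase(typing.NamedTuple):
    user: str
    amount: float

# Registering a RowCoder makes Purchase a schema-aware type.
beam.coders.registry.register_coder(Purchase, beam.coders.RowCoder)

with beam.Pipeline() as p:
    (p
     | beam.Create([Purchase("alice", 9.99)]).with_output_types(Purchase)
     | beam.Map(lambda row: row.amount)   # schema fields are plain attributes
     | beam.Map(print))
```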

Module 10: State and Timers

  • State API
  • Timer API
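
A minimal sketch of both APIs in the Python SDK: a stateful DoFn that buffers values per key and sets an event-time timer to flush them. The names and the 60-second delay are illustrative:

```python
import apache_beam as beam
from apache_beam.coders import VarIntCoder
from apache_beam.transforms.userstate import BagStateSpec, TimerSpec, on_timer
from apache_beam.transforms.timeutil import TimeDomain

class BufferThenFlush(beam.DoFn):
    BUFFER = BagStateSpec("buffer", VarIntCoder())         # State API
    FLUSH = TimerSpec("flush", TimeDomain.WATERMARK)       # Timer API

    def process(self, element,
                ts=beam.DoFn.TimestampParam,
                buffer=beam.DoFn.StateParam(BUFFER),
                flush=beam.DoFn.TimerParam(FLUSH)):
        _key, value = element              # must be applied to a keyed PCollection
        buffer.add(value)
        flush.set(ts + 60)                 # fire one minute past the event time

    @on_timer(FLUSH)
    def on_flush(self, buffer=beam.DoFn.StateParam(BUFFER)):
        yield sum(buffer.read())
        buffer.clear()
```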

Module 11: Best Practices

  • Schemas
  • Handling Unprocessable Data
  • Error Handling
  • AutoValue Code Generator
  • JSON Data Handling
  • Utilize DoFn Lifecycle
  • Pipeline Optimizations
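
As one example of the error-handling patterns covered here, a dead-letter output keeps malformed records from failing the whole pipeline. A sketch in the Beam Python SDK, with illustrative names:

```python
import json
import apache_beam as beam

class ParseJson(beam.DoFn):
    def process(self, line):
        try:
            yield json.loads(line)
        except ValueError:
            # Route unparseable input to a dead-letter output instead of crashing.
            yield beam.pvalue.TaggedOutput("dead_letter", line)

with beam.Pipeline() as p:
    results = (p
               | beam.Create(['{"ok": 1}', "not json"])
               | beam.ParDo(ParseJson()).with_outputs("dead_letter", main="parsed"))
    results.parsed | "Good" >> beam.Map(print)
    results.dead_letter | "Bad" >> beam.Map(lambda s: print("failed:", s))
```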

Module 12: Dataflow SQL and DataFrames

  • Dataflow and Beam SQL
  • Windowing in SQL
  • Beam DataFrames
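
For a taste of Beam SQL from Python, here is a sketch using SqlTransform over a schema-aware PCollection. It relies on Beam's cross-language expansion service, so a local Java runtime is needed; the names are illustrative:

```python
import typing
import apache_beam as beam
from apache_beam.transforms.sql import SqlTransform

class WordCount(typing.NamedTuple):
    word: str
    hits: int

beam.coders.registry.register_coder(WordCount, beam.coders.RowCoder)

with beam.Pipeline() as p:
    (p
     | beam.Create([WordCount("beam", 3),
                    WordCount("dataflow", 5)]).with_output_types(WordCount)
     | SqlTransform("SELECT word, hits FROM PCOLLECTION WHERE hits > 3")
     | beam.Map(print))
```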

Module 13: Beam Notebooks

  • Beam Notebooks

Module 14: Monitoring

  • Job List
  • Job Info
  • Job Graph
  • Job Metrics
  • Metrics Explorer

Module 15: Logging and Error Reporting

  • Logging
  • Error Reporting

Module 16: Troubleshooting and Debug

  • Troubleshooting Workflow
  • Types of Troubles

Module 17: Performance

  • Pipeline Design
  • Data Shape
  • Source, Sinks, and External Systems
  • Shuffle and Streaming Engine

Module 18: Testing and CI/CD

  • Testing and CI/CD Overview
  • Unit Testing
  • Integration Testing
  • Artifact Building
  • Deployment
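
A minimal example of the unit-testing style taught here, using the Beam Python SDK's TestPipeline with assert_that; the transform under test is a stand-in:

```python
import apache_beam as beam
from apache_beam.testing.test_pipeline import TestPipeline
from apache_beam.testing.util import assert_that, equal_to

def test_double_transform():
    # assert_that verifies the PCollection's final contents when the
    # pipeline runs (on exiting the with-block).
    with TestPipeline() as p:
        result = (p
                  | beam.Create([1, 2, 3])
                  | beam.Map(lambda x: x * 2))
        assert_that(result, equal_to([2, 4, 6]))
```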

Module 19: Reliability

  • Introduction to Reliability
  • Monitoring
  • Geolocation
  • Disaster Recovery
  • High Availability

Module 20: Flex Templates

  • Classic Templates
  • Flex Templates
  • Using Flex Templates
  • Google-provided Templates

Course Mode

Instructor-Led Remote Live Classroom Training.

Trainers

Trainers are official GCP instructors, certified in other IT technologies as well, with years of hands-on experience in the industry and in training.

Lab Topology

For all delivery formats, trainees have 24-hour remote access to real lab environments on Google Cloud. Each participant can carry out the various configurations hands-on, getting immediate practical feedback on the theoretical concepts.

Course Details

Course Prerequisites

Participation in the Data Engineering on Google Cloud course is recommended.

Course Duration

Intensive formula: 3 days.

Course Frequency

3 days (09:00 to 17:00). Ask about other attendance formats.

Course Date

  • Serverless Data Processing with Dataflow Course (Intensive Formula) – On request – 09:00–17:00

Steps to Enroll

To enroll, ask to be contacted via the following link, call the office at the international number +355 45 301 313, or send a request to info@hadartraining.com.