Building Streaming Data Analytics Solutions on AWS

The Building Streaming Data Analytics Solutions on AWS (DASTRM) course is designed to guide participants through building streaming data analytics solutions using AWS services, including Amazon Kinesis and Amazon Managed Streaming for Apache Kafka (Amazon MSK). During the course, participants will explore how Amazon Kinesis, a real-time data streaming service, and Amazon MSK, a fully managed Apache Kafka service, integrate with other AWS services such as AWS Glue and AWS Lambda. The course covers key components of the data analytics pipeline, including streaming data ingestion, stream storage, and stream processing, and teaches how to apply security, performance, and cost management best practices to the operation of Kinesis and Amazon MSK. This course helps prepare for the AWS Certified Data Engineer – Associate certification exam .

Course Objectives

Below is a summary of the main objectives of the Building Streaming Data Analytics Solutions on AWS (DASTRM) Course :

  • Understand the features and benefits of a modern data architecture and how AWS Streaming Services fits into it.
  • Design and implement a streaming data analytics solution.
  • Identify and apply appropriate techniques, such as compression, sharding, and partitioning, to optimize data storage.
  • Select and implement appropriate options to ingest, transform, and store data in real-time and near-real-time.
  • Choose the appropriate streams, clusters, topics, scaling approach, and network topology for a particular business use case.
  • Integrate AWS streaming services with other AWS analytics and machine learning tools for comprehensive data solutions.
  • Implement monitoring and logging to ensure the reliability and performance of streaming data applications.
  • Address common challenges and troubleshooting techniques in streaming data architectures.

Course Certification

This course helps you prepare to take the:

AWS Certified Data Engineer – Associate Exam ;

Course Outline

Module 1: Introduction to Data Warehousing

  • Relational databases
  • Data warehousing concepts
  • The intersection of data warehousing and big data
  • Overview of data management in AW
  • Hands-on lab 1: Introduction to Amazon Redshift

Module 2: Introduction to Amazon Redshift

  • Conceptual overview
  • Real-world use cases
  • Hands-on lab 2: Launching an Amazon Redshift cluster

Module 3: Launching clusters

  • Building the cluster
  • Connecting to the cluster
  • Controlling access
  • Database security
  • Load data
  • Hands-on lab 3: Optimizing database schemas

Module 4: Designing the database schema

  • Schemas and data types
  • Columnar compression
  • Data distribution styles
  • Data sorting methods

Module 5: Identifying data sources

  • Data sources overview
  • Amazon S3
  • Amazon DynamoDB
  • Amazon EMR
  • Amazon Kinesis Data Firehose
  • AWS Lambda Database Loader for Amazon Redshift
  • Hands-on lab 4: Loading real-time data into an Amazon Redshift database

Module 6: Loading data

  • Preparing Data
  • Loading data using COPY
  • Maintaining tables
  • Concurrent write operations
  • Troubleshooting load issues
  • Hands-on lab 5: Loading data with the COPY command

Module 7: Writing queries and tuning for performance

  • Amazon Redshift SQL
  • User-Defined Functions (UDFs)
  • Factors that affect query performance
  • The EXPLAIN command and query plans
  • Workload Management (WLM)
  • Hands-on lab 6: Configuring workload management

Module 8: Amazon Redshift Spectrum

  • Amazon Redshift Spectrum
  • Configuring data for Amazon Redshift Spectrum
  • Amazon Redshift Spectrum Queries
  • Hands-on lab 7: Using Amazon Redshift Spectrum

Module 9: Maintaining clusters

  • Audit logging
  • Performance monitoring
  • Events and notifications
  • Lab 8: Auditing and monitoring clusters
  • Resizing clusters
  • Backing up and restoring clusters
  • Resource tagging and limits and constraints
  • Hands-on lab 9: Backing up, restoring and resizing clusters

Module 10: Analyzing and visualizing data

  • Power of visualizations
  • Building dashboards
  • Amazon QuickSight editions and features

Course Mode

Instructor-Led Remote Live Classroom Training;

Trainers

Trainers are Amazon AWS accredited instructors and certified in other IT technologies, with years of practical experience in the sector and in training.

Lab Topology

For all types of delivery, the participant can access the equipment and actual systems in our laboratories or directly in international data centers remotely, 24/7. Each participant has access to implement various configurations, Thus immediately applying the theory learned. Below are some scenarios drawn from laboratory activities.

Course Details

Course Prerequisites

Participation in the following courses is recommended:

  • Architecting on AWS
  • Building Data Lakes on AWS

Course Duration

Intensive duration 1 days;

Course Frequency

Course Duration: 1 days (9.00 to 17.00) - Ask for other types of attendance.

Course Date

  • Building Streaming Data Analytics Solutions on AWS  (Intensive Formula) – On Request – 9:00 – 17:00

Steps to Enroll

Registration takes place by asking to be contacted from the following link, or by contacting the office at the international number +355 45 301 313 or by sending a request to the email info@hadartraining.com