Module A: Overview of Data Analytics and the Data Pipeline
- Data analytics use cases
- Using the data pipeline for analytics
Module 1: Introduction to Amazon EMR
- Using Amazon EMR in analytics solutions
- Amazon EMR cluster architecture
Interactive Demo 1: Launching an Amazon EMR cluster
- Cost management strategies
Module 2: Data Analytics Pipeline Using Amazon EMR: Ingestion and Storage
- Storage optimization with Amazon EMR
- Data ingestion techniques
Module 3: High-Performance Batch Data Analytics Using Apache Spark on Amazon EMR
- Apache Spark on Amazon EMR use cases
- Why Apache Spark on Amazon EMR
- Spark concepts
Interactive Demo 2: Connect to an EMR cluster and run Scala commands using the Spark shell
- Transformation, processing, and analytics
- Using notebooks with Amazon EMR
Practice Lab 1: Low-latency data analytics using Apache Spark on Amazon EMR
Module 4: Processing and Analyzing Batch Data with Amazon EMR and Apache Hive
- Using Amazon EMR with Hive to process batch data
- Transformation, processing, and analytics
Practice Lab 2: Batch data processing using Amazon EMR with Hive
- Introduction to Apache HBase on Amazon EMR
Module 5: Serverless Data Processing
- Serverless data processing, transformation, and analytics
- Using AWS Glue with Amazon EMR workloads
Practice Lab 3: Orchestrate data processing in Spark using AWS Step Functions
Module 6: Security and Monitoring of Amazon EMR Clusters
- Securing EMR clusters
Interactive Demo 3: Client-side encryption with EMRFS
- Monitoring and troubleshooting Amazon EMR clusters
Demo: Reviewing Apache Spark cluster history
Module 7: Designing Batch Data Analytics Solutions
- Batch data analytics use cases