Agenda

(Amazon) AWS Big Data (FR/EN)

Big Data on AWS

In this 3-day course, you'll learn about cloud-based Big Data solutions such as Amazon EMR, Amazon Redshift, Amazon Kinesis, and the rest of the AWS Big Data platform. You'll discover how to use Amazon EMR to process data using the broad ecosystem of Hadoop tools like Hive and Hue. The course also teaches you how to create Big Data environments, work with Amazon DynamoDB, Amazon Redshift, Amazon QuickSight, Amazon Athena, and Amazon Kinesis, and leverage best practices to design Big Data environments for security and cost-effectiveness.

Objectives 

In this course, you will learn how to:

Fit AWS solutions into a big data ecosystem

Leverage Apache Hadoop in the context of Amazon EMR

Identify the components of an Amazon EMR cluster

Leverage common programming frameworks available for Amazon EMR, including Hive, Pig, and Streaming

Leverage Hue to improve the ease-of-use of Amazon EMR

Use in-memory analytics with Spark on Amazon EMR

Choose appropriate AWS data storage options

Identify the benefits of using Amazon Kinesis for near real-time Big Data processing

Leverage Amazon Redshift to efficiently store and analyze data

Leverage AWS Glue for ETL workloads

Comprehend and manage costs and security for a Big Data solution

Secure a Big Data solution

Identify options for ingesting, transferring, and compressing data

Leverage Amazon Athena for ad hoc query analytics

Visualize data and query results using Amazon QuickSight

Orchestrate big data workflows using AWS Data Pipeline

Prerequisites 

Basic familiarity with big data technologies, including Apache Hadoop, HDFS, and SQL/NoSQL querying

Students should complete the free Big Data Technology Fundamentals web-based training or have equivalent experience

Working knowledge of core AWS services and public cloud implementation

Students should complete the AWS Technical Essentials course or have equivalent experience

Basic understanding of data warehousing, relational database systems, and database design

Intended audience 

Individuals responsible for designing and implementing big data solutions, in particular Solutions Architects

Data Scientists and Data Analysts interested in learning about the services and architecture patterns behind big data solutions on AWS

Skills

Implement core AWS Big Data services according to basic architecture best practices

Design and maintain Big Data environments

Leverage tools to automate data analysis

Delivery Method 

AWS official training ebook

Hands-On labs

Digital attendance sign-in twice a day

Class Evaluation

Certificate of attendance

Program

Day 1:

Module 1: Overview of Big Data

Define Big Data

Identify some sources of big data

List examples of big data use cases

Describe the big data ecosystem

Module 2: Big Data Ingestion and Transfer

Describe options for ingesting data into AWS

Describe AWS solutions for transferring data

Module 3: Real-Time Data Ingestion

Explain the need for stream processing and analytics

List features of stream processing and analytics

Explain the architecture of an Amazon Kinesis Streams application

List the benefits of Amazon Kinesis Video Streams, Amazon Kinesis Firehose, and Amazon Kinesis Analytics
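
To give a sense of how these stream-ingestion topics translate into code, here is a minimal sketch (not course material) that writes records to a hypothetical Kinesis data stream called "clickstream-demo" using Python and boto3; the stream name, region, and record fields are illustrative assumptions.

```python
# Minimal sketch (illustrative only): writing records to a hypothetical
# Kinesis data stream named "clickstream-demo" with boto3.
import json
import boto3

kinesis = boto3.client("kinesis", region_name="eu-west-1")  # region is an assumption

def put_click_event(user_id, page):
    """Send one JSON record; the partition key spreads records across shards."""
    return kinesis.put_record(
        StreamName="clickstream-demo",  # hypothetical stream name
        Data=json.dumps({"user_id": user_id, "page": page}).encode("utf-8"),
        PartitionKey=user_id,
    )

if __name__ == "__main__":
    response = put_click_event("user-42", "/products/widget")
    print(response["ShardId"], response["SequenceNumber"])
```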

Module 4: Big Data Storage Solutions

Identify the data storage options available in AWS

Explain storage solution concepts like Data Lake and NoSQL

Describe AWS solutions for data lakes and NoSQL databases

Describe the factors to consider when choosing a data store
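
As a concrete illustration of the NoSQL option discussed in this module, the following is a minimal sketch (not course material) that writes and reads one item in a hypothetical DynamoDB table named "orders"; the table, key schema, and region are assumptions.

```python
# Minimal sketch (illustrative only): storing and reading an item in a
# hypothetical DynamoDB table named "orders" (partition key "order_id").
import boto3

dynamodb = boto3.resource("dynamodb", region_name="eu-west-1")  # region is an assumption
table = dynamodb.Table("orders")                                # hypothetical table

# Write a single item.
table.put_item(Item={"order_id": "1001", "customer": "ACME", "total": 250})

# Read it back by its key.
item = table.get_item(Key={"order_id": "1001"}).get("Item")
print(item)
```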

Module 5: Big Data Processing and Analytics

Introduce big data processing and analytics

List use cases for big data processing with Amazon EMR and Amazon Redshift

Contrast Hadoop and data warehouse solutions for simple querying

Day 2:

Module 6: Apache Hadoop and Amazon EMR

Define the purpose and business value of Apache Hadoop

Contrast Apache Hadoop with relational databases

List the components of Apache Hadoop and the Apache Hadoop ecosystem

Contrast on-premises Apache Hadoop with Amazon EMR

List the advantages of using Amazon EMR for big data

Detail the improvements made to Hadoop with YARN

Explain the architecture of a typical Amazon EMR environment

Module 7: Using Amazon EMR

List the steps to launch an Amazon EMR cluster

Describe when to use long-running versus transient clusters

Detail the differences between the Quick and Advanced consoles in Amazon EMR cluster creation

Explain the Amazon Machine Image options for your cluster

Identify which instance types are suitable for your workload

Explain how to resize a cluster

Define the purpose of bootstrap actions

Identify methods of sending work to an Amazon EMR cluster
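
To make the cluster-launch topics concrete, below is a minimal sketch (not course material) that launches a small transient EMR cluster and submits one Spark step with Python and boto3; the cluster name, release label, instance types, S3 path, and region are illustrative assumptions, and the default EMR roles must already exist in the account.

```python
# Minimal sketch (illustrative only): launching a small, transient EMR cluster
# and submitting one Spark step via boto3.
import boto3

emr = boto3.client("emr", region_name="eu-west-1")  # region is an assumption

response = emr.run_job_flow(
    Name="demo-cluster",                       # hypothetical cluster name
    ReleaseLabel="emr-5.30.0",                 # example release label
    Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
    Instances={
        "InstanceGroups": [
            {"Name": "Master", "InstanceRole": "MASTER",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE",
             "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": False,  # transient cluster: terminate after steps
    },
    Steps=[{
        "Name": "example-spark-step",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://my-example-bucket/jobs/wordcount.py"],  # hypothetical script
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",         # default EMR instance profile
    ServiceRole="EMR_DefaultRole",             # default EMR service role
)
print("Cluster started:", response["JobFlowId"])
```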

Module 8: Hadoop Programming Frameworks

Detail how programming frameworks work

Describe common Hadoop frameworks and their use cases

Discuss the most popular Hadoop applications

Module 9: Web Interfaces on Amazon EMR

Describe web interfaces available on Amazon EMR

Identify what Hue is and how it makes using Hadoop on Amazon EMR easier

Describe the Hadoop applications that Hue supports

Detail the advantages of using Hue versus traditional command-line Hive queries and Pig scripts

Module 10: Apache Spark on Amazon EMR

Describe the motivation for using Spark

Identify use cases for Spark

Describe the Spark programming model

Detail the modules included with Spark

Explain how Spark is deployed on Amazon EMR

Name the advantages of running Spark on Amazon EMR
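
For orientation, the following is a minimal sketch (not course material) of a PySpark script of the kind that could be submitted to an EMR cluster, for example through spark-submit or an EMR step; the S3 paths and column names are hypothetical.

```python
# Minimal sketch (illustrative only): a PySpark job that aggregates page views
# from raw JSON events in S3 and writes the result back as Parquet.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("clickstream-aggregation").getOrCreate()

# Read raw JSON events from S3 (hypothetical path and schema).
events = spark.read.json("s3://my-example-bucket/raw/clickstream/")

# Count page views per page and write the curated output back to S3.
page_views = events.groupBy("page").agg(F.count("*").alias("views"))
page_views.write.mode("overwrite").parquet("s3://my-example-bucket/curated/page_views/")

spark.stop()
```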

Day 3:

Module 11: Using AWS Glue to automate ETL workloads

Describe the importance of serverless technology in a big data platform

Describe AWS Glue for serverless ETL

Analyze use cases for using AWS Glue
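
As a small illustration of serverless ETL, here is a minimal sketch (not course material) of a Glue job script; it assumes a hypothetical Glue Data Catalog database "sales_db" and table "raw_orders", and this style of script runs inside an AWS Glue job rather than on a local machine.

```python
# Minimal sketch (illustrative only): a Glue ETL script that reads a cataloged
# table and writes it to S3 in a columnar format.
from pyspark.context import SparkContext
from awsglue.context import GlueContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read the cataloged source data as a DynamicFrame (hypothetical database/table).
orders = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="raw_orders")

# Write the data back to S3 as Parquet, ready for analytics.
glue_context.write_dynamic_frame.from_options(
    frame=orders,
    connection_type="s3",
    connection_options={"path": "s3://my-example-bucket/curated/orders/"},
    format="parquet",
)
```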

Module 12: Amazon Redshift and Big Data

Contrast data warehouses with traditional databases

Describe common data warehouse design approaches

Illustrate the differences between common data schemas used in data warehouses

Identify common use cases for Amazon Redshift

Describe the architecture of Amazon Redshift
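
For context only, querying a Redshift cluster programmatically might look like the following minimal sketch (not course material), which uses the Redshift Data API; the cluster identifier, database, user, table, and region are all illustrative assumptions.

```python
# Minimal sketch (illustrative only): submitting one query to a hypothetical
# Redshift cluster through the Redshift Data API.
import boto3

redshift_data = boto3.client("redshift-data", region_name="eu-west-1")  # region is an assumption

response = redshift_data.execute_statement(
    ClusterIdentifier="analytics-cluster",  # hypothetical cluster
    Database="dev",                         # hypothetical database
    DbUser="analyst",                       # hypothetical database user
    Sql="SELECT page, COUNT(*) AS views FROM page_views GROUP BY page ORDER BY views DESC LIMIT 10;",
)
print("Statement submitted:", response["Id"])
```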

Module 13: Securing your Amazon EMR deployments

Explain the AWS shared responsibility model

Describe how Amazon EMR integrates with Amazon Virtual Private Cloud

Detail how a basic implementation of AWS Identity and Access Management works

Explain how Amazon EMR leverages Amazon EC2 Security Groups and IAM

List options for securing data at rest and data in transit

Security overview: Amazon Kinesis, Amazon DynamoDB, and Amazon Redshift

Module 14: Managing Big Data Costs

List the cost considerations for Amazon EMR

Detail the various pricing models and cost considerations for Amazon EC2 instances, Amazon Kinesis, Amazon DynamoDB, and Amazon Redshift

Present use cases and strategies for leveraging Spot Instances with big data

Describe methods of managing Amazon EC2 costs for Amazon EMR

Explain how to leverage more than one pricing model with Amazon EMR

Explain the factors to consider when planning for storage and data transfer costs

Provide the best practices for a cost-efficient infrastructure

Module 15: Visualizing and orchestrating Big Data

Explain the purpose of visualizing big data

Describe AWS solutions for visualizing big data

Describe how AWS Data Pipeline can orchestrate big data workflows
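
Related to the ad hoc query objective mentioned earlier (and to QuickSight, which can use Athena as a data source), here is a minimal sketch (not course material) of running an Athena query with Python and boto3; the database, table, and S3 results bucket are hypothetical.

```python
# Minimal sketch (illustrative only): running an ad hoc Athena query against a
# hypothetical database and writing results to a hypothetical S3 location.
import boto3

athena = boto3.client("athena", region_name="eu-west-1")  # region is an assumption

response = athena.start_query_execution(
    QueryString="SELECT customer, SUM(total) AS revenue FROM orders GROUP BY customer;",
    QueryExecutionContext={"Database": "sales_db"},  # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://my-example-bucket/athena-results/"},
)
print("Query started:", response["QueryExecutionId"])
```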

Module 16: Big Data Design Patterns

Review how to leverage multiple AWS solutions to perform analysis and processing jobs

Recommended certification

AWS Certified Big Data - Specialty

 

Registration

  • Price: 2150.00 €
  • Registration deadline: 31 December 2021
  • Location: Remote
  • Minimum enrollment: 2 participants
  • Terms: General conditions of sale

 
