By Know Asap
Online
Can be taken anytime
Professional Training Course
Yes (Details)
English
Course Overview
Big Data Hadoop training will make you an expert in HDFS, MapReduce, HBase, Hive, Pig, YARN, Oozie, Flume and Sqoop through real-time use cases from the Retail, Social Media, Aviation, Tourism and Finance domains. You will get Eckovation's Hadoop certification at the end of the course.
According to Forbes, the Big Data & Hadoop market is expected to reach $99.31B by 2022, growing at a CAGR of 42.1% from 2015. McKinsey predicts that by 2018 there will be a shortage of 1.5M data experts. According to Indeed salary data, the average salary of Big Data Hadoop developers is $135k. Once you complete the course and all the assignments, you will receive soft and hard copies of the completion certificate.
Who should take this course
Demand for skilled data professionals is increasing across all industries, which makes this course suitable for participants at all levels of experience. We especially recommend this training for the following professionals:
- Graduates looking to build a career in Hadoop
- Analytics professionals who want to work with Big Data Hadoop functions
- IT professionals looking for a career switch into the field of Big Data Hadoop
- Software developers interested in pursuing a career in Big Data Hadoop
- Experienced professionals who would like to harness Big Data Hadoop in their fields
Accreditation
Internationally Accepted Certificate
Course content
BIG DATA HADOOP Duration of Course:
- 40+ hours
BIG DATA HADOOP Topics Covered:
Session 1 - Introduction to Big Data:
- Importance of Data
- ESG Report on Analytics
- Big Data & Its Hype
- What is Big Data?
- Structured vs Unstructured data
- Definition of Big Data
- Big Data Users & Scenarios
- Challenges of Big Data
- Why Distributed Processing?
Session 2 - Hadoop:
- History Of Hadoop
- Hadoop Ecosystem
- Hadoop Animal Planet
- When to use & when not to use Hadoop
- What is Hadoop?
- Key Distinctions of Hadoop
- Hadoop Components/Architecture
- Understanding Storage Components
- Understanding Processing Components
- Anatomy of a File Write
- Anatomy of a File Read
Session 3 - Understanding Hadoop Cluster:
- Handout discussion
- Walkthrough of CDH setup
- Hadoop Cluster Modes
- Hadoop Configuration files
- Understanding Hadoop Cluster configuration
- Data Ingestion to HDFS
Session 4 - MapReduce:
- Meet MapReduce
- Word Count Algorithm - Traditional approach
- Traditional approach on a Distributed system
- Traditional approach - Drawbacks
- MapReduce approach
- Input & Output Forms of a MR program
- Map, Shuffle & Sort, Reduce Phases
- Workflow & Transformation of Data
- Word Count Code walkthrough
Session 5 - MapReduce:
- Input Split & HDFS Block
- Relation between Split & Block
- MR Flow with Single Reduce Task
- MR flow with multiple Reducers
- Data locality Optimization
- Speculative Execution
Session 6 - Advanced MapReduce:
- Combiner
- Partitioner
- Counters
- Hadoop Data Types
- Custom Data Types
- Input Format & Hierarchy
- Output Format & Hierarchy
- Side Data distribution - Distributed cache
Session 7 - Advanced MapReduce:
- Joins
- Map side Join using Distributed cache
- Reduce side Join
- MRUnit - A unit testing framework
Session 8 - Pig:
- What is Pig?
- Why Pig?
- Pig vs SQL
- Execution Types or Modes
- Running Pig
- Pig Data types
- Pig Latin relational Operators
- Multi Query execution
- Pig Latin Diagnostic Operators
Session 9 - Pig:
- Pig Latin Macro & UDF statements
- Pig Latin Commands
- Pig Latin Expressions
- Schemas
- Pig Functions
- Pig Latin File Loaders
- Pig UDF & executing a Pig UDF
Session 10 - Hive:
- Introduction to Hive
- Pig vs Hive
- Hive Limitations & Possibilities
- Hive Architecture
- Metastore
- Hive Data Organization
- HiveQL
- SQL vs HiveQL
- Hive Data types
- Data Storage
- Managed & External Tables
Session 11 - Hive:
- Partitions & Buckets
- Storage Formats
- Built-in SerDes
- Importing Data
- Alter & Drop Commands
- Data Querying
Session 12 - Hive:
- Using MR Scripts
- Hive Joins
- Sub Queries
- Views
- UDFs
Session 13 - HBase:
- Introduction to NoSQL & HBase
- Row & Column oriented storage
- Characteristics of a huge DB
- What is HBase?
- HBase Data-Model
- HBase vs RDBMS
- HBase architecture
- HBase in operation
- Loading Data into HBase
- HBase shell commands
- HBase operations through Java
- HBase operations through MR
Session 14 - ZooKeeper & Oozie:
- Introduction to ZooKeeper
- Distributed Coordination
- ZooKeeper Data Model
- ZooKeeper Service
- ZooKeeper in HBase
- Introduction to Oozie
- Oozie workflow
Session 15 - Sqoop:
- Introduction to Sqoop
- Sqoop design
- Sqoop Commands
- Sqoop Import & Export Commands
- Sqoop Incremental load Commands
Session 16 - Hadoop 2.0 & YARN:
- Hadoop 1 Limitations
- HDFS Federation
- NameNode High Availability
- Introduction to YARN
- YARN Applications
- YARN Architecture
- Anatomy of a YARN application
About Course Provider
Knowasap provides online self-learning courses in SAP and other high-end technologies, designed to maximize learning outcomes and career opportunities for professionals as well as students. Experienced consultants, project team members, support professionals, end users, executives and students will find courses that meet their needs and are accessible anytime, anywhere.
How to enroll?
You can book the course instantly by paying on GulfTalent.