Hadoop Administration
About The Course
The Hadoop Cluster Administration training course is designed to provide the knowledge and skills needed to become a successful Hadoop Architect. It starts with the fundamental concepts of Apache Hadoop and the Hadoop cluster, then covers how to deploy, configure, manage, monitor and secure a Hadoop cluster. The course also covers HBase administration. Learners work through many challenging, practical and focused hands-on exercises. By the end of this Hadoop Cluster Administration training, you will be prepared to understand and solve real-world problems that you may come across while working on a Hadoop cluster.
Course Objectives
After completing the 'Hadoop Administration' course at Harshitha Technology, you should be able to:
1. Get a clear understanding of Apache Hadoop, HDFS, the Hadoop cluster and Hadoop administration
2. Gain insight into Hadoop 2.0, NameNode High Availability, HDFS Federation, YARN and MapReduce v2
3. Plan and deploy a Hadoop cluster
4. Load data and run applications
5. Configure and tune a cluster for performance
6. Manage, maintain, monitor and troubleshoot a Hadoop cluster
7. Secure a deployment and understand backup and recovery
8. Understand Oozie, HCatalog/Hive and HBase administration
Who should go for this course?
This course is best suited to systems administrators, Windows administrators, Linux administrators, infrastructure engineers, DB administrators, Big Data architects, mainframe professionals and IT managers who are interested in learning Hadoop administration.
Why Learn Hadoop Administration?
The adoption of Hadoop has created demand for professionals skilled in Hadoop administration, so becoming a skilled Hadoop Admin can lead to better career, salary and job opportunities.
What are the prerequisites for this course?
This course assumes no prior knowledge of Apache Hadoop, Hadoop cluster administration or Java.
Basic knowledge of Linux is required, as Hadoop runs on Linux.
Harshitha Technology offers a complimentary course on "Linux Fundamentals" to all Hadoop Administration course participants.
How will I execute the practicals?
For your practical work, we will help you set up a virtual machine on your system, which gives you local access. You can also create an account on AWS EC2 and use servers eligible for 'Free tier usage' to create your Hadoop cluster on AWS EC2. A step-by-step procedure is documented and shared in the LMS. Our 24/7 expert support team will also be available to assist you.
Which case studies will be part of the course?
Towards the end of the course, you will work on a live project that uses different Hadoop ecosystem components together in a Hadoop implementation to solve Big Data problems.
1. Set up a Hadoop cluster with at least 2 nodes
Node 1 - NameNode, DataNode, TaskTracker
Node 2 - JobTracker, DataNode, TaskTracker
2. Create a simple text file and copy it to HDFS
Find out which node it was stored on
Find out which DataNode the output files are written to
3. Create a large text file and copy it to HDFS with a block size of 256 MB. Keep all other files at the default block size and find out how block size affects performance
4. Set a space quota of 200 MB on projects and copy a 70 MB file with replication=2
Why does HDFS not let you copy the file?
How will you solve this problem without increasing the space quota?
5. Configure rack awareness and copy the file to HDFS
Find its rack distribution and the command used to show it
Find out how to change the replication factor of an existing file
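Exercises 3 and 4 both come down to block arithmetic. The sketch below is a simplified model, not HDFS's actual code: it counts how many blocks a file occupies and mimics the rule from the HDFS quota documentation that a block allocation fails when the space quota cannot hold a full block for every replica. It assumes the target directory is otherwise empty, and it compares 64 MB (the Hadoop 1.x default block size) with 128 MB (the Hadoop 2.x default); the 1 GB file and the 32 MB block size are hypothetical examples.

```python
import math

MB = 1024 * 1024


def blocks_needed(file_bytes: int, block_bytes: int) -> int:
    """Number of HDFS blocks a file occupies (the last block may be partial)."""
    return math.ceil(file_bytes / block_bytes)


def fits_space_quota(file_bytes: int, block_bytes: int, replication: int,
                     quota_bytes: int, used_bytes: int = 0) -> bool:
    """Simplified model of the HDFS space-quota check.

    Before each new block is allocated, room for a *full* block on every
    replica must be available under the quota. Once the block is written,
    only its actual size (times replication) stays charged.
    """
    consumed = used_bytes
    remaining = file_bytes
    while remaining > 0:
        if consumed + block_bytes * replication > quota_bytes:
            return False  # this block allocation would exceed the quota
        written = min(block_bytes, remaining)
        consumed += written * replication
        remaining -= written
    return True


# Exercise 3: block counts for a hypothetical 1 GB file.
for block_mb in (64, 128, 256):
    print(f"1 GB file, {block_mb} MB blocks -> {blocks_needed(1024 * MB, block_mb * MB)} blocks")

# Exercise 4: a 70 MB file under a 200 MB space quota.
for block_mb, repl in [(64, 2), (128, 2), (32, 2), (64, 1)]:
    ok = fits_space_quota(70 * MB, block_mb * MB, repl, 200 * MB)
    print(f"block={block_mb} MB, replication={repl}: {'fits' if ok else 'quota exceeded'}")
```

Under this model, a 70 MB file with replication 2 is rejected by a 200 MB quota even though it only needs 140 MB once written, because a full block per replica is reserved at allocation time; a lower replication factor or a smaller block size for that file lets it fit without raising the quota. By default each block also becomes roughly one input split, so the block count is a rough guide to how many map tasks a job over the file will launch.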
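For exercise 5, rack awareness is usually configured by pointing the topology.script.file.name property (net.topology.script.file.name in Hadoop 2.x) at an executable script that receives host names or IP addresses as arguments and prints one rack path for each. The sketch below is one such script in Python; the host names, addresses and rack names in RACKS are hypothetical placeholders for your own cluster, and unknown hosts fall back to /default-rack.

```python
#!/usr/bin/env python3
"""Minimal rack topology script sketch for HDFS rack awareness."""
import sys

# Hypothetical host-to-rack mapping for a small lab cluster.
RACKS = {
    "10.0.1.11": "/rack1",
    "10.0.1.12": "/rack1",
    "10.0.2.21": "/rack2",
    "node1": "/rack1",
    "node2": "/rack2",
}
DEFAULT_RACK = "/default-rack"


def resolve(hosts):
    """Return the rack path for each host, in the same order as given."""
    return [RACKS.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Hadoop passes the hosts to resolve as command-line arguments.
    for rack in resolve(sys.argv[1:]):
        print(rack)
```

Once the script is in place and the NameNode has been restarted, `hadoop fsck <path> -files -blocks -racks` reports which racks hold the replicas of each block.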
The final certification project is based on the following real-world use cases:
Problem Statement 1:
1. Set up a single-node or 2-node Hadoop cluster in which all the daemons (NameNode, DataNode, JobTracker and TaskTracker) run, with a block size of 128 MB
2. Record the namespace ID of the cluster, and create a directory with a namespace quota of 10 and a space quota of 100 MB
3. Use the distcp command to copy the projects directory within the same cluster, and list the DataNodes participating in the cluster
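For item 2 of Problem Statement 1, the two quotas are set with `hadoop dfsadmin -setQuota` (name quota) and `hadoop dfsadmin -setSpaceQuota` (space quota). One detail that often surprises people is that the name quota counts the directory itself as well as every file and subdirectory beneath it. The Python sketch below models only that counting rule; the nested-dict tree is a hypothetical stand-in, not an HDFS API.

```python
def count_names(node) -> int:
    """Count names charged to a name quota: the directory plus everything below it.

    A directory is modelled as a dict of child name -> node; a file is None.
    """
    if node is None:
        return 1
    return 1 + sum(count_names(child) for child in node.values())


def can_create(directory: dict, name_quota: int, new_names: int = 1) -> bool:
    """True if creating `new_names` more files/directories stays within the quota."""
    return count_names(directory) + new_names <= name_quota


quota_dir = {}  # a freshly created directory with a name quota of 10
created = 0
while can_create(quota_dir, 10):
    quota_dir[f"file{created}"] = None
    created += 1
print(created)
```

Because the directory counts against its own quota, a name quota of 10 leaves room for nine new files or subdirectories.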
Problem Statement 2:
1. Save the NameNode namespace and merge the edits file without using the Secondary NameNode and without stopping the NameNode daemon.
2. Set up an include file so that only the listed nodes can connect to the NameNode
3. Set the cluster rebalancer threshold to 40%.
4. Set the map and reduce slots to 4 and 2, respectively, for each node
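For item 3 of Problem Statement 2, the threshold (passed, for example, as `hadoop balancer -threshold 40`) is a percentage: the balancer treats a DataNode as balanced when its disk utilization is within that many percentage points of the cluster-wide utilization. The sketch below applies a simplified version of that rule to hypothetical node figures, to show how much more skew a 40% threshold tolerates than the default of 10%.

```python
def classify_nodes(nodes, threshold_pct):
    """Classify DataNodes with a simplified form of the balancer's threshold rule.

    nodes maps a DataNode name to (used, capacity) in any consistent unit.
    Returns the cluster utilization (%) and a status string per node.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_cap = sum(cap for _, cap in nodes.values())
    cluster_util = 100.0 * total_used / total_cap
    status = {}
    for name, (used, cap) in nodes.items():
        util = 100.0 * used / cap
        if util > cluster_util + threshold_pct:
            status[name] = "over-utilized"
        elif util < cluster_util - threshold_pct:
            status[name] = "under-utilized"
        else:
            status[name] = "balanced"
    return cluster_util, status


# Hypothetical three-node cluster: (GB used, GB capacity).
cluster = {"dn1": (800, 1000), "dn2": (200, 1000), "dn3": (500, 1000)}
for threshold in (10, 40):
    util, status = classify_nodes(cluster, threshold)
    print(f"threshold={threshold}% cluster={util:.0f}% {status}")
```

With these figures the cluster is 50% full; at a 10% threshold dn1 (80%) and dn2 (20%) would be rebalanced, while at 40% every node already counts as balanced and the balancer has nothing to move.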
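Items 2 and 4 of Problem Statement 2 are configuration changes. In Hadoop 1.x the include file is named by the dfs.hosts property in hdfs-site.xml, and the per-node slot counts are set with mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum in mapred-site.xml on each TaskTracker. The Python sketch below only renders those properties in Hadoop's configuration/property XML layout; the include-file path is a hypothetical example, and in practice these entries would be merged into your existing site files.

```python
import xml.etree.ElementTree as ET


def hadoop_config(props: dict) -> str:
    """Render name/value pairs in Hadoop's *-site.xml <configuration> layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# hdfs-site.xml: only hosts listed in this (hypothetical) file may connect to the NameNode.
hdfs_site = hadoop_config({"dfs.hosts": "/etc/hadoop/conf/dfs.include"})

# mapred-site.xml: 4 map slots and 2 reduce slots per TaskTracker.
mapred_site = hadoop_config({
    "mapred.tasktracker.map.tasks.maximum": 4,
    "mapred.tasktracker.reduce.tasks.maximum": 2,
})

print(hdfs_site)
print(mapred_site)
```

After editing the include file, `hadoop dfsadmin -refreshNodes` makes the NameNode re-read it; the TaskTrackers need a restart to pick up new slot counts.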
Many more subtopics are covered; for details, please visit the website.
Please call us for demo classes. We have regular and weekend batches.
Contact Number: USA: +1-6109903968, INDIA: +91 9989630313,
Email: mhtspvtltd@gmail.com
Web: http://www.harshithatechnology.com