
Simplilearn: Big Data Specialist


Big Data Specialist


This online, self-paced Big Data Hadoop course lets you master the concepts of the Hadoop framework and prepares you for Cloudera's CCA175 Big Data certification. It is appropriate for any learner who has the time and willingness to put in a minimum of 5 hours of study time a week; no existing skills or prerequisites in these areas are required.




»Product Demo & Course Description

Product Demo

For exclusive 3-day access to a product demo, send an email with your name and email address in the body.

Big Data Specialist Course Description

The world is becoming increasingly digital, which means big data is here to stay. In fact, the importance of big data and data analytics will continue to grow in the coming years. A career in big data and analytics might be just the type of role you have been looking for to meet your career expectations.

Professionals who are working in this field can expect an impressive salary, with the median salary for data scientists being $116,000. Even those who are at the entry level will find high salaries, with average earnings of $92,000. As more and more companies realize the need for specialists in big data and analytics, the number of these jobs will continue to grow. Close to 80% of data scientists say there is currently a shortage of professionals working in the field.

Big Data Hadoop training will enable you to master the concepts of the Hadoop framework and its deployment in a cluster environment.

By the end of this Big Data Hadoop training you will be able to:

  • Understand the different components of the Hadoop ecosystem, such as Hadoop 2.7, YARN, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark
  • Understand Hadoop Distributed File System (HDFS) and YARN architecture, and learn how to work with them for storage and resource management
  • Understand MapReduce and its characteristics and assimilate advanced MapReduce concepts
  • Ingest data using Sqoop and Flume
  • Create database and tables in Hive and Impala, understand HBase, and use Hive and Impala for partitioning
  • Understand different types of file formats, Avro schemas, using Avro with Hive and Sqoop, and schema evolution
  • Understand Flume, Flume architecture, sources, flume sinks, channels, and flume configurations
  • Understand and work with HBase, its architecture and data storage, and learn the difference between HBase and RDBMS
  • Gain a working knowledge of Pig and its components
  • Do functional programming in Spark, and implement and build Spark applications
  • Understand resilient distributed datasets (RDDs) in detail
  • Gain an in-depth understanding of parallel processing in Spark and Spark RDD optimization techniques
  • Understand the common use cases of Spark and various interactive algorithms
  • Learn Spark SQL, creating, transforming, and querying data frames
  • Prepare for Cloudera CCA175 Big Data certification

If you're looking to master the functions of Big Data and Hadoop, understanding Core Java is fundamental to your training.

At the conclusion of this course, you will understand the methods related to Big Data and Java, and you will have a basic understanding of Java 8 and its appropriate use cases.


»Curriculum & Course Details

The online Big Data Specialist Certificate program consists of the following modules:

  • Big Data and Hadoop Spark Developer
  • Core Java Training Course

Big Data and Hadoop Spark Developer

The Big Data Hadoop training course lets you master the concepts of the Hadoop framework and prepares you for Cloudera's CCA175 Big Data certification. With our online Hadoop training, you'll learn how the components of the Hadoop ecosystem, such as Hadoop 2.7, YARN, MapReduce, HDFS, Pig, Impala, HBase, Flume, and Apache Spark, fit in with the Big Data processing lifecycle. You will implement real-life projects in banking, telecommunication, social media, insurance, and e-commerce on CloudLab.

Course Outline

Introduction: Course Information

Lesson One: Introduction to Big Data and Hadoop Ecosystem

Lesson Two: HDFS and YARN

Lesson Three: MapReduce and Sqoop

Lesson Four: Basics of Hive and Impala

Lesson Five: Working with Hive and Impala

Lesson Six: Types of Data Formats

Lesson Seven: Advanced Hive Concepts and Data File Partitioning

Lesson Eight: Apache Flume and HBase

Lesson Nine: Pig

Lesson Ten: Basics of Apache Spark

Lesson Eleven: RDDs in Spark

Lesson Twelve: Implementation of Spark Applications

Lesson Thirteen: Spark Parallel Processing

Lesson Fourteen: Spark RDD Optimization Techniques

Lesson Fifteen: Spark Algorithm

Lesson Sixteen: Spark SQL

Core Java Training Course

If you’re looking to master the functions of Big Data and Hadoop, understanding Core Java is fundamental to your training. Java, by Oracle, is used on a variety of platforms, from gaming consoles and laptops to mobile technology. Java is considered a central platform because it has its own runtime environment. At the conclusion of this course, you will understand the methods related to Big Data and Java, and you will have a basic understanding of Java 8 and its appropriate use cases.

Lesson One: Introduction to Java

Lesson Two: Working with Java Variables

Lesson Three: Java Operators and Decision Constructs

Lesson Four: Using Loop Constructs in Java

Lesson Five: Creating and Using Arrays

Lesson Six: Methods and Encapsulation

Lesson Seven: Inheritance

Lesson Eight: Exception Handling

Lesson Nine: Work with Selected Classes from the Java API

Lesson Ten: Additional Topics

Lesson Eleven: JDBC

Lesson Twelve: Miscellaneous and Unit Testing

Lesson Thirteen: Introduction to Java 8

Lesson Fourteen: Lambda Expression




  • Be at least 18 years of age.
  • Possess word processing and internet skills.
  • Be fluent in the English language (including reading and writing).
  • Have a familiarity with computers, how online programs work, and comfort in using them.
  • Possess reliable internet access and a valid email account. (Personal email accounts are preferred as they are less likely to trigger firewalls and spam filters.)
  • Meet the computer technical requirements listed on this site.
  • Pay tuition in full.
  • Sign and return the program Terms of Use form. Program access will not be granted until the form has been returned via email.
  • Test drive the program! Send an email with your name and email address in the body. A password will be sent via email granting time-limited access to the product demo.

Admission is discretionary. The office of Community Education and SMC Extension requires that students be at least 18 years of age and meet minimum suitability standards. Students are not matriculated Santa Monica College students, and student privileges do not apply to Community Education and Extension students.

Santa Monica College Community Education and SMC Extension reserves the exclusive right, at its sole and absolute discretion, to withhold registration or require withdrawal from the program of any student or applicant.  

The materials used in this course, including video lessons, workbooks, quizzes and tests, are copyrighted. They are intended for individual use by registered students.  Unauthorized or illegal use, or attempted unauthorized or illegal use, of these materials may result in a suspension and/or cancellation of the user's account.

Certificate requirements

Step 1: Complete Required Modules

  • Big Data and Hadoop Spark Developer
  • Core Java Training Course (Optional)

Step 2: Attain a minimum score of 80% on the final assessment test.

Quizzes and Testing

The quizzes located throughout each module test students' knowledge to help prepare them for the cumulative test at the end of each course. Quizzes are not graded assignments, but they will help students determine their level of understanding prior to taking the final assessment for each module. During this time, students may access other modules and continue their learning.

One graded test per course or sub-course is required, and students must pass with at least 80% accuracy.  Tests are timed and are in a multiple choice format.  Students will receive their grade upon completion of each test.  Students may attempt the module quizzes as many times as they like, but after three final test attempts they must request additional attempts individually.


Technical requirements

The following minimum technical requirements are supported*:

  • Windows XP, Windows 7, Windows 8, 1GB RAM, Flash player 11
  • Mac OS X, 1GB RAM, Flash player 11
  • iOS 8,  Android 4 (for apps)

Students must also have the following:

  • Adobe Acrobat Reader
  • Reliable internet connection with a bandwidth of at least 1 Mbps (typical DSL is 1.5 to 6 Mbps) and a web browser (Firefox, Chrome or Safari)
  • E-mail account (to be able to register and to receive e-mail from the system regarding registration, course status, etc.)

*Course materials, such as streaming video lessons, are generally accessible via smart phones, but some functionality may be limited or unavailable.  Other devices such as Blackberries may not be supported. 

Technical questions should be directed to our affiliate, Simplilearn, by email. Hours of support are from 8:00 am to 5:00 pm Monday through Friday, Pacific Time. Technical support issues submitted prior to 2:00 pm Monday through Friday will typically be handled the same day, and in most cases no later than the next business day.

»Schedule & Registration


Registration is ongoing; therefore students can begin the program when it is convenient for them! 

Note: our enrollment/registration system is a separate external system from that used for your online training and therefore access to online materials is not an automated process upon your registration and subsequent payment. Please allow a time delay of up to 5 business days to receive details via email regarding online access. This delay will not impact your "allotted" completion time for the program.

To enroll and receive program access:

  • Register for this program
  • Pay in full
  • Return the program Terms of Use form to our Community Education Program Manager with your signature (typically emailed to the student within five business days upon registering)
  • Upon completing the first three requirements, expect to receive within five (5) business days an email with login instructions.
    • (Please note personal email accounts are preferred as they are less likely to trigger firewalls and spam filters)
  • Once you receive your access email you are able to log in and begin your coursework.
  • Please note your 90 days of access begin on the date the access email is sent to you.
  • There is a calendar within the learning system to assist you with keeping track of time remaining in your access window to complete the program.

Registration Methods

Register by Phone at 310-434-3402 during our business hours (Monday through Friday, 9:00am to 4:00pm Pacific Time).


»Frequently Asked Questions

Who should take this Big Data Hadoop Training Course?

Big Data career opportunities are on the rise, and Hadoop is quickly becoming a must-know technology in Big Data architecture. Big Data training is best suited for IT, data management, and analytics professionals looking to gain expertise in Big Data, including:

  • Software Developers and Architects
  • Analytics Professionals
  • Senior IT professionals
  • Testing and Mainframe Professionals
  • Data Management Professionals
  • Business Intelligence Professionals
  • Project Managers
  • Aspiring Data Scientists
  • Graduates looking to build a career in Big Data Analytics

How long do students have access to this online program?

Students will have access to this program for 90 days from the date the access email is sent from SMC Community Education.

How long does the program take to complete?

It will take about 45 to 50 hours to complete the Big Data Hadoop course certification successfully. It is recommended that students commit between 5 and 10 hours per week to the coursework. Students should plan to begin their studies immediately upon being granted access.

What projects will I complete as part of the course?

The Hadoop Training course includes five real-life, industry-based projects on CloudLab. Successful evaluation of one of the first two projects (Project 1 or Project 2, described below) is part of the certification eligibility criteria.

What are the prerequisites for this Hadoop Training Course?

There are no prerequisites for this course. Knowledge of Core Java and SQL is beneficial but not mandatory. If you wish to brush up on your Core Java skills, Simplilearn offers a complimentary self-paced course, "Java essentials for Hadoop", when you enroll in this course. For Spark, this course uses Python and Scala, and an e-book is provided to support your learning.

Who provides the certification?

Upon successful completion of the Big Data Hadoop certification training, you will be awarded the course completion certificate from Simplilearn.

How do I pass the Big Data Hadoop exam?

Online self-learning: Complete 85% of the course, and complete one project and one simulation test with a minimum score of 80%.

Project 1: Domain- Banking
A Portuguese banking institution ran a marketing campaign to convince potential customers to invest in a bank term deposit. Their marketing campaigns were conducted through phone calls, and sometimes the same customer was contacted more than once. Your job is to analyze the data collected from the marketing campaign.

Project 2: Domain- Telecommunication
A mobile phone service provider has launched a new Open Network campaign. The company has invited users to raise complaints about the towers in their locality if they face issues with their mobile network. The company has collected a dataset of the users who raised a complaint. The fourth and fifth fields of the dataset contain the latitude and longitude of each user, which is important information for the company. You must extract this latitude and longitude information from the available dataset and create three clusters of users with a k-means algorithm.
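As a rough illustration of the clustering step described in Project 2, the sketch below runs a plain k-means loop on latitude and longitude values. It is not course material: the record layout (comma-separated with a header row), the sample records, and all function names are assumptions made for illustration, and it uses plain Python rather than the Spark tooling the course teaches. It also treats latitude and longitude as flat coordinates, which is only an approximation of real distances.

```python
def parse_coordinates(lines):
    """Return (latitude, longitude) pairs from the 4th and 5th comma-separated fields."""
    points = []
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) < 5:
            continue  # skip records that are too short
        try:
            points.append((float(fields[3]), float(fields[4])))
        except ValueError:
            continue  # skip header rows and non-numeric coordinates
    return points


def squared_distance(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def initial_centroids(points, k):
    """Farthest-point heuristic (an illustrative choice): start with the first
    point, then repeatedly add the point farthest from those chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
        for p in points
    ]


def kmeans(points, k=3, max_iterations=50):
    """Alternate assignment and centroid updates until the centroids stop moving.
    Assumes there are at least k points."""
    centroids = initial_centroids(points, k)
    for _ in range(max_iterations):
        labels = assign(points, centroids)
        updated = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                updated.append((sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members)))
            else:
                updated.append(centroids[i])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical sample records; the real project dataset is not shown here.
sample_lines = [
    "id,date,issue,latitude,longitude",
    "1,2016-01-04,call drop,12.97,77.59",
    "2,2016-01-04,no signal,12.95,77.61",
    "3,2016-01-05,slow data,19.07,72.87",
    "4,2016-01-05,call drop,19.10,72.85",
    "5,2016-01-06,no signal,28.61,77.20",
    "6,2016-01-06,slow data,28.65,77.23",
]

points = parse_coordinates(sample_lines)
centroids, labels = kmeans(points, k=3)
for point, label in zip(points, labels):
    print(point, "-> cluster", label)
```

On a real cluster, the same two steps (parse the coordinate fields, then cluster them) would more likely be expressed with Spark's own machine learning library, since a pure Python loop like this does not scale to large datasets.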

For additional practice, we have three more projects to help you start your Hadoop and Spark journey.

Project 3: Domain- Social Media
As part of a recruiting exercise, a major social media company asked candidates to analyze a dataset from Stack Exchange. You will be using the dataset to arrive at certain key insights.

Project 4: Domain- Website providing movie-related information
IMDB is an online database of movie-related information. IMDB users rate movies on a scale of 1 to 5 (1 being the worst and 5 being the best) and provide reviews. The dataset also has additional information, such as the release year of the movie. You are tasked with analyzing the data collected.

Project 5: Domain- Insurance
A US-based insurance provider has decided to launch a new medical insurance program targeting various customers. To help a customer understand the market better, you must perform a series of data analyses using Hadoop.

What is CloudLab?

CloudLab is a cloud-based Hadoop and Spark environment lab that Simplilearn offers with the Hadoop Training course to ensure a hassle-free execution of your hands-on projects. There is no need to install and maintain Hadoop or Spark on a virtual machine. Instead, you'll be able to access a preconfigured environment on CloudLab via your browser. This environment is very similar to what companies are using today to optimize Hadoop installation scalability and availability.

You'll have access to CloudLab from the Simplilearn LMS (Learning Management System) for the duration of the course.

What types of jobs require Big Data Hadoop trained professionals?

The jobs that require Big Data Hadoop trained professionals include:

  • IT professionals
  • Data scientists
  • Data engineers
  • Data analysts
  • Project managers
  • Program managers

How will Big Data Training help your career?

The field of big data and analytics is a dynamic one, adapting rapidly as technology evolves over time. Those professionals who take the initiative and excel in big data and analytics are well-positioned to keep pace with changes in the technology space and fill growing job opportunities. Some trends in big data include: 

  • Global Hadoop Market to Reach $84.6 Billion by 2021 – Allied Market Research
  • Shortage of 1.4 to 1.9 million Hadoop Data Analysts in the US alone by 2018 – McKinsey
  • Hadoop Administrators in the US receive salaries of up to $123,000

For general FAQ-type questions on our Simplilearn online courses, visit General Program Questions.




Ronald van Loon

Top 10 Big Data & Data Science Influencer, Director - Adversitement

Named by Onalytica as one of the three most influential people in Big Data, Ronald is also an author for a number of leading Big Data and Data Science websites, including Datafloq, Data Science Central, and The Guardian. He also regularly speaks at renowned events.

Please note that the monthly mentoring webinars and Q&A forum features are monitored by industry subject matter experts.

Java Training Instructors: All our trainers are certified and highly qualified, with more than 10 years of experience in implementing Java.


»Tuition & Funding Sources



Payment is due in full upon registration.

There are no books required for this program. Study guides for each module will be available for self-download.

Group Discount

Do you have a group, SMC or non-SMC entity, interested in training?

Contact the Program Manager at 310-434-3402 for details.

Funding Sources

Our Professional Certificate programs are not-for-credit and therefore, they are NOT eligible for federal education loans. Please note that we do not offer a payment plan for this program.


»Policies & Procedures

Refunds for Online Courses

There are no refunds or cancellations for online courses.

A one-time 10-day extension may be granted to students who demonstrate in writing (via email) the reason an extension is required to complete the program. Requests will be considered by the Program Manager and a response will be delivered via email. It is at the discretion of the Program Manager to determine if the situation warrants an extension.

If an extension is granted the student must complete the program within 10 days of receiving approval by email. No further extensions will be granted. If a student cannot finish the program within that extension window, and desires to finish the program, they will be responsible for purchasing the program again in its entirety at full tuition.

Tax Deductions

Course fees and expenses are sometimes tax deductible. Please consult an accountant concerning this matter. In accordance with IRS guidelines, not-for-credit programs at SMC Extension do not generate 1098-T forms.


If you have any questions email us at


  • Class Dates:  Open 
  • Duration: 90 Days 
  • Delivery:  Online, Self-paced Learning
  • Tuition:  $699


To Register:
Contact Information

To register or to receive more information about this program, please contact the Program Manager
at 310-434-3402 or email at



Class:  Big Data Hadoop

"The trainer was knowledgeable and patient in explaining things. Many things were significantly easier to grasp with a live interactive instructor. I also like that he went out of his way to send additional information and solutions after the class via email."

Richard Kershner

Software Developer