9923170071 / 8108094992 info@dimensionless.in

Big Data Analytics using SPARK

With the industry's focus shifting towards analyzing Big Data, this module prepares students to do exactly that. The emphasis of the module is on mastering Spark, which has emerged as one of the most important big data processing frameworks. The module begins with the fundamentals and goes on to build important concepts such as Spark ML and Spark Streaming (for analyzing streaming data).

Participants will gain the ability to plan and design highly scalable systems that can ingest, store and analyse large volumes of data in batch mode or in real time.

LEARN

  • Statistical concepts [Basics to Advanced levels]
  • Exploratory Data Analysis using Python, Excel and Tableau
  • Data Importing, Exporting, Manipulation, Cleansing, Analysis, Visualization using Python
  • Model Planning, Data Modelling and Model Evaluation using real-world case studies
  • Various Machine Learning algorithms and their implementation through hands-on exercises in class
  • Communicating findings through effective Data Visualizations

FEATURES

  • Instructor-led online LIVE sessions for the entire course duration
  • Small batch size – Personalized attention
  • Highly Interactive sessions (Two-way participation – Chat and Speech)
  • Highly experienced and qualified trainers [Analytics experts, 10+ years of industry experience (IITians)]
  • Access to Session Recordings & Case Studies through the Learning Management Portal for 2 years
  • Course Completion Certification
  • Highly approachable faculty – 24/7 support available
  • Re-attend LIVE sessions if you miss a lecture for any reason

COURSE DURATION

48 hours (2 weeks)

SESSION TIMINGS


3:30 pm – 7:30 pm [Sat, Sun]

CURRICULUM

INTRODUCTION TO SPARK

●  Introduction to Apache Hadoop

●  Overview of Hadoop Ecosystem

●  Spark – Introduction

●  Spark – Ecosystem Components

SPARK BASICS

●  Spark – Features and Use Cases

●  Spark – SparkContext

●  Spark – Stage

●  Spark – Executor

WORKING WITH RDDS IN SPARK

●  Spark – RDD

●  Spark – Ways to Create RDD

●  Spark – RDD Persistence & Caching

●  Spark – RDD Features

●  Spark – Paired RDD

●  Spark – RDD limitations

●  Spark – Transformations and Actions

●  Spark – RDD Lineage

SPARK SQL AND DATAFRAME

●  Spark SQL – Introduction

●  Spark SQL – Features

●  Spark SQL – DataFrame

●  Spark SQL – Dataset

●  Spark SQL – Optimization

●  Hive Fundamentals

SPARK CONFIGURATION, MONITORING AND TUNING

●  Spark – In-Memory Computation

●  Spark – Directed Acyclic Graph

●  Spark – Cluster Managers

●  Spark – Performance Tuning

●  RDD vs DataFrame vs Dataset

KAFKA

●  Kafka – Introduction

●  Kafka Architecture

●  Kafka Workflow

●  Kafka – Cluster configuration

●  Kafka – Monitoring Tools

SPARK STREAMING

●  Spark Streaming – Introduction

●  Spark Streaming – DStream

●  Spark Streaming – Transformations

●  Spark Streaming – Checkpointing

●  Spark – Batch vs Real Time

APPLYING MACHINE LEARNING ALGORITHMS

●  Spark – MLlib

TEACHING METHODOLOGY

  • Personalized attention
  • LIVE instructor-led training throughout the training duration
  • Entirely Hands-On – Case Study based
  • Practical Inputs from real-time scenarios
  • Lifetime Access to Session Recordings

INSTRUCTOR PROFILE

Surajit has more than 6 years of experience in building Data Pipelines on AWS. He is an expert in Big Data, Cloud and Machine Learning. He was part of a team that created a Digital Ecosystem (a cloud-based solution to manage and build data products) for a leading global insurance provider.

Pranali is a professional Data Science Trainer with more than 15 years of experience teaching training programs in Databases, Programming and Machine Learning.
Her core competencies include Databases, Data Science and Big Data. She holds a Master's degree in Computer Engineering from the University of Pune.

FAQS

Why Should I Learn Big Data Analytics from Dimensionless?

  • Dimensionless Tech provides the best online Big Data Analytics training, with in-depth course coverage, case-study-based learning and entirely hands-on sessions with personalised attention to every participant. We guarantee learning.

What Are The Various Modes Of Training That You Offer?

  • We provide only instructor-led LIVE online training sessions. We do not provide classroom training.

How is your online training better than classroom training?

  • In physical classrooms, students generally feel hesitant to ask questions. Unlike other online courses, we allow you to speak in the session and ask your doubts. The level of interactivity is similar to classroom training, and you get it in the comfort of your home. In a classroom, if you miss a class or don't understand some concepts, you can't go through the class again; in our online courses you can, because we share the recording of every class with the students after each session. There is also no hassle of long-distance commuting disrupting your schedule.

Can I ask my doubts during the session?

  • All participants are encouraged to speak up and ask their doubts. We answer every doubt with the same sincerity.

Is there a hardware requirement for this course?

  • Any laptop with at least 2 GB of RAM running Windows 7 or above is sufficient for this course. For large datasets, you will be given access to the online lab.

What if I miss a session, due to some unavoidable situation?

  • We understand that while balancing your personal and professional commitments you might miss a session. Hence, all our sessions are recorded and the recordings are shared with you through our Learning Management Portal.

How long will I have access to the Learning Management Portal?

  • You will have lifetime access to the portal, and you can view the Videos, Notes, Books and Assignments as many times as you like.

What Kind Of Projects Will I Be Working On As Part Of The Training?

  • During the training, you will solve multiple case studies from different domains. Once the LIVE training is done, you will start applying what you have learned to real-world datasets. You can work on data from various domains such as Retail, Manufacturing, Supply Chain, Operations, Telecom, Oil and Gas, and many more. You will work on multiple projects so that you gain enough experience and confidence to enter the field of Data Science.

Do You Provide Placement Assistance?

  • Yes, we share current industry job requirements with you on a daily basis through our industry connections. These requirements generally come through referral channels, which considerably improves your chances of getting through. Our HR team also helps you with Resume Building and Interview Preparation.

Do I get a Course Completion Certificate?

  • Yes, we issue a course completion certificate to everyone who successfully completes the training.
