Big Data Analytics using SPARK
As industry shifts its focus to analysing Big Data, this module prepares students to do exactly that. The emphasis is on mastering Spark, which has emerged as one of the most important big data processing frameworks. The module begins with the fundamentals and builds up to key components such as Spark ML and Spark Streaming (for analysing streaming data).
Participants will gain the ability to initiate and design highly scalable systems that can accept, store and analyse large volumes of data in batch mode or in real time.
LEARN
- Statistical concepts [Basics to Advanced levels]
- Exploratory Data Analysis using Python, Excel and Tableau
- Data Importing, Exporting, Manipulation, Cleansing, Analysis, Visualization using Python
- Model Planning, Data Modelling and Model Evaluation using real time case studies
- Various Machine Learning algorithms and their implementation through hands-on exercises in class
- Communicating findings through effective Data Visualizations
FEATURES
- Instructor-led online LIVE sessions for the entire course duration
- Small batch size – Personalized attention
- Highly interactive sessions (two-way participation – Chat and Speech)
- Highly experienced and qualified trainers [Analytics experts, 10+ years industry experience (IITians)]
- Access to Session Recordings & Case Studies through the Learning Management Portal for 2 years
- Course Completion Certification
- Highly approachable faculty – 24*7 support available
- Reattend LIVE sessions if you miss a lecture for any reason
COURSE DURATION
48 hours (2 weeks)
SESSION TIMINGS
3:30 pm – 7:30 pm [Sat, Sun]
CURRICULUM
INTRODUCTION TO SPARK
● Introduction to Apache Hadoop
● Overview of Hadoop Ecosystem
● Spark – Introduction
● Spark – Ecosystem Components
SPARK BASICS
● Spark – Features and Use Cases
● Spark – SparkContext
● Spark – Stage
● Spark – Executor
WORKING WITH RDDS IN SPARK
● Spark – RDD
● Spark – Ways to Create RDD
● Spark – RDD Persistence & Caching
● Spark – RDD Features
● Spark – Paired RDD
● Spark – RDD limitations
● Spark – Transformations and Actions
● Spark – RDD Lineage
SPARK SQL AND DATAFRAME
● Spark SQL – Introduction
● Spark SQL – Features
● Spark SQL – DataFrame
● Spark SQL – Dataset
● Spark SQL – Optimization
● HIVE Fundamentals
SPARK CONFIGURATION, MONITORING AND TUNING
● Spark – In-Memory Computation
● Spark – Directed Acyclic Graph
● Spark – Cluster Managers
● Spark – Performance Tuning
● RDD vs DataFrame vs Dataset
KAFKA
● Kafka – Introduction
● Kafka Architecture
● Kafka Workflow
● Kafka – Cluster configuration
● Kafka monitoring tools
SPARK STREAMING
● Spark Streaming – Introduction
● Spark Streaming – DStream
● Spark Streaming – Transformations
● Spark Streaming – Checkpointing
● Spark – Batch vs Real Time
APPLYING MACHINE LEARNING ALGORITHMS
● Spark – MLlib
TEACHING METHODOLOGY
- Personalized attention
- LIVE instructor-led training throughout the training duration
- Entirely Hands-On – Case Study based
- Practical Inputs from real-time scenarios
- Lifetime Access to Session Recordings
INSTRUCTOR PROFILE
Surajit has more than 6 years of experience in building data pipelines on AWS. He is an expert in Big Data, Cloud and Machine Learning. He was part of a team that created a Digital Ecosystem (a cloud-based solution to build and manage data products) for a leading global insurance provider.
FAQS
Why Should I Learn Big Data Analytics from Dimensionless?
Dimensionless Tech provides the best online Big Data Analytics training, with in-depth course coverage, case-study-based learning, and entirely hands-on sessions with personalised attention to every participant. We guarantee learning.
What Are The Various Modes Of Training That You Offer?
We provide only instructor-led LIVE online training sessions. We do not provide classroom training.
How is your online training better than classroom training?
In physical classrooms, students often feel hesitant to ask questions. Unlike other online courses, we allow you to speak in the session and ask questions. The level of interactivity is similar to classroom training, and you get it in the comfort of your home. In a physical classroom, if you miss a class or do not understand a concept, you cannot go through the class again; in our online course, you can, because we share the recording of every class with students after the session. There is also no long-distance commute and no disruption to your schedule.
Can I ask questions during the session?
All participants are encouraged to speak up and ask questions. We answer every question with the same sincerity.
Is there a hardware requirement for this course?
Any laptop with 2 GB RAM running Windows 7 or above is perfectly fine for this course. For large datasets, access to the online lab will be provided.
What if I miss a session, due to some unavoidable situation?
We understand that while balancing your personal and professional commitments you might miss a session. Hence, all our sessions are recorded and the recordings are shared with you through our Learning Management Portal.
How long will I have access to the Learning Management Portal?
You will have lifetime access to the portal, and you can view the Videos, Notes, Books and Assignments as many times as you want.
What Kind Of Projects Will I Be Working On As Part Of The Training?
During the training you will solve multiple case studies from different domains. Once the LIVE training is done, you will start applying what you have learned to real-time datasets. You can work on data from various domains such as Retail, Manufacturing, Supply Chain, Operations, Telecom, Oil and Gas, and many more. You will work on multiple projects so that you gain enough experience and confidence to enter the field of Data Science.
Do You Provide Placement Assistance?
Yes, we share real-time industry requirements with you on a daily basis through our industry connections. These requirements generally come through referral channels, so the probability of getting through increases manifold. Our HR team also helps you with resume building and interview preparation.
Do I get a Course Completion Certificate?
Yes, we issue a course completion certificate to all individuals who successfully complete the training.
POPULAR COURSES
Machine Learning Specialization
Duration: 40 Hours
AI and Data Science Specialization
Duration: 25 Weeks
Deep Learning
Duration: 8 Weeks