Analyzing Big Data in R using Apache Spark

Online Live Classes

Starts on: Sep 30, 2022

Fee: $0

Master Apache Spark, a popular cluster computing framework for large-scale data analysis. SparkR provides a distributed data frame API that enables structured data processing with a syntax familiar to R users.

  • Learn why R is a popular statistical programming language, with many extensions that support data processing and machine learning tasks.
  • Learn how SparkR, an R package that provides a lightweight frontend, lets you use Apache Spark from R.

  • Module 1 - Introduction to SparkR
    • Learn what SparkR is
    • Understand why you would use SparkR
    • List the features of SparkR
    • Understand the interfaces into SparkR
  • Module 2 - Data manipulation in SparkR
    • Understand how to use DataFrames
    • Learn to select data
    • Learn to filter data
    • Learn to aggregate data
    • Learn to operate on columns
    • Understand how to write SQL queries
  • Lab 1 - Getting started with SparkR
  • Lab 2 - Data manipulation in SparkR
  • Module 3 - Machine learning in SparkR
    • Understand machine learning
    • Learn how to use a generalized linear model (GLM)
  • Lab 3 - Linear models in SparkR

Alan Barnes

Alan Barnes is a Senior IBM Information Management Course Developer / Consultant. He has worked in several companies as a Senior Technical Consultant, Database Team Manager, Application Programmer, Systems Programmer, Business Analyst, DB2 Team Lead and more. His career in IT spans more than 35 years.