A solid understanding of, and hands-on experience with, the core tools of any field promotes excellence and innovation. In the big data realm, Apache Spark, a general-purpose engine for large-scale data processing, is one such tool. This learning path covers the fundamentals of Spark's design and how it is applied in everyday work.
Have you ever left a report running overnight, only to come back to your computer in the morning and find it still running? When the pressure is on and a deadline looms, that is a sign something is not working. As data sets keep growing, you need to be fluent in the right tools to meet your commitments. This learning path is your opportunity to learn about Spark from industry leaders, with hands-on exercises and projects that build your confidence with this tool set.
Ignite your interest in Apache Spark with an introduction to the core concepts that make this general-purpose engine an essential tool set for working with big data. Get hands-on experience with Spark in our lab exercises, hosted in the cloud.
Building on your foundational knowledge of Spark, take this opportunity to move your skills to the next level. With a focus on operations on Spark's Resilient Distributed Datasets (RDDs), this course introduces concepts that are critical to your success in this field.
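To make "RDD operations" a little more concrete, here is a minimal single-machine sketch in plain Python of the two kinds of operations an RDD supports: transformations such as `map`, `flatMap`, `filter`, and `reduceByKey`, which only record how to derive a new dataset, and actions such as `collect` and `count`, which actually run the recorded chain. The `LocalRDD` class and its `from_list` constructor are hypothetical teaching aids, not Spark code; only the method names mirror PySpark's RDD methods, and nothing here is distributed, partitioned, or fault tolerant the way Spark is.

```python
from typing import Any, Callable, Iterable, Iterator, List


class LocalRDD:
    """Hypothetical single-machine stand-in for a Spark RDD (not Spark's API).

    Each instance holds a zero-argument function that produces its elements.
    Transformations wrap that function lazily; actions call it.
    """

    def __init__(self, compute: Callable[[], Iterator[Any]]):
        self._compute = compute

    @classmethod
    def from_list(cls, data: Iterable[Any]) -> "LocalRDD":
        items = list(data)  # snapshot of the source data
        return cls(lambda: iter(items))

    # ---- Transformations: return a new LocalRDD, run nothing yet ----
    def map(self, f: Callable[[Any], Any]) -> "LocalRDD":
        parent = self._compute
        return LocalRDD(lambda: (f(x) for x in parent()))

    def flatMap(self, f: Callable[[Any], Iterable[Any]]) -> "LocalRDD":
        parent = self._compute
        return LocalRDD(lambda: (y for x in parent() for y in f(x)))

    def filter(self, pred: Callable[[Any], bool]) -> "LocalRDD":
        parent = self._compute
        return LocalRDD(lambda: (x for x in parent() if pred(x)))

    def reduceByKey(self, f: Callable[[Any, Any], Any]) -> "LocalRDD":
        parent = self._compute

        def compute() -> Iterator[Any]:
            # Combine all values that share a key using the binary function f.
            acc = {}
            for key, value in parent():
                acc[key] = f(acc[key], value) if key in acc else value
            return iter(acc.items())

        return LocalRDD(compute)

    # ---- Actions: trigger evaluation of the whole chain ----
    def collect(self) -> List[Any]:
        return list(self._compute())

    def count(self) -> int:
        return sum(1 for _ in self._compute())


# Word count, the classic RDD example.
lines = LocalRDD.from_list(["to be or not", "to be"])
counts = (
    lines.flatMap(str.split)          # "to be or not" -> "to", "be", "or", "not"
    .map(lambda word: (word, 1))      # pair each word with a count of 1
    .reduceByKey(lambda a, b: a + b)  # sum the counts per word
)
print(sorted(counts.collect()))  # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

Because this sketch caches nothing, every action replays the chain from the source data. Spark behaves similarly: an RDD is recomputed from its lineage each time an action needs it, unless you call `cache()` or `persist()` on it.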
Spark provides a machine learning library known as MLlib. MLlib offers common machine learning algorithms for classification, regression, clustering, and collaborative filtering. It also provides tools for featurization, pipelines, and persistence, along with utilities for linear algebra, statistics, and data handling.
Apache Spark provides a graph-parallel computation library, GraphX. Graph-parallel computation is a paradigm in which your data is represented as vertices and edges. GraphX provides a set of fundamental graph operators, in addition to a growing collection of graph algorithms and builders that simplify graph analytics tasks.
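As a rough illustration of representing data as vertices and edges, the following plain-Python sketch builds a small property graph (vertex id to attribute, plus `(src, dst, attribute)` edges) and computes per-vertex values by sending one message along each edge and merging the messages that arrive at each vertex. The `PropertyGraph` class and its `aggregate_messages` and `in_degrees` methods are hypothetical; they are loosely modeled on GraphX's `aggregateMessages` and `inDegrees` operators but are not the GraphX API, run on a single machine, and simplify messaging to the destination vertex only.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, Tuple

VertexId = Hashable


class PropertyGraph:
    """Hypothetical plain-Python property graph; not the GraphX API.

    Vertices map an id to an attribute; edges are (src, dst, attribute) triples.
    """

    def __init__(self, vertices: Dict[VertexId, Any],
                 edges: Iterable[Tuple[VertexId, VertexId, Any]]):
        self.vertices = dict(vertices)
        self.edges = list(edges)

    def aggregate_messages(
        self,
        send_to_dst: Callable[[Any, Any, Any], Any],
        merge: Callable[[Any, Any], Any],
    ) -> Dict[VertexId, Any]:
        """Send one message per edge to its destination, then merge per vertex.

        send_to_dst receives (source attr, edge attr, destination attr).
        Vertices that receive no message are absent from the result.
        """
        inbox: Dict[VertexId, Any] = {}
        for src, dst, edge_attr in self.edges:
            msg = send_to_dst(self.vertices[src], edge_attr, self.vertices[dst])
            inbox[dst] = merge(inbox[dst], msg) if dst in inbox else msg
        return inbox

    def in_degrees(self) -> Dict[VertexId, int]:
        # Each edge sends 1 to its destination; the 1s are summed per vertex.
        return self.aggregate_messages(lambda s, e, d: 1, lambda a, b: a + b)


# Vertices carry (name, age); an edge (a, b, "follows") means a follows b.
users = {1: ("alice", 28), 2: ("bob", 27), 3: ("carol", 65), 4: ("dave", 42)}
follows = [(2, 1, "follows"), (3, 1, "follows"), (4, 3, "follows"), (1, 4, "follows")]
graph = PropertyGraph(users, follows)

followers = graph.in_degrees()
oldest_follower_age = graph.aggregate_messages(
    lambda src, edge, dst: src[1],  # message: the follower's age
    max,                            # keep the largest age per vertex
)
print(followers)            # {1: 2, 3: 1, 4: 1}
print(oldest_follower_age)  # {1: 65, 3: 42, 4: 28}
```

Vertex 2 (bob) has no followers, so it receives no messages and is missing from both results; GraphX's `inDegrees` likewise leaves out vertices with no incoming edges.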
Spark Fundamentals I
Spark Fundamentals II
Exploring Spark's GraphX