URL: https://dzone.com/articles/the-complete-apache-spark-collection-tutorials-and

The Complete Apache Spark Collection [Tutorials and Articles]


By Dec. 02, 19 · Presentation

In this edition of "Best of DZone," we've compiled our best tutorials and articles on one of the most popular analytics engines for data processing, Apache Spark. Whether you're a beginner or a long-time user who has run into the inevitable bottlenecks, we've got your back!

Before we begin, we'd like to thank those who were a part of this article. DZone has been, and continues to be, a community powered by contributors like you who are eager and passionate to share what they know with the rest of the world.

Let's get started!

Getting Started

Installation

  • Apache Spark on Windows by Kuldeep Singh: If you were confused by Spark's quick-start guide, this article contains resolutions to the more common errors encountered by developers.

  • Apache Spark Tutorial (Fast Data Architecture Series) by Bill Ward: In this article, a data scientist and developer gives an Apache Spark tutorial that demonstrates how to get Apache Spark installed.

Theory

Spark vs Kafka vs Flink

Streaming and Structured Streaming

Spark Clusters

Databases, RDDs, and DataFrames

Performance Optimization

PySpark Tutorials

Scala and Spark

Spark and Machine Learning

  • Churn Prediction With Apache Spark Machine Learning by Carol McDonald: Learn how to get started using Apache Spark's machine learning decision trees and machine learning pipelines for classification.

  • Predictive Analytics With Spark ML by David Moyers: Whether you're running Spark on a large cluster or embedded within a single node app, Spark makes it easy to create predictive analytics with just a few lines of code.

  • Data Clustering Using Apache Spark by Konur Unyelioglu: This article looks at the analysis of cancer survival using K-means and Gaussian Mixture algorithms.

  • A Glimpse at the Future of Apache Spark 3.0 With Deep Learning and Kubernetes by Oliver White: Learn how Spark 3.0, Kubernetes, and deep learning all come together.

No One Puts Baby in a Container

Miscellaneous

  • Quick Start With Apache Livy by Guglielmo Iozzia: Learn how to get started with Apache Livy, a project in the process of being incubated by Apache that interacts with Apache Spark through a REST interface.

  • Example ETL Application Using Apache Spark and Hive by Emrah Mete: In this article, we'll read a sample data set with Spark on HDFS (Hadoop Distributed File System), do a simple analytical operation, then write to a table that we'll make in Hive.

  • Game Theory With Apache Spark Part 1, Part 2, Part 3, and Part 4 by Konur Unyelioglu: Go in-depth on Game Theory with Apache Spark in this four-part series.
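
To give a feel for what "interacting with Spark through a REST interface" means in the Livy entry above, here is a minimal sketch in Python that only builds the HTTP requests without sending them. It uses Livy's documented `/sessions` and `/sessions/{id}/statements` endpoints; the host and port (`localhost:8998`), the helper name `livy_post`, the session id `0`, and the sample statement are all assumptions made for illustration, not something taken from the linked article.

```python
import json
from urllib.request import Request

# Assumption: a Livy server on the local machine, using Livy's default port.
LIVY_URL = "http://localhost:8998"


def livy_post(path, payload):
    """Build (but do not send) a JSON POST request for a Livy endpoint.

    `livy_post` is a hypothetical helper, not part of any Livy client library.
    """
    return Request(
        url=LIVY_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Ask Livy to start an interactive PySpark session.
create_session = livy_post("/sessions", {"kind": "pyspark"})

# Submit a snippet of code to a session. The id 0 is a placeholder; in practice
# it would be read from the JSON response to the session-creation request.
run_statement = livy_post(
    "/sessions/0/statements",
    {"code": "sc.parallelize(range(100)).sum()"},
)
```

Against a running Livy server, each request could be sent with `urllib.request.urlopen(...)` and the JSON response inspected for the session id and the statement's state.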

Be a Part of the Conversation!

Think we missed something? Want to contribute? Let us know in the comments below, or join the conversation by becoming a member of our community of thousands of developers eager to share their knowledge and passion for programming with others.

