-
Enterprise Java
Reading and Writing Deeply Partitioned Files in Apache Spark
In large-scale data engineering and analytics, files are often stored in deeply partitioned directories to improve performance and manageability. This…
-
Enterprise Java
Real-Time Data Streams: Building Analytics with Kafka and Spark
In today’s fast-paced digital world, businesses demand real-time insights to make critical decisions. Batch processing is no longer enough—organizations want…
-
Software Development
Apache Spark: Unleashing Big Data Power
1. Introduction. Apache Spark is a powerful open-source, distributed computing system that has become a cornerstone in the world of…
-
Software Development
Where is Apache Spark heading?
I watched (the COVID-19-era version of “attended”) the latest Spark Summit, and in one of the keynotes Reynold Xin from Databricks,…
-
Enterprise Java
Long Live ETL
Extract, transform, load (ETL) is a process for pulling data from one data system and loading it into another. The data systems involved are called…
-
Enterprise Java
Exploring the Spline Data Tracker and Visualization tool for Apache Spark (Part 2)
In Part 1 we learned how to test data lineage info collection with Spline from a Spark shell. The same can…
-
Enterprise Java
Exploring the Spline Data Tracker and Visualization tool for Apache Spark (Part 1)
One interesting and promising open-source project that caught my attention lately is Spline, a data lineage tracking and visualization tool…
-
Enterprise Java
Insights from Spark UI
As a continuation of the anatomy-of-apache-spark-job post, I will share how you can use the Spark UI to tune a job. I will continue with the same…
-
Enterprise Java
Anatomy of Apache Spark Job
Apache Spark is a general-purpose, large-scale data processing framework. Understanding how Spark executes jobs is very important for getting most of…
