List: Big data | Curated by sarath chandra

Feb 19, 2025
14 stories
Big data 
In Data Engineer Things by B V Sarath Chandra
15 Databricks Interview Questions | Medium To Hard Level
Must read for DE Databricks Interview.
Feb 13
1
In DataVidhya by Darshil Parmar
Understand Apache Airflow Like Never Before
In the world of data engineering, one of the most critical tasks you’ll encounter is building data pipelines.
Jan 29
8
In Data Engineer Things by Sahil Sharma
Apache Spark Interview Scenarios: Key Configurations Every Data Engineer Should Know
Optimizing Apache Spark: Essential Configurations for Performance, Resource Management, and Scalability.
Dec 15, 2024
1
In Dev Genius by Muttineni Sai Rohith
Understanding Parquet Files | Efficient Data Storage
In the modern data-driven world, efficiency in data storage and retrieval is paramount. As datasets grow in size, traditional file formats…
Jan 5
Prem Vishnoi(cloudvala)
Apache Spark Mindmap
May 29, 2024
In Dev Genius by Sutanu Dutta
Why Kafka ditched Zookeeper
For many years, Apache Kafka relied on Apache ZooKeeper to manage metadata, cluster configurations and maintain a distributed state across…
Nov 3, 2024
Mayurkumar Surani
Ace Your Data Engineering Interview: 20 Questions and Answers to Land Your Dream Job
So, you’re gearing up for a data engineering interview? Congratulations! It’s an exciting field with tons of opportunity. But let’s be…
Sep 29, 2024
In The Resume Whisperer by KudosWall
Why Data Professionals Need a Portfolio (and How to Create One)
For a data professional, having a well-crafted resume is essential, but it might not be enough. Whether you’re a Data Engineer, Data…
Oct 21, 2024
1
In Data Engineer Things by Vu Trinh
I spent 8 hours learning the details of the Apache Spark scheduling process.
Anatomy of a Spark job and the typical scheduling process.
Oct 29, 2024
Irem Ertürk
Stream Processing with Python: Part 2: Kafka Producer-Consumer with Avro Schema and Schema Registry
In Part 2 of Stream Processing with Python series, we will deal with a more structured way of managing the messages with the help of…
Jul 26, 2024
In Netflix TechBlog by Netflix Technology Blog
Maestro: Netflix’s Workflow Orchestrator
By Jun He, Natallia Dzenisenka, Praneeth Yenugutala, Yingyi Zhang, and Anjali Norwood
Jul 22, 2024
12
Swathi Thokala
YouTube Trend Analysis Pipeline: ETL with Airflow, Spark, S3 and Docker
In this article, we will walk through creating an automated ETL (Extract, Transform, Load) pipeline using Apache Airflow and PySpark. This…
Jun 18, 2024
2
In Data Engineer Things by Vu Trinh
Apache Kafka — Overview
The terminology and the architecture.
Jul 6, 2024
12
In The Deep Hub by Vu Trinh
How does LinkedIn process 4 Trillion Events every day?
Key insights on how LinkedIn leverages Apache Beam for real-time processing
Jun 10, 2024
5