Question: What is the difference between a DataFrame and an RDD in Spark?
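Answer: A DataFrame is a distributed collection of data organized into named columns, similar to a table in a relational database. Because it carries a schema, Spark can optimize DataFrame queries with the Catalyst optimizer before running them. An RDD (Resilient Distributed Dataset) is the lower-level abstraction: a distributed collection of arbitrary objects with no schema, transformed with functions such as map and filter that Spark runs as written, without query optimization.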

Example:

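// Assumes an existing SparkSession named spark; the column names and types are inferred from the JSON
val df = spark.read.json("/path/to/data.json")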
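
To make the contrast concrete, here is a minimal sketch that applies the same filter through both APIs. The object name, the sample rows, and the local[*] master are assumptions chosen for illustration and are not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; names and data are illustrative only.
object DataFrameVsRdd {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation (assumption: Spark is on the classpath).
    val spark = SparkSession.builder()
      .appName("DataFrameVsRdd")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val people = Seq(("Alice", 34), ("Bob", 19))

    // DataFrame: rows with named columns and a schema.
    val df = people.toDF("name", "age")
    df.printSchema()
    // Column-based expression that Spark can analyze and optimize.
    val adultsDf = df.filter($"age" >= 21).select("name")
    adultsDf.show()

    // RDD: a distributed collection of plain Scala tuples, no schema.
    val rdd = spark.sparkContext.parallelize(people)
    // Opaque Scala functions that Spark executes as given.
    val adultsRdd = rdd.filter { case (_, age) => age >= 21 }.map(_._1)
    adultsRdd.collect().foreach(println)

    // The two abstractions convert into each other.
    val rowsRdd = df.rdd               // RDD[Row]
    val backToDf = rdd.toDF("name", "age")

    spark.stop()
  }
}
```

Both versions select the same people. The DataFrame version describes what to compute in terms of columns, while the RDD version spells out how to compute it on each object.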
©2025 WithoutBook