PySpark Interview Questions and Answers

Ques 6. Explain the concept of a SparkSession in PySpark.

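SparkSession is the unified entry point to the DataFrame and SQL APIs in PySpark (Spark 2.0 and later). It is used to create DataFrames, register DataFrames as temporary views, and execute SQL queries. The underlying SparkContext is available through its sparkContext attribute.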

Example:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('example').getOrCreate()

Ques 7. What is the purpose of the 'cache' operation in PySpark?

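The 'cache' operation marks a DataFrame or RDD to be persisted so that later actions reuse the stored data instead of recomputing it, which speeds up iterative algorithms and repeated operations. Caching is lazy: the data is stored when the first action runs. By default an RDD is cached in memory only, while a DataFrame uses a memory-and-disk storage level.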

Example:

df.cache()

Ques 8. How can you handle missing or null values in a PySpark DataFrame?

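You can use the DataFrame 'na' functions to handle missing values: 'drop' removes rows that contain nulls, and 'fill' replaces nulls with a value you supply. Called with no arguments, 'drop' removes every row that has a null in any column. The DataFrame methods dropna() and fillna() are aliases for the same operations.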

Example:

df.na.drop()

Ques 9. Explain the purpose of the 'collect' action in PySpark.

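The 'collect' action retrieves all elements of a distributed dataset (RDD or DataFrame) and returns them to the driver program as a Python list; for a DataFrame, this is a list of Row objects. Because the entire result must fit in driver memory, use it only on small or already reduced datasets.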

Example:

data = df.collect()

Ques 10. What is the role of the 'broadcast' variable in PySpark?

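A 'broadcast' variable is a read-only value that Spark copies to each executor once and caches there, so tasks do not have to receive it again with every task. The example below shows the related broadcast join hint for DataFrames: marking the smaller DataFrame with broadcast() asks Spark to send it to every executor, so the join can avoid shuffling the larger DataFrame.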

Example:

from pyspark.sql.functions import broadcast

result = df1.join(broadcast(df2), 'key')

Copyright © 2026, WithoutBook.