PySpark Interview Questions and Answers

Question 21. What is the purpose of the 'groupBy' operation in PySpark?

'groupBy' is used to group the data based on one or more columns. It is often followed by aggregation functions to perform operations on each group.

Example:

grouped_data = df.groupBy('Category').agg({'Price': 'mean'})

Question 22. Explain the difference between 'cache' and 'persist' operations in PySpark.

'cache' is shorthand for 'persist' with the default storage level (MEMORY_ONLY for RDDs, a memory-and-disk level for DataFrames), while 'persist' lets you choose a storage level explicitly, such as MEMORY_ONLY, MEMORY_AND_DISK, or DISK_ONLY.

Example:

df.cache()

Question 23. How can you create a temporary view from a PySpark DataFrame?

You can use the 'createOrReplaceTempView' method to create a temporary view from a PySpark DataFrame.

Example:

df.createOrReplaceTempView('temp_view')

Question 24. What is the purpose of the 'orderBy' operation in PySpark?

'orderBy' is used to sort the rows of a DataFrame based on one or more columns. The sort is ascending by default.

Example:

result = df.orderBy('column')

Question 25. Explain the role of the 'broadcast' variable in PySpark.

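A 'broadcast' variable caches a read-only value on every executor so that tasks can reuse it without it being shipped again with each task. The related 'broadcast' function in pyspark.sql.functions is a join hint: it marks a small DataFrame to be copied to every executor, which lets Spark join it against a larger DataFrame without shuffling the larger side.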

Example:

from pyspark.sql.functions import broadcast

result = df1.join(broadcast(df2), 'key')

Copyright © 2026, WithoutBook.