Python Pandas Interview Questions and Answers
Freshers / Beginner level questions & answers
Ques 1. What is Pandas in Python?
Pandas is an open-source data manipulation and analysis library for Python.
Ques 2. How do you import the Pandas library?
import pandas as pd
Ques 3. How do you create a DataFrame in Pandas?
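Use the pd.DataFrame constructor, where data can be a dictionary of columns, a list of records, or a NumPy array. pd.DataFrame(data)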
Example:
df = pd.DataFrame({'column1': [1, 2, 3], 'column2': ['a', 'b', 'c']})
Ques 4. How do you select specific columns from a DataFrame?
df[['column1', 'column2']]
Ques 5. How can you apply a function to each element in a DataFrame?
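Use the apply function to run a function along an axis: df.apply(my_function) passes each column (or each row with axis=1) to the function as a Series. For a function that takes a single value, use df.map(my_function) in pandas 2.1 or later; earlier versions call this df.applymap(my_function).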
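A minimal sketch of the difference between column-wise and element-wise application. The frame and lambdas are illustrative only; Series.map is used for the element-wise case because it behaves the same across pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# By default (axis=0), apply hands each column to the function as a Series
col_range = df.apply(lambda col: col.max() - col.min())

# Element-wise: map a scalar function over every value of every column
doubled = df.apply(lambda col: col.map(lambda x: x * 2))
```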
Ques 6. How can you rename columns in a Pandas DataFrame?
Use the rename function. df.rename(columns={'old_name': 'new_name'})
Ques 7. Explain the difference between Series and DataFrame in Pandas.
A Series is a one-dimensional labeled array, and a DataFrame is a two-dimensional table.
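As a small illustration (the data here is made up), the two structures can be compared by their dimensions, and selecting one column of a DataFrame gives back a Series:

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=['x', 'y', 'z'], name='column1')
df = pd.DataFrame({'column1': [1, 2, 3], 'column2': ['a', 'b', 'c']})

print(s.ndim, s.shape)    # 1 (3,)
print(df.ndim, df.shape)  # 2 (3, 2)

# A single column of a DataFrame is itself a Series
col = df['column1']
```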
Ques 8. How do you convert a Pandas DataFrame to a NumPy array?
Use the to_numpy() method, which the Pandas documentation recommends over the older values attribute. df.to_numpy()
Ques 9. How can you reset the index of a Pandas DataFrame?
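Use the reset_index function. df.reset_index() moves the current index into a column and replaces it with a default integer index; pass drop=True to discard the old index instead.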
Ques 10. How do you sort a Pandas DataFrame by a specific column?
Use the sort_values function. df.sort_values(by='column')
Ques 11. What is the purpose of the to_csv function in Pandas?
to_csv is used to write a DataFrame to a CSV file.
Example:
df.to_csv('output.csv', index=False)
Ques 12. How do you check for the existence of a specific value in a Pandas DataFrame?
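Use the isin function, which returns a boolean Series with one entry per row; add .any() to get a single True/False answer. df['column'].isin([value]).any()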
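A short sketch of how the boolean result of isin might be used, with illustrative data. The mask can answer "does the value exist?" or filter the matching rows:

```python
import pandas as pd

df = pd.DataFrame({'column': [1, 2, 3], 'other': ['a', 'b', 'c']})

mask = df['column'].isin([2, 5])             # boolean Series, one entry per row
found_in_column = mask.any()                 # True if at least one row matches
found_anywhere = df.isin(['c']).any().any()  # search every column of the frame
matching_rows = df[mask]                     # keep only the rows that matched
```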
Ques 13. What is the purpose of the read_csv function in Pandas?
read_csv is used to read data from a CSV file into a DataFrame.
Example:
df = pd.read_csv('file.csv')
Ques 14. Explain the use of the describe function in Pandas.
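describe generates descriptive statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for the numeric columns of a DataFrame by default, excluding NaN values.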
Example:
df.describe()
Ques 15. How can you drop columns from a Pandas DataFrame?
Use the drop function. df.drop(['column1', 'column2'], axis=1)
Ques 16. How do you handle duplicate values in a Pandas DataFrame?
Use the drop_duplicates() function. df.drop_duplicates()
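The sketch below, on made-up data, shows how duplicates can be inspected first with duplicated() and how the subset and keep parameters narrow what counts as a duplicate:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2], 'value': ['a', 'a', 'b', 'c']})

dup_mask = df.duplicated()         # True for rows that repeat an earlier row
full_dedup = df.drop_duplicates()  # drops the second (1, 'a') row

# Compare on 'id' only and keep the last occurrence of each id
by_id = df.drop_duplicates(subset='id', keep='last')
```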
Ques 17. Explain the purpose of the to_datetime() function in Pandas.
to_datetime() converts strings, numbers, or other date-like values to Pandas datetime values.
Example:
df['date_column'] = pd.to_datetime(df['date_column'])
Ques 18. How do you change the data type of a Pandas Series or DataFrame column?
Use the astype() function. df['column'] = df['column'].astype('new_dtype')
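For instance, with illustrative column names and concrete dtypes in place of 'new_dtype':

```python
import pandas as pd

df = pd.DataFrame({'column': ['1', '2', '3'], 'price': [1.5, 2.0, 3.25]})

df['column'] = df['column'].astype('int64')  # numeric strings -> integers
df['price'] = df['price'].astype('float32')  # downcast to a smaller float type
```

astype raises an error if a value cannot be converted, so pd.to_numeric with errors='coerce' may be a better fit for messy input.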
Ques 19. Explain the purpose of the nlargest() function in Pandas.
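nlargest() returns the n rows of a DataFrame (or n elements of a Series) with the largest values, ordered from largest to smallest. For a DataFrame, the second argument names the column or columns to rank by.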
Example:
df.nlargest(5, 'column')
Ques 20. How can you create a Pandas DataFrame from a dictionary of Series or dictionaries?
Use the pd.DataFrame() constructor. df = pd.DataFrame({'column1': series1, 'column2': series2})
Ques 21. What is the purpose of the to_excel() function in Pandas?
to_excel() is used to write a DataFrame to an Excel file.
Example:
df.to_excel('output.xlsx', index=False)
Ques 22. How do you calculate the correlation matrix for a Pandas DataFrame?
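Use the corr() function, which computes the pairwise correlation (Pearson by default) between columns. In pandas 2.0 and later, pass numeric_only=True if the DataFrame contains non-numeric columns. df.corr(numeric_only=True)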
Intermediate / 1 to 5 years experienced level questions & answers
Ques 23. Explain the DataFrame in Pandas.
A DataFrame is a 2-dimensional labeled data structure with columns that can be of different types. It is similar to a spreadsheet or SQL table.
Ques 24. What is the difference between loc and iloc in Pandas?
loc selects by label (index and column names), and iloc selects by integer position.
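A minimal sketch on an illustrative frame with string row labels. Note the difference in how slices treat their end point:

```python
import pandas as pd

df = pd.DataFrame({'score': [90, 80, 70]}, index=['x', 'y', 'z'])

by_label = df.loc['y', 'score']  # row labeled 'y', column labeled 'score'
by_position = df.iloc[1, 0]      # second row, first column

label_slice = df.loc['x':'y']    # label slices include the end label
position_slice = df.iloc[0:1]    # position slices exclude the end position
```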
Ques 25. Explain the use of the groupby function in Pandas.
groupby is used to split the data into groups based on some criteria and then apply a function to each group independently.
Example:
df.groupby('column1').mean(numeric_only=True)
Ques 26. How do you handle missing data in a DataFrame?
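Find missing values with df.isna(), then either drop the affected rows with df.dropna() or replace the missing values with df.fillna(value).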
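A short sketch of the usual options on made-up data. The fill values chosen here (the column mean and the string 'unknown') are only examples:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': ['x', None, 'z']})

missing_per_column = df.isna().sum()  # count missing values in each column
dropped = df.dropna()                 # keep only rows with no missing values

# Fill each column with its own replacement value
filled = df.fillna({'a': df['a'].mean(), 'b': 'unknown'})
```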
Ques 27. What is the purpose of the merge function in Pandas?
merge is used to combine two DataFrames based on a common column or index.
Example:
pd.merge(df1, df2, on='common_column')
Ques 28. What is the purpose of the melt function in Pandas?
melt is used to transform wide-format data to long-format data.
Example:
pd.melt(df, id_vars=['id_column'], value_vars=['value_column'])
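To make the reshaping concrete, here is a sketch with a small wide table. The column names 'math', 'art', 'subject', and 'score' are invented for the example:

```python
import pandas as pd

wide = pd.DataFrame({'id_column': [1, 2], 'math': [90, 80], 'art': [70, 60]})

# Each (id, subject) pair becomes its own row
long = pd.melt(wide, id_vars=['id_column'], value_vars=['math', 'art'],
               var_name='subject', value_name='score')
```

The two-row wide table becomes a four-row long table with the columns id_column, subject, and score.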
Ques 29. Explain the concept of broadcasting in Pandas.
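Broadcasting lets NumPy and Pandas perform element-wise operations between objects of different shapes by stretching the smaller operand across the larger one, for example adding a scalar to every element of a DataFrame. Pandas also aligns the operands on their index and column labels before computing.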
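A few illustrative cases on a made-up frame: a scalar, a Series matched against the column labels, and a Series matched against the row index:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

plus_one = df + 1          # the scalar is applied to every element
centered = df - df.mean()  # Series aligned on column labels, reused for each row

# axis=0 aligns the Series on the row index instead, dividing each row by its sum
scaled = df.div(df.sum(axis=1), axis=0)
```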
Ques 30. What is the purpose of the concat function in Pandas?
concat is used to concatenate DataFrames along a particular axis.
Example:
pd.concat([df1, df2], axis=1)
Ques 31. What is the purpose of the nunique function in Pandas?
nunique returns the number of unique elements in a Series or DataFrame.
Example:
df['column'].nunique()
Ques 32. Explain the use of the cut function in Pandas.
cut segments numeric values into discrete intervals (bins) and returns the bin each value falls into. By default, each bin includes its right edge.
Example:
pd.cut(df['column'], bins=[0, 25, 50, 75, 100])
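A sketch of the same bin edges applied to a few sample scores, with the labels 'low' to 'top' invented for the example:

```python
import pandas as pd

scores = pd.Series([10, 25, 60, 100])

# Bins are (0, 25], (25, 50], (50, 75], (75, 100]; right edges are included
binned = pd.cut(scores, bins=[0, 25, 50, 75, 100],
                labels=['low', 'mid', 'high', 'top'])
```

A value of exactly 0 would fall outside the first bin and become NaN unless include_lowest=True is passed.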
Ques 33. Explain the concept of method chaining in Pandas.
Method chaining is a way of applying multiple operations to a DataFrame in a single expression, with each method called on the result of the previous one.
Example:
df.dropna().mean()
Ques 34. What is the purpose of the iterrows() function in Pandas?
iterrows() is used to iterate over DataFrame rows as (index, Series) pairs. It is slow on large DataFrames, so prefer vectorized operations or itertuples() where possible.
Example:
for index, row in df.iterrows():
    print(index, row['column'])
Ques 35. Explain the use of the get_dummies() function in Pandas.
get_dummies() is used to convert categorical variable(s) into dummy/indicator variables.
Example:
pd.get_dummies(df['column'])
Ques 36. What is the difference between Series.value_counts() and DataFrame['column'].value_counts()?
There is no functional difference. DataFrame['column'] is itself a Series, so DataFrame['column'].value_counts() is simply Series.value_counts() called on that column. Both return the counts of unique values, sorted from most to least frequent.
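A quick check on made-up data, including the normalize option for relative frequencies:

```python
import pandas as pd

df = pd.DataFrame({'column': ['a', 'b', 'a', 'a']})

column = df['column']                          # selecting a column yields a Series
counts = column.value_counts()                 # a: 3, b: 1
shares = column.value_counts(normalize=True)   # a: 0.75, b: 0.25
```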
Ques 37. What is the purpose of the pd.to_numeric() function?
pd.to_numeric() converts its argument to a numeric type. With errors='coerce', values that cannot be parsed become NaN instead of raising an error.
Example:
df['column'] = pd.to_numeric(df['column'], errors='coerce')
Ques 38. Explain the use of the pd.cut() function with the `bins` parameter.
pd.cut() segments data values into bins. The `bins` parameter is either an integer, giving the number of equal-width bins, or a sequence of bin edges.
Example:
pd.cut(df['column'], bins=[0, 25, 50, 75, 100])
Ques 39. How can you merge two DataFrames based on multiple columns?
Use the on parameter with a list of column names. pd.merge(df1, df2, on=['column1', 'column2'])
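A sketch with two small illustrative frames. A row matches only when both key columns agree, and the how parameter controls what happens to unmatched rows:

```python
import pandas as pd

df1 = pd.DataFrame({'column1': [1, 1, 2], 'column2': ['a', 'b', 'a'],
                    'left_val': [10, 20, 30]})
df2 = pd.DataFrame({'column1': [1, 2, 2], 'column2': ['a', 'a', 'b'],
                    'right_val': [100, 200, 300]})

# Default inner join: only key pairs present in both frames
inner = pd.merge(df1, df2, on=['column1', 'column2'])

# Left join: keep every row of df1, with NaN where df2 has no match
left = pd.merge(df1, df2, on=['column1', 'column2'], how='left')
```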
Ques 40. How do you pivot a Pandas DataFrame using the pivot() function?
Use the pivot() function to reshape long data to wide: one column supplies the new index, the unique values of another become the new columns, and a third fills the cells.
Example:
df.pivot(index='index_column', columns='column_to_pivot', values='value_column')
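A minimal sketch with made-up data, reusing the column names from the example:

```python
import pandas as pd

df = pd.DataFrame({
    'index_column': ['r1', 'r1', 'r2', 'r2'],
    'column_to_pivot': ['x', 'y', 'x', 'y'],
    'value_column': [1, 2, 3, 4],
})

# Rows r1 and r2, columns x and y, cells taken from value_column
wide = df.pivot(index='index_column', columns='column_to_pivot',
                values='value_column')
```

pivot() raises a ValueError if an index/column pair appears more than once; pivot_table() handles that case by aggregating.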
Ques 41. What is the purpose of the crosstab() function in Pandas?
crosstab() computes a simple cross-tabulation of two (or more) factors.
Example:
pd.crosstab(df['factor1'], df['factor2'])
Ques 42. How do you apply a custom function to each element in a Pandas DataFrame?
Use the map() function in pandas 2.1 or later, where applymap() is deprecated; use applymap() in earlier versions. df.map(my_function)
Experienced / Expert level questions & answers
Ques 44. Explain the pivot_table function in Pandas.
pivot_table creates a spreadsheet-style pivot table as a DataFrame, aggregating the values (mean by default) when several rows share the same index/column combination.
Example:
pd.pivot_table(df, values='value', index='index_column', columns='column_to_pivot')
Ques 45. Explain the concept of MultiIndex in Pandas.
MultiIndex is a hierarchical index with two or more levels, which lets the rows or columns of a DataFrame be labeled by combinations of keys.
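As a sketch, a two-level row index can be built from tuples and then queried by outer label, by full key, or by an inner level. The 'region', 'year', and 'sales' names are invented for the example:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('east', 2023), ('east', 2024), ('west', 2023)],
    names=['region', 'year'],
)
df = pd.DataFrame({'sales': [10, 20, 30]}, index=index)

east = df.loc['east']                       # all rows under the outer label
one_cell = df.loc[('west', 2023), 'sales']  # full key selects a single value
by_year = df.xs(2023, level='year')         # cross-section on the inner level
```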
Ques 46. Explain the use of the transform() function in Pandas.
transform() performs group-wise computations and returns a result with the same index as the input, so it can be assigned back to the DataFrame as a new column.
Example:
df['normalized_column'] = df.groupby('group_column')['value_column'].transform(lambda x: (x - x.mean()) / x.std())
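The contrast with a plain aggregation can be sketched on illustrative data. transform returns one value per original row, while mean() alone returns one value per group:

```python
import pandas as pd

df = pd.DataFrame({'group_column': ['a', 'a', 'b', 'b'],
                   'value_column': [1.0, 3.0, 10.0, 30.0]})

grouped = df.groupby('group_column')['value_column']

group_mean = grouped.transform('mean')  # same length and index as df
aggregated = grouped.mean()             # one entry per group
```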
Ques 47. What is the purpose of the pipe() function in Pandas?
pipe() passes the DataFrame as the first argument to a function and returns the result, so custom functions can be used inside a method chain.
Example:
df.pipe(my_function).dropna()
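A minimal sketch, where add_total is a hypothetical helper written for this example rather than a Pandas function. Extra arguments given to pipe are forwarded to the function:

```python
import pandas as pd

def add_total(frame, columns):
    """Hypothetical helper: return a copy with a 'total' column summing `columns`."""
    return frame.assign(total=frame[columns].sum(axis=1))

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Equivalent to add_total(df, columns=['a', 'b']), but keeps the chain readable
result = df.pipe(add_total, columns=['a', 'b']).sort_values('total', ascending=False)
```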
Ques 48. Explain the purpose of the stack() and unstack() functions in Pandas.
stack() pivots the innermost column level of a DataFrame into the row index, producing a longer result. unstack() does the reverse, moving the innermost index level into the columns.
Example:
df.stack()
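A round trip on a small illustrative frame shows both directions:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2], 'y': [3, 4]}, index=['r1', 'r2'])

stacked = df.stack()          # Series indexed by (row label, column label)
restored = stacked.unstack()  # inner index level moves back to the columns
```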