Statistics Interview Questions and Answers
Experienced / Expert level questions & answers
Ques 1. Explain the difference between Type I and Type II censoring in survival analysis.
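Both Type I and Type II censoring are forms of right censoring: for some units, the event of interest has not yet occurred when observation stops. Under Type I censoring, the study ends at a fixed, predetermined time. The censoring time is therefore fixed and the number of observed events is random. Under Type II censoring, the study ends once a predetermined number of events has occurred. The number of events is therefore fixed and the stopping time is random. Type II censoring is not the same as left censoring, where the event is only known to have occurred before some time.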
Example:
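In a study tracking time to failure of light bulbs that ends at a fixed time, the bulbs still working at the end are Type I censored. If the study instead stops as soon as a fixed number of bulbs have failed, the remaining bulbs are Type II censored.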
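The difference between the two schemes is easiest to see by applying both to the same data. This is a minimal sketch, assuming hypothetical bulb lifetimes (in hours) and distinct failure times. The helper names `type_i_censor` and `type_ii_censor` are illustrative only. Each record is a pair `(recorded_time, failed)`, where `failed=False` marks a right-censored bulb.

```python
# Hypothetical bulb lifetimes in hours (illustrative values, not real data).
LIFETIMES = [120, 340, 560, 800, 950, 1300]


def type_i_censor(lifetimes, end_time):
    """Type I: the study stops at a fixed, predetermined time `end_time`.

    Bulbs that fail by the cutoff are observed; the rest are
    right-censored at `end_time`, so the number of failures is random.
    """
    return [(min(t, end_time), t <= end_time) for t in lifetimes]


def type_ii_censor(lifetimes, r):
    """Type II: the study stops at the r-th failure.

    Assumes distinct lifetimes, so exactly r failures are observed.
    The stopping time depends on the data, so it is random.
    """
    stop_time = sorted(lifetimes)[r - 1]
    return [(min(t, stop_time), t <= stop_time) for t in lifetimes]


type_i = type_i_censor(LIFETIMES, end_time=900)
type_ii = type_ii_censor(LIFETIMES, r=3)
print("Type I: ", type_i)
print("Type II:", type_ii)
```

Under Type I censoring the analyst fixes the clock (900 hours here). Under Type II censoring the analyst fixes the failure count (3 here), and the clock stops at whatever time the third failure happens.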
Ques 2. What is the Mann-Whitney U test used for?
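The Mann-Whitney U test (also called the Wilcoxon rank-sum test) is a nonparametric test used to assess whether two independent groups differ on a continuous or ordinal dependent variable. It works on the ranks of the pooled observations rather than their raw values. It tests whether values from one group tend to be larger than values from the other.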
Example:
Comparing the distributions of test scores between two different teaching methods when assumptions for a t-test are not met.
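The U statistic can be computed directly from ranks. Here is a minimal sketch in plain Python, assuming hypothetical test scores for the two teaching methods. Tied values receive the average of their ranks. `U_a` equals the number of (a, b) pairs in which the score from method A is higher, with ties counted as one half. The function names are illustrative and are not a library API.

```python
def midranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U_a, U_b) for two independent samples a and b."""
    n_a, n_b = len(a), len(b)
    ranks = midranks(list(a) + list(b))  # rank the pooled data
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a  # the two U values always sum to n_a * n_b
    return u_a, u_b


# Hypothetical test scores under two teaching methods.
method_a = [72, 85, 78, 90, 66, 81]
method_b = [68, 74, 70, 79, 65, 72]
u_a, u_b = mann_whitney_u(method_a, method_b)
print(u_a, u_b)
```

This sketch computes only the statistic. In practice, `scipy.stats.mannwhitneyu` returns the statistic together with a p-value.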
Ques 3. Define Simpson's Paradox.
Simpson's Paradox occurs when a trend that appears within each of several groups of data disappears or reverses when the groups are combined. It highlights the importance of considering confounding variables in statistical analysis.
Example:
A treatment has a higher recovery rate than no treatment among men and among women separately. When the data are combined, however, it appears less effective, because the treatment was given mostly to the group with the worse baseline prognosis.
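A small numeric sketch shows the reversal. The counts below are hypothetical and were chosen only to produce the paradox. Most treated patients are men, and in this made-up data men recover less often regardless of treatment.

```python
# Hypothetical counts: (recovered, total) for each sex and treatment arm.
COUNTS = {
    "women": {"treated": (19, 20), "untreated": (72, 80)},
    "men": {"treated": (40, 80), "untreated": (8, 20)},
}


def rate(recovered, total):
    return recovered / total


# Recovery rate within each group.
for group, arms in COUNTS.items():
    treated = rate(*arms["treated"])
    untreated = rate(*arms["untreated"])
    print(f"{group}: treated {treated:.0%}, untreated {untreated:.0%}")


def pooled(arm):
    """Combine counts for one arm across groups, ignoring sex."""
    recovered = sum(COUNTS[g][arm][0] for g in COUNTS)
    total = sum(COUNTS[g][arm][1] for g in COUNTS)
    return rate(recovered, total)


print(f"combined: treated {pooled('treated'):.0%}, untreated {pooled('untreated'):.0%}")
```

Within each group the treatment wins: 95% vs 90% for women and 50% vs 40% for men. In the combined data it loses, 59% vs 80%, because sex is related both to who received treatment and to the outcome.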
Ques 4. What is the purpose of the Akaike Information Criterion (AIC) in model selection?
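The Akaike Information Criterion (AIC) is used to compare candidate models fitted to the same data. It is defined as AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized value of the likelihood. The model with the lowest AIC is preferred. The 2k term penalizes additional parameters, so a more complex model is selected only if its improvement in fit outweighs its added complexity.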
Example:
Choosing between linear and quadratic regression models based on AIC values.
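Here is a minimal sketch of that comparison in plain Python. It assumes hypothetical data generated from a quadratic trend with unit-variance Gaussian noise, and it assumes Gaussian errors, so the maximized log-likelihood can be computed from the residual sum of squares (RSS) with sigma^2 = RSS/n. The parameter count k includes the error variance. The helper names are illustrative; a numerical library would normally handle the least-squares fit.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def polyfit_rss(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations; returns the RSS."""
    X = [[x ** p for p in range(degree + 1)] for x in xs]
    m = degree + 1
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
    Xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(m)]
    beta = solve(XtX, Xty)
    return sum((y - sum(b * v for b, v in zip(beta, row))) ** 2 for row, y in zip(X, ys))


def gaussian_aic(rss, n, k):
    """AIC = 2k - 2 ln L, with L the Gaussian likelihood at sigma^2 = RSS/n."""
    log_lik = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)
    return 2 * k - 2 * log_lik


# Hypothetical data: true quadratic trend plus unit-variance noise.
rng = random.Random(0)
xs = [-3 + 6 * i / 49 for i in range(50)]
ys = [1 + 2 * x + 3 * x ** 2 + rng.gauss(0, 1) for x in xs]

n = len(xs)
# k counts the polynomial coefficients plus the error variance.
aic_linear = gaussian_aic(polyfit_rss(xs, ys, 1), n, k=3)
aic_quadratic = gaussian_aic(polyfit_rss(xs, ys, 2), n, k=4)
print(f"linear AIC: {aic_linear:.1f}, quadratic AIC: {aic_quadratic:.1f}")
```

The quadratic model pays a penalty of 2 for its extra coefficient. Because the data really are curved, its much smaller RSS more than compensates, and it ends up with the lower AIC.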
Ques 5. Explain the concept of bootstrapping in statistics.
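Bootstrapping is a resampling technique in which many samples, each the same size as the original dataset, are drawn with replacement from the observed data. The statistic of interest is recomputed on each resample. The resulting distribution approximates the statistic's sampling distribution and can be used to estimate standard errors and confidence intervals.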
Example:
Creating multiple bootstrap samples from a dataset to estimate the uncertainty around the mean.
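Here is a minimal sketch of a percentile bootstrap confidence interval for the mean, using only the Python standard library. The measurements are hypothetical, and the function name `bootstrap_ci_mean` is illustrative. A fixed seed makes the result reproducible.

```python
import random
import statistics


def bootstrap_ci_mean(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample has the original size and is drawn with replacement.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Take the alpha/2 and 1 - alpha/2 percentiles of the bootstrap means.
    lower = means[int(alpha / 2 * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical measurements.
sample = [12.1, 9.8, 11.4, 10.2, 13.0, 8.7, 10.9, 11.8, 9.5, 12.4]
low, high = bootstrap_ci_mean(sample)
print(f"mean = {statistics.fmean(sample):.2f}, 95% CI: ({low:.2f}, {high:.2f})")
```

The percentile method is the simplest bootstrap interval. Refinements such as the bias-corrected and accelerated (BCa) interval adjust for skew and bias in the bootstrap distribution. Increasing `n_resamples` reduces Monte Carlo noise in the endpoints.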