Data Mining Interview Questions and Answers
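Question 21. What is ensemble learning, and how does it improve model performance?
Ensemble learning combines the predictions of multiple models so that their individual errors partly cancel out. Bagging-style ensembles such as random forests mainly reduce variance (overfitting), while boosting mainly reduces bias. In both cases the combined model is usually more accurate and more robust than any single member.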
Example:
Building a random forest by combining predictions from multiple decision trees.
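The effect of combining models can be sketched with a simple majority vote. The snippet below is an illustration only, not a random forest implementation: the function names majority_vote and majority_accuracy are made up for this example, and the accuracy calculation assumes the base models make independent errors, which real models only approximate.

```python
from collections import Counter
from math import comb


def majority_vote(predictions):
    """Return the label predicted by the largest number of base models."""
    return Counter(predictions).most_common(1)[0][0]


def majority_accuracy(n_models, p):
    """Probability that a majority of n_models is correct.

    Assumes (for illustration) that each model is correct with
    probability p, that errors are independent, and that n_models is odd.
    """
    k_min = n_models // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(k_min, n_models + 1)
    )


# Three hypothetical trees vote on one sample.
label = majority_vote(["spam", "ham", "spam"])

# Three independent models that are each 70% accurate.
ensemble_acc = majority_accuracy(3, 0.7)
print(label, round(ensemble_acc, 3))
```

Under these assumptions, three 70%-accurate models reach about 78% accuracy as a majority, and the gain grows as more independent models are added.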
Question 22. Explain the concept of a ROC curve in the context of classification models.
A ROC curve plots the true positive rate against the false positive rate at every classification threshold, showing the trade-off between catching positives and raising false alarms. The area under the curve (AUC) summarizes the curve in a single number: 0.5 corresponds to random guessing and 1.0 to perfect separation.
Example:
Assessing a medical diagnostic model's ability to discriminate between healthy and diseased individuals.
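The construction of the curve can be sketched in a few lines. This is a minimal version that assumes binary labels (1 = diseased, 0 = healthy) and a score where higher means "more likely positive"; roc_points and auc are hypothetical helper names, and the four samples are invented toy data.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, one per distinct threshold, starting at (0, 0)."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; a sample is predicted positive
    # when its score is at or above the threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


y_true = [0, 0, 1, 1]            # toy labels
scores = [0.1, 0.4, 0.35, 0.8]   # toy model scores
points = roc_points(y_true, scores)
area = auc(points)
print(points, area)
```

For this toy data one healthy sample scores above one diseased sample, so the curve falls short of the top-left corner and the area is 0.75 rather than 1.0.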
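Question 23. What is the Apriori algorithm, and how does it work?
Apriori is a frequent itemset mining algorithm used for association rule discovery. It works level by level: it counts the support of single items, keeps those that meet a minimum support threshold, and joins them into larger candidate itemsets. Because every subset of a frequent itemset must itself be frequent, any candidate with an infrequent subset is pruned before its support is counted. Rules are then generated from the frequent itemsets and kept if they meet a minimum confidence.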
Example:
Finding association rules like {milk, bread} => {eggs} in a supermarket transaction dataset.
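The level-wise search can be sketched as follows. This is a compact, unoptimized version meant only to show the join and prune steps; the function names, the five toy transactions, and the 0.5 support threshold are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1 candidates: every single item.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count support and keep only the frequent candidates at this level.
        level = {c: s for c in current if (s := support(c)) >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent itemsets into candidates of size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        current = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of antecedent => consequent, using stored supports."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


transactions = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk", "eggs"},
    {"bread", "eggs"},
    {"milk", "bread", "eggs"},
]
frequent = apriori(transactions, min_support=0.5)
milk_to_bread = confidence(frequent, {"milk"}, {"bread"})
print(len(frequent), round(milk_to_bread, 2))
```

With these transactions all three items and all three pairs are frequent, while {milk, bread, eggs} appears in only two of five transactions and is dropped at level 3.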
Question 24. What is the difference between batch and online learning in the context of machine learning?
Batch learning trains a model on the entire dataset at once, so the model must be retrained to incorporate new data. Online learning updates the model incrementally as each new example (or small group of examples) arrives.
Example:
Batch learning: Training a model on a year's worth of customer data. Online learning: Updating a recommendation system in real-time as users interact with the platform.
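The contrast can be sketched with a one-parameter linear model y = w * x. The batch version solves for w using all the data at once, while the online version applies one stochastic gradient step per incoming example. The names fit_batch and OnlineModel, the learning rate, and the noise-free toy data are assumptions chosen for illustration.

```python
def fit_batch(xs, ys):
    """Least-squares estimate of w in y = w * x, using the full dataset."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


class OnlineModel:
    """Same model, updated one example at a time by gradient descent."""

    def __init__(self, learning_rate=0.05):
        self.w = 0.0
        self.learning_rate = learning_rate

    def update(self, x, y):
        error = y - self.w * x              # prediction error on this example
        self.w += self.learning_rate * error * x


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # generated by y = 2x, no noise

w_batch = fit_batch(xs, ys)

model = OnlineModel()
for _ in range(5):                # simulate a stream that keeps arriving
    for x, y in zip(xs, ys):
        model.update(x, y)        # the model is usable after every update

print(w_batch, round(model.w, 4))
```

Both approaches arrive at roughly the same weight here. The practical difference is that the online model never needs the full dataset in memory and can keep adapting if the underlying relationship drifts.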
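Question 25. What is the concept of lift in association rule mining?
Lift is the ratio of a rule's observed support to the support expected if the antecedent and consequent were independent: lift(A => B) = support(A and B) / (support(A) x support(B)). A lift above 1 indicates a positive association, a lift of 1 indicates independence, and a lift below 1 indicates a negative association.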
Example:
If the lift is 2, the antecedent and consequent occur together twice as often as would be expected if they were independent.
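A lift of 2 can be reproduced on a small example. The eight transactions below are invented so that the arithmetic is easy to follow, and support and lift are hypothetical helper names.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def lift(transactions, antecedent, consequent):
    """Observed joint support divided by the support expected under independence."""
    joint = support(transactions, set(antecedent) | set(consequent))
    expected = support(transactions, antecedent) * support(transactions, consequent)
    return joint / expected


transactions = [
    {"milk", "eggs"}, {"milk", "eggs"}, {"milk"}, {"milk"},
    {"bread"}, {"bread"}, {"bread"}, {"bread"},
]

# support(milk) = 4/8, support(eggs) = 2/8, support(milk and eggs) = 2/8
# expected under independence = 0.5 * 0.25 = 0.125
rule_lift = lift(transactions, {"milk"}, {"eggs"})
print(rule_lift)
```

Here milk and eggs appear together in 25% of transactions, against 12.5% expected under independence, which gives a lift of 2.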