Handling missing data MCQs

u930973931_answers

3 months ago

1. What is the primary reason for handling missing data in a dataset?

a) To reduce the complexity of the data
b) To prevent bias in the analysis and ensure model accuracy
c) To create a smaller dataset
d) To increase the number of features in the dataset

Answer: b) To prevent bias in the analysis and ensure model accuracy

2. Which of the following is a technique for handling missing data in a dataset?

a) Data normalization
b) Data imputation
c) Feature scaling
d) Principal component analysis (PCA)

Answer: b) Data imputation

3. Which method replaces missing values with the mean or median of the available data in the column?

a) Imputation
b) Deletion
c) Forward fill
d) Backward fill

Answer: a) Imputation
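
As a minimal sketch of question 3, mean or median imputation can be written with Python's standard statistics module. The column values and the helper name impute are hypothetical, and None stands in for a missing value.

```python
from statistics import mean, median

def impute(column, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in column]

ages = [25, None, 31, 40, None]          # hypothetical column
mean_filled = impute(ages, "mean")       # missing entries become 32
median_filled = impute(ages, "median")   # missing entries become 31
```

With pandas, the same idea is usually expressed as series.fillna(series.mean()) or series.fillna(series.median()).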

4. What is listwise deletion in the context of missing data handling?

a) Replacing missing values with the mean of the column
b) Deleting rows that contain any missing values
c) Filling missing values with zeros
d) Replacing missing data using predictive models

Answer: b) Deleting rows that contain any missing values
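
Listwise deletion from question 4 can be sketched in a few lines. The records and the helper name listwise_delete are hypothetical; each row is a dict and None marks a missing field.

```python
def listwise_delete(rows):
    """Keep only the rows in which every field is observed (not None)."""
    return [row for row in rows if all(v is not None for v in row.values())]

records = [
    {"age": 25, "income": 50000},
    {"age": None, "income": 62000},   # dropped: age is missing
    {"age": 31, "income": None},      # dropped: income is missing
    {"age": 40, "income": 58000},
]
complete = listwise_delete(records)   # two complete rows remain
```

Half of the rows are lost here even though only two individual values were missing, which is the main cost of this method. In pandas the equivalent call is DataFrame.dropna().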

5. Which of the following is NOT a common method for handling missing data?

a) Mean imputation
b) Data deletion
c) Forward or backward filling
d) Data transformation

Answer: d) Data transformation

6. In forward fill, missing data is replaced with:

a) The most recent preceding non-missing value in the column
b) The mean of the column
c) A predicted value from a machine learning model
d) The value that occurs most frequently in the column

Answer: a) The most recent preceding non-missing value in the column
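
Forward fill from question 6 can be illustrated with a short loop. The readings and the helper name forward_fill are hypothetical, and None marks a missing value.

```python
def forward_fill(values):
    """Replace each None with the most recent observed value before it."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v          # remember the latest observed value
        filled.append(last)   # stays None until a first value has been seen
    return filled

readings = [None, 20.5, None, None, 21.0, None]   # hypothetical ordered series
filled = forward_fill(readings)
```

A leading gap stays missing because there is no earlier value to carry forward. The method only makes sense for ordered data such as time series; pandas offers it as Series.ffill().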

7. What is the potential issue with using mean imputation to handle missing data?

a) It can introduce bias, especially if the data is skewed or the values are not missing completely at random
b) It increases the size of the dataset
c) It only works for categorical data
d) It requires complex algorithms for imputation

Answer: a) It can introduce bias, especially if the data is skewed or the values are not missing completely at random

8. What is multiple imputation?

a) A technique where missing values are replaced by the most frequent value
b) A method that replaces missing values with predictions from a model
c) A process where multiple values are generated for each missing data point to account for uncertainty
d) A technique where missing data is simply ignored

Answer: c) A process where multiple values are generated for each missing data point to account for uncertainty
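
The idea in question 8 can be sketched as follows. This is a deliberately simplified illustration, not a full multiple-imputation procedure: it assumes the variable is roughly normal, draws imputed values from a normal distribution fitted to the observed values, and pools the per-dataset mean estimates with Rubin's rules. A proper implementation would also draw the model parameters themselves for each imputation. The data and the function name multiple_impute_mean are hypothetical.

```python
import random
from statistics import mean, stdev, variance

def multiple_impute_mean(values, m=5, seed=0):
    """Estimate the mean of `values` (None = missing) from m imputed copies."""
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    mu, sigma = mean(observed), stdev(observed)

    estimates, within = [], []
    for _ in range(m):
        # Draw a different plausible value for every missing entry.
        completed = [rng.gauss(mu, sigma) if v is None else v for v in values]
        estimates.append(mean(completed))
        # Squared standard error of the mean in this completed dataset.
        within.append(variance(completed) / len(completed))

    pooled = mean(estimates)           # pooled point estimate
    w = mean(within)                   # average within-imputation variance
    b = variance(estimates)            # between-imputation variance
    total_var = w + (1 + 1 / m) * b    # Rubin's rules for the total variance
    return pooled, total_var, w

scores = [4.0, None, 6.0, 5.0, None, 7.0]   # hypothetical data
pooled, total_var, w = multiple_impute_mean(scores, m=5, seed=0)
```

The between-imputation term is what single imputation leaves out: it widens the total variance to reflect that the filled-in values are uncertain.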

9. When should you consider using predictive modeling for missing data?

a) When there are only a few missing values
b) When the dataset is small
c) When the variable with missing values is highly correlated with other variables
d) When you have enough data to estimate the missing values accurately

Answer: c) When the variable with missing values is highly correlated with other variables

10. What is the effect of missing data on machine learning models?

a) Missing data has no impact if the dataset is large
b) Missing data can lead to incorrect model predictions or biased results
c) Missing data can increase the performance of the model
d) Missing data can only be handled through feature scaling

Answer: b) Missing data can lead to incorrect model predictions or biased results

11. Which of the following techniques is generally used when the data is missing completely at random (MCAR)?

a) Imputation with the mean or median
b) Deletion of rows with missing data
c) Predictive modeling
d) Binning the data

Answer: b) Deletion of rows with missing data

12. Which of the following is a disadvantage of using mean imputation to handle missing data?

a) It works best for highly skewed data
b) It can artificially reduce the variance of the imputed variable
c) It introduces outliers
d) It increases the model complexity

Answer: b) It can artificially reduce the variance of the imputed variable
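
A small numeric check makes the point of question 12 concrete. The numbers are hypothetical: four observed values plus four missing entries that are all filled with the observed mean.

```python
from statistics import mean, pvariance

observed = [10, 20, 30, 40]          # hypothetical observed values
n_missing = 4                        # hypothetical number of missing entries

fill = mean(observed)                # every gap receives the same value
imputed = observed + [fill] * n_missing

var_before = pvariance(observed)     # spread of the observed values only
var_after = pvariance(imputed)       # spread after mean imputation
```

Every imputed point sits exactly on the mean, so it adds nothing to the sum of squared deviations while increasing the count. In this example the population variance is halved.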

13. K-nearest neighbors (KNN) imputation works by:

a) Filling missing values with the mean of the column
b) Predicting missing values using the closest available neighbors in the data
c) Removing the rows with missing values
d) Using the previous value to fill in missing data

Answer: b) Predicting missing values using the closest available neighbors in the data
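
KNN imputation from question 13 can be sketched without external libraries. The assumptions here are mine: distance is the root mean squared difference over the features both rows have observed, and a missing value is filled with the plain average of that feature over the k nearest donor rows. The data and the function name knn_impute are hypothetical.

```python
import math

def knn_impute(rows, k=2):
    """Fill each None with the mean of that feature over the k nearest rows."""
    def distance(a, b):
        # Compare only the features that are observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    result = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that have feature j observed.
            donors = [r for m, r in enumerate(rows) if m != i and r[j] is not None]
            donors.sort(key=lambda r: distance(row, r))
            nearest = donors[:k]
            if nearest:
                new_row[j] = sum(r[j] for r in nearest) / len(nearest)
        result.append(new_row)
    return result

data = [
    [1.0, 2.0],
    [1.1, None],   # second feature is missing
    [5.0, 6.0],
    [0.9, 1.8],
]
imputed = knn_impute(data, k=2)
```

The incomplete row is closest to the first and last rows, so its missing value is filled from those two rather than from the distant third row. In practice features are usually scaled first so that no single feature dominates the distance; scikit-learn provides a ready-made KNNImputer class.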

14. What is a key assumption when using multiple imputation?

a) The data is missing at random (MAR)
b) The missing values follow a uniform distribution
c) The missing data can be inferred by a single predictive model
d) Missing data does not affect the analysis

Answer: a) The data is missing at random (MAR)

15. Which of the following can be a consequence of not handling missing data properly?

a) Faster computation and simpler models
b) Bias in analysis or misleading results
c) Increased accuracy of models
d) Removal of outliers from the dataset

Answer: b) Bias in analysis or misleading results