Data Preprocessing MCQs

1. Which of the following is typically the first step in data preprocessing?

a) Data cleaning
b) Data transformation
c) Data integration
d) Data reduction

Answer: a) Data cleaning


2. What is the primary purpose of data cleaning in preprocessing?

a) To convert data into a more useful format
b) To eliminate irrelevant features
c) To identify and correct errors in the data
d) To visualize data for better understanding

Answer: c) To identify and correct errors in the data


3. Which technique is used to handle missing values in a dataset?

a) Normalization
b) Imputation
c) Standardization
d) Feature extraction

Answer: b) Imputation


4. Which of the following is an example of data transformation?

a) Handling missing data
b) Scaling or normalizing numerical values
c) Removing duplicate records
d) Combining data from different sources

Answer: b) Scaling or normalizing numerical values


5. In data preprocessing, what does normalization refer to?

a) Removing duplicate records from the dataset
b) Adjusting data values to a common scale without distorting differences
c) Handling missing values by substituting with the mean
d) Selecting a subset of relevant features for analysis

Answer: b) Adjusting data values to a common scale without distorting differences
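
As an illustration of the answer above, here is a minimal sketch of min-max normalization, one common way to bring values onto a shared scale. The function name min_max_normalize and the sample ages column are assumptions made for this example, not part of the quiz.

```python
def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages = [20, 30, 40, 60]  # hypothetical sample column
scaled = min_max_normalize(ages)
print(scaled)
```

The relative spacing between values is preserved: 30 sits a quarter of the way between 20 and 60 both before and after scaling.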


6. What is the main goal of data reduction in preprocessing?

a) To improve the accuracy of data models
b) To reduce the size of the data without losing important information
c) To combine data from multiple sources
d) To eliminate irrelevant features from the dataset

Answer: b) To reduce the size of the data without losing important information


7. Which of the following is a technique for identifying outliers during preprocessing?

a) Normalization
b) Z-score method
c) Feature extraction
d) Data augmentation

Answer: b) Z-score method
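
A rough sketch of how z-scores can be used to flag outliers is shown here. The threshold of 3 standard deviations is a common convention rather than a fixed rule, and the names z_scores, flag_outliers and readings are assumptions made for this example.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return (value - mean) / standard deviation for each value."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All values are identical, so nothing deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def flag_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]


readings = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 95]  # hypothetical data
outliers = flag_outliers(readings)
print(outliers)
```

Once flagged, outliers can be removed, capped, or inspected by hand, depending on whether they are errors or genuine extreme values.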


8. What is feature scaling?

a) Creating new features from existing ones
b) Adjusting feature values to a uniform range
c) Reducing the number of features in the dataset
d) Selecting important features for analysis

Answer: b) Adjusting feature values to a uniform range


9. What is one disadvantage of removing missing values during preprocessing?

a) It increases the dataset size
b) It might lead to a loss of important data
c) It causes the model to overfit
d) It introduces bias in the data

Answer: b) It might lead to a loss of important data
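
The cost of dropping incomplete rows can be seen in a small sketch. The rows data and the drop_incomplete helper are hypothetical and only serve to illustrate the point.

```python
rows = [  # hypothetical records; None marks a missing value
    {"age": 25, "income": 40000},
    {"age": None, "income": 52000},
    {"age": 31, "income": None},
    {"age": 47, "income": 61000},
]


def drop_incomplete(records):
    """Keep only the records in which every field is present."""
    return [r for r in records if all(v is not None for v in r.values())]


complete = drop_incomplete(rows)
lost = len(rows) - len(complete)
print(f"kept {len(complete)} of {len(rows)} rows, lost {lost}")
```

Half of the rows are discarded here, including a known income of 52000 and a known age of 31 that were thrown away only because the other field in the same row was missing.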


10. Which method can be used for encoding categorical variables into numeric values?

a) Data cleaning
b) One-hot encoding
c) Normalization
d) Feature selection

Answer: b) One-hot encoding
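
A minimal sketch of one-hot encoding in plain Python follows. The function name one_hot_encode and the colors list are assumptions made for this example; in practice a library routine would usually be used instead.

```python
def one_hot_encode(values):
    """Return the sorted category list and one 0/1 row per input value."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


colors = ["red", "green", "blue", "green"]  # hypothetical categorical column
categories, encoded = one_hot_encode(colors)
print(categories)
for row in encoded:
    print(row)
```

Each category becomes its own column, and every row has exactly one 1, so no artificial ordering is imposed on the categories.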


11. Why is data transformation important in data preprocessing?

a) It increases the complexity of the data
b) It helps in handling different data formats
c) It improves model performance by making data more consistent
d) It removes all irrelevant data

Answer: c) It improves model performance by making data more consistent


12. Which of the following preprocessing techniques is used to handle categorical data?

a) Min-max scaling
b) One-hot encoding
c) Standardization
d) Principal component analysis (PCA)

Answer: b) One-hot encoding


13. What is the effect of data normalization on features with different units of measurement?

a) It removes units and adjusts all features to the same scale
b) It increases the variance of the data
c) It keeps the data in its original form without change
d) It performs feature selection based on importance

Answer: a) It removes units and adjusts all features to the same scale
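
To see this effect on features measured in different units, here is a small sketch that standardizes two columns (subtract the mean, divide by the standard deviation). Standardization is only one of several scaling methods, and the names standardize, heights_cm and incomes_usd are assumptions made for this example.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (assumes non-zero spread)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


heights_cm = [150.0, 160.0, 170.0, 180.0]           # hypothetical, in centimetres
incomes_usd = [30000.0, 40000.0, 50000.0, 60000.0]  # hypothetical, in dollars

heights_std = standardize(heights_cm)
incomes_std = standardize(incomes_usd)
print(heights_std)
print(incomes_std)
```

The raw columns differ by orders of magnitude, but after scaling both are unitless and occupy the same range, so neither dominates a distance-based model purely because of its units.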


14. Which of the following is an example of data imputation?

a) Replacing missing values with the mean or median of the column
b) Scaling data to a specific range
c) Combining datasets from multiple sources
d) Removing rows with missing values

Answer: a) Replacing missing values with the mean or median of the column
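
Mean and median imputation can be sketched as follows. The impute function, its strategy parameter and the scores list are assumptions made for this example.

```python
from statistics import mean, median


def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


scores = [10, None, 20, 90, None]  # hypothetical column; None marks a missing value
mean_filled = impute(scores, "mean")
median_filled = impute(scores, "median")
print(mean_filled)
print(median_filled)
```

The two strategies give different fill values here because the large value 90 pulls the mean upward, which is why the median is often preferred for skewed columns.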


15. What is the primary challenge of feature extraction in preprocessing?

a) Reducing the number of features too much
b) Creating new features that represent the data well
c) Scaling features to a uniform range
d) Dealing with missing values

Answer: b) Creating new features that represent the data well
