1. Which of the following is the first step in data preprocessing?
a) Data cleaning
b) Data transformation
c) Data integration
d) Data reduction
Answer: a) Data cleaning
2. What is the primary purpose of data cleaning in preprocessing?
a) To convert data into a more useful format
b) To eliminate irrelevant features
c) To identify and correct errors in the data
d) To visualize data for better understanding
Answer: c) To identify and correct errors in the data
3. Which technique is used to handle missing values in a dataset?
a) Normalization
b) Imputation
c) Standardization
d) Feature extraction
Answer: b) Imputation
4. Which of the following is an example of data transformation?
a) Handling missing data
b) Scaling or normalizing numerical values
c) Removing duplicate records
d) Combining data from different sources
Answer: b) Scaling or normalizing numerical values
5. In data preprocessing, what does normalization refer to?
a) Removing duplicate records from the dataset
b) Adjusting data values to a common scale without distorting differences in the ranges of values
c) Handling missing values by substituting with the mean
d) Selecting a subset of relevant features for analysis
Answer: b) Adjusting data values to a common scale without distorting differences in the ranges of values
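As an illustration of the answer above, the following is a minimal sketch of min-max normalization, one common way to bring values onto a shared scale. The function name min_max_scale, its parameters, and the sample ages are hypothetical and chosen only for this example.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so they span [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: there is no spread to preserve, so map everything to new_min.
        return [new_min for _ in values]
    span = hi - lo
    # Relative distances between values are preserved by the linear mapping.
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]


ages = [20, 30, 40, 60]  # made-up sample data
scaled = min_max_scale(ages)
print(scaled)
```

The smallest value maps to the lower bound and the largest to the upper bound, while the spacing between values keeps the same proportions.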
6. What is the main goal of data reduction in preprocessing?
a) To improve the accuracy of data models
b) To reduce the size of the data without losing important information
c) To combine data from multiple sources
d) To eliminate irrelevant features from the dataset
Answer: b) To reduce the size of the data without losing important information
7. Which of the following is a technique for detecting outliers during preprocessing?
a) Normalization
b) Z-score method
c) Feature extraction
d) Data augmentation
Answer: b) Z-score method
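A z-score expresses how many standard deviations a value lies from the mean, so values with a large absolute z-score can be flagged as outlier candidates. The sketch below assumes a cutoff of 2.0 for a small sample (3.0 is a frequent choice for larger ones); the function name zscore_outliers and the readings are hypothetical.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # All values are identical, so nothing can be an outlier.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


readings = [10, 12, 11, 13, 12, 11, 10, 12, 95]  # made-up sample data
flagged = zscore_outliers(readings, threshold=2.0)
print(flagged)
```

What to do with a flagged value (remove it, cap it, or keep it) is a separate decision that depends on the data and the task.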
8. What is feature scaling?
a) Creating new features from existing ones
b) Adjusting feature values to a uniform range
c) Reducing the number of features in the dataset
d) Selecting important features for analysis
Answer: b) Adjusting feature values to a uniform range
9. What is the most immediate disadvantage of removing rows with missing values during preprocessing?
a) It increases the dataset size
b) It might lead to a loss of important data
c) It causes the model to overfit
d) It introduces bias in the data
Answer: b) It might lead to a loss of important data
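The cost of deletion is easy to see on a tiny example. In this sketch, missing values are represented by None and any record containing one is dropped; the records and the helper name drop_incomplete are hypothetical.

```python
rows = [  # made-up sample records; None marks a missing value
    {"age": 25, "income": 40000},
    {"age": None, "income": 52000},
    {"age": 31, "income": None},
    {"age": 47, "income": 61000},
]


def drop_incomplete(records):
    """Keep only the records in which every field has a value."""
    return [r for r in records if all(v is not None for v in r.values())]


complete = drop_incomplete(rows)
lost = len(rows) - len(complete)
print(f"kept {len(complete)} of {len(rows)} rows, lost {lost}")
```

Here half of the rows disappear even though each dropped row still held one valid value, which is the information loss the answer refers to.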
10. Which method can be used for encoding categorical variables into numeric values?
a) Data cleaning
b) One-hot encoding
c) Normalization
d) Feature selection
Answer: b) One-hot encoding
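One-hot encoding replaces each categorical value with a vector that has a 1 in the position of its category and 0 everywhere else. The following is a minimal sketch; the function name one_hot_encode, the alphabetical column order, and the color data are assumptions made for this example.

```python
def one_hot_encode(values):
    """Return (categories, rows), where each row is a 0/1 indicator vector."""
    # Sorting fixes the column order so the output is reproducible.
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


colors = ["red", "green", "blue", "green"]  # made-up sample data
categories, encoded = one_hot_encode(colors)
print(categories)
for color, row in zip(colors, encoded):
    print(color, row)
```

Each row contains exactly one 1, so the encoding does not imply any ordering among the categories.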
11. Why is data transformation important in data preprocessing?
a) It increases the complexity of the data
b) It helps in handling different data formats
c) It improves model performance by making data more consistent
d) It removes all irrelevant data
Answer: c) It improves model performance by making data more consistent
12. Which of the following preprocessing techniques is used to handle categorical data?
a) Min-max scaling
b) One-hot encoding
c) Standardization
d) Principal component analysis (PCA)
Answer: b) One-hot encoding
13. What is the effect of data normalization on features with different units of measurement?
a) It makes the values unitless and puts all features on a comparable scale
b) It increases the variance of the data
c) It keeps the data in its original form without change
d) It performs feature selection based on importance
Answer: a) It makes the values unitless and puts all features on a comparable scale
14. Which of the following is an example of data imputation?
a) Replacing missing values with the mean or median of the column
b) Scaling data to a specific range
c) Combining datasets from multiple sources
d) Removing rows with missing values
Answer: a) Replacing missing values with the mean or median of the column
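Mean and median imputation can be sketched in a few lines. The example below assumes missing entries are stored as None; the function name impute, its strategy parameter, and the salary figures are hypothetical.

```python
from statistics import mean, median


def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]


salaries = [30, 32, None, 34, 100]  # made-up sample data, in thousands
print(impute(salaries, "mean"))
print(impute(salaries, "median"))
```

With a skewed column like this one, the single large value pulls the mean well above most observations, so the median is often the more robust choice.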
15. What is the primary challenge of feature extraction in preprocessing?
a) Reducing the number of features too much
b) Creating new features that represent the data well
c) Scaling features to a uniform range
d) Dealing with missing values
Answer: b) Creating new features that represent the data well