1. What is the main purpose of data transformation in the data preprocessing pipeline?
a) To reduce the number of features in the dataset
b) To convert the data into a more suitable format for analysis or modeling
c) To combine data from different sources
d) To remove duplicates and errors in the dataset
Answer: b) To convert the data into a more suitable format for analysis or modeling
2. Which of the following is a common method used for data transformation?
a) Scaling
b) Data imputation
c) Data cleaning
d) Feature extraction
Answer: a) Scaling
3. What is normalization in the context of data transformation?
a) Converting categorical variables into numeric form
b) Rescaling features to a fixed range, usually [0, 1]
c) Removing duplicate records from the dataset
d) Merging data from multiple sources
Answer: b) Rescaling features to a fixed range, usually [0, 1]
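The following is a minimal Python sketch of Min-Max normalization. It assumes the input is a plain list of numbers and applies x' = (x - min) / (max - min). The function name `min_max_normalize` is hypothetical. Mapping a constant column to all zeros is an assumed convention that avoids division by zero.

```python
# Hypothetical helper: Min-Max normalization of a list of numbers to [0, 1].
def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed convention: a constant column maps to all zeros (avoids 0/0).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


print(min_max_normalize([10, 20, 30, 50]))  # [0.0, 0.25, 0.5, 1.0]
```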
4. Which of the following is an example of log transformation?
a) Adding noise to the data
b) Converting values using the natural logarithm function
c) Applying the Min-Max scaling
d) Encoding categorical variables as binary values
Answer: b) Converting values using the natural logarithm function
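The sketch below applies the natural logarithm element-wise using Python's `math.log`. The helper name `log_transform` is hypothetical. It assumes strictly positive inputs, since ln(x) is undefined for x <= 0. When zeros occur, a common variant is `math.log1p`, which computes ln(1 + x).

```python
import math


# Hypothetical helper; assumes every input value is strictly positive.
def log_transform(values):
    """Apply the natural logarithm to each value."""
    if any(v <= 0 for v in values):
        raise ValueError("natural log is only defined for positive values")
    return [math.log(v) for v in values]


# Values spanning several orders of magnitude land on a much narrower scale.
print([round(x, 3) for x in log_transform([1, 10, 100, 1000])])
# [0.0, 2.303, 4.605, 6.908]
```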
5. When is standardization typically used during data transformation?
a) When the data has a skewed distribution
b) When the data needs to be rescaled to a fixed range
c) When the data is roughly normally distributed and needs to be centered around zero with unit variance
d) When missing values need to be handled
Answer: c) When the data is roughly normally distributed and needs to be centered around zero with unit variance

6. What does log transformation help with in data preprocessing?
a) Reducing the impact of extreme values or skewed distributions
b) Increasing the variance of data
c) Handling missing values
d) Scaling features to a common range
Answer: a) Reducing the impact of extreme values or skewed distributions
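The sketch below illustrates how a log transform affects skew. It computes the population (Fisher-Pearson) skewness of a small, made-up sample before and after taking logs. The helper `skewness` is an illustrative assumption, as is the doubling sample. The sample was chosen so that its logged values are evenly spaced and therefore symmetric.

```python
import math


# Hypothetical helper: population (Fisher-Pearson) skewness.
def skewness(values):
    """Third central moment divided by the standard deviation cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up right-skewed sample: each value doubles the previous one.
raw = [1, 2, 4, 8, 16, 32, 64, 128]
logged = [math.log(v) for v in raw]

print(round(skewness(raw), 2))     # positive: long right tail
print(round(skewness(logged), 2))  # essentially zero: logs are evenly spaced
```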
7. What does binning refer to in the context of data transformation?
a) Scaling numerical values to a specific range
b) Grouping continuous data into discrete categories or intervals
c) Encoding categorical variables as numerical values
d) Removing outliers from the data
Answer: b) Grouping continuous data into discrete categories or intervals
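Here is a minimal binning sketch that assumes hand-chosen bin edges. It maps each value to the interval it falls in, using `bisect.bisect_right` from the standard library. The helper `bin_values` is a hypothetical example, as are the age cut points and the labels.

```python
import bisect


# Hypothetical helper: assign each value to a labeled interval.
def bin_values(values, edges, labels):
    """Bin values using sorted edges [e0, e1, ..., ek].

    The bins are [e0, e1), [e1, e2), ..., with the last bin closed at ek.
    """
    if len(labels) != len(edges) - 1:
        raise ValueError("need exactly one label per bin")
    result = []
    for v in values:
        if v < edges[0] or v > edges[-1]:
            raise ValueError(f"value {v} lies outside the bin edges")
        # bisect_right counts the edges <= v; subtract 1 to get the bin index.
        i = bisect.bisect_right(edges, v) - 1
        # The largest edge itself belongs to the last bin.
        result.append(labels[min(i, len(labels) - 1)])
    return result


ages = [5, 17, 18, 34, 65, 90]
print(bin_values(ages, [0, 18, 65, 120], ["minor", "adult", "senior"]))
# ['minor', 'minor', 'adult', 'adult', 'senior', 'senior']
```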
8. Which of the following is an example of feature extraction during data transformation?
a) Applying one-hot encoding to categorical variables
b) Selecting the most relevant features using statistical methods
c) Combining two or more features to create a new feature
d) Scaling data to a standard range
Answer: c) Combining two or more features to create a new feature
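As a hypothetical illustration of combining features, the sketch below derives body-mass index from two existing columns. BMI is weight in kilograms divided by height in meters squared. The record layout is an assumption, and so are the names `add_bmi`, `weight_kg`, and `height_m`. This kind of combination is also often called feature construction.

```python
# Hypothetical helper: build a new feature from two existing ones.
def add_bmi(records):
    """Return copies of the records with a derived 'bmi' field added."""
    out = []
    for r in records:
        new = dict(r)  # copy so the input records are left unchanged
        new["bmi"] = r["weight_kg"] / r["height_m"] ** 2
        out.append(new)
    return out


rows = [{"weight_kg": 70.0, "height_m": 1.75}, {"weight_kg": 90.0, "height_m": 1.80}]
for row in add_bmi(rows):
    print(round(row["bmi"], 1))  # 22.9, then 27.8
```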
9. What is the z-score transformation used for?
a) Removing duplicate entries from the dataset
b) Rescaling data to a fixed range
c) Standardizing data to have a mean of 0 and a standard deviation of 1
d) Handling missing values by imputing the mean
Answer: c) Standardizing data to have a mean of 0 and a standard deviation of 1
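The z-score formula z = (x - mean) / std can be sketched with Python's `statistics` module. This version assumes the population standard deviation (`statistics.pstdev`). The sample standard deviation (`statistics.stdev`) is an equally common choice and gives slightly different values. The helper name `z_scores` is hypothetical.

```python
import statistics


# Hypothetical helper: z-score standardization using the population std.
def z_scores(values):
    """Subtract the mean, then divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("standard deviation is zero; z-scores are undefined")
    return [(v - mean) / std for v in values]


z = z_scores([2, 4, 4, 4, 5, 5, 7, 9])  # mean 5, population std 2
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```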
10. Which of the following transformation methods is used when the data has a skewed distribution?
a) Standardization
b) Log transformation
c) Binning
d) One-hot encoding
Answer: b) Log transformation
11. In data transformation, what does feature scaling refer to?
a) Reducing the number of features by selecting the most relevant ones
b) Converting categorical data into numerical data
c) Rescaling numerical features to a similar range or distribution
d) Creating new features by combining existing ones
Answer: c) Rescaling numerical features to a similar range or distribution
12. What is the purpose of Min-Max scaling in data transformation?
a) To rescale all features into a range between 0 and 1
b) To reduce the dataset size
c) To eliminate outliers from the data
d) To combine multiple features into a single feature
Answer: a) To rescale all features into a range between 0 and 1
13. What does one-hot encoding do during data transformation?
a) Converts categorical values into binary columns
b) Scales numerical values to a standard range
c) Combines categorical features into a single column
d) Imputes missing values with the mean
Answer: a) Converts categorical values into binary columns
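This is a minimal one-hot encoding sketch in plain Python. Each distinct category becomes one binary column, and every row has a single 1 in the column of its category. The helper `one_hot` is hypothetical. Sorting the categories alphabetically is an assumed choice that keeps the column order reproducible.

```python
# Hypothetical helper: one-hot encode a list of categorical values.
def one_hot(values):
    """Return (categories, rows) where each row is a 0/1 vector."""
    categories = sorted(set(values))  # fixed, reproducible column order
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


cols, encoded = one_hot(["red", "green", "blue", "green"])
print(cols)     # ['blue', 'green', 'red']
print(encoded)  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```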
14. Which of the following methods is used to handle skewed data before applying machine learning algorithms?
a) Binning
b) Normalization
c) Log transformation
d) Data imputation
Answer: c) Log transformation
15. What is the result of applying data discretization in data transformation?
a) Converting continuous data into categorical data by grouping values into bins
b) Normalizing the data to a range of [0, 1]
c) Creating new features by combining multiple attributes
d) Removing irrelevant features from the dataset
Answer: a) Converting continuous data into categorical data by grouping values into bins
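Discretization does not have to use fixed-width intervals. Another approach, equal-frequency binning, places roughly the same number of values in each bin. The sketch below assigns bins by rank, using a hypothetical helper `equal_frequency_bins`. Ties may be split across adjacent bins, and quantile-based implementations can handle them differently.

```python
# Hypothetical helper: equal-frequency (rank-based) discretization.
def equal_frequency_bins(values, n_bins):
    """Return a bin index 0..n_bins-1 for each value, based on its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # positions sorted by value
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // n
    return bins


print(equal_frequency_bins([3.2, 0.5, 9.9, 4.1, 7.7, 1.0], 3))
# [1, 0, 2, 1, 2, 0]
```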