Data cleaning and integration MCQs

December 22, 2025 · November 18, 2024 · by u930973931_answers

1. What is the primary goal of data cleaning?
(A) To remove redundant features
(B) To remove noise and correct errors in the dataset
(C) To reduce the size of the dataset
(D) To increase the dataset’s complexity

2. Which of the following is NOT typically considered a data quality issue that data cleaning addresses?
(A) Data visualization
(B) Duplicate records
(C) Incorrect formatting
(D) Missing data

3. What is one common method used to handle missing values in a dataset?
(A) Imputation
(B) Data transformation
(C) Data mining
(D) Data reduction

4. In data cleaning, what does data deduplication refer to?
(A) Identifying and removing duplicate records
(B) Combining datasets from multiple sources
(C) Removing irrelevant features from the dataset
(D) Correcting misformatted data

5. Which of the following methods is commonly used to handle outliers in the data cleaning process?
(A) Z-score transformation
(B) Normalization
(C) One-hot encoding
(D) Data imputation

6. What is data integration in the context of data preprocessing?
(A) Converting data from one format to another
(B) Scaling data to a common range
(C) Combining data from multiple sources into a unified dataset
(D) Removing duplicate entries from the dataset

7. What is a challenge commonly faced during data integration?
(A) Dealing with missing values
(B) Normalizing numerical values
(C) Ensuring all data sources have the same format
(D) Removing outliers

8. What does schema integration refer to during data integration?
(A) Handling conflicts between data types
(B) Resolving discrepancies in data definitions across different sources
(C) Combining data based on common attributes
(D) Merging datasets with the same attributes

9. In data cleaning, what does standardization typically involve?
(A) Removing rows with missing values
(B) Ensuring consistency in data formatting and units
(C) Converting categorical data into numeric form
(D) Scaling numerical data to a standard range

10. When integrating data from multiple sources, which issue is likely to arise?
(A) Inconsistent data formats
(B) Data transformation
(C) Feature selection
(D) Data visualization

11. Which technique can be used to handle categorical data when performing data cleaning?
(A) One-hot encoding
(B) Imputation
(C) Data reduction
(D) Normalization

12. What is the best approach when data contains outliers that cannot be removed?
(A) Impute missing values with the median
(B) Apply robust models that are less sensitive to outliers
(C) Perform normalization
(D) Ignore them, as they do not affect the model

13. Which of the following is an important step during the data cleaning process to ensure accurate analysis?
(A) Removing all missing data
(B) Ensuring data consistency and integrity
(C) Removing irrelevant features
(D) Reducing the dimensionality of data

14. Which of the following is an example of data imputation in data cleaning?
(A) Merging data from multiple sources
(B) Removing rows with missing data
(C) Replacing missing values with the mean of the column
(D) Scaling numerical values to a range between 0 and 1

15. What does entity resolution aim to achieve in data integration?
(A) Identifying and merging records that refer to the same real-world entity
(B) Standardizing data formats
(C) Removing duplicates within a single dataset
(D) Mapping data to a target schema
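
Several techniques named in the questions above (mean imputation, deduplication, z-score outlier detection, and combining sources on a common attribute) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names, the sample records, and the 2.0 z-score threshold are all hypothetical choices made for the example.

```python
from statistics import mean, pstdev


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def deduplicate(records, key):
    """Keep only the first record seen for each value of `key`."""
    seen = set()
    unique = []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            unique.append(rec)
    return unique


def zscore_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds `threshold`.

    The threshold is an assumption; on very small samples a cutoff of 3
    can be unreachable, so a lower value is used here.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


def integrate(left, right, key):
    """Combine two sources by matching records on a common attribute."""
    index = {rec[key]: rec for rec in right}
    return [{**rec, **index[rec[key]]} for rec in left if rec[key] in index]


# Hypothetical sample data, invented purely for illustration.
ages = impute_mean([20, None, 40])
customers = deduplicate(
    [{"id": 1, "name": "Ana"}, {"id": 1, "name": "Ana"}, {"id": 2, "name": "Bo"}],
    key="id",
)
outliers = zscore_outliers([10, 11, 9, 10, 12, 10, 11, 9, 10, 95])
orders = [{"id": 1, "total": 9.5}]
combined = integrate(customers, orders, key="id")
```

In practice a dataframe library would usually handle these steps; the standard-library version is used here only to keep each operation visible.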