Data cleaning and integration MCQs

1. What is the primary goal of data cleaning?

a) To remove redundant features
b) To remove noise and correct errors in the dataset
c) To reduce the size of the dataset
d) To increase the dataset’s complexity

Answer: b) To remove noise and correct errors in the dataset


2. Which of the following is NOT typically considered a data quality issue that data cleaning addresses?

a) Missing data
b) Duplicate records
c) Incorrect formatting
d) Data visualization

Answer: d) Data visualization


3. What is one common method used to handle missing values in a dataset?

a) Data transformation
b) Imputation
c) Data mining
d) Data reduction

Answer: b) Imputation


4. In data cleaning, what does data deduplication refer to?

a) Removing irrelevant features from the dataset
b) Combining datasets from multiple sources
c) Identifying and removing duplicate records
d) Correcting misformatted data

Answer: c) Identifying and removing duplicate records
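
As a rough illustration of deduplication, the sketch below keeps the first occurrence of each record and drops later repeats. The function name `deduplicate`, the `key_fields` parameter, and the sample customer rows are hypothetical and chosen only for this example.

```python
def deduplicate(records, key_fields):
    """Keep the first occurrence of each record, comparing only key_fields."""
    seen = set()
    unique = []
    for record in records:
        # Build a hashable key from the fields that define "the same record".
        key = tuple(record[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical sample data: the third row repeats the first.
customers = [
    {"id": 1, "email": "ana@example.com"},
    {"id": 2, "email": "bob@example.com"},
    {"id": 1, "email": "ana@example.com"},
]
unique_customers = deduplicate(customers, ["id", "email"])
```

Which fields make up the key is a design choice: comparing on too few fields can merge distinct records, while comparing on too many can miss duplicates that differ only in noise.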


5. Which of the following methods is commonly used to detect and handle outliers during the data cleaning process?

a) One-hot encoding
b) Normalization
c) Z-score transformation
d) Data imputation

Answer: c) Z-score transformation
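
One way the z-score idea is often applied is to flag values that lie more than a chosen number of standard deviations from the mean. The sketch below is a minimal version under that assumption; the function name `zscore_outliers`, the threshold of 2.5, and the sample readings are illustrative choices, not fixed rules.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs((v - mu) / sigma) > threshold]


# Hypothetical sensor readings with one suspicious value.
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 95]
outliers = zscore_outliers(readings, threshold=2.5)
```

With very small samples the largest possible z-score is limited (it cannot exceed the square root of n - 1 when the population standard deviation is used), so the usual cutoff of 3 may never trigger; that is why a lower threshold is used here.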


6. What is data integration in the context of data preprocessing?

a) Combining data from multiple sources into a unified dataset
b) Scaling data to a common range
c) Converting data from one format to another
d) Removing duplicate entries from the dataset

Answer: a) Combining data from multiple sources into a unified dataset
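
A minimal sketch of integration, assuming two sources that share a `customer_id` key: rows from a hypothetical CRM export are enriched with matching rows from a hypothetical billing export. The function name `integrate` and all field names are assumptions made for illustration.

```python
def integrate(crm_rows, billing_rows, key="customer_id"):
    """Combine two sources on a shared key, keeping every CRM row."""
    # Index the second source by key for constant-time lookups.
    billing_by_key = {row[key]: row for row in billing_rows}
    merged = []
    for row in crm_rows:
        combined = dict(row)  # copy so the input rows are not modified
        combined.update(billing_by_key.get(row[key], {}))
        merged.append(combined)
    return merged


crm = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Bob"},
]
billing = [{"customer_id": 1, "balance": 120.5}]
unified = integrate(crm, billing)
```

This behaves like a left join: customers without a billing row are kept but simply lack the billing fields.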


7. What is a challenge commonly faced during data integration?

a) Ensuring all data sources have the same format
b) Normalizing numerical values
c) Dealing with missing values
d) Removing outliers

Answer: a) Ensuring all data sources have the same format


8. What does schema integration refer to during data integration?

a) Handling conflicts between data types
b) Merging datasets with the same attributes
c) Combining data based on common attributes
d) Resolving discrepancies in data definitions across different sources

Answer: d) Resolving discrepancies in data definitions across different sources
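
To make schema integration concrete, the sketch below maps differently named source fields onto one target schema. The source names, field names, and the `FIELD_MAP` table are all hypothetical; real projects usually also need to reconcile types and meanings, which this example does not attempt.

```python
# Hypothetical mapping from each source's field names to the target schema.
FIELD_MAP = {
    "source_a": {"cust_id": "customer_id", "dob": "birth_date"},
    "source_b": {"CustomerID": "customer_id", "DateOfBirth": "birth_date"},
}


def to_target_schema(record, source):
    """Rename a record's fields to the target schema; unmapped fields pass through."""
    mapping = FIELD_MAP[source]
    return {mapping.get(name, name): value for name, value in record.items()}


row_a = to_target_schema({"cust_id": 7, "dob": "1990-01-01"}, "source_a")
row_b = to_target_schema({"CustomerID": 8, "DateOfBirth": "1985-06-30"}, "source_b")
```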


9. In data cleaning, what does standardization typically involve?

a) Removing rows with missing values
b) Scaling numerical data to a standard range
c) Converting categorical data into numeric form
d) Ensuring consistency in data formatting and units

Answer: d) Ensuring consistency in data formatting and units
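
As a small illustration of standardization in this sense, the sketch below rewrites dates into a single ISO format and converts weights into a single unit. The accepted date formats, the unit table, and the function names are assumptions picked for the example.

```python
from datetime import datetime

# Assumed input formats; extend this tuple for other sources.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y.%m.%d")

# Conversion factors to kilograms.
KG_PER_UNIT = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}


def standardize_date(text):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def to_kilograms(value, unit):
    """Convert a weight to kilograms using the unit table above."""
    return value * KG_PER_UNIT[unit.lower()]
```

Returning `None` for an unrecognized date, rather than guessing, keeps the problem visible so it can be handled as a missing value later.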


10. When integrating data from multiple sources, which issue is likely to arise?

a) Data transformation
b) Inconsistent data formats
c) Feature selection
d) Data visualization

Answer: b) Inconsistent data formats


11. Which technique can be used to handle categorical data when performing data cleaning?

a) Imputation
b) One-hot encoding
c) Data reduction
d) Normalization

Answer: b) One-hot encoding
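
One-hot encoding can be sketched in a few lines: each distinct category becomes its own 0/1 indicator field. The function name `one_hot_encode` and the `is_` prefix are illustrative choices, not a standard.

```python
def one_hot_encode(values):
    """Turn a list of category labels into a list of 0/1 indicator dicts."""
    categories = sorted(set(values))  # fixed order so output is reproducible
    return [
        {f"is_{category}": int(value == category) for category in categories}
        for value in values
    ]


colors = ["red", "green", "red"]
encoded = one_hot_encode(colors)
```

Each output row contains exactly one 1, in the field that matches the original label.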


12. What is the best approach when data contains outliers that cannot be removed?

a) Impute missing values with the median
b) Apply robust models that are less sensitive to outliers
c) Perform normalization
d) Ignore them, as they do not affect the model

Answer: b) Apply robust models that are less sensitive to outliers
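
The idea of being "less sensitive to outliers" can be illustrated with robust statistics rather than a full model. In the sketch below, which uses hypothetical income values, the median and the median absolute deviation barely react to one extreme value, while the mean is pulled far away from the typical case.

```python
from statistics import mean, median


def median_absolute_deviation(values):
    """Robust measure of spread: the median distance from the median."""
    center = median(values)
    return median([abs(v - center) for v in values])


# Hypothetical incomes (in thousands) with one extreme value kept in the data.
incomes = [30, 32, 35, 31, 33, 500]

typical = median(incomes)                    # robust center
spread = median_absolute_deviation(incomes)  # robust spread
average = mean(incomes)                      # distorted by the value 500
```

Robust regression methods follow the same principle by limiting how much any single extreme point can influence the fit.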


13. Which of the following is an important step during the data cleaning process to ensure accurate analysis?

a) Removing all missing data
b) Removing irrelevant features
c) Ensuring data consistency and integrity
d) Reducing the dimensionality of data

Answer: c) Ensuring data consistency and integrity


14. Which of the following is an example of data imputation in data cleaning?

a) Replacing missing values with the mean of the column
b) Removing rows with missing data
c) Merging data from multiple sources
d) Scaling numerical values to a range between 0 and 1

Answer: a) Replacing missing values with the mean of the column
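
Mean imputation can be sketched as follows, assuming missing entries are represented by `None`. The function name `impute_mean` and the sample ages are hypothetical.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to compute a mean from
    fill = mean(observed)
    return [fill if v is None else v for v in values]


ages = [20, None, 40, None, 30]
filled_ages = impute_mean(ages)
```

Mean imputation is simple but reduces the variance of the column; the median is a common alternative when the column contains outliers.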


15. What does entity resolution aim to achieve in data integration?

a) Standardizing data formats
b) Identifying and merging records that refer to the same real-world entity
c) Removing duplicates within a single dataset
d) Mapping data to a target schema

Answer: b) Identifying and merging records that refer to the same real-world entity
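
A deliberately simple sketch of entity resolution: records from different sources are grouped when their normalized name and email agree. The function names, the matching rule, and the sample records are assumptions for illustration; practical systems typically rely on fuzzy matching and more evidence than two fields.

```python
import re


def normalize(text):
    """Lowercase the text and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def resolve_entities(records):
    """Group records that appear to describe the same real-world entity."""
    clusters = {}
    for record in records:
        key = (normalize(record["name"]), record["email"].strip().lower())
        clusters.setdefault(key, []).append(record)
    return list(clusters.values())


# Hypothetical records from two sources; the first two describe one person.
people = [
    {"source": "crm", "name": "Jane Doe", "email": "jane.doe@example.com"},
    {"source": "billing", "name": "JANE  DOE ", "email": "Jane.Doe@Example.com"},
    {"source": "crm", "name": "John Smith", "email": "john.smith@example.com"},
]
entities = resolve_entities(people)
```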
