Density-based clustering (e.g., DBSCAN) MCQs

1. What is the main idea behind DBSCAN (Density-Based Spatial Clustering of Applications with Noise)?

A) Cluster data based on the distance between data points
B) Cluster data based on density of points in the dataset
C) Partition data into predefined K clusters
D) Minimize the variance within clusters

Answer: B) Cluster data based on density of points in the dataset
Explanation: DBSCAN groups points that lie in dense regions, where many points fall within a small radius of one another, and treats points in sparse regions as noise.

2. Which of the following parameters is required to run the DBSCAN algorithm?

A) The number of clusters (K)
B) The radius of the neighborhood (epsilon, ε) and the minimum number of points (minPts)
C) The distance metric
D) The number of iterations

Answer: B) The radius of the neighborhood (epsilon, ε) and the minimum number of points (minPts)
Explanation: DBSCAN requires two parameters: epsilon (ε), which defines the neighborhood radius, and minPts, which is the minimum number of points required to form a dense region (cluster).

3. What is the role of the epsilon (ε) parameter in DBSCAN?

A) It defines the maximum distance between two points for them to be considered neighbors
B) It specifies the number of clusters to form
C) It controls the number of iterations the algorithm runs
D) It defines the number of outliers in the dataset

Answer: A) It defines the maximum distance between two points for them to be considered neighbors
Explanation: The epsilon (ε) parameter is the radius of each point’s neighborhood: two points are neighbors if the distance between them is at most ε. It is not a bound on distances within a cluster, because a cluster can chain together many overlapping neighborhoods and span far more than ε.
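
As an illustration of the ε-neighborhood, here is a minimal brute-force sketch. The function name `region_query`, the sample points, and the use of Euclidean distance are assumptions made for this example; DBSCAN itself works with any distance metric.

```python
import math

def region_query(points, index, eps):
    """Return the indices of all points within distance eps of points[index].

    The point itself is included, since its distance to itself is 0.
    """
    center = points[index]
    return [j for j, p in enumerate(points) if math.dist(center, p) <= eps]

# Hypothetical sample data: three points on a line and one far away.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0)]

neighbors_of_first = region_query(points, 0, eps=0.6)
neighbors_of_second = region_query(points, 1, eps=0.6)
print(neighbors_of_first)   # [0, 1]
print(neighbors_of_second)  # [0, 1, 2]
```

The first and third points are 1.0 apart, which is more than ε = 0.6, so they are not neighbors, yet both are neighbors of the middle point. This is how a cluster can extend beyond ε.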

4. What is the significance of the minPts parameter in DBSCAN?

A) It defines the minimum number of data points required to form a cluster
B) It defines the minimum number of clusters to be created
C) It specifies the number of outliers in the dataset
D) It specifies the number of features in each cluster

Answer: A) It defines the minimum number of data points required to form a cluster
Explanation: The minPts parameter is the minimum number of points that must lie within a point’s epsilon neighborhood for that point to count as a core point, that is, for the region around it to be dense enough to start or extend a cluster.

5. Which of the following points in DBSCAN is considered a “core point”?

A) A point with at least minPts points in its epsilon neighborhood
B) A point that is closer to the cluster centroid than others
C) A point that is located at the center of the cluster
D) A point that has no neighboring points within epsilon distance

Answer: A) A point with at least minPts points in its epsilon neighborhood
Explanation: A core point in DBSCAN is a point that has at least minPts points within its epsilon neighborhood, including itself.
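
The core-point test can be sketched in a few lines. The helper name `is_core_point` and the sample data are hypothetical, and the count includes the point itself, matching the convention stated in the explanation above.

```python
import math

def is_core_point(points, index, eps, min_pts):
    """True if points[index] has at least min_pts points (itself included) within eps."""
    center = points[index]
    neighbor_count = sum(1 for p in points if math.dist(center, p) <= eps)
    return neighbor_count >= min_pts

# Hypothetical sample data: four corners of a unit square plus one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (4.0, 4.0)]

core_flags = [is_core_point(points, i, eps=1.1, min_pts=3) for i in range(len(points))]
print(core_flags)  # [True, True, True, True, False]
```

Each corner has itself and its two adjacent corners within ε = 1.1 (the diagonal is about 1.41 away), which meets minPts = 3. The isolated point has only itself.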

6. In DBSCAN, what is a “border point”?

A) A point with fewer than minPts points in its epsilon neighborhood but is in the neighborhood of a core point
B) A point with no points in its epsilon neighborhood
C) A point that lies at the centroid of a cluster
D) A point that is considered an outlier

Answer: A) A point with fewer than minPts points in its epsilon neighborhood but is in the neighborhood of a core point
Explanation: A border point has fewer than minPts points in its epsilon neighborhood but is within the neighborhood of a core point, meaning it is on the edge of a cluster.

7. What is an “outlier” or “noise” point in DBSCAN?

A) A point that does not have enough points within its epsilon neighborhood to be considered a part of any cluster
B) A point that is located at the center of a cluster
C) A point that has the highest distance from the cluster centroid
D) A point that is always at the edge of a cluster

Answer: A) A point that does not have enough points within its epsilon neighborhood to be considered a part of any cluster
Explanation: An outlier or noise point in DBSCAN is neither a core point nor a border point: it has fewer than minPts points within its epsilon neighborhood and does not lie within the epsilon neighborhood of any core point, so it is not assigned to any cluster.
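
The three point types from the last three questions (core, border, noise) can be told apart with a short brute-force sketch. The function name `classify_points`, the string labels, and the sample data are assumptions chosen for illustration.

```python
import math

def classify_points(points, eps, min_pts):
    """Label every point as 'core', 'border', or 'noise'."""
    n = len(points)
    # Epsilon neighborhood of every point (each point is its own neighbor).
    neighborhoods = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    core = {i for i in range(n) if len(neighborhoods[i]) >= min_pts}

    kinds = []
    for i in range(n):
        if i in core:
            kinds.append("core")
        elif any(j in core for j in neighborhoods[i]):
            # Not dense itself, but within eps of a core point.
            kinds.append("border")
        else:
            kinds.append("noise")
    return kinds

# Hypothetical data: a unit square, one point hanging off its edge, one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (6.0, 6.0)]

kinds = classify_points(points, eps=1.1, min_pts=3)
print(kinds)  # ['core', 'core', 'core', 'core', 'border', 'noise']
```

The point (2.0, 1.0) has only two points in its neighborhood, so it is not a core point, but it lies within ε of the core point (1.0, 1.0) and is therefore a border point rather than noise.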

8. What is the main advantage of DBSCAN over K-means clustering?

A) DBSCAN requires the number of clusters to be specified in advance.
B) DBSCAN can detect clusters of arbitrary shape, while K-means can only detect spherical clusters.
C) DBSCAN always produces fewer clusters than K-means.
D) DBSCAN does not require distance metrics.

Answer: B) DBSCAN can detect clusters of arbitrary shape, while K-means can only detect spherical clusters.
Explanation: DBSCAN is a density-based algorithm that can detect clusters of arbitrary shape, whereas K-means assigns each point to its nearest centroid, which favors compact, roughly spherical clusters and requires the number of clusters to be chosen in advance.

9. Which of the following is a limitation of DBSCAN?

A) It cannot detect clusters of arbitrary shape.
B) It is sensitive to the choice of epsilon (ε) and minPts parameters.
C) It does not require a distance metric.
D) It always produces a fixed number of clusters.

Answer: B) It is sensitive to the choice of epsilon (ε) and minPts parameters.
Explanation: DBSCAN’s performance is highly sensitive to the choice of epsilon and minPts, and improper values for these parameters can lead to poor clustering results.

10. Which of the following data structures is typically used in DBSCAN to optimize the search for neighboring points?

A) KD-tree
B) Hash table
C) Decision tree
D) Adjacency matrix

Answer: A) KD-tree
Explanation: DBSCAN implementations often use spatial index structures such as KD-trees (or R*-trees) to find the points within epsilon of a query point efficiently, instead of comparing every pair of points.
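
To show how a KD-tree can prune the neighbor search, here is a minimal sketch of a tree build and a radius query. It is an illustrative toy, not the index used by any particular library; the names `build_kdtree` and `radius_query`, the tuple-based node layout, and the sample points are all assumptions.

```python
import math

def build_kdtree(points, indices=None, depth=0):
    """Recursively build a KD-tree; each node is (point_index, axis, left, right)."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % len(points[0])  # cycle through the coordinate axes
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return (
        indices[mid],
        axis,
        build_kdtree(points, indices[:mid], depth + 1),
        build_kdtree(points, indices[mid + 1:], depth + 1),
    )

def radius_query(node, points, center, eps, found=None):
    """Collect indices of all points within eps of center, skipping hopeless subtrees."""
    if found is None:
        found = []
    if node is None:
        return found
    index, axis, left, right = node
    if math.dist(points[index], center) <= eps:
        found.append(index)
    diff = center[axis] - points[index][axis]
    # The left subtree only holds coordinates <= the split value, the right
    # subtree only coordinates >= it, so a subtree is visited only if the
    # splitting plane is within eps on that side.
    if diff <= eps:
        radius_query(left, points, center, eps, found)
    if diff >= -eps:
        radius_query(right, points, center, eps, found)
    return found

# Hypothetical sample data.
points = [(2.0, 3.0), (5.0, 4.0), (9.0, 6.0), (4.0, 7.0), (8.0, 1.0), (7.0, 2.0)]
tree = build_kdtree(points)

neighbors = sorted(radius_query(tree, points, (7.0, 2.0), eps=2.5))
print(neighbors)  # [4, 5]
```

The result matches what a brute-force scan over all points would return; the tree only reduces how many points need to be examined.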

11. What happens if the epsilon (ε) parameter in DBSCAN is set too large?

A) More points will be classified as outliers.
B) The algorithm will not detect any clusters.
C) The algorithm will form fewer, larger clusters.
D) The algorithm will form more, smaller clusters.

Answer: C) The algorithm will form fewer, larger clusters.
Explanation: If epsilon is set too large, DBSCAN will include more points in each cluster, potentially merging distinct clusters into fewer, larger ones.

12. What happens if the epsilon (ε) parameter in DBSCAN is set too small?

A) More points will be classified as outliers.
B) The algorithm will merge all points into one cluster.
C) The algorithm will detect many small clusters.
D) The algorithm will not work properly.

Answer: A) More points will be classified as outliers.
Explanation: If epsilon is set too small, DBSCAN may not find enough neighboring points for clustering, resulting in more points being classified as outliers.
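
The effect of ε described in the last two questions can be seen with a compact brute-force DBSCAN sketch run at several ε values. This is a simplified illustration rather than a reference implementation; the function names, the use of -1 as the noise label, and the one-dimensional sample data are assumptions.

```python
import math

NOISE = -1  # assumed label for points that belong to no cluster

def dbscan(points, eps, min_pts):
    """Return one cluster label per point (0, 1, ...) or NOISE."""
    n = len(points)
    neighborhoods = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighborhoods[i]) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        frontier = list(neighborhoods[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighborhoods[j]) >= min_pts:
                frontier.extend(neighborhoods[j])  # core point: keep expanding
    return labels

def summarize(labels):
    """Return (number of clusters, number of noise points)."""
    n_clusters = len({label for label in labels if label != NOISE})
    return n_clusters, labels.count(NOISE)

# Hypothetical 1-D data: two groups of three points and one isolated point.
points = [(0.0,), (1.0,), (2.0,), (5.0,), (6.0,), (7.0,), (20.0,)]

results = {eps: summarize(dbscan(points, eps, min_pts=3)) for eps in (0.5, 1.2, 3.5)}
print(results)  # {0.5: (0, 7), 1.2: (2, 1), 3.5: (1, 1)}
```

With ε = 0.5 every point is noise, with ε = 1.2 the two groups are found and only the isolated point is noise, and with ε = 3.5 the two groups merge into a single cluster.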

13. How does DBSCAN handle noise or outliers in the data?

A) DBSCAN ignores noise points completely.
B) DBSCAN assigns noise points to the closest cluster.
C) DBSCAN labels noise points as “outliers” and does not assign them to any cluster.
D) DBSCAN treats noise points as a separate cluster.

Answer: C) DBSCAN labels noise points as “outliers” and does not assign them to any cluster.
Explanation: In DBSCAN, points that do not meet the density criteria for being part of any cluster are labeled as “noise” and not assigned to any cluster.

14. Which type of data is DBSCAN particularly well-suited for?

A) High-dimensional data
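B) Data with arbitrarily shaped clusters of similar density
C) Data with a fixed number of clusters
D) Data with categorical variables

Answer: B) Data with arbitrarily shaped clusters of similar density
Explanation: DBSCAN is effective for datasets where clusters have arbitrary shapes, as it does not assume spherical clusters like K-means. Because a single epsilon and minPts apply to the whole dataset, it works best when the clusters have similar densities; clusters with widely varying densities are a known weakness.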

15. What does the “minPts” parameter influence in DBSCAN?

A) The maximum number of clusters formed
B) The number of points required to form a cluster
C) The size of each cluster
D) The density of points in a cluster

Answer: B) The number of points required to form a cluster
Explanation: The minPts parameter specifies the minimum number of points that must fall within an epsilon neighborhood for the region to count as dense, and therefore for a cluster to form in DBSCAN.