Cluster evaluation MCQs

1. What is the main goal of cluster evaluation?

  • A) To find the optimal number of clusters
  • B) To determine the quality of the clusters formed by a clustering algorithm
  • C) To visualize the clusters in a 2D or 3D space
  • D) To compute the distance between data points

Answer: B) To determine the quality of the clusters formed by a clustering algorithm
Explanation: The purpose of cluster evaluation is to assess the quality of the clusters produced by a clustering algorithm, considering factors like cohesion and separation.


2. Which of the following is an internal evaluation metric for clustering?

  • A) Rand Index
  • B) Silhouette Score
  • C) Fowlkes-Mallows Index
  • D) V-Measure

Answer: B) Silhouette Score
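Explanation: Internal evaluation metrics like the Silhouette Score measure the quality of a clustering using only the data itself (i.e., how similar data points are within a cluster and how distinct the clusters are), without any ground-truth labels. The Rand Index, Fowlkes-Mallows Index, and V-Measure are external metrics that require true labels.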


3. Which of the following is a commonly used external evaluation metric for clustering?

  • A) Davies-Bouldin Index
  • B) Silhouette Score
  • C) Rand Index
  • D) Within-Cluster Sum of Squares (WCSS)

Answer: C) Rand Index
Explanation: The Rand Index is an external evaluation metric that compares the clustering results with a ground truth (true labels), measuring the agreement between the clustering and the true class labels.
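
As a minimal sketch of the pair-counting idea behind the Rand Index, the snippet below counts the point pairs on which the true labels and the clustering agree (both put the pair together, or both keep it apart). The function name rand_index and the toy labels are illustrative assumptions, not part of the quiz; in practice a library routine such as scikit-learn's rand_score would normally be used.

```python
from itertools import combinations


def rand_index(true_labels, pred_labels):
    """Fraction of point pairs on which the two labelings agree."""
    agree = 0
    total = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        # A pair counts as an agreement when both labelings group it
        # together, or both labelings separate it.
        agree += same_true == same_pred
        total += 1
    return agree / total


true_labels = [0, 0, 0, 1, 1, 1]
pred_labels = [1, 1, 0, 0, 0, 0]  # cluster ids are arbitrary names

ri = rand_index(true_labels, pred_labels)
print(round(ri, 4))  # 10 of the 15 pairs agree
```

Only pair membership matters, so renaming the cluster ids leaves the score unchanged.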


4. What does the Davies-Bouldin Index measure in clustering evaluation?

  • A) The average distance between clusters
  • B) The ratio of the average distance within clusters to the distance between clusters
  • C) The compactness of the clusters
  • D) The proportion of points that belong to the same cluster

Answer: B) The ratio of the average distance within clusters to the distance between clusters
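Explanation: The Davies-Bouldin Index combines compactness and separation. For each cluster it takes the worst-case ratio of within-cluster scatter to the distance between centroids against every other cluster, then averages these ratios over all clusters. A lower value indicates better clustering, with clusters being well-separated and compact; the minimum possible value is 0.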
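
The ratio described above can be sketched directly from its definition. This is an illustrative pure-Python version under the usual assumptions (Euclidean distance, scatter measured as the mean distance to the centroid); the function names and the toy points are hypothetical, and scikit-learn's davies_bouldin_score is the tested implementation to prefer in real work.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(coord) / len(points) for coord in zip(*points)]


def davies_bouldin(points, labels):
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    centers = {k: centroid(v) for k, v in clusters.items()}
    # Scatter: mean distance from each member to its cluster centroid.
    scatter = {
        k: sum(math.dist(p, centers[k]) for p in v) / len(v)
        for k, v in clusters.items()
    }
    worst_ratios = []
    for i in clusters:
        # Worst (largest) scatter-to-separation ratio against any other cluster.
        worst_ratios.append(max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in clusters if j != i
        ))
    return sum(worst_ratios) / len(worst_ratios)


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
db_good = davies_bouldin(points, [0, 0, 1, 1])  # groups nearby points
db_bad = davies_bouldin(points, [0, 1, 0, 1])   # groups distant points
print(db_good, db_bad)
```

The labeling that groups nearby points gets the lower (better) value.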


5. Which of the following is true about the Silhouette Score?

  • A) A high silhouette score indicates poor clustering.
  • B) The silhouette score ranges from -1 to +1, where a higher score indicates better clustering.
  • C) The silhouette score only works for hierarchical clustering.
  • D) A silhouette score of 0 indicates perfect clustering.

Answer: B) The silhouette score ranges from -1 to +1, where a higher score indicates better clustering.
Explanation: The Silhouette Score ranges from -1 to +1. A score closer to +1 indicates well-defined clusters, a score near 0 suggests overlapping clusters, and a negative score suggests that points may have been assigned to the wrong cluster.
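
To make the range concrete, here is a small sketch that computes the mean silhouette from its definition: for each point, a is the mean distance to the other members of its own cluster, b is the lowest mean distance to the members of any other cluster, and the point's score is (b - a) / max(a, b). The function name and the one-dimensional toy data are assumptions for illustration, and the sketch assumes at least two clusters and Euclidean distance; scikit-learn's silhouette_score is the standard implementation.

```python
import math


def mean_silhouette(points, labels):
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for single-point clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself contributes 0 to the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for k, other in clusters.items() if k != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


points = [(0,), (1,), (10,), (11,)]
s_good = mean_silhouette(points, [0, 0, 1, 1])  # close to +1
s_bad = mean_silhouette(points, [0, 1, 0, 1])   # negative
print(round(s_good, 3), round(s_bad, 3))
```

Grouping the two nearby pairs gives a score near +1, while pairing each point with a distant one gives a negative score.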


6. The Within-Cluster Sum of Squares (WCSS) is used to evaluate which of the following?

  • A) The number of clusters
  • B) The compactness of the clusters
  • C) The separation between clusters
  • D) The average distance between points in different clusters

Answer: B) The compactness of the clusters
Explanation: WCSS is the sum of squared distances between each point and the centroid of its cluster, so it measures the compactness of the clusters. Lower WCSS indicates more compact clusters.
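
The definition translates into a few lines of code. The sketch below is illustrative only (the function name wcss and the toy points are assumptions); with scikit-learn's KMeans the same quantity is exposed as the fitted model's inertia_ attribute.

```python
def wcss(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        center = [sum(coord) / len(members) for coord in zip(*members)]
        for p in members:
            total += sum((x - c) ** 2 for x, c in zip(p, center))
    return total


points = [(1, 1), (1, 3), (8, 8), (10, 8)]
two_clusters = wcss(points, [0, 0, 1, 1])
one_cluster = wcss(points, [0, 0, 0, 0])
print(two_clusters, one_cluster)  # 4.0 104.0
```

WCSS can only stay the same or decrease as clusters are added, which is why it is usually read through an elbow plot and not minimized outright.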


7. What is the main advantage of external evaluation metrics over internal evaluation metrics in clustering?

  • A) They do not require a ground truth.
  • B) They assess the clustering based on a true class label.
  • C) They are less computationally expensive.
  • D) They work better for high-dimensional data.

Answer: B) They assess the clustering based on a true class label.
Explanation: External evaluation metrics use ground truth or true labels to compare the clustering results, providing an objective measure of clustering performance.


8. What does the V-Measure evaluate in clustering?

  • A) The compactness of clusters
  • B) The agreement between the clustering and the true labels
  • C) The similarity of clusters based on their centroid
  • D) The amount of overlap between clusters

Answer: B) The agreement between the clustering and the true labels
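Explanation: The V-Measure is an external evaluation metric that measures how well the clustering corresponds to a known set of true labels. It is the harmonic mean of homogeneity (each cluster contains only members of one class) and completeness (all members of a class fall in the same cluster), and ranges from 0 to 1.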
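
The balance between homogeneity and completeness can be sketched with entropies of the label counts. This is an illustrative version with equal weighting of the two terms; the function names and toy labels are assumptions, and scikit-learn's homogeneity_completeness_v_measure is the implementation to rely on in practice.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given), estimated from label co-occurrence counts."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum(
        (c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items()
    )


def v_measure(true_labels, pred_labels):
    h_true = entropy(true_labels)
    h_pred = entropy(pred_labels)
    # Homogeneity: how little class uncertainty remains inside each cluster.
    homogeneity = 1.0 if h_true == 0 else (
        1.0 - conditional_entropy(true_labels, pred_labels) / h_true)
    # Completeness: how little cluster uncertainty remains inside each class.
    completeness = 1.0 if h_pred == 0 else (
        1.0 - conditional_entropy(pred_labels, true_labels) / h_pred)
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = 2 * homogeneity * completeness / (homogeneity + completeness)
    return homogeneity, completeness, v


# Every point in its own cluster: perfectly homogeneous, but incomplete.
h, c, v = v_measure([0, 0, 1, 1], [0, 1, 2, 3])
print(round(h, 3), round(c, 3), round(v, 3))
```

Splitting every point into its own cluster keeps homogeneity at 1 but halves completeness here, so the V-Measure falls well below 1.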


9. What is cluster cohesion?

  • A) The distance between different clusters
  • B) The distance between the centroids of clusters
  • C) The tightness or compactness of points within the same cluster
  • D) The number of clusters formed

Answer: C) The tightness or compactness of points within the same cluster
Explanation: Cluster cohesion measures how close the points within a cluster are to each other. High cohesion means the points in a cluster are close together.


10. What does a high value of the Silhouette Score indicate about the clustering?

  • A) The clusters are not well-separated.
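  • B) The points are well matched to their own clusters and poorly matched to neighboring clusters.
  • C) There are many outliers in the dataset.
  • D) The clustering has many overlaps between clusters.

Answer: B) The points are well matched to their own clusters and poorly matched to neighboring clusters.
Explanation: A high silhouette score indicates that points lie close to the other members of their own cluster and far from points in other clusters. Because the score uses no ground truth, it shows that the clusters are compact and well separated, not that they match any true classes.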


11. Which of the following clustering evaluation metrics is typically used to compare the clustering result with a ground truth classification?

  • A) Silhouette Score
  • B) Rand Index
  • C) Davies-Bouldin Index
  • D) Within-Cluster Sum of Squares (WCSS)

Answer: B) Rand Index
Explanation: The Rand Index compares the clustering results with the true labels to measure how well the clustering matches the ground truth.


12. When evaluating clustering results, what does homogeneity refer to?

  • A) The similarity of clusters to each other
  • B) The degree to which points in the same cluster share the same true label
  • C) The number of points in each cluster
  • D) The size of the clusters

Answer: B) The degree to which points in the same cluster share the same true label
Explanation: Homogeneity measures whether each cluster contains only points with a single true label. It is highest when every cluster is pure, regardless of whether a class is split across several clusters.


13. What does the adjusted Rand index (ARI) correct for?

  • A) The number of points in each cluster
  • B) The random chance of clustering assignments
  • C) The distance between cluster centroids
  • D) The maximum number of clusters possible

Answer: B) The random chance of clustering assignments
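Explanation: The adjusted Rand index (ARI) corrects the Rand Index for chance agreement. After the adjustment, random labelings score close to 0, a perfect match scores 1, and agreement worse than chance gives a negative value, which makes scores comparable across different numbers of clusters.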
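
As a sketch of the chance correction, the snippet below computes the ARI from pair counts: the number of pairs grouped together in both labelings, minus the number expected by chance, divided by the maximum possible excess over chance. The function name and toy labels are illustrative assumptions, and returning 1.0 in the degenerate case is a convention chosen for this sketch; scikit-learn's adjusted_rand_score is the reference implementation.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(true_labels, pred_labels):
    n = len(true_labels)
    # Pairs placed together in both labelings, in the true labels,
    # and in the predicted clustering, respectively.
    pairs_both = sum(
        comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    pairs_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = pairs_true * pairs_pred / comb(n, 2)  # agreement due to chance
    max_index = (pairs_true + pairs_pred) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings use a single cluster.
        return 1.0
    return (pairs_both - expected) / (max_index - expected)


ari_partial = adjusted_rand_index([0, 0, 0, 1, 1, 1], [1, 1, 0, 0, 0, 0])
ari_worse = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(round(ari_partial, 3), round(ari_worse, 3))
```

The second labeling separates every pair that the true labels group together, so its score is negative, something the unadjusted Rand Index cannot express.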


14. Which evaluation metric scores a clustering by the ratio of between-cluster dispersion to within-cluster dispersion?

  • A) Silhouette Score
  • B) Adjusted Rand Index
  • C) Calinski-Harabasz Index
  • D) Davies-Bouldin Index

Answer: C) Calinski-Harabasz Index
Explanation: The Calinski-Harabasz Index (Variance Ratio Criterion) evaluates clustering by comparing the dispersion between clusters and the dispersion within clusters, with a higher value indicating better-defined clusters.
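
The dispersion ratio can be sketched as follows, using squared Euclidean distances and the usual degrees-of-freedom scaling (k - 1 for the between-cluster term, n - k for the within-cluster term). The function names and toy points are assumptions for illustration, and the sketch assumes at least two clusters and nonzero within-cluster dispersion; scikit-learn's calinski_harabasz_score is the tested implementation.

```python
def mean_point(points):
    return [sum(coord) / len(points) for coord in zip(*points)]


def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def calinski_harabasz(points, labels):
    n = len(points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    overall = mean_point(points)
    between = 0.0  # size-weighted spread of the centroids around the overall mean
    within = 0.0   # spread of the points around their own centroid
    for members in clusters.values():
        center = mean_point(members)
        between += len(members) * squared_distance(center, overall)
        within += sum(squared_distance(p, center) for p in members)
    return (between / (k - 1)) / (within / (n - k))


points = [(1, 1), (1, 3), (8, 8), (10, 8)]
ch_good = calinski_harabasz(points, [0, 0, 1, 1])
ch_bad = calinski_harabasz(points, [0, 1, 0, 1])
print(ch_good, round(ch_bad, 3))
```

The labeling that groups nearby points scores far higher. Because the index is built from centroid-based dispersion, it tends to favor convex, roughly spherical clusters.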


15. If the Silhouette Score is close to 0, what does this indicate?

  • A) The clustering algorithm performed perfectly.
  • B) The clusters are well-separated and compact.
  • C) The clustering is poor, with points on the border of multiple clusters.
  • D) The number of clusters should be increased.

Answer: C) The clustering is poor, with points on the border of multiple clusters.
Explanation: A silhouette score close to 0 indicates that the points are not well grouped into distinct clusters, and they may be on the boundary between clusters.
