Clustering MCQs

1. What is the primary goal of clustering in machine learning?

a) To reduce the size of the dataset
b) To classify data into predefined categories
c) To group similar data points into clusters based on their attributes
d) To predict continuous values based on input features

Answer: c) To group similar data points into clusters based on their attributes
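
As a concrete illustration (the toy data, library choice, and parameters below are my own assumptions, not part of the quiz), scikit-learn's KMeans groups nearby points without any predefined categories:

```python
# Hypothetical example: grouping unlabeled 2-D points with scikit-learn.
import numpy as np
from sklearn.cluster import KMeans

# Two visually obvious groups of points (toy data).
X = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
              [8.0, 8.0], [8.1, 7.9], [7.9, 8.2]])

# fit_predict returns a cluster index per point -- no labels were given.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)
```

Note that, unlike classification, the cluster indices carry no predefined meaning; they only say which points were grouped together.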


2. Which of the following is a popular algorithm for partitional clustering?

a) DBSCAN
b) K-means
c) Agglomerative hierarchical clustering
d) Gaussian Mixture Models (GMM)

Answer: b) K-means


3. In K-means clustering, how is the number of clusters (k) typically determined?

a) It is determined by the algorithm itself
b) Through trial and error or methods like the elbow method
c) By applying a decision tree
d) By using cross-validation

Answer: b) Through trial and error or methods like the elbow method


4. Which of the following is a key step in the K-means algorithm?

a) Assigning data points to the closest cluster centroid
b) Calculating the hierarchical tree structure of the data
c) Finding the global minimum of a cost function
d) Calculating the probability distribution of data points

Answer: a) Assigning data points to the closest cluster centroid
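
The assignment step can be sketched in a few lines of NumPy (an illustrative re-implementation with toy data, not a production algorithm):

```python
import numpy as np

def assign_to_nearest(X, centroids):
    """K-means assignment step: index of the closest centroid for each point."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])

labels = assign_to_nearest(X, centroids)
print(labels)  # → [0 0 1 1]

# The other key step (the update) recomputes each centroid as a mean:
new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(2)])
```

K-means alternates these two steps until the assignments stop changing; this converges, but only to a local (not global) minimum of the cost.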


5. What does the Silhouette score measure in clustering?

a) The accuracy of clustering
b) The compactness and separation of clusters
c) The number of clusters in the dataset
d) The entropy of the clusters

Answer: b) The compactness and separation of clusters
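
A short sketch with scikit-learn (the toy data and labels are illustrative assumptions):

```python
import numpy as np
from sklearn.metrics import silhouette_score

# Two compact, well-separated groups (toy data).
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]], dtype=float)
labels = [0, 0, 0, 1, 1, 1]

# Score lies in [-1, 1]: near 1 = tight clusters far from each other,
# near 0 = overlapping clusters, negative = likely misassigned points.
score = silhouette_score(X, labels)
print(round(score, 3))
```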


6. Which of the following is a density-based clustering algorithm?

a) K-means
b) DBSCAN
c) Gaussian Mixture Models (GMM)
d) Agglomerative hierarchical clustering

Answer: b) DBSCAN


7. In DBSCAN, what does epsilon (ε) represent?

a) The maximum number of points in a cluster
b) The distance threshold for determining whether points are neighbors
c) The minimum number of clusters
d) The average distance between points in a cluster

Answer: b) The distance threshold for determining whether points are neighbors
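
A minimal sketch (toy data; the eps and min_samples values are illustrative choices, not recommendations):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense groups plus one far-away outlier (toy data).
X = np.array([[0.0, 0.0], [0.3, 0.0], [0.0, 0.3],
              [5.0, 5.0], [5.3, 5.0], [5.0, 5.3],
              [20.0, 20.0]])

# eps is the neighborhood radius: points within eps of each other are neighbors.
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)
print(labels)  # the isolated point is labeled -1 (noise)
```

Shrinking eps fragments clusters and marks more points as noise; growing it merges clusters, so eps directly controls what counts as "dense".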


8. What is the main advantage of hierarchical clustering over K-means?

a) It does not require the number of clusters to be specified in advance
b) It is computationally more efficient
c) It works well for high-dimensional data
d) It can handle missing values more effectively

Answer: a) It does not require the number of clusters to be specified in advance
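
A SciPy sketch (toy data; the linkage method is an illustrative choice) shows that the full merge tree is built first, and a number of clusters is only chosen afterwards by cutting that tree:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6]], dtype=float)

# Build the complete merge tree -- no k is required at this stage.
Z = linkage(X, method='average')

# k enters only now, by cutting the tree into a chosen number of clusters.
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)
```

The same tree can be re-cut at a different level for a different k without re-running the clustering.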


9. How does agglomerative hierarchical clustering build its hierarchy?

a) By partitioning the data into a fixed number of groups
b) By growing clusters from dense regions
c) Top-down, by recursively splitting one cluster (divisive)
d) Bottom-up, by repeatedly merging the closest clusters

Answer: d) Bottom-up, by repeatedly merging the closest clusters


10. What is the main difference between K-means and K-medoids clustering?

a) K-means uses centroids as cluster centers, while K-medoids uses actual data points
b) K-means works only with numerical data, while K-medoids can handle categorical data
c) K-means is a density-based algorithm, while K-medoids is partitional
d) K-means is used for hierarchical clustering, while K-medoids is for partitional clustering

Answer: a) K-means uses centroids as cluster centers, while K-medoids uses actual data points
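
The distinction can be shown on a single group of points (a hedged sketch; the `medoid` helper below is my own illustration, not a library function):

```python
import numpy as np

def medoid(points):
    """Return the actual data point minimizing total distance to the others."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    return points[d.sum(axis=1).argmin()]

cluster = np.array([[0.0, 0.0], [0.0, 2.0], [2.0, 0.0], [10.0, 10.0]])

print(cluster.mean(axis=0))  # K-means center: the mean, not necessarily a real point
print(medoid(cluster))       # K-medoids center: always one of the data points
```

Because the medoid must be a real observation, K-medoids is less distorted by outliers like the point at (10, 10), which drags the mean but not the medoid.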


11. Which of the following is a limitation of the K-means clustering algorithm?

a) It is sensitive to the initial placement of centroids
b) It can handle non-linear boundaries between clusters
c) It is efficient for high-dimensional data
d) It does not require the number of clusters to be predefined

Answer: a) It is sensitive to the initial placement of centroids
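
Two standard mitigations for this sensitivity are smarter seeding and restarts, both exposed by scikit-learn (the toy data below is an illustrative assumption):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]], dtype=float)

# 'k-means++' spreads the initial centroids apart, and n_init reruns the
# algorithm from several seedings, keeping the run with the lowest inertia.
km = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=0).fit(X)
print(km.inertia_)  # within-cluster sum of squared distances of the best run
```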


12. What does the elbow method help determine in clustering?

a) The best clustering algorithm to use
b) The minimum number of clusters
c) The best number of clusters (k) for the dataset
d) The distance between centroids

Answer: c) The best number of clusters (k) for the dataset
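
The method can be sketched as follows (synthetic blobs and parameter choices are my own assumptions): fit K-means for a range of k values and look at where the inertia curve bends.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three well-separated blobs (synthetic data), so the true k is 3.
X = np.vstack([rng.normal(c, 0.3, size=(30, 2))
               for c in [(0, 0), (5, 5), (10, 0)]])

inertias = []
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

# Inertia always decreases as k grows; the "elbow" is where the
# improvement levels off -- here, around k = 3.
for k, inertia in zip(range(1, 7), inertias):
    print(k, round(inertia, 1))
```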


13. What does the DBSCAN algorithm require as input?

a) The number of clusters
b) A neighborhood radius (epsilon, ε) and a minimum number of points (minPts)
c) A list of class labels for the data points
d) A predefined hierarchical tree

Answer: b) A neighborhood radius (epsilon, ε) and a minimum number of points (minPts)


14. In Gaussian Mixture Models (GMM), how are clusters represented?

a) As overlapping groups of data points with probabilistic membership
b) As rigid boundaries around each data point
c) By assigning a cluster label to each data point
d) As hierarchical trees of data points

Answer: a) As overlapping groups of data points with probabilistic membership
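
This soft, probabilistic membership is visible in scikit-learn's `predict_proba` (synthetic data below is an illustrative assumption):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two Gaussian blobs (synthetic data).
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(6.0, 1.0, size=(50, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# Soft membership: each row gives the probability of belonging to each component.
probs = gmm.predict_proba(X)
print(probs[0].round(3))  # the probabilities in a row sum to 1
```

Points near a boundary get intermediate probabilities rather than a single hard label, which is the key contrast with K-means.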


15. Which of the following is a weakness of K-means clustering?

a) It assumes clusters are spherical and of similar size
b) It can handle non-convex clusters well
c) It requires fewer computational resources than DBSCAN
d) It works well on categorical data

Answer: a) It assumes clusters are spherical and of similar size
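
This weakness can be demonstrated on non-convex shapes (a sketch; the dataset and parameter values are illustrative choices): K-means cuts the two crescents of `make_moons` with a roughly straight boundary, while a density-based method follows their shape.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, DBSCAN

# Two interleaving crescents -- non-spherical, non-convex clusters.
X, true_labels = make_moons(n_samples=200, noise=0.05, random_state=0)

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
db_labels = DBSCAN(eps=0.25, min_samples=5).fit_predict(X)

# Agreement with the true crescents, allowing for label swapping.
km_acc = max(np.mean(km_labels == true_labels), np.mean(km_labels != true_labels))
db_acc = max(np.mean(db_labels == true_labels), np.mean(db_labels != true_labels))
print(round(km_acc, 2), round(db_acc, 2))
```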
