1. What is the primary goal of clustering in machine learning?
a) To reduce the size of the dataset
b) To classify data into predefined categories
c) To group similar data points into clusters based on their attributes
d) To predict continuous values based on input features
Answer: c) To group similar data points into clusters based on their attributes
2. Which of the following is a popular algorithm for partitional clustering?
a) DBSCAN
b) K-means
c) Agglomerative hierarchical clustering
d) Gaussian Mixture Models (GMM)
Answer: b) K-means
3. In K-means clustering, how is the number of clusters (k) typically determined?
a) It is determined by the algorithm itself
b) Through trial and error or methods like the elbow method
c) By applying a decision tree
d) By using cross-validation
Answer: b) Through trial and error or methods like the elbow method
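For reference, the elbow method from question 3 can be sketched with scikit-learn; the synthetic data and the range of k values tried are illustrative choices, not part of the algorithm:

```python
# Sketch: choosing k with the elbow method. Inertia (within-cluster sum of
# squares) always drops as k grows; the "elbow" is the k after which the
# improvement flattens out.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

inertias = []
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

# Plotting k against inertia would show a sharp bend near the true k=4.
print([round(i) for i in inertias])
```

In practice the inertia curve is plotted and the bend is judged visually, which is why the question calls this "trial and error".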
4. Which of the following is a key step in the K-means algorithm?
a) Assigning data points to the closest cluster centroid
b) Calculating the hierarchical tree structure of the data
c) Finding the global minimum of a cost function
d) Calculating the probability distribution of data points
Answer: a) Assigning data points to the closest cluster centroid
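The assignment step named in question 4 can be written in a few lines of plain NumPy (the toy points and two fixed centroids below are illustrative):

```python
# Sketch: one K-means iteration = assignment step + update step.
import numpy as np

points = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])

# Assignment: distance from every point to every centroid, pick the nearest.
dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
labels = dists.argmin(axis=1)
print(labels)  # -> [0 0 1 1]

# Update: each centroid moves to the mean of its assigned points.
new_centroids = np.array([points[labels == j].mean(axis=0) for j in range(2)])
```

K-means simply alternates these two steps until the assignments stop changing.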
5. What does the Silhouette score measure in clustering?
a) The accuracy of clustering
b) The compactness and separation of clusters
c) The number of clusters in the dataset
d) The entropy of the clusters
Answer: b) The compactness and separation of clusters
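A quick silhouette computation, as in question 5, is available in scikit-learn; the blob parameters below are arbitrary illustration:

```python
# Sketch: silhouette score ranges from -1 to 1; higher values mean clusters
# are compact internally and well separated from each other.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

score = silhouette_score(X, labels)
print(round(score, 3))
```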
6. Which of the following is a density-based clustering algorithm?
a) K-means
b) DBSCAN
c) Gaussian Mixture Models (GMM)
d) Agglomerative hierarchical clustering
Answer: b) DBSCAN
7. In DBSCAN, what does epsilon (ε) represent?
a) The maximum number of points in a cluster
b) The distance threshold for determining whether points are neighbors
c) The minimum number of clusters
d) The average distance between points in a cluster
Answer: b) The distance threshold for determining whether points are neighbors
8. What is the main advantage of hierarchical clustering over K-means?
a) It does not require the number of clusters to be specified in advance
b) It is computationally more efficient
c) It works well for high-dimensional data
d) It can handle missing values more effectively
Answer: a) It does not require the number of clusters to be specified in advance
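The advantage from question 8 can be seen with SciPy's hierarchical-clustering API: the full merge tree is built once, and any number of clusters can be read off afterwards by cutting it (the dataset and linkage method here are illustrative):

```python
# Sketch: agglomerative clustering builds a complete dendrogram without a
# preset k; the cluster count is chosen when the tree is cut, not when fit.
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=60, centers=3, random_state=42)

Z = linkage(X, method="ward")                      # full hierarchy, no k
labels_2 = fcluster(Z, t=2, criterion="maxclust")  # cut into 2 clusters
labels_3 = fcluster(Z, t=3, criterion="maxclust")  # or 3, from the same tree
```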
9. How does agglomerative hierarchical clustering build its hierarchy?
a) Top-down, by recursively splitting one cluster (divisive)
b) By density-reachability between neighboring points
c) By iteratively reassigning points to fixed centroids
d) Bottom-up, by repeatedly merging the two closest clusters
Answer: d) Bottom-up, by repeatedly merging the two closest clusters
10. What is the main difference between K-means and K-medoids clustering?
a) K-means uses centroids as cluster centers, while K-medoids uses actual data points
b) K-means works only with numerical data, while K-medoids can handle categorical data
c) K-means is a density-based algorithm, while K-medoids is partitional
d) K-means is used for hierarchical clustering, while K-medoids is for partitional clustering
Answer: a) K-means uses centroids as cluster centers, while K-medoids uses actual data points
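The distinction in question 10 is easy to show in NumPy with a toy set containing an outlier (the points are illustrative):

```python
# Sketch: a centroid is the mean and need not be a data point; a medoid is
# the actual data point with the smallest total distance to all others.
import numpy as np

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])

centroid = X.mean(axis=0)  # (2.75, 2.75) -- not a member of X

pairwise = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
medoid = X[pairwise.sum(axis=1).argmin()]  # a real data point
print(centroid, medoid)
```

Note how the outlier drags the centroid far from the dense group, while the medoid stays on an actual point; this robustness is a common reason to prefer K-medoids.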
11. Which of the following is a limitation of the K-means clustering algorithm?
a) It is sensitive to the initial placement of centroids
b) It can handle non-linear boundaries between clusters
c) It is efficient for high-dimensional data
d) It does not require the number of clusters to be predefined
Answer: a) It is sensitive to the initial placement of centroids
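The sensitivity named in question 11 is why scikit-learn defaults to k-means++ seeding with multiple restarts; a sketch (seeds and data are arbitrary):

```python
# Sketch: a single random init can land in a poor local optimum; k-means++
# seeding plus n_init restarts keeps the best (lowest-inertia) solution.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=7)

single = KMeans(n_clusters=4, init="random", n_init=1, random_state=3).fit(X)
multi = KMeans(n_clusters=4, init="k-means++", n_init=10, random_state=3).fit(X)

# The best-of-10 run matches or beats the single arbitrary init.
print(single.inertia_, multi.inertia_)
```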
12. What does the elbow method help determine in clustering?
a) The best clustering algorithm to use
b) The minimum number of clusters
c) The best number of clusters (k) for the dataset
d) The distance between centroids
Answer: c) The best number of clusters (k) for the dataset
13. What does the DBSCAN algorithm require as input?
a) The number of clusters
b) A neighborhood radius (epsilon) and a minimum number of points (minPts)
c) A list of class labels for the data points
d) A predefined hierarchical tree
Answer: b) A neighborhood radius (epsilon) and a minimum number of points (minPts)
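In scikit-learn the two DBSCAN inputs from question 13 are the `eps` and `min_samples` parameters; the toy points below are illustrative:

```python
# Sketch: DBSCAN takes eps (neighborhood radius) and min_samples (minimum
# points to form a dense core). No cluster count is supplied, and points in
# no dense region are labeled -1 (noise).
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[0, 0], [0.1, 0], [0, 0.1],   # dense group A
              [5, 5], [5.1, 5], [5, 5.1],   # dense group B
              [20, 20]])                    # isolated point

labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)
print(labels)  # two clusters plus one -1 noise label
```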
14. In Gaussian Mixture Models (GMM), how are clusters represented?
a) As overlapping groups of data points with probabilistic membership
b) As rigid boundaries around each data point
c) By assigning a cluster label to each data point
d) As hierarchical trees of data points
Answer: a) As overlapping groups of data points with probabilistic membership
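The probabilistic membership in question 14 is exposed in scikit-learn through `predict_proba`; the data here is an arbitrary two-blob example:

```python
# Sketch: a GMM gives each point a probability of belonging to each
# component; the rows of predict_proba sum to 1 (soft assignment).
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=200, centers=2, random_state=42)
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

probs = gmm.predict_proba(X)   # shape (200, 2), one column per component
hard = probs.argmax(axis=1)    # a hard label, if one is needed
print(probs[0].round(3))
```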
15. Which of the following is a weakness of K-means clustering?
a) It assumes clusters are spherical and of similar size
b) It can handle non-convex clusters well
c) It requires fewer computational resources than DBSCAN
d) It works well on categorical data
Answer: a) It assumes clusters are spherical and of similar size
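The weakness in question 15 is the basis of the classic "two moons" demonstration: K-means, assuming spherical clusters, splits the crescents incorrectly, while a density-based method recovers them. A sketch (the eps and noise values follow the common scikit-learn demo settings):

```python
# Sketch: K-means vs DBSCAN on non-spherical clusters, compared with the
# adjusted Rand index (1.0 = perfect agreement with the true grouping).
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

X, y = make_moons(n_samples=300, noise=0.05, random_state=42)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
db = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

ari_km = adjusted_rand_score(y, km)
ari_db = adjusted_rand_score(y, db)
print(ari_km, ari_db)
```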