1. Which of the following is true about the k-Nearest Neighbors (k-NN) algorithm?
- A) It is a supervised learning algorithm.
- B) It requires a training phase.
- C) It only works with numerical data.
- D) It is a generative model.
Answer: A) It is a supervised learning algorithm.
Explanation: k-NN is a supervised learning algorithm used for both classification and regression tasks.
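As a minimal sketch of this point (assuming scikit-learn is available), both the classifier and regressor variants are fit on labeled data:

```python
# A minimal sketch showing that k-NN is supervised: it learns from
# labeled examples, for both classification and regression.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X, y = load_iris(return_X_y=True)          # labeled data: features X, labels y
clf = KNeighborsClassifier(n_neighbors=5)  # classification variant
clf.fit(X, y)                              # "fit" simply stores the training set
print(clf.predict(X[:3]))                  # predicts class labels

reg = KNeighborsRegressor(n_neighbors=5)   # regression variant
reg.fit(X[:, :3], X[:, 3])                 # predict the 4th feature from the first 3
print(reg.predict(X[:3, :3]))              # predicts continuous values
```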
2. What does the “k” in k-Nearest Neighbors represent?
- A) The number of nearest neighbors to consider for classification or regression.
- B) The number of features in the dataset.
- C) The number of classes in the target variable.
- D) The number of dimensions in the feature space.
Answer: A) The number of nearest neighbors to consider for classification or regression.
Explanation: “k” represents the number of neighbors the algorithm uses to make a decision for classifying or predicting a data point.
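To make the role of “k” concrete, here is a hypothetical one-feature example (toy data, chosen for illustration) where the prediction flips between k=1 and k=3:

```python
# The choice of k alone can change the prediction for the same query point.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0.0], [1.0], [1.1], [1.2]])  # one class-0 point sits near the query
y = np.array([0, 1, 1, 0])
query = [[1.25]]

for k in (1, 3):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print(k, clf.predict(query))  # k=1 -> class 0 (nearest point); k=3 -> class 1 (majority)
```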
3. Which of the following distance metrics is most commonly used in k-NN?
- A) Manhattan distance
- B) Euclidean distance
- C) Hamming distance
- D) Cosine similarity
Answer: B) Euclidean distance
Explanation: Euclidean distance is the metric most commonly used in k-NN to measure the distance between points in the feature space.
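A quick worked comparison of Euclidean and Manhattan distance in NumPy (values chosen so the results are easy to check by hand):

```python
# The two most common k-NN distance metrics, computed directly.
import numpy as np

a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))  # sqrt(9 + 16) = 5.0
manhattan = np.sum(np.abs(a - b))          # 3 + 4 = 7.0
print(euclidean, manhattan)
```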
4. How does k-NN handle classification tasks?
- A) By fitting a decision boundary between classes
- B) By assigning the most common class among the k-nearest neighbors
- C) By minimizing the loss function
- D) By maximizing the likelihood of the features given the class
Answer: B) By assigning the most common class among the k-nearest neighbors
Explanation: In classification, k-NN assigns the class label based on the majority vote of the k-nearest neighbors.
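A from-scratch sketch of this majority-vote rule (the function name and toy data are illustrative, not a library API):

```python
# k-NN classification by majority vote, written out explicitly.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    # Euclidean distance from the query x to every training point
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]        # indices of the k closest points
    votes = Counter(y_train[nearest])      # count class labels among them
    return votes.most_common(1)[0][0]      # the most common class wins

X_train = np.array([[0, 0], [0, 1], [5, 5], [5, 6]])
y_train = np.array(["a", "a", "b", "b"])
print(knn_predict(X_train, y_train, np.array([4, 5])))  # -> "b"
```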
5. What happens when the value of “k” is too small in k-NN?
- A) The model may overfit to the training data.
- B) The model may underfit the data.
- C) The model becomes less sensitive to noise.
- D) The model performs better with new data.
Answer: A) The model may overfit to the training data.
Explanation: A small “k” can cause the model to be sensitive to noise, resulting in overfitting and poor generalization.
6. What happens when the value of “k” is too large in k-NN?
- A) The model may overfit to the training data.
- B) The model may underfit the data.
- C) The model becomes highly sensitive to outliers.
- D) The model works better with a smaller dataset.
Answer: B) The model may underfit the data.
Explanation: A large “k” averages over many neighbors, which smooths out local structure in the data and can lead to underfitting.
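The following sketch illustrates both effects from questions 5 and 6 at once, on a synthetic dataset (an assumed setup, not a benchmark): k=1 typically scores near 1.0 on the training set but worse on the test set, while a very large k drags both scores down.

```python
# How k trades off overfitting and underfitting: train vs. test accuracy.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for k in (1, 5, 25, 150):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    # small k: near-perfect train score, weaker test score (overfit);
    # very large k: both scores drop as the decision rule oversmooths (underfit)
    print(f"k={k:3d}  train={clf.score(X_tr, y_tr):.2f}  test={clf.score(X_te, y_te):.2f}")
```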
7. Which of the following is a disadvantage of the k-NN algorithm?
- A) It is computationally expensive at prediction time.
- B) It performs poorly with high-dimensional data.
- C) It is very sensitive to missing values in the dataset.
- D) It is not suitable for regression tasks.
Answer: A) It is computationally expensive at prediction time.
Explanation: k-NN is a lazy learner: “training” amounts to storing the data, but every prediction requires computing the distance from the query to all stored training points, which can be expensive for large datasets.
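A rough way to see this cost (assuming scikit-learn; exact timings will vary by machine) is to time brute-force predictions as the training set grows:

```python
# Brute-force k-NN must compute a distance to every stored training point
# per query, so prediction time grows with the training-set size.
import time
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
queries = rng.normal(size=(100, 20))  # 100 query points, 20 features

for n in (1_000, 10_000, 100_000):
    X = rng.normal(size=(n, 20))
    y = rng.integers(0, 2, size=n)
    clf = KNeighborsClassifier(n_neighbors=5, algorithm="brute").fit(X, y)
    t0 = time.perf_counter()
    clf.predict(queries)
    print(n, f"{time.perf_counter() - t0:.4f}s")
```

In practice, tree-based indexes (scikit-learn’s `algorithm="kd_tree"` or `"ball_tree"`) can reduce query time in low-to-moderate dimensions.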
8. Which technique can help improve the performance of k-NN on high-dimensional data?
- A) Feature scaling (e.g., normalization or standardization)
- B) Using a smaller value for “k”
- C) Using the Manhattan distance instead of Euclidean distance
- D) Increasing the number of neighbors
Answer: A) Feature scaling (e.g., normalization or standardization)
Explanation: Feature scaling ensures that all features contribute equally to the distance calculation, which is important for k-NN, especially with high-dimensional data.
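A sketch of why scaling matters, on synthetic data (assumed setup) where a large-range noise feature swamps the informative one unless features are standardized:

```python
# Without scaling, the feature with the largest range dominates the
# Euclidean distance, even if it carries no signal.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 300
informative = rng.normal(size=n)                 # range ~1, carries the signal
noisy_large = rng.normal(scale=1000.0, size=n)   # range ~1000, pure noise
X = np.column_stack([informative, noisy_large])
y = (informative > 0).astype(int)

plain = KNeighborsClassifier(n_neighbors=5)
scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
print("unscaled:", cross_val_score(plain, X, y).mean())   # near chance level
print("scaled:  ", cross_val_score(scaled, X, y).mean())  # much higher
```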
9. In k-NN, what does the “voting” process refer to in classification?
- A) Selecting the feature that best separates the classes
- B) Counting the number of times a particular class appears in the k-nearest neighbors and assigning the class with the highest count
- C) Assigning the class with the maximum probability based on Bayes’ theorem
- D) Assigning the class based on the average of the target labels of the nearest neighbors
Answer: B) Counting the number of times a particular class appears in the k-nearest neighbors and assigning the class with the highest count
Explanation: In classification tasks, the class label is assigned based on a majority vote from the k-nearest neighbors.
10. What is a common method to handle ties in k-NN classification (when multiple classes receive the same number of votes among the k-nearest neighbors)?
- A) Randomly assign a class label
- B) Choose the class based on the distance to the neighbors
- C) Assign the class with the largest probability
- D) Ignore the instance and make no prediction
Answer: B) Choose the class based on the distance to the neighbors
Explanation: In case of a tie, k-NN implementations typically resolve the conflict by considering the distances to the tied neighbors, for example by assigning the class of the single closest neighbor.
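One possible implementation of this tie-breaking rule, as a from-scratch sketch (illustrative names, not a standard API):

```python
# On a vote tie, fall back to the class of the single closest neighbor.
import numpy as np
from collections import Counter

def knn_predict_tiebreak(X_train, y_train, x, k=4):
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]               # k closest, nearest first
    counts = Counter(y_train[nearest]).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return y_train[nearest[0]]                # tie: trust the closest neighbor
    return counts[0][0]                           # otherwise, plain majority vote

X_train = np.array([[0.0], [1.0], [3.0], [4.0]])
y_train = np.array(["a", "a", "b", "b"])
print(knn_predict_tiebreak(X_train, y_train, np.array([2.1]), k=4))  # 2-2 tie -> "b"
```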