Deep Learning MCQs
December 22, 2025 / August 10, 2024 by u930973931_answers
50 min

1. What is the primary purpose of an activation function in a neural network?
(A) To initialize weights
(B) To introduce non-linearity into the model
(C) To reduce overfitting
(D) To normalize the input data

2. Which of the following is a popular activation function used in deep learning?
(A) Sigmoid
(B) ReLU
(C) Tanh
(D) All of the above

3. What does the term “backpropagation” refer to in neural networks?
(A) The process of updating weights based on the error gradient
(B) The forward pass of data through the network
(C) The initialization of weights
(D) The process of normalizing the input data

4. What is the main advantage of using a convolutional neural network (CNN) for image recognition?
(A) Reduced computational cost
(B) Ability to capture spatial hierarchies in images
(C) Better text processing capabilities
(D) Simplified weight initialization

5. Which of the following is NOT a common type of layer in a CNN?
(A) Convolutional layer
(B) Recurrent layer
(C) Fully connected layer
(D) Pooling layer

6. What is the purpose of dropout in a neural network?
(A) To initialize weights
(B) To speed up training by reducing the number of neurons
(C) To reduce the size of the input data
(D) To prevent overfitting by randomly dropping units during training

7. Which of the following techniques is used to deal with the vanishing gradient problem?
(A) Increasing the learning rate
(B) Using ReLU activation function
(C) Decreasing the network size
(D) Using batch normalization

8. What does LSTM stand for in the context of deep learning?
(A) Long Short-Term Memory
(B) Linear Sequential Time Model
(C) Latent Semantic Time Model
(D) Large-Scale Training Model

9. Which of the following is a major advantage of LSTM networks?
(A) Reduced computational complexity
(B) Ability to capture long-term dependencies
(C) Enhanced feature extraction from images
(D) Improved training speed

10. In the context of deep learning, what is a “vanishing gradient”?
(A) When the learning rate is too high
(B) When the gradient becomes too small to update weights effectively
(C) When the gradient becomes too large
(D) When the model overfits the training data

11. What does the softmax function output?
(A) A probability distribution over classes
(B) A single scalar value
(C) A binary output
(D) A normalized input

12. Which type of neural network is most commonly used for natural language processing tasks?
(A) Autoencoder
(B) Convolutional Neural Network (CNN)
(C) Generative Adversarial Network (GAN)
(D) Recurrent Neural Network (RNN)

13. What is the role of the “learning rate” in training a neural network?
(A) It determines the number of neurons in each layer
(B) It controls the size of the weight updates during training
(C) It sets the initial values of the weights
(D) It defines the structure of the network

14. Which optimization algorithm is commonly used in deep learning?
(A) Stochastic Gradient Descent (SGD)
(B) Genetic Algorithm
(C) Simulated Annealing
(D) Particle Swarm Optimization

15. What is the purpose of an autoencoder?
(A) To generate new data
(B) To learn a compressed representation of data
(C) To classify images
(D) To initialize weights

16. What is a Generative Adversarial Network (GAN) composed of?
(A) A single neural network
(B) Two convolutional networks
(C) A recurrent network and an autoencoder
(D) A generator and a discriminator

17. What is “batch normalization” used for in deep learning?
(A) To reduce the dimensionality of the data
(B) To normalize the inputs of each layer to improve training speed and stability
(C) To prevent overfitting
(D) To initialize weights

18. Which of the following is an advantage of using pre-trained models?
(A) Higher overfitting risk
(B) Increased model complexity
(C) Reduced training time and computational cost
(D) Easier weight initialization

19. What is the key difference between CNNs and RNNs?
(A) RNNs are used exclusively for image processing
(B) RNNs are faster to train than CNNs
(C) CNNs have fewer parameters than RNNs
(D) CNNs are designed for spatial data, while RNNs are designed for sequential data

20. What is “overfitting” in the context of deep learning?
(A) When the model has too few parameters
(B) When the model performs well on training data but poorly on test data
(C) When the model cannot learn from the training data
(D) When the model generalizes well to new data

21. Which of the following is a method to reduce overfitting?
(A) Reducing the training data
(B) Increasing the learning rate
(C) Dropout
(D) Using a smaller model

22. What does the term “epoch” refer to in training a neural network?
(A) A specific layer in the neural network
(B) A single iteration of gradient descent
(C) A single update to the model’s weights
(D) A complete pass through the entire training dataset

23. What is the purpose of the “Adam” optimizer in deep learning?
(A) To normalize the input data
(B) To reduce the dimensionality of the data
(C) To combine the advantages of both SGD and RMSprop
(D) To perform weight initialization

24. Which of the following best describes “transfer learning”?
(A) Using a pre-trained model on a new but related task
(B) Training a model from scratch
(C) Sharing weights between different layers of a network
(D) Adjusting the learning rate during training

25. What is the role of the “loss function” in a neural network?
(A) To measure the difference between the predicted and actual values
(B) To initialize weights
(C) To update the learning rate
(D) To reduce overfitting

26. What is the output of a ReLU activation function for a negative input?
(A) 0
(B) 1
(C) The negative value itself
(D) The absolute value of the input

27. Which of the following is a challenge in training deep neural networks?
(A) Insufficient layers
(B) Increasing learning rate
(C) Limited network capacity
(D) Vanishing or exploding gradients

28. Which neural network model is particularly good at handling time-series data?
(A) RNN (Recurrent Neural Network)
(B) CNN (Convolutional Neural Network)
(C) GAN (Generative Adversarial Network)
(D) Autoencoder

29. What is the “dropout rate” in a neural network?
(A) The learning rate decay factor
(B) The fraction of neurons to be dropped during training
(C) The percentage of data to be discarded before training
(D) The rate at which the model overfits

30. Which of the following is a characteristic of a deep neural network?
(A) Only one hidden layer
(B) Multiple hidden layers between the input and output layers
(C) No activation functions
(D) High bias and low variance

31. What is “weight decay” used for in training neural networks?
(A) To initialize weights
(B) To increase the learning rate
(C) To regularize the model and prevent overfitting
(D) To reduce the size of the network

32. Which of the following is an advantage of using GPU acceleration for deep learning?
(A) Decreased computational resources
(B) Increased overfitting risk
(C) Reduced model complexity
(D) Faster training times

33. What is “early stopping” in the context of training a neural network?
(A) Increasing the number of epochs
(B) Decreasing the learning rate during training
(C) Reducing the network size
(D) Stopping training when performance on a validation set starts to degrade

34. What does the term “gradient descent” refer to in optimization?
(A) A method to initialize weights
(B) An iterative method to minimize the loss function
(C) A technique to increase the learning rate
(D) A way to reduce the number of neurons

35. Which of the following is a type of recurrent neural network architecture?
(A) CNN (Convolutional Neural Network)
(B) LSTM (Long Short-Term Memory)
(C) GAN (Generative Adversarial Network)
(D) Autoencoder

36. What is the primary function of a “pooling layer” in a CNN?
(A) To initialize weights
(B) To increase the number of feature maps
(C) To reduce the spatial dimensions of the input
(D) To add non-linearity

37. What does “hyperparameter tuning” involve in deep learning?
(A) Changing the activation functions
(B) Adjusting settings fixed before training, such as the learning rate or batch size, to improve performance
(C) Reducing the number of layers
(D) Normalizing the input data

38. What is a “kernel” in the context of convolutional layers?
(A) A data normalization method
(B) A type of activation function
(C) A regularization technique
(D) A small matrix used for filtering the input data

39. In deep learning, what is “feature extraction”?
(A) The process of normalizing the data
(B) The technique of increasing model complexity
(C) The initialization of network weights
(D) The process of identifying and selecting relevant features from raw data

40. What is the main difference between “batch” and “stochastic” gradient descent?
(A) Batch gradient descent uses the entire dataset, while stochastic uses one sample at a time
(B) Stochastic gradient descent uses the entire dataset, while batch uses one sample at a time
(C) Batch gradient descent is faster than stochastic gradient descent
(D) Stochastic gradient descent is used for image data, while batch is used for text data

41. What is the purpose of “data augmentation” in deep learning?
(A) To decrease the complexity of the model
(B) To reduce the number of training samples
(C) To speed up the training process
(D) To artificially increase the size of the training dataset by applying transformations

42. What is a common challenge when training very deep neural networks?
(A) Vanishing or exploding gradients
(B) Insufficient data
(C) High training speed
(D) Low computational requirements

43. What does the “ReLU” activation function output for an input of 5?
(A) 1
(B) 0
(C) 5
(D) -5

44. What is “model ensemble” in machine learning?
(A) Using a single model to make predictions
(B) Combining the predictions of multiple models to improve performance
(C) Training a model on a single type of data
(D) Reducing the number of features in the model

45. What is the primary goal of “dimensionality reduction”?
(A) To reduce the number of features in the data while retaining important information
(B) To increase the complexity of the model
(C) To improve the speed of the learning algorithm
(D) To simplify the data preprocessing steps

46. What does “model regularization” aim to address?
(A) Underfitting by increasing model complexity
(B) Overfitting by adding a penalty to the loss function
(C) Reducing the number of training samples
(D) Normalizing the input data

47. What is the “softmax” function typically used for in a neural network?
(A) To apply non-linearity
(B) To normalize input data
(C) To convert raw scores into probabilities
(D) To initialize weights

48. What is “gradient clipping” used for in training neural networks?
(A) To reduce the dimensionality of the input data
(B) To accelerate convergence
(C) To add noise to gradients
(D) To prevent exploding gradients by limiting their values

49. Which of the following best describes “transfer learning”?
(A) Combining multiple models into a single ensemble
(B) Training a model from scratch
(C) Changing the activation functions of a model
(D) Using a pre-trained model on a different but related task

50. What is a “loss landscape” in neural network optimization?
(A) A plot of the training and validation accuracies
(B) A visualization of the network architecture
(C) A graphical representation of the loss function with respect to model parameters
(D) A diagram showing the gradient flow through the network
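
As a study aid for the questions on ReLU (26, 43), softmax (11, 47), and gradient clipping (48), here is a minimal Python sketch of how these operations are commonly defined. The function names `relu`, `softmax`, and `clip_by_norm` are illustrative assumptions and not part of the quiz, and clipping by L2 norm is only one of several clipping schemes used in practice.

```python
import math


def relu(x: float) -> float:
    """Rectified linear unit: keeps positive inputs, maps everything else to 0."""
    return max(0.0, x)


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into non-negative values that sum to 1."""
    # Subtracting the maximum score avoids overflow in exp() without
    # changing the result.
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def clip_by_norm(grad: list[float], max_norm: float) -> list[float]:
    """Rescale a gradient vector so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        # Already within the limit: return an unchanged copy.
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


if __name__ == "__main__":
    print(relu(-3.0), relu(5.0))
    probs = softmax([2.0, 1.0, 0.1])
    print(probs, sum(probs))
    print(clip_by_norm([3.0, 4.0], max_norm=1.0))
```

Running the script prints the ReLU values for a negative and a positive input, a softmax distribution for three example scores, and a gradient rescaled to the chosen maximum norm, which can be compared against your own answers.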