1. What is Topic Modeling?
A. The process of summarizing the content of a document
B. An unsupervised technique for identifying the underlying topics in a collection of documents
C. The task of extracting named entities from text
D. A method to convert text into numerical data
Answer: B
2. Which of the following is a common algorithm used in topic modeling?
A. K-means clustering
B. Latent Dirichlet Allocation (LDA)
C. Random Forest
D. Decision Tree
Answer: B
3. In Latent Dirichlet Allocation (LDA), what does the document-level Dirichlet prior (commonly written α) govern?
A. The mixture of topics within each document
B. The distribution of words within topics
C. The distribution of words across documents
D. The distribution of the topic-per-word ratio
Answer: A
4. What does the output of topic modeling typically consist of?
A. Word clouds representing frequent words
B. A set of topics, each represented as a probability distribution over words
C. A categorized list of documents
D. A ranked list of sentiment scores
Answer: B
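Note: as a minimal sketch of what such output can look like, the snippet below ranks the words of each topic by probability. The topic names and probabilities are hypothetical, made up for illustration rather than produced by a fitted model.

```python
# Hypothetical topic-word probabilities (illustrative values only).
# Each topic is a probability distribution over words, so each row sums to 1.
topics = {
    "topic_0": {"cloud": 0.4, "security": 0.3, "data": 0.2, "vote": 0.1},
    "topic_1": {"election": 0.5, "government": 0.3, "vote": 0.15, "data": 0.05},
}

def top_words(word_probs, n=3):
    """Return the n most probable words of one topic."""
    ranked = sorted(word_probs.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:n]]

# Summarize every topic by its highest-probability words.
summary = {name: top_words(probs) for name, probs in topics.items()}
print(summary)
```

Reading off the top few words per topic is a common way to inspect and label topics by hand.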
5. What is the main goal of Topic Modeling?
A. To group documents into predefined categories
B. To extract topics and discover hidden thematic structures within a collection of text data
C. To perform sentiment analysis on documents
D. To generate word embeddings
Answer: B
6. Which of the following is a common application of topic modeling?
A. Generating captions for images
B. Organizing and summarizing large collections of text
C. Detecting anomalies in data
D. Improving website ranking in search engines
Answer: B
7. What is the key assumption made by Latent Dirichlet Allocation (LDA) in topic modeling?
A. Each document is composed of a mixture of topics
B. Each topic is a probability distribution over words
C. The number of topics is fixed in advance
D. All of the above
Answer: D
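Note: the three assumptions above can be sketched as a toy generative process. This is an illustration under stated assumptions, not a fitted LDA model: the vocabulary, the two topic-word distributions, and the `alpha` values are hypothetical, and the helper names are invented for this example.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topic_word, vocab, alpha, length, rng):
    """Generate one document: draw a topic mixture, then a topic and a word per token."""
    theta = sample_dirichlet(alpha, rng)  # the document's mixture of topics
    words = []
    for _ in range(length):
        z = rng.choices(range(len(topic_word)), weights=theta)[0]  # pick a topic
        w = rng.choices(vocab, weights=topic_word[z])[0]           # pick a word from it
        words.append(w)
    return theta, words

# Hypothetical setup: the number of topics (2) is fixed in advance.
vocab = ["cloud", "security", "election", "government"]
topic_word = [
    [0.5, 0.5, 0.0, 0.0],  # topic 0: a distribution over words
    [0.0, 0.0, 0.5, 0.5],  # topic 1: a distribution over words
]
rng = random.Random(0)
theta, doc = generate_document(topic_word, vocab, alpha=[0.5, 0.5], length=8, rng=rng)
print(theta, doc)
```

Fitting LDA runs this story in reverse: given only the words, it infers the topic mixtures and topic-word distributions that plausibly generated them.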
8. In topic modeling, what does “topic coherence” refer to?
A. The degree to which words within a topic are related to each other
B. The quality of topic labels
C. The distribution of topics across the entire dataset
D. The balance of topics within documents
Answer: A
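Note: one way to make "related to each other" concrete is to check how often a topic's top words co-occur in the same documents. The sketch below is a simplified UMass-style score on a tiny hypothetical corpus. It assumes every top word appears in at least one document, and the function name is invented for this example.

```python
import math
from itertools import combinations

def umass_coherence(top_words, documents):
    """Simplified UMass-style coherence: sum of log((D(wi, wj) + 1) / D(wi)) over word pairs.

    top_words is assumed to be ordered from most to least probable, and every
    word in it is assumed to occur in at least one document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for w_i, w_j in combinations(top_words, 2):  # w_i is ranked above w_j
        score += math.log((doc_freq(w_i, w_j) + 1) / doc_freq(w_i))
    return score

# Hypothetical tokenized corpus.
docs = [
    ["cloud", "security", "data"],
    ["cloud", "data", "ai"],
    ["election", "government", "vote"],
    ["government", "vote", "policy"],
]
coherent = umass_coherence(["cloud", "data"], docs)  # words that co-occur
mixed = umass_coherence(["cloud", "vote"], docs)     # words that never co-occur
print(coherent > mixed)
```

Higher scores indicate that a topic's top words tend to appear together, which usually matches what a human reader would call a coherent topic.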
9. Which of the following is a limitation of topic modeling techniques like LDA?
A. They require large labeled datasets
B. They often miss subtle semantic relationships because they ignore word order and context
C. They do not handle numerical data
D. They cannot handle stop words
Answer: B
10. How does the number of topics in a topic modeling algorithm (e.g., LDA) affect the results?
A. A higher number of topics leads to more generalization and abstraction
B. A lower number of topics results in overfitting to specific documents
C. A higher number of topics tends to produce more granular, more specific topics
D. The number of topics does not impact the results significantly
Answer: C
11. Which evaluation metric is often used to assess the quality of topics generated by topic modeling?
A. Accuracy
B. Precision
C. Topic coherence
D. Recall
Answer: C
12. What is “Topic Distribution” in the context of topic modeling?
A. A representation of the probability of each word appearing in a specific topic
B. A representation of the proportion of topics in a document
C. A set of topics generated from a corpus
D. A graph showing the relationships between topics
Answer: B
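Note: as a rough illustration, a document's topic distribution can be estimated from the topics assigned to its tokens. The assignments below are hypothetical, and the sketch uses raw proportions. A real LDA estimate would also be smoothed by the Dirichlet prior.

```python
from collections import Counter

def topic_distribution(token_topics, num_topics):
    """Return the proportion of tokens assigned to each topic in one document."""
    counts = Counter(token_topics)
    total = len(token_topics)
    return [counts[k] / total for k in range(num_topics)]

# Hypothetical per-token topic assignments for one 8-token document.
assignments = [0, 0, 1, 0, 2, 0, 1, 0]
theta = topic_distribution(assignments, num_topics=3)
print(theta)  # [0.625, 0.25, 0.125]
```

The proportions sum to 1, so this document is mostly about topic 0 with smaller shares of topics 1 and 2.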
13. Which of the following is an example of a topic that might be identified in a corpus about technology?
A. “sports, basketball, soccer”
B. “cloud computing, data security, AI”
C. “politics, elections, government”
D. “art, painting, sculpture”
Answer: B
14. What is the main difference between Topic Modeling and Document Classification?
A. Topic modeling discovers topics without labeled data, while document classification assigns predefined categories learned from labeled data
B. Document classification finds topics, while topic modeling classifies documents
C. Topic modeling is a supervised learning method, whereas document classification is unsupervised
D. There is no difference; they are the same
Answer: A
15. What type of documents is topic modeling most useful for?
A. Documents with a fixed structure, like HTML pages
B. Large, unstructured text corpora
C. Images and video content
D. Structured data such as spreadsheets
Answer: B