Supervised Learning: Multiple Choice Questions

Suppose you have been given a fair coin and you want to find the odds of getting heads. The odds are P(heads) / P(tails) = 0.5 / 0.5 = 1.

What happens when you get features in lower dimensions using PCA?
1. The features will still have interpretability
2. The features will lose interpretability
3. The features must carry all information present in the data
4. The features may not carry all information present in the data

How many times would you need to train the SVM in such a case?
A) 1 B) 2 C) 3 D) 4
Solution: A. Training the SVM only once gives appropriate results.

True-False: Is it possible to apply a logistic regression algorithm to a 3-class classification problem?
A) TRUE B) FALSE
Solution: A. Yes, logistic regression can be applied to a 3-class classification problem by using the one-vs-all method.

What is/are true about the kernel in an SVM?

After training, the machine is provided with a new set of examples (data) so that the supervised learning algorithm analyses the …

Which of the following is true about AIC?
A) We prefer a model with the minimum AIC value B) We prefer a model with the maximum AIC value
Solution: A. A lower AIC indicates a better balance between goodness of fit and model complexity.

When we take the natural log of the odds function, we get a range of values from -∞ to ∞.

What will happen when you fit a degree 4 polynomial in linear regression?
A) There are high chances that the degree 4 polynomial will overfit the data B) There are high chances that the degree 4 polynomial will underfit the data C) Can't say D) None of these
Solution: A. A degree 4 polynomial is more complex than the degree 3 model, so it will again fit the training data perfectly and is likely to overfit.
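The odds and log-odds statements above can be checked numerically. This is a minimal sketch in plain Python; the helper names odds, logit and sigmoid are illustrative choices, not taken from the source. It computes the odds of heads for a fair coin and shows that taking the natural log of the odds spreads probabilities in (0, 1) over the whole real line.

```python
import math


def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1.0 - p)


def logit(p):
    """Natural log of the odds; maps probabilities in (0, 1) onto (-inf, inf)."""
    return math.log(odds(p))


def sigmoid(z):
    """Logistic function, the inverse of logit."""
    return 1.0 / (1.0 + math.exp(-z))


# Fair coin: P(heads) = 0.5, so the odds are 1 and the log-odds are 0.
print(odds(0.5), logit(0.5))  # 1.0 0.0

# Probabilities close to 0 or 1 give large negative or positive log-odds.
for p in (0.001, 0.5, 0.999):
    print(p, round(logit(p), 3))
```

Because sigmoid inverts logit, a logistic regression can model the log-odds as an unbounded linear function and still return a valid probability.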
In supervised learning, each example is a pair consisting of an input object (typically a vector) and the desired output value (also called the supervisory signal).

It is sometimes very useful to plot the data in lower dimensions.

Which of the following are real-world applications of the SVM?
A) Text and hypertext categorization B) Image classification C) Clustering of news articles D) All of the above
Solution: D. SVMs are highly versatile models that can be used for a wide range of real-world problems, from regression to clustering and handwriting recognition.

Unsupervised learning is the type of machine learning in which the response variable is unknown. The algorithm has to discover and present the interesting structure that is present in the data.

The four attributes have 3, 2, 2, and 2 possible values each, so there are 3 × 2 × 2 × 2 = 24 possible combinations of attribute values.

Which of the following statements is true about outliers in linear regression?
A) Linear regression is sensitive to outliers B) Linear regression is not sensitive to outliers C) Can't say D) None of these
Solution: A. In most cases, outliers change the slope of the regression line.

A supervised learning algorithm needs input variables (x) and a target variable (Y) to train the model. Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. In the context of artificial intelligence (AI) and machine learning, supervised learning is a type of system in which both input and desired output data are provided.

These short solved questions and quizzes are provided by Gkseries.

Suppose you are building an SVM model on data X.
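The outlier answer can be illustrated with a tiny least-squares fit. This sketch assumes a single feature and uses a hand-written helper, ols_fit, together with made-up toy data, all for illustration only. Replacing one point with an outlier moves the fitted slope from 2 to 8.

```python
def ols_fit(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x with one feature."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]            # exactly y = 2x
print(ols_fit(xs, ys))           # (0.0, 2.0)

# Replace the last point with a single outlier.
ys_outlier = [2, 4, 6, 8, 40]
print(ols_fit(xs, ys_outlier))   # (-12.0, 8.0): one point quadruples the slope
```

The squared-error loss is the reason for this sensitivity: the outlier's large residual is squared, so the fit tilts toward it.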
In which of the following scenarios is the gain ratio preferred over information gain?
A) When a categorical variable has a very large number of categories B) When a categorical variable has a very small number of categories C) The number of categories is not the reason D) None of these
Solution: A. For high-cardinality attributes, the gain ratio is preferred over information gain.

Machine learning techniques differ from statistical techniques in that machine learning methods
a) typically assume an underlying distribution for the data. b) are better able to deal with missing and noisy data. c) are not able to explain their behavior. d) have trouble with large-sized datasets.
Ans: B

A telecommunication company wants to segment its customers into distinct groups in order to send appropriate subscription offers. This is an example of unsupervised learning (clustering). Unsupervised learning does not use output data.

Which of the decision boundaries (A, B, C) shows the maximum regularization?
A) A B) B C) C D) All have equal regularization
Solution: A. More regularization means a larger penalty and therefore a less complex decision boundary, which is the one shown in figure A.

True-False: Is logistic regression mainly used for regression?
A) TRUE B) FALSE
Solution: B. Logistic regression is a classification algorithm. Don't be misled by the word "regression" in its name.

Choose the option that best describes the bias (here x denotes the regularization penalty).
A) For very large x, the bias is low B) For very large x, the bias is high C) We can't say anything about the bias D) None of these
Solution: B. A very large penalty makes the model less complex, so the bias is high.

Computers are best at learning
a) facts. b) concepts. c) procedures. d) principles.
Ans: A
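The gain-ratio answer can be made concrete with a small sketch. It assumes toy data invented for illustration and hand-written helpers (entropy, information_gain, gain_ratio), not a specific library. A row-ID attribute with a unique value per row gets the highest information gain. Gain ratio divides that gain by the attribute's own split information (its entropy), which penalizes the high-cardinality attribute.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(attr, labels):
    """Entropy of the labels minus the weighted entropy after splitting on attr."""
    n = len(labels)
    groups = {}
    for a, y in zip(attr, labels):
        groups.setdefault(a, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(attr, labels):
    """Information gain normalized by the split information (entropy of attr itself)."""
    return information_gain(attr, labels) / entropy(attr)


labels = [1, 1, 1, 0, 0, 0, 0, 0]
row_id = list(range(8))                          # high cardinality: unique per row
binary = ["a", "a", "a", "a", "b", "b", "b", "b"]

for name, attr in (("row_id", row_id), ("binary", binary)):
    print(name, round(information_gain(attr, labels), 3), round(gain_ratio(attr, labels), 3))
```

Information gain ranks row_id first even though it is useless for new data. Gain ratio reverses the ranking because row_id has a split information of log2(8) = 3 bits.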
The cost parameter in the SVM means:
A) The number of cross-validations to be made B) The kernel to be used C) The trade-off between misclassification and simplicity of the model D) None of the above
Solution: C. The cost parameter decides how much an SVM should be allowed to "bend" with the data.

Suppose your linear regression model is underfitting and you make it more complex (for example, by adding features). Which of the following are true?
1. We are lowering the bias 2. We are lowering the variance 3. We are increasing the bias 4. We are increasing the variance
A) 1 and 2 B) 2 and 3 C) 1 and 4 D) 2 and 4
Solution: C. A more complex model lowers the bias and increases the variance.

A machine learning problem involves four attributes plus a class.

Which of the following methods do we use to best fit the data in logistic regression?
A) Least squares error B) Maximum likelihood C) Jaccard distance D) Both A and B
Solution: B. Logistic regression is trained with maximum likelihood estimation.

What does this value tell you?
a) The attributes are not linearly related. b) As the value of one attribute increases, the value of the second attribute also increases. c) As the value of one attribute decreases, the value of the second attribute increases. d) The attributes show a curvilinear relationship.
Ans: C

How many local minima are present in the graph?
A) 1 B) 2 C) 3 D) 4
Solution: C. There are three local minima in the graph.

Low entropy means less uncertainty, and high entropy means more uncertainty.

Suppose the Pearson correlation coefficient between V1 and V2 is zero. In such a case, is it right to conclude that V1 and V2 do not have any relation between them?
A) TRUE B) FALSE
Solution: B. The Pearson correlation coefficient between two variables might be zero even when they have a relationship between them.

True-False: Linear regression is a supervised machine learning algorithm.
A) TRUE B) FALSE
Solution: A. Linear regression is a supervised learning algorithm because it uses true labels for training.
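The Pearson answer is easy to verify with the classic counterexample V2 = |V1| on inputs that are symmetric around zero. The pearson helper below is an illustrative hand-written implementation of the usual formula (covariance divided by the product of standard deviations). V2 is completely determined by V1, yet the coefficient comes out as zero up to floating-point rounding.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


v1 = [-3, -2, -1, 0, 1, 2, 3]
v2 = [abs(x) for x in v1]               # V2 is a deterministic function of V1
print(pearson(v1, v2))                  # ~0: no *linear* relationship
print(pearson(v1, [2 * x + 1 for x in v1]))  # ~1: perfect linear relationship
```

Pearson's coefficient only measures linear association. A zero value therefore rules out a linear trend, not every kind of dependence.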
True-False: Linear regression is mainly used for regression.
A) TRUE B) FALSE
Solution: A. Linear regression has a dependent variable with continuous values.

The perceptron learning law is a supervised, nonlinear type of learning.

With a Bayes classifier, missing data items are
a) treated as equal compares. b) treated as unequal compares. c) replaced with a default value. d) ignored.
Ans: D. A Bayes classifier simply leaves missing attribute values out of the probability computation.

Question context: We have been given a dataset with n records in which x is the input attribute and y is the output attribute.

Suppose you are using a bagging-based algorithm such as Random Forest in model building. Which of the following can be true?
1. The number of trees should be as large as possible 2. You will have interpretability after using Random Forest
A) 1 B) 2 C) 1 and 2 D) None of these
Solution: A. Since Random Forest aggregates the results of different weak learners, we would want as many trees as possible if it is feasible. Random Forest is a black-box model, so you lose interpretability after using it.

PCA gives the same result if you run it again, but k-means may not, because k-means depends on its initial centroids.

Suppose you plotted a scatter plot of the residuals against the predicted values in linear regression and found that there is a relationship between them.
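The PCA-versus-k-means remark can be demonstrated with Lloyd's algorithm on one-dimensional toy data. In this sketch, kmeans_1d is a hypothetical helper written for illustration, and the starting centroids are passed in explicitly rather than drawn at random. Two different starts converge to two different clusterings of the same six points.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Lloyd's k-means on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid (ties go to the lower index).
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids


points = [0, 1, 10, 11, 20, 21]
print(kmeans_1d(points, [0, 1]))    # [0.5, 15.5]  -> {0, 1} vs {10, 11, 20, 21}
print(kmeans_1d(points, [20, 21]))  # [5.5, 20.5]  -> {0, 1, 10, 11} vs {20, 21}
```

PCA has no such dependence because it comes from an eigen-decomposition with a unique answer (up to sign). k-means only finds a local optimum that depends on where it starts, which is why libraries usually run it several times with different initializations.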
Entropy is a measure of disorder, impurity, unpredictability, or uncertainty.

Now, you want to add a few new features to the same data.

Human and animal learning, by contrast, is mostly unsupervised.

Sanfoundry Global Education & Learning Series – Neural Networks.

Which of the following is true about an individual tree (Tk) in a Random Forest?
1. The individual tree is built on a subset of the features 2. The individual tree is built on all the features 3. The individual tree is built on a subset of observations 4. The individual tree is built on the full set of observations
A) 1 and 3 B) 1 and 4 C) 2 and 3 D) 2 and 4
Solution: A. Random Forest is based on the bagging concept: each tree is built on a fraction of the samples and a fraction of the features.

Will the decision boundary change if one of these three examples is removed?
A) Yes B) No
Solution: A. These three examples are positioned such that removing any one of them introduces slack in the constraints.

Supervised learning is like learning under the guidance of a teacher. The training dataset acts as the teacher that trains the machine, and the model is trained on a predefined dataset before it starts making decisions on new data.

A kernel function maps low-dimensional data to a high-dimensional space.

A multiple regression model has
a) only one independent variable b) more than one dependent variable c) more than one independent variable d) none of the above
Ans: C

Sometimes feature normalization is not feasible for categorical variables.

Classification is used to predict a discrete class or label (Y).

True-False: Is overfitting more likely when you have a huge amount of data to train on?
A) TRUE B) FALSE
Solution: B. With a small training dataset, it is easier to find a hypothesis that fits the training data exactly, i.e. to overfit.

Which of the following scenarios would give you the right hyperparameter?
A) 1 B) 2 C) 3 D) 4
Solution: B. Option B is the better option because it leads to lower training error as well as lower validation error.

What happens to the coefficients as the lasso penalty is increased?
A) Some of the coefficients will become exactly zero B) Some of the coefficients will approach zero but not become exactly zero C) Both A and B, depending on the situation D) None of these
Solution: A. Lasso applies an absolute (L1) penalty, so some of the coefficients become exactly zero.
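The statement that a kernel function maps low-dimensional data to a high-dimensional space can be checked on a small case. The sketch assumes the homogeneous degree-2 polynomial kernel K(x, z) = (x · z)² on 2-D inputs. Its explicit feature map into 3-D is phi(x) = (x1², √2·x1·x2, x2²). The function names are illustrative. Evaluating the kernel in 2-D gives the same number as a dot product in the 3-D feature space.

```python
import math


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel K(x, z) = (x . z)^2 for 2-D inputs."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def phi(x):
    """Explicit 3-D feature map whose dot product reproduces poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


x, z = (1.0, 2.0), (3.0, 4.0)
print(poly2_kernel(x, z))   # 121.0, computed in the original 2-D space
print(dot(phi(x), phi(z)))  # 121.0 up to floating-point rounding, computed in 3-D
```

This is the "kernel trick": an SVM only ever needs K(x, z), so it gets the benefit of the higher-dimensional space without building phi(x) explicitly.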
How many seconds would it take to train the one-vs-all method end to end?
A) 20 B) 40 C) 60 D) 80
Solution: B. It would take 10 × 4 = 40 seconds.

Principal Component Analysis reduces the dimensionality of large datasets, increasing interpretability while minimizing information loss.

Select the option(s) that are correct in such a case. (Assume the remaining parameters stay the same.)
A) Training accuracy increases B) Training accuracy increases or remains the same C) Testing accuracy decreases D) Testing accuracy increases or remains the same
Solution: B. Adding features generally lets the model fit the training data at least as well, so training accuracy increases or remains the same. Testing accuracy increases only if the new features are significant.

Which of the following is NOT supervised learning?
A. Supervised learning B. Unsupervised learning C. Reinforcement learning
Ans: B

For the PCA question, the answer is D (2 and 4). When you get the features in lower dimensions, you will lose some information in the data most of the time, and you won't be able to interpret the lower-dimensional data.

Now, suppose you increase the complexity (or the degree of the polynomial) of this kernel.

Suppose you have the same distribution of classes in the data.

The process of forming general concept definitions from examples of concepts to be learned:
a) deduction b) abduction c) induction d) conjunction
Ans: C

As part of DataFest 2017, we organized various skill tests so that data scientists can assess themselves on these critical skills.

Classification assigns new input variables (X) to the class they most likely belong to, based on a classification model built from training data that was already labeled.
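The PCA answer (some information is usually lost) can be quantified for 2-D data without any library. The eigenvalues of the 2 × 2 sample covariance matrix are the variances along the principal components. Keeping only the first component therefore retains lambda1 / (lambda1 + lambda2) of the total variance. The helper name and the ten sample points below are assumptions made for this sketch.

```python
import math


def pca_2d_explained_ratio(points):
    """Fraction of total variance kept when 2-D data is projected onto its first principal component."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix [[a, b], [b, d]] (the n - 1 factor cancels in the ratio).
    a = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    d = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    center = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    lam1, lam2 = center + radius, center - radius
    return lam1 / (lam1 + lam2)


data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2), (3.1, 3.0),
        (2.3, 2.7), (2.0, 1.6), (1.0, 1.1), (1.5, 1.6), (1.1, 0.9)]
ratio = pca_2d_explained_ratio(data)
print(f"variance kept by 1 component: {ratio:.3f}")
```

For this strongly correlated sample one component keeps most of the variance, but not all of it. The remainder is the information the reduced feature no longer carries.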
Supervised learning differs from unsupervised clustering in that supervised learning requires
a) at least one input attribute. b) input attributes to be categorical. c) at least one output attribute. d) output attributes to be categorical.
Ans: C. Supervised learning needs labeled data, i.e. at least one output attribute.

True-False: Is logistic regression a supervised machine learning algorithm?
A) TRUE B) FALSE
Solution: A. Logistic regression is a supervised learning algorithm because it uses true labels for training.

A nearest neighbor approach is best used
a) with large-sized datasets. b) when irrelevant attributes have been removed from the data. c) when a generalized model of the data is desirable. d) when an explanation of what has been found is of primary importance.
Ans: B

Which of the following is required by k-means clustering?
a) defined distance metric b) number of clusters c) initial guess as to cluster centroids d) all of the mentioned
Answer: d. K-means clustering follows a partitioning approach.

A higher-degree polynomial (right graph) might have very high accuracy on the training population but is expected to fail badly on the test dataset.

The minimum time complexity for training an SVM is O(n²).

Which of the following is/are not true about the DBSCAN clustering algorithm?
Solution: D (2 and 3). DBSCAN can form clusters of any arbitrary shape and does not make strong assumptions about the distribution of data points in the data space. It also has a low time complexity, of order O(n log n) only, and does not require prior knowledge of the number of clusters.

For example, grade A should be considered a higher grade than grade B.

The multiple coefficient of determination is computed by
a) dividing SSR by SST b) dividing SST by SSR c) dividing SST by SSE d) none of the above
Ans: A. R² = SSR / SST, the fraction of the total variation that the regression explains.

This section focuses on machine learning in data science.

In bagging for regression, we build N regression models on N bootstrap samples.

Consider V1 = x and V2 = |x|.
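The corrected answer R² = SSR / SST can be checked on a small least-squares fit. This sketch uses invented toy data and hand-written helpers (fit_line, r_squared). It computes R² both as SSR / SST and as 1 - SSE / SST. The two agree for an ordinary least-squares fit with an intercept, because then SST = SSR + SSE.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b1 * mx, b1


def r_squared(ys, preds):
    """Return (SSR / SST, 1 - SSE / SST) for observed ys and fitted preds."""
    my = sum(ys) / len(ys)
    sst = sum((y - my) ** 2 for y in ys)                # total sum of squares
    ssr = sum((p - my) ** 2 for p in preds)             # regression (explained) sum of squares
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))  # error (residual) sum of squares
    return ssr / sst, 1 - sse / sst


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)
preds = [b0 + b1 * x for x in xs]
r2, r2_alt = r_squared(ys, preds)
print(round(b0, 3), round(b1, 3), round(r2, 3), round(r2_alt, 3))
```

Here SST = 6 and SSR = 3.6, so R² = 0.6: the line explains 60% of the variation in y.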
Which of the following options is true?
A) Linear regression error values have to be normally distributed, but this is not the case for logistic regression B) Logistic regression error values have to be normally distributed, but this is not the case for linear regression C) Both linear regression and logistic regression error values have to be normally distributed D) Neither linear regression nor logistic regression error values have to be normally distributed
Solution: A

Choose which of the following options is true regarding the one-vs-all method in logistic regression.
A) We need to fit n models in an n-class classification problem B) We need to fit n-1 models to classify into n classes C) We need to fit only 1 model to classify into n classes D) None of these
Solution: A. If there are n classes, n separate logistic regression models have to be fitted. Each one predicts the probability of one category against the rest of the categories combined.

The second model is more robust than the first and third because it will perform best on unseen data.

These tests included Machine Learning, Deep Learning, Time Series problems and Probability.

Since the data is fixed, the SVM does not need to search a big hypothesis space.

What is true about feature normalization?

This subject covers everything from an introduction to machine learning terminology and types (supervised, unsupervised, etc.) to its various techniques, such as clustering and classification.
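"Maximum likelihood" for logistic regression means choosing the weights that minimize the negative log-likelihood (log loss) of the observed labels. This sketch does that for one feature by plain batch gradient descent. The learning rate, step count and toy data are assumptions chosen for illustration, not a recommended solver. The labels are deliberately not linearly separable, so the maximum-likelihood weights are finite.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neg_log_likelihood(w0, w1, xs, ys):
    """Negative log-likelihood of labels ys under P(y=1|x) = sigmoid(w0 + w1*x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w0 + w1 * x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Maximum-likelihood fit by batch gradient descent on the NLL."""
    w0 = w1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w0 + w1 * x) - y  # derivative of the NLL w.r.t. the linear score
            g0 += err
            g1 += err * x
        w0 -= lr * g0
        w1 -= lr * g1
    return w0, w1


xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]  # not linearly separable, so the MLE is finite
w0, w1 = fit_logistic(xs, ys)
print(round(w0, 3), round(w1, 3), round(neg_log_likelihood(w0, w1, xs, ys), 3))
```

For the one-vs-all method, the same fit is repeated once per class, with that class relabeled as 1 and all other classes as 0. That gives the n models in answer A.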
Which of the following algorithms do we use for variable selection?
A) LASSO B) Ridge C) Both D) None of these
Solution: A. Lasso applies an absolute penalty. As the penalty increases, some of the variable coefficients may become zero.

Question context: Consider the following model for logistic regression: P(y = 1 | x, w) = g(w0 + w1x), where g(z) is the logistic function.

Supervised learning and unsupervised clustering both require at least one
a) hidden attribute. b) output attribute. c) input attribute. d) categorical attribute.
Ans: C. Both need input attributes; only supervised learning also needs an output attribute.

Question context: Suppose you find that your linear regression model is underfitting the data.

Another name for an output attribute:
a) predictive variable b) independent variable c) estimated variable d) dependent variable
Ans: D

If you are a data scientist, then you need to be good at machine learning; there are no two ways about it.

Now, imagine you want to add a variable to the variable space such that this added feature is important.

The data X can be error-prone, which means that you should not trust any specific data point too much.

What would happen when you use a very small C (C ≈ 0)?
A) Misclassification would happen B) Data will be correctly classified C) Can't say D) None of these
Solution: A. Because the penalty is so low, the classifier can maximize the margin between most of the points while misclassifying a few of them.
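Why lasso selects variables and ridge does not is clearest under a simplifying assumption: an orthonormal design matrix (XᵀX = I), where each coefficient can be solved separately from its least-squares estimate z. Minimizing ½‖y − Xβ‖² + λ‖β‖₁ (lasso) then soft-thresholds z. Minimizing ½‖y − Xβ‖² + (λ/2)‖β‖² (ridge) rescales z to z / (1 + λ). The function names and example coefficients are illustrative.

```python
def lasso_coef(z, lam):
    """Lasso estimate for one coefficient under an orthonormal design: soft-thresholding of z."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # small coefficients are set exactly to zero


def ridge_coef(z, lam):
    """Ridge estimate for one coefficient under an orthonormal design: proportional shrinkage."""
    return z / (1.0 + lam)


ols = [3.0, -1.2, 0.4, -0.1]  # least-squares coefficients
lam = 0.5
print([lasso_coef(z, lam) for z in ols])  # [2.5, -0.7, 0.0, 0.0]: two variables dropped
print([ridge_coef(z, lam) for z in ols])  # every coefficient shrunk, none exactly zero
```

Because the L1 penalty has a corner at zero, any coefficient with |z| ≤ λ is removed outright, which is the variable-selection behavior in answer A.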
Which of the following statements is true about the β0 and β1 values of the two logistic models (Green, Black)? Note: consider Y = β0 + β1*X.

Here are the MCQs on the subject Machine Learning from the Computer branch course at SPPU, which should help you with the upcoming exams.

Support vectors are the data points that lie closest to the decision surface.

This supervised learning technique can process both numeric and categorical input attributes.
a) linear regression b) Bayes classifier c) logistic regression d) backpropagation learning
Ans: B. Linear regression, logistic regression and backpropagation require numeric inputs, whereas a Bayes classifier handles both numeric and categorical attributes.

The syllabus of the upcoming final exams contains only the first four units of this course, so the MCQs below cover the first four units of the ML subject.

The problem of finding hidden structure in unlabeled data is called unsupervised learning.

Question context: Suppose you are using a linear SVM classifier on a 2-class classification problem.

If there is any relationship between the residuals and the predicted values, it means that the model has not perfectly captured the information in the data.

Machine Learning (subject no. 410250) is the first compulsory subject of the 8th semester and carries 3 credits under the new credit system.
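Given a trained linear SVM, the support vectors can be identified from their functional margin y · (w · x + b). Points with a margin of at most 1 lie on or inside the margin. This sketch assumes a tiny two-class dataset invented for illustration. For that data the hard-margin solution is w = (1, 0), b = −2, the perpendicular bisector of the two closest opposite-class points, scaled so those points have margin exactly 1. The helper names are hypothetical.

```python
def decision_value(w, b, x):
    """Linear SVM score w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def support_vectors(w, b, X, y, tol=1e-9):
    """Points on or inside the margin, i.e. with y * (w . x + b) <= 1."""
    return [x for x, label in zip(X, y) if label * decision_value(w, b, x) <= 1 + tol]


X = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (5.0, 1.0)]
y = [-1, -1, 1, 1]
# Assumed hard-margin solution for this toy data: the boundary x1 = 2.
w, b = (1.0, 0.0), -2.0
print(support_vectors(w, b, X, y))  # [(1.0, 0.0), (3.0, 0.0)]
```

Only these two points pin down the boundary. Removing (0, 0) or (5, 1) would leave it unchanged, while removing a support vector would move it.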
A term used to describe the case when the independent variables in a multiple regression model are correlated is
a) regression b) correlation c) multicollinearity d) none of the above
Ans: C

We wish to produce clusters of many different sizes and shapes.

Supervised learning can be divided into two categories: classification and regression.

Principal Component Analysis (PCA) is not a predictive method.

Based on that, answer the following question: What would happen when you use a very large value of C (C → ∞)? Note: with a small C, the model was also classifying all data points correctly.

The C parameter is also simply referred to as the cost of misclassification.

Which of the following algorithms is most sensitive to outliers?

Machine learning, one of the most prominent areas of the era, has a place in the curriculum of many universities and institutes, among them Savitribai Phule Pune University (SPPU).

Which statement is true about neural network and linear regression models?
a) Both models require input attributes to be numeric. b) Both models require numeric attributes to range between 0 and 1. c) The output of both models is a categorical attribute value. d) Both techniques build models whose output is determined by a linear sum of weighted input attribute values.
Ans: A

A) 1 B) 2 C) 3 D) 4
Solution: B. Scenarios 2 and 4 have the same validation accuracy, but we would select 2 because the lower depth is the better hyperparameter.

Looking at the above two characteristics, which of the following options is correct for the Pearson correlation between V1 and V2?

Solution: B. The gamma parameter in SVM tuning signifies the influence of points either near to or far away from the hyperplane.
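The gamma answer refers to the RBF (Gaussian) kernel K(x, z) = exp(−gamma · ‖x − z‖²). This sketch assumes that kernel, since the source does not name one, and compares a nearby and a distant point for several gamma values. A large gamma makes a point's influence die off quickly with distance. A small gamma lets even distant points contribute.

```python
import math


def rbf(x, z, gamma):
    """RBF (Gaussian) kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


x = (0.0, 0.0)
for gamma in (0.1, 1.0, 10.0):
    near = rbf(x, (0.5, 0.0), gamma)
    far = rbf(x, (2.0, 0.0), gamma)
    print(f"gamma={gamma}: near={near:.4f} far={far:.4f}")
```

With gamma = 10 the distant point's similarity is essentially zero, so the decision boundary is shaped only by nearby training points and can become very wiggly. With gamma = 0.1 far points still matter and the boundary stays smooth.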
Solution: D. If you decrease the number of iterations, training will certainly take less time, but it will not reach the same accuracy. To get a similar (but not identical) accuracy, you need to increase the learning rate.

The most popular dimensionality reduction algorithm is Principal Component Analysis (PCA).

Which of the following is/are true about bagging trees?
1. In bagging trees, individual trees are independent of each other 2. Bagging is the method for improving the performance by aggregating the results of weak learners
A) 1 B) 2 C) 1 and 2 D) None of these
Solution: C. Both options are true.

What is supervised learning? Supervised learning is where you have input variables (x) and an output variable (Y) and you use an algorithm to learn the mapping function from the input to the output.

Which of the following is/are true about boosting trees?
1. In boosting trees, individual weak learners are independent of each other 2. It is the method for improving the performance by aggregating the results of weak learners
A) 1 B) 2 C) 1 and 2 D) None of these
Solution: B. In boosting, individual weak learners are not independent of each other, because each tree corrects the results of the previous tree.

Which of the following conclusions do you draw about this situation?
A) Since there is a relationship, our model is not good B) Since there is a relationship, our model is good C) Can't say D) None of these
Solution: A. There should not be any relationship between the predicted values and the residuals.

In Random Forest you can generate hundreds of trees (say T1, T2, ..., Tn) and then aggregate the results of these trees.
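Bagging trains each tree (T1, T2, ..., Tn) on a bootstrap sample, which is a sample of the same size as the data, drawn with replacement. This sketch uses Python's standard random module with a fixed seed and a hypothetical helper name. It shows that a bootstrap sample contains only about 1 − 1/e ≈ 63% of the distinct original rows. That is why the trees differ from one another, and why the left-out rows can serve as an out-of-bag check.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement, as bagging does for each tree."""
    return [rng.choice(data) for _ in data]


rng = random.Random(42)
data = list(range(1000))
sample = bootstrap_sample(data, rng)
unique_fraction = len(set(sample)) / len(data)
print(f"unique fraction: {unique_fraction:.3f}")  # expected to be close to 1 - 1/e ≈ 0.632
```

Each tree in bagging is fit on its own sample like this, independently of the others. In boosting, by contrast, each new learner depends on the errors of the previous ones.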
A) TRUE B) FALSE
Solution: A. Decision trees can also be used to find clusters in the data, but clustering often generates natural clusters and does not depend on any objective function.

If the correlation coefficient is zero, it just means that the variables don't move together.

Selecting data so as to assure that each class is properly represented in both the training and test set:
a) cross validation b) stratification c) verification d) bootstrapping
Ans: B

A) Bias will be high B) Bias will be low C) Can't say D) None of these
Solution: A. The model will become very simple, so the bias will be very high.

Suppose you applied a logistic regression model to a given dataset and got a training accuracy X and a testing accuracy Y.

This technique associates a conditional probability value with each data instance.
a) linear regression b) logistic regression c) simple regression d) multiple linear regression
Ans: B

According to this fact, what sizes of datasets are not best suited for SVMs?
Solution: Large datasets, since the training time grows at least quadratically with the number of samples.

PCA can be used for projecting and visualizing data in lower dimensions.
A) TRUE B) FALSE
Solution: A

True-False: Standardization of features is required before training a logistic regression.
A) TRUE B) FALSE
Solution: B. Standardization isn't required for logistic regression.
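Stratification (answer B above) can be sketched in a few lines. The idea is to split each class separately, so that the test set receives the same share of every class. The helper name, the 25% test fraction and the 80/20 toy labels are assumptions for this illustration. Libraries such as scikit-learn provide the same idea through a stratify option on their split utilities.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Return (train_idx, test_idx) with each class split in the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)  # per-class test quota
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


labels = ["a"] * 80 + ["b"] * 20
train, test = stratified_split(labels, test_fraction=0.25)
print(len(train), len(test), sum(labels[i] == "b" for i in test))  # 75 25 5
```

The minority class "b" makes up 20% of both the training and the test set. A purely random split could, by chance, leave it under-represented in one of them.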