Databricks Certified Machine Learning Associate Exam (Databricks-Machine-Learning-Associate) Free Practice Test

Question 1

Which of the following statements describes a Spark ML estimator?

A. An estimator is a trained ML model which turns a DataFrame with features into a DataFrame with predictions

B. An estimator chains multiple algorithms together to specify an ML workflow

C. An estimator is a hyperparameter grid that can be used to train a model

D. An estimator is an algorithm which can be fit on a DataFrame to produce a Transformer

E. An estimator is an evaluation tool to assess the quality of a model

Correct Answer: D
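
The estimator/transformer relationship in option D can be sketched in plain Python. This is a toy imitation of the pattern, not Spark ML code: the class names ScalerEstimator and ScalerModel are hypothetical, and plain lists stand in for DataFrames.

```python
from dataclasses import dataclass


@dataclass
class ScalerModel:
    """Toy 'transformer': holds learned state and applies it to new data."""
    mean: float

    def transform(self, values):
        # Only applies the state learned during fit; nothing is learned here.
        return [v - self.mean for v in values]


class ScalerEstimator:
    """Toy 'estimator': an algorithm whose fit() returns a transformer."""

    def fit(self, values):
        # Learn the mean from the training data and wrap it in a model.
        return ScalerModel(mean=sum(values) / len(values))


model = ScalerEstimator().fit([1.0, 2.0, 3.0])
centered = model.transform([2.0, 4.0])
```

The point of the split is that fit() learns from data once, while the object it returns can be applied to any number of new datasets.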

Question 2

A data scientist has defined a Pandas UDF function predict to parallelize the inference process for a single-node model:

They have written the following incomplete code block to use predict to score each record of Spark DataFrame spark_df:

Which of the following lines of code can be used to complete the code block to successfully complete the task?

A. predict(spark_df.columns)

B. predict(*spark_df.columns)

C. mapInPandas(predict(spark_df.columns))

D. predict(Iterator(spark_df))

E. mapInPandas(predict)

Correct Answer: E

Question 3

Which of the following machine learning algorithms typically uses bagging?

A. Decision tree

B. Random forest

C. Gradient boosted trees

D. K-means

Correct Answer: B
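
Bagging (bootstrap aggregating) trains each base learner on a bootstrap sample drawn with replacement and then combines their predictions. The sketch below is a minimal illustration under simplifying assumptions: it uses one-dimensional threshold "stumps" rather than full decision trees, omits the random feature selection a random forest adds, and the names fit_stump, bagged_stumps, and bagged_predict are hypothetical.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a 1-D threshold classifier: predict 1 when x >= threshold."""
    best_threshold, best_errors = None, None
    for threshold, _ in points:
        errors = sum((x >= threshold) != bool(y) for x, y in points)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagged_stumps(points, n_models, seed=0):
    """Train each stump on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [
        fit_stump([rng.choice(points) for _ in points])
        for _ in range(n_models)
    ]


def bagged_predict(thresholds, x):
    """Aggregate the individual stumps by majority vote."""
    votes = Counter(int(x >= t) for t in thresholds)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (feature value, class label).
data = [(0.5, 0), (1.0, 0), (1.5, 0), (3.0, 1), (3.5, 1), (4.0, 1)]
ensemble = bagged_stumps(data, n_models=25)
```

Gradient boosted trees, by contrast, fit learners sequentially, each one correcting the errors of the ensemble so far, rather than independently on resampled data.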

Question 4

What is the name of the method that transforms categorical features into a series of binary indicator feature variables?

A. One-hot encoding

B. Target encoding

C. Leave-one-out encoding

D. String indexing

E. Categorical

Correct Answer: A
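
As a minimal sketch of what one-hot encoding produces, the hypothetical helper below maps each category to a binary indicator vector with one slot per distinct category. Library encoders offer more options (unseen-category handling, sparse output); this only shows the core idea.

```python
def one_hot_encode(values):
    """Map each category to a binary indicator vector, one slot per category."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one indicator is set per record
        encoded.append(row)
    return categories, encoded


categories, encoded = one_hot_encode(["red", "green", "red", "blue"])
```

String indexing, by comparison, would replace each category with a single integer rather than a vector of indicators.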

Question 5

A data scientist has produced three new models for a single machine learning problem. In the past, the solution used just one model. All four models have nearly the same prediction latency, but a machine learning engineer suggests that the new solution will be less time efficient during inference.
In which situation will the machine learning engineer be correct?

A. When the new solution's models have an average latency that is larger than the size of the original model

B. When the new solution's models have an average size that is larger than the size of the original model

C. When the new solution requires the use of fewer feature variables than the original model

D. When the new solution requires if-else logic determining which model to use to compute each prediction

E. When the new solution requires that each model computes a prediction for every record

Correct Answer: E

Question 6

A data scientist is performing hyperparameter tuning using an iterative optimization algorithm. Each evaluation of unique hyperparameter values is being trained on a single compute node. They are performing eight total evaluations across eight total compute nodes. While the accuracy of the model does vary over the eight evaluations, they notice there is no trend of improvement in the accuracy. The data scientist believes this is due to the parallelization of the tuning process.
Which change could the data scientist make to improve their model accuracy over the course of their tuning process?

A. Change the number of compute nodes to be half or less than half of the number of evaluations.

B. Change the number of compute nodes to be double or more than double the number of evaluations.

C. Change the number of compute nodes and the number of evaluations to be much larger but equal.

D. Change the iterative optimization algorithm used to facilitate the tuning process.

Correct Answer: A

Question 7

A data scientist learned during their training to always use 5-fold cross-validation in their model development workflow. A colleague suggests that there are cases where a train-validation split could be preferred over k-fold cross-validation when k > 2.
Which of the following describes a potential benefit of using a train-validation split over k-fold cross-validation in this scenario?

A. Fewer hyperparameter values need to be tested when using a train-validation split

B. Fewer models need to be trained when using a train-validation split

C. Bias is avoidable when using a train-validation split

D. Reproducibility is achievable when using a train-validation split

E. A holdout set is not necessary when using a train-validation split

Correct Answer: B
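
The cost difference can be made concrete by counting model fits. The sketch below assumes every hyperparameter combination is evaluated once per fold and ignores any final refit on the full training set; the function name models_trained and the figure of 6 combinations are hypothetical.

```python
def models_trained(n_hyperparameter_combinations, n_folds=None):
    """Count model fits during tuning; n_folds=None means one train-validation split."""
    fits_per_combination = 1 if n_folds is None else n_folds
    return n_hyperparameter_combinations * fits_per_combination


split_fits = models_trained(6)          # one fit per combination
cv_fits = models_trained(6, n_folds=5)  # five fits per combination
```

With the same 6 combinations, 5-fold cross-validation trains five times as many models as a single train-validation split, which is the trade-off the colleague is pointing to.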

Question 8

A data scientist is attempting to tune a logistic regression model logistic using scikit-learn. They want to specify a search space for two hyperparameters and let the tuning process randomly select values for each evaluation.
They attempt to run the following code block, but it does not accomplish the desired task:

Which of the following changes can the data scientist make to accomplish the task?

A. Replace the GridSearchCV operation with cross_validate

B. Replace the random_state=0 argument with random_state=1

C. Replace the GridSearchCV operation with RandomizedSearchCV

D. Replace the GridSearchCV operation with ParameterGrid

E. Replace the penalty=['l2', 'l1'] argument with penalty=uniform('l2', 'l1')

Correct Answer: C
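
The distinction between the two search strategies can be illustrated with the standard library alone. This is a conceptual sketch, not scikit-learn's implementation: the search-space values for C are hypothetical, the helper names are made up for illustration, and this simple sampler may draw the same combination more than once.

```python
import itertools
import random

# Hypothetical search space for two hyperparameters.
search_space = {
    "C": [0.01, 0.1, 1.0, 10.0],
    "penalty": ["l2", "l1"],
}


def grid_candidates(space):
    """Exhaustive search: every combination of the listed values."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in itertools.product(*space.values())]


def random_candidates(space, n_iter, seed=0):
    """Randomized search: draw n_iter combinations at random."""
    rng = random.Random(seed)
    return [
        {key: rng.choice(options) for key, options in space.items()}
        for _ in range(n_iter)
    ]


grid = grid_candidates(search_space)
sampled = random_candidates(search_space, n_iter=3)
```

A grid search evaluates all 8 combinations here, whereas a randomized search evaluates only as many randomly drawn candidates as requested, which is the behavior the data scientist wants.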
