Welcome to TestSimulate

Pass Your Next Certification Exam Fast!

Everything you need to prepare, learn & pass your certification exam easily.

365 days free updates. First attempt guaranteed success.

Databricks Certified Machine Learning Associate (Databricks-Machine-Learning-Associate) Free Practice Test

Question 1
Which of the following statements describes a Spark ML estimator?

Correct Answer: D
Explanation: Only visible for TestSimulate members. You can sign-up / login (it's free).
Question 2
A data scientist has defined a Pandas UDF function predict to parallelize the inference process for a single-node model:

They have written the following incomplete code block to use predict to score each record of Spark DataFrame spark_df:

Which of the following lines of code can be used to complete the code block to successfully complete the task?

Correct Answer: E
Explanation: Only visible for TestSimulate members. You can sign-up / login (it's free).
Question 3
Which of the following machine learning algorithms typically uses bagging?

Correct Answer: B
Explanation: Only visible for TestSimulate members. You can sign-up / login (it's free).
Question 4
What is the name of the method that transforms categorical features into a series of binary indicator feature variables?

Correct Answer: A
Explanation: Only visible for TestSimulate members. You can sign-up / login (it's free).
Question 5
A data scientist has produced three new models for a single machine learning problem. In the past, the solution used just one model. All four models have nearly the same prediction latency, but a machine learning engineer suggests that the new solution will be less time efficient during inference.
In which situation will the machine learning engineer be correct?

Correct Answer: E
Explanation: Only visible for TestSimulate members. You can sign-up / login (it's free).
Question 6
A data scientist is performing hyperparameter tuning using an iterative optimization algorithm. Each evaluation of unique hyperparameter values is being trained on a single compute node. They are performing eight total evaluations across eight total compute nodes. While the accuracy of the model does vary over the eight evaluations, they notice there is no trend of improvement in the accuracy. The data scientist believes this is due to the parallelization of the tuning process.
Which change could the data scientist make to improve their model accuracy over the course of their tuning process?

Correct Answer: D
Explanation: Only visible for TestSimulate members. You can sign-up / login (it's free).
Question 7
A data scientist learned during their training to always use 5-fold cross-validation in their model development workflow. A colleague suggests that there are cases where a train-validation split could be preferred over k-fold cross-validation when k > 2.
Which of the following describes a potential benefit of using a train-validation split over k-fold cross-validation in this scenario?

Correct Answer: B
Explanation: Only visible for TestSimulate members. You can sign-up / login (it's free).
Question 8
A data scientist is attempting to tune a logistic regression model logistic using scikit-learn. They want to specify a search space for two hyperparameters and let the tuning process randomly select values for each evaluation.
They attempt to run the following code block, but it does not accomplish the desired task:

Which of the following changes can the data scientist make to accomplish the task?

Correct Answer: C
Explanation: Only visible for TestSimulate members. You can sign-up / login (it's free).