Databricks Certified Professional Data Engineer (Databricks-Certified-Professional-Data-Engineer) Free Practice Test
Question 1
The DevOps team has configured a production workload as a collection of notebooks scheduled to run daily using the Jobs UI. A new data engineering hire is onboarding to the team and has requested access to one of these notebooks to review the production logic.
What are the maximum notebook permissions that can be granted to the user without allowing accidental changes to production code or data?
Correct Answer: A
Question 2
The data engineering team is configuring environments for development, testing, and production before beginning migration of a new data pipeline. The team requires extensive testing on both the code and the data resulting from code execution, and the team wants to develop and test against data that is as similar to production data as possible.
A junior data engineer suggests that production data can be mounted to the development and testing environments, allowing pre-production code to execute against production data. Because all users have Admin privileges in the development environment, the junior data engineer has offered to configure permissions and mount this data for the team.
Which statement captures best practices for this situation?
Correct Answer: A
Question 3
All records from an Apache Kafka producer are being ingested into a single Delta Lake table with the following schema:
key BINARY, value BINARY, topic STRING, partition LONG, offset LONG, timestamp LONG
There are 5 unique topics being ingested. Only the "registration" topic contains Personally Identifiable Information (PII). The company wishes to restrict access to PII. The company also wishes to retain records containing PII in this table for only 14 days after initial ingestion. However, it would like to retain non-PII records indefinitely.
Which of the following solutions meets the requirements?
Correct Answer: A
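The retention rule in the question can be sketched in plain Python to make the predicate explicit. This is only an illustration of the rule itself, not of any Delta Lake or access-control feature. It assumes that `timestamp` holds epoch milliseconds and approximates ingestion time, which the question does not state. The function and constant names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# From the question: only the "registration" topic carries PII,
# and PII records may be kept for 14 days.
PII_TOPICS = {"registration"}
PII_RETENTION = timedelta(days=14)


def should_retain(topic: str, timestamp_ms: int, now: datetime) -> bool:
    """Return True if a record may stay in the table at time `now`.

    Assumption: `timestamp_ms` is epoch milliseconds (UTC) and stands in
    for the ingestion time.
    """
    if topic not in PII_TOPICS:
        return True  # non-PII records are retained indefinitely
    ingested = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return now - ingested <= PII_RETENTION
```

Because the rule depends only on `topic` and the record's age, a table layout that separates records by topic lets both the access restriction and the periodic delete target the PII records alone.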
Question 4
Which statement regarding Spark configuration on the Databricks platform is true?
Correct Answer: A
Question 5
An organization processes customer data from web and mobile applications. Data includes names, emails, phone numbers, and location history. Data arrives both as batch files (from SFTP daily) and streaming JSON events (from Kafka in real-time).
To comply with data privacy policies, the following requirements must be met:
* Personally Identifiable Information (PII) such as email, phone number, and IP address must be masked or anonymized before storage.
* Both batch and streaming pipelines must apply consistent PII handling.
* Masking logic must be auditable and reproducible.
* The masked data must remain usable for downstream analytics.
How should the data engineer design a compliant data pipeline on Databricks that supports both batch and streaming modes, applies data masking to PII, and maintains traceability for audits?
Correct Answer: D
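One way to meet the "consistent", "reproducible", and "usable for analytics" requirements together is deterministic keyed hashing: the same input always yields the same token, so joins and distinct counts still work on masked columns. The sketch below uses only the Python standard library and is not a Databricks API. The function names, the field list, and the normalization step are assumptions for illustration.

```python
import hashlib
import hmac


def mask_pii(value: str, secret_key: bytes) -> str:
    """Return a deterministic, non-reversible token for one PII value."""
    # Normalizing first means "A@x.com " and "a@x.com" map to the same token.
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


def mask_record(record: dict, pii_fields: tuple, secret_key: bytes) -> dict:
    """Return a copy of `record` with the listed PII fields masked."""
    masked = dict(record)
    for field in pii_fields:
        if masked.get(field) is not None:
            masked[field] = mask_pii(str(masked[field]), secret_key)
    return masked
```

Calling the same `mask_record` function from both the batch path and the streaming path is what keeps the handling consistent. Keeping the function under version control makes the masking logic auditable.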
Question 6
A data engineer is designing a Lakeflow Spark Declarative Pipeline to process streaming order data. The pipeline uses Auto Loader to ingest data and must enforce data quality by ensuring customer_id is not null and amount is greater than zero. Invalid records should be dropped. Which Lakeflow Spark Declarative Pipelines configuration implements this requirement using Python?
Correct Answer: B
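The data quality rule in the question (drop any record where customer_id is null or amount is not greater than zero) can be written in plain Python to show the intended semantics. This is not the Lakeflow Spark Declarative Pipelines API. It only models what "invalid records should be dropped" means, and the function names are hypothetical.

```python
def is_valid_order(order: dict) -> bool:
    """Both conditions from the question must hold for a record to be kept."""
    customer_id = order.get("customer_id")
    amount = order.get("amount")
    return customer_id is not None and amount is not None and amount > 0


def drop_invalid(orders: list) -> list:
    """Keep valid records; invalid ones are dropped, not failed or quarantined."""
    return [order for order in orders if is_valid_order(order)]
```

The distinction that matters is between dropping the record, keeping the record while logging the violation, and failing the whole update. The question asks for the first.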
Question 7
A data engineering team uses Databricks Lakehouse Monitoring to track the percent_null metric for a critical column in their Delta table. The profile metrics table (prod_catalog.prod_schema.customer_data_profile_metrics) stores hourly percent_null values. The team wants to trigger an alert when the daily average of percent_null exceeds 5% for three consecutive days, while ensuring notifications are not spammed during sustained issues. Which SQL alert configuration achieves this goal while minimizing false positives and redundant notifications?
Correct Answer: B
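The alert condition has two steps: aggregate hourly values to a daily average, then require the threshold to be exceeded on each of three consecutive days. A plain-Python sketch of that condition follows. It covers only the trigger logic, not the notification frequency setting. It assumes percent_null is expressed on a 0 to 100 scale, which the question does not confirm. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def daily_averages(hourly) -> dict:
    """Collapse (datetime, percent_null) pairs into {date: daily mean}."""
    by_day = defaultdict(list)
    for ts, pct in hourly:
        by_day[ts.date()].append(pct)
    return {day: mean(values) for day, values in by_day.items()}


def breached_consecutive_days(hourly, as_of: date,
                              threshold: float = 5.0, days: int = 3) -> bool:
    """True only if every one of the last `days` days (ending at `as_of`)
    has a daily average above `threshold`. A missing day counts as no breach.
    """
    averages = daily_averages(hourly)
    window = [as_of - timedelta(days=i) for i in range(days)]
    return all(d in averages and averages[d] > threshold for d in window)
```

Averaging per day before comparing avoids false positives from a single noisy hour. Requiring all three days avoids alerting on a one-day spike.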
Question 8
Review the following error traceback:

Which statement describes the error being raised?
Correct Answer: C
Question 9
Two of the most common data locations on Databricks are the DBFS root storage and external object storage mounted with dbutils.fs.mount().
Which of the following statements is correct?
Correct Answer: D
Question 10
A healthcare analytics team is implementing a dimensional model in Delta Lake for patient care analysis.
They have a date dimension table and are evaluating design options to ensure it supports a wide range of time-based analyses.
Which design approach for the date dimension will support efficient time-based querying and aggregation?
Correct Answer: B
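A common dimensional-modeling approach is a date dimension with one row per calendar day and precomputed attributes, so queries can filter and group by year, quarter, or weekday without recomputing date functions. The sketch below illustrates that idea in plain Python. The column names and the integer `date_key` format are assumptions, not taken from the question.

```python
from datetime import date, timedelta


def date_dim_row(d: date) -> dict:
    """Build one date-dimension row with precomputed calendar attributes."""
    return {
        "date_key": d.year * 10000 + d.month * 100 + d.day,  # e.g. 20240229
        "full_date": d.isoformat(),
        "year": d.year,
        "quarter": (d.month - 1) // 3 + 1,
        "month": d.month,
        "day_of_month": d.day,
        "day_of_week": d.isoweekday(),      # Monday = 1 ... Sunday = 7
        "is_weekend": d.isoweekday() >= 6,
    }


def build_date_dim(start: date, end: date) -> list:
    """Generate rows for every day from `start` to `end`, inclusive."""
    n_days = (end - start).days
    return [date_dim_row(start + timedelta(days=i)) for i in range(n_days + 1)]
```

Fact tables would then join on `date_key` and aggregate by any attribute column directly.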