Welcome to TestSimulate

Pass Your Next Certification Exam Fast!

Everything you need to prepare, learn & pass your certification exam easily.

365 days free updates. First attempt guaranteed success.

Google Certified Professional Data Engineer (Professional-Data-Engineer) Free Practice Test

Question 1
You are loading CSV files from Cloud Storage to BigQuery. The files have known data quality issues, including mismatched data types, such as STRINGS and INT64s in the same column, andinconsistent formatting of values such as phone numbers or addresses. You need to create the data pipeline to maintain data quality and perform the required cleansing and transformation. What should you do?

Correct Answer: D
Explanation: Only visible for TestSimulate members. You can sign-up / login (it's free).
Question 2
You store historic data in Cloud Storage. You need to perform analytics on the historic data. You want to use a solution to detect invalid data entries and perform data transformations that will not require programming or knowledge of SQL.
What should you do?

Correct Answer: C
Question 3
You need to choose a database for a new project that has the following requirements:
Fully managed
Able to automatically scale up
Transactionally consistent
Able to scale up to 6 TB
Able to be queried using SQL
Which database do you choose?

Correct Answer: C
Question 4
You manage your company's BigQuery data warehouse. You need to implement a solution that enables the data science team to modify data for experiments without affecting the original tables, while minimizing additional storage costs. What should you do?

Correct Answer: A
Explanation: Only visible for TestSimulate members. You can sign-up / login (it's free).
Question 5
Your software uses a simple JSON format for all messages. These messages are published to Google Cloud Pub/Sub, then processed with Google Cloud Dataflow to create a real-time dashboard for the CFO. During testing, you notice that some messages are missing in thedashboard. You check the logs, and all messages are being published to Cloud Pub/Sub successfully. What should you do next?

Correct Answer: A
Question 6
You have a BigQuery dataset named "customers". All tables will be tagged by using a Data Catalog tag template named "gdpr". The template contains one mandatory field, "has sensitive data~. with a boolean value. All employees must be able to do a simple search and find tables in the dataset that have either true or false in the "has sensitive data" field. However, only the Human Resources (HR) group should be able to see the data inside the tables for which "hass-ensitive-data" is true. You give the all employees group the bigquery.
metadataViewer and bigquery.connectionUser roles on the dataset. You want to minimize configuration overhead. What should you do next?

Correct Answer: D
Explanation: Only visible for TestSimulate members. You can sign-up / login (it's free).
Question 7
You want to analyze hundreds of thousands of social media posts daily at the lowest cost and with the fewest steps.
You have the following requirements:
You will batch-load the posts once per day and run them through the Cloud Natural Language API.
You will extract topics and sentiment from the posts.
You must store the raw posts for archiving and reprocessing.
You will create dashboards to be shared with people both inside and outside your organization.
You need to store both the data extracted from the API to perform analysis as well as the raw social media posts for historical archiving. What should you do?

Correct Answer: A
Question 8
Cloud Dataproc is a managed Apache Hadoop and Apache _____ service.

Correct Answer: A
Explanation: Only visible for TestSimulate members. You can sign-up / login (it's free).
Question 9
You are operating a streaming Cloud Dataflow pipeline. Your engineers have a new version of the pipeline with a different windowing algorithm and triggering strategy. You want to update the running pipeline with the new version. You want to ensure that no data is lost during the update. What should you do?

Correct Answer: B
Explanation: Only visible for TestSimulate members. You can sign-up / login (it's free).
Question 10
Your United States-based company has created an application for assessing and responding to user actions.
The primary table's data volume grows by 250,000 records per second. Many third parties use your application's APIs to build the functionality into their own frontend applications. Your application's APIs should comply with the following requirements:
Single global endpoint
ANSI SQL support
Consistent access to the most up-to-date data
What should you do?

Correct Answer: D
Question 11
You are collecting loT sensor data from millions of devices across the world and storing the data in BigQuery.
Your access pattern is based on recent data tittered by location_id and device_version with the following query:
You want to optimize your queries for cost and performance. How should you structure your data?

Correct Answer: A
Question 12
You have data pipelines running on BigQuery, Cloud Dataflow, and Cloud Dataproc. You need to perform health checks and monitor their behavior, and then notify the team managing the pipelines if they fail. You also need to be able to work across multiple projects. Your preference is to use managed products of features of the platform. What should you do?

Correct Answer: C