You are building a real-time prediction engine that streams files, which may contain personally identifiable information (PII), into Cloud Storage and eventually into BigQuery. You want to ensure that the sensitive data is masked but still maintains referential integrity, because names and emails are often used as join keys. How should you use the Cloud Data Loss Prevention API (DLP API) to ensure that the PII is not accessible by unauthorized individuals?
A. Create a pseudonym by replacing the PII data with cryptographic tokens, and store the non-tokenized data in a locked-down bucket.
B. Redact all PII data, and store a version of the unredacted data in a locked-down bucket.
C. Scan every table in BigQuery, and mask any PII it finds.
D. Create a pseudonym by replacing PII data with a cryptographic format-preserving token.
Answer
D
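The property that makes D work is determinism: the same input always produces the same token, so a masked email in one table still matches the same masked email in another, and joins keep working. The sketch below illustrates only that property. It is a minimal standard-library example that swaps in HMAC-SHA256 tokenization as a stand-in; it is not the DLP API and it is not format-preserving. The key, the helper name `pseudonymize`, and the sample records are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real deployment would keep the key
# in a key management service rather than hard-coding it.
SECRET_KEY = b"example-key-do-not-use"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a deterministic token: same (normalized) input + same key -> same token.

    Stand-in technique: HMAC-SHA256, truncated to 16 hex chars. Unlike
    format-preserving encryption, this is one-way and does not keep the input's format.
    """
    normalized = value.strip().lower()  # so "Alice@Example.com" and "alice@example.com" match
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256)
    return "TOKEN_" + digest.hexdigest()[:16]


# Two hypothetical tables that share email as a join key.
customers = [
    {"email": "alice@example.com", "name": "Alice"},
    {"email": "bob@example.com", "name": "Bob"},
]
orders = [
    {"email": "alice@example.com", "amount": 30},
    {"email": "bob@example.com", "amount": 12},
    {"email": "alice@example.com", "amount": 5},
]

# Mask the PII columns in both tables with the same key.
masked_customers = [
    {"email": pseudonymize(c["email"]), "name": pseudonymize(c["name"])}
    for c in customers
]
masked_orders = [
    {"email": pseudonymize(o["email"]), "amount": o["amount"]} for o in orders
]

# Join on the tokenized email: referential integrity survives masking.
totals: dict[str, int] = {}
for o in masked_orders:
    totals[o["email"]] = totals.get(o["email"], 0) + o["amount"]

joined = [(c["name"], totals.get(c["email"], 0)) for c in masked_customers]
print(joined)
```

Because this stand-in is a one-way hash, the original values cannot be recovered from the tokens. A reversible scheme such as the format-preserving token in answer D additionally allows authorized re-identification with the key.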