How Data Quality Analysis Defines Your Architecture in Google Cloud


Have you ever wondered why engineers choose Cloud Run or BigQuery? It's not intuition: it's evidence! This video is a practical tutorial on how to build a fail-safe data pipeline. We walk step by step through validating the viability of any external data source (such as the DNCP API) and show how those results become the blueprint for your architecture in Google Cloud.

You will learn the "Causality Chain":

PHASE 1: The Data Stress Test.
Problem: The API is external and does not document its rate limits.
Solution: We run a benchmark (stress test) to measure the P99 latency (the 99th percentile, a near-worst-case figure) and use it to define the safe limit in Cloud Run.

PHASE 2: Cleaning for AI.
Problem: 6.6% of the data has an amount of zero, which would severely bias any AI model.
Solution: We design a cleanup rule in SQL (Dataform) that uses advanced statistics to neutralize these values BEFORE they reach BigQuery.

PHASE 3: System Design.
Why BigQuery? Because we need to transform complex JSON-LD data and consolidate the data warehouse for high-speed access.
Why Cloud Memorystore? Because the API's OAuth token expires quickly. We use Redis so that the system is never left without access.

The end result: A document (Single Source of Truth, SSOT) that not only says what was done but also demonstrates the why behind each line of code and each GCP service.

Key Resources for your Learning:
Technologies: BigQuery, Cloud Run, Cloud Memorystore, Dataform.
Concepts: SSOT, Data Quality, Causality Chain, JSON-LD.

Subscribe to master Data Engineering with rigor and Google Cloud!
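As a rough illustration of the Phase 1 stress test, here is a minimal Python sketch of how a P99 latency could be computed from benchmark measurements. The function name `percentile`, the nearest-rank method, and the sample latencies are all assumptions made for illustration; the description does not say how the video computes its P99.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile (an assumed method, not necessarily the video's).

    Returns the smallest sample such that at least pct percent of all
    samples are less than or equal to it.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank: 1-based position of the requested percentile.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical latencies in milliseconds, invented for this example.
    latencies_ms = [120, 135, 128, 910, 131, 140, 125, 133, 129, 138]
    print("P50:", percentile(latencies_ms, 50), "ms")
    print("P99:", percentile(latencies_ms, 99), "ms")
```

With only a handful of samples the P99 is simply the slowest request, so a real benchmark would need far more measurements before the figure says anything useful about a safe limit.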

(Source: YouTube Channel Adalberto Gomez)

