Google

It’s Time To Update Your Data Lake Ingestion Strategy

If you built your data lake in the last 5-7 years, you are probably ingesting and updating its data with many batch jobs. There is nothing wrong with that approach, but data technology has continued to move forward. In my opinion, the time has come to change your data lake ingestion strategy: reduce the number of high-latency batch jobs and move toward primarily using real-time change data capture (CDC).

What Is Change Data Capture (CDC)?

Traditionally, businesses used batch-based approaches to move data once or several times a day. However, batch movement introduces latency and reduces the operational value of data to the organization. Change data capture (CDC) has emerged as an ideal solution for near-real-time movement of data from relational databases (like DB2, SQL Server, or Oracle) to operational data stores or data lakes. CDC is a software process that identifies and tracks changes to data in a database. It delivers the tracked data in real time or near real time by moving and processing it continuously as new database events occur. In high-velocity data environments where time-sensitive decisions are made, CDC is an excellent fit for low-latency, reliable, and scalable data replication. CDC is also ideal for zero-downtime migrations to the cloud.
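To make the idea concrete, here is a minimal sketch of how a target copy might stay in sync by applying a stream of change events rather than reloading whole tables. It is an illustration only, not how any particular CDC product works: the `ChangeEvent` type, its fields (`op`, `key`, `row`, `position`), and the `apply_changes` function are hypothetical names, and the target is a plain in-memory dictionary standing in for a data lake table.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One captured change (hypothetical shape, for illustration only)."""
    op: str                               # "insert", "update", or "delete"
    key: int                              # primary key of the changed row
    row: Optional[Dict[str, Any]] = None  # new row image; None for deletes
    position: int = 0                     # assumed ordering position in the source change log


def apply_changes(
    target: Dict[int, Dict[str, Any]],
    events: Iterable[ChangeEvent],
    last_applied: int = 0,
) -> int:
    """Apply change events to the target in log order.

    Events at or before `last_applied` are skipped, so replaying the
    same batch twice leaves the target unchanged. Returns the position
    of the last event applied.
    """
    for event in sorted(events, key=lambda e: e.position):
        if event.position <= last_applied:
            continue  # already applied on an earlier run
        if event.op in ("insert", "update"):
            # Treat both as an upsert of the new row image.
            target[event.key] = dict(event.row or {})
        elif event.op == "delete":
            target.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op!r}")
        last_applied = event.position
    return last_applied


if __name__ == "__main__":
    lake_table: Dict[int, Dict[str, Any]] = {}
    changes = [
        ChangeEvent("insert", 1, {"name": "Ada"}, position=1),
        ChangeEvent("update", 1, {"name": "Ada L."}, position=2),
        ChangeEvent("insert", 2, {"name": "Bob"}, position=3),
        ChangeEvent("delete", 2, None, position=4),
    ]
    checkpoint = apply_changes(lake_table, changes)
    print(checkpoint, lake_table)
```

Only the rows that changed are touched, and the returned checkpoint lets the next run resume where the last one stopped. That is the property that lets this style of ingestion avoid the large periodic reloads of a batch job.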

The Benefits of Real-Time Data

For many years, data strategists believed that 24 hours of latency was acceptable for data in the data lake that would be used for analysis and to create insights. However, as organizations have become more data-driven and more dependent on data-informed business decisions, that belief has changed: real-time or near-real-time data in the data lake benefits many different types of use cases. Additionally, over the last several years CDC technology has improved, become more reliable, and in many cases become even less expensive than large-batch ELT tools. So, if organizations can have real-time data in the data lake, reduce data ingestion operational costs, and do all of this without increasing load on operational systems, then it seems to me the path is clear. Organizations need to update their data strategies and move toward CDC-based data architectures.

Perficient’s Cloud Data Expertise

The world’s leading brands choose to partner with us because we are large enough to scale major cloud projects, yet nimble enough to provide focused expertise in specific areas of your business. Our cloud, data, and analytics team can assist with your entire data and analytics lifecycle, from data strategy to implementation. We will help you make sense of your data and show you how to use it to solve complex business problems. We will assess your current data and analytics issues and develop a strategy to guide you to your long-term goals.

Download the guide, Becoming a Data-Driven Organization with Google Cloud Platform, to learn more about Dr. Chuck’s GCP data strategy

Chuck Brooks

Dr. Chuck is a Senior Data Strategist / Solution Architect. He is a technology leader and visionary in big data, data lakes, analytics, and data science. Over a career that spans more than 40 years, Dr. Chuck has developed many large data repositories based on advancing data technologies and has helped many companies become data-driven and develop comprehensive data strategies. The cloud is the modern ecosystem for data and data lakes. Dr. Chuck’s expertise lies in the Google Cloud Platform, Advanced Analytics, Big Data, SQL and NoSQL Databases, Cloud Data Management Engines, and Business Management Development technologies such as SQL, Python, Data Studio, Qlik, PowerBI, Talend, R, Data Robot, and more. His sales enablement and data strategy work draws on those 40 years in the data space. For more information or to engage Dr. Chuck, contact him at chuck.brooks@perficient.com.
