Announcing GCS Object Monitoring with Lariat

 

TL;DR: You can now use Lariat to monitor the data integrity of objects on GCS. Lariat will find data issues with objects as soon as they are written to a user-specified bucket and prefix. Reach out to us if you’re interested!

Since we released our S3 Object Monitoring product in May, one of the most common requests we’ve received has been for availability on other cloud providers. After some hard work from our team over the last few weeks, we’re happy to announce that Lariat is now releasing the GCS Object Monitoring product.

After a 5-minute install, you can now answer “data integrity” questions for objects on GCS like:

  • Is a significant percentage of the objects written in the last 10 minutes for events that happened more than a day ago?

  • Does my geographic data show a big drop in data for events in Washington in the last 30 minutes?

  • Has a data partner stopped sending data for a subset of contract ids in the last hour? 
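As a rough illustration, the first question above could be checked along the following lines. This is a minimal sketch and not Lariat's implementation: the function name `late_event_fraction`, the `event_time` field, and the sample records are all hypothetical, and it assumes each record carries a timezone-aware event timestamp.

```python
from datetime import datetime, timedelta, timezone


def late_event_fraction(records, written_at, max_lag=timedelta(days=1)):
    """Return the fraction of records whose event time is more than
    `max_lag` older than the time the object was written."""
    if not records:
        return 0.0
    late = sum(1 for r in records if written_at - r["event_time"] > max_lag)
    return late / len(records)


# Hypothetical sample: one object written at `written_at` with four records.
written_at = datetime(2023, 1, 2, 12, 0, tzinfo=timezone.utc)
records = [
    {"event_time": written_at - timedelta(minutes=5)},  # fresh
    {"event_time": written_at - timedelta(hours=30)},   # late
    {"event_time": written_at - timedelta(days=3)},     # late
    {"event_time": written_at - timedelta(hours=2)},    # fresh
]

fraction = late_event_fraction(records, written_at)
print(f"{fraction:.0%} of records are for events more than a day old")
```

An alert would then compare this fraction against a threshold chosen for the dataset.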


The GCS monitoring product follows the main principles we outlined previously in our blog post about monitoring constantly updating datasets on object storage. To recap, the main motivators for this product are:

  • Object storage is becoming the predominant choice for datasets (all the way from raw data to data served to the lakehouse)

  • Measuring the data integrity of objects is more than just ensuring valid checksums or successful object creation. Downstream processing needs to be halted as soon as the inspected object has columns with anomalies or missing data 

  • Sifting through a large number of file-ingest events (anywhere from 1,000 to 100,000 per day) to figure out which ones have issues is cumbersome for data engineering teams


The install process sets up the following architecture so that metrics about objects can be tracked as soon as they are written to pre-selected GCS buckets:

Once the product starts collecting data, you can track the timeline of events, see alerts about the data integrity of objects, and view dashboards of relevant metrics about the overall dataset.


Our Data Ingest Monitoring product for GCS is now in open beta. If you’re interested in a free trial, sign up below and we will be in touch within 24 hours.

 