C-Suite On Deck
Keep me plugged in with the best
Join thousands of your peers and receive our weekly newsletter with the latest news, industry events, customer insights, and market intelligence.
I agree to the
terms of service
PLEASE CORRECT THE FOLLOWING:
Please Enter Some Keywords
New file checksum feature lets you validate data transfers between HDFS and Cloud Storage
When you’re copying or moving data between distinct storage systems such as multiple Apache Hadoop Distributed File System (HDFS) clusters or between HDFS and Cloud Storage, it’s a good idea to perform some type of validation to guarantee data integrity. This validation is essential to be sure data wasn’t altered during transfer.For Cloud Storage, this validation happens automatically client-side with commands like gsutil cp and rsync. Those commands compute local file checksums, which are then validated against the checksums computed by Cloud Storage at the end of each operation. If the checksums do not match, gsutil deletes the invalid copies and prints a warning message. This mismatch rarely happens, and if it does, you can retry the operation.Now, there’s also a way to automatically perform end-to-end, client-side validation in Apache Hadoop across heterogeneous Hadoop-compatible file systems like HDFS and Cloud Storage. Our Google engineers recently added the feature to Apache Hadoop, in collaboration with Twitter and members of the Apache Hadoop open-source community.
I'm for real
Enter your email once to access all our information and resources.
(Your email address is required so we know you're a real person)
By downloading this content, you give permission for your contact information to be shared with the content provider who may contact you in regards to the content.