Background: When data lakes first came on the scene, they were usually on-premises Hadoop clusters. Such systems could store any kind of data in its raw format at massive scale. With the Hadoop Distributed File System (HDFS), each piece of data was stored multiple times across the cluster, so there was little chance of data loss…