Great article Daniel, thanks! I shared in my last Medium post that it was a new topic and a couple of people asked me why - gonna redirect them to this one :-) The compute expense is definitely one of the reasons; the others would be the HR expense and the technical entry level required to set up a data pipeline. For example, to set up an on-premises cluster, we needed several different IT profiles (devops, network, etc.) just to have something up and running, while now, by reading the documentation, any software engineer can launch an EMR cluster in a few minutes!