Breaking down the main activities of a data engineer in 2021

Coding [Digital Image] https://unsplash.com/@jefflssantos | Spongebob Cleaning [Digital Image] https://imgflip.com/meme/81959717/Spongebob-Cleaning

The data engineering role in 2021 has been expanding beyond its original scope, for better or for worse. As a result, multiple definitions of the role are popping up. Does the data engineer do more analytics (a newly defined role, the analytics engineer), build data pipelines, handle more infrastructure (DevOps), or do machine learning engineering? Basically, it’s getting blurry what an average data engineer spends their time on. However, these categories are all technical activities, and we often forget that they represent just a chunk of the time spent. …


Data Skills Radar Dashboard http://dataskillsradar.amaaai.com/

The tech scene is evolving fast. Blazingly fast. So many new projects, frameworks, and cloud API services are popping up that it is just too difficult for a software engineer to constantly stay up to date. To help our community stay on track, we (Adriaan Slechten, Grégoire Hornung, Vincent Claes, and I) have developed a dashboard that monitors the technical skills that are currently trending. The dashboard is available here and has a particular focus on data skills.

In this blog post, we will go behind the scenes of the dashboard and share some insights and comments based on our experience.

How did we do it?


In this blog post, I will cover a few elements that should motivate you to dockerize your development environment and give you an example repo showing how you can smoothly achieve this with VS Code. The idea here goes further than just "I have my application dockerized so that I can test it locally" and creates a complete development experience entirely (or almost?) in Docker. I will also share my general experience and the limits I have encountered while having my whole development environment dockerized for the past few months.

So… Why?

It uses the same runtime environment as your application

  • A good practice nowadays is to provide a Dockerfile for either the target…


Copyright Databricks

Originally called “Spark Summit” and now drifting toward AI (to follow the hype, of course), the summit, organized by Databricks (the company behind Spark, Delta, and MLflow), brings together top tech companies with mature experience in data science and data engineering, with more than 200 sessions. So even if you are not a Spark fanboy (no, I won't talk about Spark 3.0), there’s a lot to learn from this event. Bonus this year: the event was online and free and, as usual, all talks and slides are available here.

In these takeaways focusing on the data engineering topics, I'll provide as resources, the most…

Mehdi Ouazza

Data Engineer. Geek entrepreneur, eager to learn and passionate about big data, data science, and web apps. https://mehd.io
