Stop Using The Term “Data Engineer”, There’s Something Better
5 overused definitions of the hottest job of the year
The scope of the Data Engineer role has never been so large and confusing. New role definitions have been popping up to cover specific areas, but they are still underused. If you look at Data Engineer job offers, you may get confused, as the content and the job itself can differ from company to company. In this article, we will go over the close cousins of the data engineer role so that you can spot a misnomer.
📕 Data Engineer aka Database Administrator (DBA)
This role has been fading out as most analytics use cases are now based on OLAP systems. Some companies still rely heavily on OLTP databases (e.g. MySQL, Postgres) for their analytics. There are several reasons for this:
They don't have any pub/sub system to fetch data in real-time and have low-latency query requirements (milliseconds response time)
They still haven't invested in a data platform and just rely on existing software engineers to do analytical jobs
They may need a DBA to maintain their production database: performance tuning, monitoring, migrations, backups, etc.
How to spot a DBA data engineer role? The job offer will probably focus on a specific database, with heavy requirements on SQL and on database, query, and stored procedure optimization.
📗 Data Engineer aka Data Analyst
Data analysts are more focused on business value than on internal SQL performance. They usually work with dashboarding tools and provide KPIs and insights that the business can consume directly.
The trap here is that a Data Analyst may build some data pipelines because no data engineer is available, but that doesn't make them a Data Engineer; it's a different beast.
Software engineering is not where a Data Analyst shines. If the following are missing from a job offer: CI/CD, programming knowledge (testing, Python/Java/Scala), infra topics (Terraform, Docker, etc.), you are probably looking at a Data Analyst job offer, not a data engineer one.
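The check above can be sketched as a simple keyword scan. This is only a rough illustration, not a reliable classifier: the function name `missing_engineering_signals`, the keyword groups, and the sample offer text are all assumptions made up for this example.

```python
import re

# Hypothetical keyword groups, based on the topics listed above.
ENGINEERING_SIGNALS = {
    "ci/cd": ["ci/cd", "continuous integration"],
    "programming": ["testing", "python", "java", "scala"],
    "infra": ["terraform", "docker"],
}


def missing_engineering_signals(job_offer: str) -> list[str]:
    """Return the signal groups that never appear in the job offer text."""
    text = job_offer.lower()
    missing = []
    for group, keywords in ENGINEERING_SIGNALS.items():
        # Match whole words so that "java" does not match "javascript".
        found = any(
            re.search(r"(?<!\w)" + re.escape(kw) + r"(?!\w)", text)
            for kw in keywords
        )
        if not found:
            missing.append(group)
    return missing


# Made-up sample offer, for illustration only.
offer = "We need strong SQL skills and experience building Tableau dashboards."
print(missing_engineering_signals(offer))
```

If all three groups come back as missing, the offer likely describes an analyst role, whatever the title says. Real offers are worded in many ways, so treat the result as a prompt for questions rather than a verdict.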
📘 Data Engineer aka Analytics Engineer
Analytics Engineer is a fairly new role name. With the emergence of cloud data warehouses (Snowflake, BigQuery, Firebolt), a new breed of data engineers was born. These engineers are super-powered "Data Analysts".
They apply software engineering best practices (version control, testing, CI/CD) and usually focus on SQL pipelines & optimization while using a cloud data warehouse technology.
They are usually responsible for data assets (cleaned, transformed data) that are directly used by the business (as opposed to just providing "raw" data).
dbt is often part of their primary tool belt.
🔗 Fishtown Analytics did a great comparison of Data Analyst, Analytics Engineer, & Data Engineer here.
📙 Data Engineer aka Machine Learning Engineer
Data maturity is growing and machine learning is getting more democratized. Therefore, we need dedicated people for the challenge!
This role focuses on the machine learning lifecycle: deploying models (developed by Data Scientists) and turning them into a live production system. These engineers usually have a strong software engineering background.
Examples of some specific tools they use:
ML platforms: AWS SageMaker, GCP Cloud ML
ML libraries: TensorFlow, scikit-learn, Spark ML, Keras, PyTorch
Orchestration: MLflow, Kubeflow, Airflow
📔 Data Engineer aka Data Platform Engineer
With the data mesh concept gaining traction, teams tend to move toward a self-service approach: the centralized data team mostly focuses on providing the data platform and tooling, so that other data citizens can build their data pipelines autonomously.
Some of the work of a Data Platform Engineer:
Managing data infrastructure (Kafka, Data Orchestrators, Data Catalog)
Providing ETL tooling (dbt) or frameworks to rationalize and simplify data pipelines
Creating microservices/APIs that provide data.
So who's the real data engineer? 🕵
Today's big challenge is that the data engineer role in a company often doesn't match just one of the definitions above but a mix of two or more. However, it's fair to say that if your job mainly falls into one area, then don't use the data engineer title!
When you apply for a role, be sure to ask how much time you are expected to spend on each of these topics. This could give you a better idea of what to expect.
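As a rough sketch of the rule of thumb above, you could write down the expected time split and see whether one area dominates. The function name `suggest_title`, the 70% threshold, and the sample split are assumptions for illustration; nothing here defines precisely what "mainly" means.

```python
def suggest_title(time_split: dict[str, float], threshold: float = 0.7) -> str:
    """Suggest a title from the share of time spent per area.

    `threshold` is an arbitrary cut-off for "mainly one area" (an assumption).
    """
    total = sum(time_split.values())
    if total <= 0:
        raise ValueError("time_split must contain positive values")
    # Find the area with the largest share of the total time.
    area, amount = max(time_split.items(), key=lambda item: item[1])
    if amount / total >= threshold:
        return area
    # No single area dominates, so the generic title still fits.
    return "Data Engineer"


# Hypothetical answer from a hiring manager, in percent of working time.
split = {
    "Analytics Engineer": 50,
    "Data Platform Engineer": 30,
    "Machine Learning Engineer": 20,
}
print(suggest_title(split))  # prints "Data Engineer" (no area reaches 70%)
```

With a split such as 90% database administration and 10% analysis, the same function would return "Database Administrator" instead.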
One data engineer can hide another, and your role definition may be completely different from company to company, so watch out! 👀
Mehdi OUAZZA aka mehdio 🧢