Opinion article

Centralized vs Decentralized data engineering

Don’t let the coin flip decide

As a consultant in the data engineering field, I see my share of lakes and meshes. A question that often comes up is which option fits best: a data mesh or a data lake. That question boils down to whether data engineering should be centralized or decentralized. This blog introduces the two flavors, shows their differences, and gives some clues as to which could be a good option for your enterprise. If you have any feedback or questions, feel free to let me know in the comments. If your feedback does not fit in that small box, or if you want to know more, please reach out to me at ‘ruud.cools at codecentric dot nl’.

Centralized data engineering

The defining characteristic of this option is that a single component is responsible for a specific task; the data lake is an example. The data lake is a single centralized repository for all enterprise data. One of its goals is to break down the data silos that inhibit the reuse of data across the departments of a company. Another is to provide a single source of truth for information that is essential to the business. The centralized approach is also characterized by a dedicated team or department that is responsible for the lake and all the essential components that surround it, e.g. compute, metadata, and security. Often, this team is also responsible for transforming processes or elements of a process, so-called use cases, so that they become data-driven or support data-informed decision making. During the development of a use case, the required data is ingested into the lake, and the transformed data may be stored back into the lake as well, completing the circle.

If we take a small step back and list the key elements of centralized data engineering, we see the following:

  • A single component is responsible for one or more tasks
  • A single group is responsible for the lake and its essential components
  • Driven from a business perspective to transform processes
  • Grows organically as new use cases are included
  • Often a highly standardized methodology is used
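
To make the circle described above a bit more concrete, here is a minimal Python sketch of one use case running against a central lake. Everything in it is an assumption made for illustration: the `DataLake` class, the `ingest` and `read` methods, the dataset names, and the `revenue_per_customer` use case are hypothetical, and an in-memory dictionary stands in for real storage, metadata, and security components.

```python
from dataclasses import dataclass, field


@dataclass
class DataLake:
    """Hypothetical in-memory stand-in for a single central repository."""
    datasets: dict = field(default_factory=dict)
    catalog: dict = field(default_factory=dict)  # dataset name -> source of the data

    def ingest(self, name, rows, source):
        """Store a dataset in the lake and record where it came from."""
        self.datasets[name] = list(rows)
        self.catalog[name] = source

    def read(self, name):
        """Return a copy of a dataset so callers cannot mutate the lake directly."""
        return list(self.datasets[name])


def revenue_per_customer(lake):
    """Hypothetical use case: read raw data, transform it, store the result back."""
    totals = {}
    for order in lake.read("orders"):
        totals[order["customer"]] = totals.get(order["customer"], 0) + order["amount"]
    result = [{"customer": c, "revenue": r} for c, r in sorted(totals.items())]
    # The transformed data goes back into the same lake, completing the circle.
    lake.ingest("revenue_per_customer", result, source="central data team")
    return result


lake = DataLake()
lake.ingest(
    "orders",
    [
        {"customer": "acme", "amount": 100},
        {"customer": "globex", "amount": 40},
        {"customer": "acme", "amount": 25},
    ],
    source="sales",
)
report = revenue_per_customer(lake)
```

The point of the sketch is the shape, not the implementation: there is one repository, one catalog, and one team that writes both the raw and the derived data, so every new use case makes the same lake grow.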

Read more:

https://blog.the-experts.nl/ruudcools/centralized-vs-decentralized-data-engineering-kpf
