Wanted: Data Architects with a holistic perspective

Holistic perspective on data

Martijn Evers, co-founder of i-Refact, sees a clear demand in practice for data architects with a holistic vision (see part 1 of our interview): architects who can effortlessly switch between different modes. He jokingly contrasts the gorilla architect, who is assertive and backed by management, with the guerrilla architect, who lacks broad support in the organization because of all kinds of politically sensitive matters and is therefore forced to operate under the radar. Professional data architects should always ask themselves “Are we doing the right things?”, while others are still busy trying to answer the question of whether they’re doing things right. A full-scale data architect is critical and cynical, but can also be nice, without necessarily being friendly. A good data architect confronts people with the choices they make (or don’t make) and points out the possible consequences.

Borrowing from IT

A data architect who borrows from IT and tries to speak IT’s language lacks credibility. Instead, they have to translate the “language of data/information” for the business, because ‘Wovon man nicht sprechen kann, darüber muss man schweigen’, as Ludwig Wittgenstein once said (whereof one cannot speak, thereof one must be silent, for the less German-inclined among us). Until now, data architects have mostly been solving IT problems, trying against their better judgment to keep chaos out of the IT department. They have to stop doing that at once. They have a responsibility of their own: judging the quality of the data architecture.

Archetypes

Evers thinks there are several archetypes of architects. Each archetype has its own way of dealing with uncertainty, adaptability, and specific client requirements.

A so-called prefab architect will want to dutifully follow the rules. In a predictable and highly standardized environment, this archetype will feel right at home.

A tweaking architect, by contrast, is a lot more flexible and willing to bend the rules when it suits them. This archetype flourishes in an environment with a moderate degree of standardization and a clearly growing amount of uncertainty.

The concern-driven architect is an idealized type of architect who doesn’t just follow or bend the rules, but makes their own: ‘be the rule.’ This person strives to embody the relevant interests as a concern avatar. They work from the perspective of the organization and, above all, the people within it: IT systems should match the physical and cognitive abilities of their users. A full-scale data architect is always looking for a balance between diverse interests.

PowerPoint heroes

Most data architects are business-oriented in practice, according to Evers, but the degree to which they are can vary enormously. “There are architects who behave like real managers, but don’t know a lot about IT. They can talk a good game about (business intelligence) strategy, for example, but the risks in the IT environment are a blind spot for them. Then there are architects who understand IT risks but are completely consumed by them. In extreme cases, you may be dealing with a PowerPoint hero or an IT nerd. A good full-scale data architect is comfortable in both areas. They’re the connecting factor, the linking pin between business and IT. By focusing on quality, they serve as the de facto conscience of the organization.”

Data architecture is invisible

When designing the posters, Evers regularly refers back to examples from physical construction. According to him, there are striking parallels with real-life architecture and works of art. But there’s also a huge difference: physical art and all of its qualities are visible, whether we’re talking about the Eiffel Tower, the Sagrada Família in Barcelona, or a Leonardo da Vinci painting. IT and data architecture, on the other hand, are invisible to the end user.

When in Rome

You can go even further back in history. The Romans left us several structures that are still standing today. Evers draws on this link to argue for a ‘Trias Politica Data’, a variation on the well-known Trias Politica, which separates the responsibilities and powers of the legislative, executive, and judicial branches. By direct analogy, the responsibilities for registering, processing, safeguarding, and controlling data should also be strictly separated. He also distinguishes five types of data nodes, which follow from Damhof’s data quadrant model (see part 1).

Within this model, registrars, processors, controllers, collectors, and discoverers all have their own delineated tasks and responsibilities. Orchestration and integration then happen at a central level. A look at how the many data chains in the corporate world are currently set up shows that this principle is still something of a pipe dream. The ‘data fail culture’, which his colleague René Veldwijk has been calling out for years, can’t be addressed without fundamental changes.
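
To make this separation of duties more concrete, here is a minimal Python sketch. It is a toy illustration under stated assumptions, not Evers’s model. Each of the five node types owns exactly one kind of action, and a central orchestrator refuses any action that falls outside a node’s own responsibility. The class names, the `RESPONSIBILITY` mapping, and the action names (in particular `collect` and `discover`) are hypothetical.

```python
from enum import Enum


class NodeType(Enum):
    """The five data node types named in the text."""
    REGISTRAR = "registrar"
    PROCESSOR = "processor"
    CONTROLLER = "controller"
    COLLECTOR = "collector"
    DISCOVERER = "discoverer"


# Hypothetical mapping: each node type owns exactly one kind of action.
# The action names, especially for collectors and discoverers, are assumptions.
RESPONSIBILITY = {
    NodeType.REGISTRAR: "register",
    NodeType.PROCESSOR: "process",
    NodeType.CONTROLLER: "control",
    NodeType.COLLECTOR: "collect",
    NodeType.DISCOVERER: "discover",
}


class Orchestrator:
    """Central point that routes actions and enforces the separation of duties."""

    def __init__(self) -> None:
        self.audit_log: list[tuple[str, str, str]] = []

    def dispatch(self, node: NodeType, action: str, dataset: str) -> None:
        if RESPONSIBILITY[node] != action:
            # A node stepping outside its own responsibility is refused.
            raise PermissionError(f"{node.value} may not {action} {dataset}")
        # Allowed actions are recorded centrally, so they can be controlled later.
        self.audit_log.append((node.value, action, dataset))


orchestrator = Orchestrator()
orchestrator.dispatch(NodeType.REGISTRAR, "register", "customers")  # allowed
try:
    orchestrator.dispatch(NodeType.PROCESSOR, "register", "customers")
except PermissionError as err:
    print("refused:", err)
```

The design choice is deliberately strict: one owner per kind of action, with the orchestrator as the only place where the rules are enforced.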

Evers acknowledges that his vision is indebted to the contemporary German mathematician and physicist Hanno Wupper, who wrote a practical book about the importance of architecture for the modern digital world. Evers often refers to this book to explain the basic principles of architecture in layman’s terms, because most of the time, the people in the boardroom still don’t speak the language of data architecture. For physical buildings made of stone, steel, and glass, Wupper says, quality can usually be judged by three classical criteria that the Roman master builder Vitruvius once described: utilitas, firmitas, and venustas.

Utilitas

Utilitas stands for usability, purpose, usefulness, and soundness. Translated to the IT world, you could call these the basic quality principles of an IT system’s functionality.

Firmitas

Firmitas describes the build quality of a construction. We call a physical building stable and robust if it doesn’t collapse under normal use. The same goes for data architecture and for the hardware and software of IT systems: software shouldn’t crash for no reason. A data architect should ideally know about programming languages, programming, software verification, and security and reliability principles. Because computer systems, unlike physical buildings, can behave dynamically, you can build them so that they protect themselves or recover when necessary, just like living creatures. That may sound like science fiction, but thanks to machine learning and other algorithms, it will soon be an everyday thing.

Another aspect of firmitas is longevity. Some Roman structures and bathhouses are still standing strong, completely intact. Evers: “IT is much more transient; a fancy data warehouse usually has a lifespan of about five years. That’s nothing.” A full-scale data architect should discuss with their client how long a data architecture should last, and what’s economically viable. A longer lifespan for hardware and software isn’t always better, and a shorter one can be a deliberate choice. The most important thing, Evers says, is that the data architect and the client make these kinds of choices consciously.

Venustas

Venustas, or beauty, is a tricky concept in IT. Wupper says that beauty in IT manifests itself on four levels: static appearance, dynamic behavior, programming languages, and mathematical structures. Anyone who’s ever held an Apple product in their hands has an intuitive idea of what beauty can mean in technology. Dynamic behavior can be seen in touchscreen devices, for example.

The IT world is bursting at the seams with languages: specification languages, programming languages, database query languages (SQL, for example), and languages that computers use to communicate among themselves. The latest variant is the language that digital assistants like Siri, Google Assistant, and Alexa use. Finally, all hardware and software is based on mathematical structures: a conceptual model (files, programs, and folders), an abstract data structure, or a machine.

Agilitas

The Romans didn’t know much about ‘agilitas’, because they built for eternity. Unfortunately, the opposite is true in IT, where agility is often more important than longevity. Evers thinks this aspect (agilitas) has to be considered as well. That allows us to grab the bull by the horns again. This does have important consequences, however. Unlike the works of regular architects, IT constructions, and data constructions in particular, are exposed to ever-spreading entropy. Building a very long-lasting construction isn’t a given for a data architect, and is usually even undesirable. “Damhof’s data quadrant model and my data model matrix, on the other hand, are very sustainable. They will still be standing 100 years from now. That’s something you can attach your name to as a data architect”, Evers claims.

Multi-realities: the Holy Grail

“One version of the truth” is a common saying in organizations. Evers wipes the floor with it. “The average large organization is too complex for one version of the truth. There are always multiple versions of the truth circulating. Truth is determined by context. We prefer to talk about multi-realities. It’s our Holy Grail. All facts belong to a certain truth; some facts belong to multiple truths. The same facts can have multiple realities. Multi-realities make use of the same facts.”

“Multi-realities aren’t a bad thing, as long as they can be managed, including the underlying data. You have to be able to explain the different versions to supervisors. This is hard to grasp, however, and difficult to automate. There’s a lot of ground to be gained in this area, for everyone.”
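
A small Python sketch can show how several realities might be built on the same facts. It assumes a single shared list of order facts. Each reality is defined as an explicit, named rule with a plain-language description, so that every version can be explained to a supervisor. The `Fact` and `Reality` types, the order statuses, and the three example contexts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Fact:
    order_id: str
    amount: float
    status: str  # "booked", "invoiced" or "cancelled" (illustrative values)


@dataclass(frozen=True)
class Reality:
    name: str
    description: str  # plain-language definition, e.g. for a supervisor
    includes: Callable[[Fact], bool]


# One shared set of facts...
FACTS = [
    Fact("A1", 100.0, "booked"),
    Fact("A2", 250.0, "invoiced"),
    Fact("A3", 80.0, "cancelled"),
]

# ...and several explicitly defined realities over those same facts.
REALITIES = [
    Reality("sales", "All booked or invoiced orders",
            lambda f: f.status in {"booked", "invoiced"}),
    Reality("finance", "Invoiced orders only",
            lambda f: f.status == "invoiced"),
    Reality("audit", "Every registered order",
            lambda f: True),
]


def total(reality: Reality) -> float:
    """Order total as seen from one reality."""
    return sum(f.amount for f in FACTS if reality.includes(f))


def realities_of(fact: Fact) -> list[str]:
    """Names of all realities a given fact belongs to."""
    return [r.name for r in REALITIES if r.includes(fact)]


for r in REALITIES:
    print(f"{r.name:8} {total(r):7.1f}  ({r.description})")
```

Here the same three facts yield three different totals. `realities_of` shows that every fact belongs to at least one reality and that some belong to several. In this sketch, the realities stay manageable and explainable because each rule is written down explicitly.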

Data vault

Evers isn’t just a data missionary, but also a certified data vault master. Dan Linstedt introduced the so-called data vault alongside data warehouses and data marts. With a data vault, people work with the facts as they come from the sources instead of with a desired truth, because such a truth doesn’t exist. Every fact is stored, regardless of whether the data is correct and uncompromised, together with several extra attributes. For example, the source of the data and the date it was stored are always tracked. This makes the concept suitable for compliance purposes. “BI moves away from the source, while we’re saying: you have to go to the source of the data.” But a data vault has its own pros and cons, too. The focus is often on making BI products available, not on safeguarding quality within the operational organization.
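
A few lines of Python can sketch how a data vault keeps every fact together with its source and storage date. This is a simplified, hypothetical satellite-style store, not Linstedt’s full specification. Rows are only ever appended, each row carries a `load_date` and a `record_source`, and the business key is hashed into a surrogate key. The names `Satellite`, `SatelliteRow`, and `hash_key` are illustrative. Real implementations also use hubs and links, and usually store a new row only when the attributes actually change.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


def hash_key(business_key: str) -> str:
    """Deterministic surrogate key derived from the business key."""
    return hashlib.sha256(business_key.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class SatelliteRow:
    hub_key: str         # links the row to the business entity
    load_date: datetime  # when the fact was stored
    record_source: str   # which source system delivered it
    attributes: dict     # the fact exactly as delivered, not cleansed


class Satellite:
    """Insert-only store: facts are appended, never updated or deleted."""

    def __init__(self) -> None:
        self._rows: list[SatelliteRow] = []

    def load(self, business_key: str, attributes: dict, record_source: str,
             load_date: Optional[datetime] = None) -> SatelliteRow:
        row = SatelliteRow(
            hub_key=hash_key(business_key),
            load_date=load_date or datetime.now(timezone.utc),
            record_source=record_source,
            attributes=dict(attributes),  # copy, so later caller changes don't leak in
        )
        self._rows.append(row)
        return row

    def history(self, business_key: str) -> list[SatelliteRow]:
        """All stored versions of one entity, oldest first."""
        key = hash_key(business_key)
        return sorted((r for r in self._rows if r.hub_key == key),
                      key=lambda r: r.load_date)


sat = Satellite()
sat.load("CUST-42", {"city": "Utrecht"}, "crm",
         datetime(2018, 1, 1, tzinfo=timezone.utc))
sat.load("CUST-42", {"city": "Utrect"}, "webshop",
         datetime(2018, 2, 1, tzinfo=timezone.utc))
for row in sat.history("CUST-42"):
    print(row.load_date.date(), row.record_source, row.attributes)
```

Both deliveries for customer `CUST-42` are kept side by side, including the misspelled city from the second source. Each one still shows where and when it arrived, which is what makes such a store traceable for compliance.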

Data-proficient instead of data-driven

One thing is certain: thinking purely in terms of technical solutions like a data vault isn’t going to help organizations. That’s why Evers is completely allergic to discussions about whether data is structured, unstructured, or semi-structured, and what exactly big data (integration) or real-time data mean. “It’s much more productive to discuss semantics, representation of data, privacy, and security.”

Evers also has a disdain for vague job titles like data scientist or data engineer, and for disciplines like analytics/BI. He thinks BI is an empty shell, and at the mere mention of ETL he scrunches up his face. “Don’t bother a business manager with these things. The IT world is overflowing with hyped-up trends and buzzwords. Even relatively new trends like data-driven and information-driven working are already showing their first cracks. I prefer the term data-proficient. A company proficient with data has the data bull under control. A data-driven organization is tossed into every corner of the arena, or has to run for its life through the narrow streets of Pamplona.”

A full-scale webshop

Evers has built up an entire poster book full of matrices, intriguing images, and text that all relate to data architecture. It’s not a static document: he is still continuously tinkering with it, with the help of his community. Evers also started a webshop to promote full-scale data architects. “More and more members of our Full Scale movement, our online community, asked us about t-shirts.” He gave the people what they wanted and built a webshop with a wide assortment of shirts, caps, hoodies, and more. The promotional material can even be ordered in baby sizes. The clothes are imprinted with slogans like “I love models”, or with a reproduction of Leonardo da Vinci’s world-famous Vitruvian Man.

Body of knowledge

The posters and all the promotional material are dedicated to the body of knowledge that Evers and like-minded community members have built up and put to use for their current and future clients. In short: the online movement of full-scale data architects is alive and well. The members regularly see each other at meetups and poster parties (!) that Evers organizes. In that sense, Evers definitely lives up to his moniker of data missionary. And as a certified data vault master, he’s part of an exclusive group: only fifteen people in the Netherlands hold this title.

Conscience of the organization

The focus of a full-scale data architect is primarily on determining the purpose of the data architecture and connecting it to the organization. Then they design and judge the quality of both the existing and the future data architecture. This judgment is based on criteria like functionality, safety, agility, and reliability. The full-scale data architect doesn’t hesitate to ask critical questions, like ‘What are the desired properties? Have you decided on a particular type of solution, and if so, which one, and why?’ That’s how the full-scale data architect functions as the organization’s conscience.

Sources:

  • Architectuur voor de digitale wereld, Hanno Wupper. Nijmegen, 2012
  • Full-scale Data Architects Posterbook, 0.2 version, Martijn Evers, i-Refact. Den Bosch, 2018