Holistic data management and taming bulls
Not everyone gets excited about the prospect of discussing information entropy, shadow IT and technical debt. But for Martijn Evers, it’s all in a day’s work. We had an animated discussion about holistic data management and the art of taming bulls. Together with Ronald Damhof, Martijn Evers, co-founder of i-Refact, started an online movement dedicated to perhaps the ultimate job of the future: full-scale data architect. It’s a job that has to suit you. “Usually, you’re born a data architect”, the self-appointed data missionary says. In other words: abstract thinking has to be in your genes. That’s why organizations usually call on people with real passion to fill this key role.
Poster mania
“I still have some in the trunk of my car”, Evers whispers conspiratorially. The scene takes place in a corner of the lounge in a stately modern office in an industrial area in Utrecht, the Netherlands. At first glance, it may look like a scene from a new crime show, but nothing could be further from the truth. Evers is referring to his, in his own words, “poster mania”. Like a true poster enthusiast, he cherishes his collection of over 25 different, enormous (A0) posters, some of which he always takes to appointments. And there are many more posters in the pipeline.
Psychedelic atmosphere
The posters the certified data vault master is referring to aren’t just eye-catching thanks to their original ideas, the evocative imagery, and the daring analogies, but especially because of their eclectic color palette. They exude the psychedelic atmosphere of seventies album covers. “Yeah, some of them are real works of art”, Evers beams with a sparkle in his eyes.
But it quickly becomes clear that he’s serious. He uses the posters to show his clients at a glance where the problems of the organization are, and where the so-called gridlocks can happen. They’re especially useful to illustrate which major concerns are at stake and which solutions are available when you want to take data, data management, and data architectures seriously. “Showing a poster or drawing a matrix often unleashes a lively discussion.”
Making the big bucks
Speaking of discussions: “data is the new oil”, as prominent speakers like to say at conferences. In the same breath they talk about the big promise of our time: big data. ‘Big data is big business.’ These claims usually find their way into the boardroom. They echo in the halls of the upper floors of large corporations. Before you know it, they lead a life of their own, unfortunately without the relevant context. According to Evers, these one-liners usually come from people who don’t know much about data architecture.
Making big bucks with data is, so far, the sole domain of the Amazons, Facebooks, and Googles of the world. The rest of the world is mostly struggling with the abundance of data it is flooded with. Evers believes that, just as there is Newton’s law of gravitation, there is such a thing as “data gravity”: data has the tendency to cluster and attract new applications with new data. You can make this data gravity work in your favor, like the big data giants do. For most organizations, however, data gravity is more of an impediment, for example when migrating data between clouds and local data centers.
Data as a nuclear option
“Data is life-threatening. It can destroy organizations, companies, countries, and complete societies”, according to one of Evers’ bolder statements. He goes so far as to compare the impact of data with the threat of nuclear weapons. “The influencing of countries (see ‘Cambridge Analytica’) has become a genuine problem. And companies are literally going under as a result of bad data management.
“Only governmental institutions or independent administrative bodies have an escape hatch, because they can basically live off the infinite funding of a government that will usually bail them out. Even when they sometimes make consciously wrong decisions with regard to data facilities, data architecture, and data management.” And that happens fairly regularly, Evers says in an understated way. You can read about such cases in the newspaper on a near-daily basis.
Escalation
“When there are major IT problems in governmental institutions, the state secretary or minister should be involved much earlier. There are so many layers of management between them now that people in the government are informed about risks and financial consequences much too late. When a minister gets a call it’s usually already far too late. The millions have already evaporated, the directors in question have left or been transferred, and the first reorganization is already behind us. A thick layer of dust has already gathered on the proceedings. Governments often lack direction. Substantial problems with IT have to be solved at the highest possible level of an organization. I think ministers should be sent packing when the data quality in their ministry isn’t up to snuff. When algorithms start making decisions, the risks increase, and so too does the ministerial responsibility.” Evers doesn’t beat around the bush.
Technical debt
Evers also has a clear-cut opinion about technical debt, an IT term that originated in software development. “The moment a developer so much as looks at their keyboard, there is technical debt.” Technical debt accrues when developers knowingly or unknowingly choose the cheaper short-term solution, which costs more in the long term. Every system is basically technical debt in Evers’ opinion, something you’d have been better off without.
Controlling data chains
Instead of clinging to old, often outdated technology, organizations should invest their time and money in what really matters: managing data chains. Evers is referring to a sustainable, safe solution for processing large volumes of data for a large number of users. The quality of the processes and the reliability of data delivery are key to this. “The business should think much more deeply about data and data organization instead of being lax and short-sighted and thinking ‘the data is in a data warehouse or data lake, so we’re good.’” The business owns the data, not the IT department. If it goes wrong, the business pays the price. In order to get the data in order, a real transition is often necessary.
Organizations should take control of the data chains, preferably at the start of the chain rather than the end, as is often the case with data warehouses and data lakes. At that point the situation is often already hopeless and data and information quality suffers as a result. Ronald Damhof, Martijn’s partner in crime, says: “A data warehouse, data lake, or any kind of data fort is an expression of technical debt that wasn’t handled earlier in the data chain.” Evers rejoins: “Data has to be organized in such a way that everyone in the chain can derive value from the available data. We focus on the organizational side, on the data and chain problems. The technological solutions, whether it’s a BI tool or a data vault, are always subservient. We don’t care about technology at all, we’re opposed to ‘technological intimacy’ as far as that goes.”
Chain entropy
Data chains have several nasty characteristics, though: they almost always form spontaneously (emergence), because everyone wants to have each other’s data, and copying data is child’s play. It takes no time to make an Excel spreadsheet. Like an invisible spider web, the data starts spreading through the organization. The complexity keeps increasing, and at a certain point, the whole thing comes to a halt. Logistical panic strikes and the chaos is complete. Entropic death seems inevitable. Entropy is an abstract concept from thermodynamics. Fundamentally, it’s a way to measure disorder or degeneracy in a system.
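To make the notion of disorder a little more tangible, here is a minimal sketch that borrows Shannon entropy from information theory, a close relative of the thermodynamic concept. It is an illustration only, not a method Evers describes: the function name, the location names, and the idea of measuring how copies of one data set are spread over locations are all assumptions made for this example.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Return the Shannon entropy, in bits, of the distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    # Each term is p * log2(1 / p), where p is the share of one label.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical situation 1: all eight copies of a data set live in one system.
single_source = ["crm"] * 8

# Hypothetical situation 2: the copies have spread evenly over four places.
scattered = ["crm", "crm", "excel", "excel",
             "data_lake", "data_lake", "mail", "mail"]

print(shannon_entropy(single_source))
print(shannon_entropy(scattered))
```

In this toy setup the single-source situation scores zero bits and the evenly scattered one scores two bits: the more places the same data lives in, the higher the number, which is one simple way to picture the growing disorder in a data chain.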
Organizations also struggle with entropy their whole lives. But you can’t see it or smell it. As a full-scale data architect, all you can do is warn. That’s why Evers developed a warning system for information products, with labels from A to G, just like the energy labels on electrical appliances. “Organizations are still stuck thinking in IT principles and functions. They have to pay more attention to fighting chaos and learn to think in separation of powers, functions, and rewards (the concerns). They also have to honor the principles of the Trias Politica Data (see part 2 for an explanation). If they don’t, then entropy in the chain will gain the upper hand. Unfortunately, IT and the directors often don’t see the problem, and once they do it’s usually too late.”
The tension between data valorization and data organization
The right full-scale data architect can bridge the gap between the business and IT and is worth their weight in gold, according to Evers. They safeguard the balance between, for example, data valorization and data organization. “Compare it to Yin and Yang: opposing forces that balance all aspects of life. Many companies and institutions are very unbalanced. Some data is still stuck in Excel, or in dated source systems, whereas other data is in the cloud and can be accessed with a cool app.”
Mitigating risks
Although Evers is a real motormouth, he chooses his words carefully. He’s given his core message a lot of thought. For example, he prefers the term data valorization over the more common data monetization, because he believes there are so many more ways to derive value from data than simple monetary gain. “If, by using solid data management, you can mitigate risks and comply with regulations and rules, you also create value, albeit not with a dollar sign attached to it. Preventing data problems also represents value creation. Data can’t always be cashed out, though. An insurance provider can’t just sell their client data on the market. Using data to detect fraud does, however, represent a great internal value for that insurer. Algorithms are usually the next step in data valorization.”
Logical data model
Is data an asset or a liability? Or does it offer a strategic advantage? However you look at it, good data management is only becoming more important, according to Evers. “I see a lot of companies wrestle with choices. What should we primarily focus our attention on? Many initiatives are either technology-driven, or so pragmatic that it’s hard to judge how future-proof or reusable they are. Often, a clever data architecture is completely missing, and the realization of an underlying data management platform is a long and costly process without tangible value. That’s why, in 2014, I started i-Refact with two partners. The name is a portmanteau of the words ‘refactoring’ and ‘fact’, to indicate that data valorization is about reorganizing the facts.”
“There are many ways to organize data well, for example ‘fact or communication’-based models, but we want to single out the logical data model. On the one hand it’s a crucial link between the abstract, more meaning-driven data (information) organization and data governance. On the other hand it connects the more concrete implementation with a data management system. Without this link, you’re missing control over the implementation. Aside from that it also nicely decouples the implementation activities from activities in the area of data/information definitions. This has the fancy name ‘Data Definition and Implementation Architecture’ and is really only known for the fact that it’s never named. My Data Model Matrix may be one of the first attempts to put this on the agenda of managers and architects in a structural way.”
Grabbing the bull by the horns
To Martijn Evers, data represents a wild bull, an image based on the data quadrant model by Ronald Damhof (see below). The reason is that data architects often have to negotiate between two opposed interests: quality versus flexibility. Evers: “You can see these two terms as the horns of the bull.”
Quality versus flexibility
Quality stands for a systematic way of thinking and developing. It’s primarily about consistency, validity, integrity, conformity, relevance, transparency, and the ability to verify and control the data. Flexibility stands for an opportunistic way of thinking and developing. Terms like flexibility, agility, variety, accessibility, responsiveness, and availability of data are key here. The trick is to maneuver these two figurative horns of the bull and steer it so that no accidents happen.
Anyone who knows anything about bullfighting knows that pushing the horns, no matter how hard, will probably not have any effect. Letting go isn’t an option either; you’ll get killed. “You have to use your power carefully and divide it by putting pressure on one horn while keeping an eye on the other.” In data terms, that means even if you want maximum flexibility, you still need to maintain a minimum level of quality. And vice versa.
Four quadrants
By visualizing the opportunistic and systematic methods of developing along the y-axis and push-and-pull factors on the x-axis, we get a matrix with four quadrants.
Figure 1: The data quadrant model
- The first quadrant represents strongly standardized processes and systems that register and manage data. The predictability and repeatability are big factors in this quadrant. Hard facts are central.
- Quadrant two prioritizes qualitative information, context, and truths.
- Quadrant three is dedicated to registering data with limited standardization and management, such as one-time ad-hoc data or personal data collections. A phenomenon like shadow IT can flourish here. Governance is very limited.
- Quadrant four, finally, offers room for research, innovation, prototyping, and designing.
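The matrix can also be written down as a small lookup table. The sketch below is an illustration under stated assumptions, not Damhof's own formalization: the enum and function names are invented here, and the pairing of each quadrant number with a specific combination of axes (for example, quadrant one as systematic and push) is an assumption, since the article does not spell it out.

```python
from enum import Enum


class Development(Enum):
    """y-axis: the style of developing."""
    SYSTEMATIC = "systematic"
    OPPORTUNISTIC = "opportunistic"


class Drive(Enum):
    """x-axis: push or pull."""
    PUSH = "push"
    PULL = "pull"


# Assumed mapping of axis combinations to the four quadrants described above.
QUADRANTS = {
    (Development.SYSTEMATIC, Drive.PUSH):
        (1, "standardized registration and management of hard facts"),
    (Development.SYSTEMATIC, Drive.PULL):
        (2, "qualitative information, context, and truths"),
    (Development.OPPORTUNISTIC, Drive.PUSH):
        (3, "ad-hoc or personal data, limited governance, shadow IT"),
    (Development.OPPORTUNISTIC, Drive.PULL):
        (4, "research, innovation, prototyping, and designing"),
}


def quadrant(development, drive):
    """Return (number, description) for a combination of the two axes."""
    return QUADRANTS[(development, drive)]


number, description = quadrant(Development.OPPORTUNISTIC, Drive.PUSH)
print(f"Quadrant {number}: {description}")
```

Placing an information product in the table this way makes the trade-off between the two horns explicit: the systematic row leans toward quality, the opportunistic row toward flexibility.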
That’s it for the theory. In part two of this article we’ll discuss the various archetypes of data architects and the Trias Politica Data.
Sources:
- Architectuur voor de digitale wereld, Hanno Wupper. Nijmegen, 2012
- Full-scale Data Architects Posterbook, 0.2 version, Martijn Evers, i-Refact. Den Bosch, 2018