Breaking down walls is an evocative image, rich in emotional content, as it calls up the idea of a free world without barriers. Like many rhetorical devices, this metaphor can be easily transported to other domains, such as data.
In the world of data, however, breaking down walls can be dangerous, as it exposes data to the outside. On the one hand, walls can be seen as limiting access, but on the other, walls protect the contents, as the walls of a silo protect the grain; if grain silos get knocked down, they spill their contents right on the ground, exposed to animals, people, and inclement weather.
Data in data silos is also protected. It is in its natural environment, and the walls that protect it must not be seen only as a limit on access, but as a logical response to safety and protection requirements. If we focused only on the limitation, we would risk confusing a data democracy with a data anarchy: “Let’s break down the silos and give everyone access to data; let them do with it what they want!”
Democratize the Data
Clearly, this is not the objective that should be set. Instead, we should strive to make data available to those who need it, in a controlled way, where the term “control” refers to the exercise of laws that govern the data’s use, similar to the laws that guide the governance of any democracy. Data should be available to those who need it and when this need arises, without anyone having to move the data from its protected environment before it is necessary, and above all, without having to move it solely to make it visible.
We must therefore think of silos that not only contain data but also carry a description of their contents that is visible from the outside. This description can be read without entering the silo and, by symmetry, without taking the data out of it, so that we can understand what data is in there, how it is structured, and what it represents.
Such a description would enable us to understand what the silo contains, whether or not it is suitable for us, and if it is, whether it can be used as is, or whether it is necessary to combine it with what is contained in other silos, so that we can obtain data products that better meet our needs.
In short, we must see the different silos as suppliers of raw material, to be chosen case by case on the basis of the characteristics that their descriptions indicate, and we should take data from them only when necessary, i.e., when consumers actually need it.
Continuing the comparison with the silos that we all know, we should then be able to track, at all times, how the raw material has been used, by whom, and when, exactly as traceability systems do in the agri-food sector, so that all knowledge and safety requirements are met.
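To make the idea concrete, here is a minimal sketch in Python of a silo that exposes a description, hands out data only on request, and records every use. The names `Silo`, `SiloDescription`, `fetch`, and `access_log` are hypothetical and chosen only for illustration; they are not taken from any particular product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class SiloDescription:
    """Metadata visible from outside the silo; it holds no actual rows."""
    name: str
    columns: tuple[str, ...]
    meaning: str


@dataclass
class Silo:
    """Hypothetical silo: data stays inside until a consumer asks for it."""
    description: SiloDescription
    _rows: list[dict] = field(default_factory=list, repr=False)
    access_log: list[tuple[str, str, datetime]] = field(default_factory=list)

    def fetch(self, consumer: str, purpose: str) -> list[dict]:
        """Hand out copies of the rows and record who asked, why, and when."""
        self.access_log.append((consumer, purpose, datetime.now(timezone.utc)))
        return [dict(row) for row in self._rows]


sales = Silo(
    SiloDescription("sales", ("customer_id", "amount"), "One row per completed order"),
    [{"customer_id": 1, "amount": 120.0}, {"customer_id": 2, "amount": 75.5}],
)

# Deciding whether the silo is suitable requires only the description.
suitable = "amount" in sales.description.columns
reads_before_fetch = len(sales.access_log)

# Data leaves the silo only at the moment a consumer actually needs it.
rows = sales.fetch(consumer="finance-report", purpose="monthly revenue") if suitable else []
total = sum(row["amount"] for row in rows)
```

The suitability check touches only the description, so the access log stays empty until `fetch` is called; after that, the log answers the questions of who used the data, for what, and when.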
Connect the Silos
Ultimately, we have to think of an integrated system in which the different silos perform their primary function, keeping the data safe, but are also efficiently connected, so that data can be easily retrieved at the moment of need. Consumers can then use the data as it is or combine it into richer information constructs with greater expressive power.
To conclude, let’s put down the hammers and pickaxes and take up the tubes and connectors instead. We don’t have to demolish any silos; we can just connect them. This is the spirit that animates data virtualization, and this is the spirit that animates Denodo.
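As a rough illustration of connecting silos rather than emptying them, the Python sketch below defines a virtual view over two in-memory sources and reads them only when a consumer iterates the view. The helpers `make_source` and `connect` are hypothetical, the join assumes the key is unique in the right-hand source, and none of this is meant to depict how any specific data virtualization product works internally.

```python
from typing import Callable, Iterator

Row = dict
Source = Callable[[], Iterator[Row]]  # a "pipe" into a silo: calling it reads the data


def make_source(rows: list[Row], reads: list[str], name: str) -> Source:
    """Wrap an in-memory silo so that every read is recorded in `reads`."""
    def read() -> Iterator[Row]:
        reads.append(name)
        return iter(rows)
    return read


def connect(left: Source, right: Source, key: str) -> Source:
    """Define a virtual view joining two silos on `key`; nothing is read yet."""
    def read() -> Iterator[Row]:
        # Assumption: `key` is unique in the right-hand source.
        index = {row[key]: row for row in right()}
        for row in left():
            match = index.get(row[key])
            if match is not None:
                yield {**row, **match}
    return read


reads: list[str] = []
customers = make_source(
    [{"customer_id": 1, "name": "Ada"}, {"customer_id": 2, "name": "Grace"}], reads, "crm"
)
orders = make_source(
    [{"customer_id": 1, "amount": 120.0}, {"customer_id": 3, "amount": 10.0}], reads, "sales"
)

# Defining the view moves no data out of either silo.
customer_orders = connect(orders, customers, key="customer_id")
reads_at_definition = list(reads)

# Both silos are read only now, when the consumer asks for the combined result.
result = list(customer_orders())
```

Defining `customer_orders` leaves both silos untouched; the sources are consulted only when the result is requested, which is the "moment of need" described above.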
Originally published at https://www.datavirtualizationblog.com on December 9, 2020.