Data Researchers
Prof. Dr. Boris Otto 
is head of the Fraunhofer Institute 
for Software and Systems Engineering 
and Professor for Industrial Information Management 
at the Technical University of Dortmund.
At the Fraunhofer ISST he and his colleagues 
are researching digital ecosystems 
for logistics, 
data management
and health care.
In his Data Researchers Vlog, 
Prof. Otto highlights current digitization topics.
Today, Data Researchers focuses on how to build data spaces.
Today I would like to talk about a very current topic, 
namely the question of how to actually build data spaces. 
This is very important because the European Commission has stipulated
in its data strategy that we want to build 
domain-specific data spaces
in a number of sectors,
so that we can use the innovation potential of data
for the prosperity of the European economy 
and ultimately for the benefit of society. 
But of course this raises the question: 
what are data spaces, and how do I actually get to the point of having one?
That is why I would like to address three points today: 
First, I would like to discuss what data spaces actually are. 
Secondly, I would like to explain by means of an example 
to what extent data spaces can be 
classified in basic architectural models.
And thirdly, I would like to share
some observations regarding the question of what 
are actually the success factors that make such data spaces work. 
Let's start with the first point:
What are Data Spaces? 
A data space is ultimately a data integration concept. 
The first papers were published about fifteen years ago 
and this concept is essentially characterized by four features:
Firstly, data spaces do not aim at a physical integration of data, 
but essentially leave the data where it arises and is managed.
In other words, you end up with a distributed data architecture,
and that very much satisfies the requirements that we see, 
for example, in GAIA-X or the IDS Association, 
namely to build a distributed data infrastructure. 
That is the first point. 
The second point is that we do 
not envisage having a common database schema, 
so we are not trying to achieve schema integration, 
but the data space concept as a data integration concept envisages 
that integration takes place at the semantic level, 
so it is very important that we have vocabularies 
to ensure semantic interoperability between
data from distributed databases in the first place. 
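
A minimal sketch of integration at the semantic level rather than the schema level. It assumes two hypothetical providers and a made-up shared vocabulary; none of the names come from GAIA-X or IDS. Each provider keeps its own schema and only publishes a mapping to the shared terms.

```python
# Hypothetical shared vocabulary agreed on within the data space.
SHARED_TERMS = {"vehicleId", "departureTime", "stopName"}

# Each provider keeps its own local schema and only declares how its
# field names map to the shared terms (all names here are assumptions).
PROVIDER_MAPPINGS = {
    "rail_operator": {"train_no": "vehicleId", "dep": "departureTime", "station": "stopName"},
    "bus_operator": {"bus_id": "vehicleId", "leaves_at": "departureTime", "stop": "stopName"},
}


def to_shared_terms(provider: str, record: dict) -> dict:
    """Translate one local record into shared vocabulary terms.

    Fields without a mapping are dropped; the source database is not changed.
    """
    mapping = PROVIDER_MAPPINGS[provider]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


rail = to_shared_terms(
    "rail_operator",
    {"train_no": "ICE 512", "dep": "08:15", "station": "Dortmund Hbf"},
)
bus = to_shared_terms(
    "bus_operator",
    {"bus_id": "462", "leaves_at": "08:40", "stop": "Dortmund Hbf", "driver": "n/a"},
)
```

Both records can now be compared on `stopName` even though the two databases never shared a schema.
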
The third point is the property of data spaces 
that we don't necessarily have a single source of truth,
which is derived from the fact that data spaces 
are very distributed architectures,
but it also means that we can have redundancies. 
So it is possible that we have certain data multiple times, 
just distributed, and it is not absolutely necessary 
that we create consistency and a single source of truth. 
And the fourth point, and this is something that is quite important 
also for the economic opportunity inherent in data spaces, 
is the fact that data spaces can be nested within one another. 
You can imagine that we can form data spaces within data spaces,
which can overlap and, as said before, need not be disjoint. 
That is very important. 
So these four criteria are actually what characterizes 
a data space according to pure doctrine. 
As I said, the first papers were published about fifteen years ago.
I have already mentioned the International Data Spaces Initiative, 
which has adopted this concept on the one hand, but on the other hand,
because we are striving to set a standard for data sovereignty, 
has added at least two more criteria to the basic concept 
of the four criteria that I described:
One is the concept of data sovereignty and also data traceability. 
As a data provider, I want to be able to ensure
that I can determine who is allowed to do what with 
my data in such a data space
and who is not and under what conditions,
and I also want to have a certain transparency about 
what actually happened 
to my data when I share it, so I also want to have a certain traceability. 
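
A minimal sketch of these two ideas, sovereignty and traceability, under stated assumptions: the classes `UsagePolicy` and `DataOffer` are hypothetical and are not taken from the IDS specifications. The provider decides who may use the data and for what purpose, and every request is logged, whether it is granted or not.

```python
from dataclasses import dataclass, field


@dataclass
class UsagePolicy:
    """Hypothetical policy: who may use the data, and for which purposes."""
    allowed_consumers: set
    allowed_purposes: set


@dataclass
class DataOffer:
    """A data set offered by a provider together with its usage policy."""
    name: str
    policy: UsagePolicy
    access_log: list = field(default_factory=list)

    def request(self, consumer: str, purpose: str) -> bool:
        # Sovereignty: the provider's policy decides, not the consumer.
        granted = (
            consumer in self.policy.allowed_consumers
            and purpose in self.policy.allowed_purposes
        )
        # Traceability: every request is recorded, granted or not.
        self.access_log.append((consumer, purpose, granted))
        return granted


offer = DataOffer(
    name="parking_occupancy",
    policy=UsagePolicy(
        allowed_consumers={"city_traffic_control"},
        allowed_purposes={"routing"},
    ),
)
ok = offer.request("city_traffic_control", "routing")
denied = offer.request("ad_network", "profiling")
```
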
The second point is very much from the point of view of the data recipient: 
I want to be able to trust that whoever claims to be an actor
in such a data space or an ecosystem 
based on it really is who they claim to be. 
So we need a certain trust anchor, and the IDS initiative 
provides for this in that we have to go through a certification process 
if we want to offer IDS-compliant software components, 
which ultimately results in us receiving a digital certificate, a token, 
which documents this trust anchor, so to speak. 
So this is ultimately the basic idea of data spaces.
What does this mean for application scenarios
in different domains, 
and what does it mean in the context of what 
the EU data strategy requires?
Let's perhaps take an example from the field of mobility: 
What we actually want to achieve is not data integration,
but rather new, innovative mobility services, 
such as intermodal mobility,  
which covers the entire mobility chain from A to Z
in an integrated manner for me as a traveler, as a mobile citizen,
and which provides me with ideal support 
and coordinates the various modes 
of transport and mobility providers. 
Another example is intelligent traffic control. 
To make these innovative services possible,
we need data from different players. 
This is something very central to such ecosystems, 
as I have just explained using the example of mobility, 
where no single actor has all the data it needs 
to offer these innovative services. 
And that's where, if we now go down one level 
in an architectural model,
data spaces come into play, because they offer a space in which 
this data can be made available for innovative services,
not in such a way that the data provider loses control 
and overview of what happens to his data and where he puts it, 
but rather while maintaining data sovereignty.
It can therefore be said that the data integration concept 
of the Data Space
forms a shared digital twin for a certain domain, 
and gives all players who follow the rules of the game 
the opportunity to share their data, but also to use it jointly.
If we now go down one level further, i.e. 
to the third level, it is clear 
that in order to use this data, which is interoperable
on the second level, in the sense of data services, 
we also need a software architecture 
that enables us to provide, 
receive, transform and make this data usable at all.
And in this respect, the data space on the second level is supported 
by a distributed architecture of software components. 
The IDS initiative has proposed a standard, 
the reference architecture model, 
for such a distributed digital infrastructure, 
which consists of different components.
On the one hand, this is a gateway, 
a connector component that enables me 
as a participant in such a data space, i.e. 
as a fellow player in such a data space, 
to share data while maintaining data sovereignty 
and also to receive it. 
But it's also clear that when we talk about a distributed architecture, 
we need some common services - I'm going to say this 
in a somewhat untechnical way - 
in the middle, to ensure that these distributed 
endpoints can find each other and interact with each other at all. 
To do this, we need brokerage services, for example, 
that reconcile the supply of data with the demand for data,
and we also need certain clearinghouse services 
that do not read the data itself
but monitor whether data transactions have been 
successfully executed, to give two examples.
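
A minimal sketch of these two common services, assuming hypothetical `Broker` and `ClearingHouse` classes and made-up connector URLs; this is not the IDS reference implementation. The broker matches demand with registered supply, and the clearing house only sees a fingerprint of the payload, never the data itself.

```python
import hashlib


class Broker:
    """Hypothetical broker: matches data demand with registered data supply."""

    def __init__(self):
        self._offers = {}  # topic -> list of connector URLs offering it

    def register(self, topic: str, connector_url: str) -> None:
        self._offers.setdefault(topic, []).append(connector_url)

    def find(self, topic: str) -> list:
        return list(self._offers.get(topic, []))


class ClearingHouse:
    """Hypothetical clearing house: records transactions, never the payload."""

    def __init__(self):
        self.records = []

    def record(self, provider: str, consumer: str, digest: str, success: bool) -> None:
        self.records.append(
            {"provider": provider, "consumer": consumer, "sha256": digest, "success": success}
        )


def fingerprint(payload: bytes) -> str:
    # Computed on the connector side, so the clearing house only sees a hash.
    return hashlib.sha256(payload).hexdigest()


broker = Broker()
broker.register("parking_occupancy", "https://connector.example/parking-operator")
clearing_house = ClearingHouse()

providers = broker.find("parking_occupancy")
payload = b'{"free_spaces": 42}'  # exchanged directly between the two connectors
clearing_house.record(
    providers[0], "https://connector.example/mobility-app", fingerprint(payload), True
)
```
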
So in this respect, the question of how to build 
a data space is also somewhat different from a classic, 
I would say one-off IT project, where you work 
with requirements and functional specifications 
and then develop a solution over a very defined period of time
and then implement it.
So if things go well, we are never finished, 
because this data space is constantly evolving.
And I think that's a very important point to understand
when you talk about building a data space, creating a data space. 
What are the success factors now? 
As I said, these considerations are based largely on observations, 
so they still lack a solid empirical basis,
but I still want to share them because I think it is important to work 
together on these things now. 
The first point is the incentive system:  
Why should I, as someone who has data, participate in a Data Space? 
And here we have to keep in mind how such 
ecosystems work in the first place. 
They only work if everyone brings something to the party.
If no one contributes anything, no data space will result.
That means we have to manage to develop 
incentive systems that make it easy 
for data owners to share their data
and in the end clearly show what benefits they get from it.
Here are two ideas: It has been shown
that the simple quid pro quo principle,
i.e. "I share data and get data in return; I am only allowed 
to participate in a data space if I also contribute data myself", 
greatly increases the willingness 
to share data in a data space. 
The second example on the topic of 
incentive systems is very much related 
to how I ultimately understand my involvement in a data space. 
If it is purely a cost factor for me, I can look at it that way, 
but it will always be difficult to find arguments that justify
an engagement in a data space based on costs.
We are instead working on approaches based on tokens,
digital certificates, to develop a model where you 
hold a share of an overall value. 
In other words, everyone who participates in a data space is, 
so to speak, a shareholder or a holder of cooperative shares, 
if you like.
So I have a piece of an overall value, and this value manifests itself 
very, very strongly in the trust infrastructure 
and in the data sovereignty that is shared 
between everyone who participates. 
So the first point is incentive systems.
The second point concerns financing.  
As I said, building a data space is 
very different from a typical one-off project. 
On the one hand, we have the interests of individual players, 
i.e. a microeconomic view, and on the other hand, 
of course, we also have the public interest, 
the interest of us as a society or even the EU internal market. 
And in this respect, we have to balance these different interests,
i.e. the individual interests and the interest of the common good,
when building data spaces.
I think this is very well explained in the 
key points of the German data strategy.
But of course this also means that we have to think carefully: 
When we ask about the financing, about the funding, 
of such a data space, we probably have to bear 
in mind this duality of interests. 
This is why I am personally convinced that we will see a very high 
degree of mixed financing in the financing models, 
which is fed by public financing on the one hand 
and private financing on the other.
This is not so far-fetched either,
because many infrastructures that we are used 
to using are financed in a similar way. 
So we are doing this in the interest of the common good 
and everyone has to do their bit. 
With regard to the EU's data strategy, this certainly applies 
 to the interaction between the European Commission 
and the member states on the one hand, 
and between the private and public sectors on the other. 
So, business and politics will join forces to 
create these data spaces in Europe.
On the other hand, we also have to keep in mind 
that we have very cooperative operator models 
or organizational models to put Data Spaces into practice.
And here it is certainly also important to consider that the technology,
i.e. both the digital infrastructure and the data spaces, 
is a necessary condition for success, 
but probably not a sufficient one.
After all, ecosystems function according to the laws 
of the platform economy via network effects. 
In other words, we have to manage to achieve a critical mass 
and promote the use of the International Data Spaces 
initiative as much as possible. 
And we will probably succeed in doing this by making 
the entry barriers as low as possible. 
In other words, we need to increase data readiness, 
especially for smaller companies, or perhaps 
consider sponsorship to help them obtain certification.
On the other hand, we can also consider that, 
where the public sector is itself a user
because it makes many of its own processes data-based, 
it really does use data space concepts 
in its own interest; that the use of standards such as 
those we are developing at GAIA-X 
and IDS is also made mandatory in public tenders; 
and that we, as I said, 
provide knowledge transfer for understanding 
data spaces as a data integration concept,
but also with regard to the question of how 
we can get them implemented.
I am very glad that we as Fraunhofer are right
in the eye of the storm here, 
and I am confident that we can make a good contribution
to this question of how to build data spaces. 
Thank you very much.
