Data Liquidity and Systems Interoperability
How can the alignment and management of complex, heterogeneous data and systems be automated?
Data integration and analytics are a bottleneck for solving our greatest challenges, from doing science to creating artificial general intelligence and everything in between. The demand for integrated data is indicated by the number of startups that focus on nothing more than collecting lists of well-aligned datasets of interest and monetizing specialized queries. Well-aligned, high-quality datasets are a gold mine for endeavors involving inherently heterogeneous data, such as drug discovery, complex design, and sociological research. The multitude of data formats and standards turns even a simple question, such as "get me a list of all the world's dogs", into an insurmountable quest that becomes the focus of yet another startup specializing in that domain. Existing solutions, such as linked, ontology-aware data formats, are not flexible or rich enough to conveniently define records whose fields draw on multiple arbitrary, ad hoc vocabularies. They also lack support for defining value types, callable object interfaces, and modification permissions, which would let objects retain their properties even after being decoupled from the data management systems that originated them.
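As a rough illustration of what aligning multi-vocabulary records involves, the sketch below answers the "dogs" question across two sources that name the same fields differently. The vocabulary prefixes (shelter:, vet:), the ALIGNMENT table, and the sample records are all hypothetical; a real system would have to derive such mappings from ontologies or learn them, which is the hard part this example leaves out.

```python
from typing import Any, Dict, List

# Hypothetical mapping from vocabulary-qualified field names to shared concepts.
ALIGNMENT: Dict[str, str] = {
    "shelter:name": "name",
    "shelter:species": "species",
    "vet:pet_name": "name",
    "vet:animal_type": "species",
}


def align(record: Dict[str, Any]) -> Dict[str, Any]:
    """Rename vocabulary-specific keys to shared concept names.

    Keys without a known mapping are kept unchanged.
    """
    return {ALIGNMENT.get(key, key): value for key, value in record.items()}


def query(records: List[Dict[str, Any]], **criteria: Any) -> List[Dict[str, Any]]:
    """Align every record, then keep those matching all given criteria."""
    aligned = [align(record) for record in records]
    return [
        record
        for record in aligned
        if all(record.get(key) == value for key, value in criteria.items())
    ]


# Records from two imaginary sources that use different field names.
sources = [
    {"shelter:name": "Rex", "shelter:species": "dog"},
    {"vet:pet_name": "Tom", "vet:animal_type": "cat"},
    {"vet:pet_name": "Fido", "vet:animal_type": "dog"},
]

dogs = query(sources, species="dog")
print([dog["name"] for dog in dogs])
```

The query itself is trivial once the records share a vocabulary; the cost sits entirely in building and maintaining the alignment table for every new source.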
Widely known current solutions, such as Linked Data, are not entirely well suited to the problem. They require large amounts of data to be serialized in the same format, which is never the case in an ever-diversifying world, and they offer no standard way to embed schemas, permissions, and other context in data items, which is necessary to make those items reusable in queries.
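To make the idea of a data item that carries its own schema, permissions, and context more concrete, here is a minimal sketch. The DataItem class, its field names, and the example.org source URL are assumptions made for illustration, not an existing standard. Note also that permissions stored inside the item are only advisory here; enforcing them after the item leaves its origin system would need cryptographic support that this sketch does not attempt.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List

# Type names the hypothetical embedded schema is allowed to use.
_TYPES = {"str": str, "int": int, "float": float, "bool": bool}


@dataclass
class DataItem:
    """A record that travels together with its schema, permissions, and context."""

    value: Dict[str, Any]
    schema: Dict[str, str]             # field name -> type name
    permissions: Dict[str, List[str]]  # action -> principals allowed to perform it
    context: Dict[str, str] = field(default_factory=dict)

    def validate(self) -> bool:
        """Check that every field named in the schema is present with the right type."""
        return all(
            name in self.value and isinstance(self.value[name], _TYPES[type_name])
            for name, type_name in self.schema.items()
        )

    def update(self, principal: str, name: str, new_value: Any) -> None:
        """Change a field only if the principal is listed under 'write'."""
        if principal not in self.permissions.get("write", []):
            raise PermissionError(f"{principal} may not modify this item")
        self.value[name] = new_value

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "DataItem":
        return cls(**json.loads(text))


item = DataItem(
    value={"name": "Rex", "age": 4},
    schema={"name": "str", "age": "int"},
    permissions={"write": ["alice"]},
    context={"source": "https://example.org/pets", "schema_version": "1"},
)

# The item keeps its schema and permissions after leaving its origin system.
restored = DataItem.from_json(item.to_json())
restored.update("alice", "age", 5)
print(restored.validate(), restored.value["age"])
```

Because everything needed to interpret the record is serialized with it, a receiving program can validate and use the item without a custom integration for the system that produced it.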
By combining RDF-based SPARQL (for alignment) with OAuth 2.0 (for permissioning) and a standard for securely encrypting data about the query's origin context (such as the identity keys of the query origin, cookies, IP addresses, and the schema versions of the resources the data came from), it may be possible to approach the desired property: data items that remain reusable as objects in arbitrary programming languages, without the need to write custom integrations. However, this seems not to have been done, and there may be better solutions to the problem.
For example, given the diversity and complexity of systems on the web (protocols and formats), there may be other, possibly better, ways to approach the problem based on the plug-and-play philosophy that devices follow with drivers. Such an approach would abstract away web resource APIs and make fully featured, polymorphic, interactive data a shared feature of all programming languages, treating websites and web systems (including decentralized ones) as operating system devices directly available as variables in programs.
Regardless of the choice of approach or implementation, data liquidity and systems interoperability seem to remain an important unsolved problem and a bottleneck to faster progress in a large number of domains of digital activity.