1.1 General Introduction
Aim of the Data Context Hub
The Data Context Hub is a platform for building data applications that integrate data from multiple sources into an explorable knowledge graph, making the relationships and knowledge in the data visible and usable. This documentation is valid for version 2.0 of the Data Context Hub.
The Data Context Hub acts both as a single point of access and as a causal, content-based integration layer. Its potential grows with the amount of integrated data and with the need for dynamically changing views.
The goal of the Data Context Hub is to provide contextual information wherever it is needed. Data and relationships can be explored from any perspective and in relation to any specific question, without prior preparation. Analysing possible causal chains in the data helps to understand complex relationships and system behaviour, or to investigate solutions to challenging engineering problems. The Data Context Hub also provides data consistency and traceability.
The Data Context Hub can extract data from a variety of sources, including relational databases, flat files, JSON or SOAP endpoints, XML data and many others. In addition to extraction, the Data Context Hub also provides the ability to receive data and integrate it directly into the graph with low latency.
In addition to its ETL capabilities, the Data Context Hub also includes a set of exploration and traversal tools, including the ability to visualise attached file artefacts.
Overall, the Data Context Hub is a powerful and flexible tool for performing ETL operations and managing data integration workflows to build and navigate knowledge graphs. It can help organisations move and transform data from various sources into a format that can be analysed and used by other applications, or serve as an essential part of data analytics.
ETL for Graphs
The ETL (Extract, Transform, Load) process for graph databases is a method for moving data from various sources, such as relational databases or flat files, into a graph database.
Extract: The first step in the ETL process is to extract the data from its original source. This can be done by using SQL queries to retrieve data from a relational database, or by reading data from flat files.
Transform: The extracted data is then transformed to fit the format of the target graph database. This step involves cleaning, normalising, and mapping the data to the appropriate nodes and edges in the graph.
Load: The final step is to load the transformed data into the graph database. This is typically done using a bulk loading process, where the data is written to the database in large chunks.
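The three steps above can be sketched in a few lines of Python. This is a minimal illustration only, not the Data Context Hub's actual API: the part table, the PART_OF relationship, and the in-memory graph target are all assumptions made for the example.

```python
import sqlite3

# Extract: pull rows from a relational source (an in-memory SQLite
# database stands in for any relational system here).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parts (id INTEGER, name TEXT, parent_id INTEGER);
    INSERT INTO parts VALUES (1, 'Assembly', NULL), (2, 'Gear', 1), (3, 'Shaft', 1);
""")
rows = conn.execute("SELECT id, name, parent_id FROM parts").fetchall()

# Transform: map each row to a node and each foreign key to an edge.
nodes = [{"id": r[0], "label": "Part", "name": r[1]} for r in rows]
edges = [{"from": r[0], "to": r[2], "type": "PART_OF"}
         for r in rows if r[2] is not None]

# Load: bulk-write nodes first, then edges, into a simple in-memory
# graph structure (a real target would be a graph database's bulk import).
graph = {"nodes": {}, "adjacency": {}}
for n in nodes:
    graph["nodes"][n["id"]] = n
    graph["adjacency"].setdefault(n["id"], [])
for e in edges:
    graph["adjacency"][e["from"]].append((e["type"], e["to"]))

print(graph["adjacency"][2])  # [('PART_OF', 1)]
```

Note that edges can only be loaded after the nodes they connect exist, which is why bulk loaders typically write nodes before relationships.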
It is worth noting that the ETL process for graph databases differs from that for traditional relational databases, as it involves mapping the data to nodes and edges rather than to tables and columns.
The advantages of graph databases
Graph databases offer several advantages over relational databases. One of the main advantages is their ability to handle highly connected data. Unlike relational databases, which use tables and foreign keys to define relationships between data, graph databases use nodes and edges to represent entities and their relationships. This allows them to model and query data with complex relationships more effectively, such as impact chains in engineering data, social networks, or recommendation systems.
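The contrast between foreign keys and edges can be sketched as follows. The people and friendships used here are invented for illustration; the point is that the relational style derives a relationship by matching IDs at query time, while the graph style stores it directly on the node.

```python
# Relational style: a join table of foreign-key pairs; finding a
# person's friends means matching IDs across tables at query time.
people = {1: "Ada", 2: "Ben", 3: "Cora"}
friendships = [(1, 2), (2, 3)]  # (person_id, person_id) join table

def friends_relational(pid):
    """Collect friends by scanning the join table in both directions."""
    return ([people[b] for (a, b) in friendships if a == pid]
            + [people[a] for (a, b) in friendships if b == pid])

# Graph style: each node carries its edges, so the relationship is a
# first-class part of the data rather than a derived match.
friend_graph = {"Ada": ["Ben"], "Ben": ["Cora", "Ada"], "Cora": ["Ben"]}

print(friends_relational(2))   # ['Cora', 'Ada']
print(friend_graph["Ben"])     # ['Cora', 'Ada']
```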
Another advantage of graph databases is their scalability. They are designed to handle large amounts of data and can easily scale to accommodate growth. They also have a more flexible data model, which allows for the easy addition or removal of entities and relationships.
Graph databases also have better performance for traversing relationships in the data. Relational databases often use complex join operations to retrieve related data, which can be time-consuming and resource-intensive. Graph databases, on the other hand, use pointers to directly access related data, which makes querying and traversing relationships much faster.
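A traversal of this kind can be sketched with plain adjacency lists. The impact-chain graph below is a made-up example; the breadth-first walk follows stored neighbour references directly, one hop at a time, with no per-hop join or index lookup.

```python
from collections import deque

# A tiny impact-chain graph: each node lists the nodes it directly
# affects. Neighbours are reached by following these references
# directly, mirroring pointer-based (index-free) adjacency.
affects = {
    "sensor": ["controller"],
    "controller": ["motor", "logger"],
    "motor": ["gearbox"],
    "logger": [],
    "gearbox": [],
}

def impact_chain(start):
    """Breadth-first traversal: everything transitively affected by `start`."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for neighbour in affects[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order

print(impact_chain("sensor"))  # ['controller', 'motor', 'logger', 'gearbox']
```

The equivalent relational query would need one self-join per hop, so the cost of a deep chain grows with each level; the traversal above simply follows stored references.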
Finally, graph databases support index-free adjacency: every node stores its connections to other nodes directly, enabling faster traversals. This makes graph databases particularly well suited for dependency models, impact chains, fraud detection, recommendation systems, and other use cases where data relationships are crucial.