Hello! Let's figure out what Links are in Data Vault and why they are needed. If hubs are responsible for storing unique entities, then links connect these entities, creating a complete picture of the data.
What is a Link?
Link is a table that records relationships between two or more hubs. Each link describes the connection between business entities, maintaining a history of changes and ensuring model scalability.
Basic Principles of Links
- Connection through hubs:
The link does not contain business keys directly. Instead, it refers to surrogate keys from the corresponding hubs. - Multiple connections:
A link can connect two or more hubs, allowing complex relationships between entities to be modeled. - Historicity:
The link stores information about when and from where the connection was loaded. - No attributes:
Links contain only hub keys and metadata. All connection attributes (such as type or status) are moved to satellites.
Example: Link "Passengers and Flights"
Imagine we are building a data warehouse for an airline. We need to store information about which passengers flew on which flights.
We have two hubs:
- Hub_Passengers: A hub with unique passenger identifiers.
- Hub_Flights: A hub with unique flight identifiers.
Example Hub_Passengers
Hub_Passengers is a table for storing unique passenger identifiers:
Surrogate Key | Business Key | Load Date | Source System |
---|---|---|---|
1 | 1234567890 | 2025-01-03 | BookingSystem |
2 | CUST001 | 2025-01-03 | CRMSystem |
- Surrogate Key: Unique passenger key in the hub.
- Business Key: Unique passenger identifier from the data source.
- Load Date: Record load date.
- Source System: The source system from which the key was obtained.
To link these two entities, a link is created:
Structure Link_PassengerFlights
Link Key | Hub Passenger Key | Hub Flight Key | Load Date | Source System |
---|---|---|---|---|
1 | 1 | 101 | 2025-01-03 | BookingSystem |
2 | 2 | 102 | 2025-01-03 | CRMSystem |
3 | 1 | 103 | 2025-01-04 | FlightSystem |
-
Link Key: Unique key for the record in the link.
-
Hub Passenger Key: Reference to the hub with passengers.
-
Hub Flight Key: Reference to the hub with flights.
-
Load Date: Date of the record load.
-
Source System: Data source.
-
1st link: Passenger with key
1
(1234567890) from BookingSystem flew on flightFLIGHT123
(101). -
2nd link: Passenger with key
2
(CUST001) from CRMSystem flew on flightFLIGHT456
(102). -
3rd link: The same passenger with key
1
flew on another flightFLIGHT789
(103).
Conclusion
These examples demonstrate how hubs and links interact:
- Hubs focus on unique business keys.
- Links capture relationships between entities from the hubs, creating a connected data model.