Hello! Let's figure out what Links are in Data Vault and why they are needed. If hubs are responsible for storing unique entities, then links connect these entities, creating a complete picture of the data.


What is a Link?

Link is a table that records relationships between two or more hubs. Each link describes the connection between business entities, maintaining a history of changes and ensuring model scalability.


Basic Principles of Links

  1. Connection through hubs:
    The link does not contain business keys directly. Instead, it refers to surrogate keys from the corresponding hubs.
  2. Multiple connections:
    A link can connect two or more hubs, allowing complex relationships between entities to be modeled.
  3. Historicity:
    The link stores information about when and from where the connection was loaded.
  4. No attributes:
    Links contain only hub keys and metadata. All connection attributes (such as type or status) are moved to satellites.

Example: Link "Passengers and Flights"

Imagine we are building a data warehouse for an airline. We need to store information about which passengers flew on which flights.

We have two hubs:

  • Hub_Passengers: A hub with unique passenger identifiers.
  • Hub_Flights: A hub with unique flight identifiers.

Example Hub_Passengers

Hub_Passengers is a table for storing unique passenger identifiers:

Surrogate Key Business Key Load Date Source System
1 1234567890 2025-01-03 BookingSystem
2 CUST001 2025-01-03 CRMSystem
  • Surrogate Key: Unique passenger key in the hub.
  • Business Key: Unique passenger identifier from the data source.
  • Load Date: Record load date.
  • Source System: The source system from which the key was obtained.

To link these two entities, a link is created:

Structure Link_PassengerFlights

Link Key Hub Passenger Key Hub Flight Key Load Date Source System
1 1 101 2025-01-03 BookingSystem
2 2 102 2025-01-03 CRMSystem
3 1 103 2025-01-04 FlightSystem
  • Link Key: Unique key for the record in the link.

  • Hub Passenger Key: Reference to the hub with passengers.

  • Hub Flight Key: Reference to the hub with flights.

  • Load Date: Date of the record load.

  • Source System: Data source.

  • 1st link: Passenger with key 1 (1234567890) from BookingSystem flew on flight FLIGHT123 (101).

  • 2nd link: Passenger with key 2 (CUST001) from CRMSystem flew on flight FLIGHT456 (102).

  • 3rd link: The same passenger with key 1 flew on another flight FLIGHT789 (103).

Conclusion

These examples demonstrate how hubs and links interact:

  • Hubs focus on unique business keys.
  • Links capture relationships between entities from the hubs, creating a connected data model.