Data Vault Architecture is a modern approach to data warehouse design that focuses on flexibility, scalability, and long-term reliability. It includes both a logical and a physical model, clearly separates structural components, and offers advantages over traditional approaches such as star and snowflake schemas.
Logical and Physical Model of Data Vault
- Logical Model:
- Goal: Creating a resilient data structure capable of adapting to changes in business requirements or data sources.
- Key components:
- Hubs: Store unique business keys representing key business entities, such as airlines, flights, or passengers.
- Links: Model relationships between hubs, such as "a passenger takes a flight" or "a flight belongs to an airline."
- Satellites: Store contextual information and attributes such as passenger name, flight date, or delay reason.
- Physical Model:
- Implemented in relational databases. Each entity (hub, link, or satellite) is represented as a separate table.
- Surrogate keys are used for relationships between components, ensuring performance and consistency.
Comparison with Traditional Approaches (Star and Snowflake)
Characteristic | Star Model | Snowflake | Data Vault |
---|---|---|---|
Flexibility | Low: difficult to adapt to changes. | Low: complexity increases with granularity. | High: easy to add new data. |
History Support | Limited. | Limited. | Full: history is recorded in satellites. |
Scalability | Moderate. | Low: schema complexity increases. | High: structure is easily scalable. |
Normalization | Denormalized. | Normalized. | Partial: keys and relationships are normalized. |
Performance | High for reporting. | Moderate. | Depends on implementation but can be optimized. |
Unlike the star and snowflake schemas, Data Vault is optimal for changes and auditing. It allows reliable tracking of where data originates from and how it has been modified.
Component Roles: Hubs, Links, Satellites
- Hubs:
- Represent centralized data points describing key entities.
- Store unique business keys that are independent of data sources.
- Example: The Hub_Airline table contains unique airline identifiers.
- Links:
- Provide relationships between hubs.
- Reflect business processes or interactions between entities.
- Example: The Link_PassengerFlights table links passengers to flights.
- Satellites:
- Store attributes and data context, such as flight descriptions or delay metrics.
- Can be divided based on data sources, attribute types, or update requirements.
- Example: The Sat_DelayDetails table stores reasons and durations of flight delays.
Architecture Example
Consider a system for tracking flights and their delays.
- Hub: The
Hub_Flights
table contains unique flight identifiers. - Link: The
Link_PassengerFlights
table links flights to passengers. - Satellites: The
Sat_DelayDetails
andSat_PassengerInfo
tables store delay reasons and passenger information, respectively.
Conclusion
The Data Vault architecture ensures modularity and data transparency, making it suitable for complex and evolving business environments. Its logical and physical model, based on hubs, links, and satellites, enables the creation of scalable data warehouses that preserve data for analytics, auditing, and historical analysis.