Archiving and deleting data in Data Vault are important processes for managing storage volume and ensuring performance. Data Vault is suitable for working with historical data, so the approach to archiving and deletion should be based on strategies for preserving historical records and business requirements.
Key aspects of data archiving and deletion
-
Archiving:
- Moving outdated data to separate archival storage for long-term retention.
- Preserving data that may be needed for analysis, compliance with legal requirements, or auditing.
-
Deletion:
- Deleting data that is no longer needed for analysis or is not required to be retained for legal reasons.
- Deletion of records may be based on age criteria, business rules, or data retention policies.
-
Balancing:
- Ensuring an optimal volume of data in operational tables for high performance.
- Considering user needs for accessing historical data.
Data Vault archiving strategy
1. Archiving satellites
Satellites contain historical data and can grow rapidly. Archiving old records may include:
- Moving data older than a certain period (e.g., more than 5 years) to an archival database.
- Using partitioning to isolate outdated data for convenient processing.
Example SQL for archiving satellites:
INSERT INTO Archive_Sat_PassengerDetails
SELECT *
FROM Sat_PassengerDetails
WHERE LoadDate < DATEADD(YEAR, -5, GETDATE());
DELETE FROM Sat_PassengerDetails
WHERE LoadDate < DATEADD(YEAR, -5, GETDATE());
2. Archiving links
Links contain relationships between objects that rarely change. Old records can be archived along with their corresponding satellites.
Example SQL for archiving links:
INSERT INTO Archive_Link_PassengerFlights
SELECT *
FROM Link_PassengerFlights
WHERE FlightDate < DATEADD(YEAR, -5, GETDATE());
DELETE FROM Link_PassengerFlights
WHERE FlightDate < DATEADD(YEAR, -5, GETDATE());
Data deletion strategy
1. Deleting data at the satellite level
When deleting data from satellites, ensure that it is not needed for auditing or reporting.
Example SQL for deleting outdated data:
DELETE FROM Sat_PassengerDetails
WHERE LoadDate < DATEADD(YEAR, -10, GETDATE());
2. Deleting records related to retention policies
Some records (e.g., users' personal data) must be deleted after a certain period or upon request to comply with legislation (e.g., GDPR - for the EU, CCPA - for the USA, 152-FZ - for the Russian Federation).
Example SQL for policy-based data deletion:
DELETE FROM Sat_PassengerDetails
WHERE PassengerID IN (
SELECT PassengerID
FROM DeletionRequests
);
Practical example: Archiving and deletion in Data Vault
Scenario: Archiving passenger data older than 5 years
-
Archiving:
- Creating archive tables for satellites and links.
- Moving data with LoadDate older than 5 years.
-
Deletion:
- Deleting personal data from satellites after 10 years of storage.
- Deleting references to outdated records in links.
SQL script for automation:
-- Archiving outdated data
INSERT INTO Archive_Sat_PassengerDetails
SELECT *
FROM Sat_PassengerDetails
WHERE LoadDate < DATEADD(YEAR, -5, GETDATE());
DELETE FROM Sat_PassengerDetails
WHERE LoadDate < DATEADD(YEAR, -5, GETDATE());
-- Deleting data older than 10 years
DELETE FROM Sat_PassengerDetails
WHERE LoadDate < DATEADD(YEAR, -10, GETDATE());
Conclusion
Archiving and deleting data are important processes for managing data in Data Vault. They help maintain storage performance, reduce storage costs, and comply with legal requirements. Combining automation with a well-planned strategy allows for effective data volume management and ensures access to necessary historical data.