An analytical data mart (Data Mart) is a data representation focused on specific business processes or tasks. The data mart is created based on data contained in the Raw Vault and Business Vault, and serves as a source for BI tools such as Power BI, Tableau, or Excel.

In this section, we will explore how to design and create a data mart based on data from the Business Vault, which will provide analysts with convenient access to aggregated and ready-to-use data.


Key stages of building an analytical data mart

  1. Defining the business task: We will create a data mart for analyzing flight delays by airlines, months, and delay reasons. This information will help identify trends and key issues affecting flight punctuality.

  2. Designing the data mart:

    • Key metrics are selected (e.g., total number of delays, delay reasons).
    • Dimensions are defined (airlines, months, years).
  3. Creating the data mart structure: The data mart is designed as one or more tables in the Business Vault database, which aggregate data from the Raw Vault.

  4. Loading data into the data mart: An SQL script is implemented to transfer and aggregate data from Raw Vault and Business Vault tables.


Designing the data mart

For the data mart, we will create the Fact_DelayAnalysis table, which will contain key metrics such as the total number of delays and the distribution of delay reasons. We will also create dimension tables (Dim_Airline and Dim_Date) for airlines and time.


Creating the structure of the analytical data mart

SQL script for creating data mart tables:

-- Create the fact table 
CREATE TABLE Business_Vault.Fact.DelayAnalysis (
    FactID INT IDENTITY(1,1) PRIMARY KEY,
    Airline_SurrogateKey UNIQUEIDENTIFIER,
    DateKey INT,
    TotalDelays INT,
    CarrierDelays INT,
    WeatherDelays INT,
    NASDelays INT,
    SecurityDelays INT,
    LateAircraftDelays INT
);
-- Dimension table for airlines
CREATE TABLE Business_Vault.Dim.Airline (
    Airline_SurrogateKey UNIQUEIDENTIFIER PRIMARY KEY,
    AirlineName NVARCHAR(255)
);

-- Dimension table for dates
CREATE TABLE Business_Vault.Dim.Date (
DateKey INT PRIMARY KEY,
Year INT,
Month INT,
Day INT,
MonthName NVARCHAR(50)
);

Loading data into the data mart

SQL script for populating data mart tables:

  • Populating dimensions:
-- Load data into Dim_Airline 
INSERT INTO Business_Vault.Dim.Airline (Airline_SurrogateKey, AirlineName)
SELECT DISTINCT h.Airline_SurrogateKey, s.Airline_Name
FROM Raw_Vault.Hub.Hub_Airline h
JOIN Raw_Vault.Sat.Sat_AirlineDetails s ON h.Airline_SurrogateKey = s.Airline_SurrogateKey;

-- Load data into Dim_Date
INSERT INTO Business_Vault.Dim.Date (DateKey, Year, Month, Day, MonthName)
SELECT DISTINCT
    YEAR(s.Year * 100 + s.Month) AS DateKey,
    s.Year,
    s.Month,
    1 AS Day, -- For monthly aggregates Day = 1
    DATENAME(MONTH, DATEFROMPARTS(s.Year, s.Month, 1)) AS MonthName
FROM Raw_Vault.Sat.Sat_AirlineDetails s;
  • Populating the fact table:
-- Load data into Fact_DelayAnalysis INSERT INTO Business_Vault.Fact.Fact_DelayAnalysis ( Airline_SurrogateKey, DateKey, TotalDelays, CarrierDelays, WeatherDelays, NASDelays, SecurityDelays, LateAircraftDelays ) SELECT h.Airline_SurrogateKey, YEAR(s.Year * 100 + s.Month) AS DateKey, SUM(s.Total_Delays) AS TotalDelays, SUM(s.Carrier_Delay) AS CarrierDelays, SUM(s.Weather_Delay) AS WeatherDelays, SUM(s.NAS_Delay) AS NASDelays, SUM(s.Security_Delay) AS SecurityDelays, SUM(s.Late_Aircraft_Delay) AS LateAircraftDelays FROM Raw_Vault.Hub.Hub_Airline h JOIN Raw_Vault.Sat.Sat_AirlineDetails s ON h.Airline_SurrogateKey = s.Airline_SurrogateKey GROUP BY h.Airline_SurrogateKey, YEAR(s.Year * 100 + s.Month); 

Example of using the data mart in Power BI

After loading data into the data mart, you can connect Power BI to the Business Vault database and create the following visualizations:

  • Bar chart of delays by reason.
  • Trend line of the total number of delays by month.
  • Pie chart of each airline's share in total delays.

Conclusion

Designing an analytical data mart is a key stage that provides access to ready-to-use data for analysis. Using data marts allows efficient data aggregation and presentation in a convenient format for BI tools.