let's use the tools: Database - SQL Server Express Tools - ETL/ELT dbt (Data Build Tool) BI tools - Power BI Desktop Development and administration tools - VS Code, MS SQL data source - Airline Dataset (Bureau of Transportation Statistics, USA)

Scenario of using the Airline Dataset (Bureau of Transportation Statistics, USA) in a training course

The Airline Dataset provides rich information about flights, delays, airports, and airlines, making it ideal for building a training scenario focused on implementing Data Vault. Below is an example scenario that can be used in the course.


Scenario Description

Students take on the role of data engineers working on building an analytical platform for an airline. The goal is to create a Data Vault that will allow analyzing:

  1. Flight and airport statistics.
  2. Flight delays and their causes.
  3. Popular routes and destinations.

Data comes from several sources:

  • Flight System: Contains information about flights (dates, routes, airlines).
  • Airport System: List of airports.
  • Delay Reports: Data on delays and their causes.

Hubs:

  1. Hub_Flights: Stores unique flights.

    • Fields: FlightKey (Surrogate Key), FlightID (Business Key), SourceSystem, LoadDate.
  2. Hub_Airports: Stores airport data.

    • Fields: AirportKey (Surrogate Key), AirportID (Business Key), Name, City, Country, SourceSystem, LoadDate.
  3. Hub_Airlines: Stores airline data.

    • Fields: AirlineKey (Surrogate Key), AirlineID (Business Key), Name, SourceSystem, LoadDate.

Links:

  1. Link_FlightRoutes: Connection between flights and routes.

    • Fields: FlightRouteKey, FlightKey, OriginAirportKey, DestinationAirportKey, LoadDate.
  2. Link_FlightAirlines: Connection between flights and airlines.

    • Fields: FlightAirlineKey, FlightKey, AirlineKey, LoadDate.

Satellites:

  1. Sat_FlightDetails: Satellite for flight details.

    • Fields: FlightKey, DepartureTime, ArrivalTime, LoadDate, SourceSystem.
  2. Sat_AirportDetails: Satellite for airports.

    • Fields: AirportKey, RunwayLength, TerminalCount, LoadDate.
  3. Sat_DelayReports: Satellite for delay data.

    • Fields: FlightKey, DelayTime, DelayReason, LoadDate, SourceSystem.

Stages:

  1. Staging Area:

    • Loading raw data from CSV/JSON files.
    • Cleaning and standardizing data.
  2. Raw Vault:

    • Creating hubs, links, and satellites.
    • Using surrogate keys for data linking.
  3. Business Vault:

    • Adding business logic, e.g., classifying delays as short-term or long-term.

Based on the Data Vault, Data Marts are created for analytics:

  1. Route Analytics:

    • Most popular routes (by number of flights).
    • Airports with the highest traffic.
  2. Delay Analysis:

    • Top delay reasons.
    • Average delay time by airports.
    •  

4. Data Visualization

Using Power BI:

  • Dashboard 1: Flight Analysis
    • Map displaying routes.
    • Graph showing the number of flights by airlines.
  • Dashboard 2: Flight Delays
    • Table of delay reasons.
    • Chart with delay time distribution.