Daniel Linstedt - Building a Scalable Data Warehouse with Data Vault 2.0

  1. What is Big Data?
    1. Volume
    2. Velocity
    3. Variety
    4. Veracity (optional)
    5. Value (optional)
  2. What is Data Vault?

    Methodology for building, maintaining, and expanding a data warehouse (DWH).

  3. “Data pyramid”
    1. Data
    2. Information
    3. Knowledge
    4. Wisdom

Data Vault 2.0 - Overview

  1. Structure

Source system(s) --hard business rules--> Staging Area --> Enterprise Data Warehouse (Raw Data Vault) --soft business rules--> Information Marts
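
To make the two kinds of rules in the flow above concrete, here is a minimal Python sketch. It assumes the usual reading that hard business rules are purely technical (e.g. data type alignment) and do not change what the data means, while soft business rules apply business interpretation (e.g. aggregation). The function names, column names, and sample rule are hypothetical, and any intermediate storage layer is skipped for brevity.

```python
from collections import defaultdict
from decimal import Decimal


def apply_hard_rules(source_row):
    """Hard rule (assumed example): align data types only; meaning is unchanged."""
    return {
        "customer_id": str(source_row["customer_id"]),
        "amount": Decimal(source_row["amount"]),
    }


def apply_soft_rules(staged_rows):
    """Soft rule (assumed example): business interpretation, here revenue per customer."""
    revenue = defaultdict(Decimal)  # Decimal() defaults to 0
    for row in staged_rows:
        revenue[row["customer_id"]] += row["amount"]
    return dict(revenue)


# Hypothetical source rows with inconsistent types, as they might arrive from a source system.
source_rows = [
    {"customer_id": 42, "amount": "10.50"},
    {"customer_id": "42", "amount": "4.50"},
]
staged = [apply_hard_rules(r) for r in source_rows]
information_mart = apply_soft_rules(staged)
```

Keeping the soft rules late in the flow means a changed business definition only requires rebuilding the information mart, not reloading the source data.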

  1. Linstedt proposes the term “information marts” instead of “data marts” because these objects are the result of data operations (e.g. aggregation, consolidation), i.e. they sit at a higher pyramid level than raw pieces of data.

  2. Auditability is limited to four pieces of information:
    1. Where from
    2. When
    3. How
    4. Where to
  3. In addition to the columns from the source system, each table in the staging area includes:
    1. sequence number
    2. timestamp
    3. record source
    4. hash key computations for all business keys and their combinations
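
The four extra staging columns listed above can be sketched in Python as follows. This is an illustration under assumptions, not the book's implementation: the column names (`sequence_number`, `load_timestamp`, `record_source`, `hk_*`), the delimiter, the trim-and-uppercase normalization, and the choice of SHA-256 are all hypothetical. The sketch also takes "all business keys and their combinations" literally; in practice one would likely hash only the combinations that are actually needed.

```python
import hashlib
from datetime import datetime, timezone
from itertools import combinations


def hash_key(*business_keys, delimiter=";"):
    """Hash one or more business keys into a single hex digest."""
    # Assumed convention: trim and uppercase so ' c1 ' and 'C1' give the same key.
    normalized = delimiter.join(str(k).strip().upper() for k in business_keys)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def stage_rows(rows, record_source, business_key_columns, load_ts=None):
    """Return copies of the source rows with the staging metadata columns added."""
    load_ts = load_ts or datetime.now(timezone.utc)
    staged = []
    for seq, row in enumerate(rows, start=1):
        out = dict(row)                      # keep all source columns
        out["sequence_number"] = seq         # order of the row within this load
        out["load_timestamp"] = load_ts      # same timestamp for the whole batch
        out["record_source"] = record_source
        # One hash key per business key and per combination of business keys.
        for size in range(1, len(business_key_columns) + 1):
            for combo in combinations(business_key_columns, size):
                column = "hk_" + "_".join(combo)
                out[column] = hash_key(*(row[c] for c in combo))
        staged.append(out)
    return staged


# Hypothetical usage: one order row with two business keys.
rows = [{"customer_id": " c1 ", "order_id": "o9", "amount": 10}]
staged = stage_rows(rows, "crm.orders", ["customer_id", "order_id"])
```

Each staged row then carries `hk_customer_id`, `hk_order_id`, and `hk_customer_id_order_id` alongside the original source columns.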