Data Warehouse

A data warehouse is a centralised store of cleaned, conformed, structured data, organised so that analytical queries are fast and reporting is consistent. The defining move is schema-on-write: data is shaped, typed, and reconciled before it lands. A data lake takes the opposite posture, schema-on-read, where structure is applied only when the data is queried.
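To make schema-on-write concrete, here is a minimal sketch in plain Python rather than any particular warehouse product. The `OrderRow` schema, the `conform` and `load` helpers, and the sample records are all hypothetical, invented for illustration. The point is only that typing and reconciliation happen before a row lands, and anything that does not fit is set aside instead of being stored raw.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, InvalidOperation


@dataclass(frozen=True)
class OrderRow:
    """Hypothetical target schema for one warehouse table."""
    order_id: int
    customer_id: str
    amount: Decimal
    order_date: date


def conform(raw: dict) -> OrderRow:
    """Shape and type one raw record; raise ValueError if it does not fit."""
    try:
        return OrderRow(
            order_id=int(raw["order_id"]),
            customer_id=str(raw["customer_id"]).strip().upper(),
            amount=Decimal(str(raw["amount"])),
            order_date=date.fromisoformat(raw["order_date"]),
        )
    except (KeyError, TypeError, ValueError, InvalidOperation) as exc:
        raise ValueError(f"rejected at write time: {raw!r}") from exc


def load(raw_records: list[dict]) -> tuple[list[OrderRow], list[dict]]:
    """Schema-on-write: only conformed rows land; the rest are set aside."""
    table, rejects = [], []
    for raw in raw_records:
        try:
            table.append(conform(raw))
        except ValueError:
            rejects.append(raw)
    return table, rejects


# Illustrative input: the second record has an amount that cannot be typed.
raw_records = [
    {"order_id": "101", "customer_id": " c-9 ", "amount": "19.90", "order_date": "2024-03-01"},
    {"order_id": "102", "customer_id": "c-4", "amount": "n/a", "order_date": "2024-03-02"},
]
table, rejects = load(raw_records)
```

A schema-on-read store would accept both records as they are and leave the `"n/a"` problem to whoever queries the data later.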
Reviewed by 7wData

Why it matters

The warehouse is the spine of most BI programs I see. It is where finance, ops, and product agree on what a customer, a transaction, and a product line actually are, because the conforming work has been done upstream. That distinguishes it from a data lake (raw, schema-on-read, flexible but inconsistent) and from a data lakehouse (the converged architecture that keeps the lake’s flexibility and bolts warehouse-grade consistency on top). For AI, the same logic applies: a model trained on production warehouse extracts is almost always more trustworthy than one trained on lake-raw inputs, because the warehouse has already absorbed the definitional fights that the lake silently passes on to the training pipeline.
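The conforming work described above can be sketched in a few lines. Everything here is an assumption made for illustration: the two source extracts, their field names, the rule that a customer key is a zero-stripped integer, and the choice to let the finance name win. Real conforming rules are negotiated per organisation; the sketch only shows the shape of the result, one agreed row per customer.

```python
# Hypothetical extracts: each source system names and formats a customer differently.
finance_rows = [{"cust_no": "0042", "legal_name": "Acme Ltd"}]
product_rows = [{"userId": "42", "displayName": "acme ltd "}]


def conformed_key(raw_id: str) -> str:
    """Assumed rule for this sketch: keys are integers without leading zeros."""
    return str(int(raw_id))


def build_customer_dim(finance: list[dict], product: list[dict]) -> dict[str, dict]:
    """Merge both sources into one row per conformed customer key."""
    dim: dict[str, dict] = {}
    for row in finance:
        key = conformed_key(row["cust_no"])
        entry = dim.setdefault(key, {"customer_key": key, "sources": set()})
        entry["name"] = row["legal_name"].strip()  # finance name wins (assumed rule)
        entry["sources"].add("finance")
    for row in product:
        key = conformed_key(row["userId"])
        entry = dim.setdefault(key, {"customer_key": key, "sources": set()})
        entry.setdefault("name", row["displayName"].strip())  # used only if finance had none
        entry["sources"].add("product")
    return dim


customer_dim = build_customer_dim(finance_rows, product_rows)
```

Both systems now resolve to the single key `"42"`, so a report or a training extract built on `customer_dim` counts one customer, not two.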

Where you’ll encounter it

Three contexts. A data team choosing among Snowflake, BigQuery, Redshift, and Databricks SQL is choosing a warehouse, even when the vendor markets it as a lakehouse. A CFO asking “where does our reporting data actually live” is asking about the warehouse, whether they use the word or not. A model risk review that finds the training set was a warehouse extract from a stale snapshot is looking at a warehouse failure mode: trustworthy data, wrong time. Notice the pattern: every “lakehouse vs warehouse” comparison makes the same architectural trade in a different shape, weighing the cost of consistency against the cost of flexibility, with the warehouse on the consistency side.
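The stale-snapshot failure mode in the third context is the kind of thing a small guard can catch before training starts. This is a sketch under stated assumptions, not a description of any vendor's tooling: `StaleSnapshotError`, `require_fresh`, the seven-day default, and the dates are all hypothetical, and it assumes the extract carries a timezone-aware snapshot timestamp.

```python
from datetime import datetime, timedelta, timezone


class StaleSnapshotError(RuntimeError):
    """Raised when a training extract is older than the allowed age."""


def require_fresh(
    snapshot_at: datetime,
    now: datetime,
    max_age: timedelta = timedelta(days=7),  # assumed policy, pick your own
) -> timedelta:
    """Return the snapshot's age, or raise if it exceeds max_age."""
    age = now - snapshot_at
    if age > max_age:
        raise StaleSnapshotError(
            f"snapshot is {age.days} days old; limit is {max_age.days} days"
        )
    return age


# Illustrative timestamps, fixed so the example is reproducible.
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
fresh_age = require_fresh(datetime(2024, 6, 28, tzinfo=timezone.utc), now)

try:
    require_fresh(datetime(2024, 3, 31, tzinfo=timezone.utc), now)
    stale_caught = False
except StaleSnapshotError:
    stale_caught = True
```

Passing `now` in explicitly, rather than calling the clock inside the function, keeps the check testable and makes the review trail show exactly which reference time was used.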


Part of the 7wData AI Glossary. Tracking how concepts like this move in the expert conversation: daily signals at ins7ghts.com.