## When to use models
Models are ideal for:

- Joining sources together - Combine data from multiple sources or databases into a single clean table
- Type casting - Convert data types ahead of time so your reports don’t need to handle it
- Reusable logic - Centralize calculations and business logic in one place instead of repeating it across reports
- Pre-running large queries - Execute heavy transformations once instead of on every page load
- Clean report tables - Create simple, well-structured tables that are easy to work with when building visualizations
## Creating a model
- Navigate to the Models tab in the sidebar
- Click New Model
- Name your model - this is how you’ll reference it in your reports
- Write your SQL query in the editor
- Click Save to create the model
## What should go in models
Models are where complex data preparation happens, keeping your reports fast and focused on presentation.

### Joins belong in models
All joins between sources should be handled in models, not in reports. This ensures:

- Faster report loading times
- Consistent join logic across all reports
- Easier maintenance when relationships change
### Model granularity
Create models at the most granular level users need for drilldown and exploration. Think about:

- What’s the finest detail users should be able to filter to?
- What dimensions do they need to slice and dice by?
- What level of detail supports their analysis?
### Recategorizing and bucketing
Use CASE statements in models to create clean, consistent categories. This is ideal for:

- Bucketing numeric values into ranges (small/medium/large orders)
- Standardizing messy category data
- Creating business-friendly labels from technical values
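The bucketing pattern can be sketched like this; the `orders` table, `order_total` column, and dollar thresholds are illustrative assumptions, not values from the product:

```sql
SELECT
    order_id,
    order_total,
    -- Bucket numeric values into business-friendly size labels
    CASE
        WHEN order_total < 50  THEN 'small'
        WHEN order_total < 500 THEN 'medium'
        ELSE 'large'
    END AS order_size
FROM orders
```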
### Keep reports simple
Reports should focus on filtering, sorting, and presenting data from models. Complex transformations and joins slow down reports and make them harder to maintain.

- In models: Joins, aggregations, complex calculations, data type casting, CASE statements
- In reports: Filtering, sorting, simple calculations, formatting
## Refresh schedules
Models automatically refresh on a schedule to keep your data up to date. You can also manually refresh a model at any time from the Models page. By default, models refresh every 12 hours. Enterprise plans can configure a custom refresh schedule per model, with hourly intervals ranging from 1 hour to 72 hours.

## Table engines
Models support different table engines that control how your data is stored and optimized. The default engine works well for most use cases, but you can choose a specific engine for better performance depending on your workload.

| Engine | Description | Data location |
|---|---|---|
| S3 | Reads and writes data directly to S3 objects. Best for archival or cold data. Does not support indexes. | Region of your organization |
| MergeTree | General-purpose table. Faster than the default engine. Good default choice for most workloads. | GCP us-central1 |
| SummingMergeTree | Pre-aggregated counters or running totals — things like daily page view counts, revenue by region, or click counters | GCP us-central1 |
| AggregatingMergeTree | Incremental, pre-computed aggregations more complex than simple sums (e.g. unique user counts, averages, or quantiles). Advanced — requires special aggregate function syntax. | GCP us-central1 |
### SummingMergeTree: watch out for non-additive metrics
SummingMergeTree automatically sums numeric columns when merging rows with the same key. This works great for additive metrics like counts and totals, but it will produce incorrect results for non-additive metrics like averages, percentages, or ratios. If a column such as `avg_val` holds an average, SummingMergeTree will sum those averages together during background merges, producing meaningless numbers. Averages of averages are not valid aggregations.
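A minimal sketch of the pitfall; the table and column names here are illustrative, not from the product:

```sql
-- Hypothetical table storing a pre-computed average per day
CREATE TABLE daily_stats
(
    day     Date,
    avg_val Float64  -- an average: NOT an additive metric
)
ENGINE = SummingMergeTree()
ORDER BY day;

-- If two rows are inserted for the same day, each holding an average,
-- a background merge sums them: 10.0 and 20.0 collapse to 30.0,
-- which is not the true combined average.
```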
Other things to keep in mind with SummingMergeTree:

- Always use `sum()` + `GROUP BY` when querying - background merges may be incomplete at query time, so rows with the same key might not be fully merged yet. Explicit aggregation in your queries ensures correct results.
- Rows where all summed columns equal 0 are automatically deleted - this can be surprising if you expect all rows to be retained.
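The safe query pattern can be sketched as follows, assuming a hypothetical SummingMergeTree table `daily_page_views(day, views)`:

```sql
-- Aggregate explicitly: rows sharing a key may not be merged yet
SELECT
    day,
    sum(views) AS total_views
FROM daily_page_views
GROUP BY day
ORDER BY day
```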
### MergeTree family: primary keys must be non-nullable
All MergeTree family engines (MergeTree, SummingMergeTree, AggregatingMergeTree) require a non-nullable primary key. However, the Evidence Studio Data Lake uses Nullable columns by default to allow data ingestion to succeed more easily, even when some values are missing.
This means that if you select a column directly as your primary key, it may fail because it’s Nullable. Use coalesce to provide a fallback value and guarantee the column is non-null.
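For example, a model might wrap its key column like this; the `orders` table, column names, and fallback value `0` are assumptions for illustration:

```sql
SELECT
    coalesce(order_id, 0) AS order_id,  -- guaranteed non-null, safe to use as the primary key
    order_date,
    total_amount
FROM orders
```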
If your table has many columns, you can use * EXCEPT to select everything else without listing each column individually:
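Continuing the illustrative `orders` example, this keeps every other column without listing each one:

```sql
SELECT
    coalesce(order_id, 0) AS order_id,  -- non-nullable primary key column
    * EXCEPT (order_id)                 -- all remaining columns, unchanged
FROM orders
```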
## Examples
### Enriched order details
Create a model called `enriched_orders` that joins order details with item information.
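One possible shape for this model; the source tables and columns (`orders`, `order_details`, `items`) are assumed for illustration:

```sql
SELECT
    o.order_id,
    o.order_date,
    o.customer_id,
    i.item_name,
    i.category,
    d.quantity,
    d.unit_price
FROM order_details AS d
JOIN orders AS o ON o.order_id = d.order_id  -- order header info
JOIN items  AS i ON i.item_id  = d.item_id   -- item attributes
```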
Once created, you can reference `enriched_orders` throughout your reports.
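A report query against the model can then stay simple, for instance:

```sql
SELECT
    category,
    sum(quantity * unit_price) AS revenue
FROM enriched_orders
GROUP BY category
ORDER BY revenue DESC
```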
### Daily sales summary
Create a model called `daily_sales` that pre-aggregates order data by day.
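A sketch of such a model, assuming an `orders` table with `order_date` and `total_amount` columns and a ClickHouse-style `toDate` function:

```sql
SELECT
    toDate(order_date) AS day,          -- truncate timestamps to the day
    count()            AS order_count,
    sum(total_amount)  AS total_sales
FROM orders
GROUP BY day
```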
Use `daily_sales` in a line chart to visualize trends over time.
### Order summary with item counts
Create a model called `order_summary` that combines headers with aggregated details.
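Under the same assumed schema (`orders` headers plus `order_details` lines), the model could look like:

```sql
SELECT
    o.order_id,
    o.order_date,
    o.customer_id,
    count(d.item_id) AS item_count,      -- distinct detail lines per order
    sum(d.quantity)  AS total_quantity
FROM orders AS o
LEFT JOIN order_details AS d ON d.order_id = o.order_id
GROUP BY
    o.order_id,
    o.order_date,
    o.customer_id
```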
Query `order_summary` to quickly analyze order patterns.
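For example, to see how order sizes are distributed:

```sql
SELECT
    item_count,
    count() AS orders
FROM order_summary
GROUP BY item_count
ORDER BY item_count
```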

