Why role clarity matters in a sustainable BI program
A BI program scales when responsibilities are explicit, deliverables are standardized, and handoffs are lightweight but unambiguous. When roles blur, teams tend to re-implement logic in multiple places, ship dashboards that disagree, and spend most of their time reconciling numbers instead of answering questions. This chapter clarifies three common roles—data engineer, analytics engineer, and BI/data analyst—by focusing on (1) responsibilities, (2) typical deliverables, and (3) collaboration points and handoff artifacts.
Role 1: Data Engineer (DE)
Primary responsibilities
- Ingestion and connectivity: bring data from source systems into the analytics environment (batch, streaming, files, APIs).
- Pipeline reliability: scheduling, retries, idempotency, backfills, monitoring, alerting, and incident response.
- Data quality at the boundary: schema drift handling, deduplication strategies, late-arriving data policies, and basic validity checks.
- Security and access patterns: service accounts, secrets management, network controls, and least-privilege access to raw/landing zones.
- Operational documentation: runbooks, on-call notes, and SLAs/SLOs for critical datasets.
Typical deliverables
- Ingestion pipelines: jobs/workflows that land source data into a raw/landing area.
- Source connectors and configs: API credentials, CDC configuration, file ingestion rules.
- Reliability assets: monitors, alerts, dashboards for pipeline health, and runbooks.
- Raw dataset contracts: expected schemas, partitioning strategy, and freshness expectations (a minimal contract sketch follows this list).
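Captured as code, a raw dataset contract might look like the following minimal sketch. This is not a standard format; the field names, table name, and values are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RawDatasetContract:
    """Illustrative raw dataset contract; all fields are assumptions."""
    table: str                 # fully qualified landing table
    expected_schema: dict      # column name -> declared type
    partition_key: str         # physical partitioning of the landing table
    freshness_sla_hours: int   # max acceptable staleness before alerting
    load_method: str           # "append-only" or "upsert"

# Hypothetical contract for an orders feed
orders_contract = RawDatasetContract(
    table="raw.orders",
    expected_schema={
        "order_id": "string",
        "order_total": "decimal(18,2)",
        "posted_at": "timestamp",
    },
    partition_key="date(_ingested_at)",
    freshness_sla_hours=6,
    load_method="append-only",
)
```

Keeping the contract in version control lets DE and AE review schema or freshness changes the same way they review code.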
Collaboration points
- With analytics engineers: agree on raw table naming, partitioning, incremental load strategy, and how to represent deletes/updates (especially for CDC).
- With analysts: clarify source system meaning (what a field represents operationally), event timing (created vs posted vs updated), and known quirks.
Practical step-by-step: standing up a reliable ingestion pipeline
- Define the ingestion contract: source tables/endpoints, expected schema, update method (append-only vs upserts), and freshness target.
- Implement extraction: choose CDC or API pulls; ensure idempotency (re-running does not duplicate).
- Land raw data: store with minimal transformation; keep metadata such as _ingested_at, source file name, and batch id.
- Run boundary checks: row counts, null checks on key fields, and schema drift detection (see the sketch after this list).
- Operationalize: scheduling, retries, alerting, and a runbook describing how to backfill.
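A minimal sketch of the landing and boundary-check steps, assuming a DB-API-style connection (such as sqlite3) and illustrative table and column names; a production pipeline would add retries, alerting, and schema drift detection on top.

```python
from datetime import datetime, timezone

def land_batch(conn, rows, batch_id):
    """Land one extracted batch into an illustrative raw_orders_landing table.

    batch_id must be deterministic per extraction window (e.g. "2024-01-15")
    so that re-running the same window replaces rows instead of duplicating.
    """
    ingested_at = datetime.now(timezone.utc).isoformat()
    with conn:  # one transaction: delete + insert succeed or fail together
        conn.execute(
            "DELETE FROM raw_orders_landing WHERE _batch_id = ?", (batch_id,)
        )
        conn.executemany(
            "INSERT INTO raw_orders_landing "
            "(order_id, payload, _ingested_at, _source_file, _batch_id) "
            "VALUES (?, ?, ?, ?, ?)",
            [
                (r["order_id"], r["payload"], ingested_at,
                 r.get("source_file"), batch_id)
                for r in rows
            ],
        )

def boundary_checks(conn, batch_id, min_rows=1):
    """Basic validity checks at the boundary: row count and null keys."""
    total, null_keys = conn.execute(
        "SELECT COUNT(*), SUM(CASE WHEN order_id IS NULL THEN 1 ELSE 0 END) "
        "FROM raw_orders_landing WHERE _batch_id = ?", (batch_id,)
    ).fetchone()
    if total < min_rows or (null_keys or 0) > 0:
        raise ValueError(f"Boundary check failed: rows={total}, "
                         f"null keys={null_keys or 0}")
```

The delete-then-insert pattern is one common way to make re-runs idempotent; merge/upsert strategies work too, depending on the warehouse.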
Role 2: Analytics Engineer (AE)
Primary responsibilities
- Transformations and modeling: convert raw data into analysis-ready tables with consistent keys, standardized timestamps, and business-friendly structures.
- Testing and quality gates: automated tests for uniqueness, referential integrity, accepted values, freshness, and reconciliation checks.
- Semantic definitions: define canonical metrics and dimensions (e.g., “Net Revenue”, “Active Customer”) and ensure they are computed consistently.
- Performance and maintainability: incremental models, partitioning/clustering choices, and avoiding repeated heavy computations (a minimal refresh sketch follows this list).
- Documentation for reuse: model descriptions, lineage notes, and usage guidance for downstream consumers.
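To make the incremental pattern concrete, here is a minimal sketch of a delete-and-reinsert refresh over a trailing window, assuming a DB-API-style warehouse connection; the table and column names are illustrative, not a prescribed schema.

```python
def refresh_orders_daily(conn, window_start):
    """Illustrative incremental refresh: rebuild only dates >= window_start.

    Rebuilding a trailing window keeps re-runs idempotent and avoids
    recomputing the full history on every run.
    """
    with conn:  # single transaction so readers never see a half-refreshed window
        conn.execute(
            "DELETE FROM mart_fct_orders_daily WHERE order_posted_date >= ?",
            (window_start,),
        )
        conn.execute(
            """
            INSERT INTO mart_fct_orders_daily
                (order_id, order_posted_date, net_before_refunds)
            SELECT order_id,
                   DATE(order_posted_ts),
                   SUM(gross_amount - discount_amount)
            FROM stg_orders
            WHERE DATE(order_posted_ts) >= ?
            GROUP BY order_id, DATE(order_posted_ts)
            """,
            (window_start,),
        )
```

Tools like dbt package this pattern as incremental materializations; the underlying idea is the same.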
Typical deliverables
- Transformation code: version-controlled SQL/models and orchestration definitions.
- Curated datasets: standardized intermediate and final tables used by BI tools.
- Test suite: automated checks and CI gates that prevent breaking changes.
- Metric definition documents: canonical KPI formulas, filters, inclusion/exclusion rules, and edge cases.
- Change logs: what changed, why, and expected downstream impact.
Collaboration points
- With data engineers: coordinate on schema changes, late-arriving data handling, and backfill procedures.
- With analysts: translate stakeholder questions into stable definitions; validate that modeled outputs match business expectations; agree on grain and slicing behavior for dashboards.
Example: transformation spec (handoff artifact)
A transformation spec is a lightweight agreement between AE and analyst (and often DE) that defines exactly how a dataset will be produced.
| Section | Example content |
|---|---|
| Input sources | raw.orders, raw.order_items, raw.refunds |
| Output model | mart.fct_order_revenue_daily |
| Grain | One row per order_id per day of order_posted_date |
| Business rules | Revenue recognized on posted date; exclude test orders; refunds reduce revenue on refund posted date |
| Keys | order_id unique per day; customer_id required |
| Edge cases | Partial refunds; multi-currency conversion using daily FX rate |
| Tests | Uniqueness on (order_id, order_posted_date); non-null customer_id; reconciliation to finance totals within tolerance |
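The "Tests" row of a spec like this can be automated directly. Below is a minimal sketch against a DB-API-style connection; the finance table name and the tolerance value are assumptions, since the spec above says only "within tolerance".

```python
def test_fct_order_revenue_daily(conn, tolerance=0.005):
    """Illustrative automated checks for the daily order revenue model."""
    # Uniqueness on (order_id, order_posted_date)
    dupes = conn.execute(
        "SELECT COUNT(*) FROM ("
        "  SELECT order_id, order_posted_date FROM mart_fct_order_revenue_daily"
        "  GROUP BY order_id, order_posted_date HAVING COUNT(*) > 1"
        ") AS d"
    ).fetchone()[0]
    assert dupes == 0, f"{dupes} duplicate (order_id, order_posted_date) pairs"

    # Non-null customer_id
    null_keys = conn.execute(
        "SELECT COUNT(*) FROM mart_fct_order_revenue_daily "
        "WHERE customer_id IS NULL"
    ).fetchone()[0]
    assert null_keys == 0, f"{null_keys} rows missing customer_id"

    # Reconciliation to finance totals (finance table name is hypothetical)
    modeled = conn.execute(
        "SELECT SUM(net_revenue) FROM mart_fct_order_revenue_daily"
    ).fetchone()[0]
    finance = conn.execute(
        "SELECT SUM(net_revenue) FROM finance_reported_totals"
    ).fetchone()[0]
    assert abs(modeled - finance) <= tolerance * abs(finance), (
        f"Modeled {modeled} vs finance {finance} differs by more than "
        f"{tolerance:.1%}"
    )
```

Wiring checks like these into CI is what turns the spec from a document into a quality gate.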
Role 3: BI/Data Analyst (Analyst)
Primary responsibilities
- Stakeholder requirements: clarify the business question, decision context, audience, and success criteria.
- Analysis and interpretation: explore data, identify drivers, quantify impact, and communicate insights.
- Dashboarding and reporting: design visuals, define filters, set default views, and ensure usability for the target audience.
- Validation with the business: confirm metrics match expectations; run UAT sessions; document assumptions and limitations.
- Adoption and iteration: training, feedback loops, and prioritizing enhancements.
Typical deliverables
- Requirements brief: business question, intended actions, definitions, and acceptance criteria.
- Mockups/wireframes: proposed dashboard layout and interactions.
- Exploratory queries/notebooks: analysis used to validate definitions and identify needed fields.
- BI assets: dashboards, reports, alert rules, and documentation for end users.
- Data validation notes: reconciliations, known caveats, and sign-off records.
Collaboration points
- With analytics engineers: request new fields/metrics, confirm definitions, and avoid implementing business logic only in the BI tool.
- With data engineers: surface data availability gaps, freshness issues, and source system changes impacting reporting.
Practical step-by-step: turning a stakeholder question into a dashboard requirement
- Clarify the decision: “What will you do differently based on this dashboard?”
- Define the audience and cadence: exec weekly review vs daily operations.
- List required KPIs and dimensions: include exact definitions and needed slices (region, channel, product).
- Set acceptance criteria: expected totals for a known period, refresh frequency, and performance expectations (see the sketch after this list).
- Identify data gaps: missing fields, unclear mappings, or inconsistent source values.
- Draft a wireframe: include filters, drill paths, and default date ranges.
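One way to make "expected totals for a known period" testable is a small validation query agreed with the stakeholder. In this sketch, the signed-off figure, period, tolerance, and table name are all hypothetical placeholders that would come from the requirements brief.

```python
# Hypothetical signed-off figure for a closed period, recorded in the
# requirements brief; the tolerance is an illustrative choice.
EXPECTED_JAN_NET_REVENUE = 1_254_300.00
TOLERANCE = 0.005  # 0.5%

def acceptance_check(conn):
    """Compare the modeled KPI for a known period to the agreed expectation."""
    actual = conn.execute(
        "SELECT SUM(net_revenue) FROM mart_fct_order_revenue_daily "
        "WHERE order_posted_date BETWEEN '2024-01-01' AND '2024-01-31'"
    ).fetchone()[0]
    within = (
        abs(actual - EXPECTED_JAN_NET_REVENUE)
        <= TOLERANCE * EXPECTED_JAN_NET_REVENUE
    )
    return within, actual
```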
Handoff artifacts that prevent rework
1) Source-to-target mapping (DE ↔ AE)
This artifact makes ingestion and transformation explicit: what comes from where, how it is typed, and where it lands.
| Source field | Source table | Target field | Target model | Transform | Notes |
|---|---|---|---|---|---|
| order_total | erp.orders | gross_amount | stg_orders | Cast to decimal(18,2) | Stored in source currency |
| status | erp.orders | order_status | stg_orders | Uppercase; map legacy codes | See status mapping table |
| posted_at | erp.orders | order_posted_ts | stg_orders | Convert to UTC | Use source timezone metadata |
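As code, the mapping above might compile into a staging query like the following sketch. The legacy status codes, the source-timezone column, and the Postgres-style AT TIME ZONE syntax are assumptions; the real rules live in the mapping document.

```python
# Illustrative staging query implementing the source-to-target mapping above.
STG_ORDERS_SQL = """
SELECT
    CAST(order_total AS DECIMAL(18, 2)) AS gross_amount,  -- kept in source currency
    CASE UPPER(status)                   -- legacy codes below are hypothetical
        WHEN 'C1' THEN 'CONFIRMED'
        WHEN 'X9' THEN 'CANCELED'
        ELSE UPPER(status)
    END AS order_status,
    -- interpret in the source timezone, then convert to UTC (Postgres syntax)
    posted_at AT TIME ZONE source_tz AT TIME ZONE 'UTC' AS order_posted_ts
FROM erp.orders
"""
```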
2) Transformation specification (AE ↔ Analyst)
This defines business logic once, upstream, so dashboards don’t become the place where definitions diverge.
Model: mart.fct_orders
Purpose: Canonical order-level facts for revenue and fulfillment reporting
Grain: 1 row per order_id
Inclusions: posted orders only; exclude test customers
Revenue: gross_amount - discounts - refunds
Time logic: revenue_date = order_posted_date; refunds applied on refund_posted_date
Dimensions: customer_id, channel, region, product_category (from dim_product)
Quality checks: unique(order_id), not_null(customer_id), reconcile to finance within 0.5%
3) Metric definition document (AE ↔ Analyst ↔ Stakeholders)
A metric definition document is a shared contract. It should be short enough to maintain, but precise enough to prevent “same name, different math.”
| Metric | Definition | Formula (pseudo) | Filters/Exclusions | Grain & caveats |
|---|---|---|---|---|
| Net Revenue | Revenue after discounts and refunds, recognized on posted dates | sum(gross_amount - discount_amount) - sum(refund_amount) | Exclude test orders; exclude canceled before posting | Daily by posted date; refunds can shift revenue across days |
| Active Customers | Customers with ≥1 posted order in the last 30 days | count_distinct(customer_id where order_posted_date >= today-30) | Exclude internal accounts | Rolling window; sensitive to late-arriving orders |
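Kept upstream, rows like these can translate into canonical queries that every dashboard reuses. The sketch below assumes hypothetical mart.fct_orders and mart.fct_refunds models with flags for test orders and internal accounts; the interval syntax varies by warehouse dialect.

```python
# Net Revenue recognized on posted dates: order revenue lands on the order's
# posted date, refunds subtract on the refund's posted date (so revenue can
# shift across days, per the caveat above).
NET_REVENUE_DAILY_SQL = """
SELECT revenue_date, SUM(amount) AS net_revenue
FROM (
    SELECT order_posted_date AS revenue_date,
           gross_amount - discount_amount AS amount
    FROM mart.fct_orders
    WHERE NOT is_test_order
    UNION ALL
    SELECT refund_posted_date, -refund_amount
    FROM mart.fct_refunds
    WHERE NOT is_test_order
) AS revenue_events
GROUP BY revenue_date
"""

# Active Customers: distinct customers with >= 1 posted order in a rolling
# 30-day window; sensitive to late-arriving orders.
ACTIVE_CUSTOMERS_SQL = """
SELECT COUNT(DISTINCT customer_id) AS active_customers
FROM mart.fct_orders
WHERE order_posted_date >= CURRENT_DATE - INTERVAL '30' DAY
  AND NOT is_internal_account
"""
```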
Common failure modes (and how to prevent them)
Failure mode: Unclear ownership
Symptoms: pipeline breaks linger; stakeholders don’t know who to contact; analysts patch data issues in dashboards.
Prevention: define dataset owners and on-call expectations; attach an owner to each critical table/model; maintain a simple escalation path (DE for ingestion failures, AE for model logic, Analyst for dashboard behavior and requirements).
Failure mode: Duplicated logic in dashboards
Symptoms: multiple dashboards compute “Net Revenue” differently; changes require editing many reports; performance degrades due to heavy BI-tool calculations.
Prevention: push business logic upstream into curated models; restrict BI-tool calculations to presentation concerns (formatting, or simple ratios over already well-defined measures); create a canonical metric layer and require reuse.
Failure mode: Conflicting KPI definitions
Symptoms: finance and sales disagree; weekly business reviews focus on reconciling numbers; teams lose trust in BI.
Prevention: maintain metric definition documents with explicit inclusions/exclusions; require stakeholder sign-off for KPI changes; version definitions and communicate changes with examples (before/after for a known period).
Failure mode: Handoffs without acceptance criteria
Symptoms: AE delivers a model but analyst finds missing fields; repeated rework; “done” is subjective.
Prevention: add acceptance criteria to specs: required columns, expected totals for a validation period, freshness, and performance targets.
Responsibility map: a typical dashboard request from intake to deployment
| Stage | Primary role | Supporting roles | Key outputs | Quality gate |
|---|---|---|---|---|
| 1. Intake & scoping | Analyst | AE | Requirements brief; KPI list; wireframe; acceptance criteria | Stakeholder confirms definitions and intended decisions |
| 2. Data availability check | AE | DE, Analyst | Data gap list; proposed sources; initial mapping needs | Feasibility confirmed; owners assigned |
| 3. Ingestion/landing (if needed) | DE | AE | Ingestion pipeline; raw tables; runbook; freshness SLA | Pipeline monitored; boundary checks passing |
| 4. Modeling & transformations | AE | Analyst | Transformation spec; curated models; tests; documentation | Automated tests pass; reconciliation within tolerance |
| 5. Metric standardization | AE | Analyst, Stakeholders | Metric definition document; versioned KPI logic | Business sign-off on KPI definitions |
| 6. Dashboard build | Analyst | AE | Dashboard/report; filter behaviors; drill paths; user guide | Uses canonical models/metrics; minimal duplicated logic |
| 7. UAT & validation | Analyst | AE, DE | UAT checklist; validation queries; issue log | Acceptance criteria met; discrepancies resolved |
| 8. Deployment & monitoring | Analyst | DE, AE | Published dashboard; access controls; refresh schedule; alerting | Refresh succeeds; usage and data quality monitored |