Data Strategy

Building the Mid-Market Finance Data Stack

You don't need a BI team and a Snowflake warehouse to get your data in order. A practical guide to assembling the right data infrastructure for a 100–500 person company.

Mei-Lin Zhao CEO & Co-Founder, InsytVex · March 26, 2026 · 8 min read

Architecture diagram of a mid-market finance data stack connecting ERP, CRM and HRIS to analytics layer

The conventional wisdom in data infrastructure is that you need a cloud data warehouse (Snowflake, BigQuery, or Redshift), a modern ETL pipeline (Fivetran, Airbyte, or something custom-built), a transformation layer (dbt), and a BI tool (Looker, Tableau, or Power BI) sitting on top. That stack is genuinely powerful. It's also sized for an engineering organization with dedicated data engineering capacity, and it costs — in both software licensing and internal labor — more than most mid-market finance teams can justify for their specific use case.

The good news is that the finance data problem is more constrained than the general analytics data problem. You're not trying to build a company-wide data platform that serves 15 teams with different data needs. You're trying to connect three to five source systems — an ERP, a CRM, an HRIS, and possibly a billing platform — into a coherent model that your FP&A team can use for planning and reporting. That's a narrower problem, and it has solutions that don't require a dedicated data engineer.

What the Finance Data Stack Actually Needs to Do

Before choosing tools, it's useful to be precise about what the finance data stack needs to accomplish. There are four functional requirements:

Data extraction: Pulling actuals from source systems — GL trial balance and AP/AR aging from the ERP, weighted pipeline and bookings history from the CRM, headcount roster and comp data from the HRIS — on a reliable schedule with minimal manual intervention.

Data normalization: Reconciling differences in how each system represents the same business concept. Your ERP uses a 6-digit account code. Your CRM uses product line codes that don't map cleanly to the chart of accounts. Your HRIS has department names that don't match the cost center structure in the ERP. Someone has to maintain the mapping layer that translates between these representations.

Model-ready structure: Transforming the normalized data into the dimensional structure the financial model expects. For most mid-market FP&A models, this means period-over-period actuals by department and account, headcount roster with effective dates, and ARR movements by customer and cohort.

Refresh cadence: Keeping the model current as actuals accumulate during the period, without requiring a manual export process each time.

Each of these requirements has multiple solutions along a spectrum from manual to fully automated, and the right choice depends on your team's technical capacity and the complexity of your source systems.

The Manual Tier: Structured CSV Exports

The lowest-tech tier that still qualifies as a "data stack" is a standardized manual export process with rigid schema discipline. Every modern ERP, CRM, and HRIS can produce CSV exports in a consistent format. The key is defining that format precisely — specifying exact column names, date formats, account code conventions, and null value handling — and enforcing it as a procedure rather than letting it vary run-to-run.

A well-disciplined CSV export process with a matching import template in your planning tool can support a monthly or quarterly refresh cadence with 2–4 hours of manual work per refresh. For a company at the smaller end of mid-market (under 150 employees, simple entity structure, one ERP) this is often the right starting point — the cost-to-benefit of a more automated stack isn't justified until the assembly process is consuming meaningful finance team time.

The limitation is that CSV exports are point-in-time snapshots. Between monthly exports, the model drifts from reality as new transactions post, headcount changes, and pipeline updates. For quarterly planning cycles, this is usually acceptable. For teams that want to run rolling forecasts that update weekly or need current data before a board meeting with short notice, the manual tier breaks down.

The Mid-Tier: API Connectors and No-Code Integration Tools

Most major source systems for mid-market finance teams — NetSuite, Salesforce, HubSpot, Workday, BambooHR, QuickBooks — have REST APIs or native connectors available through no-code integration platforms. Tools like Zapier, Make (formerly Integromat), and dedicated finance-focused connectors can automate the data pull without requiring anyone to write ETL code.

The practical architecture at this tier looks like: a scheduled connector that pulls defined data from each source system on a configurable cadence (daily, weekly, or on-demand), writes it to a Google Sheet or a staging area, and feeds a planning tool that's configured to ingest from that staging area. The normalization and mapping logic lives in the connector configuration and the staging sheet formulas, rather than in SQL or dbt models.

This tier can support a weekly refresh cadence and eliminates most of the manual export labor. The tradeoff is maintenance burden: connector configurations break when source systems update their APIs, mapping logic needs to be updated when the chart of accounts or department structure changes, and there's no version control or lineage tracking on the transformation logic the way there would be in a dbt model.

For a team of 3–5 finance professionals at a 150–400 person company without dedicated data engineering support, this tier typically provides the best cost-to-value ratio. The setup investment is 20–40 hours; the ongoing maintenance is 2–4 hours per month once the connectors are stable.

When You Actually Need a Data Warehouse

A Snowflake or BigQuery-based data warehouse starts to make sense when your finance data requirements create complexity that outgrows the mid-tier approach. The specific triggers are:

Multi-entity consolidation: If your company has multiple legal entities, multiple currencies, or inter-company eliminations, the transformation logic becomes complex enough that maintaining it in spreadsheet formulas or connector configurations is error-prone. A proper transformation layer with tested dbt models and documented lineage becomes worth the engineering investment.

High transaction volume in the source systems: If your CRM has 50,000 opportunities and your ERP has 2 million GL transactions per year, the API-based connector approach can run into rate limits, timeout issues, and performance problems that require a more engineered pipeline solution.

More than 6–8 source systems: Each additional source system multiplies the mapping maintenance burden. At a certain point, the informal connector-and-spreadsheet approach becomes harder to maintain than a properly engineered pipeline with centralized schema management.

We're not saying mid-market companies shouldn't build data warehouses. For companies that have grown to 400+ employees with complex entity structures and data needs that span multiple teams, the investment makes sense. The point is that for a pure FP&A use case at a 100–300 person company with a standard 3–4 source system setup, a full warehouse build is over-engineering the problem.

The Dimensional Modeling Requirements for Finance

Regardless of which tier you're building, the output data structure that finance models need has specific dimensional requirements. Understanding these upfront saves significant rework.

Financial actuals need to be structured as a fact table with at minimum these dimensions: account (GL account code), period (year-month), department/cost center, entity, and amount. Without the department dimension, cost allocation to the three-statement model requires manual splits. Without the entity dimension, consolidation requires a separate process.

Headcount data needs to be structured as a slowly changing dimension — a roster with effective dates, not a point-in-time snapshot. When an employee changes role or comp band, you need to know both their old and new values, with effective dates, to correctly calculate compensation expense across the transition period. HRIS exports that only show current state are problematic for retroactive period analysis and for mid-period model updates.

CRM pipeline data needs opportunity-level granularity with stage, amount, expected close date, and — if your CRM tracks it — product line or segment tags. Aggregated pipeline summaries (total weighted pipeline by quarter) are not sufficient for the cohort-level churn and expansion analysis described in the revenue modeling framework.

The Practical Recommendation for Most Teams

For a mid-market FP&A team of 2–5 people at a 100–400 person company without dedicated data engineering support, the practical path to a functional finance data stack is:

Start with structured CSV exports to validate your mapping logic and confirm source data quality before investing in automation
Build API connectors for the 2–3 source systems with the highest refresh frequency requirements (typically CRM for pipeline and ERP for actuals)
Maintain manual exports for lower-frequency sources (HRIS headcount roster, billing platform) until the connector investment is clearly justified
Use a finance-specific planning tool as the modeling layer — one that understands the dimensional structure of financial data natively, rather than building that structure yourself in a general-purpose BI tool

The goal at this stage isn't a perfect data architecture. It's a data architecture that eliminates the quarterly assembly marathon and lets your team spend their time on analysis rather than data plumbing. Every hour recovered from manual exports is an hour available for the forward-looking work that FP&A is actually for.

As source system complexity grows and the team's analytical ambitions expand, you'll add layers — more connectors, eventually a warehouse, eventually a transformation layer. But that evolution should be demand-driven, not supply-side over-engineering based on what a data team would build.