When to build vs buy your data pipeline in 2026
The honest decision matrix for whether to build your data pipeline on managed services, an open-source stack, or a vendor like Fivetran. Real numbers, not vendor pitch decks.
The decision is more nuanced than vendors admit
Every quarter a new managed-service vendor pitches us on why we should be running our clients' data pipelines on their platform. Every quarter we see open-source-purist consulting shops pitch the opposite: "build it on Airflow, you'll thank us later". Both are wrong as generalisations. The right answer depends on three variables that vendors rarely ask about during demos.
The three variables that actually matter
Variable one: source diversity. How many different SaaS systems and operational databases does the data come from? Each new source costs roughly the same connector-engineering effort, so the burden grows in step with the source count. Five sources is manageable in-house. Twenty-five sources is not; that is when the per-connector cost of a managed vendor like Fivetran or Airbyte Cloud starts paying for itself.
Variable two: transformation complexity. Is the data flowing through unchanged, just landing in your warehouse for analysts to query? Or is there real transformation logic — entity resolution, business-rule application, multi-stage modelling? Pure ELT favours managed connectors. Heavy transformation favours an open-source orchestration layer where you have full control.
Variable three: data sensitivity. Are you moving customer PII, financial transactions, or regulated health data? Some managed vendors are SOC 2 compliant and offer HIPAA BAAs. Others do not. Sensitivity narrows the vendor field dramatically.
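As a rough illustration, the three variables can be turned into a checklist. The sketch below is ours, not a formal scoring model: `PipelineProfile` and `assess` are hypothetical names, the 5 and 25 source thresholds come from the text above, and the "grey zone" in between is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass
class PipelineProfile:
    """Hypothetical container for the three variables discussed above."""
    num_sources: int            # distinct SaaS systems and operational databases
    heavy_transformation: bool  # entity resolution, business rules, multi-stage models
    sensitive_data: bool        # PII, financial transactions, regulated health data


def assess(profile: PipelineProfile) -> list[str]:
    """Return one note per variable. This is a checklist, not a verdict."""
    notes = []

    # Variable one: 5 and 25 are the figures from the text; the middle band
    # is an assumption of this sketch.
    if profile.num_sources <= 5:
        notes.append("sources: in-house connectors are manageable")
    elif profile.num_sources >= 25:
        notes.append("sources: a managed connector vendor likely pays for itself")
    else:
        notes.append("sources: grey zone, price both options")

    # Variable two: pass-through loading vs real transformation logic.
    if profile.heavy_transformation:
        notes.append("transformation: favours an open-source orchestration layer")
    else:
        notes.append("transformation: favours managed connectors")

    # Variable three: sensitivity narrows the vendor field.
    if profile.sensitive_data:
        notes.append("sensitivity: shortlist only vendors with the attestations you need")
    else:
        notes.append("sensitivity: no extra constraint on the vendor field")

    return notes


if __name__ == "__main__":
    for note in assess(PipelineProfile(num_sources=12, heavy_transformation=True, sensitive_data=False)):
        print(note)
```

The function deliberately returns notes rather than a single answer, because the three variables can pull in different directions and the trade-off still needs a human decision.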
The honest cost comparison
Here is what we usually see when we audit a client's pipeline costs. Take a typical mid-stage SaaS company with 5-10 sources and 100k-1M rows per source per day:
Fully managed (Fivetran + dbt Cloud + Snowflake):
- Fivetran: $2,000-$8,000/month depending on row volume
- dbt Cloud: $100-$500/month depending on seats
- Snowflake compute: $1,500-$5,000/month
- Engineering time to maintain: 5-10 hours/month
- Total: $4,000-$14,000/month + ~$2,000/month engineering
Open-source self-hosted (Airbyte OSS + dbt Core + Postgres or DuckDB):
- Airbyte OSS infra: $200-$800/month (your VMs/Kubernetes)
- dbt Core: $0
- Storage + compute: $500-$2,000/month (Postgres + workers)
- Engineering time: 30-50 hours/month (real number: connector breakage, infra babysitting, on-call)
- Total: $700-$2,800/month + ~$10,000/month engineering
Hybrid (Airbyte Cloud + dbt Cloud + lighter warehouse):
- Often the sweet spot at $2,000-$5,000/month with 10-15 hours/month of engineering
The "cheaper" open-source option is only cheaper if your engineering time is cheaper than $50/hour. For most US-based engineering teams, it is not. For offshore teams or for clients with senior data engineers in-house already, it can be.
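To make the break-even arithmetic explicit, here is a minimal sketch. The function names are hypothetical, and the inputs are simply the midpoints of the ranges quoted above, used for illustration only. The threshold shifts a lot depending on which end of each range applies to you, so run it with your own invoices and hours rather than relying on any single rule-of-thumb figure.

```python
def midpoint(low: float, high: float) -> float:
    """Midpoint of a quoted cost or hours range."""
    return (low + high) / 2


def monthly_total(infra: float, hours: float, hourly_rate: float) -> float:
    """Vendor/infra spend plus engineering time priced at hourly_rate."""
    return infra + hours * hourly_rate


def break_even_rate(managed_infra: float, managed_hours: float,
                    oss_infra: float, oss_hours: float) -> float:
    """Hourly rate at which both options cost the same per month.

    Below this rate the self-hosted stack is cheaper; above it the
    managed stack is cheaper.
    """
    extra_hours = oss_hours - managed_hours
    if extra_hours <= 0:
        raise ValueError("self-hosting is assumed to need more engineering hours")
    return (managed_infra - oss_infra) / extra_hours


# Illustrative inputs: midpoints of the ranges in the comparison above.
MANAGED_INFRA = midpoint(2000, 8000) + midpoint(100, 500) + midpoint(1500, 5000)
MANAGED_HOURS = midpoint(5, 10)
OSS_INFRA = midpoint(200, 800) + 0 + midpoint(500, 2000)  # dbt Core is $0
OSS_HOURS = midpoint(30, 50)

rate = break_even_rate(MANAGED_INFRA, MANAGED_HOURS, OSS_INFRA, OSS_HOURS)

if __name__ == "__main__":
    print(f"break-even hourly rate: ${rate:,.2f}")
    for hourly in (50, 150, 250):
        managed = monthly_total(MANAGED_INFRA, MANAGED_HOURS, hourly)
        oss = monthly_total(OSS_INFRA, OSS_HOURS, hourly)
        print(f"${hourly}/h  managed=${managed:,.0f}  self-hosted=${oss:,.0f}")
```

The point of the sketch is the shape of the calculation: the vendor fees you avoid, divided by the extra engineering hours you take on, gives the hourly rate at which the two options tie.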
The factor most people miss
Switching costs, in both directions. Migrating from managed-vendor to open-source is hard but doable in a quarter. Migrating from open-source-Airflow-everything to managed is psychologically painful — you are essentially admitting you over-built. Teams resist this even when the economics are obvious.
Our recommendation: start managed, and migrate to open-source only when the vendor bill exceeds the engineering cost of building and running the open-source equivalent. The time you spend self-hosting Airflow in year one is time you are not spending on the actual business problem.
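One way to sanity-check a proposed migration is a simple payback calculation. This is our own framing of the rule above, not a standard formula: `payback_months` is a hypothetical helper, the figures in the usage example are invented for illustration, and the self-hosted monthly cost is assumed to include engineering time as well as infrastructure.

```python
import math


def payback_months(migration_cost: float,
                   managed_monthly: float,
                   self_hosted_monthly: float) -> float:
    """Months until a one-off migration cost is recovered by monthly savings.

    Returns math.inf when self-hosting does not save anything per month,
    i.e. the migration never pays for itself.
    """
    monthly_saving = managed_monthly - self_hosted_monthly
    if monthly_saving <= 0:
        return math.inf
    return migration_cost / monthly_saving


if __name__ == "__main__":
    # Hypothetical figures: a $60,000 migration, a $12,000/month managed bill,
    # and $9,000/month all-in to self-host (infra plus engineering time).
    print(payback_months(60_000, 12_000, 9_000))
```

If the payback period is longer than you expect the current architecture to last, staying managed is probably the cheaper choice.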
What we build for clients
For most SMB and mid-market clients we build on Airbyte Cloud + dbt Cloud + a Postgres or BigQuery warehouse. For larger clients with sensitive data or unusual sources, we go to a custom open-source stack on Kubernetes. For clients in the $50M+ ARR range with serious data engineering benches, we usually inherit whatever they have already built and improve at the edges.
The decision is rarely about the technology. It is about the cost-and-staffing reality of the team that has to maintain the pipeline two years from now.