Connector activation pending
The Fivetran MongoDB → BigQuery connector is not yet provisioned in this environment. All live tiles below render — until the Fivetran connector is provisioned. No hardcoded numbers are shown — the tiles populate from real data as soon as the connector goes live.
Once activated, sync metrics, schema progress, and freshness appear here automatically.
Vendor integration
Fivetran + BigQuery
Managed Mongo→BQ sync · analytical warehouse for the trade-decision corpus
Status
● Live
production · real data flowing
— · — rows in BigQuery · last sync: — · BQML: —
Capability matrix
What we use, what's gated, what's planned
Fully managed replication of the live MongoDB corpus into BigQuery on a six-hour cadence — the live trade decisions, sentiment summaries, and agent memory landing in the warehouse with full Fivetran lineage. The operator gets SQL-grade analytics over the trading record without writing or maintaining a single line of ETL.
Coming nextScheduled queries materialise the daily win-rate dashboards and per-strategy attribution cuts directly in BigQuery; dbt models layer in for governed transforms.
| Capability | Status | Phase | Why / How / Note |
|---|---|---|---|
BigQuery SELECT/WITH safety whitelist | LIVE | Phase 1.5 | bigquery_reader.py rejects any non-read statement; no DML, no DDL, ever, from agent or web surface. ✓ 26 tests |
100MB scan cap per query | LIVE | Phase 1.5 | dryRun gate aborts queries above 100MB before execution — prevents accidental $5/TB blowouts. |
/data-pipeline operator surface | LIVE | Phase 1.5 | F4 page: connector inventory + curated SQL examples + safe query box. Live React surface today. |
Fivetran REST client wrapper | LIVE | Phase 1.5 | 5 ADK tools on the data_pipeline_manager agent (list/inspect/sync/schema/log). LIVE 2026-05-18 — System Key authenticates; list_connectors round-trip verified. |
Local Fivetran MCP shim | LIVE | Phase 1.5 | fivetran_mcp.py exposes read-only Fivetran introspection over MCP — no public npm package as of 2026-05-08. LIVE 2026-05-18 — McpToolset mounted on data_pipeline_manager; agent.py:432 spawns the local stdio shim. |
MongoDB → BigQuery managed replication | LIVE | Phase 1.5.X | Fivetran MongoDB connector replicates 9 collections to mongo_sentinelhub BigQuery dataset. LIVE 2026-05-18 — chosen_stung connector synced 4,801 rows initial load (trade_decisions 4,372 · agent_memory 162 · social_sentiment_raw 126 · market_snapshots 96 · sentiment_summaries 31 · others). |
Schema evolution auto-handling | LIVE | Phase 1.5.X | Fivetran auto-evolves the BQ schema when Mongo collection shape changes — no hand-rolled migration scripts. LIVE 2026-05-18 — Fivetran auto-created the 9-table schema in mongo_sentinelhub on first sync. |
BQ ML logistic regression on outcomes | PLANNED | Phase 1.5.S (S5) | In-warehouse model on trade_decisions JOIN trade_outcomes — replaces Python calibration job for Phase 1. Gated on connector activation; one-day spike once data lands in BQ. |
Connector groups + authorized views per tenant APP-012 | PLANNED | Phase 2.5 | One Fivetran group per tenant → one BQ dataset per tenant → row-level security inherited. Multi-tenant primitive. Lowest-effort multi-tenant warehouse pattern; required before tenant #2. |
Materialized views on /insights aggregates | PLANNED | Phase 2.5 | Pre-aggregate rolling win-rate + per-strategy P&L. BQ auto-refreshes; sub-second tenant dashboards at zero query cost. |
dbt Cloud trigger on sync-complete | PLANNED | Phase 2.5 | Fivetran sync-complete webhook fires the dbt project — retires Cloud Scheduler for analytical aggregations. |
Time-travel `FOR SYSTEM_TIME AS OF` | PLANNED | Phase 3 | Point-in-time backtests + FCA Phase-3 audit lineage; BQ keeps 7-day history natively, extendable to 30d. FCA evidence requirement — 'what did we know on date X'. |
Column blocking + hashing for PII compliance APP-015 | PLANNED | Phase 2.5 | Per-column directives at the Fivetran layer (block / hash / encrypt) — required for tenant emails landing in BQ. |
BigQuery Vector Search (Atlas alternative) | PLANNED | Phase 3 | VECTOR_SEARCH() GA from 2024 — keep embeddings + decisions in one warehouse, no Mongo round-trip. Strategic alternative to Atlas Vector Search; would decommission Atlas index in trade for warehouse consolidation. |
Custom Connector SDK (broker statement PDFs) | UNDERUSED | Phase 2 | First-party SentinelHub→Alpaca connector lands fills + positions directly in BQ — no Mongo stop-over. Hackathon $10K stretch track explicitly rewards novel connector use. |
BigLake external tables | UNDERUSED | Phase 3 | Federate raw bar archives in GCS as Parquet — BQ queries them in place, no load cost. |
BI Engine in-memory acceleration | UNDERUSED | Phase 2.5 | Sub-second response on repeated dashboard queries; free up to 1GB reservation. |
Capacity-based pricing decision | UNDERUSED | Phase 3 | On-demand $5/TB is fine pre-revenue; capacity slots cheaper past ~10TB/month. Cost-optimisation thread; not a code change. |
Live data
Source: fivetranApi.capabilities (loading) · As of 17:20:59
Real-time from the Fivetran + BigQuery backend
Every tile below is a live read from the vendor backend via the FastAPI BFF. If a tile shows "—" the backend is unreachable or the metric is not yet wired (no hardcoded numbers — see anti-pattern #2).
Connectors (active/total)
—
Rows in trading_warehouse
—
Worst sync age
—
BQ ML model status
—
Roadmap commitments
Roadmap dependencies
Capabilities enabled by this integration — what is built, what is gated, and why.
Multi-tenant data partitioning — BQ shared dataset, tenant_id partitioned + clustered, authorized views
Connector groups per tenant + authorized views inherit row-level security; cheaper than per-tenant datasets at small scale.
GDPR DSR cascade (Article 17 right-to-erasure) — BQ is one of 5 backends
Fivetran column blocking + BQ tenant_id-filtered DELETE on DSR; cascades from operator console.
bigquery_reader.py — make tenant_id parameter required, fail-closed
Today the reader is single-tenant implicit; Phase 2 retrofit makes tenant_id a required argument so a missing scope rejects rather than reads global.
Demo flow
End-to-end showcase journey
Five steps a judge or investor can replay live. Each step links to the page that demonstrates it.
- 1
Open /data-pipeline. Connector list is empty — surface clearly states 'Fivetran not configured' and points to operator next-action 1.5.X.3 (activate trial connector). → Open /data-pipeline
- 2
Type into the BigQuery query box: SELECT COUNT(*) FROM trading_warehouse.trade_decisions. Server-side allowlist accepts (SELECT-only); 100MB dry-run gate confirms scan size below cap. → Open /data-pipeline
- 3
(Phase 1.5.S S5) Run BQ ML logistic regression on trade_decisions JOIN trade_outcomes — in-warehouse model trains in SQL, replaces Python calibration job.
- 4
(Phase 2.5) Materialized view on /insights aggregates auto-refreshed by BQ — sub-second tenant dashboards at zero query cost.
- 5
(Phase 3) Time-travel point-in-time backtest via SELECT * FROM trade_decisions FOR SYSTEM_TIME AS OF '2026-04-01' — FCA-grade audit lineage without snapshot tables.
What's next
Top-3 vendor-enabled capabilities coming soon
Sourced from the vendor's playbook. Each entry is mapped to its delivery phase and the value it unlocks.
BQ ML logistic regression on win/loss outcomes
Phase 1.5.S (S5)
Replaces Python calibration job; live BQ ML model is a strong hackathon demo artefact (gated on connector activation).
Connector groups + authorized views per tenant
Phase 2.5
Multi-tenant infrastructure primitive; lowest-effort path to per-tenant warehouse isolation before tenant #2.
dbt Cloud trigger on sync-complete
Phase 2.5
Retires Cloud Scheduler for analytical aggregations; rolling-win-rate + regime-attribution as dbt models.