Vendor integration
Elastic Cloud
BM25 relevance · faceted browse · ML anomaly detection on the decisions corpus
Status
● Live
production · real data flowing
Cluster — · — decisions indexed · BM25 relevance ranking on the reasoning corpus
Capability matrix
What we use, what's gated, what's planned
Sub-second BM25 search across the full-document decision corpus. When the operator asks why a ticker was skipped on a particular day, Elastic returns the relevant evaluation with its reasoning, confidence band, and macro context inline.
Coming next: Dense-vector and ELSER retrieval add semantic search for natural-language operator queries; faceted browse across confidence bands and VIX regimes follows.
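As a rough illustration of the BM25 lookup described above, the search body might look like the sketch below. The field names and the 2x boost on reasoning_text come from the sentinel-decisions index spec in the capability matrix; the helper name `build_decision_query` and the `best_fields` match type are assumptions, not the code in elastic_integration.py.

```python
from typing import Any


def build_decision_query(text: str, size: int = 10) -> dict[str, Any]:
    """Build a BM25 multi_match search body (hypothetical helper)."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                # reasoning_text carries a 2x boost, per the index spec.
                "fields": [
                    "reasoning_text^2",
                    "reasoning",
                    "narrative",
                    "symbol",
                    "strategy",
                ],
                # Assumption: score each document by its best-matching field.
                "type": "best_fields",
            }
        },
    }
```

The returned dict is a plain search request body, so it can be inspected or logged before it is sent to the cluster.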
| Capability | Status | Phase | Why / How / Note |
|---|---|---|---|
| Cluster reachable (v9.5.0) | LIVE | Phase 1.5 | Elastic Cloud cluster connected via ELASTIC_URL + ELASTIC_API_KEY env config; lazy client init on first call. |
| 3-tier search fallback (Elastic → atlas-search → regex) | LIVE | Phase 1.5 | /decisions never returns empty: ES BM25 first, Atlas Search Lucene second, MongoDB regex as last-resort demo mode. |
| _get_es_client gate reconciled | LIVE | Phase 1.5 | Single source-of-truth client accessor; env-driven config; graceful degrade when cluster unreachable. |
| sentinel-decisions index spec | CODE READY | Phase 1.5 | Explicit mapping: reasoning_text^2 + reasoning + narrative + symbol + strategy with English Lucene analyzer; facetable keyword fields for symbol/strategy/vix_regime/action. Index mapping defined; backfill script ready; cluster ingest pending operator action. |
| search_trade_patterns ADK tool | CODE READY | Phase 1.5 | Multi-clause query DSL on decision corpus, exposed to elastic_analyst agent. Code merged in elastic_integration.py; gates on backfill completion. |
| get_strategy_analytics + detect_decision_anomalies | CODE READY | Phase 1.5 | ADK tools for per-strategy aggregations and statistical outlier detection on confidence/outcome streams. Wired to elastic_analyst agent; awaits live corpus. |
| dense_vector + kNN search | UNDERUSED | Phase 2.5 | Native vector field type; pairs with F2 to fold embeddings out of MongoDB into ES for BM25 + vector RRF in one query. Hackathon stretch; bundles with ELSER below. |
| ELSER (Elastic Learned Sparse EncodeR) | UNDERUSED | Phase 1.5 (stretch) / Phase 2.5 | Semantic search without standing up an embedder pipeline: Elastic runs inference server-side at index and query time. 1d effort; killer hackathon-stretch demo line: 'semantic match without running a single embedding job'. |
| RRF reranking (BM25 + vector + ELSER fusion) | PLANNED | Phase 2.5 | Reciprocal-rank fusion across lexical + dense + sparse scores in a single query; Phase 2 marketplace differentiator. |
| Index Lifecycle Management (ILM) | PLANNED | Phase 2.5 | Hot / warm / cold / frozen tiering with auto-rollover; required past 50K decisions for cost flatness. Without it, storage cost grows steeply at multi-tenant scale. |
| ML anomaly detection | UNDERUSED | Phase 2.5 | Built-in unsupervised anomaly detection jobs on confidence-score streams; replaces hand-rolled drift detection (ENH-021 class). 3d effort; retires custom signal-integrity polling; cross-references sentinel-divergence cycle. |
| Painless scripts (custom relevance) | PLANNED | Phase 2.5 | Boost-by-recency + boost-by-confidence + custom function_score for domain-aware ranking. |
| Watcher (alerting on query results) | UNDERUSED | Phase 2.5 | Server-side scheduled queries that fire webhooks when conditions are met; could replace some Cloud Scheduler + custom notifier code for ES-data-driven alerts. |
| Connectors (native MongoDB sync) | UNDERUSED | Phase 2.5 | Managed change-stream-aware sync between MongoDB and ES; replaces hand-rolled elastic_backfill.py. 1d effort; retires elastic_backfill.py maintenance burden. |
| Document-level RBAC (multi-tenant, APP-012) | PLANNED | Phase 2.5 | Server-side enforcement of tenant_id filter at ES role layer; enforced before app code sees a hit. Pairs with the MongoDB-side tenant_id partitioning (APP-012) for defence-in-depth. |
| Async search | PLANNED | Phase 3 | Long-running query → query ID → poll for results. Useful for full-retention-window scans (500K+ docs). |
| ES\|QL (piped query language) | UNDERUSED | Phase 2.5 | SQL-like piped alternative to the query DSL; readability win for code review; consider for new query authoring. |
| Cross-Cluster Search (CCS) | PLANNED | Phase 4 | Multi-cluster federated search for the Phase 4 multi-tenant story: per-tenant clusters, unified operator view. |
| Search templates | UNDERUSED | Phase 2.5 | Server-side parameterised query templates; decouples client from DSL changes. |
| Runtime fields | UNDERUSED | Phase 2.5 | Compute fields at query time without reindex; flexibility win for derived projections like confidence_score_bucket. |
| Snapshots + repository backup | PLANNED | Phase 3 | Automated index snapshots to GCS; required for production DR; pairs with ILM frozen tier. |
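The 3-tier search fallback in the matrix can be sketched as an ordered list of search callables that is walked until one returns hits. This is a minimal illustration under stated assumptions: `search_with_fallback`, the `SearchFn` alias, and the `"none"` sentinel are hypothetical names, and the real tiers would wrap the Elastic, Atlas Search, and MongoDB regex calls.

```python
from typing import Callable, Optional

# Each tier takes the query text and returns a list of hits (or None).
SearchFn = Callable[[str], Optional[list[dict]]]


def search_with_fallback(
    query: str, tiers: list[tuple[str, SearchFn]]
) -> tuple[str, list[dict]]:
    """Return (source_name, hits) from the first tier that yields hits."""
    for name, search in tiers:
        try:
            hits = search(query)
        except Exception:
            # A failing tier (for example an unreachable cluster)
            # degrades to the next tier instead of raising.
            continue
        if hits:
            return name, hits
    # Assumption: signal "no tier answered" with an empty result.
    return "none", []
```

The returned source name is what a source badge such as 'elasticsearch' could be driven from, so the UI can show which tier actually answered.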
Live data
Real-time from the Elastic Cloud backend
Every tile below is a live read from the vendor backend via the FastAPI BFF. If a tile shows "—" the backend is unreachable or the metric is not yet wired (no hardcoded numbers — see anti-pattern #2).
Awaiting backend
—
Roadmap commitments
Roadmap dependencies
Capabilities enabled by this integration — what is built, what is gated, and why.
Multi-tenant data partitioning (Elastic document-level RBAC)
The Elastic-side tenant_id role filter is the second layer of defence after the MongoDB schema validator; document-level security is enforced server-side, not in app code
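A role with a document-level query is one way to express this filter. The sketch below only builds the role body; the helper name `tenant_role_descriptor`, the `read` privilege, and the use of a `term` filter on tenant_id are assumptions for illustration, and how the role is named and attached to API keys is left open.

```python
from typing import Any


def tenant_role_descriptor(
    tenant_id: str, index: str = "sentinel-decisions"
) -> dict[str, Any]:
    """Role body that limits reads to one tenant's documents (hypothetical helper)."""
    return {
        "indices": [
            {
                "names": [index],
                "privileges": ["read"],
                # Document-level security: the cluster applies this filter
                # to every search made under the role.
                "query": {"term": {"tenant_id": tenant_id}},
            }
        ]
    }
```

A body of this shape would be submitted through the Elasticsearch security role API, so the restriction lives in the cluster and not in application code.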
GDPR DSR cascade (Article 17 right-to-erasure)
Elastic is one of 5 backends in the cascade; per-tenant deletion via _delete_by_query on tenant_id filter, mirrored from the MongoDB delete
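The per-tenant deletion reduces to a single filter on tenant_id. A minimal sketch of the request body, assuming tenant_id is indexed as a keyword field; `tenant_erasure_query` is a hypothetical helper, not the cascade's actual code.

```python
from typing import Any


def tenant_erasure_query(tenant_id: str) -> dict[str, Any]:
    """Body for _delete_by_query that matches every document of one tenant."""
    if not tenant_id:
        # Refuse an empty tenant_id so a bad call cannot run with a blank filter.
        raise ValueError("tenant_id must be non-empty")
    return {"query": {"term": {"tenant_id": tenant_id}}}
```

With the official Python client, the `query` part would typically be passed to `delete_by_query` against the sentinel-decisions index once the MongoDB delete has succeeded.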
Elastic backfill operator-action (sentinel-decisions index ingest)
Run delivery/scripts/elastic_backfill.py to populate the sentinel-decisions index from MongoDB; flips capability matrix from CODE-READY to LIVE for search_trade_patterns + get_strategy_analytics + detect_decision_anomalies
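For orientation, a backfill of this kind usually turns MongoDB documents into bulk actions. The sketch below shows only that transformation, in the action format accepted by the Python client's `helpers.bulk`; it is not the contents of elastic_backfill.py, and `to_bulk_actions` is a hypothetical name.

```python
from typing import Any, Iterable, Iterator


def to_bulk_actions(
    docs: Iterable[dict[str, Any]], index: str = "sentinel-decisions"
) -> Iterator[dict[str, Any]]:
    """Yield one bulk 'index' action per MongoDB document."""
    for doc in docs:
        # Keep _id out of the document body; it becomes the ES document id.
        source = {k: v for k, v in doc.items() if k != "_id"}
        yield {
            "_op_type": "index",
            "_index": index,
            # Reusing the MongoDB _id makes reruns overwrite, not duplicate.
            "_id": str(doc["_id"]),
            "_source": source,
        }
```

Because the document id is derived from the MongoDB _id, rerunning the backfill after a partial failure should leave the index in the same state as a clean run.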
Demo flow
End-to-end showcase journey
Five steps a judge or investor can replay live. Each step links to the page that demonstrates it.
1. Open /decisions, type 'FOMC LOSS'. Source badge shows 'elasticsearch' with relevance scores: BM25 ranking, not regex. → Open /decisions
2. Click a result to drill into the decision detail: full reasoning text, confidence factors, outcome trail. → Open /decisions
3. Switch search to 'breakout failed'. The ranking changes: Elastic's English analyzer surfaces stemmed matches that the regex path would miss. → Open /decisions
4. (Phase 2.5) ML anomaly detection alerts fire on confidence-score streams, replacing hand-rolled drift detection.
5. (Phase 2.5) ELSER semantic search for 'decisions where the daemon hesitated', without running a single embedding job.
What's next
Top-3 vendor-enabled capabilities coming soon
Sourced from the vendor's playbook. Each entry is mapped to its delivery phase and the value it unlocks.
Elastic ELSER (sparse semantic encoder)
Phase 1.5 stretch (bundle 8)
Semantic search without standing up the embedder pipeline; killer hackathon-stretch demo
ML anomaly detection
Phase 2.5
Replaces hand-rolled drift detection on confidence-score streams (ENH-021 class)
Connectors (native MongoDB sync)
Phase 2.5
Retires elastic_backfill.py maintenance — change-stream-aware managed sync