Vendor integration
Google Gemini (ADK)
9 ADK agents · Flash + Flash Lite tiered · Batch + Context Cache stacked · 66% F1 evaluator cost cut
Status
● Live
production · real data flowing
0 agents · 0 tools registered · — tokens today · —% cache hit
Capability matrix
What we use, what's gated, what's planned
The reasoning foundation behind every SentinelHub agent. A two-tier model strategy (Flash for synthesis and orchestration, Flash Lite for narrow classification) powers the orchestrator and its 8 specialist agents; Batch + Context Cache together cut F1 evaluator cost by 66%.
Coming next: Per-tenant cost attribution dashboards land alongside the multi-tenant retrofit; prompt-injection guards and structured-output schemas harden the operator chat surface.
| Capability | Status | Phase | Why / How / Note |
|---|---|---|---|
| 9 ADK agents (orchestrator + 8 specialists) APP-009 | LIVE | Phase 1 | Multi-agent topology: the orchestrator routes natural-language intent to market_analyst, sentiment, risk, advisor, decision_logger, reflection, chat, and briefing. MODEL_FAST (gemini-flash-latest) handles synthesis; MODEL_LITE (gemini-3.1-flash-lite-preview) handles narrow classification. |
| Registered tools with strict per-agent allowlist | LIVE | Phase 1 | Tools are plain-Python callables; ADK derives JSON schemas from type hints + docstrings. Per-agent allowlists are enforced: only trade_executor can call submit_bracket_order, and only decision_logger writes to Mongo. |
| Two-tier model strategy (FAST + LITE) APP-009 | LIVE | Phase 1 | MODEL_FAST for orchestrator/advisor/chat/briefing; MODEL_LITE for sentiment classification, ticker scoring, and the evaluator. Module-level constants make a model upgrade a one-line edit. |
| OpenInference auto-instrumentation via Phoenix | LIVE | Phase 1.5.S (S3) | tracing.py wraps every Gemini + tool call; cached_tokens, input_tokens, output_tokens, and cost.total_usd land on every span. The Phoenix dashboard shows the entire 9-agent graph live. |
| Batch API path (50% discount) | LIVE | Phase 1.5.S (S3) | evaluators/runner.py: run_evaluation_batch() submits the F1 LLM-as-judge sweep as one Gemini batch job and falls back to serial if the SDK or model variant doesn't support batches. |
| Context Cache path (75% on cached input) | LIVE | Phase 1.5.S (S3) | decision_judge.py caches the system instruction + 6-criterion rubric once per run (TTL 1h). cached_tokens is visible in Phoenix spans; the cache stacks with batch for the 66% headline F1 cost cut. |
| text-embedding-004 for F2 vector pipeline | CODE READY | Phase 1.5 | 768-dim embeddings via text-embedding-004 power the Atlas Vector Search index over the reasoning corpus. Asymmetric task types (RETRIEVAL_DOCUMENT for indexing, RETRIEVAL_QUERY for lookups) are wired in. Embedder merged; gated on the Atlas M10 cluster upgrade for the vector index. |
| Phoenix-traceable cost + cache instrumentation | CODE READY | Phase 1.5.S (S3) | MODEL_PRICING + CACHED_INPUT_DISCOUNT (0.25) + BATCH_DISCOUNT (0.50) constants in api/routers/arize.py compute genai.cost.total_usd per span, and the cost dashboard derives from this. The cost constants are live; the /monitoring widget that aggregates them across the day is the next display layer. |
| tenant.id span attribute for SaaS billing APP-014 | PLANNED | ongoing platform release | Phoenix spans must carry tenant.id so per-tenant cost dashboards (Phoenix + BQ usage logs + Stripe tier) can join. This is the billing-evidence layer for Phase 2.5.6 unit economics and gates Phase 2.5.6 SaaS billing; tracing.py is the wire-in point. |
| Per-tenant LLM budget circuit breaker APP-014 | PLANNED | Phase 2.0 | Reactive enforcement of per-plan token caps using the tenant.id span attribute + tenant_metadata in Mongo. Halts a tenant's agent calls when the daily/monthly budget is exceeded. |
| Parallel tool use (function-call fan-out) | PLANNED | Phase 2.5 | Gemini supports parallel function calls in a single turn. The orchestrator currently routes sequentially; migrating the per-ticker sentiment + macro + news fan-out is expected to halve wall time, cutting orchestrator latency while staying deterministic for evals. |
| Live API (bidirectional voice streaming) | UNDERUSED | Phase 2.5 | WebSocket-based audio in / audio out. Replaces the current SSE chat sidebar with voice-driven SaaS interaction: the Phase 2.5 UX differentiator. |
| Vertex Agent Engine (managed deployment) | UNDERUSED | Phase 2.5 | Managed scaling + native Phoenix trace integration. Alternative to Cloud Run; worth evaluating for Phase 2 production posture. |
| ParallelAgent migration of sentiment fan-out | UNDERUSED | Phase 2.5 | ADK's explicit DAG primitive replaces natural-language routing on the deterministic legs (sentiment fan-out, F1 judge harness). Expected to halve orchestrator wall time while staying deterministic for evals. |
| Search grounding tool (Google Search built-in) | UNDERUSED | Phase 2.5 | Gemini-native real-time news + sentiment beyond our X/Reddit/RSS firehose, particularly for breaking events that social hasn't picked up yet. |
| Code execution tool (built-in Python sandbox) | UNDERUSED | Phase 2.5 | Ad-hoc compute without a custom tool. Useful for advisor what-ifs; not to be used on risk paths (we keep deterministic Python in our codebase for the audit trail). |
| 1M-token long-context for full decision-history reasoning | UNDERUSED | Phase 2.5 | Pass the entire decision history into a single judge call for cross-decision pattern detection; replaces the paginated reflection-agent rollups. |
| Thinking models / extended reasoning | UNDERUSED | Phase 3 | Higher-cost reasoning chains for divergence diagnosis, calibration interpretation, and monthly attribution writeups: anywhere the operator asks 'but why'. |
| Fine-tuning via Vertex AI | UNDERUSED | Phase 3 | Domain-specific Gemini for the strategy advisor, fine-tuned on our decision corpus + outcomes. Phase 3, and not before we have at least 1K labeled outcomes. |
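
The per-agent allowlist row above can be illustrated with a minimal sketch. This is plain Python with no ADK imports, and it is not the SentinelHub implementation: the `TOOLS` registry, the `ALLOWLIST` mapping, `call_tool`, and the `log_decision` tool are hypothetical names introduced for illustration. Only `trade_executor`, `decision_logger`, `sentiment`, and `submit_bracket_order` come from the table.

```python
from typing import Callable, Dict, FrozenSet


def submit_bracket_order(ticker: str, qty: int) -> str:
    """Stand-in for the real order tool; returns a receipt string."""
    return f"submitted {qty} {ticker}"


def log_decision(decision_id: str) -> str:
    """Hypothetical stand-in for the decision_logger write path."""
    return f"logged {decision_id}"


# Hypothetical registry: tool name -> plain-Python callable.
TOOLS: Dict[str, Callable[..., str]] = {
    "submit_bracket_order": submit_bracket_order,
    "log_decision": log_decision,
}

# Hypothetical allowlist: agent name -> tools it may call.
ALLOWLIST: Dict[str, FrozenSet[str]] = {
    "trade_executor": frozenset({"submit_bracket_order"}),
    "decision_logger": frozenset({"log_decision"}),
    "sentiment": frozenset(),  # read-only agent in this sketch
}


class ToolNotAllowed(PermissionError):
    """Raised when an agent calls a tool outside its allowlist."""


def call_tool(agent: str, tool: str, **kwargs: object) -> str:
    # Unknown agents get an empty allowlist, so they are denied by default.
    if tool not in ALLOWLIST.get(agent, frozenset()):
        raise ToolNotAllowed(f"{agent} may not call {tool}")
    return TOOLS[tool](**kwargs)
```

Denying by default keeps a newly added agent from inheriting write access until someone lists its tools explicitly.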
Live data
Source: phoenix-spans (loading) · As of 17:20:59
Real-time from the Google Gemini (ADK) backend
Every tile below is a live read from the vendor backend via the FastAPI BFF. If a tile shows "—" the backend is unreachable or the metric is not yet wired (no hardcoded numbers — see anti-pattern #2).
Tokens today
0
ADK agents active
0
Tools registered
0
Cache hit rate (%)
0
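The tile rule described above (show a dash when the metric is missing, never a hardcoded number) could be sketched as follows. `format_tile` is a hypothetical helper, not a function from the BFF; the only assumption is that an unreachable backend or unwired metric arrives as `None`.

```python
from typing import Optional

PLACEHOLDER = "\u2014"  # the dash the tiles show when no value is available


def format_tile(value: Optional[float], suffix: str = "") -> str:
    """Render a live metric, or the placeholder when the value is missing."""
    if value is None:
        return PLACEHOLDER
    # 'g' formatting drops trailing zeros, so 0.0 renders as "0".
    return f"{value:g}{suffix}"
```

Under this sketch a genuine zero and a missing value stay distinguishable: `format_tile(0)` yields `0`, while `format_tile(None)` yields the dash.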
Roadmap commitments
Roadmap dependencies
Capabilities enabled by this integration — what is built, what is gated, and why.
Gemini model strategy — Flash + Flash Lite tiering
Two-tier model selection (FAST for synthesis/chat, LITE for narrow classification) is the cost-discipline doctrine for the 9-agent topology. Module-level constants make upgrades atomic.
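A minimal sketch of what module-level tiering can look like, assuming a simple agent-to-model mapping. The model strings and the FAST/LITE assignments are the ones listed in the capability matrix; `AGENT_MODELS` and `model_for` are hypothetical names, and agents whose tier the matrix does not state are deliberately left out rather than guessed.

```python
# Model identifiers as listed in the capability matrix.
MODEL_FAST = "gemini-flash-latest"
MODEL_LITE = "gemini-3.1-flash-lite-preview"

# Hypothetical mapping; only assignments stated in the matrix are included.
AGENT_MODELS = {
    "orchestrator": MODEL_FAST,
    "advisor": MODEL_FAST,
    "chat": MODEL_FAST,
    "briefing": MODEL_FAST,
    "sentiment": MODEL_LITE,
}


def model_for(agent: str) -> str:
    """Return the model tier for an agent, failing loudly if unmapped."""
    try:
        return AGENT_MODELS[agent]
    except KeyError:
        raise KeyError(f"no model tier configured for agent {agent!r}") from None
```

Because every agent resolves its model through the two constants, swapping a model version touches one line and applies to the whole tier at once.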
Per-tenant LLM budget circuit breaker
Phoenix tenant.id span attribute is the billing-evidence layer; per-plan token caps enforced reactively from tenant_metadata in Mongo.
tracing.py wires tenant.id resource attribute on every span
Gates Phase 2.5.6 SaaS billing — without tenant.id on every Gemini span, per-tenant unit economics is unauditable.
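The circuit breaker is still planned, so the following is only a sketch of the reactive pattern under stated assumptions: usage is recorded per `tenant.id` after each call (for example from span token counts), and the next call is refused once the daily cap is reached. `BudgetBreaker`, `BudgetExceeded`, and the in-memory dictionaries are hypothetical; the real design reads caps from tenant_metadata in Mongo.

```python
from dataclasses import dataclass, field
from typing import Dict


class BudgetExceeded(RuntimeError):
    """Raised when a tenant has used up its daily token budget."""


@dataclass
class BudgetBreaker:
    # Hypothetical in-memory stand-in for per-plan caps in tenant_metadata.
    daily_caps: Dict[str, int]
    used_today: Dict[str, int] = field(default_factory=dict)

    def record_usage(self, tenant_id: str, tokens: int) -> None:
        """Add tokens consumed by a finished call to the tenant's tally."""
        self.used_today[tenant_id] = self.used_today.get(tenant_id, 0) + tokens

    def check(self, tenant_id: str) -> None:
        """Call before each agent invocation; raises once the cap is hit."""
        cap = self.daily_caps[tenant_id]
        if self.used_today.get(tenant_id, 0) >= cap:
            raise BudgetExceeded(f"tenant {tenant_id} reached {cap} tokens")
```

Enforcement here is reactive: the call that crosses the cap still completes, and only subsequent calls are halted.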
Demo flow
End-to-end showcase journey
Five steps a judge or investor can replay live. Each step links to the page that demonstrates it.
1. POST /judge/run?mode=serial, then watch the span land in Phoenix with `genai.usage.input_tokens` + `genai.cost.total_usd` populated. This is the baseline cost. → Open /monitoring
2. POST /judge/run?mode=batch. The batch span shows `genai.usage.cached_tokens` + `evaluator.batch.size` populated, with the 50% batch discount applied on top of the 75% cache discount. → Open /monitoring
3. Open /monitoring. The cost-burn widget shows daily spend split by model (FAST vs LITE), derived from the MODEL_PRICING constants in api/routers/arize.py. → Open /monitoring
4. (Phase 2.5) Per-tenant cost dashboard composed of Phoenix tenant.id spans + BQ usage logs + Stripe subscription tier: a single source of truth for SaaS unit economics.
5. (Phase 2.5) Live API voice-driven chat sidebar: bidirectional audio replaces the current SSE chat. The voice-UX differentiator for the SaaS pitch.
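The difference between the serial and batch spans in steps 1 and 2 comes down to how the two discounts stack. The sketch below is an illustration, not the code in api/routers/arize.py: the constant names and values are from the capability matrix, but treating them as price multipliers, assuming `input_tokens` includes `cached_tokens`, and the per-million-token prices passed in are all assumptions.

```python
CACHED_INPUT_DISCOUNT = 0.25  # cached input billed at 25% (the 75% discount)
BATCH_DISCOUNT = 0.50         # batch jobs billed at 50%


def span_cost_usd(
    input_tokens: int,
    cached_tokens: int,
    output_tokens: int,
    price_in_per_m: float,
    price_out_per_m: float,
    batch: bool = False,
) -> float:
    """Cost of one span; prices are USD per million tokens (hypothetical)."""
    # Assumption: input_tokens is the total, of which cached_tokens were cached.
    uncached = input_tokens - cached_tokens
    cost = (
        uncached * price_in_per_m
        + cached_tokens * price_in_per_m * CACHED_INPUT_DISCOUNT
        + output_tokens * price_out_per_m
    ) / 1_000_000
    # The batch multiplier applies on top of the cache discount.
    return cost * BATCH_DISCOUNT if batch else cost
```

The overall saving depends on what share of each request is cached: a fully cached input in batch mode costs 0.25 × 0.50 = 12.5% of the serial price for those tokens, while uncached input and output tokens only see the 50% batch multiplier.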
What's next
Top-3 vendor-enabled capabilities coming soon
Sourced from the vendor's playbook. Each entry is mapped to its delivery phase and the value it unlocks.
Live API (voice-driven chat)
Phase 2.5
Voice UX differentiator vs single-LLM chatbot competitors; replaces SSE with bidirectional audio.
ParallelAgent migration of sentiment fan-out
Phase 2.5
Halves orchestrator wall time on independent fetches; deterministic for evals.
Vertex Agent Engine (managed deployment)
Phase 2.5
Alternative to Cloud Run with native Phoenix trace integration + managed scaling.
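The wall-time argument behind the sentiment fan-out migration can be illustrated with plain `asyncio`. This sketch does not use ADK's ParallelAgent; it only shows the underlying idea that independent fetches can run concurrently. `fetch_signal`, `fan_out`, and the simulated delay are hypothetical stand-ins.

```python
import asyncio
from typing import Dict

SOURCES = ["sentiment", "macro", "news"]


async def fetch_signal(source: str, ticker: str) -> str:
    """Hypothetical fetch; the sleep stands in for a network or model call."""
    await asyncio.sleep(0.01)
    return f"{source}:{ticker}"


async def fan_out(ticker: str) -> Dict[str, str]:
    # gather runs the independent fetches concurrently and returns results
    # in the order the awaitables were passed, which keeps output stable.
    results = await asyncio.gather(*(fetch_signal(s, ticker) for s in SOURCES))
    return dict(zip(SOURCES, results))


if __name__ == "__main__":
    print(asyncio.run(fan_out("AAPL")))
```

With a fixed source list and ordered results, the same inputs produce the same output shape on every run, which is the property the evals rely on.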