Vendor integration

Google Gemini (ADK)

9 ADK agents · Flash + Flash Lite tiered · Batch + Context Cache stacked · 66% F1 evaluator cost cut

Status

● Live

production · real data flowing



Capability matrix

What we use, what's gated, what's planned

The reasoning foundations behind every SentinelHub agent. A two-tier model strategy (Flash for synthesis and orchestration, Flash Lite for narrow classification) powers an orchestrator and 8 specialist agents along with their registered tools; Batch + Context Cache cut F1 evaluator cost by 66%.

Coming next: per-tenant cost attribution dashboards land alongside the multi-tenant retrofit, and prompt-injection guards and structured-output schemas harden the operator chat surface.

Capability	Status	Phase	Why / How / Note
9 ADK agents (orchestrator + 8 specialists) APP-009	LIVE	Phase 1	Multi-agent topology: orchestrator routes natural-language intent to market_analyst, sentiment, risk, advisor, decision_logger, reflection, chat, briefing. MODEL_FAST (gemini-flash-latest) for synthesis; MODEL_LITE (gemini-3.1-flash-lite-preview) for narrow classification.
Registered tools with strict per-agent allowlist	LIVE	Phase 1	Plain-Python tool callables; ADK derives JSON schemas from type hints + docstrings. Strict per-agent allowlists enforced — only trade_executor can submit_bracket_order, only decision_logger writes to Mongo.
Two-tier model strategy (FAST + LITE) APP-009	LIVE	Phase 1	MODEL_FAST for orchestrator/advisor/chat/briefing; MODEL_LITE for sentiment classification, ticker scoring, evaluator. Module-level constants make model upgrades a one-line edit.
OpenInference auto-instrumentation via Phoenix	LIVE	Phase 1.5.S (S3)	tracing.py wraps every Gemini + tool call; cached_tokens, input_tokens, output_tokens, cost.total_usd land on every span. Phoenix dashboard sees the entire 9-agent graph live.
Batch API path (50% discount)	LIVE	Phase 1.5.S (S3)	evaluators/runner.py: run_evaluation_batch() submits the F1 LLM-as-judge sweep as one Gemini batch job. Falls back to serial if the SDK or model variant doesn't support batches.
Context Cache path (75% on cached input)	LIVE	Phase 1.5.S (S3)	decision_judge.py caches the system instruction + 6-criterion rubric once per run (TTL 1h). cached_tokens visible in Phoenix spans; stacks with batch for the 66% headline F1 cost cut.
text-embedding-004 for F2 vector pipeline	CODE READY	Phase 1.5	768-dim embeddings via text-embedding-004 power the Atlas Vector Search index over the reasoning corpus. Asymmetric task types (RETRIEVAL_DOCUMENT for indexed passages, RETRIEVAL_QUERY for search queries) are wired correctly. The embedder is merged; the vector index is gated on the Atlas M10 cluster upgrade.
Phoenix-traceable cost + cache instrumentation	CODE READY	Phase 1.5.S (S3)	MODEL_PRICING + CACHED_INPUT_DISCOUNT (0.25) + BATCH_DISCOUNT (0.50) constants in api/routers/arize.py compute genai.cost.total_usd per span, and the cost dashboard derives from them. The constants are live; the /monitoring widget that aggregates them across the day is the next display layer.
tenant.id span attribute for SaaS billing APP-014	PLANNED	ongoing platform release	Phoenix spans must carry tenant.id so per-tenant cost dashboards (Phoenix + BQ usage logs + Stripe tier) can join on it. This is the billing-evidence layer for Phase 2.5.6 unit economics: tracing.py is the wire-in point, and this work gates Phase 2.5.6 SaaS billing.
Per-tenant LLM budget circuit breaker APP-014	PLANNED	Phase 2.0	Reactive enforcement of per-plan token caps using tenant.id span attribute + tenant_metadata in Mongo. Halts a tenant's agent calls when daily/monthly budget is exceeded.
Parallel tool use (function-call fan-out)	PLANNED	Phase 2.5	Gemini supports parallel function calls in a single turn. The orchestrator currently routes sequentially; migrating the per-ticker sentiment + macro + news fan-out is expected to halve wall time and cut orchestrator latency.
Live API (bidirectional voice streaming)	UNDERUSED	Phase 2.5	WebSocket-based audio in / audio out. Replaces the current SSE chat sidebar with voice-driven SaaS interaction — the Phase 2.5 UX differentiator.
Vertex Agent Engine (managed deployment)	UNDERUSED	Phase 2.5	Managed scaling + native Phoenix trace integration. Alternative to Cloud Run; worth evaluating for Phase 2 production posture.
ParallelAgent migration of sentiment fan-out	UNDERUSED	Phase 2.5	ADK's explicit DAG primitive replaces natural-language routing on the deterministic legs (sentiment fan-out, F1 judge harness). Halves orchestrator wall time, deterministic for evals.
Search grounding tool (Google Search built-in)	UNDERUSED	Phase 2.5	Gemini-native real-time news + sentiment beyond our X/Reddit/RSS firehose — particularly for breaking events social hasn't picked up yet.
Code execution tool (built-in Python sandbox)	UNDERUSED	Phase 2.5	Ad-hoc compute without a custom tool. Useful for advisor what-ifs; deliberately kept off risk paths, where deterministic Python in our own codebase preserves the audit trail.
1M-token long-context for full decision-history reasoning	UNDERUSED	Phase 2.5	Pass the entire decision history into a single judge call for cross-decision pattern detection — replaces the paginated reflection-agent rollups.
Thinking models / extended reasoning	UNDERUSED	Phase 3	Higher-cost reasoning chains for divergence diagnosis, calibration interpretation, monthly attribution writeups — anywhere the operator asks 'but why'.
Fine-tuning via Vertex AI	UNDERUSED	Phase 3	Domain-specific Gemini for the strategy advisor, fine-tuned on our decision corpus + outcomes. Not before we have ≥1K labeled outcomes.
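The strict per-agent allowlist in the tools row amounts to a deny-by-default lookup that runs before any tool callable is invoked. This is a minimal sketch, not SentinelHub code: `TOOL_ALLOWLIST`, `call_tool`, `ToolNotAllowed`, and the Mongo-writing tool name `write_decision` are hypothetical, and only the two rules stated in the matrix (only trade_executor may call submit_bracket_order, only decision_logger writes to Mongo) are encoded.

```python
from typing import Any, Callable

# Hypothetical allowlist: agent name -> tool names that agent may call.
# Only the two rules stated in the capability matrix are encoded here.
TOOL_ALLOWLIST: dict[str, frozenset[str]] = {
    "trade_executor": frozenset({"submit_bracket_order"}),
    "decision_logger": frozenset({"write_decision"}),  # hypothetical Mongo-writing tool
    "sentiment": frozenset(),
}


class ToolNotAllowed(PermissionError):
    """Raised when an agent requests a tool outside its allowlist."""


def call_tool(
    agent: str,
    tool_name: str,
    registry: dict[str, Callable[..., Any]],
    **kwargs: Any,
) -> Any:
    # Deny by default: an agent missing from the allowlist gets no tools at all.
    allowed = TOOL_ALLOWLIST.get(agent, frozenset())
    if tool_name not in allowed:
        raise ToolNotAllowed(f"{agent} may not call {tool_name}")
    return registry[tool_name](**kwargs)
```

Failing closed for unknown agents means a newly added agent has no tools until someone grants them explicitly.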

Live data

Source: phoenix-spans

Real-time from the Google Gemini (ADK) backend

Every tile below is a live read from the vendor backend via the FastAPI BFF. If a tile shows "—" the backend is unreachable or the metric is not yet wired (no hardcoded numbers — see anti-pattern #2).

Tokens today

ADK agents active

Tools registered

Cache hit rate (%)
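The cache-hit tile could be derived from Phoenix span attributes roughly as follows. This is a sketch under stated assumptions: each span is a flat dictionary carrying `genai.usage.input_tokens` and `genai.usage.cached_tokens` (the attribute names used in the demo flow), cached tokens are assumed to be a subset of input tokens, and `cache_hit_rate` and `render_tile` are hypothetical helpers. When there is no data the tile renders the dash placeholder rather than a made-up number.

```python
from typing import Iterable, Mapping, Optional

PLACEHOLDER = "\u2014"  # the dash a tile shows when data is missing


def cache_hit_rate(spans: Iterable[Mapping[str, int]]) -> Optional[float]:
    """Percent of input tokens served from the context cache; None when there is no data."""
    total_input = 0
    total_cached = 0
    for span in spans:
        # Assumption: cached tokens are counted inside input tokens on each span.
        total_input += span.get("genai.usage.input_tokens", 0)
        total_cached += span.get("genai.usage.cached_tokens", 0)
    if total_input == 0:
        return None  # no spans (or backend unreachable): never invent a number
    return 100.0 * total_cached / total_input


def render_tile(value: Optional[float]) -> str:
    return PLACEHOLDER if value is None else f"{value:.1f}"
```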

Roadmap commitments

Roadmap dependencies

Capabilities enabled by this integration — what is built, what is gated, and why.

APP-009 · Phase 1

Gemini model strategy — Flash + Flash Lite tiering

Two-tier model selection (FAST for synthesis/chat, LITE for narrow classification) is the cost-discipline doctrine for the 9-agent topology. Module-level constants make upgrades atomic.
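A minimal sketch of the module-level constants described above. The two model strings and the role-to-tier assignments come from the capability matrix; the role keys `ticker_scoring` and `evaluator` and the `model_for` helper are hypothetical. Agents whose tier the matrix does not state (market_analyst, risk, decision_logger, reflection) are deliberately left out. In ADK the returned string would be passed to an agent's `model` parameter.

```python
# Module-level constants: upgrading a tier is a one-line edit here.
MODEL_FAST = "gemini-flash-latest"
MODEL_LITE = "gemini-3.1-flash-lite-preview"

# Role -> model, following the tier assignments in the capability matrix.
MODEL_FOR_ROLE: dict[str, str] = {
    "orchestrator": MODEL_FAST,
    "advisor": MODEL_FAST,
    "chat": MODEL_FAST,
    "briefing": MODEL_FAST,
    "sentiment": MODEL_LITE,
    "ticker_scoring": MODEL_LITE,  # hypothetical role key
    "evaluator": MODEL_LITE,       # hypothetical role key
}


def model_for(role: str) -> str:
    """Return the model for a role; unknown roles raise KeyError so each new role is tiered explicitly."""
    return MODEL_FOR_ROLE[role]
```

Raising on unknown roles, instead of defaulting to FAST, keeps the cost doctrine from being bypassed silently when an agent is added.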

APP-014 · Phase 2.0

Per-tenant LLM budget circuit breaker

Phoenix tenant.id span attribute is the billing-evidence layer; per-plan token caps enforced reactively from tenant_metadata in Mongo.
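A minimal sketch of reactive enforcement, covering only the daily cap (a monthly cap would add a second key). `TenantBudgetBreaker`, `BudgetExceeded`, and the caps dictionary standing in for tenant_metadata are hypothetical names, not SentinelHub code. "Reactive" here means usage is recorded after each call finishes, so the call that crosses the cap still completes and the next one is refused.

```python
from collections import defaultdict
from datetime import date


class BudgetExceeded(RuntimeError):
    """Raised when a tenant has used up its daily token budget."""


class TenantBudgetBreaker:
    def __init__(self, daily_caps: dict[str, int]) -> None:
        # tenant_id -> daily token cap; stands in for a lookup against tenant_metadata.
        self._caps = daily_caps
        self._used: defaultdict[tuple[str, date], int] = defaultdict(int)

    def record(self, tenant_id: str, tokens: int, day: date) -> None:
        # Called after a Gemini span closes, using its token counts (reactive accounting).
        self._used[(tenant_id, day)] += tokens

    def check(self, tenant_id: str, day: date) -> None:
        # Called before dispatching an agent call. Tenants with no cap on file fail closed.
        cap = self._caps.get(tenant_id, 0)
        used = self._used.get((tenant_id, day), 0)
        if used >= cap:
            raise BudgetExceeded(f"tenant {tenant_id!r} used {used}/{cap} tokens on {day}")
```

Keying usage by (tenant, day) makes the daily reset implicit: a new date simply starts from zero.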

2.0.1.4 · ongoing platform release

tracing.py wires the tenant.id span attribute onto every span

Gates Phase 2.5.6 SaaS billing — without tenant.id on every Gemini span, per-tenant unit economics is unauditable.
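Why a missing tenant.id makes unit economics unauditable is easiest to see in the join itself. A minimal sketch, assuming spans are exported as flat attribute dictionaries carrying `tenant.id` and `genai.cost.total_usd`; `cost_by_tenant` is a hypothetical helper. Any span without tenant.id lands in an unattributed bucket that no tenant can be billed for.

```python
from collections import defaultdict
from typing import Any, Iterable, Mapping


def cost_by_tenant(spans: Iterable[Mapping[str, Any]]) -> tuple[dict[str, float], float]:
    """Sum genai.cost.total_usd per tenant.id; also return the cost no tenant can be billed for."""
    per_tenant: defaultdict[str, float] = defaultdict(float)
    unattributed = 0.0
    for span in spans:
        cost = float(span.get("genai.cost.total_usd", 0.0))
        tenant = span.get("tenant.id")
        if tenant is None:
            unattributed += cost  # spend with no owner: the gap tenant.id closes
        else:
            per_tenant[str(tenant)] += cost
    return dict(per_tenant), unattributed
```

A non-zero unattributed total is a direct signal that some code path still emits Gemini spans without the attribute.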

Demo flow

End-to-end showcase journey

Five steps a judge or investor can replay. Steps 1 to 3 link to the page that demonstrates them; steps 4 and 5 are roadmap items marked with their phase.

1
POST /judge/run?mode=serial — see the span land in Phoenix with `genai.usage.input_tokens` + `genai.cost.total_usd` populated. Baseline cost. → Open /monitoring
2
POST /judge/run?mode=batch — batch span shows `genai.usage.cached_tokens` populated + `evaluator.batch.size` + the 50% batch discount applied on top of the 75% cache discount. → Open /monitoring
3
Open /monitoring. Cost-burn widget shows daily spend split by model (FAST vs LITE) — derived from MODEL_PRICING constants in api/routers/arize.py. → Open /monitoring
4
(Phase 2.5) Per-tenant cost dashboard composed of Phoenix tenant.id spans + BQ usage logs + Stripe subscription tier — single source of truth for SaaS unit economics.
5
(Phase 2.5) Live API voice-driven chat sidebar: bidirectional audio replaces the current SSE chat. The voice-UX differentiator for the SaaS pitch.
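Demo steps 1 and 2 compare serial and batch costs. A minimal sketch of how a per-span cost could combine the three constants named in the capability matrix. The discount values follow the matrix (cached input billed at 25% of the input rate, batch jobs at 50%); the MODEL_PRICING figures are placeholders, not real Gemini list prices, `span_cost_usd` is a hypothetical helper, and cache storage fees are ignored.

```python
# Placeholder prices in USD per 1M tokens: NOT real Gemini list prices.
MODEL_PRICING: dict[str, dict[str, float]] = {
    "gemini-flash-latest": {"input": 1.00, "output": 4.00},
    "gemini-3.1-flash-lite-preview": {"input": 0.25, "output": 1.00},
}
CACHED_INPUT_DISCOUNT = 0.25  # cached input tokens billed at 25% of the input rate
BATCH_DISCOUNT = 0.50         # batch jobs billed at 50% of the interactive rate


def span_cost_usd(
    model: str,
    input_tokens: int,
    cached_tokens: int,
    output_tokens: int,
    batch: bool = False,
) -> float:
    price = MODEL_PRICING[model]
    # Assumption: cached_tokens is the cached subset of input_tokens.
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    cost = (
        uncached * price["input"]
        + cached_tokens * price["input"] * CACHED_INPUT_DISCOUNT
        + output_tokens * price["output"]
    ) / 1_000_000
    if batch:
        cost *= BATCH_DISCOUNT  # the batch discount stacks on top of the cache discount
    return cost
```

Under these rules, input served from the cache inside a batch job costs 0.25 × 0.5 = 12.5% of its serial, uncached price, while uncached input and output get only the 50% batch cut. Blended savings therefore land somewhere between 50% and 87.5%, a range that contains the 66% headline.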

What's next

Top-3 vendor-enabled capabilities coming soon

Sourced from the vendor's playbook. Each entry is mapped to its delivery phase and the value it unlocks.

Live API (voice-driven chat)

Phase 2.5

Voice UX differentiator vs single-LLM chatbot competitors; replaces SSE with bidirectional audio.

ParallelAgent migration of sentiment fan-out

Phase 2.5

Halves orchestrator wall time on independent fetches; deterministic for evals.

Vertex Agent Engine (managed deployment)

Phase 2.5

Alternative to Cloud Run with native Phoenix trace integration + managed scaling.
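The ParallelAgent migration rests on one observation: the per-ticker legs are independent, so wall time can track the slowest leg instead of the sum of all legs. ADK's own primitive for this is `ParallelAgent`; the sketch below uses plain `asyncio` as a stand-in to show the shape, with hypothetical fetchers (`fetch_sentiment`, `fetch_macro`, `fetch_news`) that only simulate latency. Because `asyncio.gather` returns results in submission order, the output layout is identical on every run; this makes the orchestration deterministic, not the model outputs.

```python
import asyncio
from typing import Awaitable, Callable


# Hypothetical stand-ins for the three independent per-ticker legs.
async def fetch_sentiment(ticker: str) -> str:
    await asyncio.sleep(0.05)  # simulated I/O latency
    return f"sentiment:{ticker}"


async def fetch_macro(ticker: str) -> str:
    await asyncio.sleep(0.05)
    return f"macro:{ticker}"


async def fetch_news(ticker: str) -> str:
    await asyncio.sleep(0.05)
    return f"news:{ticker}"


LEGS: tuple[Callable[[str], Awaitable[str]], ...] = (fetch_sentiment, fetch_macro, fetch_news)


async def fan_out(tickers: list[str]) -> dict[str, list[str]]:
    # Launch every (ticker, leg) pair at once; wall time tracks the slowest leg, not the sum.
    coros = [leg(t) for t in tickers for leg in LEGS]
    results = await asyncio.gather(*coros)
    # gather preserves submission order, so slicing by position regroups results per ticker.
    width = len(LEGS)
    return {t: list(results[i * width:(i + 1) * width]) for i, t in enumerate(tickers)}


if __name__ == "__main__":
    print(asyncio.run(fan_out(["AAPL", "MSFT"])))
```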


AI Strategy Advisor