Vendor integration

Google Gemini (ADK)

9 ADK agents · Flash + Flash Lite tiered · Batch + Context Cache stacked · 66% F1 evaluator cost cut

Status

● Live

production · real data flowing


Capability matrix

What we use, what's gated, what's planned

Gemini is the reasoning foundation behind every SentinelHub agent. A two-tier model strategy (Flash for synthesis and orchestration, Flash Lite for narrow classification) powers 9 ADK agents, an orchestrator plus 8 specialists, and their registered tools; Batch + Context Cache cut F1 evaluator cost by 66%.

Coming next: per-tenant cost attribution dashboards land alongside the multi-tenant retrofit; prompt-injection guards and structured-output schemas harden the operator chat surface.

Capability · Status · Phase · Why / How / Note

9 ADK agents (orchestrator + 8 specialists)

APP-009

LIVE · Phase 1

Multi-agent topology: the orchestrator routes natural-language intent to 8 specialists (market_analyst, sentiment, risk, advisor, decision_logger, reflection, chat, briefing). MODEL_FAST (gemini-flash-latest) handles synthesis; MODEL_LITE (gemini-3.1-flash-lite-preview) handles narrow classification.

Registered tools with strict per-agent allowlist

LIVE · Phase 1

Tools are plain-Python callables; ADK derives their JSON schemas from type hints + docstrings. Per-agent allowlists are strictly enforced: only trade_executor can call submit_bracket_order, and only decision_logger writes to Mongo.

Two-tier model strategy (FAST + LITE)

APP-009

LIVE · Phase 1

MODEL_FAST for orchestrator/advisor/chat/briefing; MODEL_LITE for sentiment classification, ticker scoring, evaluator. Module-level constants make model upgrades a one-line edit.

OpenInference auto-instrumentation via Phoenix

LIVE · Phase 1.5.S (S3)

tracing.py wraps every Gemini and tool call; cached_tokens, input_tokens, output_tokens and cost.total_usd land on every span. The Phoenix dashboard shows the entire 9-agent graph live.

Batch API path (50% discount)

LIVE · Phase 1.5.S (S3)

evaluators/runner.py: run_evaluation_batch() submits the F1 LLM-as-judge sweep as one Gemini batch job. Falls back to serial if the SDK or model variant doesn't support batches.

Context Cache path (75% on cached input)

LIVE · Phase 1.5.S (S3)

decision_judge.py caches the system instruction + 6-criterion rubric once per run (TTL 1h). cached_tokens are visible in Phoenix spans, and the cache discount stacks with the batch discount for the 66% headline F1 cost cut.

text-embedding-004 for F2 vector pipeline

CODE READY · Phase 1.5

768-dim embeddings from text-embedding-004 power the Atlas Vector Search index over the reasoning corpus. Asymmetric task types are wired correctly: RETRIEVAL_DOCUMENT when indexing, RETRIEVAL_QUERY at search time.

Embedder merged; gated on Atlas M10 cluster upgrade for the vector index.

Phoenix-traceable cost + cache instrumentation

CODE READY · Phase 1.5.S (S3)

The MODEL_PRICING, CACHED_INPUT_DISCOUNT (0.25, i.e. cached input is billed at 25% of the base rate) and BATCH_DISCOUNT (0.50) constants in api/routers/arize.py compute genai.cost.total_usd per span. The cost dashboard derives from these values.

Cost constants live; the /monitoring page widget that aggregates them across the day is the next display layer.

tenant.id span attribute for SaaS billing

APP-014

PLANNED · ongoing platform release

Phoenix spans must carry tenant.id so that per-tenant cost dashboards (Phoenix + BQ usage logs + Stripe tier) can join on it. This is the billing-evidence layer for Phase 2.5.6 unit economics.

tracing.py is the wire-in point; gates Phase 2.5.6 SaaS billing.

Per-tenant LLM budget circuit breaker

APP-014

PLANNED · Phase 2.0

Reactive enforcement of per-plan token caps using tenant.id span attribute + tenant_metadata in Mongo. Halts a tenant's agent calls when daily/monthly budget is exceeded.

Parallel tool use (function-call fan-out)

PLANNED · Phase 2.5

Gemini supports parallel function calls in a single turn, but the orchestrator currently routes sequentially. Migrating the per-ticker sentiment + macro + news fan-out is expected to halve wall time.

Orchestrator latency cut. Deterministic for evals.

Live API (bidirectional voice streaming)

UNDERUSED · Phase 2.5

WebSocket-based audio in / audio out. Replaces the current SSE chat sidebar with voice-driven SaaS interaction — the Phase 2.5 UX differentiator.

Vertex Agent Engine (managed deployment)

UNDERUSED · Phase 2.5

Managed scaling + native Phoenix trace integration. Alternative to Cloud Run; worth evaluating for Phase 2 production posture.

ParallelAgent migration of sentiment fan-out

UNDERUSED · Phase 2.5

ADK's explicit DAG primitive replaces natural-language routing on the deterministic legs (sentiment fan-out, F1 judge harness). Halves orchestrator wall time, deterministic for evals.

Search grounding tool (Google Search built-in)

UNDERUSED · Phase 2.5

Gemini-native real-time news + sentiment beyond our X/Reddit/RSS firehose, particularly for breaking events that social feeds haven't picked up yet.

Code execution tool (built-in Python sandbox)

UNDERUSED · Phase 2.5

Ad-hoc compute without a custom tool. Useful for advisor what-ifs; explicitly excluded from risk paths, where we keep deterministic Python in our codebase for the audit trail.

1M-token long-context for full decision-history reasoning

UNDERUSED · Phase 2.5

Pass the entire decision history into a single judge call for cross-decision pattern detection — replaces the paginated reflection-agent rollups.

Thinking models / extended reasoning

UNDERUSED · Phase 3

Higher-cost reasoning chains for divergence diagnosis, calibration interpretation, monthly attribution writeups — anywhere the operator asks 'but why'.

Fine-tuning via Vertex AI

UNDERUSED · Phase 3

Domain-specific Gemini for the strategy advisor, fine-tuned on our decision corpus + outcomes. Phase 3, and not before we have at least 1K labeled outcomes.
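The per-agent model tiers and tool allowlists described in the matrix can be pictured as a single registry. The sketch below is a minimal illustration under stated assumptions, not SentinelHub's code: AgentSpec, AGENTS, TOOLS, call_tool and ToolNotAllowed are hypothetical names, the two tool bodies are placeholders, and the tiers given to trade_executor and decision_logger are assumed because the matrix does not state them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

# Module-level model constants: a model upgrade is a one-line edit.
MODEL_FAST = "gemini-flash-latest"
MODEL_LITE = "gemini-3.1-flash-lite-preview"


def submit_bracket_order(symbol: str, qty: int) -> str:
    """Placeholder tool (hypothetical); a real one would call the broker."""
    return f"bracket order queued: {qty} {symbol}"


def log_decision(summary: str) -> str:
    """Placeholder tool (hypothetical); a real one would write to Mongo."""
    return f"logged: {summary}"


TOOLS: Dict[str, Callable[..., str]] = {
    "submit_bracket_order": submit_bracket_order,
    "log_decision": log_decision,
}


@dataclass(frozen=True)
class AgentSpec:
    name: str
    model: str
    allowed_tools: FrozenSet[str] = frozenset()


class ToolNotAllowed(PermissionError):
    """Raised when an agent calls a tool outside its allowlist."""


# orchestrator and sentiment tiers follow the matrix; the other two tiers are ASSUMED.
AGENTS: Dict[str, AgentSpec] = {
    spec.name: spec
    for spec in (
        AgentSpec("orchestrator", MODEL_FAST),
        AgentSpec("sentiment", MODEL_LITE),
        AgentSpec("trade_executor", MODEL_FAST, frozenset({"submit_bracket_order"})),
        AgentSpec("decision_logger", MODEL_LITE, frozenset({"log_decision"})),
    )
}


def call_tool(agent: str, tool: str, **kwargs) -> str:
    """Run a tool on behalf of an agent, refusing anything outside its allowlist."""
    spec = AGENTS[agent]
    if tool not in spec.allowed_tools:
        raise ToolNotAllowed(f"{agent} may not call {tool}")
    return TOOLS[tool](**kwargs)
```

In ADK itself, the natural first enforcement point is the tools list each agent is constructed with, so a model never sees a tool it may not call; a runtime check like call_tool acts as a second line of defence.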

Live data

Source: phoenix-spans

Real-time from the Google Gemini (ADK) backend

Every tile below is a live read from the vendor backend via the FastAPI BFF. If a tile shows "—", either the backend is unreachable or the metric is not yet wired; no numbers are hardcoded (see anti-pattern #2).
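The "—" rule can be made concrete with a small renderer that never substitutes 0 for missing data. This is a hypothetical sketch: the payload keys (tokens_today, agents_active, tools_registered, cache_hit_pct) are assumptions, not the BFF's actual response schema.

```python
from typing import Dict, Mapping, Optional

PLACEHOLDER = "—"

# (payload key, tile label); the keys are HYPOTHETICAL.
TILES = (
    ("tokens_today", "Tokens today"),
    ("agents_active", "ADK agents active"),
    ("tools_registered", "Tools registered"),
    ("cache_hit_pct", "Cache hit rate (%)"),
)


def format_metric(value: object) -> str:
    """Format a numeric metric; None, missing or non-numeric values show the placeholder."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return PLACEHOLDER
    if isinstance(value, int):
        return f"{value:,}"
    return f"{value:.1f}"


def render_tiles(payload: Optional[Mapping[str, object]]) -> Dict[str, str]:
    """payload is None when the backend is unreachable; a missing key means the metric is not wired."""
    data = payload or {}
    return {label: format_metric(data.get(key)) for key, label in TILES}
```

A genuine zero from the backend still renders as 0; only absent data becomes "—", which keeps a loading or broken backend from masquerading as a quiet one.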

Tiles: Tokens today · ADK agents active · Tools registered · Cache hit rate (%)

Roadmap commitments

Roadmap dependencies

Capabilities enabled by this integration — what is built, what is gated, and why.

APP-009 · Phase 1

Gemini model strategy — Flash + Flash Lite tiering

Two-tier model selection (FAST for synthesis/chat, LITE for narrow classification) is the cost-discipline doctrine for the 9-agent topology. Module-level constants make upgrades atomic.

APP-014 · Phase 2.0

Per-tenant LLM budget circuit breaker

Phoenix tenant.id span attribute is the billing-evidence layer; per-plan token caps enforced reactively from tenant_metadata in Mongo.

2.0.1.4 · ongoing platform release

tracing.py attaches a tenant.id attribute to every span

Gates Phase 2.5.6 SaaS billing: without tenant.id on every Gemini span, per-tenant unit economics cannot be audited.
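The planned breaker can be sketched as a reactive check over spans that carry tenant.id. Everything here is hypothetical: BudgetBreaker and BudgetExceeded are illustrative names, spans are modelled as plain attribute dicts, the genai.usage.output_tokens key is assumed by analogy with genai.usage.input_tokens, and in practice the caps would come from tenant_metadata in Mongo.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, Mapping


class BudgetExceeded(RuntimeError):
    """Raised when a tenant has used up its token cap for the period."""


@dataclass
class BudgetBreaker:
    caps: Mapping[str, int]  # tenant.id -> token cap (ASSUMED to be loaded from tenant_metadata)
    used: Dict[str, int] = field(default_factory=lambda: defaultdict(int))
    unattributed_spans: int = 0  # spans without tenant.id cannot be billed to anyone

    def record(self, spans: Iterable[Mapping[str, object]]) -> None:
        """Add token usage from finished spans to each tenant's running total."""
        for attrs in spans:
            tenant = attrs.get("tenant.id")
            if tenant is None:
                self.unattributed_spans += 1
                continue
            self.used[tenant] += int(attrs.get("genai.usage.input_tokens", 0)) + int(
                attrs.get("genai.usage.output_tokens", 0)
            )

    def tripped(self, tenant: str) -> bool:
        """True once the tenant's recorded usage has reached its cap."""
        cap = self.caps.get(tenant)
        return cap is not None and self.used[tenant] >= cap

    def guard(self, tenant: str) -> None:
        """Call before each agent invocation; halts the tenant once its cap is reached."""
        if self.tripped(tenant):
            raise BudgetExceeded(f"tenant {tenant} exceeded {self.caps[tenant]} tokens")
```

Because enforcement is reactive, the call that crosses the cap still completes; only subsequent calls for that tenant are halted. The unattributed-span counter makes the gap described above visible rather than silently unbilled.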

Demo flow

End-to-end showcase journey

Five steps a judge or investor can follow. Steps 1 to 3 replay live today; steps 4 and 5 are planned for the phases shown. Each live step links to the page that demonstrates it.

  1. POST /judge/run?mode=serial: see the span land in Phoenix with `genai.usage.input_tokens` and `genai.cost.total_usd` populated. This is the baseline cost. → Open /monitoring

  2. POST /judge/run?mode=batch: the batch span shows `genai.usage.cached_tokens` populated, plus `evaluator.batch.size`, with the 50% batch discount applied on top of the 75% cache discount. → Open /monitoring

  3. Open /monitoring. The cost-burn widget shows daily spend split by model (FAST vs LITE), derived from the MODEL_PRICING constants in api/routers/arize.py. → Open /monitoring

  4. (Phase 2.5) Per-tenant cost dashboard composed of Phoenix tenant.id spans + BQ usage logs + Stripe subscription tier: the single source of truth for SaaS unit economics.

  5. (Phase 2.5) Live API voice-driven chat sidebar: bidirectional audio replaces the current SSE chat. This is the voice-UX differentiator for the SaaS pitch.
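Steps 1 and 2 contrast per-span cost with and without the discounts. A minimal sketch of that arithmetic, assuming the constants behave as the capability matrix describes (CACHED_INPUT_DISCOUNT = 0.25 as the multiplier on cached input, BATCH_DISCOUNT = 0.50 as the multiplier on the whole batched call), that input_tokens includes cached_tokens, and that the two discounts multiply. The MODEL_PRICING figures are placeholders, not Google's price list, and span_cost_usd is a hypothetical name.

```python
from typing import Dict

MODEL_FAST = "gemini-flash-latest"
MODEL_LITE = "gemini-3.1-flash-lite-preview"

# USD per 1M tokens. PLACEHOLDER values for illustration only.
MODEL_PRICING: Dict[str, Dict[str, float]] = {
    MODEL_FAST: {"input": 1.00, "output": 4.00},
    MODEL_LITE: {"input": 0.25, "output": 1.00},
}
CACHED_INPUT_DISCOUNT = 0.25  # cached input billed at 25% of the base input rate
BATCH_DISCOUNT = 0.50  # batched calls billed at 50%


def span_cost_usd(model: str, input_tokens: int, cached_tokens: int,
                  output_tokens: int, *, batch: bool = False) -> float:
    """Cost of one call; ASSUMES cached_tokens is a subset of input_tokens."""
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    price = MODEL_PRICING[model]
    uncached = input_tokens - cached_tokens
    input_cost = (uncached + cached_tokens * CACHED_INPUT_DISCOUNT) * price["input"]
    output_cost = output_tokens * price["output"]
    total = (input_cost + output_cost) / 1_000_000
    # Batch discount applied on top of the cache discount (stacking assumed multiplicative).
    return total * BATCH_DISCOUNT if batch else total
```

Output tokens get no cache discount, so the realized cut depends on how much of the prompt is cached and how much the judge writes back; a headline figure such as 66% is specific to the evaluator's actual token mix.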

What's next

Top-3 vendor-enabled capabilities coming soon

Sourced from the vendor's playbook. Each entry is mapped to its delivery phase and the value it unlocks.

Live API (voice-driven chat)

Phase 2.5

Voice UX differentiator vs single-LLM chatbot competitors; replaces SSE with bidirectional audio.

ParallelAgent migration of sentiment fan-out

Phase 2.5

Halves orchestrator wall time on independent fetches; deterministic for evals.

Vertex Agent Engine (managed deployment)

Phase 2.5

Alternative to Cloud Run with native Phoenix trace integration + managed scaling.
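The wall-time claim for the fan-out rests on the legs being independent. The sketch below uses plain asyncio.gather as a stand-in for ADK's ParallelAgent (it is not the ADK API) with hypothetical fetch_* coroutines that only sleep: three 0.1 s legs complete in roughly the time of one, and results come back in call order regardless of which leg finishes first, which keeps downstream evals stable.

```python
import asyncio
import time
from typing import Dict, List


async def fetch_sentiment(ticker: str) -> Dict[str, str]:
    await asyncio.sleep(0.1)  # stands in for a network-bound call
    return {"leg": "sentiment", "ticker": ticker}


async def fetch_macro(ticker: str) -> Dict[str, str]:
    await asyncio.sleep(0.1)
    return {"leg": "macro", "ticker": ticker}


async def fetch_news(ticker: str) -> Dict[str, str]:
    await asyncio.sleep(0.1)
    return {"leg": "news", "ticker": ticker}


async def fan_out(ticker: str) -> List[Dict[str, str]]:
    """Run the independent legs concurrently; gather preserves argument order."""
    return list(await asyncio.gather(fetch_sentiment(ticker), fetch_macro(ticker), fetch_news(ticker)))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(fan_out("AAPL"))
    print([r["leg"] for r in results], f"{time.perf_counter() - start:.2f}s")
```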

Live tiles backed by GET /api/gemini/capabilities (Phoenix-span aggregator, 60s cache).
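A 60 s cache in front of the Phoenix-span aggregator can be as small as a stored value plus an expiry time. This is a hypothetical sketch (TTLCache and the loader are illustrative names, not the BFF's code); the clock is injectable so the expiry logic can be tested without sleeping. If the loader raises because Phoenix is unreachable, nothing new is cached and the error propagates, so the caller can fall back to "—".

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Caches the result of a zero-argument loader for ttl_s seconds."""

    def __init__(self, loader: Callable[[], T], ttl_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl_s = ttl_s
        self._clock = clock
        self._value: Optional[T] = None
        self._expires_at: Optional[float] = None

    def get(self) -> T:
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Reload; if the loader raises, the expiry is unchanged and the next call retries.
            value = self._loader()
            self._value = value
            self._expires_at = now + self._ttl_s
            return value
        return self._value  # type: ignore[return-value]
```

This version is not synchronized: under concurrent requests two callers may both reload at expiry, which a lock would prevent if the aggregator query is expensive.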
