Building StoxFlow: A Hybrid Local/Cloud AI Agent for Stock Research

Analyzing equities means pulling data from several sources: fundamental statements, technical price charts, and recent news articles. Doing this by hand for every ticker is slow, and automating it with LLMs gets expensive quickly if you pipe full news article bodies directly into cloud APIs.

To solve this, I built StoxFlow, an AI stock research agent for Indian equities. It uses a decoupled three-tier architecture (FastAPI, NiceGUI, and LangGraph) and implements a hybrid local/cloud LLM pipeline to process data efficiently.

Here is a breakdown of how it works, the architecture, and how it optimizes API costs.

The Architecture

StoxFlow is divided into three layers:

Frontend (NiceGUI): A Python-based dashboard UI that accepts stock tickers, displays execution steps in real time, and renders interactive tabs for synthesized analysis reports and raw fundamental tables.
Backend API (FastAPI): A REST API that triggers the research pipeline and exposes structured endpoints.
Execution Engine (LangGraph & LiteLLM): A stateful workflow engine that coordinates ticker resolution, data pre-processing, news crawling, and report compilation.

NiceGUI (Dashboard) ──> FastAPI (API) ──> LangGraph (Agent Pipeline)
                                                 │
                                                 ├── Resolve Company (Local)
                                                 ├── Preprocess Financials & OHLC (Upstox API)
                                                 ├── Scrape & Digest News (Local Ollama)
                                                 └── Synthesize Thesis (Cloud Gemini)

The Pipeline: Under the Hood

The research workflow is modeled as a state graph using LangGraph. This ensures that the agent follows a predictable sequence of tasks with clear inputs, outputs, and error boundaries:

resolve_company: Resolves a user’s raw query (e.g. “TCS” or “Adani Port”) into a standardized ISIN and Upstox instrument key.
fetch_company_data: Retrieves the company’s financial profile, technical OHLC candle series, and recent news articles concurrently using a ThreadPoolExecutor, so the total wait is close to the slowest single request rather than the sum of all three.
digest_news: Crawls the full text of recent news articles, extracts financial takeaways, and assigns sentiment and impact scores.
synthesize_report: Compiles the preprocessed financial metrics, technical trends, and digested news summaries into a structured JSON investment thesis.
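
The four steps above can be sketched in plain Python. This is not the StoxFlow code and it does not use LangGraph: it is a minimal stand-in that only shows the pattern of nodes reading a shared state and returning the keys they produce, plus the concurrent fetch inside fetch_company_data. The fetch_financials, fetch_candles, and fetch_news helpers, the state keys, and every returned value are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import TypedDict


class ResearchState(TypedDict, total=False):
    query: str
    instrument: dict
    financials: dict
    candles: list
    articles: list
    digests: list
    report: dict


# Hypothetical stand-ins for the real data sources (Upstox, news search).
def fetch_financials(instrument: dict) -> dict:
    return {"source": "placeholder"}


def fetch_candles(instrument: dict) -> list:
    return []


def fetch_news(instrument: dict) -> list:
    return [{"title": "placeholder headline"}]


def resolve_company(state: ResearchState) -> dict:
    # Placeholder: the real node maps the query to an ISIN and instrument key.
    name = state["query"].strip().upper()
    return {"instrument": {"name": name, "isin": "<ISIN>", "key": "<INSTRUMENT_KEY>"}}


def fetch_company_data(state: ResearchState) -> dict:
    instrument = state["instrument"]
    # The three fetches are independent I/O calls, so they can overlap in threads.
    with ThreadPoolExecutor(max_workers=3) as pool:
        financials = pool.submit(fetch_financials, instrument)
        candles = pool.submit(fetch_candles, instrument)
        articles = pool.submit(fetch_news, instrument)
        # .result() blocks until done and re-raises any worker exception.
        return {
            "financials": financials.result(),
            "candles": candles.result(),
            "articles": articles.result(),
        }


def digest_news(state: ResearchState) -> dict:
    # Placeholder digest: the real node calls the local model per article.
    return {"digests": [{"title": a["title"], "sentiment": 0.0} for a in state["articles"]]}


def synthesize_report(state: ResearchState) -> dict:
    # Placeholder synthesis: the real node sends the digests to the cloud model.
    return {"report": {"company": state["instrument"]["name"],
                       "news_items": len(state["digests"])}}


PIPELINE = [resolve_company, fetch_company_data, digest_news, synthesize_report]


def run_pipeline(query: str) -> ResearchState:
    state: ResearchState = {"query": query}
    for node in PIPELINE:
        state.update(node(state))  # each node returns only the keys it produced
    return state
```

Calling run_pipeline("tcs") returns the accumulated state, including a report key. Because every node returns a partial update instead of mutating the state directly, each step is easy to test on its own.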

Token & Cost Optimization: The Hybrid LLM Approach

Using cloud LLMs such as Gemini or OpenAI's GPT models to parse multiple full-text news articles is expensive. A scraped article easily spans thousands of tokens, so sending ten of them to a cloud API for summarization means tens of thousands of input tokens per ticker before any analysis happens.

StoxFlow optimizes this by splitting tasks between local and cloud models:

Local Model (Ollama + Qwen 2.5 3B): Handles high-frequency, extraction-heavy tasks. The news digestion node crawls articles and uses a local instance of qwen2.5:3b to identify relevant facts, filter out noise, score sentiment, and output a compact JSON summary.
Cloud Model (Gemini 2.5 Flash Lite): Handles the final report synthesis. The engine forwards the clean, token-optimized data summaries to the cloud model in a single prompt.

By filtering and structuring raw news text locally before sending it to the cloud, StoxFlow reduces cloud token consumption by over 80% while still using the stronger cloud model for the reasoning-heavy final investment thesis.
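
To make the split concrete, here is a rough sketch of the digest-then-forward idea, under several assumptions. The local_llm argument is a hypothetical callable standing in for the Ollama call, the JSON keys are illustrative, and estimate_tokens is a crude characters-divided-by-four heuristic, not a real tokenizer. Any reduction it reports is an illustration of the arithmetic, not a reproduction of the 80% figure.

```python
import json
from typing import Callable


def estimate_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token); real tokenizers differ.
    return max(1, len(text) // 4)


def digest_article(article_text: str, local_llm: Callable[[str], str]) -> dict:
    """Ask the local model for a compact JSON digest and validate it."""
    prompt = (
        "Summarize the financially relevant facts in this article as JSON with "
        'keys "summary", "sentiment" (-1 to 1) and "impact" (1 to 5).\n\n'
        + article_text
    )
    try:
        data = json.loads(local_llm(prompt))
    except json.JSONDecodeError:
        # Small models sometimes emit invalid JSON; skip the article instead of failing.
        return {}
    if not isinstance(data, dict) or "summary" not in data:
        return {}
    try:
        sentiment = float(data.get("sentiment", 0.0))
        impact = int(data.get("impact", 1))
    except (TypeError, ValueError):
        return {}
    return {
        "summary": str(data["summary"]),
        "sentiment": max(-1.0, min(1.0, sentiment)),  # clamp to the stated range
        "impact": max(1, min(5, impact)),
    }


def build_cloud_payload(articles: list, local_llm: Callable[[str], str]) -> str:
    """Digest every article locally and pack the valid digests into compact JSON."""
    digests = [d for d in (digest_article(a, local_llm) for a in articles) if d]
    return json.dumps(digests, separators=(",", ":"))


def token_reduction(articles: list, payload: str) -> float:
    """Fraction of estimated input tokens saved by sending the payload instead."""
    raw_tokens = sum(estimate_tokens(a) for a in articles)
    if raw_tokens == 0:
        return 0.0
    return 1 - estimate_tokens(payload) / raw_tokens
```

Only the string returned by build_cloud_payload would go into the single cloud prompt. The full article bodies never leave the machine.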

Real-Time Observability with Arize Phoenix

Debugging LLM prompts and agent state transitions is difficult when you cannot see what each step received and returned. StoxFlow integrates Arize Phoenix using OpenInference instrumentation to track execution telemetry.

Running the stack automatically launches a local Phoenix server. It records:

Node Latencies: Exactly how long each step of the LangGraph took to execute.
Full Prompt Logs: The exact system and user prompts sent to local and cloud LLMs, along with the raw completions.
Token Usage: Telemetry on token counts across local and cloud calls, making cost tracking simple.
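
For intuition about what a node latency record contains, here is a small standard-library sketch. It is not Phoenix's or OpenInference's API. The traced decorator and the SPANS list are hypothetical names that mimic, in a very reduced form, the kind of per-node data a tracing span carries.

```python
import time
from functools import wraps

# In-memory stand-in for a trace store; a real tracer exports spans elsewhere.
SPANS = []


def traced(node_name: str):
    """Record wall-clock latency and outcome for each call of the wrapped node."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "ok"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "error"
                raise  # the failure still propagates to the caller
            finally:
                SPANS.append({
                    "node": node_name,
                    "latency_s": time.perf_counter() - start,
                    "status": status,
                })
        return wrapper
    return decorator
```

Decorating a pipeline step with @traced("digest_news") would append one record per call, including calls that raise. That mirrors why per-node traces are useful for spotting slow or failing steps.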

Storing Structured Output

When the agent finishes running, it outputs a clean dashboard view and writes the complete analysis to a structured JSON file under reports/research_{TICKER}.json.
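
Writing the report file could look roughly like this. The save_report function name, the base_dir parameter, and the upper-casing of the ticker are assumptions for illustration. Only the reports/research_{TICKER}.json path pattern comes from the project.

```python
import json
from pathlib import Path


def save_report(ticker: str, report: dict, base_dir: str = "reports") -> Path:
    """Write the report as pretty-printed JSON and return the file path."""
    out_dir = Path(base_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create reports/ on first run
    path = out_dir / f"research_{ticker.upper()}.json"
    # ensure_ascii=False keeps symbols such as the rupee sign readable in the file.
    path.write_text(json.dumps(report, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```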

Each report contains:

Fundamentals Analysis: Summaries of key ratios, shareholding patterns, and balance sheet observations.
Price Trend Analysis: The 52-week price range and the current price's position relative to the 40-week SMA (a macro trend proxy).
Recent News Events: Categorized by sentiment and market impact.
Executive Summary: A multi-paragraph investment thesis detailing opportunities and risks.
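
The price trend figures are simple to compute. The sketch below assumes a list of weekly closing prices, oldest first, and uses closes for the 52-week range. A real implementation might use weekly highs and lows instead. The function name and the output keys are hypothetical.

```python
def price_trend(weekly_closes: list) -> dict:
    """Summarize the 52-week range and the distance from the 40-week SMA."""
    if len(weekly_closes) < 52:
        raise ValueError("need at least 52 weekly closes")
    last_52 = weekly_closes[-52:]
    low, high = min(last_52), max(last_52)
    last = weekly_closes[-1]
    sma_40 = sum(weekly_closes[-40:]) / 40  # simple average of the last 40 weeks
    # 0.0 means the price sits at the 52-week low, 1.0 at the 52-week high.
    range_position = (last - low) / (high - low) if high > low else 0.5
    return {
        "low_52w": low,
        "high_52w": high,
        "sma_40w": sma_40,
        "pct_vs_sma_40w": (last - sma_40) / sma_40 * 100,
        "range_position": range_position,
        "above_sma_40w": last > sma_40,
    }
```

A positive pct_vs_sma_40w means the latest close sits above the 40-week average, which is the usual reading of an intact long-term uptrend.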

Running the Project

If you want to run the project locally:

Install Ollama and download the model:
```
ollama pull qwen2.5:3b
```
Configure your keys in a .env file (requires an Upstox API Analytics Token and a Google Gemini API Key):
```
UPSTOX_API_KEY="your_upstox_analytics_token"
GOOGLE_API_KEY="your_gemini_api_key"
```

Install the Python requirements and start the app:
```
pip install -r requirements.txt
python run.py
```

Check out the full codebase and configuration instructions in the stoxflow-agent repository.


Written by S L Manikanta
