Green Orchestration
Framework Architecture

A multi-stage pipeline that combines prompt optimization, complexity analysis, and carbon-aware routing to minimize the energy cost of LLM inference - without sacrificing response quality.

v0.1 Beta · Open Source · ISM Final Product

The 7-Layer Pipeline

Each prompt passes through all layers sequentially before a model is invoked.

1. 💬 User Interface Layer (Frontend)
   Web-based chat interface with real-time environmental metrics display
2. ✂️ Prompt Optimization Layer (GreenPromptsOptimizer)
   T5-based model compresses tokens 30–50% while preserving semantic meaning - hosted on Hugging Face
3. 🧠 Task Complexity Analyzer (Custom Model)
   Estimates complexity 0–100 via linguistic entropy, token length, and task classification (factual / reasoning / code / creative)
4. Sustainability-Aware Orchestrator (Core)
   Core engine: evaluates energy-accuracy tradeoffs and routes to the minimum-viable model tier. Integrates an RL policy for adaptive routing.
5. 🤖 Model Pool (Agnostic)
   Dynamic registry of Small / Medium / Large models. Model-agnostic - works with chatbots, code generation, NLP pipelines, and more.
6. 🌿 Energy & Carbon Estimation Module (Metrics)
   Calculates energy (FLOPs × token count) and CO₂ emissions (grid intensity × kWh) per inference. Uses CodeCarbon for benchmarking.
7. 📊 Feedback & Logging Layer (Analytics)
   Logs all decisions and metrics. Provides data for RL training, Pareto visualization, and cumulative sustainability reporting.
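The sequential flow can be sketched end to end. Every stage below is a trivial stand-in (the real layers are a fine-tuned T5 compressor, a trained classifier, and an RL policy), so names, thresholds, and logic here are illustrative assumptions, not the framework's actual API:

```python
# Minimal sketch of the 7-layer flow; all stage implementations are stand-ins.
def optimize_prompt(prompt):
    # Layer 2 stand-in: collapse redundant whitespace
    return " ".join(prompt.split())

def analyze_complexity(prompt):
    # Layer 3 stand-in: crude length-based 0-100 score
    return min(100, len(prompt.split()) * 5)

def orchestrate(score):
    # Layer 4 stand-in: static thresholds instead of the RL policy
    if score < 30:
        return "small"
    if score < 70:
        return "medium"
    return "large"

MODEL_POOL = {  # Layer 5 stand-in: registry of model callables
    "small":  lambda p: "[small] " + p,
    "medium": lambda p: "[medium] " + p,
    "large":  lambda p: "[large] " + p,
}

def handle_prompt(raw):
    optimized = optimize_prompt(raw)       # Layer 2
    score = analyze_complexity(optimized)  # Layer 3
    tier = orchestrate(score)              # Layer 4
    reply = MODEL_POOL[tier](optimized)    # Layer 5 (layers 6-7 omitted here)
    return tier, reply
```

The point of the sketch is the contract between layers: each stage consumes the previous stage's output, and the model is only invoked after optimization, scoring, and routing have all run.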

Component Specifications

✂️ Prompt Optimizer (GreenPromptsOptimizer)
  • Fine-tuned T5 encoder-decoder model trained on prompt compression pairs
  • Removes filler words, redundant context, and verbose phrasing
  • Preserves semantic intent - complexity score guards against over-compression
  • Achieves 30–50% token reduction across diverse prompt types
  • Hosted on Hugging Face Hub for API access
  • Input: raw user prompt → Output: optimized prompt + token delta
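The input/output contract can be illustrated with a tiny rule-based stand-in. This is not the fine-tuned T5 model - it only removes a hypothetical set of filler words to show the "optimized prompt + token delta" shape of the output:

```python
# Rule-based stand-in for GreenPromptsOptimizer (illustrative only; the real
# optimizer is a fine-tuned T5 seq2seq model hosted on Hugging Face).
FILLER_WORDS = {"please", "kindly", "basically", "actually", "really", "just"}

def optimize_prompt(prompt: str) -> dict:
    words = prompt.split()
    kept = [w for w in words if w.lower().strip(".,!?") not in FILLER_WORDS]
    return {
        "optimized": " ".join(kept),           # compressed prompt
        "token_delta": len(words) - len(kept), # tokens removed
    }
```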
🧠 Complexity Scorer (Custom Classifier)
  • Scores prompts 0–100 based on estimated reasoning demand
  • Features: token length, Shannon entropy, task type, syntactic depth
  • Task classes: factual, summarization, reasoning, coding, creative
  • Prevents over-routing simple queries to expensive models
  • Prevents under-routing complex prompts to models that can't handle them
  • Output feeds directly into the orchestration routing decision
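A toy version of the 0–100 score can combine two of the listed features, token length and Shannon entropy. The weights and normalization caps below are illustrative assumptions, not the trained classifier's parameters:

```python
import math
from collections import Counter

# Toy complexity score: half the points from prompt length, half from the
# Shannon entropy of its token distribution. Caps (200 tokens, 8 bits) are
# assumed for illustration.
def complexity_score(prompt: str) -> float:
    tokens = prompt.lower().split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    n = len(tokens)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    length_part = min(n / 200, 1.0) * 50        # up to 50 points for length
    entropy_part = min(entropy / 8, 1.0) * 50   # up to 50 points for diversity
    return round(length_part + entropy_part, 1)
```

A short, repetitive prompt scores near zero, while a longer prompt with diverse vocabulary scores higher - which is the behavior the router relies on.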
Orchestration Engine (Core Framework)
  • Multi-objective optimization: accuracy, latency, and carbon intensity
  • Three routing modes: Eco (minimize energy), Balanced (default), Performance
  • Carbon-aware: integrates real-time ERCOT grid carbon intensity data
  • Carbon budget enforcement: per-session CO₂ quota (default 5g)
  • Adaptive thresholds informed by empirical energy-performance benchmarking
  • Model-agnostic: integrates any model via standard API or local inference
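Mode-dependent routing can be sketched as threshold tables plus a carbon-aware adjustment. The threshold values and the "step down on a dirty grid" rule are assumptions for illustration, not the framework's empirically calibrated ones:

```python
# Sketch of mode-dependent tier selection with a carbon-aware override.
# Thresholds and the 400 gCO2/kWh cutoff are illustrative assumptions.
THRESHOLDS = {
    "eco":         (50, 85),   # route up only for clearly hard prompts
    "balanced":    (35, 70),
    "performance": (20, 55),   # route up eagerly
}

def route(score: float, mode: str = "balanced",
          grid_intensity: float = 200.0) -> str:
    low, high = THRESHOLDS[mode]
    tier = "small" if score < low else "medium" if score < high else "large"
    # Carbon-aware adjustment: on a dirty grid, step large down to medium
    # unless the user explicitly asked for performance.
    if grid_intensity > 400 and tier == "large" and mode != "performance":
        tier = "medium"
    return tier
```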
🔄 Reinforcement Learning Router (Adaptive)
  • Policy gradient agent learns optimal routing from historical data
  • State: complexity score, task type, grid intensity, session budget
  • Action: select model tier (small / medium / large)
  • Continuously improves as usage data accumulates
  • Replaces static threshold routing over time for better decisions
reward = accuracy − λ × energy
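Reading the reward as a linear accuracy-energy tradeoff, the shaping function is a one-liner. The weight λ is a tunable hyperparameter; the default below is an assumption, not a value specified by the framework:

```python
# Reward shaping sketch for the policy-gradient router, assuming the
# linear tradeoff reward = accuracy - lambda * energy.
def routing_reward(accuracy: float, energy_wh: float, lam: float = 0.5) -> float:
    # Higher accuracy raises the reward; energy spent lowers it, scaled by lam.
    return accuracy - lam * energy_wh
```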

GreenInfer vs Standard LLM APIs

| Feature                | GreenInfer                      | Standard API (e.g. ChatGPT) |
|------------------------|---------------------------------|-----------------------------|
| Energy transparency    | ✓ Per-inference metrics shown   | ✗ Hidden                    |
| Prompt optimization    | ✓ 30–50% token reduction        | ✗ None                      |
| Model routing          | ✓ Complexity-based tiering      | ✗ Always same model         |
| Carbon-aware routing   | ✓ Real-time grid data           | ✗ None                      |
| Carbon budget control  | ✓ Per-session quota             | ✗ None                      |
| Open source            | ✓ Framework is open             | ✗ Closed                    |
| Reinforcement learning | ✓ Adaptive routing policy       | ✗ Static                    |
| Model-agnostic         | ✓ Any LLM, any task type        | ✗ Vendor-locked             |

How we estimate emissions

Energy is estimated using a combination of token-based proxy metrics and runtime GPU utilization measurements. CO₂ is calculated using real-time grid carbon intensity.

  • Energy ≈ FLOPs × token_count × model_efficiency_factor
  • CO₂ = Energy(kWh) × grid_intensity(gCO₂/kWh)
  • ERCOT average: ~200 gCO₂/kWh (varies by time of day)
  • CodeCarbon library used for empirical GPU benchmarking
energy_estimator.py (Framework Module)
# Energy estimation module
ENERGY_COEFFICIENTS = {   # Wh per token, from empirical benchmarking
    'small':  4e-5,
    'medium': 1.8e-4,
    'large':  9.5e-4,
}

def estimate_energy(tokens, model_tier, grid_intensity):
    """
    Estimate energy and CO2 for an inference call.
    tokens: optimized token count
    model_tier: 'small' | 'medium' | 'large'
    grid_intensity: gCO2/kWh (from ERCOT API)
    """
    energy_wh = tokens * ENERGY_COEFFICIENTS[model_tier]
    energy_kwh = energy_wh / 1000
    co2_grams = energy_kwh * grid_intensity

    # Fraction of energy saved relative to always using the large tier.
    large_wh = tokens * ENERGY_COEFFICIENTS['large']
    savings_vs_large = 1 - energy_wh / large_wh

    return {
        'energy_wh': energy_wh,
        'co2_grams': co2_grams,
        'savings_vs_large': savings_vs_large,
    }
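A worked numeric check of the formulas, using the small-tier coefficient and the ~200 gCO₂/kWh ERCOT average cited above (the prompt size of 500 tokens is an arbitrary example):

```python
# Worked example: 500 optimized tokens on the small tier at 200 gCO2/kWh.
WH_PER_TOKEN_SMALL = 4e-5          # coefficient from the module above
tokens, grid_intensity = 500, 200

energy_wh = tokens * WH_PER_TOKEN_SMALL        # 500 * 4e-5 = 0.02 Wh
co2_grams = (energy_wh / 1000) * grid_intensity  # 2e-5 kWh * 200 = 0.004 g
```

At these magnitudes a single small-tier call consumes thousandths of a gram of CO₂, which is why the default 5 g session budget spans many interactions.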