A multi-stage pipeline that combines prompt optimization, complexity analysis, and carbon-aware routing to minimize the energy cost of LLM inference without sacrificing response quality.
Each prompt passes through all layers sequentially before a model is invoked.
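The sequential layer traversal can be sketched as a simple stage chain. This is an illustrative sketch only: the stage names follow the layers listed above, but their bodies (`optimize_prompt`, `analyze_complexity`, `route_by_carbon`) and the request dictionary shape are hypothetical placeholders, not the project's actual interfaces.

```python
from typing import Callable, List

# A stage takes a request dict and returns an updated request dict.
Stage = Callable[[dict], dict]

def run_pipeline(request: dict, stages: List[Stage]) -> dict:
    """Pass the request through every stage, in order, before model invocation."""
    for stage in stages:
        request = stage(request)
    return request

# Placeholder stages (logic and thresholds assumed for illustration).
def optimize_prompt(req: dict) -> dict:
    # Pretend prompt optimization trims ~20% of tokens.
    req["tokens"] = int(req["tokens"] * 0.8)
    return req

def analyze_complexity(req: dict) -> dict:
    # Pick a model tier from a crude token-count proxy.
    req["tier"] = "small" if req["tokens"] < 200 else "large"
    return req

def route_by_carbon(req: dict) -> dict:
    # Attach current grid carbon intensity (stubbed constant here).
    req["grid_intensity"] = 350  # gCO2/kWh
    return req

result = run_pipeline(
    {"tokens": 250},
    [optimize_prompt, analyze_complexity, route_by_carbon],
)
```

Keeping stages as plain callables makes it easy to reorder them or drop one in for A/B testing without touching the dispatch loop.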
Energy is estimated using a combination of token-based proxy metrics and runtime GPU utilization measurements. CO₂ is calculated using real-time grid carbon intensity.
```python
# Energy estimation module

ENERGY_COEFFICIENTS = {
    'small': 4e-5,    # Wh per token
    'medium': 1.8e-4,
    'large': 9.5e-4,
}

def calc_savings(tokens, model_tier):
    """Energy saved (Wh) versus running the same tokens on the large tier."""
    large_wh = tokens * ENERGY_COEFFICIENTS['large']
    actual_wh = tokens * ENERGY_COEFFICIENTS[model_tier]
    return large_wh - actual_wh

def estimate_energy(tokens, model_tier, grid_intensity):
    """
    Estimate energy and CO2 for an inference call.

    tokens: optimized token count
    model_tier: 'small' | 'medium' | 'large'
    grid_intensity: gCO2/kWh (from ERCOT API)
    """
    energy_wh = tokens * ENERGY_COEFFICIENTS[model_tier]
    energy_kwh = energy_wh / 1000
    co2_grams = energy_kwh * grid_intensity
    return {
        'energy_wh': energy_wh,
        'co2_grams': co2_grams,
        'savings_vs_large': calc_savings(tokens, model_tier),
    }
```
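The arithmetic behind the estimate can be checked by hand. The sketch below reuses the `'small'` coefficient from the table above; the token count and the 400 gCO2/kWh grid intensity are illustrative values, not measurements.

```python
# Worked example: 120 tokens on the 'small' tier at 400 gCO2/kWh.
tokens = 120
wh_per_token = 4e-5       # 'small' tier coefficient from the table above
grid_intensity = 400      # gCO2/kWh, illustrative value

energy_wh = tokens * wh_per_token        # 120 * 4e-5 = 0.0048 Wh
energy_kwh = energy_wh / 1000            # 4.8e-6 kWh
co2_grams = energy_kwh * grid_intensity  # ~0.00192 g CO2
```

At these magnitudes, the grid-intensity factor dominates: the same call at a low-carbon hour (say 100 gCO2/kWh) emits a quarter as much CO₂ for identical energy use, which is what makes carbon-aware routing worthwhile.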