AI that thinks about the planet

GreenInfer routes every prompt to the most energy-efficient model that can answer it accurately, cutting AI compute energy by up to 97% versus always using the largest model.

97% · Max Energy Saved
73% · Avg Reduction
98.9% · Classifier Accuracy
183g · CO₂ Avoided
Pipeline

How GreenInfer works

Every prompt passes through a 7-layer orchestration pipeline that scores complexity, optimizes tokens, and routes to the right model tier before a single GPU cycle runs.

01 📝 Prompt In: User sends query
02 🔬 Complexity Score: 0 to 100 via DistilBERT
03 Prompt Optimize: T5 removes filler tokens
04 🌎 Carbon Route: ERCOT grid check
05 🎯 Model Route: Small / Medium / Large
06 Answer: With energy metrics
Small Tier (Eco) · 0.9 mWh · Llama 3.2 1B · Simple queries · 55% of traffic
Medium Tier (Balanced) · 3.8 mWh · Llama 3.1 8B · Reasoning · 30% of traffic
Large Tier (Heavy) · 48.0 mWh · Llama 3.3 70B · Complex / Code · 15% of traffic
Routing examples
"What is the capital of France?" → Small · 0.9 mWh · −98%
"Explain transformer attention mechanisms" → Medium · 3.8 mWh · −92%
"Build a REST API with JWT authentication in Python" → Large · 48 mWh · baseline
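The routing examples above can be sketched as a simple score-to-tier lookup. The cutoff values (40 and 75) are illustrative assumptions; only the per-tier energy figures come from the tier table above.

```python
# Minimal sketch of score-to-tier routing. The thresholds are assumed;
# the per-tier mWh figures match the published tier table.
TIER_ENERGY_MWH = {"small": 0.9, "medium": 3.8, "large": 48.0}
LARGE_BASELINE_MWH = TIER_ENERGY_MWH["large"]

def route(complexity_score: int) -> str:
    """Map a 0-100 complexity score to a model tier (cutoffs assumed)."""
    if complexity_score < 40:
        return "small"
    if complexity_score < 75:
        return "medium"
    return "large"

def savings_vs_large(tier: str) -> float:
    """Percent energy saved versus always running the large model."""
    return round(100 * (1 - TIER_ENERGY_MWH[tier] / LARGE_BASELINE_MWH), 1)

tier = route(12)  # e.g. "What is the capital of France?"
print(tier, savings_vs_large(tier))  # small 98.1
```

With these numbers, a small-tier answer saves 98.1% and a medium-tier answer 92.1% versus the large model, matching the −98% and −92% shown in the examples.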
Features

Built for sustainable inference

Every layer reduces unnecessary compute while preserving answer quality.

🔬
Complexity Scoring
A DistilBERT classifier trained on 600 labeled examples rates every prompt from 0 to 100, reaching 98.9% validation accuracy across 4 tiers.
T5 Prompt Optimizer
Silently rewrites prompts to remove filler words before inference, cutting input tokens by an average of 35% with no quality loss.
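The real optimizer is a learned T5 rewriter; a crude regex-based stand-in shows the idea and how a token-reduction percentage can be measured. The filler list and function names here are illustrative assumptions, not the production model.

```python
import re

# Stand-in for the T5 prompt optimizer: strip common filler phrases.
# The phrase list is a made-up illustration; the real system uses a
# learned seq2seq rewriter, not regexes.
FILLERS = [
    r"\bplease\b", r"\bcould you\b", r"\bkindly\b",
    r"\bi was wondering if\b", r"\bfor me\b",
]

def trim(prompt: str) -> str:
    out = prompt
    for pat in FILLERS:
        out = re.sub(pat, "", out, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", out).strip()

def token_reduction(before: str, after: str) -> float:
    """Whitespace-token reduction, as a percent of the original."""
    b, a = len(before.split()), len(after.split())
    return round(100 * (b - a) / b, 1)

p = "Could you please explain photosynthesis for me"
q = trim(p)
print(q)  # "explain photosynthesis"
print(token_reduction(p, q))
```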
🌎
Carbon-Aware Routing
Uses hourly ERCOT grid intensity estimates to defer expensive queries when the grid is running dirty, reducing CO₂ further.
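A minimal sketch of the deferral decision, assuming a fixed dirtiness threshold. The 250 g/kWh cutoff and the sample intensities are made-up illustrations; the real system pulls hourly ERCOT grid-intensity estimates.

```python
# Sketch of carbon-aware deferral: only expensive (large-tier) queries
# are deferred when grid intensity is high. Threshold value is assumed.
DIRTY_THRESHOLD_G_PER_KWH = 250

def should_defer(tier: str, grid_intensity_g_per_kwh: float) -> bool:
    """Defer large-tier queries when the grid is running dirty."""
    return tier == "large" and grid_intensity_g_per_kwh > DIRTY_THRESHOLD_G_PER_KWH

print(should_defer("large", 410))  # True: wait for a cleaner hour
print(should_defer("small", 410))  # False: cheap queries always run
```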
🤺
Cascade Engine
Starts with the smallest model and only escalates to medium or large if confidence is below threshold, inspired by FrugalGPT.
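The escalation loop can be sketched as below. The `fake_model` stub, its confidence values, and the 0.8 cutoff are illustrative assumptions standing in for real model calls.

```python
# FrugalGPT-style cascade sketch: try the cheapest tier first, escalate
# only when self-reported confidence is below threshold.
CONFIDENCE_THRESHOLD = 0.8
TIERS = ["small", "medium", "large"]

def fake_model(tier: str, prompt: str) -> tuple[str, float]:
    # Stand-in for a real model call: fixed per-tier confidences.
    conf = {"small": 0.4, "medium": 0.9, "large": 0.99}[tier]
    return f"{tier} answer", conf

def cascade(prompt: str) -> str:
    for tier in TIERS:
        answer, conf = fake_model(tier, prompt)
        if conf >= CONFIDENCE_THRESHOLD or tier == TIERS[-1]:
            return tier  # first tier confident enough (or last resort)

print(cascade("Explain transformer attention"))  # medium
```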
💡
Smart Preview Mode
For complex queries, shows a summary and outline first. User confirms before the full expensive response runs, preventing wasted reruns.
📊
Per-Response Metrics
Every answer shows energy used in mWh, CO₂ emitted, tokens saved, and a Green Efficiency Score 0 to 100 with an improvement tip.
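The energy-to-CO₂ conversion behind these metrics is a unit conversion against grid intensity. The 198 gCO₂/kWh figure below is an assumption inferred from the sample output in the code example further down (0.9 mWh → 0.000178 g); real intensity varies hour by hour.

```python
# Per-response CO2 metric sketch. Grid intensity is an assumed constant
# here, back-calculated from the sample figures; the live system uses
# hourly estimates.
GRID_INTENSITY_G_PER_KWH = 198

def co2_grams(energy_mwh: float) -> float:
    kwh = energy_mwh / 1_000_000  # mWh -> kWh
    return round(kwh * GRID_INTENSITY_G_PER_KWH, 6)

print(co2_grams(0.9))   # 0.000178
print(co2_grams(48.0))  # large-tier answer
```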
Benchmark Results

Real numbers from real experiments

All figures come from running the benchmark suite on 20 prompts across complexity tiers. Source: experiments/benchmark.py

97% · Energy saved (small-tier queries) · vs always using Llama 3.3 70B
73% · Average energy reduction · mixed workload, all query types
98.9% · Classifier accuracy · DistilBERT, 600 training examples
183g · CO₂ avoided · 20-prompt benchmark test set
55% · Routing accuracy · matching human-labeled tier
35% · Avg token reduction · via T5 prompt optimizer
Python Framework

Drop-in green inference

Import the package and get energy-aware responses in three lines.

Python — greeninfer package
from greeninfer import GreenInfer

# Auto-routes to the right model tier
gi = GreenInfer()
result = gi.chat("Explain photosynthesis")

print(result.response)    # "Photosynthesis is the process by which..."
print(result.energy_mwh)  # 0.9 (small model, 98% less than large)
print(result.model_tier)  # "small"
print(result.co2_grams)   # 0.000178

# Switch modes
result = gi.chat("Write a sorting algorithm", mode="eco")
result = gi.chat("Analyze this legal contract...", mode="performance")
Full Framework Docs

Ready to chat green?

Every message shows exactly how much energy it used and how much was saved by not using the largest model.