AI that thinks about the planet

GreenInfer routes every prompt to the most energy-efficient model that can answer it accurately, cutting AI compute energy by up to 97% versus always using the largest model.

97% · Max Energy Saved
73% · Avg Reduction
98.9% · Classifier Accuracy
183g · CO₂ Avoided
Pipeline

How GreenInfer works

Every prompt passes through a 7-layer orchestration pipeline that scores complexity, optimizes tokens, and routes to the right model tier before a single GPU cycle runs.

01 📝 Prompt In: User sends query
02 🔬 Complexity Score: 0 to 100 via DistilBERT
03 Prompt Optimize: T5 removes filler tokens
04 🌎 Carbon Route: ERCOT grid check
05 🎯 Model Route: Small / Medium / Large
06 Answer: With energy metrics
Small Tier (Eco) · 0.9 mWh · Llama 3.2 1B · Simple queries · 55% of traffic
Medium Tier (Balanced) · 3.8 mWh · Llama 3.1 8B · Reasoning · 30% of traffic
Large Tier (Heavy) · 48.0 mWh · Llama 3.3 70B · Complex / Code · 15% of traffic
Routing examples
"What is the capital of France?" → Small · 0.9 mWh · −98%
"Explain transformer attention mechanisms" → Medium · 3.8 mWh · −92%
"Build a REST API with JWT authentication in Python" → Large · 48 mWh · baseline
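The routing examples above can be sketched as a simple score-to-tier lookup. The cutoff values (40 and 75) are illustrative assumptions; only the per-tier energy figures come from the tier table above.

```python
# Minimal sketch of score-to-tier routing. The thresholds are assumed;
# the per-tier mWh figures match the published tier table.
TIER_ENERGY_MWH = {"small": 0.9, "medium": 3.8, "large": 48.0}
LARGE_BASELINE_MWH = TIER_ENERGY_MWH["large"]

def route(complexity_score: int) -> str:
    """Map a 0-100 complexity score to a model tier (cutoffs assumed)."""
    if complexity_score < 40:
        return "small"
    if complexity_score < 75:
        return "medium"
    return "large"

def savings_vs_large(tier: str) -> float:
    """Percent energy saved versus always running the large model."""
    return round(100 * (1 - TIER_ENERGY_MWH[tier] / LARGE_BASELINE_MWH), 1)

tier = route(12)  # e.g. "What is the capital of France?"
print(tier, savings_vs_large(tier))  # small 98.1
```

With these numbers, a small-tier answer saves 98.1% and a medium-tier answer 92.1% versus the large model, matching the −98% and −92% shown in the examples.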
Features

Built for sustainable inference

Every layer reduces unnecessary compute while preserving answer quality.

🔬
Complexity Scoring
A DistilBERT classifier trained on 600 labeled examples rates every prompt from 0 to 100, reaching 98.9% validation accuracy across 4 tiers.
T5 Prompt Optimizer
Silently rewrites prompts to remove filler words before inference, cutting input tokens by an average of 35% with no quality loss.
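The real optimizer is a learned T5 rewriter; a crude regex-based stand-in shows the idea and how a token-reduction percentage can be measured. The filler list and function names here are illustrative assumptions, not the production model.

```python
import re

# Stand-in for the T5 prompt optimizer: strip common filler phrases.
# The phrase list is a made-up illustration; the real system uses a
# learned seq2seq rewriter, not regexes.
FILLERS = [
    r"\bplease\b", r"\bcould you\b", r"\bkindly\b",
    r"\bi was wondering if\b", r"\bfor me\b",
]

def trim(prompt: str) -> str:
    out = prompt
    for pat in FILLERS:
        out = re.sub(pat, "", out, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", out).strip()

def token_reduction(before: str, after: str) -> float:
    """Whitespace-token reduction, as a percent of the original."""
    b, a = len(before.split()), len(after.split())
    return round(100 * (b - a) / b, 1)

p = "Could you please explain photosynthesis for me"
q = trim(p)
print(q)  # "explain photosynthesis"
print(token_reduction(p, q))
```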
🌎
Carbon-Aware Routing
Uses hourly ERCOT grid intensity estimates to defer expensive queries when the grid is running dirty, reducing CO₂ further.
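A minimal sketch of the deferral decision, assuming a fixed dirtiness threshold. The 250 g/kWh cutoff and the sample intensities are made-up illustrations; the real system pulls hourly ERCOT grid-intensity estimates.

```python
# Sketch of carbon-aware deferral: only expensive (large-tier) queries
# are deferred when grid intensity is high. Threshold value is assumed.
DIRTY_THRESHOLD_G_PER_KWH = 250

def should_defer(tier: str, grid_intensity_g_per_kwh: float) -> bool:
    """Defer large-tier queries when the grid is running dirty."""
    return tier == "large" and grid_intensity_g_per_kwh > DIRTY_THRESHOLD_G_PER_KWH

print(should_defer("large", 410))  # True: wait for a cleaner hour
print(should_defer("small", 410))  # False: cheap queries always run
```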
🤺
Cascade Engine
Starts with the smallest model and only escalates to medium or large if confidence is below threshold, inspired by FrugalGPT.
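The escalation loop can be sketched as below. The `fake_model` stub, its confidence values, and the 0.8 cutoff are illustrative assumptions standing in for real model calls.

```python
# FrugalGPT-style cascade sketch: try the cheapest tier first, escalate
# only when self-reported confidence is below threshold.
CONFIDENCE_THRESHOLD = 0.8
TIERS = ["small", "medium", "large"]

def fake_model(tier: str, prompt: str) -> tuple[str, float]:
    # Stand-in for a real model call: fixed per-tier confidences.
    conf = {"small": 0.4, "medium": 0.9, "large": 0.99}[tier]
    return f"{tier} answer", conf

def cascade(prompt: str) -> str:
    for tier in TIERS:
        answer, conf = fake_model(tier, prompt)
        if conf >= CONFIDENCE_THRESHOLD or tier == TIERS[-1]:
            return tier  # first tier confident enough (or last resort)

print(cascade("Explain transformer attention"))  # medium
```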
💡
Smart Preview Mode
For complex queries, shows a summary and outline first. User confirms before the full expensive response runs, preventing wasted reruns.
📊
Per-Response Metrics
Every answer shows energy used in mWh, CO₂ emitted, tokens saved, and a Green Efficiency Score 0 to 100 with an improvement tip.
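The energy-to-CO₂ conversion behind these metrics is a unit conversion against grid intensity. The 198 gCO₂/kWh figure below is an assumption inferred from the sample output in the code example further down (0.9 mWh → 0.000178 g); real intensity varies hour by hour.

```python
# Per-response CO2 metric sketch. Grid intensity is an assumed constant
# here, back-calculated from the sample figures; the live system uses
# hourly estimates.
GRID_INTENSITY_G_PER_KWH = 198

def co2_grams(energy_mwh: float) -> float:
    kwh = energy_mwh / 1_000_000  # mWh -> kWh
    return round(kwh * GRID_INTENSITY_G_PER_KWH, 6)

print(co2_grams(0.9))   # 0.000178
print(co2_grams(48.0))  # large-tier answer
```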
Benchmark Results

Real numbers from real experiments

All figures come from running the benchmark suite on 20 prompts across complexity tiers. Source: experiments/benchmark.py

97% · Energy saved (small-tier queries) · vs always using Llama 3.3 70B
73% · Average energy reduction · mixed workload, all query types
98.9% · Classifier accuracy · DistilBERT, 600 training examples
183g · CO₂ avoided · 20-prompt benchmark test set
55% · Routing accuracy · matching human-labeled tier
35% · Avg token reduction · via T5 prompt optimizer
Python Framework

Drop-in green inference

Import the package and get energy-aware responses in three lines.

Python — greeninfer package
from greeninfer import GreenInfer

# Auto-routes to the right model tier
gi = GreenInfer()
result = gi.chat("Explain photosynthesis")

print(result.response)    # "Photosynthesis is the process by which..."
print(result.energy_mwh)  # 0.9 (small model, 98% less than large)
print(result.model_tier)  # "small"
print(result.co2_grams)   # 0.000178

# Switch modes
result = gi.chat("Write a sorting algorithm", mode="eco")
result = gi.chat("Analyze this legal contract...", mode="performance")
Full Framework Docs

Ready to chat green?

Every message shows exactly how much energy it used and how much was saved by not using the largest model.