# Green Orchestration Framework
A 7-layer pipeline that intelligently routes AI inference requests to the most energy-efficient model capable of answering accurately. Built with Python, deployed to Hugging Face Spaces, and integrated with Groq's API.
Most AI applications send every query to the largest, most capable model available. This is like driving a semi-truck to pick up a coffee — it works, but it wastes enormous energy. GreenInfer's Green Orchestration Framework (GOF) applies a simple insight from computer science: match resource allocation to task requirements.
The framework is not an AI model itself. It is a routing and optimization layer — analogous to LangChain but focused on sustainability rather than chaining. It sits between the user and the model pool, analyzing each prompt before any expensive inference runs.
The key original contribution is the Smart Preview system: for complex queries, the framework generates a short 2-sentence summary and bullet outline using the small model first. The user confirms before the full expensive response runs, preventing wasted large-model reruns when users want to refine their question.
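The preview flow described above can be sketched as follows. Everything here (`estimate_complexity`, the `confirm` callback, and the model callables) is an illustrative stub, not the actual GreenInfer API:

```python
# Sketch of the Smart Preview flow. All names are illustrative stubs,
# not the real GreenInfer API.

def estimate_complexity(prompt: str) -> int:
    # Toy proxy: longer prompts score higher. The real framework uses
    # a richer multi-signal analysis to produce its 0-100 score.
    return min(100, len(prompt.split()) * 5)

def smart_preview(prompt, confirm, small_model, large_model, threshold=65):
    """Route cheap queries directly; preview expensive ones before committing."""
    if estimate_complexity(prompt) < threshold:
        return small_model(prompt)               # cheap path, no preview needed
    # Complex query: generate a short summary + outline with the small model
    outline = small_model(f"2-sentence summary plus bullet outline for: {prompt}")
    if confirm(outline):                         # user approves the plan
        return large_model(prompt)               # run the full expensive inference once
    return None                                  # user refines the prompt instead
```

This is what prevents wasted large-model reruns: the expensive call only fires after the user confirms the cheap outline matches their intent.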
Hosted backend: https://sirenice-greeninfer-backend.hf.space

Example response:

```json
{
  "response": "Photosynthesis is the process...",
  "model_tier": "small",
  "model_name": "llama-3.2-1b-preview",
  "complexity_score_100": 18,
  "complexity_label": "Low",
  "original_tokens": 12,
  "tokens_saved": 4,
  "reduction_pct": 33,
  "energy_mwh": 0.87,
  "co2_grams": 0.000172,
  "energy_saved_pct": 98,
  "cascade_path": ["small"],
  "escalations": 0,
  "optimizer_used": true
}
```
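If the baseline is taken to be an always-large deployment (48.0 mWh/query, per the model configuration later in this README), the `energy_saved_pct` field in the example above is reproducible with simple arithmetic; this is an assumed interpretation of the field, not documented behavior:

```python
# Reproducing energy_saved_pct from the example response, assuming the
# baseline is "every query goes to the large model" (48.0 mWh/query).
baseline_mwh = 48.0       # large-model cost per query
actual_mwh = 0.87         # energy_mwh reported for this small-tier query

energy_saved_pct = round((1 - actual_mwh / baseline_mwh) * 100)
print(energy_saved_pct)   # 98, matching the JSON field above
```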
```bash
git clone https://github.com/srineshtor21-coder/GreenInfer
cd GreenInfer
pip install -r requirements.txt
export GROQ_API_KEY="your_key_here"
```
```python
from greeninfer import GreenInfer

# Initialize (auto-loads all layers)
gi = GreenInfer()

# Basic chat
result = gi.chat("Explain quantum entanglement simply")
print(result.response)      # The answer
print(result.model_tier)    # "small" | "medium" | "large"
print(result.energy_mwh)    # e.g. 0.87
print(result.co2_grams)     # e.g. 0.000172
print(result.tokens_saved)  # e.g. 4
print(result.cascade_path)  # e.g. ["small"]

# Eco mode (maximum energy savings)
result = gi.chat("Write a sorting algorithm", mode="eco")

# Performance mode (accuracy priority)
result = gi.chat("Analyze this legal document...", mode="performance")
```
```python
GROQ_MODELS = {
    "small": "llama-3.2-1b-preview",           # 0.9 mWh/query — 55% of traffic
    "medium": "llama-3.1-8b-instant",          # 3.8 mWh/query — 30% of traffic
    "large": "llama-3.3-70b-versatile",        # 48.0 mWh/query — 15% of traffic
    "vision": "llama-3.2-11b-vision-preview",  # image inputs
}
```
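Under the stated traffic split, the expected fleet-wide energy per query (and the savings versus always using the large model) follows directly from the figures above; this is arithmetic on the configuration, not a measured benchmark:

```python
# Expected fleet-wide energy under the stated traffic split, versus an
# always-large baseline (figures taken from the model comments above).
cost_mwh = {"small": 0.9, "medium": 3.8, "large": 48.0}
traffic = {"small": 0.55, "medium": 0.30, "large": 0.15}

avg = sum(cost_mwh[t] * traffic[t] for t in traffic)  # ~8.84 mWh/query
savings_pct = (1 - avg / cost_mwh["large"]) * 100     # ~82% vs always-large
```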
```python
# Routing thresholds (balanced mode); complexity is the 0-100 score
ROUTE = {
    "small": complexity < 30,
    "medium": 30 <= complexity < 65,
    "large": complexity >= 65,
}
```
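A minimal tier lookup implementing these thresholds might look like this; the real router layers more signals on top, and this sketch covers only the final threshold check:

```python
# Map a 0-100 complexity score to a model tier using the balanced-mode
# thresholds above. Boundary values follow the original table:
# 30 is medium, 65 is large.
def route(complexity: int) -> str:
    if complexity < 30:
        return "small"
    if complexity < 65:
        return "medium"
    return "large"
```

For example, the JSON response shown earlier (`complexity_score_100: 18`) falls in the small tier, consistent with its `"model_tier": "small"`.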
| Feature | GreenInfer | LangChain | FrugalGPT | Default API |
|---|---|---|---|---|
| Energy-aware routing | ✓ | ✗ | ✓ | ✗ |
| Carbon grid integration | ✓ | ✗ | ✗ | ✗ |
| Smart Preview (confirm before gen) | ✓ | ✗ | ✗ | ✗ |
| Token-level energy metrics | ✓ | ✗ | Partial | ✗ |
| Prompt optimizer | ✓ | ✗ | ✗ | ✗ |
| Cascade engine | ✓ | ✗ | ✓ | ✗ |
| Per-response Green Score | ✓ | ✗ | ✗ | ✗ |
| Open source | ✓ | ✓ | ✓ | ✗ |
These numbers come from running the full benchmark suite included in the repository. The 20-prompt test set covers four prompt categories: simple factual, explanation, analysis, and code generation.

Note: the 55% routing accuracy reflects the difficulty of exact-tier matching. The more relevant sustainability metric is energy savings: even a misrouted query (e.g. a medium-complexity prompt sent to the small model) still saves substantial energy while usually producing an acceptable answer.
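To make the misrouting point concrete, using the per-query figures from the model configuration above: a medium-complexity query sent to the small model draws 0.9 mWh instead of 3.8 mWh.

```python
# Energy impact of a "misroute" downward: a medium-complexity query
# handled by the small model (figures from the model table above).
medium_mwh = 3.8   # correct-tier cost
small_mwh = 0.9    # misrouted (smaller) tier cost

saved_pct = (1 - small_mwh / medium_mwh) * 100  # ~76% less energy
```

So the failure mode of under-routing costs answer quality at worst, not energy.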