About
GreenInfer

A research-backed, open-source Green Orchestration Framework for sustainable AI inference. Built as an ISM Final Product to show that AI efficiency and environmental responsibility can go hand in hand.

ISM Final Product  ·  Frisco ISD  ·  Green AI Research

AI's hidden energy cost

Training and running large language models consumes enormous amounts of electricity. A single ChatGPT query uses roughly 10x the energy of a Google search, and the gap keeps growing as models get larger and usage scales globally.

The core inefficiency is that every query, whether "hi" or "design a distributed system," gets routed to the same massive model. There is no intelligence in the routing, no awareness of energy cost, and no attempt at optimization.

GreenInfer tackles this at the infrastructure level, before a single GPU cycle is burned on inference.

70%+ savings available today

Research shows that 60 to 70 percent of real-world AI queries are simple enough for small, efficient models to handle, provided the routing decision is made correctly. Papers like FrugalGPT and the cascading-LLM literature back this up with empirical results.
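The cascade idea from that literature can be sketched in a few lines of Python. Everything here is illustrative: the tier names, the stub models, and the 0.8 confidence gate are assumptions for the sketch, not GreenInfer's actual values.

```python
# Minimal sketch of FrugalGPT-style cascade routing.
# Model calls are stand-in stubs; a real deployment would call actual
# LLM endpoints and derive confidence from log-probabilities.

def cascade_answer(prompt, tiers, threshold=0.8):
    """Try each model tier in order of size; escalate only when the
    smaller model's confidence falls below the gate threshold."""
    for name, model in tiers[:-1]:
        answer, confidence = model(prompt)
        if confidence >= threshold:
            return name, answer
    # The largest tier is the fallback and always answers.
    name, model = tiers[-1]
    answer, _ = model(prompt)
    return name, answer

# Stub tiers returning (answer, confidence); the small model is only
# "confident" on short prompts, forcing escalation on longer ones.
tiers = [
    ("small",  lambda p: ("short answer", 0.9 if len(p.split()) < 8 else 0.3)),
    ("medium", lambda p: ("fuller answer", 0.7)),
    ("large",  lambda p: ("detailed answer", 1.0)),
]

print(cascade_answer("hi", tiers))
print(cascade_answer("design a fault tolerant distributed system for streaming logs", tiers))
```

Simple prompts stop at the small tier and never touch the large model; only queries the cheaper tiers cannot answer confidently pay the full energy cost.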

Combining smart routing with prompt optimization and carbon-aware scheduling creates a compounding effect where savings from each layer stack together.
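A back-of-the-envelope calculation shows why the layers compound multiplicatively rather than additively. The per-layer fractions below are illustrative placeholders, not measured results from GreenInfer.

```python
# Sketch of how per-layer energy savings compound: each layer removes
# a fraction of whatever energy the previous layers left behind.
# The fractions are illustrative, not measured results.
layers = {
    "smart routing":       0.50,  # fraction of energy saved by routing
    "prompt compression":  0.20,
    "carbon-aware timing": 0.10,
}

remaining = 1.0
for name, saved in layers.items():
    remaining *= (1.0 - saved)

print(f"energy remaining: {remaining:.2f}")  # 0.5 * 0.8 * 0.9 = 0.36
print(f"total savings:    {1 - remaining:.0%}")  # 64%
```

Note the stacked total (64% here) is less than the sum of the individual fractions (80%), because later layers only act on the energy the earlier layers did not already save.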

The models, tools, and APIs exist today. GreenInfer integrates them into a unified, developer-friendly framework.

Build roadmap

Component  ·  Description  ·  Status
GreenPromptsOptimizer  ·  T5-based prompt compression model, fine-tuned on 127+ prompt pairs, hosted on Hugging Face  ·  ✓ COMPLETE
Complexity Scorer  ·  Rule-based linguistic scorer using Shannon entropy, token length, and task classification signals  ·  ✓ COMPLETE
Complexity Classifier  ·  DistilBERT fine-tuned on 600 labeled prompts achieving 98.9% accuracy on the validation set  ·  ✓ COMPLETE
Orchestration Engine  ·  Core routing logic combining complexity, energy estimates, mode, and carbon budget into model selection  ·  ✓ COMPLETE
Cascade Inference  ·  Small to medium to large escalation with confidence-based gating between tiers  ·  ✓ COMPLETE
Energy Estimator  ·  Token-based proxy estimation with CodeCarbon integration for local GPU measurement  ·  ✓ COMPLETE
Website  ·  Full multi-page site: landing, chat UI, framework docs, impact dashboard, about  ·  ✓ COMPLETE
HF Space Backend  ·  FastAPI server with CORS, optimizer loading, Groq cascade, and carbon metrics endpoint  ·  ✓ LIVE
Carbon-Aware Routing  ·  ERCOT real-time grid carbon intensity integration for dynamic mode adjustment  ·  ⌛ IN PROGRESS
User Accounts  ·  Sign up, login, and persistent chat history per user  ·  ⌛ IN PROGRESS
RL Router  ·  Policy gradient agent learning optimal routing from historical accuracy and energy logs  ·  → PLANNED
Open Source Release  ·  Framework published to PyPI with full documentation for developer adoption  ·  → FINAL
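As one example from the roadmap above, the rule-based complexity scorer's entropy-plus-length idea can be sketched as follows. The equal weights, the normalization constants, and the whitespace tokenizer are illustrative assumptions for this sketch, not the real scorer's configuration.

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    """Shannon entropy (bits) of the token distribution in a prompt."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def complexity_score(prompt, max_len=64, max_entropy=6.0):
    """Combine a length signal and an entropy signal into one score
    in [0, 1]; the 0.5/0.5 weights and caps are illustrative."""
    tokens = prompt.lower().split()
    length_signal = min(len(tokens) / max_len, 1.0)
    entropy_signal = min(shannon_entropy(tokens) / max_entropy, 1.0)
    return 0.5 * length_signal + 0.5 * entropy_signal

print(complexity_score("hi"))
print(complexity_score("design a fault tolerant distributed key-value store"))
```

A greeting scores near zero while a varied, longer prompt scores much higher, which is exactly the signal a router needs to pick a model tier.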

Researcher & Mentor

🧑‍💻
Srinesh Toranala
Student Researcher  ·  ISM Program, Frisco ISD

Frisco ISD student building GreenInfer as an Independent Study and Mentorship final project. The project brings together a year of research in AI efficiency, energy systems, and sustainable computing. Srinesh previously built GreenPromptsOptimizer, a T5-based prompt compression model that now forms the first layer of the GreenInfer pipeline. GreenInfer represents a full-stack research effort spanning model training, framework engineering, backend deployment, and public product launch, built with the goal of making green AI genuinely accessible to developers everywhere.

Python  ·  PyTorch  ·  Transformers  ·  FastAPI  ·  Hugging Face  ·  Green AI
🎓
Marta Adamska
PhD Candidate  ·  University of Lancaster  ·  Mentor

Marta Adamska is a PhD candidate at the University of Lancaster whose research sits at the intersection of AI systems, sustainability, and computational efficiency. Her expertise in sustainable computing has been instrumental to this project: she provided the research direction, key papers, and guidance that shaped GreenInfer from an early idea into a working framework. I am grateful for her continued mentorship and for her pushing the technical depth of this work throughout the research process.

Skills demonstrated

🧠
AI Research
Literature review spanning energy-efficient inference, model cascading, LLM serving, and sustainable agent design
💻
ML Engineering
Fine-tuning T5 and DistilBERT, building complexity classifiers, and writing production-grade inference pipelines in PyTorch
Systems Design
Multi-layer pipeline balancing accuracy, latency, and real-time carbon intensity with cascade and budget enforcement
🌐
Full-Stack Dev
HTML/CSS/JS frontend, FastAPI backend on Hugging Face Spaces, and Groq API integration for real AI responses
📊
Data Analysis
Empirical benchmarking, Pareto frontier analysis, energy comparison experiments, and statistical model evaluation
🌿
Sustainability
Carbon-aware computing, grid intensity modeling, ERCOT data integration, and per-session CO2 budget enforcement

Key references

Papers recommended by Ms. Adamska and reviewed during research. These directly shaped the design decisions in GreenInfer.

Towards Greener LLMs
2024  ·  arXiv:2403.20306  ·  Core motivation for energy-aware LLM deployment
arxiv.org/pdf/2403.20306
CATP-LLM: Cost-Aware Task Planning for LLMs
System design reference  ·  Models execution time and memory; adapted here for energy using NVML
Informed the orchestrator architecture and energy budget enforcement design
Budget ML Agent
ACM  ·  dl.acm.org/doi/full/10.1145/3703412.3703416  ·  Cost-aware agent design adapted for energy budgeting
dl.acm.org/doi/full/10.1145/3703412.3703416
EnergAgent: Energy-Aware Agent Framework
GI  ·  dl.gi.de/items/4ee3b7d1-80a3-46c8-9eb6-26985eb607ab  ·  Direct inspiration for energy-aware routing decisions
dl.gi.de/items/4ee3b7d1-80a3-46c8-9eb6-26985eb607ab
Sustainable Web Agents
2025  ·  arXiv:2511.04481  ·  Energy sustainability in deployed AI agents
arxiv.org/pdf/2511.04481
Cost of Dynamic Reasoning
2025  ·  arXiv:2506.04301  ·  Referenced in problem motivation; analyzes compute cost of chain-of-thought reasoning
arxiv.org/pdf/2506.04301
How Hungry is AI?
2025  ·  arXiv:2505.09598  ·  Empirical energy consumption numbers for LLMs and coding agents
arxiv.org/pdf/2505.09598
GreenMyLLM
2024  ·  arXiv:2411.11892  ·  Rough energy consumption benchmarks referenced in motivation
arxiv.org/pdf/2411.11892
Green-Code: Energy-Aware Coding Agent
IEEE  ·  ieeexplore.ieee.org/document/11044793  ·  Shows energy-aware design in coding-focused agents
ieeexplore.ieee.org/document/11044793
DynamoLLM: Dynamic LLM Serving
arXiv:2408.00742  ·  Energy-aware LLM serving; cascading model ideas
arxiv.org/abs/2408.00742
Efficiency-Oriented Work in Agents
2025  ·  arXiv:2601.14192  ·  Survey of efficiency-focused approaches in agent systems
arxiv.org/pdf/2601.14192
FrugalGPT: Reducing LLM Cost and Improving Performance
Chen et al. (2023)  ·  Stanford  ·  Core inspiration for the cascade routing architecture
Demonstrated that cascading smaller models before escalating saves 40-90% of compute cost
Energy and Policy Considerations for Deep Learning in NLP
Strubell et al. (2019)  ·  ACL  ·  Foundational paper quantifying the carbon cost of large model training
CodeCarbon: Estimating the Carbon Footprint of Computation
Lacoste et al. (2019)  ·  NeurIPS Workshop  ·  Energy benchmarking library integrated in the framework

Open source. Open science.

GreenInfer is built to be shared. The framework is on GitHub for any developer to use, extend, or build on top of.

Try the Chatbot  ·  Read the Docs