About
GreenInfer

A research-backed, open-source Green Orchestration Framework for sustainable AI inference. Built as an ISM Final Product to show that AI efficiency and environmental responsibility can go hand in hand.

ISM Final Product  ·  Frisco ISD  ·  Green AI Research

AI's hidden energy cost

Training and running large language models consumes enormous amounts of electricity. A single ChatGPT query uses roughly 10x the energy of a Google search, and the gap keeps growing as models get larger and usage scales globally.

The core inefficiency is that every query, whether "hi" or "design a distributed system," gets routed to the same massive model. There is no intelligence in the routing, no awareness of energy cost, and no attempt at optimization.

GreenInfer tackles this at the infrastructure level, before a single GPU cycle is burned on inference.

70%+ savings available today

Research shows that 60 to 70 percent of real-world AI queries are simple enough for small, efficient models to handle, provided the routing decision is made correctly. Papers like FrugalGPT and the cascading-LLM literature back this up with empirical results.
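The cascade idea from that literature can be sketched in a few lines of Python. Everything here is illustrative: the tier names, the stub models, and the 0.8 confidence gate are assumptions for the sketch, not GreenInfer's actual values.

```python
# Minimal sketch of FrugalGPT-style cascade routing.
# Model calls are stand-in stubs; a real deployment would call actual
# LLM endpoints and derive confidence from log-probabilities.

def cascade_answer(prompt, tiers, threshold=0.8):
    """Try each model tier in order of size; escalate only when the
    smaller model's confidence falls below the gate threshold."""
    for name, model in tiers[:-1]:
        answer, confidence = model(prompt)
        if confidence >= threshold:
            return name, answer
    # The largest tier is the fallback and always answers.
    name, model = tiers[-1]
    answer, _ = model(prompt)
    return name, answer

# Stub tiers returning (answer, confidence); the small model is only
# "confident" on short prompts, forcing escalation on longer ones.
tiers = [
    ("small",  lambda p: ("short answer", 0.9 if len(p.split()) < 8 else 0.3)),
    ("medium", lambda p: ("fuller answer", 0.7)),
    ("large",  lambda p: ("detailed answer", 1.0)),
]

print(cascade_answer("hi", tiers))
print(cascade_answer("design a fault tolerant distributed system for streaming logs", tiers))
```

Simple prompts stop at the small tier and never touch the large model; only queries the cheaper tiers cannot answer confidently pay the full energy cost.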

Combining smart routing with prompt optimization and carbon-aware scheduling creates a compounding effect where savings from each layer stack together.
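A back-of-the-envelope calculation shows why the layers compound multiplicatively rather than additively. The per-layer fractions below are illustrative placeholders, not measured results from GreenInfer.

```python
# Sketch of how per-layer energy savings compound: each layer removes
# a fraction of whatever energy the previous layers left behind.
# The fractions are illustrative, not measured results.
layers = {
    "smart routing":       0.50,  # fraction of energy saved by routing
    "prompt compression":  0.20,
    "carbon-aware timing": 0.10,
}

remaining = 1.0
for name, saved in layers.items():
    remaining *= (1.0 - saved)

print(f"energy remaining: {remaining:.2f}")  # 0.5 * 0.8 * 0.9 = 0.36
print(f"total savings:    {1 - remaining:.0%}")  # 64%
```

Note the stacked total (64% here) is less than the sum of the individual fractions (80%), because later layers only act on the energy the earlier layers did not already save.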

The models, tools, and APIs exist today. GreenInfer integrates them into a unified, developer-friendly framework.

Build roadmap

Component  ·  Description  ·  Status
GreenPromptsOptimizer  ·  T5-based prompt compression model, fine-tuned on 127+ prompt pairs, hosted on Hugging Face  ·  ✓ COMPLETE
Complexity Scorer  ·  Rule-based linguistic scorer using Shannon entropy, token length, and task classification signals  ·  ✓ COMPLETE
Complexity Classifier  ·  DistilBERT fine-tuned on 600 labeled prompts achieving 98.9% accuracy on the validation set  ·  ✓ COMPLETE
Orchestration Engine  ·  Core routing logic combining complexity, energy estimates, mode, and carbon budget into model selection  ·  ✓ COMPLETE
Cascade Inference  ·  Small to medium to large escalation with confidence-based gating between tiers  ·  ✓ COMPLETE
Energy Estimator  ·  Token-based proxy estimation with CodeCarbon integration for local GPU measurement  ·  ✓ COMPLETE
Website  ·  Full multi-page site: landing, chat UI, framework docs, impact dashboard, about  ·  ✓ COMPLETE
HF Space Backend  ·  FastAPI server with CORS, optimizer loading, Groq cascade, and carbon metrics endpoint  ·  ✓ LIVE
Carbon-Aware Routing  ·  ERCOT real-time grid carbon intensity integration for dynamic mode adjustment  ·  ⌛ IN PROGRESS
User Accounts  ·  Sign up, login, and persistent chat history per user  ·  ⌛ IN PROGRESS
RL Router  ·  Policy gradient agent learning optimal routing from historical accuracy and energy logs  ·  → PLANNED
Open Source Release  ·  Framework published to PyPI with full documentation for developer adoption  ·  → FINAL
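As one example from the roadmap above, the rule-based complexity scorer's entropy-plus-length idea can be sketched as follows. The equal weights, the normalization constants, and the whitespace tokenizer are illustrative assumptions for this sketch, not the real scorer's configuration.

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    """Shannon entropy (bits) of the token distribution in a prompt."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def complexity_score(prompt, max_len=64, max_entropy=6.0):
    """Combine a length signal and an entropy signal into one score
    in [0, 1]; the 0.5/0.5 weights and caps are illustrative."""
    tokens = prompt.lower().split()
    length_signal = min(len(tokens) / max_len, 1.0)
    entropy_signal = min(shannon_entropy(tokens) / max_entropy, 1.0)
    return 0.5 * length_signal + 0.5 * entropy_signal

print(complexity_score("hi"))
print(complexity_score("design a fault tolerant distributed key-value store"))
```

A greeting scores near zero while a varied, longer prompt scores much higher, which is exactly the signal a router needs to pick a model tier.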

Researcher & Mentor

🧑‍💻
Srinesh Toranala
Student Researcher  ·  ISM Program, Frisco ISD

Frisco ISD student building GreenInfer as an Independent Study and Mentorship final project. The project brings together a year of research in AI efficiency, energy systems, and sustainable computing. Srinesh previously built GreenPromptsOptimizer, a T5-based prompt compression model that now forms the first layer of the GreenInfer pipeline. GreenInfer represents a full-stack research effort spanning model training, framework engineering, backend deployment, and public product launch, built with the goal of making green AI genuinely accessible to developers everywhere.

Python  ·  PyTorch  ·  Transformers  ·  FastAPI  ·  Hugging Face  ·  Green AI
🎓
Marta Adamska
PhD Candidate  ·  University of Lancaster  ·  Mentor

Marta Adamska is a PhD candidate at the University of Lancaster whose research sits at the intersection of AI systems, sustainability, and computational efficiency. Her expertise in sustainable computing has been instrumental to this project: she provided the research direction, key papers, and guidance that shaped GreenInfer from an early idea into a working framework. I am grateful for her continued mentorship and for her pushing the technical depth of this work throughout the research process.

Skills demonstrated

🧠
AI Research
Literature review spanning energy-efficient inference, model cascading, LLM serving, and sustainable agent design
💻
ML Engineering
Fine-tuning T5 and DistilBERT, building complexity classifiers, and writing production-grade inference pipelines in PyTorch
Systems Design
Multi-layer pipeline balancing accuracy, latency, and real-time carbon intensity with cascade and budget enforcement
🌐
Full-Stack Dev
HTML/CSS/JS frontend, FastAPI backend on Hugging Face Spaces, and Groq API integration for real AI responses
📊
Data Analysis
Empirical benchmarking, Pareto frontier analysis, energy comparison experiments, and statistical model evaluation
🌿
Sustainability
Carbon-aware computing, grid intensity modeling, ERCOT data integration, and per-session CO2 budget enforcement

Key references

Papers recommended by Ms. Adamska and reviewed during research. These directly shaped the design decisions in GreenInfer.

Towards Greener LLMs
2024  ·  arXiv:2403.20306  ·  Core motivation for energy-aware LLM deployment
arxiv.org/pdf/2403.20306
CATP-LLM: Cost-Aware Task Planning for LLMs
System design reference  ·  Models execution time and memory; adapted here for energy using NVML
Informed the orchestrator architecture and energy budget enforcement design
Budget ML Agent
ACM  ·  dl.acm.org/doi/full/10.1145/3703412.3703416  ·  Cost-aware agent design adapted for energy budgeting
dl.acm.org/doi/full/10.1145/3703412.3703416
EnergAgent: Energy-Aware Agent Framework
GI  ·  dl.gi.de/items/4ee3b7d1-80a3-46c8-9eb6-26985eb607ab  ·  Direct inspiration for energy-aware routing decisions
dl.gi.de/items/4ee3b7d1-80a3-46c8-9eb6-26985eb607ab
Sustainable Web Agents
2025  ·  arXiv:2511.04481  ·  Energy sustainability in deployed AI agents
arxiv.org/pdf/2511.04481
Cost of Dynamic Reasoning
2025  ·  arXiv:2506.04301  ·  Referenced in problem motivation; analyzes compute cost of chain-of-thought reasoning
arxiv.org/pdf/2506.04301
How Hungry is AI?
2025  ·  arXiv:2505.09598  ·  Empirical energy consumption numbers for LLMs and coding agents
arxiv.org/pdf/2505.09598
GreenMyLLM
2024  ·  arXiv:2411.11892  ·  Rough energy consumption benchmarks referenced in motivation
arxiv.org/pdf/2411.11892
Green-Code: Energy-Aware Coding Agent
IEEE  ·  ieeexplore.ieee.org/document/11044793  ·  Shows energy-aware design in coding-focused agents
ieeexplore.ieee.org/document/11044793
DynamoLLM: Dynamic LLM Serving
arXiv:2408.00742  ·  Energy-aware LLM serving; cascading model ideas
arxiv.org/abs/2408.00742
Efficiency-Oriented Work in Agents
2025  ·  arXiv:2601.14192  ·  Survey of efficiency-focused approaches in agent systems
arxiv.org/pdf/2601.14192
FrugalGPT: Reducing LLM Cost and Improving Performance
Chen et al. (2023)  ·  Stanford  ·  Core inspiration for the cascade routing architecture
Demonstrated that cascading smaller models before escalating saves 40-90% of compute cost
Energy and Policy Considerations for Deep Learning in NLP
Strubell et al. (2019)  ·  ACL  ·  Foundational paper quantifying the carbon cost of large model training
CodeCarbon: Estimating the Carbon Footprint of Computation
Lacoste et al. (2019)  ·  NeurIPS Workshop  ·  Energy benchmarking library integrated in the framework

Open source. Open science.

GreenInfer is built to be shared. The framework is on GitHub for any developer to use, extend, or build on top of.

Try the Chatbot  ·  Read the Docs