
Summary: AI systems can consume vast amounts of energy and water, but reliable emissions data remains hard to find and harder to trust. In this challenge, you’ll build a retrieval-augmented generation (RAG) system that extracts credible environmental impact estimates from peer-reviewed sources. Your model must output concise, citation-backed answers—or explicitly indicate when the evidence is missing. The goal: turn scattered academic knowledge into transparent, actionable insights for researchers, engineers, and policy makers.
Method areas: Retrieval‑augmented generation (RAG) workflows and optical character recognition (optional) to better parse figures/tables from PDFs.
Prerequisites: Prior exposure to natural language processing (NLP) and deep learning fundamentals is recommended. No prior experience with large language models (LLMs), RAG, or Hugging Face is required, but you must be willing to learn! See Resources for Getting Started.
- Description & Goal
- Data
- Prerequisites
- Resources for Getting Started
- Launch Date & Data Release
- Contact
Background
The environmental impact of artificial intelligence (AI) is an emerging concern in both industry and academia. As large models grow in popularity, so do their energy and resource requirements—raising questions about carbon emissions, water usage, and sustainability practices across the AI lifecycle. However, the knowledge needed to estimate or compare these impacts is often buried in technical literature, scattered across different domains like machine learning, energy systems, and environmental science. Retrieval-augmented generation (RAG) offers a promising approach to extracting and synthesizing this information from unstructured documents. By combining document retrieval with generative language models, RAG systems can produce factually grounded answers supported by references.
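To make the retrieve-then-generate loop concrete, here is a minimal sketch; the embedding model, document fields, and prompt wording are illustrative assumptions, not challenge starter code:

```python
# Minimal retrieve-then-generate sketch (illustrative assumptions, not starter code).
from sentence_transformers import SentenceTransformer
import numpy as np

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works here

def retrieve(question, docs, k=3):
    """Return the k passages most similar to the question (cosine similarity)."""
    doc_vecs = embedder.encode([d["text"] for d in docs], normalize_embeddings=True)
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vecs @ q_vec           # dot product of unit vectors = cosine similarity
    top = np.argsort(scores)[::-1][:k]  # indices of the k highest-scoring passages
    return [docs[i] for i in top]

def build_prompt(question, retrieved):
    """Ground the generator in retrieved passages and require citations."""
    context = "\n\n".join(f"[{d['ref_id']}] {d['text']}" for d in retrieved)
    return ("Answer using ONLY the sources below, citing their IDs. "
            "If the sources do not contain the answer, say so.\n\n"
            f"{context}\n\nQuestion: {question}")

# The resulting prompt is then passed to a generative LLM of your choice.
```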
Goal
In this challenge, you’ll use RAG to tackle real sustainability questions, drawing from a curated corpus of over 50 scholarly articles on AI’s environmental impact. Your system should generate responses that include:
- a concise natural‑language answer
- a numeric estimate and unit (when applicable)
- refs – IDs of the cited documents
- supporting materials from the retrieved reference(s) (e.g., a verbatim quote, table reference, or figure reference)
If no evidence exists, your system must emit the standardized “unable to answer” fallback. Scores combine retrieval precision, numerical accuracy, and citation faithfulness.
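For illustration, a response could be organized as follows; the field names are our own sketch, and the official submission format on the Kaggle page is authoritative:

```python
# Sketch of a well-formed response (field names are illustrative, not the official schema).
example_response = {
    "answer": "1438 lbs",      # concise natural-language answer
    "value": 1438,             # numeric estimate, when applicable
    "unit": "lbs",
    "refs": ["strubel2019"],   # IDs of the cited documents
    "support": ["Table 3"],    # verbatim quote, table, or figure reference
}

# When no evidence exists, emit the standardized fallback instead
# (the exact required string/format is defined by the challenge):
fallback_response = {"answer": "unable to answer", "refs": [], "support": []}
```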
Example questions & answers
- Question: What is the estimated CO2 emissions (in pounds) from training the BERT‑base model for 79 hours on 64 V100 GPUs?
  Answer: 1438 lbs. Supporting refID(s): [strubel2019], Supporting materials: [Table 3].
- Question: True or False: New AI data centers often rely on air cooling due to high server power densities.
  Answer: FALSE. Supporting refID(s): [li2025], Supporting materials: [“In general, new data centers dedicated to AI training often rely on liquid cooling due to the high server power densities.”]
- Question: What term refers to the amount of water evaporated, transpired, or incorporated into products, defined as “water withdrawal minus water discharge”?
  Answer: Water consumption. Supporting refID(s): [li2025], Supporting materials: [“Water consumption: It is defined as “water withdrawal minus water discharge”, and means the amount of water “evaporated, transpired, incorporated into products or crops, or otherwise removed from the immediate water environment [13].””]
Question complexity
- Multi‑document fusion: some queries require combining facts from separate papers (e.g., GPU power specs in one, regional carbon intensity in another).
- Visual reasoning: a handful of answers live in figures or tables—OCR or table extraction may be needed.
- Unanswerable cases: a small subset is intentionally unsupported; your system must detect these and use a fallback response.
Data
The full dataset will be available at the following URL after the launch date on Sept. 11, 2025: kaggle.com/competitions/WattBot2025
- corpus.zip – 50+ sustainability‑focused research papers, model cards, and cloud‑pricing reports (PDFs) you’ll parse; OCR may be needed for some figures and scanned pages.
- metadata.csv – document IDs, titles, and full citations.
- train_questions.csv – a small, labeled subset of 100+ Q&A pairs to tune your pipeline.
- test_questions.csv – a larger set of questions with answers hidden; performance on this file decides the leaderboard.
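Once the files are downloaded, a quick first look might use pandas; the column layouts below are assumptions based on the file descriptions above, so check the actual headers:

```python
# Peek at the provided CSVs (column contents are assumptions; verify after download).
import pandas as pd

metadata = pd.read_csv("metadata.csv")         # document IDs, titles, full citations
train_qs = pd.read_csv("train_questions.csv")  # labeled Q&A pairs for tuning
print(metadata.head())
print(train_qs.head())
```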
Prerequisites
Required
- Version Control with GitHub Desktop: All hackathon attendees must know how to work in Git/GitHub. Click the link if you need a quick refresher (1‑hour tutorial) before the hackathon kicks off in September!
- Intro to Python: While it’s possible to build RAG pipelines in other languages, Python tends to be the go-to choice. You can work in another language if you find a teammate who’s also willing to use it.
Helpful but not strictly required
- Intro to Natural Language Processing (NLP): You don’t necessarily need to understand everything in this intro lesson, but some background familiarity with NLP will help.
- Basic Neural Networks: Some exposure to deep learning concepts is helpful for understanding how LLMs work, but beginners are welcome if paired with experienced teammates.
- No prior experience with large language models (LLMs), RAG, or Hugging Face is required, but you must be willing to learn! See Resources for Getting Started.
Resources for Getting Started
- Intro to Natural Language Processing (NLP): Brush up on NLP basics before diving head-first into RAG pipelines.
- Generative AI for Everyone: Learn about generative AI and LLM fundamentals.
- Retrieval Augmented Generation (RAG) Workshop: Learn RAG fundamentals (also see a similar workshop from Analytics Vidhya)
- LLM University: Crash course on LLMs, text embeddings, and RAG.
- Exploring Fact‑Based QA with RAG: Romeo & Juliet: Starter notebook on retrieval+generation
- RAG Use Cases on Campus: Video showcase of practical retrieval‑augmented workflows at UW‑Madison
- Pytesseract: OCR with Tesseract (LSTM) in Python: Extract text from figures and scanned PDFs (see the sketch below)
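For those new to OCR, here is a minimal sketch of that flow; it assumes the Tesseract binary and poppler (required by pdf2image) are installed, and `paper.pdf` is a placeholder filename:

```python
# Minimal OCR sketch: rasterize PDF pages, then run Tesseract on each image.
# Assumes the Tesseract binary and poppler are installed; "paper.pdf" is a placeholder.
from pdf2image import convert_from_path
import pytesseract

pages = convert_from_path("paper.pdf", dpi=300)   # render each page as a PIL image
for i, page in enumerate(pages):
    text = pytesseract.image_to_string(page)      # LSTM-based OCR on the page image
    print(f"--- page {i + 1} ---\n{text[:200]}")  # preview the first 200 characters
```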
Found a helpful resource during your project? Share it with others through our open-source knowledge base, ML+X Nexus!
Launch Date & Data Release
The full corpus and Q&A pairs will be available at the following URL after the launch date on Sept. 11, 2025: kaggle.com/competitions/WattBot2025
Contact
If you have any questions about participating, please contact the challenge organizer: Chris Endemann (endemann@wisc.edu).