Published: June 9, 2025 · 15 min read

Every time we use artificial intelligence - whether to generate or process text, images or video - we set energy-intensive processes running in the background. These consume electricity and even water, which is used directly or indirectly to run and cool the computing hardware.
To approach AI more responsibly, it is essential to understand what goes on behind the scenes. In doing so, we learn not only that AI is not an infallible source of truth, but also that it has an impact on the environment.
Our interactions with AI models depend on significant physical resources. But how exactly does messaging with these models consume electricity, and even water? The answer lies in a few key phases:
Before the model is ready to generate responses, it goes through a rigorous training process.
Large language models (LLMs) learn through self-supervised learning. That is, the models are not trained on data with explicit labels; instead, they learn to predict the next word from the preceding context. This approach lets models exploit large volumes of unstructured text efficiently.
The basic architecture of these models is the transformer, which enables efficient sequence processing and captures long-range dependencies in text. During training, the model traverses the text, predicts the following words, and adjusts its internal parameters based on its errors. This process runs over large-scale data including texts from the internet, books, code, and other sources.
Two things are crucial for effective machine learning: good data and enough of it. With these, models learn to mimic human interaction by predicting which word is likely to follow, based on the preceding context they hold in memory, known as the context window.
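To make this objective concrete, here is a minimal sketch in Python (PyTorch), with toy sizes and random token ids standing in for a real corpus, and a small feed-forward stack standing in for a transformer:

```python
import torch
import torch.nn as nn

vocab_size, dim = 100, 32  # toy sizes; real LLMs use vocabularies of ~100k tokens

# Stand-in for a transformer: embedding -> hidden layer -> vocabulary logits.
model = nn.Sequential(
    nn.Embedding(vocab_size, dim),
    nn.Linear(dim, dim), nn.ReLU(),
    nn.Linear(dim, vocab_size),
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())

tokens = torch.randint(0, vocab_size, (8, 65))   # fake "corpus" of token ids
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict token t+1 from token t
                                                 # (a real transformer sees the whole
                                                 # preceding context window)

for step in range(3):  # a tiny slice of the training loop
    logits = model(inputs)  # predicted scores for every possible next token
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()   # adjust internal parameters based on the prediction error
    optimizer.step()
    print(f"step {step}: loss {loss.item():.3f}")
```

No labels appear anywhere: the "correct answer" for each position is simply the next token of the text itself, which is what makes the approach self-supervised.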
However, training is not only compute-intensive but also water-intensive. For example, training the GPT-3 model in Microsoft data centres consumed approximately 700,000 litres of drinking water, evaporated during cooling. That amount corresponds, for example, to the water needed to produce roughly 320 Tesla electric cars.
This water consumption stems from the need to keep servers in data centres at an optimal temperature. Cooling is often handled by water systems in which water absorbs the heat generated by the servers and then evaporates in cooling towers. As AI use grows, this water- and energy-hungry process raises sustainability concerns.

The size of large language models (LLMs) is given by the number of parameters, the learned weights of the neural network. These parameters determine the model's ability to capture complex linguistic patterns and relationships. In general, the more parameters a model has, the better it understands and generates natural language.
Small models, with a few billion parameters, can run locally on ordinary computers. These models suit applications where security, response speed, and lower hardware requirements matter.
On the other hand, the largest and most powerful models, such as LLaMA 3.1 with 405 billion parameters, require specialized hardware and infrastructure. Running them takes servers with powerful GPUs and high VRAM capacity: for example, running LLaMA 3.1 in fp16 (16-bit floating-point precision) requires approximately 972 GB of VRAM.
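The VRAM figure follows from simple arithmetic, sketched below. The raw weights come to about 810 GB; the gap up to the quoted ~972 GB is runtime overhead (presumably the key-value cache and activations, which this estimate omits):

```python
# Back-of-the-envelope VRAM estimate: parameter count x bytes per parameter.
params = 405e9        # LLaMA 3.1 405B
bytes_per_param = 2   # fp16 stores each weight in 2 bytes
weights_gb = params * bytes_per_param / 1e9
print(f"~{weights_gb:.0f} GB for the weights alone")  # ~810 GB before any overhead
```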
These large-scale models are usually hosted in data centres that provide the necessary computing power and infrastructure to run them. Cloud platforms such as Google Cloud Platform or Microsoft Azure offer services for deploying and managing these models, including performance optimisation and scalability.

After you enter a prompt, the request is sent to a data centre, where it is forwarded to dedicated hardware. Each step of input processing and output generation by large language models (LLMs) contributes to the overall energy consumption; a code sketch after the list below walks through the same steps:
- Tokenization: the text input is divided into smaller units called tokens. This step requires computing power, but its energy consumption is relatively low compared to the others.
- Conversion to embeddings: each token is converted into a numeric vector (embedding) that captures its semantic meaning. This step involves matrix operations and runs on specialized hardware, which contributes to energy consumption.
- Model processing (inference): the embeddings are processed by the model architecture, typically based on transformers. This is the most energy-intensive step, especially for large models with billions of parameters.
- Predicting the next token: the model calculates the probabilities of the following tokens and generates the output. The energy consumption of this step depends on the length of the generated text and the complexity of the model; models with about 7 billion parameters consume approximately 3-4 joules per token.
- Output parameters: settings such as "temperature" change the character of the output (deterministic vs. random). They do not directly affect power consumption, but they can change the length and complexity of the text, which indirectly changes consumption.
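To see these steps end to end, here is a minimal sketch using the Hugging Face transformers library with GPT-2 (chosen only because it is small enough to run locally; production LLMs execute the same loop at a vastly larger scale):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# 1. Tokenization: text -> token ids (computationally cheap).
ids = tokenizer("Every AI query consumes", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):
        # 2 + 3. Embedding lookup and transformer inference happen inside this
        # forward pass - the energy-hungry step, repeated for every new token.
        logits = model(ids).logits
        # 4 + 5. Temperature scaling, then sampling the next token.
        probs = torch.softmax(logits[:, -1, :] / 0.7, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=1)

print(tokenizer.decode(ids[0]))
```

Note how the expensive forward pass runs once per generated token. At the 3-4 joules per token quoted above for a ~7-billion-parameter model, a 500-token answer works out to roughly 1,500-2,000 J, i.e. about 0.4-0.6 Wh.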
The larger and more complex the model, the more power it requires. Each step - from tokenization to output generation - runs on dedicated hardware, often GPUs such as the NVIDIA H100.
How demanding are these computations?
A single NVIDIA H100 (SXM) GPU can draw up to 700 watts. In data centres, these GPUs are mounted in racks that consume tens of kilowatts. To give you an idea, 700 Wh of energy can power:
| Equipment / Activity | Consumption | Operating time with 700 Wh |
| --- | --- | --- |
| NVIDIA H100 GPU (SXM) | 700 W | 1 hour |
| LED bulb | 10 W | 70 hours (approx. 3 days) |
| Smartphone charging | 15 Wh per charge | approx. 46 charges |
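The third column is simple arithmetic, operating time = energy / power:

```python
energy_wh = 700  # one hour of a single H100 at full draw
for device, watts in [("NVIDIA H100 GPU (SXM)", 700), ("LED bulb", 10)]:
    print(f"{device}: {energy_wh / watts:.0f} hours")
print(f"Smartphone charges: {energy_wh // 15}")  # ~46 charges at 15 Wh each
```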
Running LLMs generates a significant amount of heat, so data centres use advanced cooling systems that consume additional energy as well as water. Cooling can account for up to 30-40% of a data centre's total power consumption, and water consumption for cooling averages around 1.8 litres per kWh of energy. Some data centres may therefore consume 11-19 million litres of water a day, roughly the consumption of a city of 30,000-50,000 inhabitants.
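A quick illustration of how such daily totals arise from the 1.8 L/kWh figure; the 6 GWh/day draw below is a hypothetical hyperscale site, not a measured value:

```python
litres_per_kwh = 1.8           # average cooling water per kWh, from above
daily_energy_kwh = 6_000_000   # hypothetical site: ~250 MW average draw over 24 h
daily_water = daily_energy_kwh * litres_per_kwh
print(f"~{daily_water / 1e6:.1f} million litres of water per day")  # ~10.8 million
```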
A single prompt may have a low consumption, but with billions of queries per day the cumulative impact grows significantly. A University of California, Riverside study reports that processing 5-50 AI prompts consumes up to 0.5 litres of water (mostly for cooling).
Reasoning models are typically far more energy-intensive than classical models. DeepSeek illustrates this: its V3 version is approximately 8 times more efficient than the reasoning model R1, a gap similar to that between OpenAI's GPT-4.1 and o3. For a more detailed comparison of energy consumption, I recommend the following article.
The image below shows the significantly higher token usage of the reasoning model R1 compared to classical models.

| Task type | Energy consumption / query | CO2e emissions / query | Consumption comparison |
| --- | --- | --- | --- |
| Simple search | ~0.300 Wh | ~0.2 g | Powers an LED bulb (10 W) for ~2 minutes |
| ChatGPT (e.g. GPT-4o) | ~0.421 Wh | ~0.3 g | Powers an LED bulb for ~2.5-3 minutes |
| More energy-intensive models (o3, DeepSeek) | ~23.82 Wh | ~23 g | Charges a phone ~3-4 times |
- Energy: approximately 430 MWh per day, corresponding to the consumption of roughly 14,000 households (at an average of 30 kWh per household per day).
- CO2 emissions: approximately 300 tonnes per day.
- Water consumption: up to 10 million litres per day.
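These figures are internally consistent, as a quick cross-check shows (assuming the ~0.421 Wh per query from the table above):

```python
daily_energy_mwh = 430
household_kwh_per_day = 30
print(f"{daily_energy_mwh * 1000 / household_kwh_per_day:,.0f} households")  # ~14,333

# At ~0.421 Wh per query, 430 MWh/day implies on the order of a billion queries:
print(f"{daily_energy_mwh * 1e6 / 0.421:.2e} queries per day")  # ~1.02e9
```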


The integration of large language models (LLMs) into the digital tools we commonly encounter online has become a clear trend. Most major search engines - including Google Search - now use AI models to summarise search results. This approach, however, adds the computational cost of running an LLM on top of the energy intensity of traditional search. You can read more about this here.
Unlike alternatives such as DuckDuckGo or Ecosia, Google currently does not allow these AI features to be fully disabled, which increases the energy footprint even of simple searches.
The direct link between our queries and real resource consumption shows that every interaction with AI carries an energy, and therefore ecological, cost. The International Energy Agency estimates that global electricity consumption by data centres will more than double by 2030, to approximately 945 terawatt-hours (TWh) - roughly the amount consumed by all of Japan today.
Given these demands, two questions arise: how do data centres actually work, and why do they consume so much energy and water? And what can we do to minimise this footprint?
The first question - what data centres, the heart of all these processes, actually are and what specific environmental impacts their operation entails - will be addressed in the next part. The second will be covered in the final article of this trilogy.