Foreword
“I always wanted to write a six-word story. here it is: near the singularity; unclear which side.”
Sam Altman, CEO of OpenAI, January 4th, 2025 ()
Welcome to the 11th edition of the HiPEAC Vision, which marks the 20th anniversary of HiPEAC. A lot has changed in the field of HiPEAC in 20 years. First, for HiPEAC, the name itself changed from “High Performance Embedded Architecture and Compilation” to “High Performance, Edge and Cloud Computing” in 2024 to better reflect the direction of the HiPEAC community towards the continuum of computing, from edge devices to the cloud.
It is clear that computing technology has drastically changed in 20 years and has profoundly influenced society.
In 2005, there were no smartphones as we know them today (the first iPhone was released on June 29, 2007), and the big success in mobile phones was the Motorola RAZR flip phone. 3G mobile networks, which had started in 2002, continued to expand globally in 2005. Google acquired Android Inc. in 2005, laying the groundwork for its future dominance in the smartphone operating system market.
The first consumer Blu-ray disc products began appearing in 2005, setting the stage for the high-definition video format war against HD DVD (both formats are nearly dead now due to streaming services like Netflix, which in 2005 was still a DVD rental company shipping one million DVDs a day; it started its streaming media service in 2007).
In terms of consumer hardware, Microsoft released the Xbox 360 in November 2005, and Apple introduced the iPod Nano in September 2005, replacing the iPod Mini. It was significantly smaller and featured flash memory instead of a hard drive.
2005 was a defining year for Web 2.0 technologies. YouTube was founded in February 2005 and revolutionized video sharing, enabling users to upload, share, and watch videos easily. Google introduced Google Maps in February 2005, setting a new standard for web-based mapping services. Facebook dropped "the" from its name after purchasing the domain name Facebook.com and expanded beyond colleges in 2005, allowing registration from high school students and other groups.
It is now very difficult to imagine a world without smartphones, social media, and streaming, a world where consuming media meant buying physical discs!
In terms of computers, Windows XP was still the main OS for PCs, with 64-bit support (for the Intel Pentium 4, for example) only just arriving. In November 2005, the fastest supercomputer was the IBM BlueGene/L system, installed at DOE's Lawrence Livermore National Laboratory (LLNL). It had achieved a Linpack performance of 280.6 TFlop/s with 131,072 PowerPC 440 2C 700MHz cores, for a power consumption of 1,433 kW (). For comparison, in November 2024, the fastest supercomputer was the El Capitan system, also at the Lawrence Livermore National Laboratory, with a score of 1.742 EFlop/s. It has 11,039,616 combined CPU and GPU cores and is based on AMD 4th-generation EPYC processors with 24 cores at 1.8GHz and AMD Instinct MI300A accelerators, for a power consumption of 29,581 kW ().
Between 2005 and 2025, there was a gain of x6,200 in processing power for an increase of x21 in power consumption, i.e. a gain of x300 in energy efficiency. We can also notice the relatively small increase in processor frequency, x2.6 in 20 years, confirming that Dennard scaling broke down around 2005–2007.
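These ratios can be reproduced with a quick back-of-the-envelope computation from the TOP500 figures quoted above (a minimal sketch; the numbers are taken directly from the text):

```python
# Back-of-the-envelope check of the BlueGene/L (2005) vs. El Capitan (2024) comparison,
# using the TOP500 figures quoted in the text above.
bluegene_l = {"rmax_tflops": 280.6, "power_kw": 1_433, "clock_ghz": 0.7}
el_capitan = {"rmax_tflops": 1_742_000, "power_kw": 29_581, "clock_ghz": 1.8}

perf_gain = el_capitan["rmax_tflops"] / bluegene_l["rmax_tflops"]   # ~6,200x
power_gain = el_capitan["power_kw"] / bluegene_l["power_kw"]        # ~21x
efficiency_gain = perf_gain / power_gain                            # ~300x
clock_gain = el_capitan["clock_ghz"] / bluegene_l["clock_ghz"]      # ~2.6x

print(f"performance x{perf_gain:,.0f}, power x{power_gain:.0f}, "
      f"efficiency x{efficiency_gain:.0f}, frequency x{clock_gain:.1f}")
```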
So, what is the landscape for 2025? Unfortunately, the races we identified in the HiPEAC Vision 2023 are more relevant than ever and have only intensified. As a reminder, here is the list:
- Race for the Next Web - the continuum of computing,
- Race for artificial intelligence,
- Race for innovative and new hardware,
- Race for cybersecurity,
- Race for (technological / product / content) sovereignty,
- Race for sustainability,
- And the global need to break the silos, as explained in the Vision 2023: “We observe a tendency to “closing in” on all levels, from countries (with more emphasis on sovereignty), to the persona level, to our own “tribe” (as “defined” by social media). Tension is becoming exacerbated at all levels between these “tribes”, as evidenced by trade (or real) wars between countries, more extreme political parties, social media “wars”, etc. This tendency also exists in technology, where there are application silos and technology silos”. This is even more accurate for 2025…
The most important (r)evolution in technology happened on November 30th, 2022, when ChatGPT was revealed to the public. With its simple interface, it was a new “iPhone” moment, and like the iPhone, it offered a new way to interact with computers. While companies like Microsoft, Google, Meta, OpenAI, Anthropic, etc., are spending billions on this new technology, the return on investment is still not there, leading to price increases and questions about profitable business models. But from a technical point of view, it is undeniably a gigantic revolution in computing systems, and the field is following an exponential increase in performance and compute needs, with a corresponding impact in terms of energy consumption and environmental footprint. It is becoming so strategic to master and lead this technology that, even if there are questions about business profitability and about social and ecological impacts, it is unlikely that the investments (at least from a sovereignty point of view) will stop. The next race of artificial intelligence is to reach ‘AGI’, meaning Artificial General Intelligence; OpenAI publicly defines ‘AGI’ as a ‘highly autonomous system that outperforms humans at most economically valuable work’.
Let’s see how all the previous races are impacted and driven by this “race for artificial intelligence”, which we could rename the “race for AGI” in 2025:
As already explained, it is clear that AGI will have such drastic impacts that it will be a major element of sovereignty. We observe that the US companies involved in AI are slowly removing from their charters the clauses excluding the use of AI for military purposes. As in China, the US government is becoming more and more involved in the field, directly or indirectly.
Training these larger and larger models will have a large energy cost, and the limitation is now not the size of the datacenter, but the power grid to feed it. Bill Gates, Amazon, Google, and Microsoft are investing in nuclear energy (the Three Mile Island nuclear reactor is planned to restart to power Microsoft's AI operations) (https://www.theguardian.com/environment/2024/sep/20/three-mile-island-nuclear-plant-reopen-microsoft). The hyperscalers' promises to become carbon neutral are being pushed back in time due to the energy needs of AI. Therefore, even if there are claims that AI can improve existing processes and reduce the impact of existing technologies, it is not clear whether this will outweigh the direct ecological impact of AI, which is a big threat to sustainability.
Datacenters will increase in size, and their computing power needs will grow to support both the training of larger and larger models and the new trend of using more inference-time compute to improve performance, as exemplified by OpenAI's o1 and o3 models.
Inference performance is becoming more and more demanding, also because of the large number of users: in August 2024, OpenAI said its chatbot ChatGPT had more than 200 million weekly active users. This has intensified the race for innovative and new hardware for AI. The current winner is Nvidia, which provides its GPUs to most companies.
Nvidia claims it improved the AI performance of its GPUs by a factor of 1,000 in eight years (mainly through new architectures and specialization, but also through technology improvements and by reducing the size of number formats from FP16 to FP4).
There are more and more developments of chips dedicated to the inference of LLMs, such as Groq, SambaNova, and AWS Inferentia (AWS also developed the Trainium chip, specialized for training). Each major player tries to develop its own hardware accelerator, pioneered by Google with its TPU (now Trillium, the sixth generation of Google Cloud TPU), e.g. AWS, Meta with its next-generation MTIA (https://ai.meta.com/blog/next-generation-meta-training-inference-accelerator-AI-MTIA/), etc. Having a specialized chip for inference not only increases efficiency (serving one large training job does not have the same requirements as serving a very large number of users for inference), but also decreases latency, which is not a real problem for chatbots (users can't write or read faster) but is very useful for agentic AI, where several models are invoked in sequence.
Cybersecurity is also among the top dangers, with AI used to fool people not only with text, but also with realistic voices and video. AI can also be used to detect vulnerabilities, and it will further intensify the fight between AI used to protect users and AI used for cyberattacks. And of course, large cyberattacks involving AI are more and more frequent, sometimes with military aims.
What is the position of Europe in these races? Europe is still a lighthouse in the domain of ethics, regulating the risks to privacy and the first to regulate AI (the EU AI Act), setting the example and sometimes followed by other countries that are aware of the potential risks of these technologies. But regulation is not enough: alignment of LLMs is an important research topic for ensuring a safe future - alignment is ensuring that an AI system behaves exactly how you want it to behave. The 2013 sci-fi movie Her doesn’t look so futuristic now with the advanced voice mode of ChatGPT (and Google’s Gemini) - the first public release even sounded like Scarlett Johansson - and such systems can have important psychological impacts on people. The deceptive behavior of HAL, the computer in the 1968 movie “2001: A Space Odyssey”, has been observed (in at least two different scientific papers) in large models that deliberately lie or try to preserve their original structure and goals when they learn that they will be changed (even exfiltrating their weights when given an easy opportunity); in HAL’s case, while trained not to lie, it was forced to lie to keep the mission secret from the crew. The research shows that alignment faking emerges with model scale, so might smaller, more specialized models (or “sets” of models, as in distributed agentic AI) be easier to align?
Europe has good education facilities and excellent researchers, but unfortunately, they are often hired to work for non-European companies. With a few exceptions (like Aleph Alpha, Mistral, …), Europe is not very present in the field of AI, nor in hardware development for AI. Collective efforts like BigScience (which led to the LLM Bloom, available before ChatGPT), OpenLLM (which created the Lucie model), and many others exist in Europe, but they don’t have the impact of the “big ones”, perhaps because they are not as easily usable by the public, and they are still “small” compared to the state-of-the-art models. Europe has a clear problem combining its efforts to gather enough data, compute resources, and researchers to jointly develop a European model competitive with the ones developed by Chinese or US companies. Europe is also not very good at advertising its solutions and results.
So can we say, like Eric Schmidt (see the insert), that Europe is “going to lose in the most important battle that is going to occur in your lifetime, which is the arrival of intelligence”?
This need not happen if we, the European computing community, act fast. The first recommendation is indeed to “break the silos” and work together towards a common goal. We have examples of this, like the discovery of the Higgs boson at CERN: an international collaboration involving not only researchers but also the development of the tools (particle accelerator, experiments, …), which brought together many different disciplines. We certainly have all the required competencies (and compute resources) in Europe, but they are scattered, each one pursuing its own objectives, with no coordination and no common, shared goal.
But there are also some potential alternatives, linked to the “continuum of computing”, with a focus on artificial intelligence, and to the new directions for the future of artificial intelligence.
The current Large Language Models, and even the multimodal ones, are trained essentially with “static” data, i.e. information extracted from books, the web, and collections of pictures. The models therefore have a representation of the world seen through our eyes; they have not experienced it directly. It is as if children learned only from storytelling and fairy tales. The models are passive, like humans in their dreams - where we also hallucinate. Being told that an object falls due to gravity is totally different from experiencing it directly. The necessary steps - which are ongoing now - are:
1) The AI models should be able to interact and induce actions. This is enabled by agentic AI, where the AI can trigger actions and observe the results.
2) They should interact with the real world, or with an accurate model of it (a digital twin). It is likely that we will see more of this embodied AI in 2025, with robots powered by advanced AI and trained in virtual worlds. Nvidia is ready for that, with its Omniverse, which can simulate the world with realistic physics and photorealistic rendering. Nvidia also provides hardware and software for robots (https://www.nvidia.com/en-us/solutions/robotics-and-edge-computing/).
But it is not too late for Europe, with its knowledge in digital twins, edge computing, and factory automation, to be an active player in this next step of AI.
Another way for Europe to get back in the game - but it needs to act fast here as well - is to build on the idea of the “Next Computing Paradigm”, the continuum of computing, with a first pragmatic release focused on distributed agentic AI, building on points 1) and 2) above.
But first, we need to summarize the concepts of the “Next Computing Paradigm” introduced in the previous HiPEAC vision documents.
The Next Computing Paradigm (NCP - we still haven’t found a better name) represents a transformative approach to computing, where applications dynamically integrate services and resources across diverse hardware and software environments in real time. It builds on the convergence of advances like cloud computing, IoT, cyber-physical systems, digital twins, and AI, creating a continuum where computation seamlessly operates across edge devices, centralized clouds, and everything in between. Not only data but also tasks could migrate to where they are executed most efficiently, according to specific criteria: the NCP prioritizes intelligent orchestration to manage tasks dynamically, considering factors such as latency, energy efficiency, cost, security, and privacy. By utilizing high-level abstractions and natural interfaces, the NCP enables applications to interact effectively with the physical world, addressing the real-time and spatial constraints essential for emerging use cases like autonomous systems, precision agriculture, and smart healthcare. This paradigm introduces a shift toward “anything as a service” (XaaS), supported by federated and distributed infrastructures, ensuring scalability, adaptability, and sustainability. Not only application software is seen as a service, but also specific digital twins (modeling parts of the world) and even hardware, allowing distributed computing, memory, and storage resources to be aggregated into virtual meta-computers. With a foundation rooted in trustable orchestration and interoperability, the NCP paves the way for innovative applications and greater connectivity across global sectors, hence also “breaking the silos”.
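To make the idea of criteria-driven orchestration more concrete, here is a minimal, purely illustrative sketch (the site names, numbers, and weights are hypothetical and not part of any NCP specification) of how an orchestrator might score candidate execution sites in the continuum:

```python
from dataclasses import dataclass

@dataclass
class Site:
    """A candidate execution site in the edge-to-cloud continuum (hypothetical model)."""
    name: str
    latency_ms: float       # expected round-trip latency to the data source
    energy_j: float         # estimated energy per request
    cost_eur: float         # estimated monetary cost per request
    keeps_data_local: bool  # crude stand-in for privacy constraints

def score(site: Site, weights: dict) -> float:
    """Lower is better: a weighted sum of the criteria mentioned in the text."""
    penalty = 0.0 if site.keeps_data_local else weights["privacy_penalty"]
    return (weights["latency"] * site.latency_ms
            + weights["energy"] * site.energy_j
            + weights["cost"] * site.cost_eur
            + penalty)

sites = [
    Site("edge-gateway", latency_ms=5,  energy_j=2.0, cost_eur=0.001, keeps_data_local=True),
    Site("regional-dc",  latency_ms=30, energy_j=1.2, cost_eur=0.004, keeps_data_local=True),
    Site("hyperscaler",  latency_ms=80, energy_j=0.8, cost_eur=0.002, keeps_data_local=False),
]
weights = {"latency": 1.0, "energy": 10.0, "cost": 100.0, "privacy_penalty": 50.0}
best = min(sites, key=lambda s: score(s, weights))
print("place task on:", best.name)
```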
We therefore propose a call for action to gather scientists, developers, and industries to work together to define a subset of the NCP interoperability protocol adapted to the idea of “distributed agentic artificial intelligence.”
This starts from the following observations:
- The emergence of agentic AI (point 1 above),
- The necessity to interact within the constraints of the real world, either directly (receiving real-time data, controlling devices like robots in real time) or indirectly through digital twins (point 2 above),
- As with Mixture of Experts (MoE), it is far more computationally efficient to activate only a relevant subset of smaller AIs specialized for a particular task than to activate a complete, very large AI with hundreds or thousands of billions of parameters.
- Smaller models are becoming more and more efficient, matching the performance of models 10x bigger released only a few months earlier (10B-parameter models of November 2024 have performance similar to the ChatGPT 3.5 of November 2022, and Llama 3.3 70B has performance similar to Llama 3.1 405B).
- Fine-tuning smaller models for specific tasks enhances their capabilities, enabling their deployment on edge devices.
- Sets of specialized agents are very efficient, leading to systems that comprise multiple specialized agents managed by an “orchestrator,” which operates adaptively by selectively engaging agents for specific tasks.
- The orchestrator and the agents don’t need to be on the same computer or server; they can be distributed, as in the NCP. Agents can be Small Agent Models - specialized small versions of LLMs - or even based on other approaches.
- Distributed systems promote resource sharing and optimize energy efficiency, privacy, and modularity.
- Agents can operate on various devices, including older hardware, ensuring adaptability and extended device lifespans.
- As in the NCP, central to this “distributed agentic AI” is the “orchestrator,” which routes tasks to specialized agents or devices (a minimal sketch follows this list).
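The following sketch illustrates the routing idea only; the agent names, capability keywords, and the trivial keyword-based routing rule are hypothetical, and in a real system the agents would be remote services selected by a much smarter orchestrator:

```python
# Minimal illustration of an orchestrator routing tasks to specialized agents.
# Agent names and the keyword-based routing rule are purely hypothetical; each
# agent could be a small, fine-tuned model running on a different device.
from typing import Callable, Dict

AGENTS: Dict[str, Callable[[str], str]] = {
    "translation": lambda task: f"[translation agent] translated: {task}",
    "code":        lambda task: f"[code agent] generated code for: {task}",
    "general":     lambda task: f"[general agent] answered: {task}",
}

def route(task: str) -> str:
    """Pick a specialized agent based on a (deliberately trivial) analysis of the task."""
    text = task.lower()
    if "translate" in text:
        agent = "translation"
    elif any(kw in text for kw in ("code", "function", "bug")):
        agent = "code"
    else:
        agent = "general"
    return AGENTS[agent](task)

print(route("Translate this sentence to French"))
print(route("Write a function that sorts a list"))
```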
Based on these observations, it is imperative to establish open protocols for these “distributed agentic AI” systems to facilitate seamless interaction among distributed AIs from different origins.
Therefore, to effectively operate this federation of distributed AIs, it is necessary for them to exchange data and parameters through a universally comprehensible protocol that:
- Does not solely rely on functional requirements (e.g., the textual representation of prompts and responses).
- Also incorporates non-functional requirements (providing sufficient information for the orchestrator to select the appropriate services, based on criteria such as response time, potential level of hallucination, cost, localization, data privacy, etc.).
Large entities such as OpenAI, Meta, and Microsoft are attempting to promote their own APIs for accessing their models. However, an API alone is insufficient for constructing this distributed and federated network of AIs.
The exchange format (JSON, ASCII text) is perhaps not the optimal way for networks of AIs to efficiently exchange information: this could be tokens, embeddings, or any other representations - some research also shows that LLMs talking to each other could develop their own “language”.
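As a purely hypothetical illustration of what such an exchange could carry (every field name below is invented for this sketch and is not a proposed standard), a request and a matching offer between orchestration systems might pair the functional payload with non-functional metadata:

```python
# Hypothetical message exchanged between orchestration systems; every field name
# is illustrative only. The point is that non-functional metadata travels with
# the functional payload so that an orchestrator can choose among candidate agents.
import json

request = {
    "task_id": "2025-0001",
    "functional": {
        "modality": "text",              # could also be tokens, embeddings, ...
        "payload": "Summarize the attached maintenance report.",
    },
    "non_functional": {
        "max_response_time_ms": 2000,
        "max_cost_eur": 0.01,
        "data_privacy": "must-stay-in-EU",
        "max_hallucination_risk": "low",  # e.g., require grounded/cited answers
    },
}

offer = {
    "task_id": "2025-0001",
    "agent": "domain-specific-summarizer-7B",
    "estimated_response_time_ms": 850,
    "estimated_cost_eur": 0.004,
    "data_location": "EU",
}

print(json.dumps({"request": request, "offer": offer}, indent=2))
```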
It is therefore important that the community works together to jointly define this exchange protocol, which should be open to allow broad acceptance.
Similar to TCP/IP, which enabled various OS (Operating Systems) to communicate, the aim of this action is to create the equivalent for OS (Orchestration Systems) to exchange AI-related information.
Time is crucial for this initiative, and standardization, however necessary, will take too long, so a de facto open standard should be proposed in parallel with the standardization effort, before other, closed proposals emerge and lock the approach down to a few (non-European) players. As with the NCP, this approach will allow the creation of a completely new ecosystem where smaller players can provide specialized AI as a service alongside the big ones. Directories of services, trusted brokers, and payment services are also important elements that can emerge from this ecosystem, in which Europe can play an active part thanks to its set of SMEs, research organizations, and distributed nature.
Europe should be an active player in the race for “distributed agentic artificial intelligence”.
Finally, we would like to finish this introduction to the HiPEAC Vision 2025 with a more philosophical reflection on the evolution of the computing paradigm: we are going from computing systems focused on precision to systems working with approximations.
Historically, computational systems have been designed to perform reproducible and relatively precise computations. These systems excel at deterministic operations, with any deviations generally attributed to technical limitations, such as floating-point representation errors or overflow issues.
However, a new generation of computational approaches is shifting the paradigm toward more “approximate” computing. This transformation is driven by the following innovations:
- Neural Network-Based Approaches: Modern methods, including generative AI, often rely on neural networks that operate with low-precision coding formats such as FP4 (e.g., 1 sign bit and 3 exponent bits, or 1 sign bit, 2 exponent bits, and 1 mantissa bit; see the sketch after this list). These systems inherently produce approximate results, sometimes referred to as “hallucinations” in contexts like AI-driven content generation.
- Ising-Based Coprocessors: Technologies such as the Fujitsu Digital Annealer, Hitachi’s annealing machine, and D-Wave systems are designed to solve optimization problems. These devices focus on finding a minimum of a function, though not necessarily its global minimum, using techniques like simulated annealing on Quadratic Unconstrained Binary Optimization (QUBO) formulations, etc.
- Quantum Computing: Quantum systems, characterized by stochastic measurements, produce probabilistic readings rather than deterministic results, further reinforcing the trend toward approximation.
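For concreteness, a minimal sketch of the second FP4 layout mentioned above (1 sign bit, 2 exponent bits, 1 mantissa bit), assuming the common E2M1 convention (exponent bias 1, subnormals when the exponent field is zero, no infinities or NaNs), enumerates every representable value:

```python
# Enumerate all values of a 4-bit float in an assumed E2M1 layout
# (1 sign, 2 exponent, 1 mantissa bits; bias 1; no infinities/NaNs).
def fp4_e2m1(bits: int) -> float:
    sign = -1.0 if (bits >> 3) & 1 else 1.0
    exp = (bits >> 1) & 0b11
    man = bits & 0b1
    if exp == 0:                          # subnormal: no implicit leading 1
        return sign * (man / 2) * 2 ** (1 - 1)
    return sign * (1 + man / 2) * 2 ** (exp - 1)

values = sorted({fp4_e2m1(b) for b in range(16)})
print(values)   # only 15 distinct values: 0 and +/-{0.5, 1, 1.5, 2, 3, 4, 6}
```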
This shift represents a transition from the classical computational framework of (parallel) Turing machines, introduced in 1936, to models inspired by universal approximators, building on the neural networks first proposed by McCulloch and Pitts in 1943. Turing demonstrated that any form of mathematical reasoning could theoretically be executed by a machine. McCulloch and Pitts introduced the first formal model of neural networks, which were later shown to be able to approximate any function to a desired level of precision, given sufficient size.
Looking forward, future systems must integrate both paradigms, precise and approximate, within feedback- and reinforcement-based architectures. This hybrid approach mirrors the dual-system thinking described in Daniel Kahneman’s Thinking, Fast and Slow (2011), where two types of reasoning, intuitive and analytical, are combined to achieve optimal outcomes. The approximate system acts as a sort of “oracle” that predicts a solution, which can then be verified by the precise system in an affordable amount of time. The combined system can iterate if the prediction is far from correct.
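A minimal sketch of this oracle-plus-verifier loop follows; the square-root task, the noisy oracle, and the tolerance are arbitrary choices made only to show the predict-check-refine pattern:

```python
import math, random

def approximate_oracle(x: float) -> float:
    """Stand-in for a fast, approximate predictor (e.g., a small neural model)."""
    return math.sqrt(x) * (1 + random.uniform(-0.2, 0.2))  # noisy guess of sqrt(x)

def precise_check(x: float, guess: float, tol: float = 1e-9) -> bool:
    """Cheap exact verification: is guess**2 close enough to x?"""
    return abs(guess * guess - x) <= tol * max(1.0, x)

def refine(x: float, guess: float) -> float:
    """One precise correction step (Newton's method for square roots)."""
    return 0.5 * (guess + x / guess)

def hybrid_sqrt(x: float) -> float:
    guess = approximate_oracle(x)        # fast but approximate ("System 1")
    while not precise_check(x, guess):   # exact verification ("System 2")
        guess = refine(x, guess)         # iterate until the check passes
    return guess

print(hybrid_sqrt(2.0))  # ~1.41421356...
```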
This convergence of paradigms will enable the development of computational systems that blend the strengths of precision with the flexibility of approximation, pushing the boundaries of what machines can achieve.
To continue in this direction, we can conclude by quoting Demis Hassabis in the lecture he gave on receiving the Nobel Prize in Chemistry for his AI research contributions to protein structure prediction: