
AI will radically change how we do software engineering in the future.

AI-Assisted Software Engineering (AISE)

by Andreas Metzger (Editor)

Generative AI, large language models (LLMs) and AI chatbots are new tools that can help humans with various creative activities. One of these activities is software engineering, which will be radically transformed by the use of AI. We call this AI-Assisted Software Engineering (AISE). AISE will have an impact on all parts of software engineering. We explore the opportunities but also the important challenges of AISE. We discuss how AISE will affect the different software engineering tasks and examine important cross-cutting concerns, such as the software life cycle, software security, and non-technical concerns.

Key insights

Key recommendations

  • Delivering AISE’s full potential requires addressing significant technical and non-technical challenges. This calls for dedicated research and innovation (R&I) actions to strengthen the competitiveness of the European primary and secondary software industry.

  • R&I actions on AISE will be key to addressing gaps identified in the Horizon Europe Strategic Plan 2025-2027 Analysis[2], which states that "Regarding AI for Software Engineering, ‘software co-engineering’, the following aspects of the software development lifecycle are not covered yet: code search using natural language processing techniques, using AI to analyse the code quality, perform automatic bug fixing, or the application of AI-based failure prediction algorithms for the software at operation time, have, at this stage not been fully incorporated; low-code approaches will increase software quality and productivity, wider take-up, and ease of deployment, also from close-to-the-user configuration and personalisation."

AI-Assisted Requirements Engineering

Requirements engineering is the process of defining and refining the needs and expectations of different stakeholders for a software system. It helps to establish a common understanding and agreement on the requirements among all the parties involved in the software engineering process, and it uses various methods and techniques to support the whole requirements life cycle, from gathering and analysing requirements to validating and managing them[3].

Opportunities. AI combined with natural language processing can assist requirements engineering activities, since many requirements are expressed in natural language[4]. In particular, AI chatbots based on LLMs can deliver untapped automation opportunities in requirements engineering. AI chatbots can improve communication and understanding among people from different backgrounds and skill levels, including non-technical users. Moreover, they may help overcome language barriers; for example, by allowing stakeholders to converse and provide input and feedback in their preferred language.

Challenges. AI chatbots are fascinating because they can answer natural language questions and generate rich text, but they also pose a problem for requirements engineering. The large language model behind an AI chatbot might "hallucinate", i.e., create nonsensical text that does not match the input. This means AI chatbots might have low fidelity. Also, AI chatbots might give different answers to the same question, which means they have low stability. Low fidelity and low stability can lower the quality of the responses and affect the requirements engineering outcomes, leading to poor requirements. A key challenge is how to make sure that AI chatbots produce high-fidelity and high-stability responses. This might involve using specific prompt engineering that takes into account the requirements engineering context, as well as fine-tuning the parameters of the underlying LLMs.
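
As a rough illustration of what such prompt engineering could look like, the following sketch constrains a chatbot to a requirements engineering context and lowers the sampling temperature to improve stability. It assumes the OpenAI Python client; the model name and prompts are purely illustrative.

```python
# Minimal sketch: prompt engineering for requirements elicitation.
# Assumes the OpenAI Python client; model name and prompts are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a requirements engineering assistant. "
    "Rephrase stakeholder statements as unambiguous, testable requirements "
    "using the pattern 'The system shall ...'. "
    "If a statement is ambiguous, list clarification questions instead of guessing."
)

def elicit_requirement(stakeholder_statement: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",                 # illustrative model choice
        temperature=0,                  # deterministic sampling improves stability
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": stakeholder_statement},
        ],
    )
    return response.choices[0].message.content

print(elicit_requirement("The app should feel fast when many users are logged in."))
```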

AI-Assisted Code Creation and Maintenance

Code creation and maintenance are difficult because they require designing effective algorithms and programmatic solutions, ensuring compatibility with different platforms, and fixing errors or bugs that may arise. As software becomes more complex and sophisticated, it also becomes more challenging to ensure its quality, performance, and flexibility.

Generative AI tools (such as GitHub Copilot and Code Llama) are transforming programming, offering profound improvements in code creation and maintenance and thereby significantly increasing developer productivity. The adoption of such AI tools in SE is estimated to lead to a $1.5 trillion boost in global GDP by 2030[5].

Opportunities. Generative AI tools make code suggestions and autocompletions smarter, providing developers with code snippets that they may not have thought of. Programming is being supplemented by prompt engineering to generate code from natural language prompts. Generative AI tools are becoming increasingly interactive (such as GPT Engineer), leading programmers to refine and improve their prompts to get the desired results, ultimately enhancing computational thinking and changing how coding skills are learned.

Generative AI can make code refactoring and restructuring easier and faster, enhancing performance, readability, and modularity over time. Generative AI can also quickly generate patches and fixes for vulnerabilities and bugs, reducing the time to address critical issues. Engineers can use conversational interfaces to fine-tune and tailor AI-generated adaptations to their specific context, rather than accepting generic suggestions.

Generative AI can also help with legacy code maintenance, which is becoming more challenging as expertise in legacy programming languages and environments fades away. Generative AI tools can analyse legacy code, find quality problems, and suggest refactoring or rewriting options (such as IBM’s watsonx Code Assistant, which transforms COBOL code into Java).

Challenges. AISE offers novel possibilities for code generation and software engineering, but it also faces important difficulties. For example, LLMs can produce outputs that are not consistent or logical (hallucinations, as mentioned above), which can affect the quality and traceability of the code. Moreover, security is a major concern, as tools such as GitHub Copilot may generate code with vulnerabilities in about 40% of the cases[6]. Intellectual property issues can also emerge, as some generated code may unintentionally copy licensed programs (discussed further below, under “Non-Technical Concerns in AISE”). While Generative AI can produce high-quality code for general-purpose situations, generating domain-specific code may still be limited. Here, retrieval-augmented generation can enhance code generation and summarization by seamlessly pulling relevant coding patterns from code and pattern databases, and aiding maintenance by ensuring consistent and up-to-date coding practices.
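
The following is a minimal sketch of the retrieval step of such a retrieval-augmented setup, assuming scikit-learn for a simple TF-IDF similarity search over a small in-memory pattern store; the stored patterns, the query, and the final call to the code-generation model are illustrative placeholders.

```python
# Minimal retrieval-augmented generation (RAG) sketch for code generation:
# retrieve domain-specific coding patterns and prepend them to the prompt.
# Uses scikit-learn for a simple TF-IDF similarity search; the pattern store,
# the query and the final LLM call are illustrative placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

PATTERN_STORE = [  # in a real setting: an indexed corpus of in-house code
    "def open_session(): ...  # domain rule: always use the audit-logged DB session",
    "def validate_iban(iban): ...  # domain rule: payments must validate IBANs",
    "def publish_event(evt): ...  # domain rule: events go through the outbox table",
]

def retrieve_patterns(query: str, k: int = 2) -> list[str]:
    vectorizer = TfidfVectorizer().fit(PATTERN_STORE + [query])
    doc_vecs = vectorizer.transform(PATTERN_STORE)
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_vecs)[0]
    ranked = sorted(zip(scores, PATTERN_STORE), reverse=True)
    return [pattern for _, pattern in ranked[:k]]

def build_prompt(task: str) -> str:
    context = "\n".join(retrieve_patterns(task))
    return f"Follow these in-house coding patterns:\n{context}\n\nTask: {task}"

prompt = build_prompt("Create a function that stores a payment and emits an event")
print(prompt)  # this prompt would then be sent to the code-generation model
```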

AI-Assisted Quality Assurance and Testing

As our dependency on software continues to grow, we see an increasing pressure on software engineering to quickly deliver software while assuring adequate software quality. The total cost of poor software quality in the US alone amounted to 2.08 trillion USD in 2020[7]. Given this central and crucial role of quality assurance, the use of AI for quality assurance has been widely studied in research. Even before the launch of ChatGPT (in November 2022), research delivered over 800 publications[8] with a compound annual growth rate of 38% (between 2018 and 2022).

Opportunities. Generative AI, LLMs and AI chatbots will help evaluate code quality and make the quality assurance process more efficient. Examples include grouping and prioritising tests and identifying parts of the code that are more likely to contain bugs. AISE will offer novel possibilities for quality assurance, such as generating better test assertions, fixing programs more precisely, and more accurately pinpointing bugs introduced during software changes.

Challenges. When using Generative AI, the generated tests may contain bugs in the test inputs or the expected outcomes. Generating the expected outcomes for the tests is difficult (this is known as the test oracle problem), as it may require information that is beyond the current capabilities of AI. Moreover, LLMs may produce unrealistic or erroneous outputs, such as calling non-existent functions or passing unsupported parameters to existing functions. Tests that do not have any compile or runtime errors still need to be checked for correctness. Additionally, when a test fails, it is hard to determine whether the fault lies in the software itself or in the test. This calls for a systematic approach to incorporating AI-generated tests into the quality assurance process.
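
One possible shape of such a systematic approach is sketched below: a guardrail that only admits AI-generated tests that compile and execute, and routes failing tests to human review. The sketch assumes Python tests run via pytest; file names and the triage policy are illustrative.

```python
# Minimal sketch of a guardrail for AI-generated tests: only tests that compile
# and execute are admitted to the suite; failing tests are routed to human review
# because the fault may lie in the software or in the generated test itself.
import subprocess
import tempfile
from pathlib import Path

def triage_generated_test(test_source: str) -> str:
    try:
        compile(test_source, "<generated_test>", "exec")    # syntax check
    except SyntaxError as err:
        return f"rejected: does not compile ({err.msg})"

    with tempfile.TemporaryDirectory() as tmp:
        test_file = Path(tmp) / "test_generated.py"
        test_file.write_text(test_source)
        result = subprocess.run(
            ["python", "-m", "pytest", str(test_file), "-q"],
            capture_output=True, text=True,
        )
    if result.returncode == 0:
        return "accepted: add to regression suite"
    return "needs review: test fails -- defect in the code or in the generated oracle?"

example_test = "def test_add():\n    assert 1 + 1 == 2\n"
print(triage_generated_test(example_test))
```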

AI can help predict faults, failures and development effort in novel ways. These predictions can assist practitioners in making timely decisions. However, AI models may sometimes produce inaccurate predictions, which can result in wasting resources on false positives or overlooking important actions due to false negatives. Providing AI models with ways to express and guarantee their confidence in their outcomes would greatly improve the quality of AI-supported decision making.
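
As a minimal sketch of how a model could expose and act on its confidence, the following example (using scikit-learn on synthetic data; features and threshold are illustrative) abstains from a fault-proneness prediction when the model's confidence is low.

```python
# Minimal sketch: let a fault-prediction model report its confidence and abstain
# when it is unsure, instead of silently producing false positives/negatives.
# Uses scikit-learn with synthetic data; features and threshold are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.random((200, 3))            # e.g. churn, complexity, past faults
y_train = (X_train[:, 0] + X_train[:, 2] > 1.0).astype(int)   # synthetic labels

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

def predict_with_confidence(component_features, threshold=0.8):
    proba = model.predict_proba([component_features])[0]
    confidence = proba.max()
    if confidence < threshold:
        return "abstain: defer to a human reviewer", confidence
    label = "fault-prone" if proba[1] >= 0.5 else "likely clean"
    return label, confidence

print(predict_with_confidence([0.9, 0.2, 0.7]))
print(predict_with_confidence([0.5, 0.5, 0.5]))
```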

AI-Assisted Integration and Deployment

To deliver software changes faster and more frequently, modern software development practices, such as DevOps, use automated Continuous Integration and Deployment (CI/CD) pipelines. This facilitates the rapid release and operation of new software versions.

Opportunities. Writing scripts for CI/CD pipelines can be complex and tedious for human developers and thus presents itself as a promising target for Generative AI. This can be combined with novel “virtualization” techniques to hide the low-level details of computing, storage and networking resources, such as Function as a Service (FaaS) or Serverless Architecture, where developers only specify the resource requirements for their software, without worrying about the deployment targets.
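
A minimal sketch of how generated pipeline scripts might be handled is shown below: a placeholder stands in for the LLM call, and the draft is at least checked to parse as valid YAML before a human reviews it. It assumes PyYAML; the pipeline layout loosely follows GitLab CI and is illustrative.

```python
# Minimal sketch: have a generative model draft a CI/CD pipeline definition and
# sanity-check that the draft at least parses as YAML before a human reviews it.
# The generate_pipeline() stub stands in for a real LLM call; requires PyYAML.
import yaml

def generate_pipeline(prompt: str) -> str:
    # Placeholder for an LLM call; returns a hard-coded draft for illustration.
    return (
        "stages: [build, test, deploy]\n"
        "build:\n  script: [make build]\n"
        "test:\n  script: [make test]\n"
        "deploy:\n  script: [make deploy]\n  when: manual\n"
    )

def draft_and_check(prompt: str) -> dict:
    draft = generate_pipeline(prompt)
    pipeline = yaml.safe_load(draft)           # reject drafts that are not valid YAML
    assert "deploy" in pipeline, "draft misses the requested deploy stage"
    return pipeline                            # handed over for human review

print(draft_and_check("Create a CI pipeline with build, test and a manual deploy stage"))
```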

As the computing infrastructure shifts from the centralized cloud to a decentralized Cloud-Edge-IoT continuum, AI becomes an essential tool for managing and supporting the deployment of software on the available resources. To handle the ever-expanding computing continuum for continuous delivery, advanced learning and reasoning capabilities (such as multi-agent reinforcement learning, deep reinforcement learning, and “meta” reinforcement learning) offer novel opportunities to capture the dynamicity, complexity and uncertainty of the Cloud-Edge-IoT continuum.

Challenges. AI can help automate the integration and deployment of software, but it faces many challenges, including (1) How to plan the deployment with multiple objectives, preferences, and constraints that may vary depending on the situation; (2) How to capture the high-level intents of developers instead of low-level actions that may not reflect their goals; (3) How to keep developers in the loop and let them review and refine the deployment scripts over time; (4) How to coordinate the deployment agents across different locations, domains, or providers to achieve a global optimal solution; (5) How to build trust between developers and AI by explaining the deployment decisions and outcomes.

AI-Assisted Adaptation

A software system that can change its own structure and behaviour at runtime is called a (self-)adaptive software system. Examples of adaptive software systems are cloud systems that can scale up or down, IoT systems that can learn and act intelligently, and process management systems that can anticipate and adjust to changes.

A crucial component of an adaptive system is its adaptation logic, which specifies when and how the system should adapt itself. When developing the adaptation logic, developers have to deal with design time uncertainty, which means they have incomplete information about when and how the system should adapt[9].

Opportunities. Using AI in an online fashion (e.g., online deep learning) is a new way to deal with design time uncertainty for adaptive systems. For example, by using deep reinforcement learning at run time, the system can learn from data that is only available when it is running, and thus cope with uncertainty better. Moreover, the system can benefit from predictive monitoring, which is a technique that estimates the likelihood of failures in the near future by leveraging deep learning models and complex operational data.
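
To make the idea of learning at run time concrete, the following self-contained sketch uses a deliberately simplified, bandit-style learner (rather than full deep reinforcement learning) that learns how many replicas to run for an observed workload; the "monitoring feedback" is synthetic and purely illustrative.

```python
# Minimal sketch of online learning for self-adaptation: a bandit-style learner
# that learns at run time how many replicas to run for the observed workload.
# The feedback signal is synthetic; real systems would use monitoring data.
import random

ACTIONS = [1, 2, 4, 8]          # candidate replica counts
WORKLOADS = ["low", "high"]
q_table = {(w, a): 0.0 for w in WORKLOADS for a in ACTIONS}
alpha, epsilon = 0.1, 0.2       # learning rate and exploration rate

def observe_reward(workload: str, replicas: int) -> float:
    # Synthetic stand-in for monitoring feedback: penalise latency and cost.
    needed = 2 if workload == "low" else 6
    latency_penalty = max(0, needed - replicas)       # too few replicas -> slow
    cost_penalty = 0.2 * replicas                     # too many replicas -> costly
    return -(latency_penalty + cost_penalty)

for step in range(5000):
    workload = random.choice(WORKLOADS)               # state observed at run time
    if random.random() < epsilon:                     # explore
        action = random.choice(ACTIONS)
    else:                                             # exploit learned knowledge
        action = max(ACTIONS, key=lambda a: q_table[(workload, a)])
    reward = observe_reward(workload, action)
    q_table[(workload, action)] += alpha * (reward - q_table[(workload, action)])

print({w: max(ACTIONS, key=lambda a: q_table[(w, a)]) for w in WORKLOADS})
```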

Challenges. Online reinforcement learning has been used successfully for adaptive systems, but important challenges remain. Modern deep learning algorithms are stochastic, which means that their performance can vary significantly. This means that even if deep learning models perform well in the lab, they may not perform well in an actual production environment. While the factors that cause variance may be controlled (e.g., non-deterministic neural network layers or random weight initialization), this typically leads to performance degradation. A more promising approach is thus to use "meta"-learning techniques on top of the deep learning models, such as ensemble learning, which aggregates and thereby enhances the results of multiple deep learning models.

The successful application of online reinforcement learning depends on how well the learning problem, and in particular the reward function, is defined. Software engineers need to explicitly define a reward function, which quantifies the feedback given to the RL algorithm. Getting the reward function right, such that it accurately reflects the trade-offs among different goal dimensions and leads to the overall learning goal, is a challenge.
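
As a small illustration, the sketch below defines a reward function that trades off response-time violations against resource cost; the weights and thresholds are illustrative and would need careful calibration in practice.

```python
# Minimal sketch of an explicit reward function for online RL-based adaptation,
# trading off two goal dimensions (response time vs. resource cost).
# The weights and thresholds are illustrative and would need domain calibration.
def reward(avg_response_time_ms: float, replicas: int,
           sla_ms: float = 200.0, w_perf: float = 1.0, w_cost: float = 0.1) -> float:
    sla_violation = max(0.0, avg_response_time_ms - sla_ms) / sla_ms  # relative overshoot
    resource_cost = float(replicas)
    return -(w_perf * sla_violation + w_cost * resource_cost)

print(reward(avg_response_time_ms=180.0, replicas=4))   # SLA met, moderate cost
print(reward(avg_response_time_ms=400.0, replicas=2))   # SLA violated, low cost
```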

In addition to leveraging AI chatbots to provide natural-language and interactive explanations and thereby support debugging of adaptive systems[10], a novel direction may be to use LLMs to generate suitable reward functions from a sufficiently concise description of the problem context and overall learning goal.

AI-Assisted Software Life-Cycle

As highlighted above, AISE will transform software engineering by enabling higher levels of automation for various tasks, from requirements engineering to operations and legacy systems maintenance. This will improve the efficiency and speed of the software life cycle and supply chain. Many solutions have been proposed to automate specific software engineering tasks using AI techniques. A recent survey[11] indicates that the research output on using ML for automating individual software engineering tasks shows a compound annual growth rate of 33% (from 2018 to 2020).

Opportunities. Automating individual software engineering tasks is beneficial, but there is also a great potential to exploit synergies among these tasks across the whole software life cycle and supply chain. For instance, a single quality assurance task may not be enough to ensure the desired software quality. Ideally, one would use a suitable combination of different tasks – such as dynamic testing and static code analysis. Also, exploiting the synergies between different tasks can increase the effectiveness of the individual tasks. For example, having a good estimate of the fault density of a software component, e.g., by using deep learning-based fault prediction methods, could be used to optimise and prioritise testing effort and budget.
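
A minimal sketch of such a synergy is shown below: a fixed testing budget is allocated across components in proportion to their predicted fault density. The prediction values are illustrative stand-ins for the output of a fault-prediction model.

```python
# Minimal sketch of exploiting task synergies: allocate a fixed testing budget
# across components in proportion to their predicted fault density.
# The predicted values are illustrative stand-ins for a fault-prediction model.
predicted_fault_density = {"billing": 0.45, "ui": 0.10, "auth": 0.30, "reporting": 0.15}
TOTAL_TEST_HOURS = 80

total = sum(predicted_fault_density.values())
allocation = {
    component: round(TOTAL_TEST_HOURS * density / total, 1)
    for component, density in predicted_fault_density.items()
}
print(allocation)   # e.g. {'billing': 36.0, 'ui': 8.0, 'auth': 24.0, 'reporting': 12.0}
```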

Challenges. One way to leverage synergies in AISE is to use the output of one AI-based software engineering task as input for another task. For example, one can use the predicted fault-proneness of a component to prioritize test cases. However, this approach may not fully exploit the potential of AI. A more interesting direction is to leverage synergies that take into account the specific characteristics of AI models. For example, an AI model can provide an explanation that makes the output of deep learning models more understandable, and thus more trustworthy, when used as input for another activity.

AI-Assisted Security

The software supply chain consists of many elements, such as code, tools, people, and processes. These elements interact with each other to create, test, deploy, and maintain software products. However, this also exposes the software supply chain to various threats (e.g., as discussed in the NESSI position paper on software security[12]). To protect the software supply chain from cyber-attacks, which are becoming more frequent and advanced, AI can be a useful tool; for example, the US White House has launched the AI Cyber Challenge to Protect America’s Critical Software.

Opportunities. AI can enhance efficiency, productivity, and quality in software engineering and security, as discussed in previous sections. AI can help programmers develop secure code by suggesting secure code snippets. AI can generate more comprehensive test cases that improve test coverage. AI can improve patch management by detecting vulnerabilities and by predicting the impact of a patch on the system's stability before deployment. Other promising opportunities include analysing large amounts of data for better insights and contextual awareness (e.g., anomaly detection), learning and adapting to changing conditions and predicting new threats, and drawing conclusions and generating action plans in real-time (e.g., incident response).
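
As a small illustration of AI-supported anomaly detection on operational data, the following sketch trains an Isolation Forest on synthetic "normal" request features and flags an unusual window for security review (assumes scikit-learn; features and data are illustrative).

```python
# Minimal sketch of AI-supported anomaly detection on operational data:
# an Isolation Forest flags unusual request patterns for security review.
# Uses scikit-learn; the features and data are synthetic and illustrative.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
# features per request window: [requests/min, avg payload KB, failed logins]
normal_traffic = rng.normal(loc=[100, 4, 1], scale=[10, 1, 1], size=(500, 3))

detector = IsolationForest(contamination=0.01, random_state=1).fit(normal_traffic)

suspicious_window = [[950, 60, 40]]            # burst of large requests + failed logins
print(detector.predict(suspicious_window))     # -1 marks an anomaly, 1 marks normal
```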

AI has the potential to improve security in all areas of the software supply chain and life cycle; e.g., via software verification and validation, software composition analysis, certification and conformity assessment, risk management for software-intensive complex systems, vulnerability detection, maintenance, reporting, and threat intelligence sharing.

Challenges. AI for secure software engineering still faces many shortcomings, including the generation of insecure code, uncertain prediction quality, inadequate reasoning, and low transparency. High-quality datasets are crucial for training secure AI models. For example, curated datasets may be needed to train and to validate AI models so they generate more secure code with fewer vulnerabilities. To ensure that software engineers can comprehend, modify, and update the code, AI-assisted code generation should be transparent and explainable.

Unfortunately, AI tools for software security may also be misused by hackers and cyber-criminals. Software developers and security professionals should understand the features and impacts of AI-based cyber-attacks and how to stop them. The AI tools themselves may be vulnerable and need security measures to reduce their AI-specific security threats.

AI adoption for security is limited due to integration difficulties, a shortage of skilled professionals, and concerns about accuracy, dependability, and implementation costs. AI models such as LLMs also pose security and privacy risks themselves, for example model inversion attacks revealing training data, prompt injection bypassing safety features to produce harmful outputs, and data poisoning causing incorrect behaviour.

Non-Technical Concerns in AISE

Besides the technical concerns discussed above, AISE also offers novel non-technical opportunities, but at the same time it introduces major non-technical challenges.

Opportunities. Generative AI can be used for various purposes that can improve innovation and efficiency. It can teach developers new coding skills and languages by giving them interactive suggestions[13]. It can also help developers brainstorm new ways of solving problems by generating code examples[14]. Moreover, it can help developers evaluate the possible outcomes of their software, such as how it might affect different stakeholders, the company's ROI, and other ethical issues. Lastly, it can also help developers understand the legal aspects of software engineering, such as AI, data, and platform laws, to ensure compliance and avoid IPR violations.

In particular, AI chatbots based on LLMs can provide natural-language and interactive explanations for software systems that have AI and ML components. Explainability can help software and service providers comply with the relevant laws and regulations, such as the GDPR and the AI Act in the EU. Furthermore, explainability can help software and service users trust the software systems by understanding how they produce their results and whether they are acceptable or not.

Challenges.

Intellectual Property. AI-generated content, such as synthetic code, poses new legal questions for the current IP regimes, which are based on human creativity[15]. How to give fair credit and licenses for the products of proprietary models with different levels of licensing is a controversial issue, especially when the models are trained on user-generated data that is taken from the internet without compensating the users. It is not clear whether AI systems that create content independently without human input can be regarded as authors or not. Therefore, there is a demand for more transparency and consistency in the ownership and licensing of AI-generated content, as AISE becomes more common. To safeguard the rights and interests of both AISE tool developers and users, new licensing schemes and attribution standards should be created and harmonised. To deal with the legal challenges caused by AI-generated code that does not involve significant human input, copyright law and liability models should be adjusted accordingly.

Transparency. LLMs are complex AI systems that do not reveal their inner workings easily, making it hard to apply traditional methods of technology governance to them. Some methods have been suggested to explain parts of LLMs, but they still face many shortcomings, such as ensuring the fidelity of explanations and the difficulty of evaluating and comparing explanations. To ensure quality and security, companies could establish internal oversight boards that review the AISE tools and underlying LLMs before they are deployed. These boards could evaluate the potential risks of biases, security breaches, and misalignment with engineering objectives. Additionally, random audits of the code generated by LLMs could help verify its stability.
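
One simple form such a random audit could take is sketched below: the same prompt is sent to the model several times and the similarity of the generated code is measured, with low scores triggering a manual review. The generation call is a placeholder and the similarity metric is deliberately simple.

```python
# Minimal sketch of a stability audit for LLM-generated code: re-run the same
# prompt several times and measure how similar the outputs are. The generate()
# stub stands in for a real LLM call; the similarity metric is deliberately simple.
import random
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean

def generate(prompt: str) -> str:
    # Placeholder for an LLM call; a real audit would query the deployed model.
    return f"def parse(line):\n    return line.split(',')[{random.choice([0, 0, 1])}]\n"

def stability_score(prompt: str, runs: int = 5) -> float:
    outputs = [generate(prompt) for _ in range(runs)]
    pairwise = [SequenceMatcher(None, a, b).ratio() for a, b in combinations(outputs, 2)]
    return mean(pairwise)    # 1.0 means identical outputs across all runs

score = stability_score("Write a function that parses the first CSV field of a line")
print(f"stability score across runs: {score:.2f}")   # low scores trigger a manual audit
```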

Sustainability. LLMs are powerful tools but also very costly in terms of energy and resources. They can produce thousands of tonnes of CO2 equivalent during training, which may increase as models get bigger[16]. In addition, energy consumption during inference (i.e., when generating the outcomes) becomes a concern with the increased use and adoption of these AI models. To reduce the environmental impact of LLMs, potential directions may be to use green computing infrastructure (including processors) and to reuse and leverage existing models. We can also prune large models to make them smaller and faster without losing quality[17]. While LLMs may offer benefits for sustainability, such as automating quality assurance and reducing human effort, we need more research to understand the trade-offs and best practices for LLMs across their lifecycle.
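
As a small illustration of the pruning idea, the sketch below removes the smallest-magnitude weights of a single layer using PyTorch's pruning utilities; the model is a toy, and production-scale approaches such as SparseGPT[17] are considerably more involved.

```python
# Minimal sketch of magnitude pruning as one lever for more sustainable models:
# remove the 50% smallest weights of a layer using PyTorch's pruning utilities.
# The layer is a toy example; real-world LLM pruning is far more involved.
import torch
import torch.nn.utils.prune as prune

layer = torch.nn.Linear(512, 512)
prune.l1_unstructured(layer, name="weight", amount=0.5)    # zero out the smallest 50%
prune.remove(layer, "weight")                              # make the pruning permanent

sparsity = (layer.weight == 0).float().mean().item()
print(f"weight sparsity after pruning: {sparsity:.0%}")
```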

Overreliance. The code-generation capabilities of modern coding assistants are impressive. However, the quality of the underlying LLMs and thus their generated code may deteriorate over time, requiring proper tracking and testing in order not to rely on low-quality models. LLMs should help (hence the term AI-assisted), not replace, human and collective programming knowledge. Otherwise, the risks of wrong outputs and unethical design may increase due to less human supervision and expertise. To ensure that LLMs are aligned with ethical values and social norms, they should be developed and deployed with the participation of relevant stakeholders, including the public and the affected groups. Moreover, LLMs should be subject to constitutional principles and expert oversight, as well as feedback mechanisms that can improve their performance and reduce their harms. Educating people about the benefits and risks of LLMs is essential for fostering trust and awareness and preventing overreliance on these powerful AI models.

AUTHOR

Andreas Metzger is Vice Chair of NESSI (the European Networked Software and Services Initiative – https://nessi.eu/), Professor of software engineering at the University of Duisburg-Essen, and Head of “Adaptive Systems” at paluno (the Ruhr Institute for Software Technology – https://paluno.uni-due.de/en/).

REFERENCES

[1]: Christof Ebert, Panos Louridas: Generative AI for Software Practitioners. IEEE Software 40(4): 30-38, 2023
[2]: European Commission, Directorate-General for Research and Innovation: Horizon Europe Strategic Plan 2025-2027 Analysis. Publications Office of the European Union, 2023; https://data.europa.eu/doi/10.2777/637816
[3]: Klaus Pohl: Requirements Engineering: Fundamentals, Principles, and Techniques. Springer, 2010
[4]: Walid Maalej: From RSSE to BotSE: Potentials and Challenges Revisited after 15 Years. arXiv preprint, 2023; https://arxiv.org/abs/2304.09308
[5]: Thomas Dohmke, Marco Iansiti, Greg Richards: Sea Change in Software Development: Economic and Productivity Analysis of the AI-Powered Developer Lifecycle. arXiv preprint, 2023; https://arxiv.org/abs/2306.15033
[6]: Hammond Pearce et al.: Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions. IEEE Symposium on Security and Privacy, 2022
[7]: Herb Krasner: The Cost of Poor Software Quality in the US: A 2020 Report. Consortium for Information and Software Quality (CISQ), 2021
[8]: Andreas Metzger, Xhulja Shahini, Johannes Haerkötter, Klaus Pohl: A Systematic Literature Review of Machine Learning for Uncovering Software Faults and Failures, 2023; https://doi.org/10.5281/zenodo.7615631
[9]: Andreas Metzger, Clement Quinton, Zoltan Mann, Luciano Baresi, Klaus Pohl: Realizing Self-Adaptive Systems via Online Reinforcement Learning and Feature-Model-Guided Exploration. Computing, Springer, pp. 1-22, 2022; https://doi.org/10.1007/s00607-022-01052-x
[10]: Andreas Metzger, Jone Bartel, Jan Laufer: An AI Chatbot for Explaining Deep Reinforcement Learning Decisions of Service-Oriented Systems. 21st Int’l Conference on Service-Oriented Computing (ICSOC 2023), Springer, 2023; https://arxiv.org/abs/2309.14391
[11]: Simin Wang et al.: Machine/Deep Learning for Software Engineering: A Systematic Literature Review. IEEE Transactions on Software Engineering 49(3): 1188-1231, 2023
[12]: NESSI: Position Paper on Software Security; https://nessi.eu/nessi-paper-on-software-security/
[13]: Steven I. Ross, Fernando Martinez, Stephanie Houde, Michael J. Muller, Justin D. Weisz: The Programmer’s Assistant: Conversational Interaction with a Large Language Model for Software Development. 28th International Conference on Intelligent User Interfaces, 2023
[14]: David Noever, Kevin Williams: Chatbots as Fluent Polyglots: Revisiting Breakthrough Code Snippets. arXiv preprint, 2023; https://arxiv.org/abs/2301.03373
[15]: Giorgio Franceschelli, Mirco Musolesi: Copyright in Generative Deep Learning. Data & Policy 4, e17, 2022
[16]: Stanford Institute for Human-Centered AI: AI Index Report; https://aiindex.stanford.edu/report/
[17]: Elias Frantar, Dan Alistarh: SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot. ICML 2023: 10323-10337



The HiPEAC project has received funding from the European Union's Horizon Europe research and innovation funding programme under grant agreement number 101069836. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them.