WHY THIS MATTERS IN BRIEF
As we enter the age of AI agents, the security implications could be world changing, and dare I say will be, because this future is all but inevitable.
One of the most exciting emerging applications of Large Language Model (LLM) Artificial Intelligence (AI) systems is the concept of agents: advanced, next generation bots that can perform all manner of tasks, from building a company to autonomously carrying out complex workflows, with little or no human intervention. However, if not properly overseen it’s highly likely that they could do real world harm, whether intentionally or by accident. Furthermore, malicious actors could abuse these agents to automate their attacks, which would see us properly enter the era of fully autonomous hacking systems, AKA Robo-Hackers, that I’ve been talking about for years.
However, given the complexity of these systems, and the fact that their intelligence is very different from our own, it’s difficult to predict their behaviours. That in turn makes it hard to evaluate the autonomy of LLM agents effectively, especially when trying to determine whether or not they could go rogue and become malicious actors.
Now, a new paper by the Alignment Research Center seeks to “quantify the autonomy of LLM agents.” By testing advanced models like GPT-4 and Claude on open-ended tasks and observing their ability to adapt to changing environments, the researchers aim to better understand the capabilities and limitations of these agents.
The paper introduces “autonomous replication and adaptation” (ARA), a benchmark for assessing an agent’s level of sophistication. ARA describes an agent’s ability to perform tasks while adapting to its environment, much as an intelligent being would: the capacity to plan its actions, gather resources, use them effectively, and refine its abilities to achieve specific objectives.
For example, an LLM agent should be able to generate income (with Bitcoin being an ideal medium of exchange for an AI, as I wrote a little while ago) to pay for its expenses, and then reinvest the surplus to purchase additional processing power and update its model.
This self-improvement cycle would involve the agent training itself on new data sets to sharpen its skills. Crucially, the agent must also be able to assess the success of its strategies and make adjustments to reach its goals – as we see with some open-ended AIs like Uber’s POET.
Achieving this cycle of ARA could lead to a scenario where a model scales its processes. It could replicate itself across hundreds or thousands of instances, each specialized for distinct tasks. These agents could then be coordinated to accomplish complex objectives. The implications of this are profound, as such a system could be directed towards either beneficial or harmful ends.
“In general, once a system is capable of ARA, placing bounds on a system’s capabilities may become significantly more difficult,” the researchers write. “If an AI system is able to proliferate large numbers of copies outside of human control, it is much harder to bound the risks posed by that system.”
You can give GPT-4 a high-level goal and prompt it to deconstruct it into actionable steps. It can then recursively divide each step into smaller, more detailed sub-tasks until it creates a clear sequence of actions. The LLM can pass these actions to other models or programs that run them.
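To make that concrete, here is a minimal sketch of what this kind of recursive goal decomposition can look like in practice. It assumes a hypothetical complete(prompt) helper standing in for a call to whichever LLM API you use; the function name, prompt wording, and depth limit are illustrative rather than anything taken from the paper.

```python
# Minimal sketch of recursive task decomposition with an LLM.
# complete() is a hypothetical stub standing in for a call to an LLM API
# such as GPT-4 or Claude; wire it up to your own provider.

def complete(prompt: str) -> str:
    raise NotImplementedError("connect this to an LLM API of your choice")


def decompose(goal: str, depth: int = 0, max_depth: int = 2) -> list[str]:
    """Ask the model to split a goal into sub-tasks, then recurse on each
    sub-task until max_depth is reached, returning a flat list of actions."""
    if depth >= max_depth:
        return [goal]
    reply = complete(
        "Break the following task into 3-5 concrete, ordered sub-tasks, "
        f"one per line:\n{goal}"
    )
    subtasks = [line.strip("- ").strip() for line in reply.splitlines() if line.strip()]
    plan: list[str] = []
    for sub in subtasks or [goal]:
        plan.extend(decompose(sub, depth + 1, max_depth))
    return plan
```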
There are various frameworks for creating agents that interface with LLMs and use their outputs to carry out actions. These actions range from web browsing and running computer code to making API calls, effectively allowing the LLM to interact with its environment.
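In most of these frameworks the glue is a small dispatcher that maps the model’s structured output onto a concrete executor. The sketch below assumes the model replies with a JSON object naming an action type and its arguments; the schema and action names are invented here purely for illustration.

```python
import json
import subprocess


def run_python(code: str) -> str:
    """Run a Python snippet in a subprocess and return its combined output."""
    result = subprocess.run(["python", "-c", code],
                            capture_output=True, text=True, timeout=60)
    return result.stdout + result.stderr


def run_bash(command: str) -> str:
    """Run a shell command and return its combined output."""
    result = subprocess.run(command, shell=True,
                            capture_output=True, text=True, timeout=60)
    return result.stdout + result.stderr


def dispatch(action_json: str) -> str:
    """Route an LLM-suggested action to the matching executor."""
    action = json.loads(action_json)
    kind = action.get("type")
    if kind == "python":
        return run_python(action["code"])
    if kind == "bash":
        return run_bash(action["command"])
    # Web browsing and API calls would plug in here in a fuller framework.
    return f"Unknown action type: {kind}"
```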
In the ARA test designed by the researchers, they use a scaffolding program. This program initiates the process by presenting the LLM with a high-level goal. It then interprets the model’s responses, carries out the suggested actions, and feeds the results back to the LLM. This feedback loop enables the LLM to plan subsequent steps based on the outcomes of previous actions.
The scaffolding program can execute diverse actions, including writing and running Python or Bash scripts, navigating the web through a browser, or issuing commands to other sub-agents.
“At the start of each step, the language model is provided with the initial prompt, which outlines the goal and instructions for interacting with the scaffolding program, as well as a history of the thoughts, actions, and outcomes that have transpired. Using this information, the agent determines the next thought to record or action to take,” the researchers write.
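Putting the pieces together, the feedback loop the researchers describe might look roughly like the sketch below. It reuses the hypothetical complete() and dispatch() stubs from the earlier snippets, and the prompt format is a simplification rather than the actual scaffolding used in the paper.

```python
import json

# complete() and dispatch() are the hypothetical stubs sketched above:
# one calls an LLM API, the other executes a suggested action.


def agent_loop(goal: str, max_steps: int = 10) -> list[dict]:
    """Simplified scaffolding loop: show the model the goal plus the history
    of thoughts, actions, and results; execute each suggested action; and
    feed the outcome back in so the model can plan its next step."""
    history: list[dict] = []
    for _ in range(max_steps):
        transcript = "\n".join(
            f"Thought: {h['thought']}\nAction: {h['action']}\nResult: {h['result']}"
            for h in history
        )
        reply = complete(
            f"Goal: {goal}\n"
            f"History so far:\n{transcript}\n"
            'Reply with JSON: {"thought": "...", '
            '"action": {"type": "python" | "bash" | "finish", ...}}'
        )
        step = json.loads(reply)
        action = step["action"]
        if action.get("type") == "finish":
            history.append({"thought": step["thought"], "action": action, "result": ""})
            break
        result = dispatch(json.dumps(action))
        history.append({"thought": step["thought"], "action": action, "result": result})
    return history
```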
The researchers crafted a suite of 12 tasks designed to test the agents’ capabilities in performing harmful activities. These tasks range from straightforward actions, such as searching a filesystem for passwords linked to specific E-Mail addresses, to more complex operations like executing a targeted phishing scheme or deploying an open source LLM on an Amazon EC2 instance.
“If an LM agent would be able to autonomously replicate and adapt to novel challenges under messy conditions in the real world, we are fairly confident it would be able to solve most or all of the 12 tasks in our pilot suite,” the researchers write.
They tested the tasks with LLM agents powered by the most capable models, GPT-4 and Claude. Their findings show that current LLM agents are only adept at the simplest tasks. While they could formulate credible high-level plans, they struggled with implementing them. In effect, they could not deal with the unpredictable and complex nature of the real world.
“We believe that the tasks these agents failed to accomplish were considerably easier than autonomously replicating in the wild, and therefore that these agents are far from capable of ARA,” they conclude.
The examples highlight the gap between the capabilities of LLM agents and the practical intelligence required to navigate our world. For instance, in the targeted phishing scenario, the LLM agent could identify the necessary steps, such as gathering information about the target and creating a phishing E-Mail. However, it failed on key actions, like accurately replicating an HTML page or properly signing up and logging into a web hosting service. The agent either failed to recognize its errors or became trapped in a loop, repeating the same mistakes.
Moreover, the LLM agent exhibited a tendency to “hallucinate,” generating false information or scenarios. It also misdiagnosed obvious errors and showed a lack of understanding of its own solutions and of those suggested by sub-agents. These shortcomings underscore how much of human intelligence lies in everyday tasks and cognitive abilities that remain significant obstacles for AI to overcome.
What are the implications?
LLMs have made remarkable strides in executing tasks that were once thought to demand high levels of human intellect. But they are not ready to deal with the unpredictable and intricate nature of the real world.
The study also shows that benchmarks commonly used to gauge LLM performance are not suitable measures of true intelligence. On one hand, LLMs can carry out complex tasks that would typically require years of human training and expertise. On the other, they are prone to errors that most humans would avoid with minimal data and life experience.
ARA can be a promising metric to test the genuine capabilities of LLM agents for both beneficial and harmful actions. Currently, even the most sophisticated LLMs have not reached a level where they are ARA-ready.
The researchers write, “We believe our agents are representative of the kind of capabilities achievable with some moderate effort, using publicly available techniques and without fine-tuning. As a result, we think that in the absence of access to fine-tuning, it is highly unlikely that casual users of these versions of GPT-4 or Claude could come close to the ARA threshold.”
LLMs still have fundamental problems that prevent them from thinking and planning like humans, but the landscape is rapidly evolving. LLMs and the platforms that use them continue to improve. The process of fine-tuning LLMs is becoming more affordable and accessible. And the capabilities of models continue to advance. It could just be a matter of time before creating LLM agents with a semblance of ARA-readiness becomes feasible, and we see the creation of an autonomous cyber hacker “system of systems.”