Theoretical Fundamentals

Problem Analysis and Theoretical Approach

With the objectives established in the Introduction chapter, our initial focus is on addressing each of these objectives individually. This involves connecting LLMs (Large Language Models) to a standard user, defined as someone who lacks programming skills and is unfamiliar with non-standard software tools beyond graphical applications such as office suites. This suggests developing a "black box" approach with respect to both the user and the LLM itself.

The AGI Problem

To achieve the established objectives, it is important to examine the concept of AGIs. An AGI (Artificial General Intelligence) can understand, learn, and apply knowledge across a broad range of tasks, similarly to human intelligence. An AGI can theoretically perform any intellectual task that a human can, applying its intelligence generally across different domains without specific training for each.

Implementing an AGI is practically impossible; however, we will approach the problem by axiomatizing that a human can solve any software problem p ∈ P using a knowledge source C and a set of actions A, such that:

HumanSolution(p | C, A) = s

Where s is the solution to problem p and is also a heuristically viable solution in a Turing machine environment.

Definition of ECMs as an Approximation to the AGI Problem

We can approximate the problem by considering the knowledge of an LLM as C' ≈ C, i.e., an estimator of human knowledge; and the set of programmable actions as A' ≈ A, i.e., an estimator of the action space to solve p. Therefore, the creation of an AGI implies designing an algorithm that verifies the following equality:

Algorithm_1(p | C', A') = s

Finally, since C' is, by definition, knowledge already integrated into LLMs and fully accessible, we only need to design an architecture that acts as an intermediary between the knowledge C' and our problem p to produce a set of actions A'. Executing A' should reach our solution s, satisfying the following equality:

Algorithm_1(p | C') = λ | λ ⊂ A' ∧ Algorithm_2(λ) = s

From now on, we will refer to Algorithm_1 (A_1) as the Cognitive Layer of the problem, whose objective is, given an action space A', to obtain the subset of actions λ ⊂ A' that solves the problem; and we will refer to Algorithm_2 (A_2) as the Execution Layer of the problem, whose objective is, given a set of actions λ, to execute them in order to reach the solution s.

Objectives Simplified

The problem of "how to implement an AGI" is now simplified into three subproblems:

  1. Convert the user's problem request p into a query q' that the LLM can understand.
  2. Convert the LLM's natural language response into a set of executable actions λ ⊂ A'.
  3. Execute the set of actions λ.
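These three subproblems map directly onto the Cognitive and Execution Layers defined above. The following sketch illustrates that decomposition; every name in it is illustrative (there is no real library behind it), and the LLM is stubbed out.

```python
from typing import Callable, List

# Hypothetical sketch of the ECM decomposition; all names are illustrative.
Action = Callable[[], None]  # an executable action from the space A'

def cognitive_layer(problem: str,
                    query_llm: Callable[[str], str],
                    parse_actions: Callable[[str], List[Action]]) -> List[Action]:
    """Algorithm_1: translate the problem p into a query q', ask the LLM (C'),
    and convert its natural-language response into a plan λ ⊂ A'."""
    query = f"Propose step-by-step actions to solve: {problem}"  # p -> q'
    response = query_llm(query)                                  # q' -> natural language
    return parse_actions(response)                               # response -> λ

def execution_layer(plan: List[Action]) -> None:
    """Algorithm_2: execute the plan λ in order to reach the solution s."""
    for action in plan:
        action()

# Toy usage with a stubbed LLM and a two-action catalog:
log = []
fake_llm = lambda q: "open_editor; save_file"
catalog = {"open_editor": lambda: log.append("open"),
           "save_file": lambda: log.append("save")}
parse = lambda text: [catalog[name.strip()] for name in text.split(";")]

plan = cognitive_layer("edit a document", fake_llm, parse)
execution_layer(plan)
# log is now ["open", "save"]
```

The stubbed `fake_llm` and `parse` stand in for the real natural-language interface; the point is only the separation of concerns between the two layers.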

This approach is underpinned by the philosophy: "If a human can solve it, then an LLM should also be able to." With this idea in mind, we will refer to an implementation of an architecture that satisfies the established properties as an Execution-Cognitive Machine, or ECM.

AutoGPT in relation with AGI's Problem

*(Figure: AutoGPT logo)*

AutoGPT is a Python library that implements AI agents based on the Profile-Memory-Planning-Action (PMPA) model discussed in this paper. It allows programmers to easily implement agents connected via HTTP requests to OpenAI's GPT models and similar LLMs (e.g., LLaMA, even with local execution), and it is maintained collaboratively by thousands of contributors.

A significant feature of this library is that agents can be equipped with a set of "skills," also known as "actions" or "abilities," which the agent knows how to invoke; each skill calls a section of code that the programmer must have previously implemented in the agent.
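A minimal registry illustrates this skill mechanism. The decorator and dispatch function below are a hedged sketch, not AutoGPT's actual API: the idea is simply that implementations are registered under names the LLM can mention in its response.

```python
from typing import Callable, Dict

# Illustrative skill registry; names are hypothetical, not AutoGPT's real API.
SKILLS: Dict[str, Callable[..., str]] = {}

def skill(name: str, description: str):
    """Register a function as an agent 'ability' the LLM may request."""
    def decorator(fn: Callable[..., str]) -> Callable[..., str]:
        fn.description = description  # exposed to the agent's prompt
        SKILLS[name] = fn
        return fn
    return decorator

@skill("read_file", "Read a text file and return its contents")
def read_file(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()

def dispatch(name: str, **kwargs) -> str:
    """Invoked when the LLM's response names a registered skill."""
    return SKILLS[name](**kwargs)
```

In a real agent loop, the descriptions would be injected into the prompt so the model knows which skills exist, and `dispatch` would be driven by the parsed model output.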

The most powerful agent model implementing this technology is @evo.ninja, which stands out for its ability to solve software tasks and problems, as shown in Figure 2 (use of files, I/O, CSV reading, query resolution, etc.).

Despite AutoGPT's achievements, this library is far from fully realizing our ECM, as its action space A' is a considerably small subset given the set of problems p ∈ P, i.e., A' ≉ A. Therefore, the goal of GU-S will be to extend the "skills" architecture of AutoGPT into one in line with an ECM.

A' Expansion Method

Given AutoGPT's shortfall with respect to the AGI problem, we define the A' Expansion as an alternative that better approaches this problem.

Before implementing the algorithm itself, we must establish the following axiom:

Manually programming a set of actions A' that covers the entire problem set P is not viable due to scalability issues.

In simple terms, "it is not feasible to program all possible actions and solutions that AutoGPT should use." Thus, GU-Systems proposes simplifying this problem to implementing a protocol that describes an action space B robust enough that the composition of actions in B generates a space A' ≈ A.

In robotics planning, this issue has been raised before, and a widespread solution even today is the use of so-called Behavior Trees. These systems use a set of simple algorithms that can be combined to generate a virtually scalable action tree for any problem p. That is, Behavior Trees give us an easy answer to "how to build complex plans from simple actions." Now we just need to determine WHICH these "simple actions" will be, and whether they are robust enough to reach a solution s.

*(Figure: Behavior tree diagram)*

Thus, our actions b ∈ B are a set of commands and keystrokes (clicks, keyboard actions, etc.), and by combining these actions through a behavior tree, we can expand the action space B such that BT(B) ≈ A' ≈ A. With this in mind, we have just made a significant approximation to the problem of ECMs, which also yields the following properties:

  1. The more generalist the set B, the greater the convergence of A' with respect to A.
  2. The set of actions B is scalable at the programmer level.
  3. Any problem p ∈ P can be addressed regardless of the API or program being interacted with.
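The composition BT(B) described above can be sketched with the standard Sequence/Selector node pattern. The node classes follow common behavior tree semantics; the primitive "keystroke" actions are illustrative stand-ins for real input commands, not bindings to any actual input library.

```python
from typing import Callable, List

class Node:
    """Base behavior tree node; tick() returns True on success."""
    def tick(self) -> bool:
        raise NotImplementedError

class Action(Node):
    """Leaf node wrapping a primitive action b ∈ B (a command or keystroke)."""
    def __init__(self, fn: Callable[[], bool]):
        self.fn = fn
    def tick(self) -> bool:
        return self.fn()

class Sequence(Node):
    """Succeeds only if every child succeeds, ticked in order."""
    def __init__(self, children: List[Node]):
        self.children = children
    def tick(self) -> bool:
        return all(child.tick() for child in self.children)

class Selector(Node):
    """Fallback: tries children in order until one succeeds."""
    def __init__(self, children: List[Node]):
        self.children = children
    def tick(self) -> bool:
        return any(child.tick() for child in self.children)

# Illustrative primitives: stand-ins for real keystroke commands.
log = []
press = lambda key: Action(lambda: (log.append(key), True)[1])

# A composite plan BT(B): open a file dialog and confirm.
plan = Sequence([press("ctrl+o"), press("m"), press("enter")])
```

Ticking `plan` executes the keystrokes in order and succeeds only if all of them do, which is exactly the sense in which composing simple actions in B expands toward A'.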