Introduction

Implementation of the Cognition Layer Algorithm ($A_1$) with Planex

The Cognition Layer Algorithm, sometimes referred to as $A_1$, defines the intelligent or cognitive layer of the ECM. A detailed problem analysis can be found in the ECM Problem Analysis, where we divided the ECM cognition problem into three subproblems: Planify, Reduce, and Translate.

Design

Planex has been designed as a three-agent model. By leveraging LangChain, we can utilize the capabilities of large language models (LLMs) such as GPT-3.5. In this context, we assume the knowledge base $C'$ aligns with the properties established in the analysis. LangChain allows us to define and chain these three agents, whose behavior can be specified through one of two methods:

Fine-Tuning: By fine-tuning the models, we can achieve the desired behavior from each agent more accurately. However, this methodology requires an unbiased dataset and extensive analysis/training.
Prompt Engineering (Selected Approach): Using prompt engineering, we can quickly deploy the agents with an approximation of the fine-tuned behavior. Zero-shot prompts enable experimentation, modification, and testing of agent results.

We selected the prompt engineering method because it allows for the rapid exploration of multiple agents and new approaches, in alignment with the easily replaceable modules of the ECM.
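The three-agent chain described above can be sketched in plain Python. This is only an illustration under stated assumptions: it does not use LangChain, the names `make_agent`, `run_planex`, and `echo_llm` are hypothetical, and the model is replaced by a stand-in function so the data flow is visible without an API key.

```python
from typing import Callable

# Hypothetical type: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]


def make_agent(instructions: str, llm: LLM) -> Callable[[str], str]:
    """Build an agent that prepends its instructions to whatever input it receives."""
    def agent(text: str) -> str:
        return llm(f"{instructions}\n\n{text}")
    return agent


def run_planex(query: str, llm: LLM) -> str:
    """Chain the three agents: each one consumes the previous agent's output."""
    planner = make_agent(
        "Provide a detailed step-by-step plan to address the user's query.", llm
    )
    reducer = make_agent(
        "Review the given plan and provide a new plan that achieves the same "
        "result using predefined functions.",
        llm,
    )
    translator = make_agent("Translate the plan into Exelent language.", llm)
    return translator(reducer(planner(query)))


def echo_llm(prompt: str) -> str:
    """Stand-in model (assumption): tags the last line of the prompt it receives."""
    return "[llm] " + prompt.splitlines()[-1]


if __name__ == "__main__":
    print(run_planex("Open Spotify", echo_llm))
```

Because every agent is just a function from text to text, swapping one agent for another implementation only requires replacing one callable, which matches the replaceable-module goal stated above.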

Prompt Engineering

All the prompts designed can be found in the /cognition_layer/planex/agents/prompts.py file of the repository. In that file, each agent has three defined prompts:

Instructions: Here we declare the expected behavior of the agent. Following the paper by Lei Wang et al., we define the Profile of the agent. The main objective of each agent is:
Planner: The planner's objective is "to provide a detailed step-by-step plan to address the user's query."
Reducer: The reducer's objective is "to review a given plan and provide a new plan that achieves the same result using predefined functions."
Translator: The translator's objective is "to translate the plan into Exelent language."
Guidelines: Using a trial-and-error methodology, we define a set of rules that improve the agent's behavior by explaining and warning about possible failures and misconceptions about the goal. For example, "Ensure the new plan uses only the Exelent language constructs and achieves the same result as the original plan."
Example: By using one-shot prompting, we can improve the reliability and standardize the response of the model, specifying for each agent a possible result and its format.

For some agents, we also provide information about the system, such as which tools are available, which window is focused, and the operating system. For this step, we could take advantage of LangChain's tool formatting to define the valid tools the ECM can receive.
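As a rough illustration of how the three prompt parts and the optional system information could be combined into one request, consider the sketch below. The class `AgentPrompt`, its `render` method, and the section labels are assumptions made for this example; the actual layout in prompts.py may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class AgentPrompt:
    """Hypothetical container for the three prompt parts of one agent."""
    instructions: str
    guidelines: List[str]
    example: str

    def render(self, system_info: Optional[Dict[str, str]] = None) -> str:
        """Join the parts into a single prompt, appending system info if given."""
        parts = [
            "Instructions:\n" + self.instructions,
            "Guidelines:\n" + "\n".join(f"- {g}" for g in self.guidelines),
            "Example:\n" + self.example,
        ]
        if system_info:
            lines = "\n".join(f"{key}: {value}" for key, value in system_info.items())
            parts.append("System:\n" + lines)
        return "\n\n".join(parts)


if __name__ == "__main__":
    translator_prompt = AgentPrompt(
        instructions="Translate the plan into Exelent language.",
        guidelines=[
            "Ensure the new plan uses only the Exelent language constructs "
            "and achieves the same result as the original plan."
        ],
        example="<one worked example of the expected output goes here>",
    )
    # The keys below are illustrative; any system facts could be passed.
    print(translator_prompt.render({"operating system": "Linux"}))
```

Keeping the parts separate until render time makes it cheap to edit a single guideline during trial-and-error without touching the rest of the prompt.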

Results

Using this simple but efficient model, we achieve the following advantages:

Simple Task Solving: For tasks that require three or four consecutive steps, Planex can appropriately define and execute the correct actions. Examples include: "Open Spotify" and "Write 'hello world' on the terminal."
Fast and Controlled Results: A result is always produced after three steps (one per agent), so the number of steps to execute is fixed in advance.

The main disadvantages found while testing this agent are the following:

Large Prompt: As this model uses a one-shot prompting methodology, it requires an extensive prompt each time it is executed. This makes each request more expensive, and the model must re-learn how to solve the query on every call. Fine-tuning could address this issue, but further research is needed.
No Failure Reaction: If the reducer or translator fails to properly select the correct tools, the agent cannot recover and must be fully reloaded.
Not Fully System Aware: Although we have introduced information about the system, the agent cannot fully understand and develop a mental simulation of the system's status. This limitation leads to failures where the agent assumes previously opened apps, defined requirements, etc.

All these properties can be tested in this repository by using the following command:

python ecm/core/main.py --agent planex

If you are in a safe environment, you can also use the Python script at /ecm/core/run_in_host.py

PlanexV2

PlanexV2, located in the directory /cognition_layer/planexv2, is a four-agent model for $A_1$ that improves upon the original Planex agent by introducing the Blamer, an agent capable of reacting to exceptions and calling the failed agent again.

The Blamer follows the same schema as other agents, with three key properties:

Exception Handling: The Blamer is only called when an exception occurs. It receives the exception as a string and provides the system with context information about all Planex agents.
Response Specification: Using LangChain, we fully specify the format of the Blamer's response, defining three key concepts to resolve: the blamed agent (Who failed?), the explanation, and advice for avoiding the failure when the agent is called again.
Selective Recall: The Blamer can recall agents from the Planex chain as needed, skipping those that are not necessary (i.e., those that are correct).
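The Blamer's structured response and the selective recall step might look like the following sketch. The names `Blame`, `CHAIN`, and `agents_to_rerun` are hypothetical, and the rule that every agent downstream of the blamed one must also run again is an assumption of this example, not something confirmed by the PlanexV2 source.

```python
from dataclasses import dataclass
from typing import List

# Order of the Planex chain as described in this document.
CHAIN = ["planner", "reducer", "translator"]


@dataclass
class Blame:
    """Hypothetical shape of the Blamer's response."""
    blamed: str       # who failed?
    explanation: str  # why the failure happened
    advice: str       # hint given to the agent when it is called again


def agents_to_rerun(blame: Blame) -> List[str]:
    """Return the blamed agent and, by assumption, every agent after it.

    Agents before the blamed one are skipped because their output is
    considered correct. Raises ValueError if the name is not in CHAIN.
    """
    start = CHAIN.index(blame.blamed)
    return CHAIN[start:]


if __name__ == "__main__":
    blame = Blame(
        blamed="reducer",
        explanation="The plan used a function that is not available.",
        advice="Use only the listed tools.",
    )
    print(agents_to_rerun(blame))
```

In this sketch a failure blamed on the translator reruns a single agent, while one blamed on the planner reruns the whole chain.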

Results

Key Advantages

Failure Reaction: The agent can now recover from failures, reusing agents and ensuring that the response will return a valid Exelent file.
Improved Accuracy: By showing the agents their failures, we achieve better results, with improved accuracy and more reliable plans.

Key Disadvantages

Cost of Recovery: Recovering from a failure requires a larger prompt, making it more expensive than expected, even though some agents are skipped.
Limited Recovery: The Blamer is not always able to fully recover Planex from failures. If the plan is too long (i.e., involves too many steps), PlanexV2 can enter a loop, failing to recover. To address this, PlanexV2 has a maximum step limit.
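The maximum step limit that keeps recovery from looping forever can be illustrated with a bounded retry loop. This is a minimal sketch under assumptions: `run_with_recovery`, its parameters, and the default limit of 3 are invented for the example and do not reflect the actual PlanexV2 values.

```python
from typing import Callable, Optional


def run_with_recovery(
    attempt: Callable[[Optional[str]], str],
    blame: Callable[[Exception], str],
    max_steps: int = 3,  # assumed default, for illustration only
) -> str:
    """Call `attempt` until it succeeds or `max_steps` attempts have been made.

    `attempt` receives the advice from the previous failure (None on the
    first try). `blame` turns an exception into advice for the next try.
    """
    advice: Optional[str] = None
    last_error: Optional[Exception] = None
    for _ in range(max_steps):
        try:
            return attempt(advice)
        except Exception as exc:  # a real system would catch narrower errors
            last_error = exc
            advice = blame(exc)
    raise RuntimeError(f"gave up after {max_steps} attempts") from last_error


if __name__ == "__main__":
    history = []

    def flaky(advice: Optional[str]) -> str:
        """Fails on the first call, succeeds once advice is available."""
        history.append(advice)
        if advice is None:
            raise ValueError("unknown tool")
        return "valid plan"

    print(run_with_recovery(flaky, lambda exc: f"avoid: {exc}"))
```

Each retry carries a longer prompt because the advice is added to it, which is the recovery cost noted above; the limit bounds that cost as well as the number of iterations.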

All these properties can be tested in this repository using the following command:

python ecm/core/main.py --agent planexv2