Introduction
In the paper Self-Trained Agents, an iterative mechanism for training agents is explored. At its core, the process fine-tunes an LLM by forcing it to reason about its world, correcting it, and fine-tuning the agent on its own outputs. This makes it possible to fine-tune an agent without a human-generated dataset.
In this repository we apply this methodology to fine-tune the Cognition Layer ($A_1$) Agents, with the goal of reducing cost and latency while strengthening the reasoning capabilities of the LLMs.
Specialization Architecture
In the specialization.py module we can find a simple AI graph that implements Self-Training. This module can be used to generate samples (for a larger dataset) whose reasoning about a task is verified to be successful. The main algorithm follows these steps:
1. Try: Receives the task from the user as a prompt. The agent then tries to generate an action that solves the query and predicts the result that this action will produce in the user environment.

   Human:
   "Open the firefox browser"

   AI: {
       reasoning="The display...",
       action="open('firefox')",
       learning=None,
       expectation="The window will open"
   }

2. Test: An effect_descriptor is passed to the AI; it is responsible for generating a description of the latest effects observed in the AI environment. With this description, the AI reasons about whether the expectation has been met. If so, the last reasoning is returned so it can be saved as a sample.

3. Learn: If the test has failed, the AI tries to generate new learnings, reasoning about its failures and how it could overcome them in the next iteration. The environment is then reset and the graph returns to the Try node.

4. Try: Iterate over steps 1 to 3, adding the learnings obtained from the previous nodes to the agent prompt (see the conceptual sketch after this list).
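The loop can be summarized with the sketch below. This is only a conceptual outline, not the actual graph implementation; every helper used here (try_node, execute_action, describe_effects, check_expectation, learn_from, reset_environment) is a hypothetical placeholder for the corresponding node or callback.

# Conceptual sketch of the Try / Test / Learn loop (not the real graph code).
# All helpers are hypothetical placeholders for the corresponding nodes.
def generate_verified_sample(query, max_iterations=5):
    learnings = []
    for _ in range(max_iterations):
        # Try: reason about the query and produce an action plus an expectation
        step = try_node(query, learnings)        # -> {reasoning, action, expectation, ...}
        execute_action(step["action"])           # run the action in the environment
        # Test: describe the latest effects and check whether the expectation holds
        effects = describe_effects()
        if check_expectation(step["expectation"], effects):
            return step                          # verified reasoning -> training sample
        # Learn: reason about the failure, reset the environment and try again
        learnings.append(learn_from(step, effects))
        reset_environment()
    return None                                  # no verified sample within the budget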
An example usage of this module could be the following program:
import json
import random

from colorama import Fore

from specialization import SpecializationGraph  # adjust the import path to your project layout

if __name__ == "__main__":
    # Build an executor (you could use, for example, RosaInterpreter...)
    def test(query):
        result = execute(query)
        return result

    # Build the graph
    graph = SpecializationGraph(
        action_prompt="You are an AI computer expert. You must provide...",
        test=test,
        effect_descriptor=lambda: describe_environment(),
        reset_state=lambda: reset_my_env(),
    ).compile()

    # Execute the graph and obtain a sample
    user_input = "I want to do foo"
    config = {"configurable": {"thread_id": str(random.randint(5000, 15000))}}
    for event in graph.stream({"query": user_input}, config, stream_mode="values"):
        print("=" * 30)
        print(Fore.YELLOW + json.dumps(event, indent=4) + Fore.RESET)
Self-Training Architecture
In the specialization.py module you can also find an alternative implementation of the previous architecture. In this case, the focus is on improving the testing methods and making the architecture ubiquitous, i.e., usable for any kind of task. The main steps are the following:
1. Try: Calls a function that acts as the "Try" node of the Specialization graph: if the goal is to generate actions, the function receives the user query, the learnings and the previous failures in order to generate an action (the function response).

2. Execute: Calls a function that executes the action provided by the Try node. In contrast with the Specialization graph, the graph must have a way to obtain the real solution of the query: if the produced solution is not correct, the Learn node is called. Likewise, if the maximum number of iterations has been reached, the run fails, allowing a new reasoning to be retried from the beginning.

3. Learn: Calls a function that generates a reasoning about what has failed and how it can be solved in subsequent steps. From this point on, the same architecture as in the Specialization graph is used (a conceptual sketch of this loop is shown below).
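Conceptually, the graph wires the four user-provided callbacks together roughly as in the sketch below. This is not the actual implementation; the state keys mirror the ones used in the example that follows, except "learnings", which is a hypothetical key used only for illustration.

# Rough sketch of the control flow implemented by SelfTrainGraph (not the real code).
# agent, response_executor, test_approval and learner are the user-provided callbacks.
def self_train_loop(state, agent, response_executor, test_approval, learner, max_iterations):
    while not state["success"] and state["current_iterations"] < max_iterations:
        # Try: the agent reasons about the query, learnings and previous failures
        state["response"] = agent(state)
        # Execute: run the action produced by the agent
        execution_response = response_executor(state["response"])
        # Test: check the execution result against the expected solution
        state["success"] = test_approval(execution_response)
        if not state["success"]:
            # Learn: generate new learnings from the current graph state
            state["learnings"] = learner(state)   # "learnings" is a hypothetical key
            state["current_iterations"] += 1
    return state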
An example usage of this module could be the following program:
import json

from specialization import SelfTrainGraph, TrainingResponse  # adjust the import path to your project layout

def generate_sample(user_input):
    # Build the graph
    graph = SelfTrainGraph(
        agent=lambda input: agent_reason(input),
        response_executor=lambda agent_output: execute(agent_output),
        test_approval=lambda execution_response: check(execution_response, expected),
        learner=lambda graph_status: agent_generate_learnings(graph_status),
        max_iterations=4,  # If it fails more than 4 times, exit.
    ).compile()

    # Run the graph
    for event in graph.stream(
        {"query": user_input, "success": False, "current_iterations": 0},
        stream_mode="values",
    ):
        result = event

    # Return the sample built from the final graph state
    return TrainingResponse(
        iterations=result["current_iterations"],
        output=json.loads(result["response"])["reasoning"],
        success=result["success"],
    )

if __name__ == "__main__":
    sample = generate_sample("...")
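Once several samples have been collected, they can be serialized into a dataset for fine-tuning. Below is a minimal sketch, assuming a list of TrainingResponse objects whose fields are accessible as attributes; the output path train_samples.jsonl and the record layout are arbitrary choices for illustration.

import json

# Minimal sketch: dump verified samples into a JSONL file that can later be
# used as a fine-tuning dataset. `samples` is assumed to be a list of
# TrainingResponse objects; the file name and record layout are arbitrary.
def export_samples(samples, path="train_samples.jsonl"):
    with open(path, "w") as f:
        for sample in samples:
            if sample.success:  # keep only reasonings that passed the test
                record = {"output": sample.output, "iterations": sample.iterations}
                f.write(json.dumps(record) + "\n")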