
Understanding Context Engineering and feedback loop tips based on signals

January 21, 2026

Tips and tricks on context engineering, Factory AI's approach to optimising AI agents, and an X post on feedback loops.

Hi readers, today we are going to learn more about context engineering and how we can optimise our AI agents by applying the techniques discussed below.

Starting with the article about context engineering from my friend Hacker R.C.

Lessons from his article.

We can drastically improve our agents' capabilities through tool reduction, skills, purging irrelevant context, compaction, and subagents.

Let's first understand the value of implementing tools within our agentic workflow. There are situations where we want to evaluate a scenario or confirm a hypothesis, such as creating a test case that proves a vulnerability and running it to reach a final verdict. If we skip such evaluation and depend entirely on the AI model, the chances of false positives are extremely high.

Key thing to remember about context: Context = System prompt + Tool definitions + Tool calls (inputs) + Tool outputs

  • Tool reduction : As we know, the context contains tool definitions, inputs, and outputs. Imagine a workflow where your agent has tools that overlap in functionality, i.e. both tools try to achieve the same task while filling the context with their definitions and outputs. This was the case with our agent preview, and it made us realise how much context we can save by removing such tools. We can optimise further by examining the tool-call traces and understanding which parts of the tool chain can be trimmed, for example calling python directly instead of passing its full path.
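To make the savings concrete, here is a minimal sketch (with a hypothetical tool registry, not R.C.'s actual setup) of measuring how much context overlapping tool definitions cost and pruning the duplicates:

```python
# Hypothetical registry: each tool's definition text ends up in the prompt,
# so overlapping tools pay their context cost twice for the same capability.
TOOLS = {
    "grep_search":   {"description": "Search file contents for a pattern.", "overlaps": None},
    "find_in_files": {"description": "Search file contents for a regex pattern.", "overlaps": "grep_search"},
    "run_python":    {"description": "Execute a Python snippet.", "overlaps": None},
}

def context_cost(tools):
    # Rough proxy: every character of a tool definition occupies context.
    return sum(len(name) + len(t["description"]) for name, t in tools.items())

def prune_overlapping(tools):
    # Keep a tool only if it does not duplicate another tool's functionality.
    return {name: t for name, t in tools.items() if t["overlaps"] is None}

saved = context_cost(TOOLS) - context_cost(prune_overlapping(TOOLS))
print(f"definition characters saved: {saved}")
```

In a real agent the cost would be measured in tokens with the model's tokenizer, but the principle is the same: every redundant definition is paid for on every turn.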

  • Scripting : R.C. shared that when we cannot batch a series of tasks into one tool because of complexity, we can instead write scripts in bash or Python and give the agent access to them.

  • MCP : To be very honest, I don't know much about MCPs, so this section was not something I paid attention to, but I will dig into it in upcoming articles.

  • Skills : At their core, skills are just metadata, domain-level expertise, and scripts in a SKILLS.md file. In the file we describe the metadata, namely the skill's name and description. When the agent believes a skill is relevant to the work it is performing, it can load the skill into context. R.C. mentions in his blog that implementing certain domain-level expertise in his multi-agent system gave him better performance.
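A minimal sketch of the idea: only the lightweight metadata sits in context, and a skill's full body is loaded on demand when it looks relevant. The skill names, the keyword-overlap relevance check, and the in-memory bodies are all illustrative assumptions, not R.C.'s implementation.

```python
# Metadata that stays in context permanently: name -> short description.
SKILLS = {
    "sql-injection-triage": "Checklist and scripts for confirming SQLi findings.",
    "dependency-audit": "Steps for auditing third-party package versions.",
}

# Stand-in for reading each skill's markdown body from disk.
SKILL_BODIES = {
    "sql-injection-triage": "1. Build a proof-of-concept query.\n2. Run it in a sandbox.",
    "dependency-audit": "1. List lockfile entries.\n2. Compare against advisories.",
}

def relevant_skills(task, skills):
    # Toy relevance check: keyword overlap between the task and a description.
    # A real agent would let the model judge relevance from the metadata.
    words = set(task.lower().split())
    return [name for name, desc in skills.items()
            if words & set(desc.lower().split())]

def load_into_context(task):
    # Only the bodies of relevant skills enter the context window.
    return [SKILL_BODIES[name] for name in relevant_skills(task, SKILLS)]

print(relevant_skills("confirming a SQLi report", SKILLS))
```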

  • Purging context : This refers to the practice of clearing the context by removing irrelevant information. He mentions that purging can be done, for example, on the output of the last few tool calls. He mentions:

The usefulness of this completely depends on the work your agent is doing. If it spends majority of it’s time finding code snippets, it wouldn’t hurt much because the calls will stay which contain filenames, etc. only the outputs will be purged.
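A sketch of what that purge looks like: tool calls (and the filenames in their inputs) survive, while older outputs are replaced by a placeholder. The message shape loosely follows chat-API conventions and is an assumption.

```python
def purge_tool_outputs(messages, keep_last=2):
    # Keep every tool *call*; blank the *outputs* of all but the last few.
    tool_msgs = [m for m in messages if m["role"] == "tool"]
    recent = tool_msgs[-keep_last:] if keep_last else []
    purged = []
    for m in messages:
        if m["role"] == "tool" and m not in recent:
            m = {**m, "content": "[output purged]"}
        purged.append(m)
    return purged

history = [
    {"role": "assistant", "content": 'call read_file("app.py")'},
    {"role": "tool", "content": "...3,000 lines of app.py..."},
    {"role": "assistant", "content": 'call read_file("db.py")'},
    {"role": "tool", "content": "...2,000 lines of db.py..."},
]
for m in purge_tool_outputs(history, keep_last=1):
    print(m["role"], ":", m["content"][:30])
```

Notice that after purging, the agent still sees that `app.py` was read (the call survives) even though the 3,000-line output is gone.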

  • Compaction and dual-pass summarisation :

    • Compaction : The idea is pretty simple: we compress the context whenever we are about to reach 60-70% of the model's context window. The problem with this approach is that we cannot determine whether the data lost during compression is critical to our task.

At a certain threshold, the whole context window is sent to an instance of the model with a summary prompt and then the output of the model is used to replace the whole context with the summary.
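The threshold check can be sketched in a few lines. Here `summarize` is a placeholder for the model call with the summary prompt, and the token counter is a crude word-count proxy; both are illustrative assumptions.

```python
CONTEXT_LIMIT = 1000   # tokens the model can hold (toy number)
COMPACT_AT = 0.7       # compact when the context is ~70% full

def count_tokens(messages):
    # Crude proxy: whitespace-separated words. Real agents use the
    # model's own tokenizer here.
    return sum(len(m.split()) for m in messages)

def summarize(messages):
    # Placeholder for sending the whole context to a model instance
    # with a summary prompt and returning its output.
    return f"[summary of {len(messages)} earlier messages]"

def maybe_compact(messages):
    if count_tokens(messages) >= COMPACT_AT * CONTEXT_LIMIT:
        # Replace the entire context with a single summary message.
        return [summarize(messages)]
    return messages
```

Whatever the summary prompt fails to preserve is lost for good, which is exactly the risk described above.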

  • Dual-pass summarisation : R.C. mentions that the workflow can be optimised by using the conversation history to build the summarisation prompt before generating the summary.

  • JIT conversation history : Alternatively, we can write our conversations to a file and ask the agent to read it whenever required. Basically, if the agent cannot find what it needs in the summary, it can refer back to the conversation file.
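A tiny sketch of the fallback: every turn is appended to a log file, and a lookup helper stands in for the agent grepping that file when the summary comes up empty. File name and log format are my own assumptions.

```python
import os
import tempfile

def append_turn(path, role, text):
    # The full conversation lives on disk, not in the context window.
    with open(path, "a") as f:
        f.write(f"{role}: {text}\n")

def lookup(path, keyword):
    # The agent falls back to the file when the summary lacks the answer.
    with open(path) as f:
        return [line.strip() for line in f if keyword in line]

path = os.path.join(tempfile.mkdtemp(), "conversation.log")
append_turn(path, "user", "the staging database host is db-staging-2")
append_turn(path, "assistant", "connecting to db-staging-2")
print(lookup(path, "host"))
```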

  • Reference


Droid's article on improving performance on Terminal-Bench

In the article, the Droid team talk about the implementation choices that took them to the top of Terminal-Bench, which measures an AI agent's ability to complete complex tasks spanning coding, tests, and dependency management.

They shared that their design makes them model-agnostic and drives state-of-the-art performance.

Some of the notable things I've learned from their article:

  • They discovered in their tests that the models consistently prioritised recent context over system-level instructions or long agent trajectories:

We have found that these models exhibit recency bias by prioritising recent context over system-level instructions for low-level, nuanced details over long agent trajectories.

  • In their three-tier prompting workflow (tool description, system prompt, system notifications), they inject contextual notifications. For example, when the agent is about to run a search, a notification might appear:

“Use rg (ripgrep) for grep operations; limit results to 100 lines and set a 5‑second timeout.”

The context would then look like: Tool Description + System Prompt + System Notifications
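A minimal sketch of that injection, exploiting the recency bias described above: right before a matching tool call, a short system-level reminder is appended to the context. The trigger table is a hypothetical example, not Factory's implementation.

```python
# Hypothetical mapping from tool name to the reminder injected before
# that tool runs. Because models weight recent context heavily, the
# reminder lands where it is most likely to be followed.
NOTIFICATIONS = {
    "search": "Use rg (ripgrep) for grep operations; limit results to "
              "100 lines and set a 5-second timeout.",
    "write_file": "Preserve the file's existing indentation style.",
}

def with_notification(context, tool_name):
    note = NOTIFICATIONS.get(tool_name)
    if note:
        context = context + [{"role": "system", "content": note}]
    return context

ctx = [{"role": "user", "content": "find usages of parse_config"}]
ctx = with_notification(ctx, "search")
print(ctx[-1]["content"])
```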

  • Model-specific architecture : Different models approach a task differently, and hence the way of dealing with tasks also varies; for instance, handling of files or paths is done differently across models. If we are creating a system that supports multiple models, we need to keep such differences in mind in the architecture design.

  • Tool design principles : They describe their principles for designing tools: tools are limited to only the essential tasks, which saves a lot of context and avoids confusing the LLM. Another tip they shared was about input schemas, which we need to simplify to make sure the LLM understands the input clearly, without ambiguity.

  • Apart from these, they also shared that making LLMs aware of their environment (system information) can improve performance.

  • Reference


The correct way to use feedback loops with signals

I found this post while randomly scrolling X, and it was surprisingly helpful. Before getting into the author's description of the correct way to use feedback loops, we need to understand what signals are. Signals are indicators that help a project evaluate whether its agentic workflow is producing the output users want: a thumbs up/down rating is an indicator, a user clicking a link shared by our agent is an indicator, and even user retention is an indicator.

The author shared that depending on a single indicator is not the ideal way to build and measure the quality of our system; we must think the indicators through. For example, using response time as the indicator can make things worse: the AI may take shortcuts or hallucinate an answer just to respond faster.

He shared that the ideal way to implement the feedback loop is to build a workflow where earlier conversations are stored together with their signals in a semantic database. When a user asks a similar question, the system fetches those earlier conversations, reranks them based on quality signals, and includes them as examples in the system prompt.

Over time, this whole process helps the system provide higher-quality answers.
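The loop described above can be sketched end to end. Word-overlap similarity stands in for the semantic (embedding) database, and the store, signal names, and scoring are all illustrative assumptions rather than the author's actual system:

```python
# Past conversations stored alongside their signals.
STORE = [
    {"q": "how do I reset my password", "a": "Use the reset link...",
     "thumbs_up": True, "link_clicked": True},
    {"q": "how do I reset my password", "a": "Email support...",
     "thumbs_up": False, "link_clicked": False},
]

def similarity(a, b):
    # Toy Jaccard word overlap; a real system would compare embeddings.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def signal_score(rec):
    # Combine several signals rather than trusting any single one.
    return int(rec["thumbs_up"]) + int(rec["link_clicked"])

def best_examples(query, store, k=1):
    # Fetch similar past conversations, then rerank by signal quality.
    candidates = [r for r in store if similarity(query, r["q"]) > 0.3]
    candidates.sort(key=signal_score, reverse=True)
    return candidates[:k]

def build_system_prompt(query):
    # Splice the best-rated past answers into the system prompt as examples.
    examples = best_examples(query, STORE)
    shots = "\n".join(f"Q: {r['q']}\nA: {r['a']}" for r in examples)
    return f"Answer in the style of these well-rated answers:\n{shots}"

print(build_system_prompt("how can I reset my password"))
```

The reranking step is what closes the loop: good signals push an answer toward being reused as an example, so quality compounds over time.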