How to Build a Powerful LLM-Powered Knowledge Base
Learn how to capture information from multiple sources into an LLM-powered knowledge base and query it automatically using coding agents.
A knowledge base is a place where you store a large amount of information and make it accessible for future use. This helps with:
- Better decision-making
- Quickly picking up on past context
- Aligning your team
Lately, I’ve been setting up a knowledge base and routing as much context as possible into it to improve all of the points above. Knowledge bases were useful long before LLMs, because access to past knowledge is always valuable. However, LLMs have made knowledge bases far more powerful.
There are two main reasons for this:
- You can capture more information in the knowledge base
- You can query the knowledge base more easily (you don’t have to look through it manually)
In this article, I’ll cover why you should set up your own LLM-powered knowledge base, how to capture as much information as possible, and how to actively use the knowledge base.

I’ve discussed this topic before, and my interest in it has grown as it has become more popular. For example, the president of Y Combinator is building GBrain, and Andrej Karpathy is building an LLM wiki; both are examples of knowledge bases.
There is, of course, no single best way to build a knowledge base. I think the most important thing is to actually start storing all of your context in a knowledge base and to figure out how to query it effectively, for example when writing code or during meetings.
Why You Should Have a Knowledge Base
First, I’d like to cover why you should have a knowledge base. There are different kinds: a personal one containing all the context you have personally, or a company-wide one containing the knowledge and context the company possesses.
The reason you should have a knowledge base is that information is extremely valuable. The more information you can store and later access when needed, the better you will perform. You will, for example, be able to:
- Make better decisions because you have access to more context
- Pick up previous topics more quickly without having to search through many different sources
- Align different people because they share a single source of truth
These concepts apply to both personal and company-wide knowledge bases. I also believe that knowledge bases have become far more powerful because you can query them with LLMs. Previously, you would have had to manually look through the knowledge base to find relevant information, relying on your own memory to recall whether a certain piece of information was stored there.
Now the situation is reversed. The LLM can query the knowledge base itself (for example, with a RAG-type approach), find relevant information immediately, and decide on its own when it needs to consult the knowledge base.
In other words, you completely remove the human-in-the-loop requirement to access information, which makes the knowledge base so much more powerful.
Capturing Information into the Knowledge Base
The first step is, of course, to capture information into the knowledge base. Depending on how your knowledge base is built, this can happen in a variety of different ways.
The first thing I urge you to do is think of all the different sources of information you have access to, either personally or at the company. These include, for example:
- Meetings
- Your project management tool, such as Linear
- Your coding agent, such as Claude Code or Codex — what you’ve been working on with these models and which tasks are completed
- Physical office discussions
You can probably think of many other sources of information. The point is that you should map out all these different information sources and figure out an automatic way to route information from them into your knowledge base.
Neither you nor your colleagues will be willing to spend extra time manually entering things into a knowledge base. You need an automated way to keep the knowledge base up to date.
It’s important that you fully automate the routing of information from the source to the knowledge base. If you require a manual step — for example, pasting meeting notes into the knowledge base — you’ll definitely forget about it and lose important context. The whole point of the knowledge base is that you store absolutely all information there and leave nothing out. That’s what makes a knowledge base so powerful.
For example, with meeting notes, you can have a cron job that syncs daily. It takes the notes from every meeting held in the company, or from your own meetings, and stores them in the knowledge base. You can set up a similar cron job for Linear or another project management tool to sync everything that happened there, and another for your coding agent logs (what you’ve been working on and anything you’ve discussed with the agent). All of this can be synced into the knowledge base with a daily cron job.
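As a minimal sketch of such a sync job, the function below copies new or changed notes from an export folder into the knowledge base. It assumes your meeting tool can export notes as markdown files to a local directory; the function name `sync_notes` and the directory layout are hypothetical, not part of any specific tool.

```python
from pathlib import Path


def sync_notes(source_dir: Path, kb_dir: Path) -> list[Path]:
    """Copy new or changed markdown notes from source_dir into kb_dir.

    Returns the knowledge-base paths written on this run.
    Assumption: notes are exported as *.md files directly in source_dir.
    """
    kb_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for note in sorted(source_dir.glob("*.md")):
        target = kb_dir / note.name
        content = note.read_bytes()
        # Skip notes that are already stored unchanged, so repeated
        # daily runs do not rewrite the whole knowledge base.
        if target.exists() and target.read_bytes() == content:
            continue
        target.write_bytes(content)
        written.append(target)
    return written
```

A small script that calls `sync_notes` once per source, scheduled with a crontab entry such as `0 6 * * *` (every day at 06:00), would cover the daily sync. Because unchanged notes are skipped, running it more often is harmless.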
Physical office discussions are harder to fully automate. I haven’t fully figured this one out myself, but two options would be:
- Recording everything going on at all times, which would of course require consent
- Manually writing things down after having a discussion in the office
However, you might not even need to explicitly store office discussions. Most of the time after a physical discussion, either I or the person I spoke with will take the context from that discussion and write it into the coding agent. The discussion was usually prompted by a question about an implementation, so if that knowledge is then used in your coding agent, you can fetch it from the coding agent logs.
If you complete this step successfully and store all the context you encounter every day into your knowledge base, you’ve done most of the work. This is the hard part. In the next section, I’ll cover the easier part: actively using that information when making decisions or interacting with your coding agents.
Utilizing Information from the Knowledge Base
If you have a synced knowledge base with all the information you require, you can now move on to actively utilizing it. There are two main approaches:
- Query the knowledge base when you have a question. This should be done through your coding agent — you ask it a question, and it should know to query the knowledge base to find the answer.
- Have the coding agent passively utilize the knowledge base whenever it does work.
The first approach is fairly self-explanatory, so I’ll spend more time on the second.
Having the coding agent passively utilize the knowledge base whenever it does work — for example, to implement a feature, fix a bug, and so on — is very powerful. There are two main approaches to doing this.
Grep-Based Inference
One approach is to have a top-level markdown file in the knowledge base that explains the entire knowledge base and where the different information is stored. This file is updated whenever you add more information to the knowledge base. The coding agent reads the file to decide where to look, then uses grep to search the relevant files.
The upside of this approach is that the agent searches with grep, which often finds the correct information more reliably than embedding-based search. The downside is that the markdown file has to be in the LLM’s context all the time. The file can grow quite large, which becomes a problem over time.
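To make the grep-based approach concrete, here is a rough sketch under a few assumptions: the knowledge base is a folder of markdown files, the index file is called `INDEX.md`, and each file’s first non-empty line is a good enough summary. The names `build_index` and `grep_kb` are hypothetical, and the search is a pure-Python stand-in for calling grep itself.

```python
import re
from pathlib import Path

INDEX_NAME = "INDEX.md"  # hypothetical name for the top-level index file


def build_index(kb_dir: Path) -> str:
    """Regenerate the index: one line per file, summarised by its first non-empty line."""
    lines = ["# Knowledge base index", ""]
    for path in sorted(kb_dir.rglob("*.md")):
        if path.name == INDEX_NAME:
            continue
        text = path.read_text(encoding="utf-8")
        # Strip heading markers and whitespace from the first non-empty line.
        summary = next((ln.strip("# \t") for ln in text.splitlines() if ln.strip()), "(empty)")
        lines.append(f"- {path.relative_to(kb_dir).as_posix()}: {summary}")
    index = "\n".join(lines) + "\n"
    (kb_dir / INDEX_NAME).write_text(index, encoding="utf-8")
    return index


def grep_kb(kb_dir: Path, pattern: str, max_hits: int = 20) -> list[tuple[str, int, str]]:
    """Case-insensitive, grep-style search returning (file, line number, line) tuples."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path in sorted(kb_dir.rglob("*.md")):
        if path.name == INDEX_NAME:
            continue
        for lineno, line in enumerate(path.read_text(encoding="utf-8").splitlines(), start=1):
            if regex.search(line):
                hits.append((path.relative_to(kb_dir).as_posix(), lineno, line.strip()))
                if len(hits) >= max_hits:
                    return hits
    return hits
```

In this sketch, you would call `build_index` at the end of each sync so the index never goes stale, and expose `grep_kb` (or plain grep) to the coding agent as a tool. Keeping each index entry to a single line is one way to limit how fast the file grows.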
Embedding-Based Inference
The second approach is embedding-based inference. This is what GBrain is built for. Whenever you run a query, an embedding search (as in RAG) runs against the knowledge base and fetches relevant chunks. If the LLM determines that the fetched information is relevant, it can look further into the relevant files.
This is probably the better approach for using the knowledge base during inference because it doesn’t require an active search and doesn’t consume a large number of input tokens on the knowledge base for every task you perform.
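The retrieval loop behind this approach (chunk the documents, embed the chunks and the query, rank by cosine similarity) can be sketched as follows. This is an illustration only and not how GBrain works internally. In particular, `embed` is a toy bag-of-words stand-in for a real embedding model, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term-count vector.

    Assumption: a real setup would call an embedding model here instead.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def chunk(text: str) -> list[str]:
    """Split a document into paragraph-sized chunks on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def search(docs: dict[str, str], query: str, k: int = 3) -> list[tuple[float, str, str]]:
    """Return up to k (score, file name, chunk) tuples, best match first."""
    q = embed(query)
    scored = [
        (cosine(q, embed(c)), name, c)
        for name, text in docs.items()
        for c in chunk(text)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    # Drop chunks that share nothing with the query.
    return [item for item in scored[:k] if item[0] > 0]
```

With the toy `embed`, a query only matches chunks that share exact words with it. A real embedding model would also match related wording, such as "migration" and "migrate", which is the main reason to use embeddings in the first place. In practice you would also embed the chunks once at sync time and store the vectors, rather than re-embedding everything on each query.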
Which approach works best will ultimately depend on your specific use cases.
Conclusion
All in all, I urge you to:
- Set up a knowledge base yourself
- Write as much information into it as possible
- Read how others have set up their knowledge bases
Then actively use this knowledge base whenever you do work on your computer using a coding agent. Knowledge bases will become incredibly powerful and valuable in the years to come, and building one can give you a real advantage — because having access to a lot of information will be a definite asset in the future. Furthermore, this is data specific to your company or personal context that, in many cases, only you have access to. If you don’t store it, you may never be able to retrieve that information again.