How Graph RAG Transforms Knowledge Discovery


i spent last sunday debugging a search tool over 2,000 news articles about ukraine. the tool used vector embeddings. i typed "what groups are planning attacks on civilian infrastructure?" and got back five articles about dairy farming in wisconsin. not helpful.
the embeddings were "correct" by every metric. cosine similarity scores looked great. but the system couldn't connect "novorossiya" mentions across 30 documents to see they were targeting banks, factories, and tv stations. it just found words that lived near "attack" in vector space. "milk attack" must be a thing.
you've probably done this. i know i have. built a rag system that works perfectly for simple fact lookups. then watched it fail the moment someone asked a question requiring actual synthesis.
six hours of my life i'll never get back.
regular rag works fine when you're asking "what's the capital of france?" it's great for "find me the paragraph about jeff's birthday." it's terrible for "what patterns show up across all our customer complaints?" it's terrible for "what are these documents trying to tell me that i haven't thought to ask about?"
i used to think the problem was my embeddings. i'd spend weeks tuning chunk sizes, trying different models, adding metadata. we even trained a custom embedding model on our domain. costs piled up. the dairy farm problem stayed.
the first time i tried graphrag, i was skeptical. "another fancy name for the same thing," i thought. i'd seen "graph" slapped on enough products to last a lifetime. but i was wrong. fundamentally wrong.
what regular rag does: it turns your text into vectors. millions of floating-point numbers that capture "meaning" in some abstract space. when you ask a question, it turns that into a vector too. then it finds the closest text chunks. it's a geometric distance problem. it's math.
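to make "it's math" concrete, here's roughly what that distance problem looks like. a minimal sketch: the chunk vectors come from whatever embedding model you happen to be paying for, and none of this is any specific library's retriever, just the shape of the idea.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # "closeness" in embedding space: the angle between two vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec: np.ndarray, chunk_vecs: list[np.ndarray],
             chunks: list[str], k: int = 5) -> list[str]:
    # rank every chunk by similarity to the query vector and keep the top k.
    # this is all regular rag retrieval is: nearest neighbors in vector space.
    scored = sorted(zip(chunks, chunk_vecs),
                    key=lambda pair: cosine_similarity(query_vec, pair[1]),
                    reverse=True)
    return [text for text, _ in scored[:k]]
```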
what graphrag does: it reads your documents with an llm and actually builds a map. it says "here's a person, here's an organization, here's what they did to each other." it creates a knowledge graph with entities as nodes and relationships as edges. then it summarizes communities of connected things.
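the per-chunk extraction step looks something like this. a sketch only: call_llm is a stand-in for whatever model client you use, and the prompt is a toy. the real graphrag prompts are much longer and include worked examples, but the shape is the same: chunk in, entities and relationships out.

```python
import json

def extract_graph_elements(chunk: str, call_llm) -> dict:
    # ask the llm to read one chunk and name who/what appears in it and how
    # those things relate. the output is structured json, not a vector.
    prompt = (
        "read the text below. extract entities (name, type) and relationships "
        "(source, target, description) between them. return valid json with "
        '"entities" and "relationships" keys.\n\n' + chunk
    )
    return json.loads(call_llm(prompt))
```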
here's a question people always ask: "isn't this just expensive metadata?" no. it's more like having a researcher read everything, take notes on index cards, and organize them by theme. the surprising part? the llm does this during indexing, not at query time. once the graph is built, queries become cheap.
my coworker tested both systems with the same question about ukrainian news. she asked "who is targeting financial institutions?" regular rag returned three articles that mentioned "bank" but had nothing to do with planned attacks. they were about currency exchange rates. the top result was from a russian news site talking about interest rates. useless.
graphrag returned a structured answer. it listed specific groups, their targets (privatbank atms, credit unions), and linked back to 17 source documents. it connected mentions of the same group across russian and ukrainian sources. it didn't just find the word "bank"—it found the relationship between entities and actions.
the difference isn't better math. it's better reading comprehension.
what actually happens during graphrag indexing is kind of wild. the llm processes your text in chunks. for each chunk, it extracts entities and relationships. it says "this person spoke at this place" or "this organization planned this action." it even extracts factual claims with dates and specifics. then it merges all these extractions into a graph. the same entity mentioned 50 times becomes one node with 50 pieces of evidence attached.
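the merge step is the part that turns 50 scattered mentions into one node. a crude sketch, assuming extraction output shaped like the json in the earlier snippet:

```python
from collections import defaultdict

def merge_extractions(per_chunk_results: list[dict]) -> dict:
    # the same entity name across many chunks collapses into one node;
    # every chunk-level description is kept as evidence on that node or edge.
    nodes = defaultdict(list)
    edges = defaultdict(list)
    for result in per_chunk_results:
        for ent in result.get("entities", []):
            nodes[ent["name"].lower()].append(ent.get("type", ""))
        for rel in result.get("relationships", []):
            key = (rel["source"].lower(), rel["target"].lower())
            edges[key].append(rel.get("description", ""))
    return {"nodes": dict(nodes), "edges": dict(edges)}
```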
most tutorials tell you to use small chunks for rag. 256 tokens, maybe 512. they say bigger chunks dilute meaning. but graphrag works better with larger chunks because context matters for relationship extraction. a 600-token window gives the llm enough room to see "john met with mary at the warehouse" in one piece. the surprising detail: this actually reduces total llm calls compared to traditional rag's overlapping windows.
look, here's what i mean. with regular rag, you might chunk a 1-million-token corpus into 2,000 chunks of 512 tokens. with graphrag, you might use 1,700 chunks of 600 tokens. fewer chunks, but each one gets processed for entities and relationships. the upfront cost is higher because you're asking the llm to understand not just encode. but once it's done, you've built something that can answer questions about the whole corpus, not just find similar sentences.
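the chunk arithmetic is easy to sanity-check yourself. ceiling division, nothing fancy:

```python
def chunk_count(corpus_tokens: int, window: int, overlap: int = 0) -> int:
    # how many chunks a corpus yields for a given window size and overlap.
    step = window - overlap
    return -(-corpus_tokens // step)  # ceiling division

chunk_count(1_000_000, 512)              # ~1,954 chunks: the "2,000" above
chunk_count(1_000_000, 600)              # ~1,667 chunks: the "1,700" above
chunk_count(1_000_000, 512, overlap=64)  # overlap pushes the count higher still
```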
why better retrieval doesn't fix it
people think they can fix regular rag with better retrieval strategies. hierarchical indices, multi-hop retrieval, hyde. i've tried them. they help at the margins. but they all share the same flaw: they're still searching for similar text, not similar ideas.
when you ask "what are the main themes in my data?" regular rag has no vector to match. it can't create a summary from scratch. it can only find what's already written. graphrag, meanwhile, has pre-generated community summaries. it knows that clusters of entities form around topics like "military activity" or "infrastructure threats" because it mapped the relationships.
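those community summaries are the pre-computed answers to "what are the themes?" the microsoft implementation uses leiden clustering; this sketch uses louvain from networkx because it ships in the box, but the idea is the same. summarize here stands in for an llm call over each cluster's facts.

```python
import networkx as nx

def community_summaries(graph: nx.Graph, summarize) -> list[str]:
    # cluster the entity graph into communities, then ask the llm to summarize
    # the relationships inside each one. these summaries are what open-ended
    # "what are the main themes?" queries get matched against later.
    communities = nx.community.louvain_communities(graph, seed=42)
    summaries = []
    for members in communities:
        sub = graph.subgraph(members)
        facts = [data.get("description", "") for _, _, data in sub.edges(data=True)]
        summaries.append(summarize(facts))
    return summaries
```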
the problem isn't what you think. it's not that vector search is bad. vectors are great for many things. the problem is that some questions require synthesis, not retrieval. they require connecting a mention on page 3 to a mention on page 300 and realizing they're the same person using different aliases.
graphrag handles this because the graph structure captures identity. the llm extractor might pull "novorossiya" and "new russia" as separate entities, but they get clustered together during graph construction. then when you query, you get both names in context. regular rag would treat those as different vectors in space.
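entity resolution is the unglamorous part of that. plain string similarity catches near-duplicate spellings, but "novorossiya" vs "new russia" needs the llm or embedding similarity; this crude sketch only shows where the merge decision would live.

```python
from difflib import SequenceMatcher

def canonicalize(names: list[str], threshold: float = 0.85) -> dict[str, str]:
    # map each extracted name to a canonical form when the strings are close.
    # crude on purpose: string matching alone won't link true aliases, which is
    # why the llm-driven clustering during graph construction matters so much.
    canonical: dict[str, str] = {}
    for name in names:
        match = next(
            (c for c in set(canonical.values())
             if SequenceMatcher(None, name.lower(), c.lower()).ratio() >= threshold),
            None,
        )
        canonical[name] = match or name
    return canonical
```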
what breaks in practice
here's what broke when we deployed regular rag for our analysts:
it couldn't connect partial names across documents ("senator smith" vs "john smith")
it missed implicit relationships buried in long paragraphs
it gave different answers to the same question worded slightly differently
it had no answer for "what patterns do you see?"
graphrag fixed these. not perfectly. but noticeably. the analyst team stopped asking for raw document access after two weeks. that was the real win.
i almost didn't write this section.
every post like this needs a random tangent, so here's mine: our team names all internal tools after greek gods. we have prometheus for metrics, athena for search, hermes for messaging. it was cute for three tools. we now have fourteen tools and nobody knows which god does what. i spent twenty minutes last week explaining to a new hire that "daedalus" handles document ingestion and no, it's not related to "icarus" the monitoring tool. maybe we should have used boring names like "search-v2" and "metrics-prod." but where's the fun in that?
this matters because it reminds you that technology is built by people who name things badly and get confused by their own conventions. graphrag isn't magic. it's a system built by researchers who probably have their own version of the greek god problem. they call it "community detection" and "entity extraction" but it's really just "getting a computer to take notes the way a good analyst would."
most people don't need graphrag.
there. i said it. if you're building a chatbot over a product manual, use regular rag. it's simpler, cheaper, faster. if your questions are mostly "where is the reset button?" or "what's the return policy?" you don't need a knowledge graph.
graphrag is overkill for small datasets. if you have 50 documents, just read them. if you have 500, regular rag will probably work fine. the tipping point is somewhere around "i can't keep the whole story in my head anymore." that happened for us at about 1,000 documents.
it's also too expensive for high-frequency updates. rebuilding the graph every day costs real money. if your data changes hourly, graphrag will hurt. microsoft's paper notes indexing took 281 minutes for a 1-million-token dataset. that's fine for static archives, brutal for live news.
and you need a decent llm for extraction. gpt-3.5 might work. but gpt-4 is noticeably better at finding relationships. if you're using a weak model, you'll get a weak graph. garbage in, garbage out.
the biggest misconception? that graphrag replaces regular rag. it doesn't. it's another tool. we still use vector search for quick fact lookups. we use graphrag for investigation, synthesis, pattern-finding. smart teams use both.
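the "use both" part can be as dumb as a router in front of the two systems. a toy version, with the keyword heuristic standing in for whatever classifier you'd actually use:

```python
SYNTHESIS_CUES = ("pattern", "theme", "across", "overall", "summarize", "really going on")

def route(question: str) -> str:
    # open-ended synthesis questions go to graphrag's community summaries;
    # pointed fact lookups stay on plain vector search.
    q = question.lower()
    return "graphrag" if any(cue in q for cue in SYNTHESIS_CUES) else "vector_rag"

route("where is the reset button?")                    # -> "vector_rag"
route("what patterns show up across our complaints?")  # -> "graphrag"
```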
i still think about that sunday sometimes. should have walked my dog instead of staring at embedding visualizations. but at least now when someone asks "what's really going on in this data?" i have an answer that isn't about dairy farming.
here's a question for you: what's the last question your search tool failed to answer? was it a fact, or was it a pattern?