Advanced Retrieval Techniques In RAG | Part 02 | Parent Document Retrieval
Hey there, welcome back to the second article in the Advanced Retrieval Techniques In RAG series. In the last article, we went over the basics of advanced retrieval techniques and built a simple RAG pipeline. In this article, we’ll dive deeper into how parent document retrieval works in a RAG pipeline.

In the last article, we also outlined the two main Small-to-Big retrieval techniques used in a RAG pipeline, and we touched on Parent Document Retrieval, i.e. smaller child chunks referring to a bigger parent chunk.
Chunk References: Smaller Child Chunks Referring to Bigger Parent Chunk
Just to give you a quick recap of what parent document retrieval (small child chunks referring to a bigger parent chunk) is, here is what we said in the last article:
Begin by retrieving the smaller segments of data that are most relevant to answering a query, then use their associated parent identifiers to access and return the larger parent chunks of data that will be passed as context to the LLM (Large Language Model). In LangChain, this can be done with the ParentDocumentRetriever.
Let’s build a pipeline using this technique. We’ll create child chunks that reference a larger parent chunk. At query time, we search against the smaller chunks, retrieve their parent chunks, and pass those to the LLM as context. This lets us hand the LLM a larger and semantically richer context, reducing the chances of hallucination.
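Before wiring this up with a framework, here is the core idea in plain Python. This is purely an illustrative sketch, not any library’s API: the chunk sizes, the parent_id field, and the keyword-based search function are hypothetical stand-ins, but the flow (search small chunks, return their parents) is exactly what we’ll build.

# Purely illustrative sketch of small-to-big retrieval (no external libraries).
def split(text: str, size: int) -> list:
    return [text[i:i + size] for i in range(0, len(text), size)]

document = "Retrieval augmented generation grounds an LLM in your own data. " * 200

# 1. Build big parent chunks and small child chunks that reference their parent.
parents = {f"parent-{i}": chunk for i, chunk in enumerate(split(document, 1024))}
children = [
    {"text": sub, "parent_id": pid}
    for pid, chunk in parents.items()
    for sub in split(chunk, 256)
]

# 2. Search the small chunks (a real pipeline would use vector similarity, not keyword counts).
def search(query: str, k: int = 2) -> list:
    scored = sorted(
        children,
        key=lambda c: -sum(word in c["text"].lower() for word in query.lower().split()),
    )
    return scored[:k]

# 3. Resolve the hits back to their parents and pass those bigger chunks to the LLM as context.
hits = search("how does retrieval augmented generation work")
context = "\n\n".join({parents[hit["parent_id"]] for hit in hits})
print(len(hits), "child hits ->", len(context), "characters of parent context")

The rest of this article builds exactly this, but with LlamaIndex nodes, embeddings, and a vector index doing the heavy lifting.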
Data Ingestion
To ingest data into our vector store and have it saved permanently, there are a couple of steps we’ll follow.
Project Directory Setup
Building on the project directory setup we created in the last article, move into the ParentDocumentRetrieval folder and add a main.py file to it.

Create Child Nodes or Chunks
For each segment of text that is 1024 characters long, we further divide it into smaller segments:
- Eight segments of 128 characters each.
- Four segments of 256 characters each.
- Two segments of 512 characters each.
Additionally, we include the initial 1024-character segment itself in our collection of segments.
At the end of all this, we’ll have 173 individual nodes. It’s crucial to set chunk_overlap=0 here: overlapping child chunks would only index duplicated text, and the full parent chunk is what gets returned at query time anyway. Let’s write the code to do exactly this.
from llama_index import (
Document,
StorageContext,
VectorStoreIndex,
SimpleDirectoryReader,
ServiceContext,
load_index_from_storage,
)
from llama_index.retrievers import RecursiveRetriever
from llama_index.node_parser import SimpleNodeParser
from llama_index.embeddings import OpenAIEmbedding
from llama_index.schema import IndexNode
from llama_index.llms import OpenAI
# for loading environment variables
from decouple import config
import os
# set env variables
os.environ["OPENAI_API_KEY"] = config("OPENAI_API_KEY")
# create LLM and Embedding Model
embed_model = OpenAIEmbedding()
llm = OpenAI(model="gpt-3.5-turbo")
service_context = ServiceContext.from_defaults(
embed_model=embed_model, llm=llm)
# load data
documents = SimpleDirectoryReader(
input_dir="../dataFiles").load_data(show_progress=True)
doc_text = "\n\n".join([d.get_content() for d in documents])
docs = [Document(text=doc_text)]
# create nodes parser
node_parser = SimpleNodeParser.from_defaults(chunk_size=1024)
# split into nodes
base_nodes = node_parser.get_nodes_from_documents(documents=docs)
# set document IDs
for idx, node in enumerate(base_nodes):
    node.id_ = f"node-{idx}"
# create parent-child documents
sub_chunk_sizes = [128, 256, 512]
sub_node_parsers = [
    SimpleNodeParser.from_defaults(chunk_size=c, chunk_overlap=0) for c in sub_chunk_sizes
]
all_nodes = []
for base_node in base_nodes:
    for n in sub_node_parsers:
        sub_nodes = n.get_nodes_from_documents([base_node])
        sub_inodes = [
            IndexNode.from_text_node(sn, base_node.node_id) for sn in sub_nodes
        ]
        all_nodes.extend(sub_inodes)
    # also add the original node to the list of nodes
    original_node = IndexNode.from_text_node(base_node, base_node.node_id)
    all_nodes.append(original_node)
all_nodes_dict = {n.node_id: n for n in all_nodes}
print(all_nodes[0])
print(len(all_nodes_dict))
# creating index
index = VectorStoreIndex(nodes=all_nodes, service_context=service_context)
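The index above lives only in memory, so nothing is saved permanently yet. If you want the ingested data persisted to disk as mentioned earlier, you can write the storage context out and reload it on later runs with the load_index_from_storage helper that is already imported above. Here is a minimal sketch; the ./storage directory is just an example path:

# persist the freshly built index to disk (directory is an example path)
index.storage_context.persist(persist_dir="./storage")

# on subsequent runs, rebuild the index from disk instead of re-embedding everything
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(
    storage_context, service_context=service_context)

Note that the all_nodes_dict lookup used later by the recursive retriever is built from the nodes in memory, so the node-parsing code above still needs to run even when the index itself is loaded from disk.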

That’s the first step, so what next from here?
Creating A Retriever (Recursive Retriever)
Now that we have the embeddings created, let’s move into creating a retriever. Here is the code for doing this:
from llama_index import (
Document,
StorageContext,
VectorStoreIndex,
SimpleDirectoryReader,
ServiceContext,
load_index_from_storage,
)
from llama_index.retrievers import RecursiveRetriever
from llama_index.node_parser import SimpleNodeParser
from llama_index.embeddings import OpenAIEmbedding
from llama_index.schema import IndexNode
from llama_index.llms import OpenAI
# for loading environment variables
from decouple import config
import os
# set env variables
os.environ["OPENAI_API_KEY"] = config("OPENAI_API_KEY")
# create LLM and Embedding Model
embed_model = OpenAIEmbedding()
llm = OpenAI(model="gpt-3.5-turbo")
service_context = ServiceContext.from_defaults(
embed_model=embed_model, llm=llm)
# load data
documents = SimpleDirectoryReader(
input_dir="../dataFiles").load_data(show_progress=True)
doc_text = "\n\n".join([d.get_content() for d in documents])
docs = [Document(text=doc_text)]
# create nodes parser
node_parser = SimpleNodeParser.from_defaults(chunk_size=1024)
# split into nodes
base_nodes = node_parser.get_nodes_from_documents(documents=docs)
# set document IDs
for idx, node in enumerate(base_nodes):
    node.id_ = f"node-{idx}"
# create parent-child documents
sub_chunk_sizes = [128, 256, 512]
sub_node_parsers = [
    SimpleNodeParser.from_defaults(chunk_size=c, chunk_overlap=0) for c in sub_chunk_sizes
]
all_nodes = []
for base_node in base_nodes:
    for n in sub_node_parsers:
        sub_nodes = n.get_nodes_from_documents([base_node])
        sub_inodes = [
            IndexNode.from_text_node(sn, base_node.node_id) for sn in sub_nodes
        ]
        all_nodes.extend(sub_inodes)
    # also add the original node to the list of nodes
    original_node = IndexNode.from_text_node(base_node, base_node.node_id)
    all_nodes.append(original_node)
all_nodes_dict = {n.node_id: n for n in all_nodes}
# creating index
index = VectorStoreIndex(nodes=all_nodes, service_context=service_context)
# creating a chunk retriever
vector_retriever_chunk = index.as_retriever(similarity_top_k=2)
retriever_chunk = RecursiveRetriever(
"vector",
retriever_dict={"vector": vector_retriever_chunk},
node_dict=all_nodes_dict,
verbose=True,
)
# retrieve needed nodes
nodes = retriever_chunk.retrieve(
"Can you tell me about the key concepts for safety finetuning"
)
for node in nodes:
    print(node)
The code above only retrieves the nodes and prints them to the screen; here is the output:

Now, let’s move ahead and implement a RetrieverQueryEngine. Here is the code we can use for this:
from llama_index import (
Document,
StorageContext,
VectorStoreIndex,
SimpleDirectoryReader,
ServiceContext,
load_index_from_storage,
)
from llama_index.retrievers import RecursiveRetriever
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.node_parser import SimpleNodeParser
from llama_index.embeddings import OpenAIEmbedding
from llama_index.schema import IndexNode
from llama_index.llms import OpenAI
# for loading environment variables
from decouple import config
import os
# set env variables
os.environ["OPENAI_API_KEY"] = config("OPENAI_API_KEY")
# create LLM and Embedding Model
embed_model = OpenAIEmbedding()
llm = OpenAI(model="gpt-3.5-turbo")
service_context = ServiceContext.from_defaults(
embed_model=embed_model, llm=llm)
# load data
documents = SimpleDirectoryReader(
input_dir="../dataFiles").load_data(show_progress=True)
doc_text = "\n\n".join([d.get_content() for d in documents])
docs = [Document(text=doc_text)]
# create nodes parser
node_parser = SimpleNodeParser.from_defaults(chunk_size=1024)
# split into nodes
base_nodes = node_parser.get_nodes_from_documents(documents=docs)
# set document IDs
for idx, node in enumerate(base_nodes):
    node.id_ = f"node-{idx}"
# create parent-child documents
sub_chunk_sizes = [128, 256, 512]
sub_node_parsers = [
    SimpleNodeParser.from_defaults(chunk_size=c, chunk_overlap=0) for c in sub_chunk_sizes
]
all_nodes = []
for base_node in base_nodes:
    for n in sub_node_parsers:
        sub_nodes = n.get_nodes_from_documents([base_node])
        sub_inodes = [
            IndexNode.from_text_node(sn, base_node.node_id) for sn in sub_nodes
        ]
        all_nodes.extend(sub_inodes)
    # also add the original node to the list of nodes
    original_node = IndexNode.from_text_node(base_node, base_node.node_id)
    all_nodes.append(original_node)
all_nodes_dict = {n.node_id: n for n in all_nodes}
# creating index
index = VectorStoreIndex(nodes=all_nodes, service_context=service_context)
# creating a chunk retriever
vector_retriever_chunk = index.as_retriever(similarity_top_k=2)
retriever_chunk = RecursiveRetriever(
"vector",
retriever_dict={"vector": vector_retriever_chunk},
node_dict=all_nodes_dict,
verbose=True,
)
query_engine_chunk = RetrieverQueryEngine.from_args(
retriever_chunk, service_context=service_context
)
response = query_engine_chunk.query(
"What did the president say about Covid-19"
)
print(str(response))
In the code above, I replaced the manual retrieval-and-print step from the previous snippet with a RetrieverQueryEngine, so the retrieved parent chunks are now passed straight to the LLM to generate an answer. These are the lines that were removed:
# retrieve needed nodes
nodes = retriever_chunk.retrieve(
    "Can you tell me about the key concepts for safety finetuning"
)
for node in nodes:
    print(node)
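Before looking at the output, one optional check: if you want to confirm that it really is the bigger parent chunks that reach the LLM as context, you can inspect the response’s source nodes after querying. The attributes below come from the llama_index Response object:

# optional: inspect the nodes that were passed to the LLM as context
for source_node in response.source_nodes:
    print(source_node.node.node_id, "->", len(source_node.node.get_content()), "characters")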
Running the code, this is the output:

Evaluation
Now, let’s move on to evaluating the pipeline’s performance.
Step 1: Copy Database Over
The first thing I would do is copy the SQLite file from the basicRAG folder into the ParentDocumentRetrieval folder. I do this to carry over the metrics of the basic RAG pipeline we built in the last article so that we can compare performance; it isn’t a must if you don’t want to. Alternatively, you can point TruLens at whichever SQLite database you wish to use:
# RAG pipeline evals
tru = Tru(database_file="<file_path_to_database_you_want_to_use>")
Either way, you can use the code below to evaluate your pipeline’s performance. Once you have copied over the default.sqlite file, your project directory should look like this:

You can also evaluate your final RAG pipeline against a set of questions. Create a file called eval_questions.txt and, inside this file, add the following questions:
- What measures did the speaker announce to support Ukraine in the conflict mentioned?
- How does the speaker propose to address the challenges faced by the United States in the face of global conflicts, specifically mentioning Russia’s actions?
- What is the speaker’s plan to combat inflation and its impact on American families?
- How does the speaker suggest the United States will support the Ukrainian people beyond just military assistance?
- What is the significance of the speaker’s reference to the NATO alliance in the context of recent global events?
- Can you detail the economic sanctions mentioned by the speaker that are being enforced against Russia?
- What actions have been taken by the U.S. Department of Justice in response to the crimes of Russian oligarchs as mentioned in the speech?
- How does the speaker describe the American response to COVID-19 and the current state of the pandemic in the country?
- What are the four common-sense steps the speaker mentions for moving forward safely in the context of COVID-19?
- How does the speaker address the economic issues such as job creation, infrastructure, and the manufacturing sector in the United States?
The eval_questions.txt file should be in the same directory as your main.py file. This lets us evaluate the RAG pipeline against multiple questions instead of the single question we used in the last article. Here is the full evaluation code:
from typing import List
from llama_index import (
Document,
VectorStoreIndex,
SimpleDirectoryReader,
ServiceContext,
)
from llama_index.retrievers import RecursiveRetriever
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.node_parser import SimpleNodeParser
from llama_index.embeddings import OpenAIEmbedding
from llama_index.schema import IndexNode
from llama_index.llms import OpenAI
# for loading environment variables
from decouple import config
import os
from trulens_eval import Feedback, Tru, TruLlama
from trulens_eval.feedback import Groundedness
from trulens_eval.feedback.provider.openai import OpenAI as OpenAITruLens
import numpy as np
# set env variables
os.environ["OPENAI_API_KEY"] = config("OPENAI_API_KEY")
# create LLM and Embedding Model
embed_model = OpenAIEmbedding()
llm = OpenAI(model="gpt-3.5-turbo")
service_context = ServiceContext.from_defaults(
embed_model=embed_model, llm=llm)
# load data
documents = SimpleDirectoryReader(
input_dir="../dataFiles").load_data(show_progress=True)
doc_text = "\n\n".join([d.get_content() for d in documents])
docs = [Document(text=doc_text)]
# create nodes parser
node_parser = SimpleNodeParser.from_defaults(chunk_size=1024)
# split into nodes
base_nodes = node_parser.get_nodes_from_documents(documents=docs)
# set document IDs
for idx, node in enumerate(base_nodes):
    node.id_ = f"node-{idx}"
# create parent-child documents
sub_chunk_sizes = [128, 256, 512]
sub_node_parsers = [
    SimpleNodeParser.from_defaults(chunk_size=c, chunk_overlap=0) for c in sub_chunk_sizes
]
all_nodes = []
for base_node in base_nodes:
    for n in sub_node_parsers:
        sub_nodes = n.get_nodes_from_documents([base_node])
        sub_inodes = [
            IndexNode.from_text_node(sn, base_node.node_id) for sn in sub_nodes
        ]
        all_nodes.extend(sub_inodes)
    # also add the original node to the list of nodes
    original_node = IndexNode.from_text_node(base_node, base_node.node_id)
    all_nodes.append(original_node)
all_nodes_dict = {n.node_id: n for n in all_nodes}
# creating index
index = VectorStoreIndex(nodes=all_nodes, service_context=service_context)
# creating a chunk retriever
vector_retriever_chunk = index.as_retriever(similarity_top_k=2)
retriever_chunk = RecursiveRetriever(
"vector",
retriever_dict={"vector": vector_retriever_chunk},
node_dict=all_nodes_dict,
verbose=True,
)
query_engine_chunk = RetrieverQueryEngine.from_args(
retriever_chunk, service_context=service_context
)
# RAG pipeline evals
tru = Tru()
openai = OpenAITruLens()
grounded = Groundedness(groundedness_provider=OpenAITruLens())
# Define a groundedness feedback function
f_groundedness = Feedback(grounded.groundedness_measure_with_cot_reasons).on(
TruLlama.select_source_nodes().node.text
).on_output(
).aggregate(grounded.grounded_statements_aggregator)
# Question/answer relevance between overall question and answer.
f_qa_relevance = Feedback(openai.relevance).on_input_output()
# Question/statement relevance between question and each context chunk.
f_qs_relevance = Feedback(openai.qs_relevance).on_input().on(
TruLlama.select_source_nodes().node.text
).aggregate(np.mean)
tru_query_engine_recorder = TruLlama(query_engine_chunk,
app_id='Parent_document_retrieval',
feedbacks=[f_groundedness, f_qa_relevance, f_qs_relevance])
eval_questions = []
with open("./eval_questions.txt", "r") as eval_qn:
for qn in eval_qn:
qn_stripped = qn.strip()
eval_questions.append(qn_stripped)
def run_eval(eval_questions: List[str]):
for qn in eval_questions:
# eval using context window
with tru_query_engine_recorder as recording:
query_engine_chunk.query(qn)
run_eval(eval_questions=eval_questions)
# run dashboard
tru.run_dashboard()
Once we run the code, you can click on the link printed in the terminal:

This will open up a Streamlit web app that looks like so:

From the image above, we can see that the Parent Document Retrieval pipeline is performing well compared to the LlamaIndex_App1 app, which is the basic RAG pipeline from the last article.
Feel free to click around and explore the Streamlit dashboard more.
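If you would rather compare the two apps programmatically instead of only through the dashboard, trulens_eval also exposes a leaderboard helper (available in recent trulens_eval versions). A short sketch, using the app IDs from this article and the previous one:

# pull aggregated feedback scores for both apps as a pandas DataFrame
leaderboard = tru.get_leaderboard(
    app_ids=["Parent_document_retrieval", "LlamaIndex_App1"])
print(leaderboard)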
Next Article
In the next article, we’ll dive into the Sentence Window Retrieval technique and see just how well it performs while reducing token usage, saving you money.
Conclusion
Congratulations on making it this far. I hope this has given you a good understanding of how parent document retrieval works, and that you find use cases for it in your day-to-day building of RAG pipelines.
Happy coding, and see you next time. The world keeps spinning.