Chatting with Your PDFs - a Python and LangChain Guide

Ever wished you could just ask your PDFs questions and get straight answers? Imagine turning those static documents into interactive conversations. In this guide, we’ll show you how to build a system that lets you chat with your PDFs using Python and LangChain. By the end, you’ll have a tool that transforms your PDFs into responsive resources, ready to answer your queries on demand.​

The process involves five key steps:

  1. Loading a PDF Document: Converting the PDF into text data for processing.​

  2. Splitting the Text into Chunks: Dividing the text into manageable sections to enhance retrieval and analysis.​

  3. Creating a Vectorized Database: Embedding these text chunks into vectors and storing them in a vector database, facilitating efficient similarity searches.​

  4. Retrieving Relevant Documents: Identifying and extracting the most pertinent text chunks in response to user queries.​

  5. Generating Responses with an LLM: Utilizing an LLM to generate accurate answers based on the retrieved context.​

To show how this works, we’ll use a sample transcript from a fictional team meeting. In this meeting, the team discusses which project to tackle next, evaluating three app ideas: CandyShop, FitFlow, and BookNest. They go over each project’s features, technical aspects, user experience, and business potential to decide on the best option. You can read the full transcript here.

Each step is explained with beginner-friendly descriptions, code examples, and references to the official LangChain documentation, ensuring a comprehensive understanding of the process. Let’s get started!

This guide builds upon concepts and code introduced by Rishab Kumar in his tutorial on interacting with PDFs using LangChain and LLMs. You can watch his original tutorial here.

Step 1. Loading a PDF document

The first step is to load the content of a PDF file into a format that we can work with. This involves extracting the text and metadata from the PDF file.

PyPDFLoader Documentation

LangChain provides PyPDFLoader (in the langchain_community.document_loaders module) for this purpose. PyPDFLoader uses the PyPDF library under the hood to read PDF files and extract text.

  • What it does: PyPDFLoader reads a PDF file and converts it into one or more LangChain Document objects containing the text and metadata of the PDF. It supports configurations like handling password-protected PDFs, extracting images, and choosing how to split the PDF (by page or as one continuous document).

  • Input: The loader takes a file path (str or Path) to the PDF. Optionally, you can provide a password (for encrypted PDFs) or set parameters like mode to control splitting. No special credentials are needed to use it.

  • Output: The load() method returns a list of Document objects (by default, one per page). Each Document includes the page content as text and associated metadata (like page number, author, title, etc.).

  • Usage details: By default, PyPDFLoader returns one Document per page (mode="page"), with the page number in each document's metadata. If you prefer the whole PDF as a single text flow (so that text isn’t cut off between pages), you can initialize it with mode="single". After loading, you can access Document.page_content for the text and Document.metadata for the metadata.

Code

# Install required packages for PyPDFLoader (uncomment the line below if running locally)
# !pip install langchain_community pypdf

from langchain_community.document_loaders import PyPDFLoader

# Initialize the PDF loader with the path to a PDF file.
file_path = "pdfs/meeting_transcript.pdf"  # replace with your PDF file path
loader = PyPDFLoader(file_path)  # You can set mode="single" to load the whole PDF as one document.

# Load the PDF documents.
docs = loader.load()

# Print the number of documents loaded and the first 200 characters of the first document's content.
total_docs = len(docs)
first_doc_content = docs[0].page_content[:200]

print(f"[-] Loaded {total_docs} document(s) from the PDF.")
print(f"[-] The first 200 characters of the first document's content are:\n\n```\n{first_doc_content}\n```")
[-] Loaded 4 document(s) from the PDF.
[-] The first 200 characters of the first document's content are:

```
Software Development Meeting – Project Selection Discussion 
Date: March 20, 2025 
Participants: 
• Sarah Johnson (Product Manager) 
• Mike Chen (Lead Developer) 
• Laura Singh (UX/UI Designer) 
• Ale
```
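
If you instead want the whole transcript as one continuous document, you can load it with mode="single" and inspect the metadata PyPDF attaches. Here is a minimal sketch (depending on your langchain_community version, the mode parameter may not be available, in which case the per-page default is used):

from langchain_community.document_loaders import PyPDFLoader

# Load the entire PDF as a single Document instead of one Document per page.
single_loader = PyPDFLoader("pdfs/meeting_transcript.pdf", mode="single")
single_docs = single_loader.load()

print(f"[-] Loaded {len(single_docs)} document(s) in single mode.")
print(f"[-] Metadata of the document: {single_docs[0].metadata}")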

Step 2. Splitting the PDF Text into Chunks

Large documents often need to be divided into smaller pieces (chunks) before processing, so that the LLM can handle them within its context size and so that relevant pieces can be retrieved individually.

RecursiveCharacterTextSplitter Documentation

LangChain’s RecursiveCharacterTextSplitter (in langchain_text_splitters) is a convenient utility for splitting text.

  • What it does: This text splitter breaks a long text into smaller chunks, trying to do so at natural boundaries like paragraph or sentence breaks. It “recursively” looks for the largest separator (like "\n\n" for paragraph) and falls back to smaller separators (like "\n" for line, " " for space, and ultimately character level) to respect a desired chunk size​. This approach preserves whole sentences and paragraphs when possible, making chunks more semantically meaningful. See Recursively split by character for more details.

  • Input: You specify parameters like chunk_size (maximum characters in a chunk) and chunk_overlap (number of characters to overlap between consecutive chunks, which helps maintain context continuity) when instantiating the splitter​. You can also provide a custom list of separators (the default is ["\n\n", "\n", " ", ""] which means try paragraph, then newline, then space, then no separator)​. If not given, it uses the defaults which are suitable for English text.

  • Output: The splitter’s methods output a list of chunks. For example, split_text(some_long_text) returns a list of text strings, each a chunk not exceeding the chunk_size. There is also split_documents(list_of_Documents) which splits each Document’s text and returns a new list of Documents (with smaller page_content and inherited metadata).
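
For a quick feel of split_text (the Code section below applies split_documents to the loaded PDF), here is a minimal sketch on a toy string:

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split a tiny two-paragraph string into chunks of at most 30 characters.
toy_splitter = RecursiveCharacterTextSplitter(chunk_size=30, chunk_overlap=5)
toy_chunks = toy_splitter.split_text(
    "First sentence here.\n\nSecond sentence in a new paragraph."
)

# Each element is a plain string; the paragraph break is used as the first split point.
for chunk in toy_chunks:
    print(repr(chunk))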

Code

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Create a text splitter with 512 max characters per chunk and a 24-character overlap.
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=24,
    add_start_index=True
)

# Suppose docs is the list of Document objects from the PDF loader (from step 1).
# We will split each document in docs into smaller chunks.
chunks = text_splitter.split_documents(docs)

# Print the number of chunks created and the first 100 characters of the first chunk's content.
total_chunks = len(chunks)
first_chunk_content = chunks[0].page_content[:100]

print(f"The documents where splitted into {total_chunks} chunks.")
print(f"The first 100 characters of the first chunk are:\n\n```\n{first_chunk_content}```")
The documents were split into 19 chunks.
The first 100 characters of the first chunk are:

```
Software Development Meeting – Project Selection Discussion 
Date: March 20, 2025 
Participants: 
• ```

We can also inspect the number of characters in each chunk to verify that they stay within the desired range:

for i, chunk in enumerate(chunks, start=1):
    print(f"Chunk n°{i} has {len(chunk.page_content)} characters")
Chunk n°1 has 488 characters
Chunk n°2 has 424 characters
Chunk n°3 has 455 characters
Chunk n°4 has 442 characters
Chunk n°5 has 339 characters
Chunk n°6 has 444 characters
Chunk n°7 has 490 characters
Chunk n°8 has 496 characters
Chunk n°9 has 448 characters
Chunk n°10 has 410 characters
Chunk n°11 has 179 characters
Chunk n°12 has 406 characters
Chunk n°13 has 440 characters
Chunk n°14 has 464 characters
Chunk n°15 has 418 characters
Chunk n°16 has 437 characters
Chunk n°17 has 361 characters
Chunk n°18 has 439 characters
Chunk n°19 has 467 characters

Diving into the Chunked Content

We might expect each chunk to have exactly chunk_size characters, with the last chunk_overlap characters of one chunk matching the first chunk_overlap characters of the next. However, this is not always the case. To visualize this, let’s take a simpler example with chunk_size=500 and chunk_overlap=50.

dumb_text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    add_start_index=True
)

dumb_chunks = dumb_text_splitter.split_documents(docs)

print(f"The documents where splitted into {len(dumb_chunks)} chunks.")
The documents were split into 19 chunks.

Let’s print the length of the first ten chunks:

for i, chunk in enumerate(dumb_chunks[:10], start=1):
    print(f"Chunk n°{i} has {len(chunk.page_content)} characters")
print("...")
Chunk n°1 has 488 characters
Chunk n°2 has 424 characters
Chunk n°3 has 455 characters
Chunk n°4 has 442 characters
Chunk n°5 has 339 characters
Chunk n°6 has 444 characters
Chunk n°7 has 490 characters
Chunk n°8 has 496 characters
Chunk n°9 has 448 characters
Chunk n°10 has 410 characters
...

None of these chunks has exactly 500 characters. Next, let’s compare the last 50 characters of the first chunk to the first 50 characters of the second chunk:

first_chunk_length = len(dumb_chunks[0].page_content)
first_chunk_final_content = dumb_chunks[0].page_content[first_chunk_length-50:]
second_chunk_initial_content = dumb_chunks[1].page_content[:50]

print(f"The last 50 characters of the first chunk are:\n\n```\n{first_chunk_final_content}\n```")
print("\n")
print(f"The first 50 characters of the second chunk are:\n\n```\n{second_chunk_initial_content}\n```")

The last 50 characters of the first chunk are:

```
xciting concepts—CandyShop, FitFlow, and BookNest.
```


The first 50 characters of the second chunk are:

```
Let's discuss each thoroughly, considering feature
```

Looking at the output, you’ll notice that the final 50 characters of the first chunk aren’t identical to the first 50 characters of the second chunk. But why is this happening?

This is because RecursiveCharacterTextSplitter looks for “friendly” breakpoints (such as sentence or paragraph boundaries) near the size limit instead of splitting mechanically at the exact 500-character mark. This approach produces more readable chunks, even though the overlap may not be a perfect character-for-character match.
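
We can also confirm where each chunk begins within its source document, because we created the splitter with add_start_index=True and each chunk's metadata records its starting position:

# The start_index metadata is present because add_start_index=True was set.
for i, chunk in enumerate(dumb_chunks[:5], start=1):
    start = chunk.metadata.get("start_index")
    print(f"Chunk n°{i} starts at character {start} and has {len(chunk.page_content)} characters")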

Step 3. Creating a Vectorized Database

With the text now in manageable chunks, the next step is to index them in a vector store. A vector store is essentially a database of numerical vectors that lets us do similarity search – we’ll convert each text chunk into a vector embedding and store them. Later, we can convert a query into a vector and find which stored vectors (chunks) are closest in meaning to the query.

The typical process of creating a vectorized database involves the following steps:

  1. Embed the documents (transform text into numeric vectors).
  2. Store those vectors in the vector store (along with references to the original text).
  3. At query time, embed the query and search for the nearest vectors, retrieving the text chunks that are most similar to the query.

To implement this in LangChain, we need two components: an embeddings model to convert text into vectors and a vector store to store these vectors and support similarity search.

A. Generating Embeddings

OllamaEmbeddings Documentation

LangChain offers many embedding model integrations. Here we’ll use OllamaEmbeddings (from langchain_ollama) as an example embedding model. Ollama is a tool for running local LLM models, and OllamaEmbeddings will call an Ollama-served model to get text embeddings.

  • What it does: OllamaEmbeddings is an Embeddings class that generates vector embeddings for text by querying a local Ollama model​. Under the hood, it uses the Ollama API to get the embedding of the input text from the specified model.

  • Input: When instantiating OllamaEmbeddings, you provide the name of the model (which must be installed in your Ollama instance) via the model parameter​. You can also specify a base_url if your Ollama is served on a specific URL, but by default it uses the local instance. Check out the available models in Ollama README on GitHub.

  • Output: The main methods are embed_query(text: str) for embedding a single text (like a query) and embed_documents([texts]) for embedding a list of texts (like document chunks). These return a list of floats (the embedding vector) or a list of vectors for multiple texts. For instance, OllamaEmbeddings().embed_query("Hello world") might return a 768-dimensional vector (length depends on the model), e.g., [-0.0246, -0.0075, 0.0039, ...]​

  • Usage details: Keep in mind that generating embeddings will use the LLM, so it may be slower than specialized embedding models. This is an easy way to get started with local models, but for large document sets you might use more optimized embedding models.

Code

# Install langchain-ollama if needed (uncomment if running locally)
# !pip install langchain-ollama

from langchain_ollama import OllamaEmbeddings

# Initialize the embeddings model (make sure Ollama is running and model is pulled)
embeddings = OllamaEmbeddings(model="deepseek-r1")  # replace with the model name you have in Ollama

Let’s embed a sample text and visualize the output of the embed_query() method:

sample_text = "This is a sample sentence from the document."
vector = embeddings.embed_query(sample_text)

print(f"Embedding vector length: {len(vector)}")
print(f"Sample of vector values: {vector[:5]}")
Embedding vector length: 3584
Sample of vector values: [0.0051061264, 0.0038971296, -0.01398391, -0.003785599, 8.179655e-05]
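
The companion method embed_documents() embeds several texts in one call, which is what the vector stores in the next section do internally when indexing our chunks:

# Embed the first three chunks in one call; we get one vector per chunk.
chunk_texts = [chunk.page_content for chunk in chunks[:3]]
chunk_vectors = embeddings.embed_documents(chunk_texts)

print(f"Embedded {len(chunk_vectors)} chunks, each as a vector of length {len(chunk_vectors[0])}.")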

B. Storing Embeddings in a Vector Store

Now that we can get embeddings for our document chunks, we need to store them in a vector store. LangChain provides multiple implementations. Two common ones are:

  • InMemoryVectorStore – a simple in-memory vector store (part of langchain_core.vectorstores).

  • FAISS – a wrapper for Facebook’s FAISS vector database (from langchain_community.vectorstores).

Check out LangChain’s explanation of vector stores and the full list of vector stores.

InMemoryVectorStore Documentation

  • What it does: An in-memory store that keeps vectors and documents in a Python dictionary. It computes similarity (cosine similarity) using numpy at query time​. This is easy to set up and requires no external dependencies.

  • Input: You initialize it with an Embeddings object (so it knows how to embed new queries or documents)​. You then add documents to it using add_documents([...]) or use the classmethod from_texts(texts, embedding=..., metadatas=...) to both embed and add texts in one step.

  • Output: It stores the documents internally. When you query (via similarity_search), it returns a list of Document objects that are similar to the query.

  • Usage details: This is best for smaller datasets or testing, since everything is in memory. There’s also no built-in persistence (the data is lost when the program ends, unless you manually save it). On the plus side, it’s simple and has no setup beyond providing an embedding model. Filtering by metadata is supported via a filter function if needed​.

FAISS Documentation

  • What it does: A more robust vector store that uses the FAISS library for efficient similarity search. FAISS is optimized in C++/CUDA and can handle large volumes of vectors quickly​.

  • Input: You need the faiss library installed (e.g., faiss-cpu via pip) and an Embeddings object. You can build a FAISS vector store by adding documents or using FAISS.from_texts(texts, embedding=...) which will handle embedding the texts and creating the index​. Alternatively, you can instantiate FAISS(embedding_function, index, docstore, index_to_docstore_id) if you already have a FAISS index, but typically the from_texts or from_documents classmethods are easier.

  • Output: The FAISS index is stored in memory (or GPU) while your program runs, but you can save it to disk for reuse (faiss_store.save_local(folder_path) and later FAISS.load_local(folder_path, embedding=...) to reload)​. Querying works similarly with similarity_search and returns [Document] objects.

  • Usage details: FAISS is great for larger datasets and can do approximate nearest neighbor search for speed. Use FAISS when you have a lot of documents or need persistence. Keep in mind you need to have the faiss Python package installed. LangChain’s FAISS integration will automatically handle embedding your documents and storing IDs, etc., if you use the convenient constructors​.

Code

Let’s create a vector store using both InMemoryVectorStore and FAISS and add the document chunks to them. We will start with InMemoryVectorStore:

from langchain_core.vectorstores import InMemoryVectorStore

# Assume `chunks` is the list of Document chunks from step 2, and `embeddings` is our OllamaEmbeddings from 3.A.

# Option 1: Using InMemoryVectorStore
in_memory_store = InMemoryVectorStore.from_texts(
    texts=[doc.page_content for doc in chunks], 
    embedding=embeddings
)
print(f"InMemoryVectorStore built with {len(in_memory_store.store)} vectors.")  # .store holds the internal dict of vectors
InMemoryVectorStore built with 19 vectors.
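
Note that from_texts only stores the raw chunk text, so page numbers and start indices are lost (you’ll see empty metadata in the Step 4 results). If you want to keep the metadata, a small alternative sketch is to build the store from the Document objects directly:

# Alternative: build the store from Document objects so chunk metadata is preserved.
in_memory_store_with_meta = InMemoryVectorStore.from_documents(
    documents=chunks,
    embedding=embeddings
)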

Now, let’s create a vector store using FAISS:

from langchain_community.vectorstores import FAISS

# Assume `chunks` is the list of Document chunks from step 2, and `embeddings` is our OllamaEmbeddings from 3.A.

# Option 2: Using FAISS vector store (requires faiss library installed)
faiss_store = FAISS.from_texts(
    texts=[doc.page_content for doc in chunks], 
    embedding=embeddings
)
print(f"FAISS vector store built with {faiss_store.index.ntotal} documents.")  # ntotal is number of vectors in the index
FAISS vector store built with 19 documents.

Both vector stores have been successfully created, each with 19 embedded vectors indexed.
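
Since FAISS supports persistence, we can also save the index to disk and reload it later instead of re-embedding the PDF on every run. A short sketch (the folder name is just an example):

# Save the FAISS index and its documents to a local folder.
faiss_store.save_local("faiss_index")

# Reload it later with the same embeddings model.
# Recent LangChain versions require allow_dangerous_deserialization=True
# because the docstore is reloaded with pickle.
reloaded_store = FAISS.load_local(
    "faiss_index",
    embeddings,
    allow_dangerous_deserialization=True
)
print(f"Reloaded FAISS store with {reloaded_store.index.ntotal} vectors.")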

Step 4. Retrieving Relevant Documents

Now that our document chunks are indexed in the vector store, we can retrieve the pieces that are most relevant to a user’s question. This step is typically called similarity search or nearest-neighbor search: we embed the user’s query with the same embedding model and find which document vectors in the store are closest.

Vector Store Documentation

Using LangChain’s vector store API, the simplest way to do this is by calling the similarity_search() method on the vector store. We specify the query and how many results (k) we want.

  • Input: The similarity_search(query: str, k: int) method takes the raw query text and the number of results to retrieve (k). Optionally, some stores allow filters (e.g., by metadata) or alternative search types, but by default it’s straightforward similarity matching.

  • Output: It returns a list of Document objects that were added to the store (in our case, chunks of the PDF) that are most relevant. These Document objects still contain their page_content (the chunk text) and metadata (e.g., which page it came from, if available). We will use their content as context for the answer.

  • Usage details: For InMemoryVectorStore, similarity_search uses cosine similarity on the numpy-stored vectors​. For FAISS, it uses the FAISS index’s search (which by default might be L2 distance unless cosine similarity was used to build the index with normalized vectors). In both cases, you get back Document objects. You can access doc.page_content to get the text. It’s common to join multiple retrieved chunks into a single context string for the prompt (as we’ll do in the next step). Make sure to handle the case where no relevant document is found (the result list could be empty if nothing was indexed or if k=0).

Code

Let’s retrieve the four most relevant document chunks (k=4) for a sample query: “Which app did the team ultimately prefer to develop?”.

user_question = "Which app did the team ultimately prefer to develop?"
k = 4

First, we test similarity search with the in-memory vector store:

# Perform similarity search with InMemoryVectorStore
results = in_memory_store.similarity_search_with_score(user_question, k)

for i, (doc, score) in enumerate(results, start=1):
    snippet = doc.page_content[:100].replace("\n", " ")

    print(f"[-] Chunk {i}")
    print(f"\t- Score: {score}")
    print(f"\t- Metadata: {doc.metadata}")
    print(f"\t- Chunk size: {len(doc.page_content)}")
    print(f"\t- Content: {snippet} (continues...)")
[-] Chunk 1
	- Score: 0.8312360574079253
	- Metadata: {}
	- Chunk size: 179
	- Content: the web front-end and React Native for mobile would streamline development. Given that it’s content- (continues...)
[-] Chunk 2
	- Score: 0.8004943294675131
	- Metadata: {}
	- Chunk size: 418
	- Content: • CandyShop: Easy to implement, strong commercial potential, quick to market, lower complexity.  • F (continues...)
[-] Chunk 3
	- Score: 0.7890622446353018
	- Metadata: {}
	- Chunk size: 439
	- Content: balanced, combining innovation and practicality. CandyShop remains attractive commercially, though   (continues...)
[-] Chunk 4
	- Score: 0.7865978555972704
	- Metadata: {}
	- Chunk size: 339
	- Content: or promotions would amplify organic marketing efforts. Another interesting feature could be group or (continues...)

Next, we test similarity search with the FAISS vector store:

# Perform similarity search with FAISS
results = faiss_store.similarity_search_with_score(user_question, k)

for i, (doc, score) in enumerate(results, start=1):
    snippet = doc.page_content[:100].replace("\n", " ")

    print(f"[-] Chunk {i}")
    print(f"\t- Score: {score}")
    print(f"\t- Metadata: {doc.metadata}")
    print(f"\t- Chunk size: {len(doc.page_content)}")
    print(f"\t- Content: {snippet} (continues...)")
[-] Chunk 1
	- Score: 0.33000147342681885
	- Metadata: {}
	- Chunk size: 179
	- Content: the web front-end and React Native for mobile would streamline development. Given that it’s content- (continues...)
[-] Chunk 2
	- Score: 0.39899593591690063
	- Metadata: {}
	- Chunk size: 418
	- Content: • CandyShop: Easy to implement, strong commercial potential, quick to market, lower complexity.  • F (continues...)
[-] Chunk 3
	- Score: 0.4218757152557373
	- Metadata: {}
	- Chunk size: 439
	- Content: balanced, combining innovation and practicality. CandyShop remains attractive commercially, though   (continues...)
[-] Chunk 4
	- Score: 0.4266526699066162
	- Metadata: {}
	- Chunk size: 437
	- Content: Mike Chen:  CandyShop seems easiest technically, FitFlow has exciting challenges, but BookNest could (continues...)

Both vector stores return similar results, but note that their scores mean different things: InMemoryVectorStore reports a cosine similarity, where higher values indicate greater relevance, while FAISS reports an L2 distance, where lower values indicate greater relevance.
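
Instead of calling similarity_search yourself, you can also wrap a vector store in a retriever, which is the interface most LangChain chains expect. A minimal sketch using the FAISS store from above:

# Expose the vector store as a retriever that returns the top 4 chunks.
retriever = faiss_store.as_retriever(search_kwargs={"k": 4})

retrieved = retriever.invoke(user_question)
print(f"The retriever returned {len(retrieved)} chunks.")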

Step 5. Generating an Answer with the LLM

Finally, with the relevant document chunks in hand, we use a Large Language Model to generate an answer to the user’s question, using those chunks as context. We’ll construct a prompt that provides the retrieved information to the LLM and asks the question, then get the LLM’s response.

To do this effectively, LangChain provides the ChatPromptTemplate class to help format the prompt, especially for chat models that expect a series of messages (system, user, assistant). We’ll also use the OllamaLLM class (from langchain_ollama.llms) to interface with an Ollama-served local LLM as our model to generate the answer.

Constructing the Prompt

ChatPromptTemplate Documentation

  • What it does: ChatPromptTemplate is a prompt template specifically designed for chat models (LLMs that take a list of messages as input). It allows you to define a template with multiple roles (system/human/assistant) and insert variables into those messages easily​. For example, you can create a chat prompt with a system instruction and a human question, where parts of the messages are templated.

  • Input: You can create a ChatPromptTemplate in a few ways. One simple way is ChatPromptTemplate.from_template(template_string), which treats the whole template string as a single user message (human role)​. Another way is ChatPromptTemplate.from_messages(list_of_message_prompts), where you can provide a list mixing system/human messages​. Each message can be given as a tuple like ("system", "You are a helpful assistant.") or ("human", "Question: {question}\nContext: {context}"). The template supports placeholders in {} that will be filled in with actual values later.

  • Output: The template itself is not a string but an object you fill in with your inputs. Calling prompt.format(context=..., question=...) produces a single formatted string (handy for a plain-text LLM like the one we use below), prompt.format_messages(...) returns a list of message objects, and prompt.invoke({...}) returns a ChatPromptValue; chat models in LangChain accept the ChatPromptValue or the list of messages directly.
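
To make these output types concrete, here is a small sketch that fills the same kind of template in the three ways mentioned above (the variable values are toy placeholders):

from langchain_core.prompts import ChatPromptTemplate

demo_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("human", "Context:\n{context}\n\nQuestion: {question}")
])

# 1. A single formatted string (what we will pass to OllamaLLM later).
as_string = demo_prompt.format(context="some context", question="some question")

# 2. A list of message objects, which chat models accept directly.
as_messages = demo_prompt.format_messages(context="some context", question="some question")

# 3. A ChatPromptValue, produced when the template is used as a Runnable.
as_prompt_value = demo_prompt.invoke({"context": "some context", "question": "some question"})

print(type(as_string), type(as_messages[0]), type(as_prompt_value))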

Code

First, we create the prompt template, including the system message with the instructions to the LLM and a human message with the context (i.e., the retrieved document chunks) and the user question:

from langchain_core.prompts import ChatPromptTemplate

# Create a chat prompt template with a system and human message.
prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant who answers questions based on the given context."),
    ("human", "Context:\n{context}\n\nQuestion: {question}")
])

Then, once the user has asked a question, we can make a query to the vector store to retrieve the most relevant document chunks and generate the formatted prompt:

# Retrieve the top 4 most relevant chunks based on the user question.
retrieved_docs = faiss_store.similarity_search(
    query=user_question,  # "Which app did the team ultimately prefer to develop?"
    k=4
)

# Combine the retrieved chunks into one context string.
context_text = "\n".join([doc.page_content for doc in retrieved_docs])

# Format the prompt with the actual context and user question.
prompt = prompt_template.format(context=context_text, question=user_question)

print(f"The first 200 characters of the prompt are:\n```\n{prompt[:200]}\n```")
The first 200 characters of the prompt are:
```
System: You are a helpful assistant who answers questions based on the given context.
Human: Context:
the web front-end and React Native for mobile would streamline development. Given that it’s conten
```

Using the LLM to Generate an Answer

With the prompt prepared, we can now call an LLM to get the answer. We’ll use the OllamaLLM class, which allows us to run a local model hosted by Ollama through LangChain.

OllamaLLM Documentation

  • What it does: OllamaLLM is an implementation of LangChain’s LLM interface that sends prompts to an Ollama server and returns the model’s completion. It supports both text-completion models and chat-completion models that Ollama hosts. In our use case, we treat it as a text completion API: we provide the full prompt (including system/user messages) as input, and it returns the generated answer.

  • Input: When creating an OllamaLLM, you must specify the model name (just like with embeddings) – e.g., OllamaLLM(model="llama2") to use a Llama 2 model loaded in Ollama. You can also pass generation parameters like temperature, top_p, etc., if you want to control the randomness or filtering of the output (these are optional and have defaults). Ensure the model is downloaded (ollama pull <model-name>) and the Ollama service is running.

  • Output: The LLM’s generate or invoke method will return the model’s answer as a string (or as a ChatMessage, depending on usage). For simplicity, using .invoke(prompt) is an easy way to get the direct output​. The result is the answer the model gives based on the prompt.

  • Usage details: If using ChatPromptTemplate, you can also combine it with the LLM in a chain (LangChain lets you write chain = prompt_template | llm and then call chain.invoke({…}) with the variables). Here we’ll do it in a simpler way: format the prompt to a string and call llm.invoke(). OllamaLLM takes care of sending this to the Ollama backend. Keep in mind that the model must fit the prompt within its context window (some Ollama models may have limits like 2048 or 4096 tokens), so the retrieved chunks should be chosen such that the final prompt stays within the model’s context length. Also, since we included a system message, OllamaLLM will handle it as part of the prompt (for chat models, it might prepend it appropriately or you might need to use a chat-specific call, but the LangChain interface abstracts it).
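
As mentioned above, the prompt template and the LLM can also be composed into a chain with the | operator, so the variables are passed once and LangChain handles the formatting. A small sketch, assuming the context_text and user_question built in the previous sections and a model available in your Ollama instance:

from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama.llms import OllamaLLM

prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant who answers questions based on the given context."),
    ("human", "Context:\n{context}\n\nQuestion: {question}")
])
llm = OllamaLLM(model="deepseek-r1")  # replace with a model you have pulled in Ollama

# Compose prompt and LLM into one runnable chain.
chain = prompt_template | llm

answer = chain.invoke({
    "context": context_text,    # the joined chunks from the retrieval code above
    "question": user_question,  # the question defined in Step 4
})
print(answer)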

Code

from langchain_ollama.llms import OllamaLLM

# Initialize the LLM (local model via Ollama).
llm = OllamaLLM(model="deepseek-r1")  # replace with an available model name in your Ollama

# Invoke the LLM with the filled prompt to get an answer.
answer = llm.invoke(prompt)

print(f"The LLM's answer is:\n\n```\n{answer}\n```")
The LLM's answer is:

```
<think>
Okay, so I'm trying to figure out which app the team ultimately preferred to develop based on the context provided. Let me go through this step by step.

First, looking at the initial context, there are three apps mentioned: CandyShop, FitFlow, and BookNest. Each has its own strengths and weaknesses. The team discussed each one, and their thoughts were recorded.

Mike Chen talked about CandyShop being easy to implement with strong commercial potential but less innovative. Sarah Johnson thought it might be good to consider BookNest because of its strategic value despite the competition. Laura Singh suggested user validation methods like surveys for BookNest, while Mike mentioned considering FitFlow too due to its high demand and technology integrations.

Jennifer Lee highlighted the technical opportunities with BookNest, especially Elasticsearch, which is useful for search capabilities. She also noted that FitFlow has high ROI but higher market competition, which could be a risk. CandyShop was deemed safest but less innovative.

Putting this all together, it seems like each app had its own set of advantages and challenges. The team seemed to prefer BookNest because of the strategic reasons mentioned, such as community-centric ideas and dual-platform approach. They also considered user validation methods for BookNest, which adds credibility to their decision-making process.

So, considering all these points, I think the team ultimately preferred BookNest because it balanced technical aspects with strategic value, despite its competition.
</think>

The team ultimately preferred **BookNest** as it balanced innovative features and community-centric ideas against strategic considerations. Despite the competitive landscape, its dual-platform approach and potential for long-term user engagement made it their choice.

Answer: The team ultimately preferred BookNest due to its strategic value, community focus, and potential for long-term growth despite market competition.
```

The generated answer is correct, as BookNest is the app the team decided to develop.

The Full Pipeline: Chatting with Your PDFs

Bringing It All Together

Now that we’ve walked through each step, let’s combine them into a simple script that lets you ask questions to your PDF documents.

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_core.vectorstores import VectorStore
from langchain_ollama import OllamaEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.documents import Document
from langchain_ollama.llms import OllamaLLM

llm = OllamaLLM(model="deepseek-r1")
embeddings = OllamaEmbeddings(model="deepseek-r1")
prompt_template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant who answers questions based on the given context."),
    ("human", "Context:\n{context}\n\nQuestion: {question}")
])


def create_vector_store(pdf_path: str) -> VectorStore:
    loader = PyPDFLoader(pdf_path)
    documents = loader.load()
    
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=512,
        chunk_overlap=24,
        add_start_index=True
    )

    chunked_docs = text_splitter.split_documents(documents)
    db = FAISS.from_documents(chunked_docs, embeddings)
    return db

def retrieve_docs(db: VectorStore, query: str, k=4):
    return db.similarity_search(query, k)

def question_pdf(question: str, documents: list[Document]) -> str:
    context = "\n\n".join([doc.page_content for doc in documents])
    prompt = prompt_template.format(context=context, question=question)
    answer = llm.invoke(prompt)
    return answer

With the following code, we can interactively query the PDF and receive context-based responses:

pdf_path = "pdfs/meeting_transcript.pdf"
db = create_vector_store(pdf_path)

question = "Which app did the team ultimately prefer to develop?"
retrieved_docs = retrieve_docs(db, question)
answer = question_pdf(question, retrieved_docs)

print(f"The answer to the question is:\n{answer}")
The answer to the question is:

```
<think>
Okay, so I need to figure out which app the development team ultimately preferred to build. The user mentioned several options: CandyShop, FitFlow, and BookNest. Each has its own strengths and weaknesses.

First, looking at CandyShop: It's easy to implement, has strong commercial potential, is quick to market, but it might have lower complexity. That sounds appealing because if the team values speed and ease of deployment, this could be a good choice. However, they also mention it's less ambitious in terms of creativity.

Then there's FitFlow: This one has high demand and advanced technology integrations, which is great for innovation. But the complex data handling suggests higher technical challenges, and while it might have high ROI, it's more competitive on the market. So if the team wants something that stands out due to cutting-edge tech but isn't as concerned about immediate profitability or lower competition, this could be a fit.

BookNest is different because it has a unique community-centric idea with a dual-platform approach and significant user engagement potential. However, there's a higher complexity in development. The thought was considering BookNest strategically because of its growth potential and community features. But the team considered additional features like group orders or event-themed candy boxes as promotions to boost marketing.

In Sarah Johnson's thoughts, she suggested doing some market research for BookNest using surveys or social media polls to validate interest in the community features. She also mentioned promotions to help with organic marketing, which could add another layer of strategy but might be outside the core app development decision.

So putting this together: The team evaluated CandyShop for its simplicity and quick return on investment, FitFlow for its innovative tech despite higher complexity and competition, and BookNest for its unique selling points but with some reservations due to technical challenges. They also considered additional features beyond just developing one app—like group orders or event packs—but those might be more about marketing tactics rather than the core app choice.

I'm trying to determine which app they ultimately preferred. The initial response mentions that the team was leaning towards BookNest strategically but then asked for user validation via surveys and social media polls, along with promotions as part of their strategy. However, when considering which app to develop, CandyShop's simplicity might have been a draw despite its less ambitious creative side.

Wait, no—the last part says they considered all three options and did some market research before deciding on BookNest because it has unique features that could drive engagement. But if the user is asking which app they ultimately preferred, I think they ended up choosing BookNest based on their strategic evaluation, even though there were other considerations.

Alternatively, maybe CandyShop was the one developed first due to its simplicity, but they preferred something more with community focus despite higher technical challenges.

Wait, let me go back. The initial context says "given that it’s content-heavy, we’ll require efficient data management and robust search capabilities." So BookNest is dual-platform and has strong community features which likely include better data handling and searches—so perhaps this aligns well with the team's current needs for efficient development.

So putting all together, considering the strengths in terms of content management and user engagement, despite higher complexity, they might have ultimately preferred BookNest because it addresses their need for robust data management and search capabilities more effectively. But I'm not entirely sure; maybe CandyShop was easier to develop but didn't meet as many requirements.
</think>

The development team ultimately preferred **BookNest** due to its unique community-centric approach, dual-platform strategy, and potential for significant user engagement, which align well with their requirements for efficient data management and robust search capabilities. Although BookNest involves higher technical challenges, these factors made it the preferred choice over CandyShop or FitFlow.
```

Making It Even Better

While this script gets the job done, there’s always room for improvement:

  • Persist the Vector Database: Right now, we create the vector store every time we run the script. By saving it to disk (as shown in the FAISS save_local example from Step 3), we can load it quickly without reprocessing the PDF.

  • Handle Multiple Documents: Currently, the script is set up for one PDF. We could tweak it to handle multiple PDFs or even other document types, making our assistant more versatile (see the sketch after this list).

  • User-Friendly Interface: Adding a simple UI or command-line prompts would make it easier to input questions and view answers, enhancing the overall experience.

Feel free to experiment and tailor the script to better suit your needs!
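
For example, extending the pipeline to several PDFs mostly means loading and chunking each file before indexing everything into a single vector store. A minimal sketch, reusing the imports, embeddings object, and splitter settings from the full-pipeline script above (the second file path is a placeholder):

def create_vector_store_from_pdfs(pdf_paths: list[str]) -> VectorStore:
    """Load, chunk, and index several PDFs into one FAISS store."""
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=512,
        chunk_overlap=24,
        add_start_index=True
    )

    all_chunks = []
    for pdf_path in pdf_paths:
        documents = PyPDFLoader(pdf_path).load()
        all_chunks.extend(text_splitter.split_documents(documents))

    return FAISS.from_documents(all_chunks, embeddings)


db = create_vector_store_from_pdfs([
    "pdfs/meeting_transcript.pdf",
    "pdfs/another_document.pdf",  # placeholder path
])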

Conclusion

Alright, let’s wrap this up! We’ve walked through how to build a pipeline that lets you chat with your PDFs:

  • Load the PDF and extract its text.

  • Break the text into chunks that are easier to work with.

  • Embed and store these chunks in a vector database.

  • Fetch relevant chunks when you have a question.

  • Use an LLM to generate answers based on those chunks.

This method, known as Retrieval-Augmented Generation (RAG), ensures your LLM’s responses are grounded in the actual content of your documents.

The beauty of this setup? It’s modular! You can swap in different data sources, embedding models, vector databases, or LLMs to fit your needs. LangChain makes it easy to customize your pipeline while sticking to this effective workflow.

If you enjoyed this post and want to see more, feel free to explore my other articles. Also, connect with me on LinkedIn, where I regularly share updates, especially when new blog posts go live. Let’s keep the conversation going!

This post is licensed under CC BY 4.0 by the author.
