langchain-jina 0.0.1.dev0

pip install langchain-jina

Latest release

Released: Apr 6, 2025

An integration package connecting Jina Late Chunking and LangChain

Verified details

These details have been verified by PyPI

Maintainers

tien-ngnvan

Unverified details

These details have not been verified by PyPI

Project links

Classifiers

License
- OSI Approved :: MIT License
Programming Language

Project description

langchain-jina

This package contains the LangChain integration with Jina Late Chunking.

Installation

pip install -U langchain-jina

Environment Variable

Export your API key: export JINA_API_KEY="jina_*"
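
In Python, the key can then be read back from the environment before any client is constructed. The sketch below is illustrative only: the helper name `load_jina_api_key` is hypothetical and not part of langchain-jina, and it simply fails early with a readable message when the variable is missing.

```python
import os


def load_jina_api_key(env=None):
    """Return the Jina API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of langchain-jina.
    """
    # Allow a plain dict to be passed in so the function is easy to test.
    env = os.environ if env is None else env
    key = env.get("JINA_API_KEY", "").strip()
    if not key:
        raise RuntimeError("JINA_API_KEY is not set; export it before running.")
    return key
```

Failing early like this avoids passing `None` to the embeddings class and getting a less obvious authentication error later.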

Usage

1. Get Embeddings

Here is an example usage of these classes:

import os

from langchain_jina import LateChunkEmbeddings

text_embeddings = LateChunkEmbeddings(
    jina_api_key=os.environ.get("JINA_API_KEY"),
    model_name="jina-embeddings-v3",
)

text = [
    "Berlin is the capital and largest city of Germany, by both area and population.",
    "With 3.66 million inhabitants, it has the highest population within its city limits of any city in the European Union.",
    "The city is also one of the states of Germany, being the third smallest state in the country by area.",
]

# Embed the documents with late chunking enabled
doc_result = text_embeddings.embed_documents(text, late_chunking=True)
print("With late_chunking")
for doc in doc_result:
    print(doc)
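
For intuition, late chunking is generally described as encoding the whole text once and then mean-pooling the token embeddings that fall inside each chunk, so that every chunk vector carries context from the full document. The sketch below illustrates only that pooling step on toy data. The function name `pool_chunks`, the token vectors, and the chunk spans are assumptions made for illustration; it does not show how langchain-jina or the Jina API implement the technique.

```python
def pool_chunks(token_embeddings, spans):
    """Mean-pool token vectors over each (start, end) token span (end exclusive)."""
    pooled = []
    for start, end in spans:
        window = token_embeddings[start:end]
        if not window:
            raise ValueError(f"empty span: ({start}, {end})")
        dim = len(window[0])
        # Average each dimension across the tokens in this chunk.
        pooled.append([sum(vec[i] for vec in window) / len(window) for i in range(dim)])
    return pooled


# Toy contextualized token embeddings for a 4-token text (2 dimensions each).
tokens = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0], [2.0, 2.0]]
# Two chunks: tokens 0-1 and tokens 2-3.
chunk_vectors = pool_chunks(tokens, [(0, 2), (2, 4)])
print(chunk_vectors)  # [[2.0, 1.0], [1.0, 3.0]]
```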

2. Build with Vectorstore

First, the entire input text must fit within the model's context length, so we use the tokenizer from transformers to check it.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jinaai/jina-embeddings-v3")
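
A check along these lines could look like the sketch below. To keep it runnable without downloading a model, it accepts any token-counting callable; with the tokenizer loaded above, one might pass `lambda t: len(tokenizer.encode(t))`. The helper name `fits_context` and the default limit of 8192 tokens are assumptions here, so confirm the actual limit against the model card.

```python
def fits_context(text, count_tokens, max_tokens=8192):
    """Return True if the text's token count is within max_tokens.

    max_tokens defaults to an assumed limit; check the model card.
    """
    return count_tokens(text) <= max_tokens


# Whitespace splitting stands in for a real tokenizer in this sketch.
def count_words(text):
    return len(text.split())


print(fits_context("Berlin is the capital of Germany.", count_words))
print(fits_context("word " * 10000, count_words))
```

Texts that fail the check would need to be split into smaller documents before being embedded as a whole.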

Next, once the tokenizer is loaded, we can combine it with any LangChain text splitter. The example below follows the same approach as the authors.

from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,
    chunk_overlap=0,
    length_function=len,
    is_separator_regex=False,
)

text_splitter.tokenizer = tokenizer

Then we create the vector store for the embeddings; here we use LateChunkQdrant.

from qdrant_client import QdrantClient
from langchain_community.docstore.document import Document
from langchain_jina import LateChunkQdrant


client = QdrantClient()

vectorstore = LateChunkQdrant(
    client,
    collection_name="demo",
    embeddings=text_embeddings,
    text_splitter=text_splitter,
)

# load documents
with open("./state_of_the_union.txt") as f:
    state_of_the_union = f.read()

documents = [
    Document(
        page_content=state_of_the_union,
        metadata={"source": "state_of_the_union.txt"},
    ),
]

vectorstore = vectorstore.from_documents(
    documents=documents,
    embedding=text_embeddings,
    text_splitter=text_splitter,
    path="test_db",
    collection_name="demo",
)

Finally, we can query the vector store for any downstream purpose:

query = "What did the president say about Ketanji Brown Jackson?"
results = vectorstore.similarity_search(query, k=3)

for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
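
Conceptually, a similarity search ranks the stored chunk vectors by how close they are to the query vector. The sketch below shows that ranking with cosine similarity on toy vectors. The function names and data are hypothetical, and the actual scoring depends on how the Qdrant collection is configured.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, vectors, k=3):
    """Return the indices of the k vectors most similar to the query, best first."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda i: cosine(query, vectors[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy chunk vectors and a query that points mostly along the first axis.
chunks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(top_k([1.0, 0.1], chunks, k=2))  # [0, 2]
```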

License

This project is licensed under the MIT License.

Release history

0.0.1.dev0 (pre-release), Apr 6, 2025

Download files

Source Distribution

langchain_jina-0.0.1.dev0.tar.gz (12.0 kB)

Uploaded Apr 6, 2025 Source

Built Distribution

langchain_jina-0.0.1.dev0-py3-none-any.whl (12.2 kB)

Uploaded Apr 6, 2025 Python 3

File details

Details for the file langchain_jina-0.0.1.dev0.tar.gz.

File metadata

Download URL: langchain_jina-0.0.1.dev0.tar.gz
Upload date: Apr 6, 2025
Size: 12.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.2 CPython/3.11.0 Linux/6.5.0-27-generic

File hashes

Hashes for langchain_jina-0.0.1.dev0.tar.gz
Algorithm	Hash digest
SHA256	`8570a89e33705bf746e504ccfbd218e9e8e083b06c35db10295bc8176ef28841`
MD5	`a12425515ea5f5e518977571f423c616`
BLAKE2b-256	`43ed8a406f3818499c130e2a57346df6607008876144e5bde7fbb5fcebb347b6`

File details

Details for the file langchain_jina-0.0.1.dev0-py3-none-any.whl.

File metadata

Download URL: langchain_jina-0.0.1.dev0-py3-none-any.whl
Upload date: Apr 6, 2025
Size: 12.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.2 CPython/3.11.0 Linux/6.5.0-27-generic

File hashes

Hashes for langchain_jina-0.0.1.dev0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b7dbe6992365c99674497f11869d2e128d932617f51a949e9602b440997de41a`
MD5	`ed62e34a739a8477d860fbe54b88071f`
BLAKE2b-256	`8fed91871e1dd3f91b322d6024524eab51fb2ecd0373f6fe0fced7b49a7c97e1`

URL: https://pypi.org/project/langchain-jina/
