langchain-jina 0.0.1.dev0
pip install langchain-jina
An integration package connecting Jina Late Chunking and LangChain
PyPI maintainer: tien-ngnvan
Meta
- License: MIT License (MIT)
- Author: tien.ngnvan@gmail.com
- Maintainer: tien.ngnvan@gmail.com
- Requires: Python <4.0, >=3.9
Project description
langchain-jina
This package contains the LangChain integration for Jina late chunking.
Installation
pip install -U langchain-jina
Environment Variable
Export your Jina API key:
export JINA_API_KEY="jina_*"
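The usage examples read the key with `os.environ.get("JINA_API_KEY")`, which silently returns `None` when the variable is not exported. A minimal sketch of failing early instead; `require_jina_api_key` is a hypothetical helper, not part of `langchain-jina`, and the package itself may report a missing key differently.

```python
import os


def require_jina_api_key(var_name: str = "JINA_API_KEY") -> str:
    """Return the Jina API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not an API of langchain-jina.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        # Fail before any network call is attempted with an empty key
        raise RuntimeError(f'{var_name} is not set; run: export {var_name}="jina_..."')
    return key
```

It could then be passed as `LateChunkEmbeddings(jina_api_key=require_jina_api_key(), ...)`.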
Usage
1. Get Embeddings
Here is an example of how to use `LateChunkEmbeddings`:
```python
import os

from langchain_jina import LateChunkEmbeddings

text_embeddings = LateChunkEmbeddings(
    jina_api_key=os.environ.get("JINA_API_KEY"),
    model_name="jina-embeddings-v3",
)

text = [
    "Berlin is the capital and largest city of Germany, by both area and population.",
    "With 3.66 million inhabitants, it has the highest population within its city limits of any city in the European Union.",
    "The city is also one of the states of Germany, being the third smallest state in the country by area.",
]

# With late chunking
doc_result = text_embeddings.embed_documents(text, late_chunking=True)
print("With late_chunking")
for doc in doc_result:
    print(doc)  # one embedding vector per input text
```
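Conceptually, late chunking runs the embedding model once over the whole text and only afterwards splits the result: each chunk's vector is the mean of the token embeddings that fall inside that chunk's token span. The sketch below shows just that pooling step with NumPy on made-up token embeddings and spans; `late_chunk_pool` is a hypothetical name, and this is not the package's or the Jina API's actual implementation.

```python
import numpy as np


def late_chunk_pool(token_embeddings: np.ndarray, spans: list[tuple[int, int]]) -> np.ndarray:
    """Mean-pool token embeddings over half-open token spans [start, end).

    token_embeddings: (num_tokens, dim) output of ONE forward pass over the full text.
    spans: token boundaries of each chunk inside that full text.
    Returns an array of shape (num_chunks, dim).
    """
    pooled = []
    for start, end in spans:
        if not 0 <= start < end <= len(token_embeddings):
            raise ValueError(f"invalid span ({start}, {end})")
        pooled.append(token_embeddings[start:end].mean(axis=0))
    return np.stack(pooled)


# Toy input: 5 tokens with 2-dim embeddings, split into two chunks
tokens = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0], [0.0, 6.0]])
chunk_vectors = late_chunk_pool(tokens, [(0, 2), (2, 5)])
print(chunk_vectors)  # rows: [2, 0] and [0, 4]
```

Because every chunk is pooled from the same forward pass, a chunk such as the second Berlin sentence ("it has the highest population...") can carry context from the first sentence, which is the motivation for late chunking.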
2. Build a Vector Store
First, the entire input text must fit within the model's context length. To check this, we use the model's tokenizer from `transformers`:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jinaai/jina-embeddings-v3")
```
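To act on that check, one option is to count tokens per chunk and greedily pack consecutive chunks into windows that each fit the context, so each window can be encoded in a single pass. A minimal sketch; `group_into_windows` is a hypothetical helper, and the 8192-token limit for `jina-embeddings-v3` is an assumption to verify against the model card.

```python
from typing import Callable, List

# Assumption: jina-embeddings-v3 accepts up to 8192 tokens; check the model card.
MAX_CONTEXT_TOKENS = 8192


def group_into_windows(
    chunks: List[str],
    count_tokens: Callable[[str], int],
    max_tokens: int = MAX_CONTEXT_TOKENS,
) -> List[List[str]]:
    """Greedily pack consecutive chunks into windows of at most max_tokens tokens."""
    windows: List[List[str]] = []
    current: List[str] = []
    current_len = 0
    for chunk in chunks:
        n = count_tokens(chunk)
        if n > max_tokens:
            raise ValueError("a single chunk exceeds the context limit; split it further")
        # Start a new window when adding this chunk would overflow the current one
        if current and current_len + n > max_tokens:
            windows.append(current)
            current, current_len = [], 0
        current.append(chunk)
        current_len += n
    if current:
        windows.append(current)
    return windows


# With the Hugging Face tokenizer loaded above, a token counter could be:
#   count_tokens = lambda s: len(tokenizer(s, add_special_tokens=False)["input_ids"])
# Here a whitespace word count stands in so the example runs offline.
demo = group_into_windows(["a b c", "d e", "f g h i"], lambda s: len(s.split()), max_tokens=5)
print(demo)  # [['a b c', 'd e'], ['f g h i']]
```

Summing per-chunk counts ignores special tokens and tokenization effects at chunk boundaries, so leaving some headroom below the limit is a reasonable precaution.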
Once the tokenizer is loaded, it can be combined with any LangChain text splitter. The example below follows the authors' approach:
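```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,  # measured with length_function, i.e. in characters here
    chunk_overlap=0,
    length_function=len,
    is_separator_regex=False,
)
# Attach the tokenizer (presumably read by LateChunkQdrant for token-level chunk spans)
text_splitter.tokenizer = tokenizer
```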
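Note that `length_function=len` makes `chunk_size` a character count, not a token count. For late chunking, each chunk must also be located as a token span inside the full document. One way to do this, assuming a Hugging Face fast tokenizer, is to find each chunk's character offsets and intersect them with the tokenizer's `offset_mapping` (returned when the tokenizer is called with `return_offsets_mapping=True`). The helpers below are hypothetical and only sketch the idea; `LateChunkQdrant` may compute spans differently. They assume non-overlapping chunks in document order, matching `chunk_overlap=0`.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open [start, end)


def chunk_char_spans(text: str, chunks: List[str]) -> List[Span]:
    """Locate each chunk in text, searching left to right (chunks assumed in order)."""
    spans: List[Span] = []
    cursor = 0
    for chunk in chunks:
        start = text.find(chunk, cursor)
        if start == -1:
            raise ValueError(f"chunk not found in text: {chunk[:30]!r}")
        end = start + len(chunk)
        spans.append((start, end))
        cursor = end  # valid because chunks do not overlap
    return spans


def char_to_token_spans(char_spans: List[Span], offsets: List[Span]) -> List[Span]:
    """Map character spans to token spans using a tokenizer's offset_mapping.

    A token belongs to a chunk if its character range overlaps the chunk's range.
    Special tokens, whose offsets are (0, 0), are skipped.
    """
    token_spans: List[Span] = []
    for c_start, c_end in char_spans:
        idx = [
            i for i, (t_start, t_end) in enumerate(offsets)
            if t_end > t_start and t_start < c_end and t_end > c_start
        ]
        if not idx:
            raise ValueError(f"no tokens cover characters {c_start}-{c_end}")
        token_spans.append((idx[0], idx[-1] + 1))
    return token_spans


# With real components this could be (assumption: a fast tokenizer is loaded):
#   chunks = text_splitter.split_text(text)
#   offsets = tokenizer(text, return_offsets_mapping=True)["offset_mapping"]
text = "Berlin is big. It is old."
chunks = ["Berlin is big.", "It is old."]
char_spans = chunk_char_spans(text, chunks)
# Hand-written offsets imitating a fast tokenizer; (0, 0) marks special tokens
offsets = [(0, 0), (0, 6), (7, 9), (10, 13), (13, 14),
           (15, 17), (18, 20), (21, 24), (24, 25), (0, 0)]
token_spans = char_to_token_spans(char_spans, offsets)
print(char_spans)   # [(0, 14), (15, 25)]
print(token_spans)  # [(1, 5), (5, 9)]
```

Token spans like these are what a late-chunking pooling step averages over after encoding the whole document once.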
Next, create the vector store. Here we use `LateChunkQdrant`:
```python
from langchain_core.documents import Document
from qdrant_client import QdrantClient

from langchain_jina import LateChunkQdrant

client = QdrantClient()
vectorstore = LateChunkQdrant(
    client,
    collection_name="demo",
    embeddings=text_embeddings,
    text_splitter=text_splitter,
)

# Load documents
with open("./state_of_the_union.txt") as f:
    state_of_the_union = f.read()

documents = [
    Document(
        page_content=state_of_the_union,
        metadata={"source": "state_of_the_union.txt"},
    ),
]

# Split, embed, and index the documents into the "demo" collection
vectorstore = vectorstore.from_documents(
    documents=documents,
    embedding=text_embeddings,
    text_splitter=text_splitter,
    path="test_db",
    collection_name="demo",
)
```
Finally, the vector store can be used like any other LangChain vector store, for example for a similarity search:
```python
query = "What did the president say about ketanji brown jackson?"
results = vectorstore.similarity_search(query, k=3)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
```
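`similarity_search(query, k=3)` typically embeds the query and returns the three stored chunks whose vectors are closest to it; with Qdrant the ranking happens server-side using the collection's configured distance, usually over an HNSW index. As an illustration of the ranking step only, here is a brute-force cosine-similarity top-k in NumPy over made-up vectors; `cosine_top_k` is a hypothetical helper, not part of Qdrant or `langchain-jina`.

```python
import numpy as np


def cosine_top_k(query: np.ndarray, vectors: np.ndarray, k: int = 3) -> list[tuple[int, float]]:
    """Return (row index, cosine similarity) for the k rows of `vectors` closest to `query`."""
    q = query / np.linalg.norm(query)
    m = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = m @ q
    order = np.argsort(-scores)[:k]  # highest similarity first
    return [(int(i), float(scores[i])) for i in order]


stored = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
query_vector = np.array([1.0, 0.1])
print(cosine_top_k(query_vector, stored, k=2))  # rows 0 and 2 rank highest
```

Brute force is linear in the number of stored chunks, which is why a dedicated vector database with an approximate index is used for real collections.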
License
This project is licensed under the MIT License
Download files
File details
Details for the file langchain_jina-0.0.1.dev0.tar.gz.
File metadata
- Download URL: langchain_jina-0.0.1.dev0.tar.gz
- Size: 12.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.2 CPython/3.11.0 Linux/6.5.0-27-generic
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8570a89e33705bf746e504ccfbd218e9e8e083b06c35db10295bc8176ef28841 |
| MD5 | a12425515ea5f5e518977571f423c616 |
| BLAKE2b-256 | 43ed8a406f3818499c130e2a57346df6607008876144e5bde7fbb5fcebb347b6 |
File details
Details for the file langchain_jina-0.0.1.dev0-py3-none-any.whl.
File metadata
- Download URL: langchain_jina-0.0.1.dev0-py3-none-any.whl
- Size: 12.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.2 CPython/3.11.0 Linux/6.5.0-27-generic
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b7dbe6992365c99674497f11869d2e128d932617f51a949e9602b440997de41a |
| MD5 | ed62e34a739a8477d860fbe54b88071f |
| BLAKE2b-256 | 8fed91871e1dd3f91b322d6024524eab51fb2ecd0373f6fe0fced7b49a7c97e1 |
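To check a downloaded distribution against the SHA256 digests published for these two files, the file can be hashed locally with Python's standard `hashlib`. A minimal sketch; `sha256_of` and `verify` are hypothetical helper names, and the digests are copied from the file hash tables for this release.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file hash tables for langchain-jina 0.0.1.dev0
EXPECTED_SHA256 = {
    "langchain_jina-0.0.1.dev0.tar.gz":
        "8570a89e33705bf746e504ccfbd218e9e8e083b06c35db10295bc8176ef28841",
    "langchain_jina-0.0.1.dev0-py3-none-any.whl":
        "b7dbe6992365c99674497f11869d2e128d932617f51a949e9602b440997de41a",
}


def sha256_of(path: Path, block_size: int = 1 << 16) -> str:
    """Hash the file in blocks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Compare a downloaded file's SHA256 with the published digest for its name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

For example, `verify(Path("langchain_jina-0.0.1.dev0-py3-none-any.whl"))` returns `True` only if the wheel matches the published digest.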
