
URL: https://pypi.org/project/langchain-jina/

langchain-jina · PyPI



langchain-jina 0.0.1.dev0

pip install langchain-jina

Latest release

Released:

An integration package connecting Jina Late Chunking and LangChain


Verified details

These details have been verified by PyPI
Maintainers
๐Ÿ‘ Avatar for tien-ngnvan from gravatar.com
tien-ngnvan

Unverified details

These details have not been verified by PyPI
Project links
Meta
  • License: MIT License (MIT)
  • Author: tien.ngnvan@gmail.com
  • Maintainer: tien.ngnvan@gmail.com
  • Requires: Python <4.0, >=3.9

Project description

langchain-jina

This package contains the LangChain integration with Jina Late Chunking.

Installation

pip install -U langchain-jina

Environment Variable

Export your Jina API key:

export JINA_API_KEY="jina_*"
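The usage examples read the key with os.environ.get("JINA_API_KEY"), which silently returns None when the variable is missing. A small helper such as the hypothetical require_env below (it is not part of langchain-jina) fails early with a clear message instead; this is a sketch, not the package's own key handling.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, failing loudly if it is unset or empty.

    Hypothetical helper for illustration; not part of langchain-jina.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; run: export {name}=...")
    return value
```

For example, jina_api_key = require_env("JINA_API_KEY") can then be passed as jina_api_key to LateChunkEmbeddings.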

Usage

1. Get Embeddings

Here is an example of generating embeddings with LateChunkEmbeddings:

import os

from langchain_jina import LateChunkEmbeddings

text_embeddings = LateChunkEmbeddings(
    jina_api_key=os.environ.get("JINA_API_KEY"),
    model_name="jina-embeddings-v3",
)

text = [
    "Berlin is the capital and largest city of Germany, by both area and population.",
    "With 3.66 million inhabitants, it has the highest population within its city limits of any city in the European Union.",
    "The city is also one of the states of Germany, being the third smallest state in the country by area.",
]

# Embed the sentences with late chunking enabled
doc_result = text_embeddings.embed_documents(text, late_chunking=True)
print("With late_chunking")
for doc in doc_result:
    print(doc)
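Conceptually, late chunking runs the embedding model over the whole text first and splits only afterwards: each chunk's vector is the mean of the token embeddings that fall inside that chunk's token span, so every chunk vector is computed with the full text as context. The sketch below illustrates only that pooling step, assuming token-level embeddings and chunk token spans are already available (here they are toy numbers). It is not langchain-jina's implementation, and late_chunk_pool is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def late_chunk_pool(
    token_embeddings: Sequence[Sequence[float]],
    spans: Sequence[Tuple[int, int]],
) -> List[Vector]:
    """Mean-pool token embeddings over each half-open token span [start, end).

    Hypothetical illustration of the late chunking pooling step.
    """
    pooled: List[Vector] = []
    for start, end in spans:
        if not 0 <= start < end <= len(token_embeddings):
            raise ValueError(f"invalid span ({start}, {end})")
        window = token_embeddings[start:end]
        dim = len(window[0])
        # Average each dimension over the tokens belonging to this chunk.
        pooled.append([sum(tok[i] for tok in window) / len(window) for i in range(dim)])
    return pooled


# Toy "model output": 5 tokens with 2-dimensional embeddings, computed over the full text.
tokens = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0], [0.0, 6.0]]
# Two chunks: tokens 0-1 and tokens 2-4.
chunks = late_chunk_pool(tokens, [(0, 2), (2, 5)])
print(chunks)  # [[2.0, 0.0], [0.0, 4.0]]
```

Contrast this with conventional chunking, where each chunk would be sent to the model separately and its tokens would never see the surrounding text.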

2. Build a Vector Store

First, the entire input text must fit within the model's context length. To check this, we use a tokenizer from transformers:

from transformers import AutoTokenizer

# Tokenizer matching the embedding model, used to measure text length in tokens
tokenizer = AutoTokenizer.from_pretrained("jinaai/jina-embeddings-v3")
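One way to do that check is to count the tokens the full text encodes to and compare that count with the model's maximum input length. The helper below is a hypothetical sketch: it accepts any object with an encode(text) method that returns token ids (Hugging Face tokenizers such as the one loaded above provide one, and by default they include special tokens in the count). The 8192-token default is an assumption about jina-embeddings-v3's limit; confirm it against the model card. A whitespace stand-in tokenizer is used so the example runs offline.

```python
from typing import List, Protocol


class TokenEncoder(Protocol):
    def encode(self, text: str) -> List[int]: ...


def fits_in_context(text: str, tokenizer: TokenEncoder, max_tokens: int = 8192) -> bool:
    """Return True if `text` encodes to at most `max_tokens` token ids.

    ASSUMPTION: 8192 is used as the default context length for jina-embeddings-v3.
    """
    return len(tokenizer.encode(text)) <= max_tokens


class WhitespaceTokenizer:
    """Stand-in tokenizer for this demo: one token per whitespace-separated word."""

    def encode(self, text: str) -> List[int]:
        return list(range(len(text.split())))


# Six "tokens" against a limit of five, so this prints False.
print(fits_in_context("Berlin is the capital of Germany.", WhitespaceTokenizer(), max_tokens=5))
```

With the real tokenizer, the call would be fits_in_context(state_of_the_union, tokenizer).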

Once the tokenizer is loaded, it can be combined with any LangChain text splitter. The example below follows the same approach as the authors:

from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,  # measured by length_function, i.e. in characters here
    chunk_overlap=0,
    length_function=len,
    is_separator_regex=False,
)

# Attach the tokenizer loaded above to the splitter
text_splitter.tokenizer = tokenizer
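With length_function=len, chunk_size=200 is measured in characters, not tokens. The hypothetical pack_sentences below is a simplified stand-in for a text splitter (greedy sentence packing, not LangChain's recursive algorithm) that shows how swapping the length function changes where chunks break; word_count is an assumed stand-in for a real token counter. To measure chunk size in tokens with the real splitter, langchain_text_splitters provides RecursiveCharacterTextSplitter.from_huggingface_tokenizer(tokenizer, chunk_size=..., chunk_overlap=...).

```python
from typing import Callable, List


def pack_sentences(
    sentences: List[str], chunk_size: int, length_function: Callable[[str], int]
) -> List[str]:
    """Greedily join sentences into chunks whose measured length stays <= chunk_size.

    A single sentence longer than chunk_size becomes its own (oversized) chunk.
    """
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and length_function(candidate) > chunk_size:
            # Adding this sentence would exceed the budget: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def word_count(text: str) -> int:
    """Stand-in for a token counter: one 'token' per whitespace-separated word."""
    return len(text.split())


sentences = ["Berlin is the capital of Germany.", "It is also a state.", "It is small by area."]
print(pack_sentences(sentences, chunk_size=40, length_function=len))         # two chunks
print(pack_sentences(sentences, chunk_size=40, length_function=word_count))  # one chunk
```

The same chunk_size admits far more text when it is counted in tokens than in characters, which matters when the goal is to stay within the model's context length.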

Next, create the vector store. Here we use LateChunkQdrant:

from qdrant_client import QdrantClient
from langchain_core.documents import Document
from langchain_jina import LateChunkQdrant

# With no arguments, QdrantClient connects to a Qdrant server on localhost:6333
client = QdrantClient()

vectorstore = LateChunkQdrant(
    client,
    collection_name="demo",
    embeddings=text_embeddings,
    text_splitter=text_splitter,
)

# Load the document to index
with open("./state_of_the_union.txt", encoding="utf-8") as f:
    state_of_the_union = f.read()

documents = [
    Document(
        page_content=state_of_the_union,
        metadata={"source": "state_of_the_union.txt"},
    ),
]

# Index the documents into a vector store (path="test_db" is the storage location)
vectorstore = vectorstore.from_documents(
    documents=documents,
    embedding=text_embeddings,
    text_splitter=text_splitter,
    path="test_db",
    collection_name="demo",
)

Finally, the vector store can be queried like any other LangChain vector store, for example with a similarity search:

query = "What did the president say about ketanji brown jackson?"
results = vectorstore.similarity_search(query, k=3)

for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
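similarity_search(query, k=3) embeds the query and returns the k stored chunks whose vectors are closest to it. As a rough mental model, the brute-force sketch below ranks toy vectors by cosine similarity. Qdrant itself uses approximate (HNSW) indexing with a distance metric configured per collection, so treat top_k_cosine as a hypothetical illustration rather than what LateChunkQdrant executes; the store contents are made-up example vectors.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k_cosine(
    query: Sequence[float], store: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k (chunk_id, score) pairs most similar to the query, best first.

    Exhaustive scan for illustration; real vector databases use approximate indexes.
    """
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical chunk vectors standing in for stored late-chunking embeddings.
store = {
    "chunk-a": [1.0, 0.0],
    "chunk-b": [0.6, 0.8],
    "chunk-c": [0.0, 1.0],
    "chunk-d": [-1.0, 0.0],
}
for chunk_id, score in top_k_cosine([1.0, 0.0], store, k=3):
    print(f"* {chunk_id} [{score:.2f}]")
```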

License

This project is licensed under the MIT License.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

langchain_jina-0.0.1.dev0.tar.gz (12.0 kB)

Uploaded Source

Built Distribution


langchain_jina-0.0.1.dev0-py3-none-any.whl (12.2 kB)

Uploaded Python 3

File details

Details for the file langchain_jina-0.0.1.dev0.tar.gz.

File metadata

  • Download URL: langchain_jina-0.0.1.dev0.tar.gz
  • Upload date:
  • Size: 12.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.2 CPython/3.11.0 Linux/6.5.0-27-generic

File hashes

Hashes for langchain_jina-0.0.1.dev0.tar.gz
Algorithm Hash digest
SHA256 8570a89e33705bf746e504ccfbd218e9e8e083b06c35db10295bc8176ef28841
MD5 a12425515ea5f5e518977571f423c616
BLAKE2b-256 43ed8a406f3818499c130e2a57346df6607008876144e5bde7fbb5fcebb347b6


File details

Details for the file langchain_jina-0.0.1.dev0-py3-none-any.whl.

File metadata

  • Download URL: langchain_jina-0.0.1.dev0-py3-none-any.whl
  • Upload date:
  • Size: 12.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.2 CPython/3.11.0 Linux/6.5.0-27-generic

File hashes

Hashes for langchain_jina-0.0.1.dev0-py3-none-any.whl
Algorithm Hash digest
SHA256 b7dbe6992365c99674497f11869d2e128d932617f51a949e9602b440997de41a
MD5 ed62e34a739a8477d860fbe54b88071f
BLAKE2b-256 8fed91871e1dd3f91b322d6024524eab51fb2ecd0373f6fe0fced7b49a7c97e1

