URL: https://apify.com/attainable_iota/website-to-llm-knowledge-pack

Website To LLM Knowledge Pack

Under maintenance

Pricing

from $0.50 / 1,000 results

Crawl any website and turn it into an LLM-ready knowledge pack. This Actor extracts clean main text + metadata, follows links with depth/URL filters, and outputs per-page dataset items plus knowledge.jsonl, knowledge.md, and manifest.json for RAG/embeddings pipelines.

Rating

0.0

(0)

Developer

M Junaid Shaukat

Maintained by Community

Actor stats

0

Bookmarked

7

Total users

0

Monthly active users

6 months ago

Last modified

Website to LLM Knowledge Pack

This Actor crawls a website and exports LLM/RAG-ready outputs:

  • Dataset items (one per page)
  • knowledge.jsonl (RAG-ready JSONL)
  • knowledge.md (Markdown bundle)
  • manifest.json (crawl stats + internal link graph)
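As a rough illustration of how knowledge.jsonl might feed a RAG/embeddings pipeline, the sketch below parses JSONL records and splits each page's text into small chunks. The field names `url`, `title`, and `text`, the `SAMPLE` data, and the helpers `iter_records` and `chunk_text` are all assumptions made for this example; the actual schema of this Actor's output is not documented here and may differ.

```python
import json
from typing import Iterator


def iter_records(jsonl_text: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSONL document."""
    for line in jsonl_text.splitlines():
        line = line.strip()
        if line:
            yield json.loads(line)


def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Split text on whitespace into chunks of up to max_chars characters.

    A single word longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Hypothetical sample standing in for the contents of knowledge.jsonl.
# The keys "url", "title", and "text" are assumed, not confirmed by the Actor.
SAMPLE = "\n".join([
    json.dumps({"url": "https://example.com/", "title": "Home",
                "text": "Welcome to the example site."}),
    json.dumps({"url": "https://example.com/docs", "title": "Docs",
                "text": "Install the tool, then run it."}),
])

records = list(iter_records(SAMPLE))

# One entry per chunk, keeping the source URL for later citation.
chunks = [
    {"url": record["url"], "chunk": piece}
    for record in records
    for piece in chunk_text(record["text"], max_chars=20)
]
```

In practice the JSONL text would come from the downloaded knowledge.jsonl file rather than an inline sample, and each chunk would be passed to whichever embedding model the pipeline uses.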

We decided to split Apify SDK into two libraries, Crawlee and Apify SDK v3. Crawlee will retain all the crawling and scraping-related tools and will always strive to be the best web scraping library for its community. At the same time, Apify SDK will continue to exist, but keep only the Apify-specific features related to building Actors on the Apify platform. Read the upgrading guide to learn about the changes.

Resources

If you're looking for examples or want to learn more, visit:

Getting started

For complete information, see this article. To run the Actor, use the following command:

$ apify run

Deploy to Apify

Connect Git repository to Apify

If you've created a Git repository for the project, you can connect it to Apify:

  1. Go to the Actor creation page
  2. Click the Link Git Repository button

Push the project from your local machine to Apify

You can also deploy the project from your local machine to Apify without a Git repository.

  1. Log in to Apify. You will need to provide your Apify API Token to complete this action.

    $ apify login
  2. Deploy your Actor. This command will deploy and build the Actor on the Apify Platform. You can find your newly created Actor under Actors -> My Actors.

    $ apify push

Documentation reference

To learn more about Apify and Actors, take a look at the following resources:

You might also like

Site to LLM Knowledge Base

adambounhar/site-to-knowledge-base

Turn any website or docs into clean, LLM-ready Markdown for RAG and AI agents: one record per page, each with a token count. Sitemap- and robots.txt-aware, with predictable per-page pricing (no token credits). Simple knowledge-base ingestion.

Mohamed Adam BOUNHAR

2

Website Content Crawler

crawlerbros/website-content-crawler

Crawls websites and extracts clean text, markdown, or HTML content. Ideal for LLM training data, RAG pipelines, and knowledge base building.

46

RAG-Ready Web Scraper & Smart Chunker for AI Knowledge Bases

adinfosys-labs/rag-ready-web-scraper-smart-chunker-for-ai-knowledge-bases

RAG-ready web scraper that collects, cleans, deduplicates, filters, and chunks web content into structured datasets for AI pipelines. Generates high-quality knowledge-base data optimized for LLMs, embeddings, and vector databases.

Artashes Arakelyan

7

GPT Crawler MCP - Knowledge files for ChatGPT, Claude, RAG

kazkn/gpt-crawler-mcp

Crawl any website and turn it into a clean knowledge file for your custom GPT, Claude Project, or RAG pipeline. Native MCP server in Standby mode + classic batch mode.

Knowledge Intelligence Engine - Website to Markdown for RAG

ryanclinton/website-content-to-markdown

Turn any website, documentation site or help centre into a retrieval-ready knowledge corpus for RAG and AI search. Clean Markdown plus chunks, change detection, deduplication, retrieval scoring, version awareness and a full corpus audit, in one run.

15

Google Knowledge Graph

seemuapps/google-knowledge-graph

Enrich a list of entity names (people, companies, places, things) with metadata from the Google Knowledge Graph.

Front Knowledge Base

canadesk/front-knowledge-base

Get Categories and Articles from any public Front Knowledge Base. It's fast and costs little.

Canadesk Support

4

Website Content Crawler

parseforge/website-content-crawler

Crawl any website and pull clean Markdown content ready for AI! Follow links across a whole domain and extract page text, titles, headings, images, and metadata. Perfect for building RAG pipelines, training datasets, knowledge bases, and vector databases. Start crawling content in minutes!