webarchiving
Here are 54 public repositories matching this topic...
Wayback Machine API interface & a command-line tool
- Python
WARC + AI - Experimental Retrieval Augmented Generation Pipeline for Web Archive Collections.
- Python
Squidwarc is a high-fidelity, user-scriptable archival crawler that uses Chrome or Chromium with or without a head
- JavaScript
A list of things related to software, literature, and other content for Memento
Parse and create Web ARChive (WARC) files with Node.js
- JavaScript
Various Jupyter notebooks about Common Crawl data
- Jupyter Notebook
A dockerized, queued, high-fidelity web archiver based on Squidwarc
- Python
Quick Cache and Archive search buttons
- JavaScript
metawarc: a command-line tool for extracting metadata from files stored in WARC (Web ARChive) files
- Python
Awesome list dedicated to digital and data preservation tools, sources, services and so on.
A social media open post web archiving tool
- JavaScript
Digital Preservation of HTTP in documentary heritage.
- Go
Decentralized web archiving
- Python
A tool for detecting viruses and NSFW material in WARC files
- Python
File-Based Reference Filing System.
- Go
A JavaScript tool for fighting link rot and content drift using link decoration and web archives.
- HTML
Seeder - Czech webarchive curating tool and public site
- Python
Parser for WARC (aka WebArchive) files
- C#
