URL: https://huggingface.co/datasets/sayurio/english-manhwa-scrape

sayurio/english-manhwa-scrape · Datasets at Hugging Face


English Manhwa Scrape Dataset

Note: The files total more than 150 GB and my internet connection is slow, so I'll be uploading them in batches.

Original file structure (before packing):

files/
- Manhwa 1 Name
 - Chapter 1.cbz
 - Chapter 2.cbz
 - Chapter 3.cbz
 - ......
 - Chapter n.cbz
- Manhwa 2 Name
 - Chapter 1.cbz
 - Chapter 2.cbz
 - Chapter 3.cbz
 - ......
 - Chapter n.cbz

Because Hugging Face limits a repo to 10,000 files, I had to zip all the .cbz chapters of each manhwa into a single .zip (a .cbz is itself just a zip file), so the repo files look like this:

files/
- Manhwa 1 Name.zip
- Manhwa 2 Name.zip
- Manhwa 3 Name.zip
- ...
- Manhwa n Name.zip

Dataset Summary

This dataset is an archive of scraped, English-translated manhwa, manga, and webtoons. It contains hundreds of series, some scraped completely and some only partially, organized for easy downloading and local reading.

Disclaimer: This dataset contains highly explicit/NSFW (Not Safe For Work) adult content. User discretion is strictly advised.

Data Structure

To optimize download speeds and bypass file-count limits, the dataset is organized into .zip archives per series.

  • All data is located inside the files/ directory.
  • Each series is packed into a single, uncompressed archive (e.g., files/15 Beauties.zip).
  • Inside each .zip file, the individual chapters are stored in standard .cbz (Comic Book Zip) format.

This structure allows users to download only the specific series they want without having to pull the entire dataset at once.
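The nested layout described above (a series .zip containing chapter .cbz files) can be inspected without extracting anything to disk. The following is a minimal sketch using only Python's standard zipfile module. The function names list_chapters and list_pages are hypothetical, and the sketch assumes nothing about whether chapters sit at the archive root or inside a subfolder.

```python
import io
import zipfile


def list_chapters(series_zip_path):
    """Return the sorted names of the .cbz chapters inside one series archive."""
    with zipfile.ZipFile(series_zip_path) as series:
        return sorted(
            name for name in series.namelist() if name.lower().endswith(".cbz")
        )


def list_pages(series_zip_path, chapter_name):
    """Return the sorted member names (typically page images) of one chapter."""
    with zipfile.ZipFile(series_zip_path) as series:
        # A .cbz is itself a zip file, so read it into memory and open it again.
        chapter_bytes = series.read(chapter_name)
    with zipfile.ZipFile(io.BytesIO(chapter_bytes)) as chapter:
        return sorted(chapter.namelist())
```

Note that sorted() orders names lexicographically, so "Chapter 10.cbz" comes before "Chapter 2.cbz"; a natural-sort key would be needed for strict reading order.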

Copyright and Fair Use

I do not own the copyrights to any of the manhwa, manga, webtoons, or translations included in this repository. All materials are uploaded under the principles of fair use, and this dataset is strictly intended for educational and research purposes only (such as machine learning, data analysis, and academic research).

How to Download

You can download individual series or the entire dataset using the official Hugging Face CLI or the huggingface_hub Python library. Point your tool at the specific .zip archive inside the files/ directory that you want, or download the whole files/ directory to grab everything at once.
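As a rough sketch of the single-series case with the huggingface_hub library: the helper names series_filename and download_series are hypothetical, and the files/<series name>.zip path pattern is assumed from the example given under Data Structure. hf_hub_download with repo_type="dataset" is the library's standard call for fetching one file from a dataset repo.

```python
def series_filename(series_name):
    """Build the repo-relative path of one series archive (files/<name>.zip)."""
    return f"files/{series_name}.zip"


def download_series(series_name, repo_id="sayurio/english-manhwa-scrape"):
    """Download one series archive and return its local cache path."""
    # Imported here so the path helper above works without the library installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=repo_id,
        filename=series_filename(series_name),
        repo_type="dataset",  # the repo is a dataset, not a model
    )
```

For example, download_series("15 Beauties") would request files/15 Beauties.zip and return the path of the cached copy.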

Included Series

