
URL: https://huggingface.co/datasets/axiong/pmc_oa


PMC-OA Dataset

News: We have released the PMC-OA dataset. You can select the specific subset you need.

P.S. The Hugging Face dataset viewer has problems when the dataset gets large, so we sampled a subset that can be viewed directly on the web. Click PMC-OA-Demo to view it.

Chinese documentation

Model Zoo

Check it out if you want to load a model pretrained on PMC-OA directly.

We plan to release more models pretrained on PMC-OA. Feel free to reach out to us if the model you want is not yet included in the model zoo. We also thank the community for its help.

Model | Link | Provider
ViT-L-14 | https://huggingface.co/ryanyip7777/pmc_vit_l_14 | @ryanyip7777

Dataset Structure

PMC-OA (separated images, separated captions).

  • images.zip: images folder
  • pmc_oa.jsonl: dataset file of pmc-oa
  • pmc_oa_beta.jsonl: dataset file of pmc-oa-beta

The difference between PMC-OA and PMC-OA-Beta lies in how captions are processed. In PMC-OA, we use ChatGPT to help split compound captions into separate ones, while PMC-OA-Beta keeps all compound captions undivided.

Sample

A row in pmc_oa.jsonl is shown below:

{
 "image": "PMC212319_Fig3_4.jpg",
 "caption": "A. Real time image of the translocation of ARF1-GFP to the plasma membrane ..."
}

Explanation of each key

  • image: path to the image
  • caption: the caption corresponding to the image
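
The row format above can be read with a few lines of standard-library Python. The following is a minimal sketch, not an official loader: the helper name `iter_pmc_oa` is hypothetical, and it assumes that images.zip has been extracted into a local folder and that each `image` value is a file name relative to that folder.

```python
import json
from pathlib import Path
from typing import Iterator, Tuple


def iter_pmc_oa(jsonl_path: str, images_dir: str) -> Iterator[Tuple[Path, str]]:
    """Yield (image_path, caption) pairs from a PMC-OA style JSONL file.

    Hypothetical helper: assumes one JSON object per line with the
    "image" and "caption" keys described above, and that "image" is
    relative to images_dir (the folder extracted from images.zip).
    """
    root = Path(images_dir)
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            row = json.loads(line)
            yield root / row["image"], row["caption"]
```

For example, `next(iter_pmc_oa("pmc_oa.jsonl", "images"))` would return the first image path and its caption, provided both files are present locally. The same function should work for pmc_oa_beta.jsonl if it uses the same two keys, which is assumed here rather than confirmed.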