ConvNeXt V2 (base-sized model)

ConvNeXt V2 model pretrained using the FCMAE framework and fine-tuned on the ImageNet-1K dataset at resolution 224x224. It was introduced in the paper ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders by Woo et al. and first released in this repository.

Disclaimer: The team releasing ConvNeXt V2 did not write a model card for this model, so this model card has been written by the Hugging Face team.

Model description

ConvNeXt V2 is a pure convolutional model (ConvNet) that introduces a fully convolutional masked autoencoder framework (FCMAE) and a new Global Response Normalization (GRN) layer to ConvNeXt. ConvNeXt V2 significantly improves the performance of pure ConvNets on various recognition benchmarks.
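
To make the GRN layer more concrete, here is a minimal PyTorch sketch of Global Response Normalization as it is commonly described for ConvNeXt V2: a per-channel L2 norm over the spatial dimensions, divided by its mean across channels, then used to rescale the input with a residual connection. The class name `GlobalResponseNorm`, the channels-last `(N, H, W, C)` layout, and the `eps` value are assumptions made for illustration; this is not the code shipped with this checkpoint.

```python
import torch
from torch import nn


class GlobalResponseNorm(nn.Module):
    """Illustrative GRN layer; expects channels-last input (N, H, W, C)."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        # Zero initialization makes the layer an identity mapping at the start.
        self.gamma = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.beta = nn.Parameter(torch.zeros(1, 1, 1, dim))
        self.eps = eps  # assumed value, guards against division by zero

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Global aggregation: L2 norm over the spatial dims, one value per channel.
        gx = torch.linalg.vector_norm(x, ord=2, dim=(1, 2), keepdim=True)
        # Divisive normalization: each channel's norm relative to the channel mean.
        nx = gx / (gx.mean(dim=-1, keepdim=True) + self.eps)
        # Calibrate the input and add the residual connection.
        return self.gamma * (x * nx) + self.beta + x
```

Because `gamma` and `beta` start at zero in this sketch, a freshly constructed layer returns its input unchanged, and the normalization only takes effect as those parameters are learned.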

Intended uses & limitations

You can use the raw model for image classification. See the model hub to look for fine-tuned versions on a task that interests you.

How to use

Here is how to use this model to classify an image from the COCO 2017 dataset into one of the 1,000 ImageNet classes:

from transformers import AutoImageProcessor, ConvNextV2ForImageClassification
import torch
from datasets import load_dataset

# Load a sample image
dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]

# Load the image processor and the fine-tuned classification model
preprocessor = AutoImageProcessor.from_pretrained("facebook/convnextv2-base-1k-224")
model = ConvNextV2ForImageClassification.from_pretrained("facebook/convnextv2-base-1k-224")

# Resize and normalize the image, returning PyTorch tensors
inputs = preprocessor(image, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# model predicts one of the 1000 ImageNet classes
predicted_label = logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
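
If you want more than the single best class, the logits can be turned into probabilities and ranked. The helper below is a small sketch; the name `top_k_predictions` is hypothetical and not part of the transformers API. It assumes `logits` has shape `(1, num_classes)` and that `id2label` maps integer class ids to label strings, as `model.config.id2label` does in the example above.

```python
import torch


def top_k_predictions(logits: torch.Tensor, id2label: dict, k: int = 5):
    """Return the k most likely (label, probability) pairs for one image."""
    # Softmax over the class dimension converts logits to probabilities.
    probs = torch.softmax(logits, dim=-1)
    # Never request more entries than there are classes.
    k = min(k, probs.shape[-1])
    # Rank the classes of the first (and only) image in the batch.
    top_probs, top_ids = probs[0].topk(k)
    return [
        (id2label[class_id], prob)
        for class_id, prob in zip(top_ids.tolist(), top_probs.tolist())
    ]
```

With the variables from the example above, `top_k_predictions(logits, model.config.id2label)` would return the five highest-scoring ImageNet labels with their probabilities.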

For more code examples, see the documentation.

BibTeX entry and citation info

@article{DBLP:journals/corr/abs-2301-00808,
 author = {Sanghyun Woo and
 Shoubhik Debnath and
 Ronghang Hu and
 Xinlei Chen and
 Zhuang Liu and
 In So Kweon and
 Saining Xie},
 title = {ConvNeXt {V2:} Co-designing and Scaling ConvNets with Masked Autoencoders},
 journal = {CoRR},
 volume = {abs/2301.00808},
 year = {2023},
 url = {https://doi.org/10.48550/arXiv.2301.00808},
 doi = {10.48550/arXiv.2301.00808},
 eprinttype = {arXiv},
 eprint = {2301.00808},
 timestamp = {Tue, 10 Jan 2023 15:10:12 +0100},
 biburl = {https://dblp.org/rec/journals/corr/abs-2301-00808.bib},
 bibsource = {dblp computer science bibliography, https://dblp.org}
}

Model size: 88.7M params (Safetensors, tensor type F32)

URL: https://huggingface.co/facebook/convnextv2-base-1k-224