
IDP-ESM2-150M

IDP-ESM2-150M is an ESM2-style encoder for representation learning on intrinsically disordered protein (IDP) sequences, trained on IDP-Euka-90.
This repository provides a Transformer encoder suitable for extracting per-sequence embeddings (mean-pooled over residues, with padding masked out).


Quick start: generate embeddings

The snippet below loads the tokenizer and model, runs a forward pass on two example sequences, and extracts the final-layer hidden states, from which a per-sequence embedding is obtained by mean pooling over residues.

from transformers import AutoTokenizer, AutoModel
import torch

# --- Config ---
model_name = "InstaDeepAI/IDP-ESM2-150M"

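# --- Load model and tokenizer ---
# The tokenizer is taken from the base ESM2 checkpoint (ESM2 checkpoints share one amino-acid vocabulary).
tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t6_8M_UR50D")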
model = AutoModel.from_pretrained(model_name)
model.eval()

# (optional) use GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# --- Input sequences ---
sequences = [
    "MDDNHYPHHHHNHHNHHSTSGGCGESQFTTKLSVNTFARTHPMIQNDLIDLDLISGSAFTMKSKSQQ",
    "PADRDLSSPFGSTVPGVGPNAAAASNAAAAAAAAATAGSNKHQTPPTTFR",
]

# --- Tokenize ---
inputs = tokenizer(
    sequences,
    return_tensors="pt",
    padding=True,
    truncation=True,
)
inputs = {k: v.to(device) for k, v in inputs.items()}

# --- Forward pass ---
with torch.no_grad():
    outputs = model(**inputs)
    # Per-residue hidden states from the last layer; shape: (batch, seq_len, hidden_dim)
    embeddings = outputs.last_hidden_state
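
The description above mentions per-sequence embeddings that are mean-pooled over residues with padding masked out. A minimal sketch of that pooling step follows. The helper name mean_pool and the toy tensors are illustrative assumptions, not part of the repository. This version masks padding only, so any special tokens added by the tokenizer are still included in the average; whether the authors exclude them is not stated here.

```python
import torch


def mean_pool(hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors over the sequence axis, ignoring padded positions.

    Hypothetical helper for illustration; not part of the model repository.
    """
    # (batch, seq_len) -> (batch, seq_len, 1), cast so it can multiply the hidden states
    mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)
    # Zero out padded positions, then sum over the sequence axis -> (batch, hidden_dim)
    summed = (hidden_states * mask).sum(dim=1)
    # Number of non-padded tokens per sequence; clamp avoids division by zero
    counts = mask.sum(dim=1).clamp(min=1.0)
    return summed / counts


# Toy stand-ins for outputs.last_hidden_state and inputs["attention_mask"]:
# the first sequence has one padded position, the second has none.
toy_hidden = torch.tensor(
    [
        [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],
        [[2.0, 2.0], [4.0, 4.0], [6.0, 6.0]],
    ]
)
toy_mask = torch.tensor([[1, 1, 0], [1, 1, 1]])

pooled = mean_pool(toy_hidden, toy_mask)  # shape: (2, 2)
```

With the quick-start variables, the same helper would be called as mean_pool(outputs.last_hidden_state, inputs["attention_mask"]), giving one vector of size hidden_dim per input sequence.
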

Model size: 0.1B params (Safetensors weights, tensor type F32)