
URL: https://huggingface.co/ChatterjeeLab/TD3B

ChatterjeeLab/TD3B · Hugging Face



TD3B: Transition-Directed Discrete Diffusion for Allosteric Binder Generation

๐Ÿ‘ Screenshot 2026-05-10 at 6.52.29โ€ฏPM

TD3B is a sequence-based generative framework that designs peptide binders with a specified agonist or antagonist behavior. It combines three components: a Direction Oracle, a soft binding-affinity gate, and amortized fine-tuning of a pre-trained masked discrete diffusion language model (MDLM).

Installation

conda env create -f env.yml
conda activate td3b
pip install -e .

Demo

An interactive inference demo is provided in notebooks/TD3B_Inference_Demo.ipynb (configured for a Colab T4 GPU).

๐Ÿ‘ Open In Colab

To run it: download the notebook from this repository, open Google Colab, upload it via File → Upload notebook, and select a GPU runtime (Runtime → Change runtime type → T4 GPU).

Data and Checkpoints

The checkpoints ship with this repository. The expected layout is:

TD3B/
├── checkpoints/
│   ├── pretrained.ckpt # Pre-trained MDLM weights
│   ├── td3b.ckpt # Fine-tuned TD3B model
│   └── direction_oracle.pt # Direction Oracle weights
├── scoring/functions/classifiers/
│   ├── binding-affinity.pt
│   ├── hemolysis-xgboost.json
│   ├── nonfouling-xgboost.json
│   ├── permeability-xgboost.json
│   └── solubility-xgboost.json
└── tokenizer/
    ├── new_vocab.txt
    └── new_splits.txt
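A quick way to confirm that a local copy matches this layout is a file-existence check such as the sketch below. `missing_files` and `EXPECTED_FILES` are hypothetical names, not part of the repository; the relative paths are copied from the tree above, and the script assumes it is run from the repository root.

```python
from pathlib import Path

# Relative paths copied from the expected layout (hypothetical helper, not in the repo).
EXPECTED_FILES = [
    "checkpoints/pretrained.ckpt",
    "checkpoints/td3b.ckpt",
    "checkpoints/direction_oracle.pt",
    "scoring/functions/classifiers/binding-affinity.pt",
    "scoring/functions/classifiers/hemolysis-xgboost.json",
    "scoring/functions/classifiers/nonfouling-xgboost.json",
    "scoring/functions/classifiers/permeability-xgboost.json",
    "scoring/functions/classifiers/solubility-xgboost.json",
    "tokenizer/new_vocab.txt",
    "tokenizer/new_splits.txt",
]


def missing_files(repo_root: str) -> list[str]:
    """Return the expected relative paths that are not regular files under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumes the current working directory is the TD3B repository root.
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected checkpoint, classifier, and tokenizer files are present.")
```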

Code Structure

TD3B/
├── inference.py # Generate binders (main inference entry point)
├── finetune_multi_target.py # Multi-target TD3B training
├── launch_multi_target.sh # Training launcher script
├── models/
│   ├── diffusion.py # MDLM backbone (TR2-D2)
│   ├── roformer.py # RoFormer wrapper
│   └── noise_schedule.py # Noise schedules
├── training/
│   ├── finetune_utils.py # Training utilities
│   └── distributed_utils.py # Distributed training helpers
├── mcts/
│   └── peptide_mcts.py # Monte Carlo tree search (MCTS)
├── td3b/
│   ├── direction_oracle.py # Direction Oracle (f_φ)
│   ├── td3b_scoring.py # Gated reward R = g_ψ · σ(d*·(f_φ − 0.5)/τ)
│   ├── td3b_losses.py # L_WDCE + λ·L_ctr + β·L_KL
│   ├── td3b_mcts.py # TD3B-extended MCTS
│   ├── td3b_finetune.py # Training loop
│   └── data_utils.py # Data loading utilities
├── scoring/ # Affinity predictor (g_ψ) and property classifiers
├── baselines/ # CG, SMC, TDS, PepTune, Unguided baselines
├── tokenizer/ # SMILES tokenizer (vocab + splits)
├── configs/ # Model and training configs
└── utils/ # Misc utilities
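The gated reward listed for td3b_scoring.py, R = g_ψ · σ(d*·(f_φ − 0.5)/τ), can be illustrated in plain Python as below. This is a sketch, not the repository's implementation. It assumes that g_ψ has already been reduced to a scalar gate value, that f_φ (the Direction Oracle output) lies in [0, 1], and that d* is +1 for agonist and −1 for antagonist targets. The function names are hypothetical.

```python
import math


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_reward(affinity_gate: float, oracle_score: float,
                 direction: int, tau: float = 0.1) -> float:
    """Sketch of R = g_psi * sigmoid(d* * (f_phi - 0.5) / tau).

    affinity_gate: scalar gate value g_psi (assumed precomputed from the affinity predictor).
    oracle_score:  Direction Oracle output f_phi, assumed to lie in [0, 1].
    direction:     d*, assumed +1 for agonist and -1 for antagonist.
    tau:           sigmoid temperature (tau, SIGMOID_TEMPERATURE in the launch script).
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    if tau <= 0:
        raise ValueError("tau must be positive")
    return affinity_gate * _sigmoid(direction * (oracle_score - 0.5) / tau)
```

With τ = 0.1, an oracle score of 0.9 gives σ(4) ≈ 0.982 for d* = +1 and σ(−4) ≈ 0.018 for d* = −1. The sign of d* therefore decides which direction is rewarded, while the affinity gate g_ψ scales the result so that weak binders score low regardless of direction.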

Inference

Generate agonist/antagonist binders for target proteins:

python inference.py \
 --ckpt_path checkpoints/td3b.ckpt \
 --val_csv data/test.csv \
 --save_path results/ \
 --seed 42 \
 --num_pool 32 \
 --val_samples_per_target 8 \
 --resample_alpha 0.1

This generates 32 candidates per (target, direction) pair, scores them with the Direction Oracle and the affinity predictor, applies the weighted resampling of Algorithm 2, and saves only valid peptide samples.
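Algorithm 2 is specified in the paper. As a rough illustration only, reward-weighted resampling of a candidate pool can look like the sketch below. The softmax weighting exp(R/α), and treating `--resample_alpha` as that temperature α, are assumptions, as is the helper name `resample`; inference.py is the reference for the actual procedure.

```python
import math
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def resample(candidates: Sequence[T], rewards: Sequence[float], k: int,
             alpha: float = 0.1, seed: int = 42) -> list[T]:
    """Draw k candidates with replacement, each weighted by exp(reward / alpha).

    Hypothetical sketch: the exp(R / alpha) weighting is an assumption and may
    differ from the weighting used by Algorithm 2 in the paper.
    """
    if len(candidates) != len(rewards) or len(candidates) == 0:
        raise ValueError("candidates and rewards must be non-empty and the same length")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    top = max(rewards)
    # Subtract the maximum reward before exponentiating for numerical stability.
    weights = [math.exp((r - top) / alpha) for r in rewards]
    rng = random.Random(seed)
    return rng.choices(list(candidates), weights=weights, k=k)
```

For example, `resample(pool, rewards, k=8)` would draw 8 sequences from a 32-sequence pool, favoring high gated reward; a smaller α concentrates the draw on the top-scoring candidates.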

Output: results/td3b_results_seed42.csv with columns: target, sequence, direction, affinity, gated_reward, direction_oracle, direction_accuracy.
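Assuming the CSV uses the columns listed above and that `direction_accuracy` holds a numeric value per row (for example 0 or 1), per-target averages can be computed with the standard library. `summarize` is a hypothetical helper, not part of the repository.

```python
import csv
from collections import defaultdict


def summarize(csv_path: str) -> dict:
    """Average gated_reward and direction_accuracy per (target, direction) pair."""
    # Running totals: [sum of gated_reward, sum of direction_accuracy, row count].
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            t = totals[(row["target"], row["direction"])]
            t[0] += float(row["gated_reward"])
            # Assumption: direction_accuracy is stored as a number such as 0 or 1.
            t[1] += float(row["direction_accuracy"])
            t[2] += 1
    return {
        key: {"mean_gated_reward": r / n, "mean_direction_accuracy": a / n, "n": n}
        for key, (r, a, n) in totals.items()
    }
```

Calling `summarize("results/td3b_results_seed42.csv")` would then return one entry per (target, direction) pair with the sample count and both means.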

Training

Multi-target TD3B

  1. Edit launch_multi_target.sh to set the paths to the checkpoints, training data, and Direction Oracle:
BASE_PATH="/path/to/TD3B"
PRETRAINED_CHECKPOINT="${BASE_PATH}/checkpoints/pretrained.ckpt"
TRAIN_CSV="${BASE_PATH}/data/train.csv"
ORACLE_CKPT="${BASE_PATH}/checkpoints/direction_oracle.pt"
  2. Launch training:
bash launch_multi_target.sh

Key hyperparameters (in launch_multi_target.sh):

  • CONTRASTIVE_WEIGHT=0.1: λ for L_ctr
  • KL_BETA=0.1: β for L_KL
  • SIGMOID_TEMPERATURE=0.1: τ for the gated reward
  • NUM_ITER=20: MCTS iterations per round
  • NUM_CHILDREN=16: children per MCTS expansion
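CONTRASTIVE_WEIGHT and KL_BETA weight the training objective listed for td3b_losses.py, L = L_WDCE + λ·L_ctr + β·L_KL. A minimal sketch of that weighted sum follows, assuming the three component losses have already been computed as floats or scalar tensors; the function name is hypothetical, and the component losses themselves are defined in the repository and the paper.

```python
def td3b_objective(l_wdce, l_ctr, l_kl, contrastive_weight=0.1, kl_beta=0.1):
    """Weighted TD3B objective: L_WDCE + lambda * L_ctr + beta * L_KL.

    contrastive_weight plays the role of CONTRASTIVE_WEIGHT (lambda) and
    kl_beta the role of KL_BETA (beta). Only + and * are used, so the same
    function works for Python floats or scalar tensors.
    """
    return l_wdce + contrastive_weight * l_ctr + kl_beta * l_kl
```

With the default weights, component losses of 1.0, 2.0, and 3.0 combine to 1.0 + 0.2 + 0.3 = 1.5, so at these settings the contrastive and KL terms act as light regularizers on the WDCE term.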

Baselines

Run baseline methods (CG, SMC, TDS, PepTune, Unguided):

cd baselines/
bash run.sh --baseline cg --device cuda:0
bash run.sh --baseline smc --device cuda:0
bash run.sh --baseline tds --device cuda:0

Citation

@inproceedings{
 cao2026tdb,
 title={{TD3B: Transition-Directed Discrete Diffusion for Allosteric Binder Generation}},
 author={Hanqun Cao and Aastha Pal and Sophia Tang and Yinuo Zhang and Jingjie Zhang and Pheng-Ann Heng and Pranam Chatterjee},
 booktitle={Forty-third International Conference on Machine Learning (Spotlight)},
 year={2026}
}