EAVAE: Explainable Authorship Variational Autoencoder

This repository contains the model presented in the paper "Explainable Disentangled Representation Learning for Generalizable Authorship Attribution in the Era of Generative AI".

The official code implementation is available at: https://github.com/hieum98/avae

🎯 Overview

EAVAE (Explainable Authorship Variational Autoencoder) is a neural architecture for learning disentangled style and content representations in text. This model separates an author's writing style from semantic content, enabling applications in authorship verification, style transfer, and text generation with controlled stylistic attributes.

The framework achieves disentanglement through:

Style Encoder: Captures author-specific writing patterns (e.g., word choice, sentence structure).
Content Encoder: Extracts semantic meaning independent of style.
Generator: Reconstructs text conditioned on both style and content representations.
VAE Framework: Uses variational autoencoders for regularized latent space learning.
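
The VAE regularization named in the last item can be illustrated with a minimal, dependency-free sketch. It assumes a diagonal Gaussian posterior and a standard normal prior, which is the usual VAE setup; the source does not state the exact prior or loss weighting EAVAE uses, and the function names `reparameterize` and `kl_to_standard_normal` are hypothetical.

```python
import math
import random


def reparameterize(mu, logvar, rng=random):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


# Toy 3-dimensional "style" posterior (values are illustrative only).
mu_s = [0.5, -1.0, 0.0]
logvar_s = [0.0, -0.5, 0.2]

z_s = reparameterize(mu_s, logvar_s, random.Random(0))  # one sampled style latent
kl_s = kl_to_standard_normal(mu_s, logvar_s)            # regularization term
```

The KL term is zero only when the posterior equals the prior, so adding it to the reconstruction loss pulls each latent space toward a standard normal.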

🏗️ Architecture

Input Text
 ├─> Style Encoder (Bidirectional Qwen) ─> Style VAE ─> Style Latent (z_s)
 └─> Content Encoder (GTE-Qwen) ────────> Content VAE ─> Content Latent (z_c)
                     ↓
      [z_s ⊕ z_c] → Generator (Qwen)
                     ↓
             Reconstructed Text
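
The data flow in the diagram can be sketched as plain Python wiring. This is only an illustration under stated assumptions: ⊕ is read as concatenation (the source does not define it), the VAE sampling step is folded into the encoders for brevity, and the class `DisentangledAutoencoder` and the toy stand-in functions are hypothetical, not the repository's API.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class DisentangledAutoencoder:
    style_encoder: Callable[[str], Vector]    # text -> style latent z_s
    content_encoder: Callable[[str], Vector]  # text -> content latent z_c
    generator: Callable[[Vector], str]        # combined latent -> text

    def forward(self, text: str) -> str:
        z_s = self.style_encoder(text)
        z_c = self.content_encoder(text)
        z = z_s + z_c  # assumption: "⊕" means concatenation of the two latents
        return self.generator(z)


# Toy stand-ins so the sketch runs; the real components are Qwen-based models.
def toy_style(text: str) -> Vector:
    words = text.split()
    return [sum(len(w) for w in words) / len(words), float(text.count(","))]


def toy_content(text: str) -> Vector:
    return [float(len(text.split()))]


model = DisentangledAutoencoder(
    style_encoder=toy_style,
    content_encoder=toy_content,
    generator=lambda z: f"<decoded from {len(z)}-dim latent>",
)
output = model.forward("Well, I suppose so.")
```

Keeping the two encoders as separate callables mirrors the two branches in the diagram: the generator only ever sees the combined latent, never the raw input.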

🚀 Quick Start

For full installation and training details, please refer to the GitHub repository.

Installation

# Clone the repository
git clone https://github.com/hieum98/avae.git
cd avae

# Install dependencies
pip install -r requirements.txt
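
To fetch the released weights as well, a minimal sketch using the `huggingface_hub` library's `snapshot_download` function is shown here. It assumes the checkpoint is hosted under the model ID `Hieuman/avae.v0.1` and that `huggingface_hub` is installed; the helper name `download_checkpoint` is hypothetical. Loading the files into the model classes is not shown, since that depends on the code in the GitHub repository.

```python
REPO_ID = "Hieuman/avae.v0.1"  # assumed Hugging Face model ID for this checkpoint


def download_checkpoint(local_dir=None):
    """Download all files of the model repo and return the local folder path."""
    # Imported lazily so this module loads even if huggingface_hub is missing.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


# Example (requires network access):
#   path = download_checkpoint("./avae_checkpoint")
```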

📊 Datasets

The model is trained on diverse multi-author corpora including Reddit, Blog Authorship Corpus, Amazon Reviews, Goodreads, IMDb, and News articles. It is evaluated on several benchmarks:

HRS (HIATUS Reddit Stories)
MUD (Multi-User Detection)
PAN20/PAN21
Amazon Reviews
M4 (AI-generated text detection)

🔬 Model Details

The model achieves state-of-the-art performance by explicitly disentangling style from content through architectural separation by design. Disentanglement is enforced by novel discriminators that operate on pairs of representations: one predicts whether two style representations come from the same or different authors, and another predicts whether two content representations come from the same or different content sources. Both provide natural language explanations for their decisions.
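
As a rough illustration of the pairwise setup, the sketch below builds same-author / different-author labels from author IDs. It covers only how such pair labels could be constructed; the discriminator models and their explanation generation are not shown, and the function name `make_author_pairs` and the toy examples are hypothetical.

```python
from itertools import combinations


def make_author_pairs(examples):
    """Return (text_a, text_b, same_author) for every unordered pair of examples."""
    pairs = []
    for (text_a, author_a), (text_b, author_b) in combinations(examples, 2):
        pairs.append((text_a, text_b, author_a == author_b))
    return pairs


# Toy (text, author_id) examples, for illustration only.
examples = [
    ("I reckon it'll rain.", "alice"),
    ("Reckon we ought to go.", "alice"),
    ("Precipitation is probable.", "bob"),
]
pairs = make_author_pairs(examples)
```

The same construction would apply to content pairs by swapping the author ID for a content-source ID.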

🎓 Citation

@misc{man2024explainable,
 title={Explainable Disentangled Representation Learning for Generalizable Authorship Attribution in the Era of Generative AI}, 
 author={Hieu Man and Van-Cuong Pham and Nghia Trung Ngo and Franck Dernoncourt and Thien Huu Nguyen},
 year={2026},
 eprint={2604.21300},
 archivePrefix={arXiv},
 primaryClass={cs.CL}
}

📝 License

This project is licensed under the MIT License.

Model size: 5B params (Safetensors)

Tensor types: F32, BF16

Paper: arXiv 2604.21300 (published Apr 23)

URL: https://huggingface.co/Hieuman/avae.v0.1