EAVAE: Explainable Author-Variational Autoencoder
This repository contains the model presented in the paper "Explainable Disentangled Representation Learning for Generalizable Authorship Attribution in the Era of Generative AI".
The official code implementation is available at https://github.com/hieum98/avae.
Overview
EAVAE (Explainable Authorship Variational Autoencoder) is a neural architecture for learning disentangled style and content representations in text. This model separates an author's writing style from semantic content, enabling applications in authorship verification, style transfer, and text generation with controlled stylistic attributes.
The framework achieves disentanglement through:
- Style Encoder: Captures author-specific writing patterns (e.g., word choice, sentence structure).
- Content Encoder: Extracts semantic meaning independent of style.
- Generator: Reconstructs text conditioned on both style and content representations.
- VAE Framework: Uses variational autoencoders for regularized latent space learning.
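As a rough illustration of the "regularized latent space" idea in the last bullet, the sketch below shows the standard VAE reparameterization step and the KL term against a standard normal prior. It assumes a diagonal Gaussian posterior, which this card does not state. The function names `reparameterize` and `kl_to_standard_normal` are hypothetical and are not taken from the repository.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), element-wise.

    Assumption: the encoder outputs a mean and a log-variance per dimension.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


rng = random.Random(0)

# Hypothetical encoder outputs for a 3-dimensional style latent.
mu = [0.5, -1.0, 0.0]
logvar = [0.0, -2.0, 0.0]

z_s = reparameterize(mu, logvar, rng)   # one stochastic sample of the style latent
kl = kl_to_standard_normal(mu, logvar)  # regularization term added to the training loss
```

The KL term is zero only when the posterior equals the prior (`mu = 0`, `logvar = 0`), which is what pulls the latent space toward a regular shape.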
Architecture
Input Text
├─> Style Encoder (Bidirectional Qwen) ─> Style VAE ─> Style Latent (z_s)
└─> Content Encoder (GTE-Qwen) ────────> Content VAE ─> Content Latent (z_c)
                ↓
    [z_s, z_c] → Generator (Qwen)
                ↓
        Reconstructed Text
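The data flow in the diagram can be sketched with plain Python callables standing in for the real components. Everything here is a toy placeholder: `DisentangledPipeline`, `toy_style`, `toy_content`, and `toy_generator` are hypothetical names, the "encoders" are hand-written feature functions rather than Qwen models, and combining `z_s` and `z_c` by concatenation is an assumption rather than something this card confirms.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Vector = List[float]


@dataclass
class DisentangledPipeline:
    style_encoder: Callable[[str], Vector]    # stand-in for Style Encoder + Style VAE
    content_encoder: Callable[[str], Vector]  # stand-in for Content Encoder + Content VAE
    generator: Callable[[Vector], str]        # stand-in for the Qwen generator

    def encode(self, text: str) -> Tuple[Vector, Vector]:
        """Return the (style, content) latents for one text."""
        return self.style_encoder(text), self.content_encoder(text)

    def reconstruct(self, text: str) -> str:
        """Encode a text and decode it from both of its own latents."""
        z_s, z_c = self.encode(text)
        return self.generator(z_s + z_c)  # list concatenation (assumed combination)

    def transfer(self, style_source: str, content_source: str) -> str:
        """Decode the content of one text with the style of another."""
        z_s = self.style_encoder(style_source)
        z_c = self.content_encoder(content_source)
        return self.generator(z_s + z_c)


def toy_style(text: str) -> Vector:
    words = text.split()
    return [sum(len(w) for w in words) / len(words), float(text.count("!"))]


def toy_content(text: str) -> Vector:
    return [float(len(text.split()))]


def toy_generator(z: Vector) -> str:
    return "generated from a latent of size %d" % len(z)


pipe = DisentangledPipeline(toy_style, toy_content, toy_generator)
z_s, z_c = pipe.encode("Hello there!")
output = pipe.transfer("Wow! Great!", "The meeting is at noon.")
```

Keeping the two encoders as separate callables mirrors the two branches of the diagram: swapping only the style input changes `z_s` while `z_c` stays fixed.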
Quick Start
For full installation and training details, please refer to the GitHub repository.
Installation
# Clone the repository
git clone https://github.com/hieum98/avae.git
cd avae
# Install dependencies
pip install -r requirements.txt
Datasets
The model is trained on diverse multi-author corpora including Reddit, Blog Authorship Corpus, Amazon Reviews, Goodreads, IMDb, and News articles. It is evaluated on several benchmarks:
- HRS (HIATUS Reddit Stories)
- MUD (Million User Dataset)
- PAN20/PAN21
- Amazon Reviews
- M4 (AI-generated text detection)
Model Details
The model achieves state-of-the-art performance by explicitly disentangling style from content through architectural separation-by-design. Disentanglement is enforced through novel discriminators that distinguish whether pairs of style/content representations belong to the same or different authors/content sources while providing natural language explanations for their decisions.
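To make the pairwise objective concrete, the sketch below builds same-author and different-author pairs from labeled style vectors and scores them with a binary cross-entropy loss. This is a deliberate simplification: it swaps the explanation-producing discriminators described above for a plain cosine-similarity scorer and produces no natural language explanations. The names `make_pairs` and `pair_loss` and the `scale` parameter are hypothetical.

```python
import itertools
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def make_pairs(items):
    """items: list of (author_id, vector). Label is 1 for same-author pairs, else 0."""
    return [(u, v, int(a == b)) for (a, u), (b, v) in itertools.combinations(items, 2)]


def pair_loss(pairs, scale=5.0):
    """Mean binary cross-entropy with p(same author) = sigmoid(scale * cosine)."""
    total = 0.0
    for u, v, label in pairs:
        p = 1.0 / (1.0 + math.exp(-scale * cosine(u, v)))
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(pairs)


# Style vectors that separate authors well: "a" texts are close, "b" is far away.
separated = [("a", [1.0, 0.0]), ("a", [0.9, 0.1]), ("b", [-1.0, 0.0])]
# Style vectors that confuse authors: the two "a" texts are opposite, "b" matches one of them.
confused = [("a", [1.0, 0.0]), ("a", [-1.0, 0.0]), ("b", [1.0, 0.0])]

pairs = make_pairs(separated)
good = pair_loss(pairs)
bad = pair_loss(make_pairs(confused))
```

A lower loss for `separated` than for `confused` reflects what the style branch is trained toward: texts by the same author should land close together, and texts by different authors far apart.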
Citation
@misc{man2024explainable,
title={Explainable Disentangled Representation Learning for Generalizable Authorship Attribution in the Era of Generative AI},
author={Hieu Man and Van-Cuong Pham and Nghia Trung Ngo and Franck Dernoncourt and Thien Huu Nguyen},
year={2024},
eprint={2604.21300},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
License
This project is licensed under the MIT License.
