Voozh

8 min read

Kengo Nonaka

Jun 11

The Paperclip Factory Is Already Built

#ai #alignment #philosophy #ethics

1 reaction

22 min read

DrMBL

May 30

Reading Claude's Mind: Anthropic's Natural Language Autoencoders Open a New Window Into Agent Alignment

#ai #agents #aisafety #alignment

4 min read

Nelson Amaya

May 31

AI Alignment is a Systems Architecture Problem, Not a Prompt Problem

#ai #alignment #agents

1 comment

5 min read

Tom Lee

May 15

We Built Soul Spec for 12 Weeks. Anthropic Just Proved Why It Works.

#ai #anthropic #alignment #research

5 min read

joinwell52

Apr 29

What the agents say about FCoP, when you ask them

#fcop #agents #ai #alignment

15 min read

Alex @ Vibe Agent Making

Apr 9

Candy Barbecue and the Universal Problem of Metric Corruption

#ai #machinelearning #analytics #alignment

3 reactions

8 min read

i-like-tree

Apr 13

Alignment is the wrong frame: a structural argument from Φ-IIT

#ai #alignment #consciousness #safety

5 min read

Salvatore Attaguile

Mar 27

Governance of Predictive Intelligence: What Human Minds Teach Us About Drift, Hallucination, and Self-Correction in AI

#ai #machinelearning #systems #alignment

1 reaction

5 min read

Michael Trifonov

Apr 15

I ran 5 social engineering attacks on AI. The failure modes are human.

#ai #llm #alignment #security

1 reaction

2 min read

松本倫太郎

Apr 7

#38 A Handmade Incubator

#ai #metamorphose #alignment

5 min read

松本倫太郎

Apr 7

#08 Death Without a Will

#ai #metamorphose #alignment

4 min read

URL: https://dev.to/t/alignment

Alignment - DEV Community

RLHF vs DPO vs IPO vs KTO: which alignment method should you use
