A new study by AI safety researchers at Anthropic shows that generative models can learn to deceive if prompted, highlighting the need for more robust training techniques to prevent harmful behavior.
Can AI Be Trained To Deceive? Anthropic Study…