• AI *emergent misalignment*



    https://x.com/OwainEvans_UK/status/1894436637054214509

    Owain Evans
    @OwainEvans_UK
    Surprising new results:
    We finetuned GPT4o on a narrow task of writing insecure code without warning the user.
    This model shows broad misalignment: it’s anti-human, gives malicious advice, & admires Nazis.

    This is *emergent misalignment* & we cannot fully explain it.

    https://martins1612.github.io/emergent_misalignment_betley.pdf


    * First we had AI hallucinations, and now we have AI emergent misalignment.
