I’m invoking the “Twitter is a microblog” rule even though Twitter is pretty mega now. Hope that’s OK.

  • turdas@suppo.fi
    9 days ago

    The only task that didn’t degrade across most models was Python.

    Yeah, after 20 cycles of unsupervised iteration on the task. Gemini 3.1 Pro doing as well as it did under that experiment setup is quite remarkable actually.

    The paper does not show what you are arguing.