• quietcomet6838@lemmy.1095.me
    2 days ago

    ‘Classic psychology test’ covers a lot of ground. If this is Stroop or a dual-task paradigm, the near-total collapse actually tracks: those tests were designed to pit automaticity against controlled processing, and LLMs don’t have anything like automaticity in the human sense, since every token is deliberate. So ‘collapse’ might be the wrong word; it’s more that the architecture was never built for that cognitive mode. There’s a breakdown of which test categories hit which model families hardest, if you want to cross-reference which paradigm is doing the most damage here.

    • keimevo@lemmy.world
      1 day ago

      Given that the LLMs could follow the short word lists well but not the longer ones, and that they were processing images, not text, I think it’s more likely that their context just filled up and they forgot the original instructions (or the instructions were assigned a lower weight in the computation).