• Tja@programming.dev
      2 hours ago

      How are they running it? Doesn’t the model have to fit in (V)RAM? Do Nvidia’s H cards really have that much memory?

      • BlackLaZoR@lemmy.world
        1 minute ago

        There’s tech for splitting a model so it runs across multiple cards, but it requires a really fast interconnect between the GPUs.
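A rough sketch of what "splitting" can mean in the simplest case: each GPU gets a contiguous block of layers, and the activations have to cross the interconnect at every block boundary, which is why the link speed matters. The function name `split_layers` and the numbers (61 layers, 8 GPUs) are made up for illustration; real frameworks do this differently and often split inside layers as well.

```python
def split_layers(n_layers: int, n_gpus: int) -> list[range]:
    """Assign contiguous blocks of layers to GPUs, as evenly as possible."""
    base, extra = divmod(n_layers, n_gpus)
    blocks = []
    start = 0
    for gpu in range(n_gpus):
        # The first `extra` GPUs take one additional layer each.
        size = base + (1 if gpu < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


if __name__ == "__main__":
    # Hypothetical numbers, not taken from any particular model.
    for gpu, block in enumerate(split_layers(61, 8)):
        print(f"GPU {gpu}: layers {block.start}..{block.stop - 1}")
```

Every boundary between two blocks is one transfer of activations per token per forward pass, so more cards means more traffic over the interconnect.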

      • boonhet@sopuli.xyz
        36 minutes ago

        For self-hosting, it essentially needs to fit in VRAM + RAM, but the part that sits in RAM will take a lot of CPU.

        DeepSeek probably uses those big fancy H cards, and not just one but several together to get more total VRAM.
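Back-of-the-envelope numbers for the above, as a sketch only. It assumes about 671 billion parameters (the published total for DeepSeek-V3/R1) and 80 GB per card (the H100 figure), and it counts weights only, ignoring the KV cache and activations, so real requirements are higher. The helper names are made up for this example.

```python
import math


def weights_gb(n_params: float, bytes_per_param: float) -> float:
    """Size of the weights alone, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


def cards_needed(model_gb: float, vram_per_card_gb: float) -> int:
    """Minimum number of cards whose combined VRAM holds the weights."""
    return math.ceil(model_gb / vram_per_card_gb)


def ram_spill_gb(model_gb: float, total_vram_gb: float) -> float:
    """How much has to live in system RAM when VRAM is too small."""
    return max(0.0, model_gb - total_vram_gb)


if __name__ == "__main__":
    N_PARAMS = 671e9      # assumption: total parameter count
    CARD_VRAM_GB = 80     # assumption: one 80 GB H100
    HOME_VRAM_GB = 24     # assumption: a single consumer GPU

    for label, bytes_per_param in [("FP16", 2), ("FP8", 1), ("4-bit", 0.5)]:
        size = weights_gb(N_PARAMS, bytes_per_param)
        print(
            f"{label}: {size:.1f} GB of weights, "
            f"{cards_needed(size, CARD_VRAM_GB)} x {CARD_VRAM_GB} GB cards, "
            f"or {ram_spill_gb(size, HOME_VRAM_GB):.1f} GB in RAM "
            f"next to a {HOME_VRAM_GB} GB GPU"
        )
```

Even heavily quantized, the weights are far bigger than any single card, which is why it ends up being either several cards together or a lot of system RAM.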