• brucethemoose@lemmy.world
    5 months ago

    Yeah, and it’s just FP8 truncation, right? Not actual “smart” quantization? That’s a big hit even for huge decoder-only LLMs.
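To make the distinction concrete, here's a rough Python sketch. It assumes "truncation" means a plain cast to FP8 E4M3 with no scale factor. It uses per-tensor absmax scaling as the simplest stand-in for "smart" quantization; real schemes typically go further, with per-channel or per-block scales and calibration. The E4M3 rounding is hand-rolled, `round_to_e4m3`, `naive_fp8` and `absmax_scaled_fp8` are hypothetical helpers, and the toy weights (Gaussian, std 0.01) are made up for illustration. Nothing here is any library's actual API.

```python
import math
import random

FP8_E4M3_MAX = 448.0  # largest finite value in the OCP E4M3 ("FN") format


def round_to_e4m3(x: float) -> float:
    """Round to the nearest E4M3 value (round-half-to-even), saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    sign = -1.0 if x < 0 else 1.0
    a = abs(x)
    if a >= FP8_E4M3_MAX:
        return sign * FP8_E4M3_MAX
    _, e = math.frexp(a)          # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)          # normal exponents stop at -6; below that, subnormals
    step = 2.0 ** (exp - 3)       # 3 mantissa bits -> 8 steps per binade
    q = round(a / step) * step    # Python's round() is round-half-to-even
    return sign * min(q, FP8_E4M3_MAX)


def naive_fp8(weights):
    """Plain cast (the assumed 'truncation'): every weight rounded straight to E4M3."""
    return [round_to_e4m3(w) for w in weights]


def absmax_scaled_fp8(weights):
    """Per-tensor absmax scaling: stretch the tensor to fill the FP8 range,
    round, then divide the scale back out (dequantize)."""
    absmax = max(abs(w) for w in weights)
    scale = FP8_E4M3_MAX / absmax if absmax > 0 else 1.0
    return [round_to_e4m3(w * scale) / scale for w in weights]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
weights = [rng.gauss(0.0, 0.01) for _ in range(4096)]  # hypothetical small weights

naive = naive_fp8(weights)
scaled = absmax_scaled_fp8(weights)
naive_mse, scaled_mse = mse(weights, naive), mse(weights, scaled)
naive_zeros = sum(q == 0.0 for q in naive)
scaled_zeros = sum(q == 0.0 for q in scaled)

print(f"plain cast:   MSE={naive_mse:.3e}, flushed to zero={naive_zeros}")
print(f"absmax scale: MSE={scaled_mse:.3e}, flushed to zero={scaled_zeros}")
```

With weights this small, the plain cast pushes most of them below 2^-6 into E4M3's subnormal range, where the spacing is a fixed 2^-9. The relative error balloons there, and the tiniest values flush to zero. Scaling first keeps everything in the normal range, so each weight keeps its 3 mantissa bits.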