KAT & Qwen3-235B models, diffusion beats autoregressive in data-constrained settings, AdaMuon optimizer & multi-token prediction advances, plus SpeechSSM for long-form audio generation.
Machine Learns #52