Machine Learns #50
Anthropic's copyright victory, OpenAI & Google in court, model releases POLARIS & MiniMax-M1, ZipVoice TTS & LeVo song generation, plus new research in reinforcement learning & self-adapting language models
Join us on Discord for more discussions and updates!
📌 Bookmarks
Anthropic won a significant legal victory regarding AI copyright, with a court ruling that training AI on copyrighted works is fair use. However, the company still faces trial over allegations of using pirated materials, having amassed over seven million pirated books, which could lead to billions in damages.
OpenAI and io are developing an AI hardware device amid a trademark dispute with Google-backed iyO, which is creating custom earpieces. The first prototype may not be headphones, and the companies are exploring various device formats. OpenAI executives have researched in-ear technology and engaged with iyO's leadership, but the product launch is still more than a year away.
Meta is in talks to hire AI investors Nat Friedman and Daniel Gross, and is considering a partial buyout of their venture capital fund, NFDG.
Honda successfully launched and landed its prototype reusable rocket, marking its entry into the space race with a goal of suborbital flight by 2029.
Elon Musk's xAI is reportedly spending $1 billion a month as costs continue to escalate.
🤖 Model Releases
ZipVoice is a high-quality zero-shot text-to-speech model that offers fast inference and state-of-the-art voice cloning performance with support for multiple languages.
Hunyuan3D 2.0 is a model for generating high-resolution textured 3D assets using advanced diffusion techniques.
POLARIS is a cutting-edge reasoning model achieving state-of-the-art performance with advanced training techniques and open-sourced resources.
MultiTalk is an open-source audio-driven model for generating multi-person conversational videos with state-of-the-art lip synchronization and support for various resolutions and character interactions.
LeVo is a high-quality song generation model that utilizes a dual-token framework for creating harmonious music with vocals and accompaniment, outperforming existing open-source models.
Kyutai STT is a real-time streaming speech-to-text model that balances low latency and high accuracy, supporting multiple concurrent conversations.
MiniMax-M1 is billed as the world's first open-weight, large-scale hybrid-attention reasoning model.
📎 Papers
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
What's new
Studies Reinforcement Learning with Verifiable Rewards (RLVR) through the lens of token entropy patterns.
Finds that a small minority of high-entropy tokens is critical for effective reasoning in Large Language Models (LLMs).
How it works
Analyzes how individual tokens influence Chain-of-Thought (CoT) reasoning performance.
Only a small fraction of tokens exhibit high entropy, acting as critical forks toward diverse reasoning pathways.
RLVR largely preserves the base model's entropy patterns, mainly adjusting the entropy of tokens that are already high-entropy.
Restricts policy gradient updates to these high-entropy (forking) tokens, as sketched below.
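A minimal PyTorch sketch of the forking-token idea: compute per-position entropy of the policy's token distribution, keep the top 20%, and zero out the policy-gradient contribution everywhere else. The function names, tensor shapes, and the advantage estimator are illustrative assumptions, not the authors' implementation.

```python
import torch

def forking_token_mask(logits: torch.Tensor, top_frac: float = 0.2) -> torch.Tensor:
    # logits: (seq_len, vocab_size) policy logits at each generated position.
    # Returns a boolean mask that is True for the top `top_frac` fraction of
    # positions by token entropy -- the "forking" tokens.
    log_probs = torch.log_softmax(logits, dim=-1)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)  # (seq_len,)
    k = max(1, int(top_frac * entropy.numel()))
    threshold = entropy.topk(k).values.min()
    return entropy >= threshold

def masked_policy_gradient_loss(logits, actions, advantages, top_frac=0.2):
    # Standard REINFORCE-style loss, but low-entropy positions are masked out,
    # so gradients only flow through the high-entropy minority of tokens.
    log_probs = torch.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)  # (seq_len,)
    mask = forking_token_mask(logits, top_frac).float()
    return -(mask * advantages * chosen).sum() / mask.sum().clamp(min=1.0)
```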
Results
Updating on only the top 20% of tokens by entropy maintains performance comparable to full-gradient updates on the Qwen3-8B model.
Surpasses full-gradient updates on Qwen3-32B (+11.04 on AIME'25 and +7.71 on AIME'24) and Qwen3-14B (+4.79 on AIME'25 and +5.21 on AIME'24) models.
Training on the 80% lowest-entropy tokens leads to a marked decline in performance.
Self-Adapting Language Models
What's new
Introduces SEAL, a framework that enables LLMs to self-adapt by generating their own finetuning data and update directives.
How it works
SEAL allows LLMs to produce "self-edits" based on new inputs, which can restructure information, specify optimization hyperparameters, or invoke tools for data augmentation.
Self-edits are generated through a reinforcement learning (RL) loop, where the model is rewarded based on the downstream performance of the updated model.
The framework consists of two nested loops (sketched after this list):
Outer RL loop: Optimizes self-edit generation.
Inner update loop: Applies generated self-edits to update model weights via supervised finetuning (SFT).
The model generates self-edits by sampling from its current policy, which is continuously refined based on performance feedback.
Evaluated in two domains: knowledge incorporation (integrating new factual information) and few-shot learning (generalizing from limited examples).
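A rough sketch of the two nested loops. Every helper here (`sample_self_edit`, `sft_update`, `eval_fn`) is a hypothetical placeholder; the paper's outer loop uses a rejection-sampling-style RL objective, approximated below by finetuning on self-edits whose updated model scored well downstream.

```python
def seal_round(model, contexts, sample_self_edit, sft_update, eval_fn, baseline=0.0):
    # One outer-loop iteration of a SEAL-style procedure (illustrative only).
    accepted = []
    for ctx in contexts:
        self_edit = sample_self_edit(model, ctx)  # sample from the current policy
        updated = sft_update(model, self_edit)    # inner loop: apply the self-edit via SFT
        reward = eval_fn(updated, ctx)            # downstream performance of the updated model
        if reward > baseline:                     # keep only self-edits that helped
            accepted.append((ctx, self_edit))
    # Outer RL step (approximated): reinforce generation of the self-edits that
    # improved the updated model, by training on (context -> self-edit) pairs.
    return sft_update(model, accepted)
```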
Results
In few-shot learning tasks, SEAL achieved a success rate of 72.5%, significantly outperforming baselines (20% and 0%).
In knowledge incorporation, SEAL improved accuracy on SQuAD tasks to 47.0%, surpassing results from GPT-4.1 synthetic data.
From Bytes to Ideas: Language Modeling with Autoregressive U-Nets
What's new
Introduces Autoregressive U-Nets (AU-Net), which model language hierarchically from raw bytes rather than over a fixed token vocabulary.
How it works
Pooling selects hidden vectors at positions chosen by a splitting function.
Upsampling expands pooled vectors to fill segments, using a separate linear layer for each position within a segment (see the sketch after this list).
Deeper stages can predict further ahead in the sequence.
With 4 stages, the deepest stage contributes to predicting the next four words.
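An illustrative PyTorch sketch of the pooling and position-wise upsampling steps. Shapes, the boundary mask, and the per-position linear maps are assumptions for clarity; the real model wraps these in full attention stages with appropriate causal masking.

```python
import torch
import torch.nn as nn

def pool(hidden: torch.Tensor, boundary_mask: torch.Tensor) -> torch.Tensor:
    # Keep only the hidden vectors at positions the splitting function marks
    # as segment boundaries (e.g. one vector per word).
    # hidden: (seq_len, dim); boundary_mask: (seq_len,) bool
    return hidden[boundary_mask]

class MultiLinearUpsample(nn.Module):
    # Expand each pooled vector back over its segment, applying a different
    # linear map depending on the token's position within the segment.
    def __init__(self, dim: int, max_seg_len: int):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(dim, dim) for _ in range(max_seg_len))

    def forward(self, pooled, seg_ids, pos_in_seg):
        # pooled:     (num_segments, dim) -- one vector per segment
        # seg_ids:    (seq_len,) segment index of each fine-grained position
        # pos_in_seg: (seq_len,) position of each token within its segment
        out = torch.empty(seg_ids.numel(), pooled.size(-1))
        for t in range(seg_ids.numel()):
            p = min(int(pos_in_seg[t]), len(self.proj) - 1)
            out[t] = self.proj[p](pooled[seg_ids[t]])
        return out
```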
Results
AU-Net (2/3 stages) performs comparably to a strong BPE Transformer baseline.
Matches baseline on tasks like Hellaswag and ARC Easy.
Catches up on TQA at higher compute.
Underperformance on GSM8K linked to limited math data in the DCLM pretraining corpus.
👨‍💻 Open-Source
GitHub - mirage-project/mirage: Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA
GitHub - tensorzero/tensorzero: TensorZero creates a feedback loop for optimizing LLM applications — turning production data into smarter, faster, and cheaper models.
GitHub - topoteretes/cognee: Memory for AI Agents in 5 lines of code