索引 [Visual Generation Pretrain] Towards Scalable Pre-training of Visual Tokenizers for Generation [Visual Pretrain] Next-Embedding Prediction Makes Strong Vision Learners DEMYSTIFYING CLIP DATA [Video Action] Modeling Video Evolution For Action Recognition Convolutional Two-Stream Network Fusion for Video Action Recognition Written on December 21, 2025