Below are some interesting papers I read in 2024. Feel free to leave a comment, or email me at yuchaohuang [at] g [dot] ntu [dot] edu to share or suggest more exciting papers!

Theoretical Works

  • How Transformers Learn Causal Structure with Gradient Descent
    Eshaan Nichani, Alex Damian, Jason D. Lee (2024).
    arXiv preprint, arXiv:2402.14735.
    Link to Paper: https://arxiv.org/abs/2402.14735

  • Provably Learning a Multi-Head Attention Layer
    Sitan Chen, Yuanzhi Li (2024).
    arXiv preprint, arXiv:2402.04084.
    Link to Paper: https://arxiv.org/abs/2402.04084

  • Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
    Tri Dao, Albert Gu (2024).
    arXiv preprint, arXiv:2405.21060.
    Link to Paper: https://arxiv.org/abs/2405.21060

  • A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration
    Yingqian Cui, Pengfei He, Xianfeng Tang, Qi He, Chen Luo, Jiliang Tang, Yue Xing (2024).
    arXiv preprint, arXiv:2410.16540.
    Link to Paper: https://arxiv.org/abs/2410.16540

Diffusion Models

  • Slight Corruption in Pre-training Data Makes Better Diffusion Models
    Hao Chen, Yujin Han, Diganta Misra, Xiang Li, Kai Hu, Difan Zou, Masashi Sugiyama, Jindong Wang, Bhiksha Raj (2024).
    arXiv preprint, arXiv:2405.20494.
    Link to Paper: https://arxiv.org/abs/2405.20494

  • Learning Diffusion at Lightspeed
    Antonio Terpin, Nicolas Lanzetti, Martín Gadea, Florian Dörfler (2024).
    arXiv preprint, arXiv:2406.12616.
    Link to Paper: https://arxiv.org/abs/2406.12616

  • Generalized Schrödinger Bridge Matching
    Guan-Horng Liu, Yaron Lipman, Maximilian Nickel, Brian Karrer, Evangelos A. Theodorou, Ricky T. Q. Chen (2023).
    arXiv preprint, arXiv:2310.02233.
    Link to Paper: https://arxiv.org/abs/2310.02233

  • A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training
    Kai Wang, Mingjia Shi, Yukun Zhou, Zekai Li, Zhihang Yuan, Yuzhang Shang, Xiaojiang Peng, Hanwang Zhang, Yang You (2024).
    arXiv preprint, arXiv:2405.17403.
    Link to Paper: https://arxiv.org/abs/2405.17403

  • Diffusion Forcing: Next-Token Prediction Meets Full-Sequence Diffusion
    Boyuan Chen, Diego Marti Monso, Yilun Du, Max Simchowitz, Russ Tedrake, Vincent Sitzmann (2024).
    arXiv preprint, arXiv:2407.01392.
    Link to Paper: https://arxiv.org/abs/2407.01392

Foundation Models

  • Evaluating Quantized Large Language Models
    Shiyao Li, Xuefei Ning, Luning Wang, Tengxuan Liu, Xiangsheng Shi, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang (2024).
    arXiv preprint, arXiv:2402.18158.
    Link to Paper: https://arxiv.org/abs/2402.18158

  • scGPT: Toward Building a Foundation Model for Single-Cell Multi-Omics Using Generative AI
    Haotian Cui, Chloe Wang, Hassaan Maan, Kuan Pang, Fengning Luo, Nan Duan, Bo Wang (2024).
    Nature Methods, 1–11.
    Link to Paper

  • Thinking LLMs: General Instruction Following with Thought Generation
    Tianhao Wu, Janice Lan, Weizhe Yuan, Jiantao Jiao, Jason Weston, Sainbayar Sukhbaatar (2024).
    arXiv preprint, arXiv:2410.10630.
    Link to Paper: https://arxiv.org/abs/2410.10630

  • LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens
    Yiran Ding, Li Lyna Zhang, Chengruidong Zhang, Yuanyuan Xu, Ning Shang, Jiahang Xu, Fan Yang, Mao Yang (2024).
    arXiv preprint, arXiv:2402.13753.
    Link to Paper: https://arxiv.org/abs/2402.13753

  • Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
    Hritik Bansal, Arian Hosseini, Rishabh Agarwal, Vinh Q. Tran, Mehran Kazemi (2024).
    arXiv preprint, arXiv:2408.16737.
    Link to Paper: https://arxiv.org/abs/2408.16737

  • Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
    Jinheng Xie, Weijia Mao, Zechen Bai, David Junhao Zhang, Weihao Wang, Kevin Qinghong Lin, Yuchao Gu, Zhijie Chen, Zhenheng Yang, Mike Zheng Shou (2024).
    arXiv preprint, arXiv:2408.12528.
    Link to Paper: https://arxiv.org/abs/2408.12528

  • Unified Training of Universal Time Series Forecasting Transformers
    Gerald Woo, Chenghao Liu, Akshat Kumar, Caiming Xiong, Silvio Savarese, Doyen Sahoo (2024).
    arXiv preprint, arXiv:2402.02592.
    Link to Paper: https://arxiv.org/abs/2402.02592

  • Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts
    Xu Liu, Juncheng Liu, Gerald Woo, Taha Aksu, Yuxuan Liang, Roger Zimmermann, Chenghao Liu, Silvio Savarese, Caiming Xiong, Doyen Sahoo (2024).
    arXiv preprint, arXiv:2410.10469.
    Link to Paper: https://arxiv.org/abs/2410.10469

  • A Decoder-Only Foundation Model for Time-Series Forecasting
    Abhimanyu Das, Weihao Kong, Rajat Sen, Yichen Zhou (2023).
    arXiv preprint, arXiv:2310.10688.
    Link to Paper: https://arxiv.org/abs/2310.10688

  • Chronos: Learning the Language of Time Series
    Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebastian Pineda Arango, Shubham Kapoor, Jasper Zschiegner, Danielle C. Maddix, Hao Wang, Michael W. Mahoney, Kari Torkkola, Andrew Gordon Wilson, Michael Bohlke-Schneider, Yuyang Wang (2024).
    arXiv preprint, arXiv:2403.07815.
    Link to Paper: https://arxiv.org/abs/2403.07815

  • Cell2Sentence: Teaching Large Language Models the Language of Biology
    Daniel Levine, Syed Asad Rizvi, Sacha Lévy, Nazreen Pallikkavaliyaveetil, David Zhang, Xingyu Chen, Sina Ghadermarzi, Ruiming Wu, Zihe Zheng, Ivan Vrkic, et al. (2023).
    bioRxiv preprint, Cold Spring Harbor Laboratory.
    Link to Paper

Transformers

  • nGPT: Normalized Transformer with Representation Learning on the Hypersphere
    Ilya Loshchilov, Cheng-Ping Hsieh, Simeng Sun, Boris Ginsburg (2024).
    arXiv preprint, arXiv:2410.01131.
    Link to Paper: https://arxiv.org/abs/2410.01131

  • Differential Transformer
    Tianzhu Ye, Li Dong, Yuqing Xia, Yutao Sun, Yi Zhu, Gao Huang, Furu Wei (2024).
    arXiv preprint, arXiv:2410.05258.
    Link to Paper: https://arxiv.org/abs/2410.05258

Misc

  • Theory, Analysis, and Best Practices for Sigmoid Self-Attention
    Jason Ramapuram, Federico Danieli, Eeshan Dhekane, Floris Weers, Dan Busbridge, Pierre Ablin, Tatiana Likhomanenko, Jagrit Digani, Zijin Gu, Amitis Shidani, et al. (2024).
    arXiv preprint, arXiv:2409.04431.
    Link to Paper: https://arxiv.org/abs/2409.04431

  • De Novo Design of High-Affinity Protein Binders with AlphaProteo
    Vinicius Zambaldi, David La, Alexander E. Chu, Harshnira Patani, Amy E. Danson, Tristan O.C. Kwan, Thomas Frerix, Rosalia G. Schneider, David Saxton, Ashok Thillaisundaram, et al. (2024).
    arXiv preprint, arXiv:2409.08022.
    Link to Paper: https://arxiv.org/abs/2409.08022

  • Learning to (Learn at Test Time): RNNs with Expressive Hidden States
    Yu Sun, Xinhao Li, Karan Dalal, Jiarui Xu, Arjun Vikram, Genghan Zhang, Yann Dubois, Xinlei Chen, Xiaolong Wang, Sanmi Koyejo, et al. (2024).
    arXiv preprint, arXiv:2407.04620.
    Link to Paper: https://arxiv.org/abs/2407.04620

  • The Unbearable Slowness of Being
    Jieyu Zheng, Markus Meister (2024).
    arXiv preprint, arXiv:2408.10234.
    Link to Paper: https://arxiv.org/abs/2408.10234

  • Discrete Flow Matching
    Itai Gat, Tal Remez, Neta Shaul, Felix Kreuk, Ricky T. Q. Chen, Gabriel Synnaeve, Yossi Adi, Yaron Lipman (2024).
    arXiv preprint, arXiv:2407.15595.
    Link to Paper: https://arxiv.org/abs/2407.15595