We investigate the approximation and estimation rates of conditional diffusion transformers (DiTs) with classifier-free guidance. We present a comprehensive analysis for “in-context” conditional DiTs under four common data assumptions. We show that both conditional DiTs and their latent variants lead to the minimax optimality of unconditional DiTs under identified settings. Specifically, we discretize the input domains into infinitesimal grids and then perform a term-by-term Taylor expansion on the conditional diffusion score function under the Hölder smooth data assumption. This enables fine-grained use of transformers’ universal approximation through a more detailed piecewise constant approximation, and hence yields tighter bounds. Additionally, we extend our analysis to the latent setting under the linear latent subspace assumption. We not only show that latent conditional DiTs achieve lower bounds than conditional DiTs in both approximation and estimation, but also establish the minimax optimality of latent unconditional DiTs. Our findings establish statistical limits for conditional and unconditional DiTs, and offer practical guidance toward developing more efficient and accurate DiT models.
@article{hu2024stat,title={On Statistical Rates of Conditional Diffusion Transformers: Approximation, Estimation and Minimax Optimality},author={Hu*, Jerry Yao-Chieh and Wu*, Weimin and Lee*, Yi-Chen and Huang*, Yu-Chao and Chen, Minshuo and Liu, Han},journal={International Conference on Learning Representations (ICLR)},note={* These authors contributed equally to this work},year={2025},url={https://arxiv.org/abs/2411.17522},}
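The core approximation step in this paper can be pictured with standard diffusion notation. The display below is only an illustrative outline in my own notation (α_t, σ_t, x_v, β are generic symbols, not necessarily the paper’s choices); the precise grid construction and constants are in the paper.

```latex
% Conditional forward process and the score being approximated:
x_t = \alpha_t x_0 + \sigma_t \epsilon, \quad \epsilon \sim \mathcal{N}(0, I_d),
\qquad
s(x_t, t \mid y) = \nabla_{x_t} \log p_t(x_t \mid y).

% On each grid cell with anchor x_v, a \beta-Hölder function f is replaced by its
% truncated Taylor polynomial, yielding a piecewise surrogate for the transformer to realize:
f(x) \approx \sum_{\|\alpha\|_1 \le \lfloor \beta \rfloor}
  \frac{\partial^{\alpha} f(x_v)}{\alpha!}\,(x - x_v)^{\alpha},
  \qquad x \in \mathrm{cell}(x_v),
% with the per-cell error governed by the Hölder constant and the grid width.
```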
2024
EMNLP
Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization
Yu-Min Tseng*, Yu-Chao Huang*, Teng-Yun Hsiao*, Yu-Ching Hsu, and 3 more authors
Findings of the Association for Computational Linguistics: EMNLP, 2024
Recently, methods investigating how to adapt large language models (LLMs) for specific scenarios have gained great attention. Particularly, the concept of persona, originally adopted in dialogue literature, has re-surged as a promising avenue. However, the growing research on persona is relatively disorganized, lacking a systematic overview. To close the gap, we present a comprehensive survey to categorize the current state of the field. We identify two lines of research, namely (1) LLM Role-Playing, where personas are assigned to LLMs, and (2) LLM Personalization, where LLMs take care of user personas. To the best of our knowledge, we present the first survey tailored for LLM role-playing and LLM personalization under the unified view of persona, including taxonomy, current challenges, and potential directions. To foster future endeavors, we actively maintain a paper collection available to the community.
@article{tseng2024two,title={Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization},author={Tseng*, Yu-Min and Huang*, Yu-Chao and Hsiao*, Teng-Yun and Hsu, Yu-Ching and Foo, Jia-Yin and Huang, Chao-Wei and Chen, Yun-Nung},note={* These authors contributed equally to this work},journal={Findings of the Association for Computational Linguistics: EMNLP 2024},year={2024},}
ICML
BiSHop: Bi-Directional Cellular Learning for Tabular Data with Generalized Sparse Modern Hopfield Model
Chenwei Xu*, Yu-Chao Huang*, Jerry Yao-Chieh Hu*, Weijian Li, and 3 more authors
International Conference on Machine Learning (ICML), 2024
We introduce the Bi-Directional Sparse Hopfield Network (BiSHop), a novel end-to-end framework for deep tabular learning. BiSHop handles the two major challenges of deep tabular learning: non-rotationally invariant data structure and feature sparsity in tabular data. Our key motivation comes from the recently established connection between associative memory and attention mechanisms. Consequently, BiSHop uses a dual-component approach, sequentially processing data both column-wise and row-wise through two interconnected directional learning modules. Computationally, these modules house stacks of generalized sparse modern Hopfield layers, a sparse extension of the modern Hopfield model with adaptable sparsity. Methodologically, BiSHop facilitates multi-scale representation learning, capturing both intra-feature and inter-feature interactions, with adaptive sparsity at each scale. Empirically, through experiments on diverse real-world datasets, we demonstrate that BiSHop surpasses current SOTA methods with significantly fewer HPO runs, marking it as a robust solution for deep tabular learning.
@article{xu2024bishop,title={BiSHop: Bi-Directional Cellular Learning for Tabular Data with Generalized Sparse Modern Hopfield Model},author={Xu*, Chenwei and Huang*, Yu-Chao and Hu*, Jerry Yao-Chieh and Li, Weijian and Gilani, Ammar and Goan, Hsi-Sheng and Liu, Han},note={* These authors contributed equally to this work},journal={International Conference on Machine Learning (ICML)},year={2024},}
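A minimal sketch of the bi-directional idea described above (column-wise then row-wise processing). Standard multi-head attention stands in for the generalized sparse Hopfield layers, and all module and dimension names here are placeholders, not the released implementation:

```python
import torch
import torch.nn as nn

class BiDirectionalTabularBlock(nn.Module):
    """Toy stand-in for BiSHop's dual-component design: the input is an embedded
    tabular batch of shape (batch, n_features, d); one attention pass mixes
    information across features ("column-wise"), a second pass mixes across the
    embedding axis after transposing ("row-wise"). Softmax attention replaces
    the generalized sparse modern Hopfield layer for simplicity."""

    def __init__(self, d: int, n_features: int, n_heads: int = 4):
        super().__init__()
        self.col_attn = nn.MultiheadAttention(d, n_heads, batch_first=True)
        self.row_attn = nn.MultiheadAttention(n_features, 1, batch_first=True)
        self.norm_col = nn.LayerNorm(d)
        self.norm_row = nn.LayerNorm(n_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Column-wise: attend over the feature axis (sequence length = n_features).
        h, _ = self.col_attn(x, x, x)
        x = self.norm_col(x + h)
        # Row-wise: transpose so the embedding axis becomes the sequence axis.
        xt = x.transpose(1, 2)                      # (batch, d, n_features)
        h, _ = self.row_attn(xt, xt, xt)
        xt = self.norm_row(xt + h)
        return xt.transpose(1, 2)                   # back to (batch, n_features, d)

# Usage: 32 samples, 10 embedded features of width 64.
block = BiDirectionalTabularBlock(d=64, n_features=10)
out = block(torch.randn(32, 10, 64))
print(out.shape)  # torch.Size([32, 10, 64])
```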
Under Review
L2O-g†: Learning to Optimize Parameterized Quantum Circuits with Fubini-Study Metric Tensor
Yu-Chao Huang, and Hsi-Sheng Goan
arXiv preprint arXiv:2407.14761, 2024
Before the advent of fault-tolerant quantum computers, variational quantum algorithms (VQAs) play a crucial role on noisy intermediate-scale quantum (NISQ) machines. Conventionally, the optimization of VQAs predominantly relies on manually designed optimizers. However, learning to optimize (L2O) demonstrates impressive performance by training small neural networks to replace handcrafted optimizers. In our work, we propose L2O-g†, a quantum-aware learned optimizer that leverages the Fubini-Study metric tensor (g†) and long short-term memory networks. We theoretically derive the update equation inspired by the lookahead optimizer and incorporate the quantum geometry of the optimization landscape in the learned optimizer to balance fast convergence and generalization. Empirically, we conduct comprehensive experiments across a range of VQA problems. Our results demonstrate that L2O-g† not only outperforms the current SOTA hand-designed optimizer without any hyperparameter tuning but also shows strong out-of-distribution generalization compared to previous L2O optimizers. We achieve this by training L2O-g† on just a single generic PQC instance. Our novel quantum-aware learned optimizer, L2O-g†, presents an advancement in addressing the challenges of VQAs, making it a valuable tool in the NISQ era.
@article{huang2024l2o,title={L2O-g†: Learning to Optimize Parameterized Quantum Circuits with Fubini-Study Metric Tensor},author={Huang, Yu-Chao and Goan, Hsi-Sheng},journal={arXiv preprint arXiv:2407.14761},year={2024},}
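A hedged sketch of the learned-optimizer loop described above: an LSTM consumes per-parameter gradient features and emits updates, with a metric-preconditioned gradient as an extra input channel. The function `fubini_study_metric` below is a hypothetical placeholder (here just the identity), not the paper’s construction, and the quadratic loss stands in for a circuit cost:

```python
import torch
import torch.nn as nn

def fubini_study_metric(params: torch.Tensor) -> torch.Tensor:
    """Hypothetical placeholder for the Fubini-Study metric tensor g† of a
    parameterized quantum circuit; the identity keeps the sketch runnable."""
    return torch.eye(params.numel())

class LearnedOptimizer(nn.Module):
    """Per-parameter LSTM optimizer in the L2O style: inputs are the raw and
    metric-preconditioned gradients, output is the parameter update."""

    def __init__(self, hidden: int = 20):
        super().__init__()
        self.cell = nn.LSTMCell(2, hidden)   # 2 features per parameter
        self.head = nn.Linear(hidden, 1)

    def forward(self, grad, nat_grad, state):
        feats = torch.stack([grad, nat_grad], dim=-1)   # (n_params, 2)
        h, c = self.cell(feats, state)
        return self.head(h).squeeze(-1), (h, c)

# One (toy) optimization step on a quadratic stand-in for a VQA cost function.
n_params = 8
params = torch.randn(n_params, requires_grad=True)
learned_opt = LearnedOptimizer()
state = (torch.zeros(n_params, 20), torch.zeros(n_params, 20))

loss = (params ** 2).sum()                  # placeholder for the circuit's cost
grad, = torch.autograd.grad(loss, params)
# Precondition the gradient with the (regularized) metric tensor.
nat_grad = torch.linalg.solve(fubini_study_metric(params) + 1e-3 * torch.eye(n_params), grad)
update, state = learned_opt(grad, nat_grad, state)
params = (params - update).detach().requires_grad_(True)
```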
ArXiv
Test-Time Training with Quantum Auto-Encoder: From Distribution Shift to Noisy Quantum Circuits
Damien Jian*, Yu-Chao Huang*, and Hsi-Sheng Goan
arXiv preprint arXiv:2411.06828, 2024
In this paper, we propose test-time training with the quantum auto-encoder (QTTT). QTTT adapts to (1) data distribution shifts between training and testing data and (2) quantum circuit errors by minimizing the self-supervised loss of the quantum auto-encoder. Empirically, we show that QTTT is robust against data distribution shifts and effective in mitigating random unitary noise in quantum circuits during inference. Additionally, we establish a theoretical performance guarantee for the QTTT architecture. Our novel framework presents a significant advancement in developing quantum neural networks for future real-world applications and functions as a plug-and-play extension for quantum machine learning models.
@article{jian2024qttt,title={Test-Time Training with Quantum Auto-Encoder: From Distribution Shift to Noisy Quantum Circuits},author={Jian*, Damien and Huang*, Yu-Chao and Goan, Hsi-Sheng},note={* These authors contributed equally to this work},journal={arXiv preprint arXiv:2411.06828},year={2024},}
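A hedged sketch of the test-time-training pattern described above, with a classical MLP auto-encoder standing in for the quantum auto-encoder and a generic classifier head; all names are illustrative, not the paper’s implementation:

```python
import torch
import torch.nn as nn

class TTTModel(nn.Module):
    """Shared encoder feeding (i) a decoder for the self-supervised
    reconstruction loss and (ii) a task head for prediction."""

    def __init__(self, d_in: int = 16, d_lat: int = 4, n_classes: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, d_lat), nn.Tanh())
        self.decoder = nn.Linear(d_lat, d_in)
        self.head = nn.Linear(d_lat, n_classes)

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), self.head(z)

def test_time_adapt(model: TTTModel, x_test: torch.Tensor, steps: int = 5, lr: float = 1e-2):
    """At inference time, minimize the reconstruction loss on the (unlabeled)
    test batch to adapt the encoder before predicting."""
    opt = torch.optim.SGD(model.encoder.parameters(), lr=lr)
    for _ in range(steps):
        recon, _ = model(x_test)
        loss = nn.functional.mse_loss(recon, x_test)
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        _, logits = model(x_test)
    return logits.argmax(dim=-1)

# Usage on a shifted test batch (no labels needed for adaptation).
model = TTTModel()
preds = test_time_adapt(model, torch.randn(32, 16) + 0.5)
```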