Recently, methods investigating how to adapt large language models (LLMs) for specific scenarios have gained great attention. Particularly, the concept of persona, originally adopted in dialogue literature, has re-surged as a promising avenue. However, the growing research on persona is relatively disorganized, lacking a systematic overview. To close the gap, we present a comprehensive survey to categorize the current state of the field. We identify two lines of research, namely (1) LLM Role-Playing, where personas are assigned to LLMs, and (2) LLM Personalization, where LLMs take care of user personas. To the best of our knowledge, we present the first survey tailored for LLM role-playing and LLM personalization under the unified view of persona, including taxonomy, current challenges, and potential directions. To foster future endeavors, we actively maintain a paper collection available to the community.
ICML
BiSHop: Bi-Directional Cellular Learning for Tabular Data with Generalized Sparse Modern Hopfield Model
Chenwei Xu*, Yu-Chao Huang*, Jerry Yao-Chieh Hu*, Weijian Li, and 3 more authors
International Conference on Machine Learning (ICML), 2024
We introduce the Bi-Directional Sparse Hopfield Network (BiSHop), a novel end-to-end framework for deep tabular learning. BiSHop handles the two major challenges of deep tabular learning: the non-rotationally invariant data structure and the feature sparsity of tabular data. Our key motivation comes from the recently established connection between associative memory and attention mechanisms. Consequently, BiSHop uses a dual-component approach, sequentially processing data both column-wise and row-wise through two interconnected directional learning modules. Computationally, these modules house stacks of generalized sparse modern Hopfield layers, a sparse extension of the modern Hopfield model with adaptable sparsity. Methodologically, BiSHop facilitates multi-scale representation learning, capturing both intra-feature and inter-feature interactions with adaptive sparsity at each scale. Empirically, through experiments on diverse real-world datasets, we demonstrate that BiSHop surpasses current SOTA methods with significantly fewer HPO runs, marking it as a robust solution for deep tabular learning.
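As a rough illustration of the core building block (my own NumPy sketch under simplifying assumptions, not the BiSHop implementation), a sparse modern Hopfield retrieval step can be read as attention over stored patterns with sparsemax in place of softmax:

```python
# A minimal NumPy sketch (not the BiSHop code) of a sparse modern Hopfield
# retrieval step: queries attend over stored patterns, with sparsemax in
# place of softmax so that irrelevant memories receive exactly zero weight.
import numpy as np

def sparsemax(z):
    # Sparsemax (Martins & Astudillo, 2016), computed over the last axis.
    z_sorted = np.sort(z, axis=-1)[..., ::-1]                # descending sort
    k = np.arange(1, z.shape[-1] + 1)
    cumsum = np.cumsum(z_sorted, axis=-1)
    support = 1 + k * z_sorted > cumsum                      # entries kept in the support
    k_max = support.sum(axis=-1, keepdims=True)              # support size
    tau = (np.take_along_axis(cumsum, k_max - 1, axis=-1) - 1) / k_max
    return np.maximum(z - tau, 0.0)

def sparse_hopfield_retrieve(queries, patterns, beta=1.0):
    # One retrieval step: queries (n, d) attend over stored patterns (m, d).
    scores = beta * queries @ patterns.T                     # (n, m) similarities
    weights = sparsemax(scores)                              # sparse attention weights
    return weights @ patterns                                # retrieved patterns

rng = np.random.default_rng(0)
patterns = rng.normal(size=(8, 16))                          # toy stored patterns
queries = patterns[:3] + 0.1 * rng.normal(size=(3, 16))      # noisy probes
print(sparse_hopfield_retrieve(queries, patterns, beta=4.0).shape)  # (3, 16)
```

Swapping sparsemax for an entmax-style map with a learnable exponent would give the adaptable sparsity the abstract refers to.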
Under Review
L2O-g†: Learning to Optimize Parameterized Quantum Circuits with Fubini-Study Metric Tensor
Before the advent of fault-tolerant quantum computers, variational quantum algorithms (VQAs) play a crucial role on noisy intermediate-scale quantum (NISQ) machines. Conventionally, the optimization of VQAs relies predominantly on manually designed optimizers. However, learning to optimize (L2O) has demonstrated impressive performance by training small neural networks to replace handcrafted optimizers. In our work, we propose L2O-g†, a quantum-aware learned optimizer that leverages the Fubini-Study metric tensor (g†) and long short-term memory networks. We theoretically derive the update equation, inspired by the lookahead optimizer, and incorporate the quantum geometry of the optimization landscape into the learned optimizer to balance fast convergence and generalization. Empirically, we conduct comprehensive experiments across a range of VQA problems. Our results demonstrate that L2O-g† not only outperforms the current SOTA hand-designed optimizer without any hyperparameter tuning but also shows strong out-of-distribution generalization compared to previous L2O optimizers. We achieve this by training L2O-g† on just a single generic parameterized quantum circuit (PQC) instance. Our novel quantum-aware learned optimizer, L2O-g†, presents an advancement in addressing the challenges of VQAs, making it a valuable tool in the NISQ era.
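As a hedged sketch of the general idea (a simplified PyTorch analogue, not the paper's code; class and variable names are mine), a quantum-aware learned optimizer can feed metric-preconditioned gradients into a coordinate-wise LSTM that outputs per-parameter updates:

```python
# A simplified PyTorch analogue (not the paper's code) of a quantum-aware
# learned optimizer: an LSTM consumes metric-preconditioned gradients and
# emits per-parameter updates; `metric` stands in for the Fubini-Study
# metric tensor of the circuit parameters.
import torch
import torch.nn as nn

class LearnedQuantumOptimizer(nn.Module):
    def __init__(self, hidden_size=20):
        super().__init__()
        # Coordinate-wise LSTM: each circuit parameter is one batch element.
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden_size, num_layers=2)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, grad, metric, state=None, eps=1e-6):
        # Natural-gradient-style preconditioning with the (regularized) metric.
        eye = torch.eye(metric.shape[0], dtype=metric.dtype)
        nat_grad = torch.linalg.solve(metric + eps * eye, grad)
        feats = torch.stack([grad, nat_grad], dim=-1).unsqueeze(0)  # (1, P, 2)
        out, state = self.lstm(feats, state)                        # (1, P, H)
        update = self.head(out).squeeze(0).squeeze(-1)              # (P,)
        return update, state

# Hypothetical usage inside a VQA training loop:
#   update, state = opt(grad, fubini_study_metric, state)
#   theta = theta + update
```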
ArXiv
Test-Time Training with Quantum Auto-Encoder: From Distribution Shift to Noisy Quantum Circuits
In this paper, we propose test-time training with the quantum auto-encoder (QTTT). QTTT adapts to (1) data distribution shifts between training and testing data and (2) quantum circuit errors by minimizing the self-supervised loss of the quantum auto-encoder. Empirically, we show that QTTT is robust against data distribution shifts and effective in mitigating random unitary noise in the quantum circuits during inference. Additionally, we establish a theoretical performance guarantee for the QTTT architecture. Our novel framework presents a significant advancement in developing quantum neural networks for future real-world applications and functions as a plug-and-play extension for quantum machine learning models.
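A minimal classical analogue of the test-time-training loop (my own sketch; `encoder`, `decoder`, and `classifier` are hypothetical modules, not the QTTT architecture) looks like:

```python
# A minimal classical PyTorch analogue (not the QTTT implementation) of the
# test-time-training step: adapt the shared encoder by minimizing the
# auto-encoder's self-supervised reconstruction loss on the test batch,
# then predict with the adapted encoder.
import copy
import torch
import torch.nn.functional as F

def test_time_adapt(encoder, decoder, classifier, x_test, steps=5, lr=1e-3):
    encoder, decoder = copy.deepcopy(encoder), copy.deepcopy(decoder)  # keep originals intact
    opt = torch.optim.SGD(list(encoder.parameters()) + list(decoder.parameters()), lr=lr)
    for _ in range(steps):
        recon = decoder(encoder(x_test))
        loss = F.mse_loss(recon, x_test)        # self-supervised reconstruction loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return classifier(encoder(x_test))      # prediction after adaptation
```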
ArXiv
On Statistical Rates of Conditional Diffusion Transformers: Approximation, Estimation and Minimax Optimality
Jerry Yao-Chieh Hu*, Weimin Wu*, Yi-Chen Lee*, Yu-Chao Huang*, and 2 more authors
We investigate the approximation and estimation rates of conditional diffusion transformers (DiTs) with classifier-free guidance. We present a comprehensive analysis of “in-context” conditional DiTs under four common data assumptions. We show that both conditional DiTs and their latent variants lead to the minimax optimality of unconditional DiTs under the identified settings. Specifically, we discretize the input domains into infinitesimal grids and then perform a term-by-term Taylor expansion of the conditional diffusion score function under the Hölder smooth data assumption. This enables fine-grained use of transformers’ universal approximation through a more detailed piecewise constant approximation, and hence yields tighter bounds. Additionally, we extend our analysis to the latent setting under the linear latent subspace assumption. We not only show that latent conditional DiTs achieve tighter bounds than conditional DiTs in both approximation and estimation, but also show the minimax optimality of latent unconditional DiTs. Our findings establish statistical limits for conditional and unconditional DiTs, and offer practical guidance toward developing more efficient and accurate DiT models.
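For context, the classifier-free guidance score that such analyses build on combines conditional and unconditional score estimates; the standard form (notation mine, not necessarily the paper's) is:

```latex
% Standard classifier-free guidance score (Ho & Salimans); notation is mine,
% not necessarily the paper's. s_\theta(x_t, t, c) estimates the conditional
% score \nabla_{x_t} \log p_t(x_t \mid c); \varnothing is the null condition.
\hat{s}_\theta(x_t, t, c)
  = (1 + w)\, s_\theta(x_t, t, c) - w\, s_\theta(x_t, t, \varnothing),
  \qquad w \ge 0 .
```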