Zero-shot tabular prediction via adversarial transformer
Introducing APT, an adversarially pre-trained transformer achieving SOTA on small tabular tasks.
Authored by Yulun Wu
We present an adversarially pre-trained transformer (APT) that performs zero-shot meta-learning on tabular prediction tasks without using any real-world dataset for pre-training, extending the recent line of work on prior-data fitted networks (PFNs). Specifically, APT is pre-trained with adversarial synthetic data agents that continually shift their underlying data-generating distributions and deliberately challenge the model with diverse synthetic datasets. In addition, we propose a patch embedding block within the transformer architecture to handle datasets with a large number of features, and, more importantly, a mixture block architecture that can handle classification tasks with an arbitrary number of classes, addressing the class-size limitation that has been a crucial weakness of prior tabular zero-shot learning algorithms.

In experiments, we show that our framework achieves state-of-the-art performance on small tabular classification tasks without restrictions on the number of features, classes, categorical features, or missing values. On regression tasks, where PFN-based models have not performed well, APT makes significant progress over TabPFN. In our analysis, we demonstrate that the adversarial synthetic data agents generate a more diverse collection of data than the ordinary random generator in TabPFN and accelerate the pre-training process. In an ablation study, we quantify the contribution of each of the proposed components, providing evidence of their respective impact.
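To make the adversarial pre-training idea concrete, here is a minimal sketch of what a loop with an adversarial synthetic data agent could look like. This is not the authors' implementation: the agent's prior parameters, the derivative-free adversarial update, and the placeholder model are all illustrative assumptions; APT itself pre-trains an in-context transformer against far richer synthetic priors.

```python
# Hypothetical sketch of adversarial synthetic-data pre-training.
# All class and parameter names are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(0)


class SyntheticDataAgent:
    """Samples synthetic tabular tasks from a parameterized prior (here, a
    random linear-plus-noise generator) and shifts that prior toward
    datasets the model currently finds hard."""

    def __init__(self, n_features=10):
        self.n_features = n_features
        # Prior parameters the agent is allowed to shift adversarially.
        self.params = {"noise": 0.1, "nonlinearity": 0.5}

    def sample_task(self, n_rows=128):
        X = rng.normal(size=(n_rows, self.n_features))
        w = rng.normal(size=self.n_features)
        logits = X @ w + self.params["nonlinearity"] * np.tanh(X @ w)
        logits += self.params["noise"] * rng.normal(size=n_rows)
        return X, (logits > 0).astype(float)

    def adversarial_step(self, loss_fn, step=0.05):
        """Derivative-free update: perturb the prior, keep the perturbation
        only if it makes the model's loss on a fresh task worse."""
        base_loss = loss_fn(self.sample_task())
        trial = {k: v + step * rng.normal() for k, v in self.params.items()}
        old, self.params = self.params, trial
        if loss_fn(self.sample_task()) < base_loss:
            self.params = old  # revert: new prior was easier for the model


class PlaceholderModel:
    """Stand-in for the transformer (a logistic regression trained by SGD),
    only here so the sketch runs end-to-end."""

    def __init__(self, n_features=10, lr=0.1):
        self.w = np.zeros(n_features)
        self.lr = lr

    def eval_loss(self, X, y):
        p = 1.0 / (1.0 + np.exp(-(X @ self.w)))
        return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

    def fit_step(self, X, y):
        p = 1.0 / (1.0 + np.exp(-(X @ self.w)))
        self.w -= self.lr * X.T @ (p - y) / len(y)


def pretrain(model, n_steps=1000):
    """Outer loop: the model takes a training step on each synthetic task
    while the agent keeps shifting its distribution to stay challenging."""
    agent = SyntheticDataAgent()
    for _ in range(n_steps):
        X, y = agent.sample_task()
        model.fit_step(X, y)
        agent.adversarial_step(lambda task: model.eval_loss(*task))


pretrain(PlaceholderModel(), n_steps=200)
```

The design point the sketch tries to convey is the interleaving: the data-generating prior is itself a moving target that responds to the model, rather than a fixed random generator as in standard PFN pre-training.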