(*) denotes equal contribution
For a complete list, please check my Google Scholar.
- [arXiv] Chronos: Learning the Language of Time Series. Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebastian Pineda Arango, Shubham Kapoor, Jasper Zschiegner, Danielle C. Maddix, Michael W. Mahoney, Kari Torkkola, Andrew Gordon Wilson, Michael Bohlke-Schneider, and Yuyang Wang. arXiv preprint arXiv:2403.07815, 2024.
We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models. Chronos tokenizes time series values using scaling and quantization into a fixed vocabulary and trains existing transformer-based language model architectures on these tokenized time series via the cross-entropy loss. We pretrained Chronos models based on the T5 family (ranging from 20M to 710M parameters) on a large collection of publicly available datasets, complemented by a synthetic dataset that we generated via Gaussian processes to improve generalization. In a comprehensive benchmark consisting of 42 datasets, and comprising both classical local models and deep learning methods, we show that Chronos models: (a) significantly outperform other methods on datasets that were part of the training corpus; and (b) have comparable and occasionally superior zero-shot performance on new datasets, relative to methods that were trained specifically on them. Our results demonstrate that Chronos models can leverage time series data from diverse domains to improve zero-shot accuracy on unseen forecasting tasks, positioning pretrained models as a viable tool to greatly simplify forecasting pipelines.
@article{ansari2024chronos, author = {Ansari, Abdul Fatir and Stella, Lorenzo and Turkmen, Caner and Zhang, Xiyuan and Mercado, Pedro and Shen, Huibin and Shchur, Oleksandr and Rangapuram, Syama Sundar and Pineda Arango, Sebastian and Kapoor, Shubham and Zschiegner, Jasper and Maddix, Danielle C. and Mahoney, Michael W. and Torkkola, Kari and Wilson, Andrew Gordon and Bohlke-Schneider, Michael and Wang, Yuyang}, title = {{Chronos}: Learning the Language of Time Series}, journal = {arXiv preprint arXiv:2403.07815}, year = {2024}, }
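The tokenization step described in the Chronos abstract above (scaling followed by quantization into a fixed vocabulary) can be illustrated with a minimal sketch. Mean-absolute scaling, uniform bins over [-2, 2], a vocabulary of 8 tokens, and all function names here are assumptions made for this example; they are not taken from the abstract or from any released Chronos code.

```python
from bisect import bisect_right


def mean_scale(context):
    """Scale a series by its mean absolute value (assumed scaling scheme)."""
    scale = sum(abs(x) for x in context) / len(context)
    if scale == 0:
        scale = 1.0  # avoid division by zero for an all-zero series
    return [x / scale for x in context], scale


def make_bins(low, high, num_bins):
    """Return interior bin edges and bin centers for uniform bins on [low, high]."""
    width = (high - low) / num_bins
    edges = [low + i * width for i in range(1, num_bins)]
    centers = [low + (i + 0.5) * width for i in range(num_bins)]
    return edges, centers


def tokenize(scaled_values, edges):
    """Map each scaled value to a bin index; out-of-range values fall in the end bins."""
    return [bisect_right(edges, v) for v in scaled_values]


def detokenize(tokens, centers, scale):
    """Map token ids back to real values via bin centers and the stored scale."""
    return [centers[t] * scale for t in tokens]


# Hypothetical toy series, purely for illustration.
series = [10.0, 20.0, 30.0, 40.0]
scaled, scale = mean_scale(series)
edges, centers = make_bins(low=-2.0, high=2.0, num_bins=8)
tokens = tokenize(scaled, edges)
reconstructed = detokenize(tokens, centers, scale)
```

In this sketch the token ids play the role of words in a vocabulary, so a language model could be trained on them with a cross-entropy loss. Reconstruction is lossy: each in-range value is recovered only up to half a bin width times the scale.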
- [ICML Oral] Neural Continuous-Discrete State Space Models for Irregularly-Sampled Time Series. Abdul Fatir Ansari, Alvin Heng, Andre Lim, and Harold Soh. In International Conference on Machine Learning, 2023. Oral Presentation (top 2.4%).
Learning accurate predictive models of real-world dynamic phenomena (e.g., climate, biological) remains a challenging task. One key issue is that the data generated by both natural and artificial processes often comprise time series that are irregularly sampled and/or contain missing observations. In this work, we propose the Neural Continuous-Discrete State Space Model (NCDSSM) for continuous-time modeling of time series through discrete-time observations. NCDSSM employs auxiliary variables to disentangle recognition from dynamics, thus requiring amortized inference only for the auxiliary variables. Leveraging techniques from continuous-discrete filtering theory, we demonstrate how to perform accurate Bayesian inference for the dynamic states. We propose three flexible parameterizations of the latent dynamics and an efficient training objective that marginalizes the dynamic states during inference. Empirical results on multiple benchmark datasets across various domains show improved imputation and forecasting performance of NCDSSM over existing models.
@inproceedings{ansari2023neural, author = {Ansari, Abdul Fatir and Heng, Alvin and Lim, Andre and Soh, Harold}, title = {Neural Continuous-Discrete State Space Models for Irregularly-Sampled Time Series}, year = {2023}, booktitle = {International Conference on Machine Learning}, pubtype = {Oral}, note = {Oral Presentation (top 2.4\%)} }
- [NeurIPS] Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting. Marcel Kollovieh*, Abdul Fatir Ansari*, Michael Bohlke-Schneider, Jasper Zschiegner, Hao Wang, and Bernie Wang. In Neural Information Processing Systems, 2023.
Diffusion models have achieved state-of-the-art performance in generative modeling tasks across various domains. Prior works on time series diffusion models have primarily focused on developing conditional models tailored to specific forecasting or imputation tasks. In this work, we explore the potential of task-agnostic, unconditional diffusion models for several time series applications. We propose TSDiff, an unconditionally trained diffusion model for time series. Our proposed self-guidance mechanism enables conditioning TSDiff for downstream tasks during inference, without requiring auxiliary networks or altering the training procedure. We demonstrate the effectiveness of our method on three different time series tasks: forecasting, refinement, and synthetic data generation. First, we show that TSDiff is competitive with several task-specific conditional forecasting methods (predict). Second, we leverage the learned implicit probability density of TSDiff to iteratively refine the predictions of base forecasters with reduced computational overhead over reverse diffusion (refine). Notably, the generative performance of the model remains intact – downstream forecasters trained on synthetic samples from TSDiff outperform forecasters that are trained on samples from other state-of-the-art generative time series models, occasionally even outperforming models trained on real data (synthesize).
@inproceedings{kollovieh2023predict, title = {Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting}, author = {Kollovieh*, Marcel and Ansari*, Abdul Fatir and Bohlke-Schneider, Michael and Zschiegner, Jasper and Wang, Hao and Wang, Bernie}, year = {2023}, booktitle = {Neural Information Processing Systems}, }
- [arXiv] Generative Modeling with Flow-Guided Density Ratio Learning. Alvin Heng, Abdul Fatir Ansari, and Harold Soh. arXiv preprint arXiv:2303.03714, 2023.
We present Flow-Guided Density Ratio Learning (FDRL), a simple and scalable approach to generative modeling which builds on the stale (time-independent) approximation of the gradient flow of entropy-regularized f-divergences introduced in DGflow. In DGflow, the intractable time-dependent density ratio is approximated by a stale estimator given by a GAN discriminator. This is sufficient in the case of sample refinement, where the source and target distributions of the flow are close to each other. However, this assumption is invalid for generation and a naive application of the stale estimator fails due to the large chasm between the two distributions. FDRL proposes to train a density ratio estimator such that it learns from progressively improving samples during the training process. We show that this simple method alleviates the density chasm problem, allowing FDRL to generate images of dimensions as high as 128×128, as well as outperform existing gradient flow baselines on quantitative benchmarks. We also show the flexibility of FDRL with two use cases. First, unconditional FDRL can be easily composed with external classifiers to perform class-conditional generation. Second, FDRL can be directly applied to unpaired image-to-image translation with no modifications needed to the framework.
@article{heng2023generative, title = {Generative Modeling with Flow-Guided Density Ratio Learning}, author = {Heng, Alvin and Ansari, Abdul Fatir and Soh, Harold}, journal = {arXiv preprint arXiv:2303.03714}, year = {2023}, }
- [PhD Thesis] Deep Generative Modeling for Images and Time Series. Abdul Fatir Ansari. National University of Singapore, 2022. Dean’s Graduate Research Excellence Award.
@phdthesis{ansari2022deep, title = {Deep Generative Modeling for Images and Time Series}, author = {Ansari, Abdul Fatir}, year = {2022}, school = {National University of Singapore}, note = {Dean’s Graduate Research Excellence Award}, }
- [ICLR] Refining Deep Generative Models via Discriminator Gradient Flow. Abdul Fatir Ansari, Ming Liang Ang, and Harold Soh. In International Conference on Learning Representations, 2021.
Deep generative modeling has seen impressive advances in recent years, to the point where it is now commonplace to see simulated samples (e.g., images) that closely resemble real-world data. However, generation quality is generally inconsistent for any given model and can vary dramatically between samples. We introduce Discriminator Gradient flow (DGflow), a new technique that improves generated samples via the gradient flow of entropy-regularized f-divergences between the real and the generated data distributions. The gradient flow takes the form of a non-linear Fokker-Planck equation, which can be easily simulated by sampling from the equivalent McKean-Vlasov process. By refining inferior samples, our technique avoids wasteful sample rejection used by previous methods (DRS & MH-GAN). Compared to existing works that focus on specific GAN variants, we show our refinement approach can be applied to GANs with vector-valued critics and even other deep generative models such as VAEs and Normalizing Flows. Empirical results on multiple synthetic, image, and text datasets demonstrate that DGflow leads to significant improvement in the quality of generated samples for a variety of generative models, outperforming the state-of-the-art Discriminator Optimal Transport (DOT) and Discriminator Driven Latent Sampling (DDLS) methods.
@inproceedings{Ansari2021RefiningDG, title = {Refining Deep Generative Models via Discriminator Gradient Flow}, author = {Ansari, Abdul Fatir and Ang, Ming Liang and Soh, Harold}, booktitle = {International Conference on Learning Representations}, year = {2021}, }
- [NeurIPS] Deep Explicit Duration Switching Models for Time Series. Abdul Fatir Ansari*, Konstantinos Benidis*, Richard Kurle, Ali Caner Turkmen, Harold Soh, Alex Smola, Bernie Wang, and Tim Januschowski. In Neural Information Processing Systems, 2021.
Many complex time series can be effectively subdivided into distinct regimes that exhibit persistent dynamics. Discovering the switching behavior and the statistical patterns in these regimes is important for understanding the underlying dynamical system. We propose the Recurrent Explicit Duration Switching Dynamical System (RED-SDS), a flexible model that is capable of identifying both state- and time-dependent switching dynamics. State-dependent switching is enabled by a recurrent state-to-switch connection and an explicit duration count variable is used to improve the time-dependent switching behavior. We demonstrate how to perform efficient inference using a hybrid algorithm that approximates the posterior of the continuous states via an inference network and performs exact inference for the discrete switches and counts. The model is trained by maximizing a Monte Carlo lower bound of the marginal log-likelihood that can be computed efficiently as a byproduct of the inference routine. Empirical results on multiple datasets demonstrate that RED-SDS achieves considerable improvement in time series segmentation and competitive forecasting performance against the state of the art.
@inproceedings{ansari2021deep, author = {Ansari*, Abdul Fatir and Benidis*, Konstantinos and Kurle, Richard and Turkmen, Ali Caner and Soh, Harold and Smola, Alex and Wang, Bernie and Januschowski, Tim}, title = {Deep Explicit Duration Switching Models for Time Series}, year = {2021}, booktitle = {Neural Information Processing Systems}, }
- [CVPR Oral] A Characteristic Function Approach to Deep Implicit Generative Modeling. Abdul Fatir Ansari, Jonathan Scarlett, and Harold Soh. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020. Oral Presentation (top 5%).
Implicit Generative Models (IGMs) such as GANs have emerged as effective data-driven models for generating samples, particularly images. In this paper, we formulate the problem of learning an IGM as minimizing the expected distance between characteristic functions. Specifically, we minimize the distance between characteristic functions of the real and generated data distributions under a suitably chosen weighting distribution. This distance metric, which we term the characteristic function distance (CFD), can be (approximately) computed with linear time complexity in the number of samples, in contrast with the quadratic-time Maximum Mean Discrepancy (MMD). By replacing the discrepancy measure in the critic of a GAN with the CFD, we obtain a model that is simple to implement and stable to train. The proposed metric enjoys desirable theoretical properties including continuity and differentiability with respect to generator parameters, and continuity in the weak topology. We further propose a variation of the CFD in which the weighting distribution parameters are also optimized during training; this obviates the need for manual tuning, and leads to an improvement in test power relative to CFD. We demonstrate experimentally that our proposed method outperforms WGAN and MMD-GAN variants on a variety of unsupervised image generation benchmarks.
@inproceedings{Ansari2020ACF, title = {A Characteristic Function Approach to Deep Implicit Generative Modeling}, author = {Ansari, Abdul Fatir and Scarlett, Jonathan and Soh, Harold}, booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition}, year = {2020}, pages = {7476--7484}, pubtype = {Oral}, note = {Oral Presentation (top 5\%)} }
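The linear-time claim in the CFD abstract above can be illustrated with a small Monte Carlo sketch: the squared difference between two empirical characteristic functions is averaged over frequencies drawn from a weighting distribution. The restriction to one-dimensional samples, the standard normal weighting distribution, the sample sizes, and the function names are assumptions made for this example, not details from the paper.

```python
import cmath
import random


def empirical_cf(samples, t):
    """Empirical characteristic function at frequency t: mean of exp(i * t * x)."""
    return sum(cmath.exp(1j * t * x) for x in samples) / len(samples)


def cfd_squared(xs, ys, freqs):
    """Average of |phi_X(t) - phi_Y(t)|^2 over the sampled frequencies.

    Each frequency needs one pass over xs and one over ys, so the cost is
    O(len(freqs) * (len(xs) + len(ys))): linear in the number of samples.
    """
    total = 0.0
    for t in freqs:
        total += abs(empirical_cf(xs, t) - empirical_cf(ys, t)) ** 2
    return total / len(freqs)


# Hypothetical toy data: "real" samples, a matching generator, and a shifted one.
rng = random.Random(0)
real = [rng.gauss(0.0, 1.0) for _ in range(500)]
fake_close = [rng.gauss(0.0, 1.0) for _ in range(500)]
fake_far = [rng.gauss(3.0, 1.0) for _ in range(500)]

# Frequencies drawn from the (assumed) standard normal weighting distribution.
freqs = [rng.gauss(0.0, 1.0) for _ in range(32)]

d_close = cfd_squared(real, fake_close, freqs)
d_far = cfd_squared(real, fake_far, freqs)
```

Unlike a kernel MMD estimate, this sketch never compares pairs of samples. The trade-off is extra variance from sampling only a finite number of frequencies.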
- [AAAI Spotlight] Hyperprior induced unsupervised disentanglement of latent representations. Abdul Fatir Ansari and Harold Soh. In Proceedings of the AAAI Conference on Artificial Intelligence, 2019.
We address the problem of unsupervised disentanglement of latent representations learnt via deep generative models. In contrast to current approaches that operate on the evidence lower bound (ELBO), we argue that statistical independence in the latent space of VAEs can be enforced in a principled hierarchical Bayesian manner. To this effect, we augment the standard VAE with an inverse-Wishart (IW) prior on the covariance matrix of the latent code. By tuning the IW parameters, we are able to encourage (or discourage) independence in the learnt latent dimensions. Extensive experimental results on a range of datasets (2DShapes, 3DChairs, 3DFaces and CelebA) show that our approach outperforms the β-VAE and is competitive with the state-of-the-art FactorVAE. Our approach achieves significantly better disentanglement and reconstruction on a new dataset (CorrelatedEllipses) which introduces correlations between the factors of variation.
@inproceedings{ansari2019hyperprior, title = {Hyperprior induced unsupervised disentanglement of latent representations}, author = {Ansari, Abdul Fatir and Soh, Harold}, booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence}, volume = {33}, pages = {3175--3182}, year = {2019}, pubtype = {Spotlight} }