publications | Abdul Fatir Ansari

For a complete list, please check my Google Scholar.

2026

ICLR
Understanding transformers for time series: Rank structure, flow-of-ranks, and compressibility

Annan Yu, Danielle C Maddix, Boran Han, Xiyuan Zhang, Abdul Fatir Ansari, Oleksandr Shchur, Christos Faloutsos, Andrew Gordon Wilson, Michael W Mahoney, and Yuyang Wang

In The Fourteenth International Conference on Learning Representations, 2026

Transformers are widely used across data modalities, and yet the principles distilled from text models often transfer imperfectly to models trained on other modalities. In this paper, we analyze Transformers through the lens of rank structure. Our focus is on the time series setting, where the structural properties of the data differ remarkably from those of text or vision. We show that time-series embeddings, unlike text or vision, exhibit sharply decaying singular value spectra: small patch sizes and smooth continuous mappings concentrate the data into low-rank subspaces. From this, we prove that the associated Q/K/V projections admit accurate low-rank approximations, and that attention layers become compressible in proportion to the decay of the embedding spectrum. We introduce the concept of flow-of-ranks, a phenomenon by which nonlinear mixing across depth inflates the rank, explaining why early layers are most amenable to compression and why ranks grow with depth. Guided by these theoretical and empirical results, we compress Chronos, a large time series foundation model, achieving a reduction of 65% in inference time and 81% in memory, without loss of accuracy. Our findings provide principled guidance for allocating width, depth, and heads in time series foundation models, and for exploiting their inherent compressibility.
@inproceedings{yu2025understanding,
  title = {Understanding transformers for time series: Rank structure, flow-of-ranks, and compressibility},
  author = {Yu, Annan and Maddix, Danielle C and Han, Boran and Zhang, Xiyuan and Ansari, Abdul Fatir and Shchur, Oleksandr and Faloutsos, Christos and Wilson, Andrew Gordon and Mahoney, Michael W and Wang, Yuyang},
  booktitle = {The Fourteenth International Conference on Learning Representations},
  year = {2026},
  url = {https://openreview.net/forum?id=axR2KZwaD3},
}

ICLR
Understanding the Implicit Biases of Design Choices for Time Series Foundation Models

Annan Yu, Danielle C Maddix, Boran Han, Xiyuan Zhang, Abdul Fatir Ansari, Oleksandr Shchur, Christos Faloutsos, Andrew Gordon Wilson, Michael W Mahoney, and Yuyang Wang

In The Fourteenth International Conference on Learning Representations, 2026

Time series foundation models (TSFMs) are a class of potentially powerful, general-purpose tools for time series forecasting and related temporal tasks, but their behavior is strongly shaped by subtle inductive biases in their design. Rather than developing a new model and claiming that it is better than existing TSFMs, e.g., by winning on existing well-established benchmarks, our objective is to understand how the various “knobs” of the training process affect model quality. Using a mix of theory and controlled empirical evaluation, we identify several design choices (patch size, embedding choice, training objective, etc.) and show how they lead to implicit biases in fundamental model properties (temporal behavior, geometric structure, how aggressively or not the model regresses to the mean, etc.); and we show how these biases can be intuitive or very counterintuitive, depending on properties of the model and data. We also illustrate in a case study on outlier handling how multiple biases can interact in complex ways; and we discuss implications of our results for learning the bitter lesson and building TSFMs.
@inproceedings{yu2025implicitbiases,
  title = {Understanding the Implicit Biases of Design Choices for Time Series Foundation Models},
  author = {Yu, Annan and Maddix, Danielle C and Han, Boran and Zhang, Xiyuan and Ansari, Abdul Fatir and Shchur, Oleksandr and Faloutsos, Christos and Wilson, Andrew Gordon and Mahoney, Michael W and Wang, Yuyang},
  booktitle = {The Fourteenth International Conference on Learning Representations},
  year = {2026},
  url = {https://openreview.net/forum?id=5jkzTzV5Ao},
}

ICLR
Test-Time Efficient Pretrained Model Portfolios for Time Series Forecasting

Mert Kayaalp, Caner Turkmen, Oleksandr Shchur, Pedro Mercado, Abdul Fatir Ansari, Michael Bohlke-Schneider, and Bernie Wang

In The Fourteenth International Conference on Learning Representations, 2026

Is bigger always better for time series foundation models? With this question in mind, we explore an alternative to training a single, large monolithic model: building a portfolio of smaller, pretrained forecasting models. By applying ensembling or model selection over these portfolios, we achieve competitive performance on large-scale benchmarks using far fewer parameters. We explore strategies for designing such portfolios and find that collections of specialist models consistently outperform portfolios of independently trained generalists. Remarkably, we demonstrate that post-training a base model is a compute-effective approach for creating sufficiently diverse specialists, and provide evidence that ensembling and model selection are more compute-efficient than test-time fine-tuning.
@inproceedings{kayaalp2026testtime,
  title = {Test-Time Efficient Pretrained Model Portfolios for Time Series Forecasting},
  author = {Kayaalp, Mert and Turkmen, Caner and Shchur, Oleksandr and Mercado, Pedro and Ansari, Abdul Fatir and Bohlke-Schneider, Michael and Wang, Bernie},
  booktitle = {The Fourteenth International Conference on Learning Representations},
  year = {2026},
  url = {https://openreview.net/forum?id=iqUMjxfDNH},
}

2025

ICLR
Gradient-Free Generation for Hard-Constrained Systems

Chaoran Cheng, Boran Han, Danielle C. Maddix, Abdul Fatir Ansari, Andrew Stuart, Michael W. Mahoney, and Bernie Wang

In The Thirteenth International Conference on Learning Representations, 2025

Generative models that satisfy hard constraints are critical in many scientific and engineering applications, where physical laws or system requirements must be strictly respected. Many existing constrained generative models, especially those developed for computer vision, rely heavily on gradient information, which is often sparse or computationally expensive in some fields, e.g., partial differential equations (PDEs). In this work, we introduce a novel framework for adapting pre-trained, unconstrained flow-matching models to satisfy constraints exactly in a zero-shot manner without requiring expensive gradient computations or fine-tuning. Our framework, ECI sampling, alternates between extrapolation (E), correction (C), and interpolation (I) stages during each iterative sampling step of flow matching sampling to ensure accurate integration of constraint information while preserving the validity of the generation. We demonstrate the effectiveness of our approach across various PDE systems, showing that ECI-guided generation strictly adheres to physical constraints and accurately captures complex distribution shifts induced by these constraints. Empirical results demonstrate that our framework consistently outperforms baseline approaches in various zero-shot constrained generation tasks and also achieves competitive results in the regression tasks without additional fine-tuning.
@inproceedings{cheng2025gradientfree,
  title = {Gradient-Free Generation for Hard-Constrained Systems},
  author = {Cheng, Chaoran and Han, Boran and Maddix, Danielle C. and Ansari, Abdul Fatir and Stuart, Andrew and Mahoney, Michael W. and Wang, Bernie},
  booktitle = {The Thirteenth International Conference on Learning Representations},
  year = {2025},
  url = {https://openreview.net/forum?id=teE4pl9ftK},
}

AISTATS
ChronosX: Adapting Pretrained Time Series Models with Exogenous Variables

Sebastian Pineda Arango, Pedro Mercado, Shubham Kapoor, Abdul Fatir Ansari, Lorenzo Stella, Huibin Shen, Hugo Senetaire, Caner Turkmen, Oleksandr Shchur, Danielle C. Maddix, Michael Bohlke-Schneider, Bernie Wang, and Syama Sundar Rangapuram

In Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025

Covariates provide valuable information on external factors that influence time series and are critical in many real-world time series forecasting tasks. For example, in retail, covariates may indicate promotions or peak dates such as holiday seasons that heavily influence demand forecasts. Recent advances in pretraining large language model architectures for time series forecasting have led to highly accurate forecasters. However, the majority of these models do not readily use covariates as they are often specific to a certain task or domain. This paper introduces a new method to incorporate covariates into pretrained time series forecasting models. Our proposed approach incorporates covariate information into pretrained forecasting models through modular blocks that inject past and future covariate information, without necessarily modifying the pretrained model under consideration. In order to evaluate our approach, we introduce a benchmark composed of 32 different synthetic datasets with varying dynamics to evaluate the effectiveness of forecasting models with covariates. Extensive evaluations on both synthetic and real datasets show that our approach effectively incorporates covariate information into pretrained models, outperforming existing baselines.
@inproceedings{pmlr-v258-arango25a,
  title = {ChronosX: Adapting Pretrained Time Series Models with Exogenous Variables},
  author = {Arango, Sebastian Pineda and Mercado, Pedro and Kapoor, Shubham and Ansari, Abdul Fatir and Stella, Lorenzo and Shen, Huibin and Senetaire, Hugo and Turkmen, Caner and Shchur, Oleksandr and Maddix, Danielle C. and Bohlke-Schneider, Michael and Wang, Bernie and Rangapuram, Syama Sundar},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  year = {2025},
}

ICML
Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization

Luca Masserano, Abdul Fatir Ansari, Boran Han, Xiyuan Zhang, Christos Faloutsos, Michael W. Mahoney, Andrew Gordon Wilson, Youngsuk Park, Syama Sundar Rangapuram, Danielle C. Maddix, and Bernie Wang

In Forty-second International Conference on Machine Learning, 2025

How to best develop foundational models for time series forecasting remains an important open question. Tokenization is a crucial consideration in this effort: what is an effective discrete vocabulary for a real-valued sequential input? To address this question, we develop WaveToken, a wavelet-based tokenizer that allows models to learn complex representations directly in the space of time-localized frequencies. Our method first scales and decomposes the input time series, then thresholds and quantizes the wavelet coefficients, and finally pre-trains an autoregressive model to forecast coefficients for the forecast horizon. By decomposing coarse and fine structures in the inputs, wavelets provide an eloquent and compact language for time series forecasting that simplifies learning. Empirical results on a comprehensive benchmark, including 42 datasets for both in-domain and zero-shot settings, show that WaveToken: i) provides better accuracy than recently proposed foundation models for forecasting while using a much smaller vocabulary (1024 tokens), and performs on par or better than modern deep learning models trained specifically on each dataset; and ii) exhibits superior generalization capabilities, achieving the best average rank across all datasets for three complementary metrics. In addition, we show that our method can easily capture complex temporal patterns of practical relevance that are challenging for other recent pre-trained models, including trends, sparse spikes, and non-stationary time series with varying frequencies evolving over time.
@inproceedings{masserano2025enhancing,
  title = {Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization},
  author = {Masserano, Luca and Ansari, Abdul Fatir and Han, Boran and Zhang, Xiyuan and Faloutsos, Christos and Mahoney, Michael W. and Wilson, Andrew Gordon and Park, Youngsuk and Rangapuram, Syama Sundar and Maddix, Danielle C. and Wang, Bernie},
  booktitle = {Forty-second International Conference on Machine Learning},
  year = {2025},
  url = {https://openreview.net/forum?id=B6WalMoQJW},
}

arXiv
Zero-Shot Time Series Forecasting with Covariates via In-Context Learning

Andreas Auer, Raghul Parthipan, Pedro Mercado, Abdul Fatir Ansari, Lorenzo Stella, Bernie Wang, Michael Bohlke-Schneider, and Syama Sundar Rangapuram

arXiv preprint arXiv:2506.03128, 2025

Pretrained time series models, capable of zero-shot forecasting, have demonstrated significant potential in enhancing both the performance and accessibility of time series forecasting. However, existing pretrained models either do not support covariates or fail to incorporate them effectively. We introduce COSMIC, a zero-shot forecasting model that utilizes covariates via in-context learning. To address the challenge of data scarcity, we propose Informative Covariate Augmentation, which enables the training of COSMIC without requiring any datasets that include covariates. COSMIC achieves state-of-the-art performance in zero-shot forecasting, both with and without covariates. Our quantitative and qualitative analysis demonstrates that COSMIC effectively leverages covariates in zero-shot forecasting.
@article{auer2025zero,
  title = {Zero-Shot Time Series Forecasting with Covariates via In-Context Learning},
  author = {Auer, Andreas and Parthipan, Raghul and Mercado, Pedro and Ansari, Abdul Fatir and Stella, Lorenzo and Wang, Bernie and Bohlke-Schneider, Michael and Rangapuram, Syama Sundar},
  journal = {arXiv preprint arXiv:2506.03128},
  year = {2025},
}

arXiv
Does Multimodality Lead to Better Time Series Forecasting?

Xiyuan Zhang, Boran Han, Haoyang Fang, Abdul Fatir Ansari, Shuai Zhang, Danielle C Maddix, Cuixiong Hu, Andrew Gordon Wilson, Michael W Mahoney, Hao Wang, and others

arXiv preprint arXiv:2506.21611, 2025

Recently, there has been growing interest in incorporating textual information into foundation models for time series forecasting. However, it remains unclear whether and under what conditions such multimodal integration consistently yields gains. We systematically investigate these questions across a diverse benchmark of 14 forecasting tasks spanning 7 domains, including health, environment, and economics. We evaluate two popular multimodal forecasting paradigms: aligning-based methods, which align time series and text representations; and prompting-based methods, which directly prompt large language models for forecasting. Although prior works report gains from multimodal input, we find these effects are not universal across datasets and models, and multimodal methods sometimes do not outperform the strongest unimodal baselines. To understand when textual information helps, we disentangle the effects of model architectural properties and data characteristics. Our findings highlight that on the modeling side, incorporating text information is most helpful given (1) high-capacity text models, (2) comparatively weaker time series models, and (3) appropriate aligning strategies. On the data side, performance gains are more likely when (4) sufficient training data is available and (5) the text offers complementary predictive signal beyond what is already captured from the time series alone. Our empirical findings offer practical guidelines for when multimodality can be expected to aid forecasting tasks, and when it does not.
@article{zhang2025does,
  title = {Does Multimodality Lead to Better Time Series Forecasting?},
  author = {Zhang, Xiyuan and Han, Boran and Fang, Haoyang and Ansari, Abdul Fatir and Zhang, Shuai and Maddix, Danielle C and Hu, Cuixiong and Wilson, Andrew Gordon and Mahoney, Michael W and Wang, Hao and others},
  journal = {arXiv preprint arXiv:2506.21611},
  year = {2025},
}

arXiv
fev-bench: A realistic benchmark for time series forecasting

Oleksandr Shchur, Abdul Fatir Ansari, Caner Turkmen, Lorenzo Stella, Nick Erickson, Pablo Guerron, Michael Bohlke-Schneider, and Yuyang Wang

arXiv preprint arXiv:2509.26468, 2025

Benchmark quality is critical for meaningful evaluation and sustained progress in time series forecasting, particularly given the recent rise of pretrained models. Existing benchmarks often have narrow domain coverage or overlook important real-world settings, such as tasks with covariates. Additionally, their aggregation procedures often lack statistical rigor, making it unclear whether observed performance differences reflect true improvements or random variation. Many benchmarks also fail to provide infrastructure for consistent evaluation or are too rigid to integrate into existing pipelines. To address these gaps, we propose fev-bench, a benchmark comprising 100 forecasting tasks across seven domains, including 46 tasks with covariates. Supporting the benchmark, we introduce fev, a lightweight Python library for benchmarking forecasting models that emphasizes reproducibility and seamless integration with existing workflows. Using fev, fev-bench employs principled aggregation methods with bootstrapped confidence intervals to report model performance along two complementary dimensions: win rates and skill scores. We report results on fev-bench for various pretrained, statistical and baseline models, and identify promising directions for future research.
@article{shchur2025fev,
  title = {fev-bench: A realistic benchmark for time series forecasting},
  author = {Shchur, Oleksandr and Ansari, Abdul Fatir and Turkmen, Caner and Stella, Lorenzo and Erickson, Nick and Guerron, Pablo and Bohlke-Schneider, Michael and Wang, Yuyang},
  journal = {arXiv preprint arXiv:2509.26468},
  year = {2025},
}

NeurIPS
Mitra: Mixed Synthetic Priors for Enhancing Tabular Foundation Models

Xiyuan Zhang, Danielle C. Maddix, Junming Yin, Nick Erickson, Abdul Fatir Ansari, Boran Han, Shuai Zhang, Leman Akoglu, Christos Faloutsos, Michael W. Mahoney, Cuixiong Hu, Huzefa Rangwala, George Karypis, and Bernie Wang

In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2025

Since the seminal work of TabPFN, research on tabular foundation models (TFMs) based on in-context learning (ICL) has challenged long-standing paradigms in machine learning. Without seeing any real-world data, models pretrained on purely synthetic datasets generalize remarkably well across diverse datasets, often using only a moderate number of in-context examples. This shifts the focus in tabular machine learning from model architecture design to the design of synthetic datasets, or, more precisely, to the prior distributions that generate them. Yet the guiding principles for prior design remain poorly understood. This work marks the first attempt to address the gap. We systematically investigate and identify key properties of synthetic priors that allow pretrained TFMs to generalize well. Based on these insights, we introduce Mitra, a TFM trained on a curated mixture of synthetic priors selected for their diversity, distinctiveness, and performance on real-world tabular data. Mitra consistently outperforms state-of-the-art TFMs, such as TabPFNv2 and TabICL, across both classification and regression benchmarks, with better sample efficiency.
@inproceedings{zhang2025mitra,
  title = {Mitra: Mixed Synthetic Priors for Enhancing Tabular Foundation Models},
  author = {Zhang, Xiyuan and Maddix, Danielle C. and Yin, Junming and Erickson, Nick and Ansari, Abdul Fatir and Han, Boran and Zhang, Shuai and Akoglu, Leman and Faloutsos, Christos and Mahoney, Michael W. and Hu, Cuixiong and Rangwala, Huzefa and Karypis, George and Wang, Bernie},
  booktitle = {The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year = {2025},
  url = {https://openreview.net/forum?id=t8YRsWY6HM},
}

arXiv
Chronos-2: From univariate to universal forecasting

Abdul Fatir Ansari, Oleksandr Shchur, Jaris Küken, Andreas Auer, Boran Han, Pedro Mercado, Syama Sundar Rangapuram, Huibin Shen, Lorenzo Stella, Xiyuan Zhang, and others

arXiv preprint arXiv:2510.15821, 2025

4.7K GitHub stars and 15M+ Hugging Face model downloads as of Jan 2026

Pretrained time series models have enabled inference-only forecasting systems that produce accurate predictions without task-specific training. However, existing approaches largely focus on univariate forecasting, limiting their applicability in real-world scenarios where multivariate data and covariates play a crucial role. We present Chronos-2, a pretrained model capable of handling univariate, multivariate, and covariate-informed forecasting tasks in a zero-shot manner. Chronos-2 employs a group attention mechanism that facilitates in-context learning (ICL) through efficient information sharing across multiple time series within a group, which may represent sets of related series, variates of a multivariate series, or targets and covariates in a forecasting task. These general capabilities are achieved through training on synthetic datasets that impose diverse multivariate structures on univariate series. Chronos-2 delivers state-of-the-art performance across three comprehensive benchmarks: fev-bench, GIFT-Eval, and Chronos Benchmark II. On fev-bench, which emphasizes multivariate and covariate-informed forecasting, Chronos-2’s universal ICL capabilities lead to substantial improvements over existing models. On tasks involving covariates, it consistently outperforms baselines by a wide margin. Case studies in the energy and retail domains further highlight its practical advantages. The in-context learning capabilities of Chronos-2 establish it as a general-purpose forecasting model that can be used "as is" in real-world forecasting pipelines.
@article{ansari2025chronos,
  title = {Chronos-2: From univariate to universal forecasting},
  author = {Ansari, Abdul Fatir and Shchur, Oleksandr and K{\"u}ken, Jaris and Auer, Andreas and Han, Boran and Mercado, Pedro and Rangapuram, Syama Sundar and Shen, Huibin and Stella, Lorenzo and Zhang, Xiyuan and others},
  journal = {arXiv preprint arXiv:2510.15821},
  year = {2025},
  note = {4.7K GitHub stars and 15M+ Hugging Face model downloads as of Jan 2026},
}

2024

ECML PKDD
Generative Modeling with Flow-Guided Density Ratio Learning

Alvin Heng, Abdul Fatir Ansari, and Harold Soh

In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2024

We present Flow-Guided Density Ratio Learning (FDRL), a simple and scalable approach to generative modeling which builds on the stale (time-independent) approximation of the gradient flow of entropy-regularized f-divergences introduced in DGflow. In DGflow, the intractable time-dependent density ratio is approximated by a stale estimator given by a GAN discriminator. This is sufficient in the case of sample refinement, where the source and target distributions of the flow are close to each other. However, this assumption is invalid for generation and a naive application of the stale estimator fails due to the large chasm between the two distributions. FDRL proposes to train a density ratio estimator such that it learns from progressively improving samples during the training process. We show that this simple method alleviates the density chasm problem, allowing FDRL to generate images of dimensions as high as 128×128, as well as outperform existing gradient flow baselines on quantitative benchmarks. We also show the flexibility of FDRL with two use cases. First, unconditional FDRL can be easily composed with external classifiers to perform class-conditional generation. Second, FDRL can be directly applied to unpaired image-to-image translation with no modifications needed to the framework.
@inproceedings{heng2023generative,
  title = {Generative Modeling with Flow-Guided Density Ratio Learning},
  author = {Heng, Alvin and Ansari, Abdul Fatir and Soh, Harold},
  booktitle = {Joint European Conference on Machine Learning and Knowledge Discovery in Databases},
  year = {2024},
}

TMLR
Chronos: Learning the Language of Time Series

Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebastian Pineda Arango, Shubham Kapoor, Jasper Zschiegner, Danielle C. Maddix, Hao Wang, Michael W. Mahoney, Kari Torkkola, Andrew Gordon Wilson, Michael Bohlke-Schneider, and Bernie Wang

Transactions on Machine Learning Research, 2024

4.7K GitHub stars and 650M+ Hugging Face model downloads as of Jan 2026

We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models. Chronos tokenizes time series values using scaling and quantization into a fixed vocabulary and trains existing transformer-based language model architectures on these tokenized time series via the cross-entropy loss. We pretrained Chronos models based on the T5 family (ranging from 20M to 710M parameters) on a large collection of publicly available datasets, complemented by a synthetic dataset that we generated via Gaussian processes to improve generalization. In a comprehensive benchmark consisting of 42 datasets, and comprising both classical local models and deep learning methods, we show that Chronos models: (a) significantly outperform other methods on datasets that were part of the training corpus; and (b) have comparable and occasionally superior zero-shot performance on new datasets, relative to methods that were trained specifically on them. Our results demonstrate that Chronos models can leverage time series data from diverse domains to improve zero-shot accuracy on unseen forecasting tasks, positioning pretrained models as a viable tool to greatly simplify forecasting pipelines.
@article{ansari2024chronos,
  title = {Chronos: Learning the Language of Time Series},
  author = {Ansari, Abdul Fatir and Stella, Lorenzo and Turkmen, Caner and Zhang, Xiyuan and Mercado, Pedro and Shen, Huibin and Shchur, Oleksandr and Rangapuram, Syama Sundar and Pineda Arango, Sebastian and Kapoor, Shubham and Zschiegner, Jasper and Maddix, Danielle C. and Wang, Hao and Mahoney, Michael W. and Torkkola, Kari and Wilson, Andrew Gordon and Bohlke-Schneider, Michael and Wang, Bernie},
  journal = {Transactions on Machine Learning Research},
  issn = {2835-8856},
  year = {2024},
  url = {https://openreview.net/forum?id=gerNCVqqtR},
  note = {4.7K GitHub stars and 650M+ Hugging Face model downloads as of Jan 2026},
}

arXiv
Comparing and contrasting deep learning weather prediction backbones on Navier-Stokes and atmospheric dynamics

Matthias Karlbauer, Danielle C Maddix, Abdul Fatir Ansari, Boran Han, Gaurav Gupta, Yuyang Wang, Andrew Stuart, and Michael W Mahoney

arXiv preprint arXiv:2407.14129, 2024

Remarkable progress in the development of Deep Learning Weather Prediction (DLWP) models positions them to become competitive with traditional numerical weather prediction (NWP) models. Indeed, a wide number of DLWP architectures – based on various backbones, including U-Net, Transformer, Graph Neural Network (GNN), and Fourier Neural Operator (FNO) – have demonstrated their potential at forecasting atmospheric states. However, due to differences in training protocols, forecast horizons, and data choices, it remains unclear which (if any) of these methods and architectures are most suitable for weather forecasting and for future model development. Here, we step back and provide a detailed empirical analysis, under controlled conditions, comparing and contrasting the most prominent DLWP models, along with their backbones. We accomplish this by predicting synthetic two-dimensional incompressible Navier-Stokes and real-world global weather dynamics. In terms of accuracy, memory consumption, and runtime, our results illustrate various tradeoffs. For example, on synthetic data, we observe favorable performance of FNO; and on the real-world WeatherBench dataset, our results demonstrate the suitability of ConvLSTM and SwinTransformer for short-to-mid-ranged forecasts. For long-ranged weather rollouts of up to 365 days, we observe superior stability and physical soundness in architectures that formulate a spherical data representation, i.e., GraphCast and Spherical FNO. In addition, we observe that all of these model backbones "saturate," i.e., none of them exhibit so-called neural scaling, which highlights an important direction for future work on these and related models.
@article{karlbauer2024comparing,
  title = {Comparing and contrasting deep learning weather prediction backbones on {Navier-Stokes} and atmospheric dynamics},
  author = {Karlbauer, Matthias and Maddix, Danielle C and Ansari, Abdul Fatir and Han, Boran and Gupta, Gaurav and Wang, Yuyang and Stuart, Andrew and Mahoney, Michael W},
  journal = {arXiv preprint arXiv:2407.14129},
  year = {2024},
}

2023

ICML Oral
Neural Continuous-Discrete State Space Models for Irregularly-Sampled Time Series

Abdul Fatir Ansari, Alvin Heng, Andre Lim, and Harold Soh

In International Conference on Machine Learning, 2023

Oral Presentation (top 2.4%)

Learning accurate predictive models of real-world dynamic phenomena (e.g., climate, biological) remains a challenging task. One key issue is that the data generated by both natural and artificial processes often comprise time series that are irregularly sampled and/or contain missing observations. In this work, we propose the Neural Continuous-Discrete State Space Model (NCDSSM) for continuous-time modeling of time series through discrete-time observations. NCDSSM employs auxiliary variables to disentangle recognition from dynamics, thus requiring amortized inference only for the auxiliary variables. Leveraging techniques from continuous-discrete filtering theory, we demonstrate how to perform accurate Bayesian inference for the dynamic states. We propose three flexible parameterizations of the latent dynamics and an efficient training objective that marginalizes the dynamic states during inference. Empirical results on multiple benchmark datasets across various domains show improved imputation and forecasting performance of NCDSSM over existing models.
@inproceedings{ansari2023neural,
  author = {Ansari, Abdul Fatir and Heng, Alvin and Lim, Andre and Soh, Harold},
  title = {Neural Continuous-Discrete State Space Models for Irregularly-Sampled Time Series},
  year = {2023},
  booktitle = {International Conference on Machine Learning},
  pubtype = {Oral},
  note = {Oral Presentation (top 2.4\%)},
}

NeurIPS
Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting

Marcel Kollovieh*, Abdul Fatir Ansari*, Michael Bohlke-Schneider, Jasper Zschiegner, Hao Wang, and Yuyang Wang

In Neural Information Processing Systems, 2023

Diffusion models have achieved state-of-the-art performance in generative modeling tasks across various domains. Prior works on time series diffusion models have primarily focused on developing conditional models tailored to specific forecasting or imputation tasks. In this work, we explore the potential of task-agnostic, unconditional diffusion models for several time series applications. We propose TSDiff, an unconditionally trained diffusion model for time series. Our proposed self-guidance mechanism enables conditioning TSDiff for downstream tasks during inference, without requiring auxiliary networks or altering the training procedure. We demonstrate the effectiveness of our method on three different time series tasks: forecasting, refinement, and synthetic data generation. First, we show that TSDiff is competitive with several task-specific conditional forecasting methods (predict). Second, we leverage the learned implicit probability density of TSDiff to iteratively refine the predictions of base forecasters with reduced computational overhead over reverse diffusion (refine). Notably, the generative performance of the model remains intact – downstream forecasters trained on synthetic samples from TSDiff outperform forecasters that are trained on samples from other state-of-the-art generative time series models, occasionally even outperforming models trained on real data (synthesize).
@inproceedings{kollovieh2023predict, title = {Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting}, author = {Kollovieh*, Marcel and Ansari*, Abdul Fatir and Bohlke-Schneider, Michael and Zschiegner, Jasper and Wang, Hao and Wang, Yuyang}, year = {2023}, booktitle = {Neural Information Processing Systems}, }

2022

PhD Thesis

Deep Generative Modeling for Images and Time Series

Abdul Fatir Ansari

National University of Singapore, 2022

Dean’s Graduate Research Excellence Award

Bib HTML

@phdthesis{ansari2022deep,
  title = {Deep Generative Modeling for Images and Time Series},
  author = {Ansari, Abdul Fatir},
  year = {2022},
  school = {National University of Singapore},
  note = {Dean's Graduate Research Excellence Award},
}

2021

ICLR
Refining Deep Generative Models via Discriminator Gradient Flow

Abdul Fatir Ansari, Ming Liang Ang, and Harold Soh

In International Conference on Learning Representations, 2021

Abstract arXiv Bib Blog Code

Deep generative modeling has seen impressive advances in recent years, to the point where it is now commonplace to see simulated samples (e.g., images) that closely resemble real-world data. However, generation quality is generally inconsistent for any given model and can vary dramatically between samples. We introduce Discriminator Gradient flow (DGflow), a new technique that improves generated samples via the gradient flow of entropy-regularized f-divergences between the real and the generated data distributions. The gradient flow takes the form of a non-linear Fokker-Planck equation, which can be easily simulated by sampling from the equivalent McKean-Vlasov process. By refining inferior samples, our technique avoids wasteful sample rejection used by previous methods (DRS & MH-GAN). Compared to existing works that focus on specific GAN variants, we show our refinement approach can be applied to GANs with vector-valued critics and even other deep generative models such as VAEs and Normalizing Flows. Empirical results on multiple synthetic, image, and text datasets demonstrate that DGflow leads to significant improvement in the quality of generated samples for a variety of generative models, outperforming the state-of-the-art Discriminator Optimal Transport (DOT) and Discriminator Driven Latent Sampling (DDLS) methods.
@inproceedings{Ansari2021RefiningDG, title = {Refining Deep Generative Models via Discriminator Gradient Flow}, author = {Ansari, Abdul Fatir and Ang, Ming Liang and Soh, Harold}, booktitle = {International Conference on Learning Representations}, year = {2021}, }
NeurIPS
Deep Explicit Duration Switching Models for Time Series

Abdul Fatir Ansari*, Konstantinos Benidis*, Richard Kurle, Caner Turkmen, Harold Soh, Alex Smola, Yuyang Wang, and Tim Januschowski

In Neural Information Processing Systems, 2021

Abstract arXiv Bib Code

Many complex time series can be effectively subdivided into distinct regimes that exhibit persistent dynamics. Discovering the switching behavior and the statistical patterns in these regimes is important for understanding the underlying dynamical system. We propose the Recurrent Explicit Duration Switching Dynamical System (RED-SDS), a flexible model that is capable of identifying both state- and time-dependent switching dynamics. State-dependent switching is enabled by a recurrent state-to-switch connection and an explicit duration count variable is used to improve the time-dependent switching behavior. We demonstrate how to perform efficient inference using a hybrid algorithm that approximates the posterior of the continuous states via an inference network and performs exact inference for the discrete switches and counts. The model is trained by maximizing a Monte Carlo lower bound of the marginal log-likelihood that can be computed efficiently as a byproduct of the inference routine. Empirical results on multiple datasets demonstrate that RED-SDS achieves considerable improvement in time series segmentation and competitive forecasting performance against the state of the art.
@inproceedings{ansari2021deep, author = {Ansari*, Abdul Fatir and Benidis*, Konstantinos and Kurle, Richard and Turkmen, Caner and Soh, Harold and Smola, Alex and Wang, Yuyang and Januschowski, Tim}, title = {Deep Explicit Duration Switching Models for Time Series}, year = {2021}, booktitle = {Neural Information Processing Systems}, }

2020

CVPR Oral
A Characteristic Function Approach to Deep Implicit Generative Modeling

Abdul Fatir Ansari, Jonathan Scarlett, and Harold Soh

In IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Oral Presentation (top 5%)

Abstract arXiv Bib Code

Implicit Generative Models (IGMs) such as GANs have emerged as effective data-driven models for generating samples, particularly images. In this paper, we formulate the problem of learning an IGM as minimizing the expected distance between characteristic functions. Specifically, we minimize the distance between characteristic functions of the real and generated data distributions under a suitably chosen weighting distribution. This distance metric, which we term the characteristic function distance (CFD), can be (approximately) computed with linear time complexity in the number of samples, in contrast with the quadratic-time Maximum Mean Discrepancy (MMD). By replacing the discrepancy measure in the critic of a GAN with the CFD, we obtain a model that is simple to implement and stable to train. The proposed metric enjoys desirable theoretical properties including continuity and differentiability with respect to generator parameters, and continuity in the weak topology. We further propose a variation of the CFD in which the weighting distribution parameters are also optimized during training; this obviates the need for manual tuning, and leads to an improvement in test power relative to CFD. We demonstrate experimentally that our proposed method outperforms WGAN and MMD-GAN variants on a variety of unsupervised image generation benchmarks.
@inproceedings{Ansari2020ACF, title = {A Characteristic Function Approach to Deep Implicit Generative Modeling}, author = {Ansari, Abdul Fatir and Scarlett, Jonathan and Soh, Harold}, booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition}, year = {2020}, pages = {7476--7484}, pubtype = {Oral}, note = {Oral Presentation (top 5\%)} }

2019

AAAI Spotlight
Hyperprior induced unsupervised disentanglement of latent representations

Abdul Fatir Ansari and Harold Soh

In Proceedings of the AAAI Conference on Artificial Intelligence, 2019

Abstract arXiv Bib Supp Code

We address the problem of unsupervised disentanglement of latent representations learnt via deep generative models. In contrast to current approaches that operate on the evidence lower bound (ELBO), we argue that statistical independence in the latent space of VAEs can be enforced in a principled hierarchical Bayesian manner. To this effect, we augment the standard VAE with an inverse-Wishart (IW) prior on the covariance matrix of the latent code. By tuning the IW parameters, we are able to encourage (or discourage) independence in the learnt latent dimensions. Extensive experimental results on a range of datasets (2DShapes, 3DChairs, 3DFaces and CelebA) show that our approach outperforms the β-VAE and is competitive with the state-of-the-art FactorVAE. Our approach achieves significantly better disentanglement and reconstruction on a new dataset (CorrelatedEllipses) which introduces correlations between the factors of variation.
@inproceedings{ansari2019hyperprior, title = {Hyperprior induced unsupervised disentanglement of latent representations}, author = {Ansari, Abdul Fatir and Soh, Harold}, booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence}, volume = {33}, pages = {3175--3182}, year = {2019}, pubtype = {Spotlight} }