Research Focus: Guided diffusion model, scalable MoE inference (2024)

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

Research Focus: Guided diffusion model, scalable MoE inference (1)

NEW RESEARCH

MG-TSD: Advancing time series analysis with multi-granularity guided diffusion model

Diffusion probabilistic models have the capacity to generate high-fidelity samples for generative time series forecasting. However, they also present issues of instability due to their stochastic nature. In a recent article: MG-TSD: Advancing time series analysis with multi-granularity guided diffusion model, researchers from Microsoft present MG-TSD, a novel approach aimed at tackling this challenge.

The MG-TSD model employs multiple granularity levels within data to guide the learning process of diffusion models, yielding remarkable outcomes without the necessity of additional data. In the field of long-term forecasting, the researchers have established a new state-of-the-art methodology that demonstrates a notable relative improvement across six benchmarks, ranging from 4.7% to 35.8%.

The paper introducing this research: MG-TSD: Multi-Granularity Time Series Diffusion Models with Guided Learning Process(opens in new tab) (opens in new tab), was presented at ICLR 2024 (opens in new tab).

Read the article

Read the paper

NEW RESEARCH

Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference

Machine learning applications based on large language models (LLMs) have been widely deployed in consumer products. Increasing the model size and its training dataset have played an important role in this process. Since larger model size can bring higher model accuracy, it is likely that future models will also grow in size, which vastly increases the computational and memory requirements of LLMs.

Mixture-of-Experts (MoE) architecture, which can increase model size without proportionally increasing computational requirements, was designed to address this challenge. Unfortunately, MoE’s high memory demands and dynamic activation of sparse experts restrict its applicability to real-world problems. Previous solutions that offload MoE’s memory-hungry expert parameters to central processing unit (CPU) memory fall short.

In a recent paper: Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference, researchers from Microsoft address these challenges using algorithm-system co-design. Pre-gated MoE alleviates the dynamic nature of sparse expert activation, addressing the large memory footprint of MoEs while also sustaining high performance. The researchers demonstrate that pre-gated MoE improves performance, reduces graphics processing unit (GPU) memory consumption, and maintains model quality.

Read the paper

Microsoft Research Forum Episode 3

Dive into the importance of globally inclusive and equitable AI, updates on AutoGen and MatterGen, explore novel new use cases for AI, and more.

Watch on-demand

Opens in a new tab

NEW RESEARCH

What Matters in a Measure? A Perspective from Large-Scale Search Evaluation

Evaluation is a crucial aspect of information retrieval (IR) and has been thoroughly studied by academic and professional researchers for decades. Much of the research literature discusses techniques to produce a single number, reflecting the system’s performance: precision or cumulative gain, for example, or dozens of alternatives. Those techniques—metrics—are themselves evaluated, commonly by reference to sensitivity and validity.

To measure search in industry settings, many other aspects must be considered. For example, how much a metric costs; how robust it is to the happenstance of sampling; whether it is debuggable; and what is incentivized when a metric is taken as a goal. In a recent paper: What Matters in a Measure? A Perspective from Large-Scale Search Evaluation, researchers from Microsoft discuss what makes a search metric successful in large-scale settings, including factors which are not often canvassed in IR research, but which are important in “real-world” use. The researchers illustrate this discussion with examples from industrial settings and elsewhere and offer suggestions for metrics as part of a working system.

Read the paper

NEW RESEARCH

LordNet: An efficient neural network for learning to solve parametric partial differential equations without simulated data

Partial differential equations (PDEs) are ubiquitous in mathematically-oriented scientific fields, such as physics and engineering. The ability to solve PDEs accurately and efficiently can empower deep understanding of the physical world. However, in many complex PDE systems, traditional solvers are too time-consuming. Recently, deep learning-based methods including neural operators have been successfully used to provide faster PDE solvers through approximating or enhancing conventional ones. However, this requires a large amount of simulated data, which can be costly to collect. This can be avoided by learning physics from the physics-constrained loss, also known as mean squared residual (MSR) loss constructed by the discretized PDE.

In a recent paper: LordNet: An efficient neural network for learning to solve parametric partial differential equations without simulated data, researchers from Microsoft investigate the physical information in the MSR loss, or long-range entanglements. They identify the challenge: the neural network must model the long-range entanglements in the spatial domain of the PDE, whose patterns vary. To tackle the challenge, they propose LordNet, a tunable and efficient neural network for modeling various entanglements. Their tests show that Lordnet can be 40× faster than traditional PDE solvers. In addition, LordNet outperforms other modern neural network architectures in accuracy and efficiency with the smallest parameter size.

Read the paper

NEW RESEARCH

FXAM: A unified and fast interpretable model for predictive analytics

Generalized additive model (GAM) is a standard for interpretability. However, due to the one-to-many and many-to-one phenomena which appear commonly in real-world scenarios, existing GAMs have limitations to serving predictive analytics in terms of both accuracy and training efficiency. In a recent paper: FXAM: A unified and fast interpretable model for predictive analytics, researchers from Microsoft propose FXAM (Fast and eXplainable Additive Model), a unified and fast interpretable model for predictive analytics. FXAM extends GAM’s modeling capability with a unified additive model for numerical, categorical, and temporal features. FXAM conducts a novel training procedure called three-stage iteration (TSI). TSI corresponds to learning over numerical, categorical, and temporal features respectively. Each stage learns a local optimum by fixing the parameters of other stages. The researchers design joint learning over categorical features and partial learning over temporal features to achieve high accuracy and training efficiency. They show that TSI is mathematically guaranteed to converge to the global optimum. They further propose a set of optimization techniques to speed up FXAM’s training algorithm to meet the needs of interactive analysis.

Read the paper

Microsoft Research in the news

Sriram Rajamani at Microsoft Research on AI and deep tech in India

Fobes India | June 28, 2024

Sriram K Rajamani, managing director of Microsoft Research India Lab, reflects on computer science and engineering research, including how AI and LLMs can help solve local needs. Rajamani also discusses the technical aspects of how modern AI models work, and best practices from the research lab that could apply to India’s deep tech ecosystem.

View more news and awards

Opens in a new tab

Research Focus: Guided diffusion model, scalable MoE inference (2024)

NEW RESEARCH

MG-TSD: Advancing time series analysis with multi-granularity guided diffusion model

NEW RESEARCH

Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference

Microsoft Research Forum Episode 3

NEW RESEARCH

What Matters in a Measure? A Perspective from Large-Scale Search Evaluation

NEW RESEARCH

LordNet: An efficient neural network for learning to solve parametric partial differential equations without simulated data

NEW RESEARCH

FXAM: A unified and fast interpretable model for predictive analytics

Microsoft Research in the news

Sriram Rajamani at Microsoft Research on AI and deep tech in India

References