- J.-F. Ton, S. Flaxman, D. Sejdinovic, and S. Bhatt,
**Spatial Mapping with Gaussian Processes and Nonstationary Fourier Features**,*ArXiv e-prints:1711.05615*, 2017.The use of covariance kernels is ubiquitous in the field of spatial statistics. Kernels allow data to be mapped into high-dimensional feature spaces and can thus extend simple linear additive methods to nonlinear methods with higher order interactions. However, until recently, there has been a strong reliance on a limited class of stationary kernels such as the Matern or squared exponential, limiting the expressiveness of these modelling approaches. Recent machine learning research has focused on spectral representations to model arbitrary stationary kernels and introduced more general representations that include classes of nonstationary kernels. In this paper, we exploit the connections between Fourier feature representations, Gaussian processes and neural networks to generalise previous approaches and develop a simple and efficient framework to learn arbitrarily complex nonstationary kernel functions directly from the data, while taking care to avoid overfitting using state-of-the-art methods from deep learning. We highlight the very broad array of kernel classes that could be created within this framework. We apply this to a time series dataset and a remote sensing problem involving land surface temperature in Eastern Africa. We show that without increasing the computational or storage complexity, nonstationary kernels can be used to improve generalisation performance and provide more interpretable results.

@unpublished{TonFlaSejBha2017, author = {Ton, J.-F. and Flaxman, S. and Sejdinovic, D. and Bhatt, S.}, title = {{Spatial Mapping with Gaussian Processes and Nonstationary Fourier Features}}, journal = {ArXiv e-prints:1711.05615}, arxiv = {https://arxiv.org/abs/1711.05615}, year = {2017} }

- H. C. L. Law, D. J. Sutherland, D. Sejdinovic, and S. Flaxman,
**Bayesian Approaches to Distribution Regression**,*ArXiv e-prints:1705.04293*, 2017.Distribution regression has recently attracted much interest as a generic solution to the problem of supervised learning where labels are available at the group level, rather than at the individual level. Current approaches, however, do not propagate the uncertainty in observations due to sampling variability in the groups. This effectively assumes that small and large groups are estimated equally well, and should have equal weight in the final regression. We account for this uncertainty with a Bayesian distribution regression formalism, improving the robustness and performance of the model when group sizes vary. We frame our models in a neural network style, allowing for simple MAP inference using backpropagation to learn the parameters, as well as MCMC-based inference which can fully propagate uncertainty. We demonstrate our approach on illustrative toy datasets, as well as on a challenging problem of predicting age from images.

@unpublished{LawSutSejFla2017, author = {Law, H.C.L. and Sutherland, D.J. and Sejdinovic, D. and Flaxman, S.}, title = {{Bayesian Approaches to Distribution Regression}}, journal = {ArXiv e-prints:1705.04293}, arxiv = {https://arxiv.org/abs/1705.04293}, year = {2017} }

- J. Runge, D. Sejdinovic, and S. Flaxman,
**Detecting causal associations in large nonlinear time series datasets**,*ArXiv e-prints:1702.07007*, 2017.Detecting causal associations in time series datasets is a key challenge for novel insights into complex dynamical systems such as the Earth system or the human brain. Interactions in high-dimensional dynamical systems often involve time-delays, nonlinearity, and strong autocorrelations. These present major challenges for causal discovery techniques such as Granger causality leading to low detection power, biases, and unreliable hypothesis tests. Here we introduce a reliable and fast method that outperforms current approaches in detection power and scales up to high-dimensional datasets. It overcomes detection biases, especially when strong autocorrelations are present, and allows ranking associations in large-scale analyses by their causal strength. We provide mathematical proofs, evaluate our method in extensive numerical experiments, and illustrate its capabilities in a large-scale analysis of the global surface-pressure system where we unravel spurious associations and find several potentially causal links that are difficult to detect with standard methods. The broadly applicable method promises to discover novel causal insights also in many other fields of science.

@unpublished{RunSejFla2017, author = {Runge, J. and Sejdinovic, D. and Flaxman, S.}, title = {{Detecting causal associations in large nonlinear time series datasets}}, journal = {ArXiv e-prints:1702.07007}, arxiv = {https://arxiv.org/abs/1702.07007}, code = {https://github.com/jakobrunge/tigramite}, year = {2017} }

- F.-X. Briol, C. J. Oates, M. Girolami, M. A. Osborne, and D. Sejdinovic,
**Probabilistic Integration: A Role for Statisticians in Numerical Analysis?**,*ArXiv e-prints:1512.00933*, 2015.A research frontier has emerged in scientific computation, founded on the principle that numerical error entails epistemic uncertainty that ought to be subjected to statistical analysis. This viewpoint raises several interesting challenges, including the design of statistical methods that enable the coherent propagation of probabilities through a (possibly deterministic) computational pipeline. This paper examines the case for probabilistic numerical methods in statistical computation and a specific case study is presented for Markov chain and Quasi Monte Carlo methods. A probabilistic integrator is equipped with a full distribution over its output, providing a measure of epistemic uncertainty that is shown to be statistically valid at finite computational levels, as well as in asymptotic regimes. The approach is motivated by expensive integration problems, where, as in krigging, one is willing to expend cubic computational effort in order to gain uncertainty quantification. There, probabilistic integrators enjoy the "best of both worlds", leveraging the sampling efficiency of Monte Carlo methods whilst providing a principled route to assessment of the impact of numerical error on scientific conclusions. Several substantial applications are provided for illustration and critical evaluation, including examples from computer graphics and uncertainty quantification in oil reservoir modelling.

@unpublished{BriOatGirOsbSej2015, author = {Briol, F.-X. and Oates, C.J. and Girolami, M. and Osborne, M.A. and Sejdinovic, D.}, title = {{Probabilistic Integration: A Role for Statisticians in Numerical Analysis?}}, journal = {ArXiv e-prints:1512.00933}, arxiv = {http://arxiv.org/abs/1512.00933}, year = {2015} }

- H. Strathmann, D. Sejdinovic, and M. Girolami,
**Unbiased Bayes for Big Data: Paths of Partial Posteriors**,*ArXiv e-prints:1501.03326*, 2015.A key quantity of interest in Bayesian inference are expectations of functions with respect to a posterior distribution. Markov Chain Monte Carlo is a fundamental tool to consistently compute these expectations via averaging samples drawn from an approximate posterior. However, its feasibility is being challenged in the era of so called Big Data as all data needs to be processed in every iteration. Realising that such simulation is an unnecessarily hard problem if the goal is estimation, we construct a computationally scalable methodology that allows unbiased estimation of the required expectations – without explicit simulation from the full posterior. The scheme’s variance is finite by construction and straightforward to control, leading to algorithms that are provably unbiased and naturally arrive at a desired error tolerance. This is achieved at an average computational complexity that is sub-linear in the size of the dataset and its free parameters are easy to tune. We demonstrate the utility and generality of the methodology on a range of common statistical models applied to large-scale benchmark and real-world datasets.

@unpublished{StrSejGir2015, author = {Strathmann, H. and Sejdinovic, D. and Girolami, M.}, title = {{Unbiased Bayes for Big Data: Paths of Partial Posteriors}}, journal = {ArXiv e-prints:1501.03326}, arxiv = {http://arxiv.org/abs/1501.03326}, year = {2015} }

- S. Flaxman, Y. W. Teh, and D. Sejdinovic,
**Poisson Intensity Estimation with Reproducing Kernels**,*Electronic Journal of Statistics*, to appear, 2017.Despite the fundamental nature of the inhomogeneous Pois- son process in the theory and application of stochastic processes, and its attractive generalizations (e.g. Cox process), few tractable nonparametric modeling approaches of intensity functions exist, especially when observed points lie in a high-dimensional space. In this paper we develop a new, computationally tractable Reproducing Kernel Hilbert Space (RKHS) for- mulation for the inhomogeneous Poisson process. We model the square root of the intensity as an RKHS function. Whereas RKHS models used in su- pervised learning rely on the so-called representer theorem, the form of the inhomogeneous Poisson process likelihood means that the representer theorem does not apply. However, we prove that the representer theorem does hold in an appropriately transformed RKHS, guaranteeing that the optimization of the penalized likelihood can be cast as a tractable finite- dimensional problem. The resulting approach is simple to implement, and readily scales to high dimensions and large-scale datasets.

@article{FlaTehSej2017ejs, author = {Flaxman, S. and Teh, Y.W. and Sejdinovic, D.}, title = {{{Poisson Intensity Estimation with Reproducing Kernels}}}, journal = {Electronic Journal of Statistics}, year = {2017}, pages = {to appear} }

- H. C. L. Law, C. Yau, and D. Sejdinovic,
**Testing and Learning on Distributions with Symmetric Noise Invariance**, in*Advances in Neural Information Processing Systems (NIPS)*, 2017, 1343–1353.Kernel embeddings of distributions and the Maximum Mean Discrepancy (MMD), the resulting distance between distributions, are useful tools for fully nonparametric two-sample testing and learning on distributions. However, it is rarely that all possible differences between samples are of interest – discovered differences can be due to different types of measurement noise, data collection artefacts or other irrelevant sources of variability. We propose distances between distributions which encode invariance to additive symmetric noise, aimed at testing whether the assumed true underlying processes differ. Moreover, we construct invariant features of distributions, leading to learning algorithms robust to the impairment of the input distributions with symmetric additive noise. Such features lend themselves to a straightforward neural network implementation and can thus also be learned given a supervised signal.

@inproceedings{LawYauSej2017, author = {Law, H.C.L. and Yau, C. and Sejdinovic, D.}, title = {{{Testing and Learning on Distributions with Symmetric Noise Invariance}}}, booktitle = {Advances in Neural Information Processing Systems (NIPS)}, arxiv = {https://arxiv.org/abs/1703.07596}, code = {https://github.com/hcllaw/Fourier-Phase-Neural-Network}, year = {2017}, video = {https://www.youtube.com/watch?v=kKpbHXZ1qmU}, url = {https://papers.nips.cc/paper/6733-testing-and-learning-on-distributions-with-symmetric-noise-invariance}, pages = {1343--1353} }

- I. Schuster, H. Strathmann, B. Paige, and D. Sejdinovic,
**Kernel Sequential Monte Carlo**, in*European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD)*, 2017.Bayesian posterior inference with Monte Carlo methods has a fundamental role in statistics and probabilistic machine learning. Target posterior distributions arising in increasingly complex models often exhibit high degrees of nonlinearity and multimodality and pose substantial challenges to traditional samplers. We propose the Kernel Sequential Monte Carlo (KSMC) framework for building emulator models of the current particle system in a Reproducing Kernel Hilbert Space and use the emulator’s geometry to inform local proposals. KSMC is applicable when gradients are unknown or prohibitively expensive and inherits the superior performance of SMC on multi-modal targets and its ability to estimate model evidence. Strengths of the proposed methodology are demonstrated on a series of challenging synthetic and real-world examples.

@inproceedings{SchStrPaiSej2017, author = {Schuster, I. and Strathmann, H. and Paige, B. and Sejdinovic, D.}, title = {{{Kernel Sequential Monte Carlo}}}, arxiv = {http://arxiv.org/abs/1510.03105}, booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD)}, year = {2017} }

- Q. Zhang, S. Filippi, S. Flaxman, and D. Sejdinovic,
**Feature-to-Feature Regression for a Two-Step Conditional Independence Test**, in*Uncertainty in Artificial Intelligence (UAI)*, 2017.The algorithms for causal discovery and more broadly for learning the structure of graphical models require well calibrated and consistent conditional independence (CI) tests. We revisit the CI tests which are based on two-step procedures and involve regression with subsequent (unconditional) independence test (RESIT) on regression residuals and investigate the assumptions under which these tests operate. In particular, we demonstrate that when going beyond simple functional relationships with additive noise, such tests can lead to an inflated number of false discoveries. We study the relationship of these tests with those based on dependence measures using reproducing kernel Hilbert spaces (RKHS) and propose an extension of RESIT which uses RKHS-valued regression. The resulting test inherits the simple two-step testing procedure of RESIT, while giving correct Type I control and competitive power. When used as a component of the PC algorithm, the proposed test is more robust to the case where hidden variables induce a switching behaviour in the associations present in the data.

@inproceedings{ZhaFilFlaSej2017, author = {Zhang, Q. and Filippi, S. and Flaxman, S. and Sejdinovic, D.}, title = {{{Feature-to-Feature Regression for a Two-Step Conditional Independence Test}}}, booktitle = {Uncertainty in Artificial Intelligence (UAI)}, url = {http://auai.org/uai2017/proceedings/papers/250.pdf}, supplements = {http://auai.org/uai2017/proceedings/supplements/250.pdf}, code = {https://github.com/oxmlcs/kerpy}, year = {2017} }

- J. Mitrovic, D. Sejdinovic, and Y. W. Teh,
**Deep Kernel Machines via the Kernel Reparametrization Trick**, in*International Conference on Learning Representations (ICLR) - Workshop Track*, 2017.While deep neural networks have achieved state-of-the-art performance on many tasks across varied domains, they still remain black boxes whose inner workings are hard to interpret and understand. In this paper, we develop a novel method for efficiently capturing the behaviour of deep neural networks using kernels. In particular, we construct a hierarchy of increasingly complex kernels that encode individual hidden layers of the network. Furthermore, we discuss how our framework motivates a novel supervised weight initialization method that discovers highly discriminative features already at initialization.

@inproceedings{MitSejTeh2017, author = {Mitrovic, J. and Sejdinovic, D. and Teh, Y.W.}, title = {{{Deep Kernel Machines via the Kernel Reparametrization Trick}}}, openreview = {https://openreview.net/forum?id=Bkiqt3Ntg¬eId=Bkiqt3Ntg}, booktitle = {International Conference on Learning Representations (ICLR) - Workshop Track}, year = {2017} }

- S. Flaxman, Y. W. Teh, and D. Sejdinovic,
**Poisson Intensity Estimation with Reproducing Kernels**, in*International Conference on Artificial Intelligence and Statistics (AISTATS)*, 2017, PMLR 54:270–279.Despite the fundamental nature of the inhomogeneous Poisson process in the theory and application of stochastic processes, and its attractive generalizations (e.g. Cox process), few tractable nonparametric modeling approaches of intensity functions exist, especially in high dimensional settings. In this paper we develop a new, computationally tractable Reproducing Kernel Hilbert Space (RKHS) formulation for the inhomogeneous Poisson process. We model the square root of the intensity as an RKHS function. The modeling challenge is that the usual representer theorem arguments no longer apply due to the form of the inhomogeneous Poisson process likelihood. However, we prove that the representer theorem does hold in an appropriately transformed RKHS, guaranteeing that the optimization of the penalized likelihood can be cast as a tractable finite-dimensional problem. The resulting approach is simple to implement, and readily scales to high dimensions and large-scale datasets.

@inproceedings{FlaTehSej2017, author = {Flaxman, S. and Teh, Y.W. and Sejdinovic, D.}, title = {{{Poisson Intensity Estimation with Reproducing Kernels}}}, booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)}, year = {2017}, arxiv = {https://arxiv.org/abs/1610.08623}, url = {http://proceedings.mlr.press/v54/flaxman17a.html}, code = {https://bitbucket.org/flaxter/kernel-poisson}, pages = {PMLR 54:270--279} }

- Q. Zhang, S. Filippi, A. Gretton, and D. Sejdinovic,
**Large-Scale Kernel Methods for Independence Testing**,*Statistics and Computing*, 1–18, 2017.Representations of probability measures in reproducing kernel Hilbert spaces provide a flexible framework for fully nonparametric hypothesis tests of independence, which can capture any type of departure from independence, including nonlinear associations and multivariate interactions. However, these approaches come with an at least quadratic computational cost in the number of observations, which can be prohibitive in many applications. Arguably, it is exactly in such large-scale datasets that capturing any type of dependence is of interest, so striking a favourable tradeoff between computational efficiency and test performance for kernel independence tests would have a direct impact on their applicability in practice. In this contribution, we provide an extensive study of the use of large-scale kernel approximations in the context of independence testing, contrasting block-based, Nystrom and random Fourier feature approaches. Through a variety of synthetic data experiments, it is demonstrated that our novel large scale methods give comparable performance with existing methods whilst using significantly less computation time and memory.

@article{ZhaFilGreSej2017, author = {Zhang, Q. and Filippi, S. and Gretton, A. and Sejdinovic, D.}, title = {{{Large-Scale Kernel Methods for Independence Testing}}}, journal = {Statistics and Computing}, arxiv = {http://arxiv.org/abs/1606.07892}, year = {2017}, url = {http://link.springer.com/article/10.1007\%2Fs11222-016-9721-7}, doi = {10.1007/s11222-016-9721-7}, pages = {1-18}, code = {https://github.com/oxmlcs/kerpy} }

- D. Vukobratovic, D. Jakovetic, V. Skachek, D. Bajovic, D. Sejdinovic, G. Karabulut Kurt, C. Hollanti, and I. Fischer,
**CONDENSE: A Reconfigurable Knowledge Acquisition Architecture for Future 5G IoT**,*IEEE Access*, vol. 4, 3360–3378, 2016.In forthcoming years, the Internet of Things (IoT) will connect billions of smart devices generating and uploading a deluge of data to the cloud. If successfully extracted, the knowledge buried in the data can significantly improve the quality of life and foster economic growth. However, a critical bottleneck for realising the efficient IoT is the pressure it puts on the existing communication infrastructures, requiring transfer of enormous data volumes. Aiming at addressing this problem, we propose a novel architecture dubbed Condense (reconfigurable knowledge acquisition systems), which integrates the IoT-communication infrastructure into data analysis. This is achieved via the generic concept of network function computation: Instead of merely transferring data from the IoT sources to the cloud, the communication infrastructure should actively participate in the data analysis by carefully designed en-route processing. We define the Condense architecture, its basic layers, and the interactions among its constituent modules. Further, from the implementation side, we describe how Condense can be integrated into the 3rd Generation Partnership Project (3GPP) Machine Type Communications (MTC) architecture, as well as the prospects of making it a practically viable technology in a short time frame, relying on Network Function Virtualization (NFV) and Software Defined Networking (SDN). Finally, from the theoretical side, we survey the relevant literature on computing “atomic” functions in both analog and digital domains, as well as on function decomposition over networks, highlighting challenges, insights, and future directions for exploiting these techniques within practical 3GPP MTC architecture.

@article{VukJaketal2016a, author = {Vukobratovic, D. and Jakovetic, D. and Skachek, V. and Bajovic, D. and Sejdinovic, D. and Karabulut Kurt, G. and Hollanti, C. and Fischer, I.}, title = {{{CONDENSE: A Reconfigurable Knowledge Acquisition Architecture for Future 5G IoT}}}, year = {2016}, journal = {{IEEE Access}}, volume = {4}, doi = {10.1109/ACCESS.2016.2585468}, url = {http://dx.doi.org/10.1109/ACCESS.2016.2585468}, pages = {3360--3378} }

- D. Vukobratovic, D. Jakovetic, V. Skachek, D. Bajovic, and D. Sejdinovic,
**Network Function Computation as a Service in Future 5G Machine Type Communications**, in*International Symposium on Turbo Codes & Iterative Information Processing (ISTC)*, 2016, 365–369.The 3GPP machine type communications (MTC) service is expected to contribute a dominant share of the IoT traffic via the upcoming fifth generation (5G) mobile cellular systems. MTC has ambition to connect billions of devices to communicate their data to MTC applications for further processing and data analysis. However, for majority of the applications, collecting all the MTC generated data is inefficient as the data is typically fed into application-dependent functions whose outputs determine the application actions. In this paper, we present a novel MTC architecture that, instead of collecting raw large-volume MTC data, offers the network function computation (NFC) as a service. For a given application demand (function to be computed), different modules (atomic nodes) of the communication infrastructure are orchestrated into a (reconfigurable) directed network topology, and each module is assigned an appropriately defined (reconfigurable) atomic function over the input data, such that the desired global network function is evaluated over the MTC data and a requested MTC-NFC service is delivered. We detail practical viability of incorporating MTC-NFC within the existing 3GPP architecture relying on emerging concepts of Network Function Virtualization and Software Defined Networking. Finally, throughout the paper, we point to the theoretical foundations that inspired the presented architecture highlighting challenges and future directions for designing 3GPP MTC-NFC service.

@inproceedings{VukJaketal2016b, author = {Vukobratovic, D. and Jakovetic, D. and Skachek, V. and Bajovic, D. and Sejdinovic, D.}, title = {{{Network Function Computation as a Service in Future 5G Machine Type Communications}}}, booktitle = {International Symposium on Turbo Codes \& Iterative Information Processing (ISTC)}, year = {2016}, pages = {365-369}, doi = {10.1109/ISTC.2016.7593138}, url = {http://dx.doi.org/10.1109/ISTC.2016.7593138} }

- J. Mitrovic, D. Sejdinovic, and Y. W. Teh,
**DR-ABC: Approximate Bayesian Computation with Kernel-Based Distribution Regression**, in*International Conference on Machine Learning (ICML)*, 2016, PMLR 48:1482–1491.Performing exact posterior inference in complex generative models is often difficult or impossible due to an expensive to evaluate or intractable likelihood function. Approximate Bayesian computation (ABC) is an inference framework that constructs an approximation to the true likelihood based on the similarity between the observed and simulated data as measured by a predefined set of summary statistics. Although the choice of appropriate problem-specific summary statistics crucially influences the quality of the likelihood approximation and hence also the quality of the posterior sample in ABC, there are only few principled general-purpose approaches to the selection or construction of such summary statistics. In this paper, we develop a novel framework for this task using kernel-based distribution regression. We model the functional relationship between data distributions and the optimal choice (with respect to a loss function) of summary statistics using kernel-based distribution regression. We show that our approach can be implemented in a computationally and statistically efficient way using the random Fourier features framework for large-scale kernel learning. In addition to that, our framework shows superior performance when compared to related methods on toy and real-world problems.

@inproceedings{MitSejTeh2016, author = {Mitrovic, J. and Sejdinovic, D. and Teh, Y.W.}, title = {{{DR-ABC: Approximate Bayesian Computation with Kernel-Based Distribution Regression}}}, booktitle = {International Conference on Machine Learning (ICML)}, arxiv = {http://arxiv.org/abs/1602.04805}, url = {http://proceedings.mlr.press/v48/mitrovic16.html}, year = {2016}, pages = {PMLR 48:1482--1491} }

- G. Franchi, J. Angulo, and D. Sejdinovic,
**Hyperspectral Image Classification with Support Vector Machines on Kernel Distribution Embeddings**, in*IEEE International Conference on Image Processing (ICIP)*, 2016, 1898–1902.We propose a novel approach for pixel classification in hyperspectral images, leveraging on both the spatial and spectral information in the data. The introduced method relies on a recently proposed framework for learning on distributions – by representing them with mean elements in reproducing kernel Hilbert spaces (RKHS) and formulating a classification algorithm therein. In particular, we associate each pixel to an empirical distribution of its neighbouring pixels, a judicious representation of which in an RKHS, in conjunction with the spectral information contained in the pixel itself, give a new explicit set of features that can be fed into a suite of standard classification techniques – we opt for a well established framework of support vector machines (SVM). Furthermore, the computational complexity is reduced via random Fourier features formalism. We study the consistency and the convergence rates of the proposed method and the experiments demonstrate strong performance on hyperspectral data with gains in comparison to the state-of-the-art results.

@inproceedings{FraAngSej2016, author = {Franchi, G. and Angulo, J. and Sejdinovic, D.}, title = {{{Hyperspectral Image Classification with Support Vector Machines on Kernel Distribution Embeddings}}}, booktitle = {IEEE International Conference on Image Processing (ICIP)}, year = {2016}, arxiv = {http://arxiv.org/abs/1605.09136}, url = {http://dx.doi.org/10.1109/ICIP.2016.7532688}, doi = {10.1109/ICIP.2016.7532688}, pages = {1898-1902} }

- B. Paige, D. Sejdinovic, and F. Wood,
**Super-Sampling with a Reservoir**, in*Uncertainty in Artificial Intelligence (UAI)*, 2016, 567–576.We introduce an alternative to reservoir sampling, a classic and popular algorithm for drawing a fixed-size subsample from streaming data in a single pass. Rather than draw a random sample, our approach performs an online optimization which aims to select the subset which provides the best overall approximation to the full data set, as judged using a kernel two-sample test. This produces subsets which minimize the worst-case relative error when computing expectations of functions in a specified function class, using just the samples from the subset. Kernel functions are approximated using random Fourier features, and the subset of samples itself is stored in a random projection tree, allowing for an algorithm which runs in a single pass through the whole data set, with only a logarithmic time complexity in the size of the subset at each iteration. These “super-samples” subsampled from the full data provide a concise summary, as demonstrated empirically on mixture models and the MNIST dataset.

@inproceedings{PaiSejWoo2016, author = {Paige, B. and Sejdinovic, D. and Wood, F.}, title = {{{Super-Sampling with a Reservoir}}}, booktitle = {Uncertainty in Artificial Intelligence (UAI)}, year = {2016}, pages = {567--576}, url = {http://www.auai.org/uai2016/proceedings/papers/293.pdf} }

- S. Flaxman, D. Sejdinovic, J. P. Cunningham, and S. Filippi,
**Bayesian Learning of Kernel Embeddings**, in*Uncertainty in Artificial Intelligence (UAI)*, 2016, 182–191.Kernel methods are one of the mainstays of machine learning, but the problem of kernel learning remains challenging, with only a few heuristics and very little theory. This is of particular importance in methods based on estimation of kernel mean embeddings of probability measures. For characteristic kernels, which include most commonly used ones, the kernel mean embedding uniquely determines its probability measure, so it can be used to design a powerful statistical testing framework, which includes nonparametric two-sample and independence tests. In practice, however, the performance of these tests can be very sensitive to the choice of kernel and its lengthscale parameters. To address this central issue, we propose a new probabilistic model for kernel mean embeddings, the Bayesian Kernel Embedding model, combining a Gaussian process prior over the Reproducing Kernel Hilbert Space containing the mean embedding with a conjugate likelihood function, thus yielding a closed form posterior over the mean embedding. The posterior mean of our model is closely related to recently proposed shrinkage estimators for kernel mean embeddings, while the posterior uncertainty is a new, interesting feature with various possible applications. Critically for the purposes of kernel learning, our model gives a simple, closed form marginal pseudolikelihood of the observed data given the kernel hyperparameters. This marginal pseudolikelihood can either be optimized to inform the hyperparameter choice or fully Bayesian inference can be used.

@inproceedings{FlaSejCunFil2016, author = {Flaxman, S. and Sejdinovic, D. and Cunningham, J.P. and Filippi, S.}, title = {{{Bayesian Learning of Kernel Embeddings}}}, booktitle = {Uncertainty in Artificial Intelligence (UAI)}, arxiv = {http://arxiv.org/abs/1603.02160}, year = {2016}, pages = {182--191}, url = {http://www.auai.org/uai2016/proceedings/papers/145.pdf}, supplements = {http://www.auai.org/uai2016/proceedings/supp/145_supp.pdf} }

- M. Park, W. Jitkrittum, and D. Sejdinovic,
**K2-ABC: Approximate Bayesian Computation with Kernel Embeddings**, in*International Conference on Artificial Intelligence and Statistics (AISTATS)*, 2016, PMLR 51:398–407.Complicated generative models often result in a situation where computing the likelihood of observed data is intractable, while simulating from the conditional density given a parameter value is relatively easy. Approximate Bayesian Computation (ABC) is a paradigm that enables simulation-based posterior inference in such cases by measuring the similarity between simulated and observed data in terms of a chosen set of summary statistics. However, there is no general rule to construct sufficient summary statistics for complex models. Insufficient summary statistics will “leak” information, which leads to ABC algorithms yielding samples from an incorrect (partial) posterior. In this paper, we propose a fully nonparametric ABC paradigm which circumvents the need for manually selecting summary statistics. Our approach, K2-ABC, uses maximum mean discrepancy (MMD) to construct a dissimilarity measure between the observed and simulated data. The embedding of an empirical distribution of the data into a reproducing kernel Hilbert space plays a role of the summary statistic and is sufficient whenever the corresponding kernels are characteristic. Experiments on a simulated scenario and a real-world biological problem illustrate the effectiveness of the proposed algorithm.

@inproceedings{ParJitSej2016, author = {Park, M. and Jitkrittum, W. and Sejdinovic, D.}, title = {{{K2-ABC: Approximate Bayesian Computation with Kernel Embeddings}}}, booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)}, arxiv = {http://arxiv.org/abs/1502.02558}, url = {http://proceedings.mlr.press/v51/park16.html}, year = {2016}, pages = {PMLR 51:398--407}, code = {https://github.com/wittawatj/k2abc} }

- H. Strathmann, D. Sejdinovic, S. Livingstone, Z. Szabo, and A. Gretton,
**Gradient-free Hamiltonian Monte Carlo with Efficient Kernel Exponential Families**, in*Advances in Neural Information Processing Systems (NIPS)*, vol. 28, 2015, 955–963.We propose Kamiltonian Monte Carlo (KMC), a gradient-free adaptive MCMC algorithm based on Hamiltonian Monte Carlo (HMC). On target densities where HMC is unavailable due to intractable gradients, KMC adaptively learns the target’s gradient structure by fitting an exponential family model in a Reproducing Kernel Hilbert Space. Computational costs are reduced by two novel efficient approximations to this gradient. While being asymptotically exact, KMC mimics HMC in terms of sampling efficiency and offers substantial mixing improvements to state-of-the-art gradient free-samplers. We support our claims with experimental studies on both toy and real-world applications, including Approximate Bayesian Computation and exact-approximate MCMC.

@incollection{StrSejLivSzaGre2015, author = {Strathmann, H. and Sejdinovic, D. and Livingstone, S. and Szabo, Z. and Gretton, A.}, title = {{{Gradient-free Hamiltonian Monte Carlo with Efficient Kernel Exponential Families}}}, booktitle = {Advances in Neural Information Processing Systems (NIPS)}, volume = {28}, year = {2015}, pages = {955--963}, url = {http://papers.nips.cc/paper/5890-gradient-free-hamiltonian-monte-carlo-with-efficient-kernel-exponential-families}, arxiv = {http://arxiv.org/abs/1506.02564} }

- K. Chwialkowski, A. Ramdas, D. Sejdinovic, and A. Gretton,
**Fast Two-Sample Testing with Analytic Representations of Probability Measures**, in*Advances in Neural Information Processing Systems (NIPS)*, vol. 28, 2015, 1981–1989.We propose a class of nonparametric two-sample tests with a cost linear in the sample size. Two tests are given, both based on an ensemble of distances between analytic functions representing each of the distributions. The first test uses smoothed empirical characteristic functions to represent the distributions, the second uses distribution embeddings in a reproducing kernel Hilbert space. Analyticity implies that differences in the distributions may be detected almost surely at a finite number of randomly chosen locations/frequencies. The new tests are consistent against a larger class of alternatives than the previous linear-time tests based on the (non-smoothed) empirical characteristic functions, while being much faster than the current state-of-the-art quadratic-time kernel-based or energy distance-based tests. Experiments on artificial benchmarks and on challenging real-world testing problems demonstrate that our tests give a better power/time tradeoff than competing approaches, and in some cases, better outright power than even the most expensive quadratic-time tests. This performance advantage is retained even in high dimensions, and in cases where the difference in distributions is not observable with low order statistics.

@incollection{ChwRamSejGre2015, author = {Chwialkowski, K. and Ramdas, A. and Sejdinovic, D. and Gretton, A.}, title = {{{Fast Two-Sample Testing with Analytic Representations of Probability Measures}}}, booktitle = {Advances in Neural Information Processing Systems (NIPS)}, volume = {28}, year = {2015}, pages = {1981--1989}, url = {http://papers.nips.cc/paper/5685-fast-two-sample-testing-with-analytic-representations-of-probability-measures}, arxiv = {http://arxiv.org/abs/1506.04725} }

- D. Vukobratovic, D. Sejdinovic, and A. Pizurica,
**Compressed Sensing Using Sparse Binary Measurements: A Rateless Coding Perspective**, in*IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC)*, 2015.Compressed Sensing (CS) methods using sparse binary measurement matrices and iterative message-passing recovery procedures have been recently investigated due to their low computational complexity and excellent performance. Drawing much of inspiration from sparse-graph codes such as Low-Density Parity-Check (LDPC) codes, these studies use analytical tools from modern coding theory to analyze CS solutions. In this paper, we consider and systematically analyze the CS setup inspired by a class of efficient, popular and flexible sparse-graph codes called rateless codes. The proposed rateless CS setup is asymptotically analyzed using tools such as Density Evolution and EXIT charts and fine-tuned using degree distribution optimization techniques.

@inproceedings{VukSejPiz2015, author = {Vukobratovic, Dejan and Sejdinovic, Dino and Pizurica, Aleksandra}, title = {Compressed Sensing Using Sparse Binary Measurements: A Rateless Coding Perspective}, booktitle = {IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC)}, year = {2015}, doi = {10.1109/SPAWC.2015.7227005}, url = {http://dx.doi.org/10.1109/SPAWC.2015.7227005} }

- Z. Kurth-Nelson, G. Barnes, D. Sejdinovic, R. Dolan, and P. Dayan,
**Temporal structure in associative retrieval**,*eLife*, vol. 4, no. e04919, 2015.Electrophysiological data disclose rich dynamics in patterns of neural activity evoked by sensory objects. Retrieving objects from memory reinstates components of this activity. In humans, the temporal structure of this retrieved activity remains largely unexplored, and here we address this gap using the spatiotemporal precision of magnetoencephalography (MEG). In a sensory preconditioning paradigm, ’indirect’ objects were paired with ’direct’ objects to form associative links, and the latter were then paired with rewards. Using multivariate analysis methods we examined the short-time evolution of neural representations of indirect objects retrieved during reward-learning about direct objects. We found two components of the evoked representation of the indirect stimulus, 200 ms apart. The strength of retrieval of one, but not the other, representational component correlated with generalization of reward learning from direct to indirect stimuli. We suggest the temporal structure within retrieved neural representations may be key to their function.

@article{KNeBarSejDolDay2015, author = {Kurth-Nelson, Zeb and Barnes, Gareth and Sejdinovic, Dino and Dolan, Ray and Dayan, Peter}, title = {Temporal structure in associative retrieval}, volume = {4}, number = {e04919}, year = {2015}, doi = {10.7554/eLife.04919}, publisher = {eLife Sciences Publications Limited}, journal = {{eL}ife}, url = {http://dx.doi.org/10.7554/eLife.04919} }

- W. Jitkrittum, A. Gretton, N. Heess, S. M. A. Eslami, B. Lakshminarayanan, D. Sejdinovic, and Z. Szabó,
**Kernel-Based Just-In-Time Learning for Passing Expectation Propagation Messages**, in*Uncertainty in Artificial Intelligence (UAI)*, 2015.We propose an efficient nonparametric strategy for learning a message operator in expectation propagation (EP), which takes as input the set of incoming messages to a factor node, and produces an outgoing message as output. This learned operator replaces the multivariate integral required in classical EP, which may not have an analytic expression. We use kernel-based regression, which is trained on a set of probability distributions representing the incoming messages, and the associated outgoing messages. The kernel approach has two main advantages: first, it is fast, as it is implemented using a novel two-layer random feature representation of the input message distributions; second, it has principled uncertainty estimates, and can be cheaply updated online, meaning it can request and incorporate new training data when it encounters inputs on which it is uncertain. In experiments, our approach is able to solve learning problems where a single message operator is required for multiple, substantially different data sets (logistic regression for a variety of classification problems), where the ability to accurately assess uncertainty and to efficiently and robustly update the message operator are essential.

@inproceedings{JitGreHeeEslLakSejSza2015, author = {Jitkrittum, Wittawat and Gretton, Arthur and Heess, Nicolas and Eslami, S. M. Ali and Lakshminarayanan, Balaji and Sejdinovic, Dino and Szab\'{o}, Zolt\'{a}n}, booktitle = {Uncertainty in Artificial Intelligence (UAI)}, title = {{Kernel-Based Just-In-Time Learning for Passing Expectation Propagation Messages}}, arxiv = {http://arxiv.org/abs/1503.02551}, year = {2015}, code = {https://github.com/wittawatj/kernel-ep}, url = {http://auai.org/uai2015/proceedings/papers/235.pdf}, supplements = {http://auai.org/uai2015/proceedings/supp/239_supp.pdf} }

- K. Chwialkowski, D. Sejdinovic, and A. Gretton,
**A Wild Bootstrap for Degenerate Kernel Tests**, in*Advances in Neural Information Processing Systems (NIPS)*, vol. 27, 2014, 3608–3616.A wild bootstrap method for nonparametric hypothesis tests based on kernel distribution embeddings is proposed. This bootstrap method is used to construct provably consistent tests that apply to random processes, for which the naive permutation-based bootstrap fails. It applies to a large group of kernel tests based on V-statistics, which are degenerate under the null hypothesis, and non-degenerate elsewhere. To illustrate this approach, we construct a two-sample test, an instantaneous independence test and a multiple lag independence test for time series. In experiments, the wild bootstrap gives strong performance on synthetic examples, on audio data, and in performance benchmarking for the Gibbs sampler.

@incollection{ChwSejGre2014, title = {A Wild Bootstrap for Degenerate Kernel Tests}, author = {Chwialkowski, Kacper and Sejdinovic, Dino and Gretton, Arthur}, booktitle = {Advances in Neural Information Processing Systems (NIPS)}, volume = {27}, pages = {3608--3616}, year = {2014}, url = {http://papers.nips.cc/paper/5452-a-wild-bootstrap-for-degenerate-kernel-tests.pdf}, code = {https://github.com/kacperChwialkowski/wildBootstrap}, video = {http://research.microsoft.com/apps/video/?id=240378} }

- D. Sejdinovic, H. Strathmann, M. G. Lomeli, C. Andrieu, and A. Gretton,
**Kernel Adaptive Metropolis-Hastings**, in*International Conference on Machine Learning (ICML)*, 2014, 1665–1673.A Kernel Adaptive Metropolis-Hastings algorithm is introduced, for the purpose of sampling from a target distribution with strongly nonlinear support. The algorithm embeds the trajectory of the Markov chain into a reproducing kernel Hilbert space (RKHS), such that the feature space covariance of the samples informs the choice of proposal. The procedure is computationally efficient and straightforward to implement, since the RKHS moves can be integrated out analytically: our proposal distribution in the original space is a normal distribution whose mean and covariance depend on where the current sample lies in the support of the target distribution, and adapts to its local covariance structure. Furthermore, the procedure requires neither gradients nor any other higher order information about the target, making it particularly attractive for contexts such as Pseudo-Marginal MCMC. Kernel Adaptive Metropolis-Hastings outperforms competing fixed and adaptive samplers on multivariate, highly nonlinear target distributions, arising in both real-world and synthetic examples.

@inproceedings{SejStrGarAndGre14, title = {{Kernel Adaptive Metropolis-Hastings}}, author = {Sejdinovic, D. and Strathmann, H. and Lomeli, M.G. and Andrieu, C. and Gretton, A.}, booktitle = {International Conference on Machine Learning (ICML)}, year = {2014}, pages = {1665--1673}, code = {https://github.com/karlnapf/kameleon-mcmc}, arxiv = {http://arxiv.org/abs/1307.5302}, url = {http://jmlr.org/proceedings/papers/v32/sejdinovic14.pdf}, supplements = {http://jmlr.org/proceedings/papers/v32/sejdinovic14-supp.zip} }

- O. Johnson, D. Sejdinovic, J. Cruise, R. Piechocki, and A. Ganesh,
**Non-Parametric Change-Point Estimation using String Matching Algorithms**,*Methodology and Computing in Applied Probability*, vol. 16, no. 4, 987–1008, 2014.Given the output of a data source taking values in a finite alphabet, we wish to estimate change-points, that is times when the statistical properties of the source change. Motivated by ideas of match lengths in information theory, we introduce a novel non-parametric estimator which we call CRECHE (CRossings Enumeration CHange Estimator). We present simulation evidence that this estimator performs well, both for simulated sources and for real data formed by concatenating text sources. For example, we show that we can accurately estimate the point at which a source changes from a Markov chain to an IID source with the same stationary distribution. Our estimator requires no assumptions about the form of the source distribution, and avoids the need to estimate its probabilities. Further, establishing a fluid limit and using martingale arguments.

@article{JohSejCruPieGan2014, title = {{Non-Parametric Change-Point Estimation using String Matching Algorithms}}, year = {2014}, issn = {1387-5841}, journal = {Methodology and Computing in Applied Probability}, volume = {16}, number = {4}, doi = {10.1007/s11009-013-9359-2}, url = {http://dx.doi.org/10.1007/s11009-013-9359-2}, publisher = {Springer US}, author = {Johnson, Oliver and Sejdinovic, Dino and Cruise, James and Piechocki, Robert and Ganesh, Ayalvadi}, pages = {987-1008} }

- D. Sejdinovic, A. Gretton, and W. Bergsma,
**A Kernel Test for Three-Variable Interactions**, in*Advances in Neural Information Processing Systems (NIPS)*, vol. 26, 2013, 1124–1132.We introduce kernel nonparametric tests for Lancaster three-variable interaction and for total independence, using embeddings of signed measures into a reproducing kernel Hilbert space. The resulting test statistics are straightforward to compute, and are used in powerful three-variable interaction tests, which are consistent against all alternatives for a large family of reproducing kernels. We show the Lancaster test to be sensitive to cases where two independent causes individually have weak influence on a third dependent variable, but their combined effect has a strong influence. This makes the Lancaster test especially suited to finding structure in directed graphical models, where it outperforms competing nonparametric tests in detecting such V-structures.

@incollection{SejGreBer2013, title = {A Kernel Test for Three-Variable Interactions}, author = {Sejdinovic, Dino and Gretton, Arthur and Bergsma, Wicher}, booktitle = {Advances in Neural Information Processing Systems (NIPS)}, volume = {26}, pages = {1124--1132}, year = {2013}, url = {http://papers.nips.cc/paper/4893-a-kernel-test-for-three-variable-interactions.pdf}, supplements = {http://papers.nips.cc/paper/4893-a-kernel-test-for-three-variable-interactions-supplemental.zip}, code = {http://www.gatsby.ucl.ac.uk/%7Egretton/interact/threeWayInteract.htm}, arxiv = {http://arxiv.org/abs/1306.2281}, video = {http://research.microsoft.com/apps/video/default.aspx?id=206943} }

- D. Sejdinovic, B. Sriperumbudur, A. Gretton, and K. Fukumizu,
**Equivalence of distance-based and RKHS-based statistics in hypothesis testing**,*Annals of Statistics*, vol. 41, no. 5, 2263–2291, Oct. 2013.We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, maximum mean discrepancies (MMD), that is, distances between embeddings of distributions to reproducing kernel Hilbert spaces (RKHS), as established in machine learning. In the case where the energy distance is computed with a semimetric of negative type, a positive definite kernel, termed distance kernel, may be defined such that the MMD corresponds exactly to the energy distance. Conversely, for any positive definite kernel, we can interpret the MMD as energy distance with respect to some negative-type semimetric. This equivalence readily extends to distance covariance using kernels on the product space. We determine the class of probability distributions for which the test statistics are consistent against all alternatives. Finally, we investigate the performance of the family of distance kernels in two-sample and independence tests: we show in particular that the energy distance most commonly employed in statistics is just one member of a parametric family of kernels, and that other choices from this family can yield more powerful tests.

@article{SejSriGreFuk2013, author = {Sejdinovic, Dino and Sriperumbudur, Bharath and Gretton, Arthur and Fukumizu, Kenji}, doi = {10.1214/13-AOS1140}, journal = {Annals of Statistics}, month = oct, volume = {41}, number = {5}, pages = {2263--2291}, title = {{Equivalence of distance-based and RKHS-based statistics in hypothesis testing}}, url = {http://dx.doi.org/10.1214/13-AOS1140}, year = {2013}, arxiv = {http://arxiv.org/abs/1207.6076} }

- A. Gretton, B. K. Sriperumbudur, D. Sejdinovic, H. Strathmann, S. Balakrishnan, M. Pontil, and K. Fukumizu,
**Optimal Kernel Choice for Large-Scale Two-Sample Tests**, in*Advances in Neural Information Processing Systems (NIPS)*, vol. 25, 2012, 1205–1213.Given samples from distributions p and q, a two-sample test determines whether to reject the null hypothesis that p=q, based on the value of a test statistic measuring the distance between the samples. One choice of test statistic is the maximum mean discrepancy (MMD), which is a distance between embeddings of the probability distributions in a reproducing kernel Hilbert space. The kernel used in obtaining these embeddings is thus critical in ensuring the test has high power, and correctly distinguishes unlike distributions with high probability. A means of parameter selection for the two-sample test based on the MMD is proposed. For a given test level (an upper bound on the probability of making a Type I error), the kernel is chosen so as to maximize the test power, and minimize the probability of making a Type II error. The test statistic, test threshold, and optimization over the kernel parameters are obtained with cost linear in the sample size. These properties make the kernel selection and test procedures suited to data streams, where the observations cannot all be stored in memory. In experiments, the new kernel selection approach yields a more powerful test than earlier kernel selection heuristics.

@incollection{GreSriSejStrBalPonFuk2012, title = {Optimal Kernel Choice for Large-Scale Two-Sample Tests}, author = {Gretton, Arthur and Sriperumbudur, Bharath K. and Sejdinovic, Dino and Strathmann, Heiko and Balakrishnan, Sivaraman and Pontil, Massimiliano and Fukumizu, Kenji}, booktitle = {Advances in Neural Information Processing Systems (NIPS)}, volume = {25}, pages = {1205--1213}, year = {2012}, url = {http://papers.nips.cc/paper/4727-optimal-kernel-choice-for-large-scale-two-sample-tests.pdf}, code = {http://www.gatsby.ucl.ac.uk/%7Egretton/adaptMMD/adaptMMD.htm} }

- D. Sejdinovic, A. Gretton, B. K. Sriperumbudur, and K. Fukumizu,
**Hypothesis testing using pairwise distances and associated kernels**, in*International Conference on Machine Learning (ICML)*, 2012, 1111–1118.We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, distances between embeddings of distributions to reproducing kernel Hilbert spaces (RKHS), as established in machine learning. The equivalence holds when energy distances are computed with semimetrics of negative type, in which case a kernel may be defined such that the RKHS distance between distributions corresponds exactly to the energy distance. We determine the class of probability distributions for which kernels induced by semimetrics are characteristic (that is, for which embeddings of the distributions to an RKHS are injective). Finally, we investigate the performance of this family of kernels in two-sample and independence tests: we show in particular that the energy distance most commonly employed in statistics is just one member of a parametric family of kernels, and that other choices from this family can yield more powerful tests.

@inproceedings{SejGreSriFuk12, title = {{Hypothesis testing using pairwise distances and associated kernels}}, author = {Sejdinovic, D. and Gretton, Arthur and Sriperumbudur, Bharath K. and Fukumizu, Kenji}, booktitle = {International Conference on Machine Learning (ICML)}, year = {2012}, pages = {1111--1118}, url = {http://machinelearning.wustl.edu/mlpapers/paper_files/ICML2012Sejdinovic_575.pdf}, arxiv = {http://arxiv.org/abs/1205.0411}, video = {http://techtalks.tv/talks/57320/} }

- R. Piechocki and D. Sejdinovic,
**Combinatorial Channel Signature Modulation for Wireless ad-hoc Networks**, in*IEEE International Conference on Communications (ICC)*, 2012.In this paper we introduce a novel modulation and multiplexing method which facilitates highly efficient and simultaneous communication between multiple terminals in wireless ad-hoc networks. We term this method Combinatorial Channel Signature Modulation (CCSM). The CCSM method is particularly efficient in situations where communicating nodes operate in highly time dispersive environments. This is all achieved with a minimal MAC layer overhead, since all users are allowed to transmit and receive at the same time/frequency (full simultaneous duplex). The CCSM method has its roots in sparse modelling and the receiver is based on compressive sampling techniques. Towards this end, we develop a new low complexity algorithm termed Group Subspace Pursuit. Our analysis suggests that CCSM at least doubles the throughput when compared to the state-of-the art.

@inproceedings{PieSej2012, title = {{Combinatorial Channel Signature Modulation for Wireless ad-hoc Networks}}, author = {Piechocki, R. and Sejdinovic, D.}, booktitle = {IEEE International Conference on Communications (ICC)}, year = {2012}, doi = {10.1109/ICC.2012.6363956}, url = {http://dx.doi.org/10.1109/ICC.2012.6363956}, arxiv = {http://arxiv.org/abs/1201.5608}, file = {pdf/2012ICC.pdf} }

- A. Muller, D. Sejdinovic, and R. Piechocki,
**Approximate Message Passing under Finite Alphabet Constraints**, in*IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)*, 2012.In this paper we consider Basis Pursuit De-Noising (BPDN) problems in which the sparse original signal is drawn from a finite alphabet. To solve this problem we propose an iterative message passing algorithm, which capitalises not only on the sparsity but by means of a prior distribution also on the discrete nature of the original signal. In our numerical experiments we test this algorithm in combination with a Rademacher measurement matrix and a measurement matrix derived from the random demodulator, which enables compressive sampling of analogue signals. Our results show in both cases significant performance gains over a linear programming based approach to the considered BPDN problem. We also compare the proposed algorithm to a similar message passing based algorithm without prior knowledge and observe an even larger performance improvement.

@inproceedings{MulSejPie2012, title = {{Approximate Message Passing under Finite Alphabet Constraints}}, author = {Muller, A. and Sejdinovic, D. and Piechocki, R.}, booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)}, year = {2012}, doi = {10.1109/ICASSP.2012.6288590}, url = {http://dx.doi.org/10.1109/ICASSP.2012.6288590}, arxiv = {http://arxiv.org/abs/1201.4949}, file = {pdf/2012ICASSP.pdf} }

- W. Dai, D. Sejdinovic, and O. Milenkovic,
**Gaussian Dynamic Compressive Sensing**, in*International Conference on Sampling Theory and Applications (SampTA)*, 2011.We consider the problem of estimating a discrete-time sequence of sparse signals with Gaussian innovations. Instances of such problems arise in networking and imaging, in particular, dynamic and interventional MRI imaging. Our approach combines Kalman filtering and compressive sensing (CS) techniques by introducing a sparse MAP estimator for Gaussian signals, and then developing a CS-type algorithm for solving the sparse MAP problem. Despite the underlying assumption that the sequence of sparse signals is Gaussian, our approach also allows for efficient tracking of sparse non-Gaussian signals obtained via non-linear mappings, using only one sample/observation per time instance.

@inproceedings{DaiSejMil2011, title = {{Gaussian Dynamic Compressive Sensing}}, author = {Dai, W. and Sejdinovic, D. and Milenkovic, O.}, booktitle = {International Conference on Sampling Theory and Applications (SampTA)}, year = {2011}, url = {http://sampta2011.ntu.edu.sg/SampTA2011Proceedings/papers/Mo5S02.2-P0186.pdf} }

- D. Sejdinovic and O. Johnson,
**Note on noisy group testing: asymptotic bounds and belief propagation reconstruction**, in*48th Annual Allerton Conference on Communication, Control, and Computing*, 2010, 998–1003.An information theoretic perspective on group testing problems has recently been proposed by Atia and Saligrama, in order to characterise the optimal number of tests. Their results hold in the noiseless case, where only false positives occur, and where only false negatives occur. We extend their results to a model containing both false positives and false negatives, developing simple information theoretic bounds on the number of tests required. Based on these bounds, we obtain an improved order of convergence in the case of false negatives only. Since these results are based on (computationally infeasible) joint typicality decoding, we propose a belief propagation algorithm for the detection of defective items and compare its actual performance to the theoretical bounds.

@inproceedings{SejJoh2010, title = {Note on noisy group testing: asymptotic bounds and belief propagation reconstruction}, author = {Sejdinovic, D. and Johnson, O.}, booktitle = {48th Annual Allerton Conference on Communication, Control, and Computing}, pages = {998--1003}, year = {2010}, doi = {10.1109/ALLERTON.2010.5707018}, url = {http://dx.doi.org/10.1109/ALLERTON.2010.5707018}, arxiv = {http://arxiv.org/abs/1010.2441} }

- D. Sejdinovic, R. Piechocki, A. Doufexi, and M. Ismail,
**Decentralised distributed fountain coding: asymptotic analysis and design**,*IEEE Communications Letters*, vol. 14, no. 1, 42–44, 2010.A class of generic decentralised distributed fountain coding schemes is introduced and the tools of analysis of the performance of such schemes are presented. It is demonstrated that the developed approach can be used to formulate a robust code design methodology in a number of instances. We show that two non-standard applications of fountain codes, fountain codes for distributed source coding and fountain codes for unequal error protection lie within this decentralised distributed fountain coding framework.

@article{SejPieDouIsm2010, title = {Decentralised distributed fountain coding: asymptotic analysis and design}, author = {Sejdinovic, D. and Piechocki, R. and Doufexi, A. and Ismail, M.}, journal = {IEEE Communications Letters}, volume = {14}, number = {1}, pages = {42--44}, year = {2010}, file = {pdf/2010CommLetter.pdf}, doi = {10.1109/LCOMM.2010.01.091541}, url = {http://dx.doi.org/10.1109/LCOMM.2010.01.091541} }

- D. Sejdinovic, C. Andrieu, and R. Piechocki,
**Bayesian sequential compressed sensing in sparse dynamical systems**, in*48th Annual Allerton Conference on Communication, Control, and Computing*, 2010, 1730–1736.While the theory of compressed sensing provides means to reliably and efficiently acquire a sparse high-dimensional signal from a small number of its linear projections, sensing of dynamically changing sparse signals is still not well understood. We pursue a Bayesian approach to the problem of sequential compressed sensing and develop methods to recursively estimate the full posterior distribution of the signal.

@inproceedings{SejAndPie2010, title = {Bayesian sequential compressed sensing in sparse dynamical systems}, author = {Sejdinovic, D. and Andrieu, C. and Piechocki, R.}, booktitle = {48th Annual Allerton Conference on Communication, Control, and Computing}, pages = {1730--1736}, year = {2010}, doi = {10.1109/ALLERTON.2010.5707125}, url = {http://dx.doi.org/10.1109/ALLERTON.2010.5707125} }

- D. Sejdinovic, D. Vukobratovic, A. Doufexi, V. Senk, and R. Piechocki,
**Expanding window fountain codes for unequal error protection**,*IEEE Transactions on Communications*, vol. 57, no. 9, 2510–2516, 2009.A novel approach to provide unequal error protection (UEP) using rateless codes over erasure channels, named Expanding Window Fountain (EWF) codes, is developed and discussed. EWF codes use a windowing technique rather than a weighted (non-uniform) selection of input symbols to achieve UEP property. The windowing approach introduces additional parameters in the UEP rateless code design, making it more general and flexible than the weighted approach. Furthermore, the windowing approach provides better performance of UEP scheme, which is confirmed both theoretically and experimentally.

@article{SejVukDouSenPie2009, title = {Expanding window fountain codes for unequal error protection}, author = {Sejdinovic, D. and Vukobratovic, D. and Doufexi, A. and Senk, V. and Piechocki, R.}, journal = {IEEE Transactions on Communications}, volume = {57}, number = {9}, pages = {2510--2516}, year = {2009}, doi = {10.1109/TCOMM.2009.09.070616}, url = {http://dx.doi.org/10.1109/TCOMM.2009.09.070616}, file = {pdf/2009TransComm.pdf} }

- D. Vukobratovic, V. Stankovic, D. Sejdinovic, L. Stankovic, and Z. Xiong,
**Scalable video multicast using expanding window fountain codes**,*IEEE Transactions on Multimedia*, vol. 11, no. 6, 1094–1104, 2009.Fountain codes were introduced as an efficient and universal forward error correction (FEC) solution for data multicast over lossy packet networks. They have recently been proposed for large scale multimedia content delivery in practical multimedia distribution systems. However, standard fountain codes, such as LT or Raptor codes, are not designed to meet unequal error protection (UEP) requirements typical in real-time scalable video multicast applications. In this paper, we propose recently introduced UEP expanding window fountain (EWF) codes as a flexible and efficient solution for real-time scalable video multicast. We demonstrate that the design flexibility and UEP performance make EWF codes ideally suited for this scenario, i.e., EWF codes offer a number of design parameters to be tuned at the server side to meet the different reception criteria of heterogeneous receivers. The performance analysis using both analytical results and simulation experiments of H.264 scalable video coding (SVC) multicast to heterogeneous receiver classes confirms the flexibility and efficiency of the proposed EWF-based FEC solution.

@article{VukStaSejStaXio2009, title = {Scalable video multicast using expanding window fountain codes}, author = {Vukobratovic, D. and Stankovic, V. and Sejdinovic, D. and Stankovic, L. and Xiong, Z.}, journal = {IEEE Transactions on Multimedia}, volume = {11}, number = {6}, pages = {1094--1104}, year = {2009}, doi = {10.1109/TMM.2009.2026087}, url = {http://dx.doi.org/10.1109/TMM.2009.2026087}, file = {pdf/2009TransMultimedia.pdf} }

- D. Sejdinovic, R. Piechocki, A. Doufexi, and M. Ismail,
**Fountain code design for data multicast with side information**,*IEEE Transactions on Wireless Communications*, vol. 8, no. 10, 5155–5165, 2009.Fountain codes are a robust solution for data multicasting to a large number of receivers which experience variable channel conditions and different packet loss rates. However, the standard fountain code design becomes inefficient if all receivers have access to some side information correlated with the source information. We focus our attention on the cases where the correlation of the source and side information can be modelled by a binary erasure channel (BEC) or by a binary input additive white Gaussian noise channel (BIAWGNC). We analyse the performance of fountain codes in data multicasting with side information for these cases, derive bounds on their performance and provide a fast and robust linear programming optimization framework for code parameters. We demonstrate that systematic Raptor code design can be employed as a possible solution to the problem at the cost of higher encoding/decoding complexity, as it reduces the side information scenario to a channel coding problem. However, our results also indicate that a simpler solution, non-systematic LT and Raptor codes, can be designed to perform close to the information theoretic bounds.

@article{SejPieDouIsm2009, title = {Fountain code design for data multicast with side information}, author = {Sejdinovic, D. and Piechocki, R. and Doufexi, A. and Ismail, M.}, journal = {IEEE Transactions on Wireless Communications}, volume = {8}, number = {10}, pages = {5155--5165}, year = {2009}, publisher = {IEEE}, doi = {10.1109/TWC.2009.081076}, url = {http://dx.doi.org/10.1109/TWC.2009.081076}, file = {pdf/2009TransWirelessComm.pdf} }

- D. Sejdinovic, R. J. Piechocki, and A. Doufexi,
**AND-OR tree analysis of distributed LT codes**, in*IEEE Information Theory Workshop (ITW)*, 2009, 261–265.In this contribution, we consider design of distributed LT codes, i.e., independent rateless encodings of multiple sources which communicate to a common relay, where relay is able to combine incoming packets from the sources and forwards them to receivers. We provide density evolution formulae for distributed LT codes, which allow us to formulate distributed LT code design problem and prove the equivalence of performance of distributed LT codes and LT codes with related parameters in the asymptotic regime. Furthermore, we demonstrate that allowing LT coding apparatus at both the sources and the relay may prove advantageous to coding only at the sources and coding only at the relay.

@inproceedings{SejPieDou2009ITW, title = {{AND-OR tree analysis of distributed LT codes}}, author = {Sejdinovic, D. and Piechocki, R.J. and Doufexi, A.}, booktitle = {IEEE Information Theory Workshop (ITW)}, pages = {261--265}, year = {2009}, doi = {10.1109/ITWNIT.2009.5158583}, url = {http://dx.doi.org/10.1109/ITWNIT.2009.5158583}, file = {pdf/2009ITW.pdf} }

- D. Vukobratovic, V. Stankovic, L. Stankovic, and D. Sejdinovic,
**Precoded EWF codes for unequal error protection of scalable video**, in*International ICST Mobile Multimedia Communications Conference (MOBIMEDIA)*, 2009.Rateless codes are forward error correcting (FEC) codes of linear encoding-decoding complexity and asymptotically capacity-approaching performance over erasure channels with any erasure statistics. They have been recently recognized as a simple and efficient solution for packetized video transmission over networks with packet erasures. However, to adapt the error correcting capabilities of rateless codes to the unequal importance of scalable video, unequal error protection (UEP) rateless codes are proposed as an alternative to standard rateless codes. In this paper, we extend our recent work on UEP rateless codes called Expanding Window Fountain (EWF) codes in order to improve their UEP performance. We investigate the design of precoded EWF codes, where precoding is done using high-rate Low-Density Parity-Check (LDPC) codes, following the similar reasoning applied in the design of Raptor codes. The obtained results are presented in the context of UEP error correcting performance of EWF codes and applied on scalable video coded (SVC) transmission over erasure networks.

@inproceedings{VukStaStaSej2009, title = {{Precoded EWF codes for unequal error protection of scalable video}}, author = {Vukobratovic, D. and Stankovic, V. and Stankovic, L. and Sejdinovic, D.}, booktitle = {International ICST Mobile Multimedia Communications Conference (MOBIMEDIA)}, year = {2009}, url = {http://portal.acm.org/citation.cfm?id=1653559}, file = {pdf/2009MobimediaB.pdf} }

- D. Sejdinovic, R. J. Piechocki, and A. Doufexi,
**Rateless distributed source code design**, in*International ICST Mobile Multimedia Communications Conference (MOBIMEDIA)*, 2009.Over the past decade, rateless codes, i.e., digital fountain codes, have emerged as an efficient and robust solution for reliable data transmission over packet erasure networks and a particularly suitable one for multicasting and broadcasting applications where users may experience variable channel conditions and packet loss rates, such as mobile environments. Luby Transform (LT) and Raptor codes are practical fountain codes with a capacity approaching performance and a low computational cost. In addition to their channel coding applications, the use of fountain codes for various kinds of distributed source compression and distributed joint-source channel coding has been extensively studied lately, and with promising results. However, a systematic treatise of the code design and optimization considerations for such non-standard applications of fountain codes is still absent. In this contribution, we overview the main results concerned with rateless codes for distributed source coding and outline several examples of data dissemination protocols where carefully designed fountain codes can provide strikingly simple, yet robust solutions yielding both distributed source coding and channel coding gains.

@inproceedings{SejPieDou2009, title = {Rateless distributed source code design}, author = {Sejdinovic, D. and Piechocki, R.J. and Doufexi, A.}, booktitle = {International ICST Mobile Multimedia Communications Conference (MOBIMEDIA)}, year = {2009}, url = {http://portal.acm.org/citation.cfm?id=1653578}, file = {pdf/2009MobimediaA.pdf} }

- D. Sejdinovic,
**Topics in Fountain Coding**, PhD thesis, University of Bristol, 2009.The invention of the sparse graph codes, error correction codes with low complexity and rates close to capacity, has had an unrivaled impact on digital communication systems. A recent advance in the sparse graph codes, fountain coding, due to its natural rate adaptivity, is becoming an error correction coding scheme of choice for many multicasting and broadcasting systems. This thesis studies the use of fountain codes for several non-standard coding problems commonly occuring in communications. Generic decentralised distributed fountain coding schemes for networked communications are developed, discussed and analysed, where many non-cooperating source nodes communicate possibly correlated data to a large number of receivers. Several results concerning the generalised asymptotic analysis of the fountain decoder in this decentralised and distributed coding setting are presented. The problem of fountain codes with unequal error protection property is explored, where a novel class of fountain codes, Expanding Window Fountain (EWF) codes, is proposed, analysed and shown to offer competitive performance applicable to scalable video multicasting. Further, asymptotic analysis, code design and optimisation are derived for both symmetric and asymmetric Slepian-Wolf coding with fountain codes. It is shown how one can obtain both channel coding and distributed source coding gains with the same fountain coding scheme, by a judicious choice of the code parameters. The developed methods of asymptotic analysis are extended to the problem of independent fountain encodings at multiple source nodes which communicate to a common relay. It is shown that the re-encoding of the multiple fountain encoded bitstreams at the relay node with another fountain code may reduce the number of required transmissions, and the overall code optimisation methods of such schemes are derived. Finally, dual fountain codes are introduced and equipped with a low complexity quantisation algorithm for a lossy source coding problem dual to binary erasure channel coding.

@phdthesis{Sej2009, title = {Topics in Fountain Coding}, author = {Sejdinovic, D.}, year = {2009}, school = {University of Bristol}, file = {pdf/PhD_TopicsInFountainCoding.pdf} }

- D. Vukobratovic, V. Stankovic, D. Sejdinovic, L. Stankovic, and Z. Xiong,
**Expanding window fountain codes for scalable video multicast**, in*IEEE International Conference on Multimedia and Expo (ICME)*, 2008, 77–80.Digital Fountain (DF) codes have recently been suggested as an efficient forward error correction (FEC) solution for video multicast to heterogeneous receiver classes over lossy packet networks. However, to adapt DF codes to low-delay constraints and varying importance of scalable multimedia content, unequal error protection (UEP) DF schemes are needed. Thus, in this paper, Expanding Window Fountain (EWF) codes are proposed as a FEC solution for scalable video multicast. We demonstrate that the design flexibility and UEP performance make EWF codes ideally suited for this scenario, i.e., EWF codes offer a number of design parameters to be ldquotunedrdquo at the server side to meet the different reception conditions of heterogeneous receivers. Performance analysis of H.264 Scalable Video Coding (SVC) multicast to heterogeneous receiver classes confirms the flexibility and efficiency of the proposed EWF-based FEC solution.

@inproceedings{VukStaSejStaXio2008, title = {Expanding window fountain codes for scalable video multicast}, author = {Vukobratovic, D. and Stankovic, V. and Sejdinovic, D. and Stankovic, L. and Xiong, Z.}, booktitle = {IEEE International Conference on Multimedia and Expo (ICME)}, pages = {77--80}, year = {2008}, doi = {10.1109/ICME.2008.4607375}, url = {http://dx.doi.org/10.1109/ICME.2008.4607375} }

- D. Sejdinovic, R. J. Piechocki, A. Doufexi, and M. Ismail,
**Fountain coding with decoder side information**, in*IEEE International Conference on Communications (ICC)*, 2008, 4477–4482.In this contribution, we consider the application of digital fountain (DF) codes to the problem of data transmission when side information is available at the decoder. The side information is modelled as a "virtual" channel output when original information sequence is the input. For two cases of the system model, which model both the virtual and the actual transmission channel either as a binary erasure channel or as a binary input additive white Gaussian noise (BIAWGN) channel, we propose methods of enhancing the design of standard non-systematic DF codes by optimizing their output degree distribution based on the side information assumption. In addition, a systematic Raptor design has been employed as a possible solution to the problem.

@inproceedings{SejPieDouIsm2008ICC, title = {Fountain coding with decoder side information}, author = {Sejdinovic, D. and Piechocki, R.J. and Doufexi, A. and Ismail, M.}, booktitle = {IEEE International Conference on Communications (ICC)}, pages = {4477--4482}, year = {2008}, doi = {10.1109/ICC.2008.840}, url = {http://dx.doi.org/10.1109/ICC.2008.840} }

- D. Sejdinovic, V. Ponnampalam, R. J. Piechocki, and A. Doufexi,
**The throughput analysis of different IR-HARQ schemes based on fountain codes**, in*IEEE Wireless Communications and Networking Conference (WCNC)*, 2008, 267–272.In this contribution, we construct two novel IR-HARQ (automatic repeat request) schemes based on fountain codes, which combine the punctured and rateless IR-HARQ schemes, in order to attain the advantageous properties of both: nearly optimal performance of the former at the high signal-to-noise ratio (SNR) region and ratelessness of the latter. The preliminary simulation results indicate that these schemes are particularly suitable for scenarios where the transmission is originally assumed to occur at the very high SNR region, but resilience to severe deterioration of channel conditions is required.

@inproceedings{SejPonPieDou2008, title = {The throughput analysis of different IR-HARQ schemes based on fountain codes}, author = {Sejdinovic, D. and Ponnampalam, V. and Piechocki, R.J. and Doufexi, A.}, booktitle = {IEEE Wireless Communications and Networking Conference (WCNC)}, pages = {267--272}, year = {2008}, doi = {10.1109/WCNC.2008.52}, url = {http://dx.doi.org/10.1109/WCNC.2008.52}, file = {pdf/2008WCNC.pdf} }

- D. Sejdinovic, R. J. Piechocki, A. Doufexi, and M. Ismail,
**Rate adaptive binary erasure quantization with dual fountain codes**, in*IEEE Global Telecommunications Conference (GLOBECOM)*, 2008.In this contribution, duals of fountain codes are introduced and their use for lossy source compression is investigated. It is shown both theoretically and experimentally that the source coding dual of the binary erasure channel coding problem, binary erasure quantization, is solved at a nearly optimal rate with application of duals of LT and raptor codes by a belief propagation-like algorithm which amounts to a graph pruning procedure. Furthermore, this quantizing scheme is rate adaptive, i.e., its rate can be modified on-the-fly in order to adapt to the source distribution, very much like LT and raptor codes are able to adapt their rate to the erasure probability of a channel.

@inproceedings{SejPieDouIsm2008, title = {Rate adaptive binary erasure quantization with dual fountain codes}, author = {Sejdinovic, D. and Piechocki, R.J. and Doufexi, A. and Ismail, M.}, booktitle = {IEEE Global Telecommunications Conference (GLOBECOM)}, year = {2008}, doi = {10.1109/GLOCOM.2008.ECP.238}, url = {http://dx.doi.org/10.1109/GLOCOM.2008.ECP.238}, file = {pdf/2008Globecom.pdf} }

- D. Vukobratovic, V. Stankovic, D. Sejdinovic, L. Stankovic, and Z. Xiong,
**Scalable data multicast using expanding window fountain codes**, in*45th Annual Allerton Conference on Communication, Control, and Computing*, 2007.Digital Fountain (DF) codes were introduced as an efficient and universal Forward Error Correction (FEC) solution for data multicast over lossy packet networks. However, in real-time applications, the DF encoder cannot make use of the “rateless” property as it was proposed in the DF framework, due to its delay constraints. In this scenario, many receivers might not be able to collect enough encoded symbols (packets) to perform succesful decoding of the source data block (e.g., they are connected as a low bit-rate receivers to a high bit-rate source stream, or they are affected by severe channel conditions). This paper proposes an application of recently introduced Expanding Window Fountain (EWF) codes as a scalable and efficient solution for real-time multicast data transmission. We show that, by carefully optimizing EWF code design parameters, it is possible to design a flexible DF solution that is capable of satisfying multicast data receivers over a wide range of data rates and/or erasure channel conditions.

@inproceedings{VukStaSejStaXio2007, title = {Scalable data multicast using expanding window fountain codes}, author = {Vukobratovic, D. and Stankovic, V. and Sejdinovic, D. and Stankovic, L. and Xiong, Z.}, booktitle = {45th Annual Allerton Conference on Communication, Control, and Computing}, year = {2007}, file = {pdf/2007Allerton.pdf} }

- D. Sejdinovic, D. Vukobratovic, A. Doufexi, V. Senk, and R. Piechocki,
**Expanding window fountain codes for unequal error protection**, in*Asilomar Conference on Signals, Systems and Computers*, 2007, 1020–1024.A novel approach to provide unequal error protection (UEP) using rateless codes over erasure channels, named Expanding Window Fountain (EWF) codes, is developed and discussed. EWF codes use a windowing technique rather than a weighted (non-uniform) selection of input symbols to achieve UEP property. The windowing approach introduces additional parameters in the UEP rateless code design, making it more general and flexible than the weighted approach. Furthermore, the windowing approach provides better performance of UEP scheme, which is confirmed both theoretically and experimentally.

@inproceedings{SejVukDouSenPie2007, title = {Expanding window fountain codes for unequal error protection}, author = {Sejdinovic, D. and Vukobratovic, D. and Doufexi, A. and Senk, V. and Piechocki, R.}, booktitle = {Asilomar Conference on Signals, Systems and Computers}, doi = {10.1109/ACSSC.2007.4487375}, url = {http://dx.doi.org/10.1109/ACSSC.2007.4487375}, year = {2007}, pages = {1020--1024}, file = {pdf/2007Asilomar.pdf} }

*Built with jekyll-scholar.*