# Publications

### Preprints

1. J.-F. Ton, D. Sejdinovic, and K. Fukumizu, Meta Learning for Causal Direction, ArXiv e-prints:2007.02809, 2020.

The inaccessibility of controlled randomized trials due to inherent constraints in many fields of science has been a fundamental issue in causal inference. In this paper, we focus on distinguishing the cause from effect in the bivariate setting under limited observational data. Based on recent developments in meta learning as well as in causal inference, we introduce a novel generative model that allows distinguishing cause and effect in the small data setting. Using a learnt task variable that contains distributional information of each dataset, we propose an end-to-end algorithm that makes use of similar training datasets at test time. We demonstrate our method on various synthetic as well as real-world data and show that it is able to maintain high accuracy in detecting directions across varying dataset sizes.

@unpublished{TonSejFuk2020,
author = {Ton, Jean-Francois and Sejdinovic, Dino and Fukumizu, Kenji},
title = {{{Meta Learning for Causal Direction}}},
journal = {ArXiv e-prints:2007.02809},
arxiv = {https://arxiv.org/abs/2007.02809},
year = {2020}
}

2. S. L. Chau, J. Gonzalez, and D. Sejdinovic, Learning Inconsistent Preferences with Kernel Methods, ArXiv e-prints:2006.03847, 2020.

We propose a probabilistic kernel approach for preferential learning from pairwise duelling data using Gaussian Processes. Different from previous methods, we do not impose a total order on the item space, hence can capture more expressive latent preferential structures such as inconsistent preferences and clusters of comparable items. Furthermore, we prove the universality of the proposed kernels, i.e. that the corresponding reproducing kernel Hilbert Space (RKHS) is dense in the space of skew-symmetric preference functions. To conclude the paper, we provide an extensive set of numerical experiments on simulated and real-world datasets showcasing the competitiveness of our proposed method with state-of-the-art.

@unpublished{ChaGonSej2020,
author = {Chau, Siu Lun and Gonzalez, Javier and Sejdinovic, Dino},
title = {{{Learning Inconsistent Preferences with Kernel Methods}}},
journal = {ArXiv e-prints:2006.03847},
arxiv = {https://arxiv.org/abs/2006.03847},
year = {2020}
}

3. D. Rindt, D. Sejdinovic, and D. Steinsaltz, Consistency of permutation tests for HSIC and dHSIC, ArXiv e-prints:2005.06573, 2020.

The Hilbert–Schmidt Independence Criterion (HSIC) is a popular measure of the dependency between two random variables. The statistic dHSIC is an extension of HSIC that can be used to test joint independence of d random variables. Such hypothesis testing for (joint) independence is often done using a permutation test, which compares the observed data with randomly permuted datasets. The main contribution of this work is proving that the power of such independence tests converges to 1 as the sample size converges to infinity. This answers a question that was asked in (Pfister, 2018). Additionally, this work proves correct type 1 error rate of HSIC and dHSIC permutation tests and provides guidance on how to select the number of permutations one uses in practice. While correct type 1 error rate was already proved in (Pfister, 2018), we provide a modified proof following (Berrett, 2019), which extends to the case of non-continuous data. The number of permutations to use was studied e.g. by (Marozzi, 2004) but not in the context of HSIC and with a slight difference in the estimate of the p-value and for permutations rather than vectors of permutations. While the last two points have limited novelty we include these to give a complete overview of permutation testing in the context of HSIC and dHSIC.

@unpublished{RinSejSte2020,
author = {Rindt, David and Sejdinovic, Dino and Steinsaltz, David},
title = {{{Consistency of permutation tests for HSIC and dHSIC}}},
journal = {ArXiv e-prints:2005.06573},
arxiv = {https://arxiv.org/abs/2005.06573},
year = {2020}
}

4. S. L. Chau, M. Cucuringu, and D. Sejdinovic, Spectral Ranking with Covariates, ArXiv e-prints:2005.04035, 2020.

We consider approaches to the classical problem of establishing a statistical ranking on a given set of items from incomplete and noisy pairwise comparisons, and propose spectral algorithms able to leverage available covariate information about the items. We give a comprehensive study of several ways such side information can be useful in spectral ranking. We establish connections of the resulting algorithms to reproducing kernel Hilbert spaces and associated dependence measures, along with an extension to fair ranking using statistical parity. We present an extensive set of numerical experiments showcasing the competitiveness of the proposed algorithms with state-of-the-art methods.

@unpublished{ChaCucSej2020,
title = {{{Spectral Ranking with Covariates}}},
author = {Chau, Siu Lun and Cucuringu, Mihai and Sejdinovic, Dino},
journal = {ArXiv e-prints:2005.04035},
arxiv = {https://arxiv.org/abs/2005.04035},
year = {2020}
}

5. Q. Zhang, S. Filippi, S. Flaxman, and D. Sejdinovic, Bayesian Kernel Two-Sample Testing, ArXiv e-prints:2002.05550, 2020.

In modern data analysis, nonparametric measures of discrepancies between random variables are particularly important. The subject is well-studied in the frequentist literature, while the development in the Bayesian setting is limited where applications are often restricted to univariate cases. Here, we propose a Bayesian kernel two-sample testing procedure based on modelling the difference between kernel mean embeddings in the reproducing kernel Hilbert space utilising the framework established by Flaxman et al (2016). The use of kernel methods enables its application to random variables in generic domains beyond the multivariate Euclidean spaces. The proposed procedure results in a posterior inference scheme that allows an automatic selection of the kernel parameters relevant to the problem at hand. In a series of synthetic experiments and two real data experiments (i.e. testing network heterogeneity from high-dimensional data and six-membered monocyclic ring conformation comparison), we illustrate the advantages of our approach.

@unpublished{ZhaFilFlaSej2020,
title = {{{Bayesian Kernel Two-Sample Testing}}},
author = {Zhang, Q. and Filippi, S. and Flaxman, S. and Sejdinovic, D.},
journal = {ArXiv e-prints:2002.05550},
arxiv = {https://arxiv.org/abs/2002.05550},
year = {2020}
}

6. R. Hu, G. K. Nicholls, and D. Sejdinovic, Large Scale Tensor Regression using Kernels and Variational Inference, ArXiv e-prints:2002.04704, 2020.

We outline an inherent weakness of tensor factorization models when latent factors are expressed as a function of side information and propose a novel method to mitigate this weakness. We coin our method Kernel Fried Tensor (KFT) and present it as a large scale forecasting tool for high dimensional data. Our results show superior performance against LightGBM and Field Aware Factorization Machines (FFM), two algorithms with proven track records widely used in industrial forecasting. We also develop a variational inference framework for KFT and associate our forecasts with calibrated uncertainty estimates on three large scale datasets. Furthermore, KFT is empirically shown to be robust against uninformative side information in terms of constants and Gaussian noise.

@unpublished{HuNicSej2020,
title = {{{Large Scale Tensor Regression using Kernels and Variational Inference}}},
author = {Hu, R. and Nicholls, G. K. and Sejdinovic, D.},
journal = {ArXiv e-prints:2002.04704},
arxiv = {https://arxiv.org/abs/2002.04704},
year = {2020}
}

7. N. M. van Esbroeck, D. T. Lennon, H. Moon, V. Nguyen, F. Vigneau, L. C. Camenzind, L. Yu, D. M. Zumbühl, G. A. D. Briggs, D. Sejdinovic, and N. Ares, Quantum device fine-tuning using unsupervised embedding learning, ArXiv e-prints:2001.04409, 2020.

Quantum devices with a large number of gate electrodes allow for precise control of device parameters. This capability is hard to fully exploit due to the complex dependence of these parameters on applied gate voltages. We experimentally demonstrate an algorithm capable of fine-tuning several device parameters at once. The algorithm acquires a measurement and assigns it a score using a variational auto-encoder. Gate voltage settings are set to optimise this score in real-time in an unsupervised fashion. We report fine-tuning times of a double quantum dot device within approximately 40 min.

@unpublished{esbroeck2020quantum,
title = {{{Quantum device fine-tuning using unsupervised embedding learning}}},
author = {van Esbroeck, N. M. and Lennon, D. T. and Moon, H. and Nguyen, V. and Vigneau, F. and Camenzind, L. C. and Yu, L. and Zumb\"uhl, D. M. and Briggs, G. A. D. and Sejdinovic, D. and Ares, N.},
journal = {ArXiv e-prints:2001.04409},
arxiv = {https://arxiv.org/abs/2001.04409},
year = {2020}
}

8. H. Moon, D. T. Lennon, J. Kirkpatrick, N. M. van Esbroeck, L. C. Camenzind, L. Yu, F. Vigneau, D. M. Zumbühl, G. A. D. Briggs, M. A. Osborne, D. Sejdinovic, E. A. Laird, and N. Ares, Machine learning enables completely automatic tuning of a quantum device faster than human experts, ArXiv e-prints:2001.02589, 2020.

Device variability is a bottleneck for the scalability of semiconductor quantum devices. Increasing device control comes at the cost of a large parameter space that has to be explored in order to find the optimal operating conditions. We demonstrate a statistical tuning algorithm that navigates this entire parameter space, using just a few modelling assumptions, in the search for specific electron transport features. We focused on gate-defined quantum dot devices, demonstrating fully automated tuning of two different devices to double quantum dot regimes in an up to eight-dimensional gate voltage space. We considered a parameter space defined by the maximum range of each gate voltage in these devices, demonstrating expected tuning in under 70 minutes. This performance exceeded a human benchmark, although we recognise that there is room for improvement in the performance of both humans and machines. Our approach is approximately 180 times faster than a pure random search of the parameter space, and it is readily applicable to different material systems and device architectures. With an efficient navigation of the gate voltage space we are able to give a quantitative measurement of device variability, from one device to another and after a thermal cycle of a device. This is a key demonstration of the use of machine learning techniques to explore and optimise the parameter space of quantum devices and overcome the challenge of device variability.

@unpublished{Moon2020,
title = {Machine learning enables completely automatic tuning of a quantum device faster than human experts},
author = {Moon, H. and Lennon, D. T. and Kirkpatrick, J. and van Esbroeck, N. M. and Camenzind, L. C. and Yu, Liuqi and Vigneau, F. and Zumb\"uhl, D. M. and Briggs, G. A. D. and Osborne, M. A. and Sejdinovic, D. and Laird, E. A. and Ares, N.},
journal = {ArXiv e-prints:2001.02589},
arxiv = {https://arxiv.org/abs/2001.02589},
year = {2020}
}

9. T. Fernandez, A. Gretton, D. Rindt, and D. Sejdinovic, A Kernel Log-Rank Test of Independence for Right-Censored Data, ArXiv e-prints:1912.03784, 2019.

With the incorporation of new data gathering methods in clinical research, it becomes fundamental for survival analysis techniques to deal with high-dimensional or/and non-standard covariates. In this paper we introduce a general non-parametric independence test between right-censored survival times and covariates taking values on a general (not necessarily Euclidean) space X. We show that our test statistic has a dual interpretation, first in terms of the supremum of a potentially infinite collection of weight-indexed log-rank tests, with weight functions belonging to a reproducing kernel Hilbert space (RKHS) of functions; and second, as the norm of the difference of embeddings of certain finite measures into the RKHS, similar to the Hilbert-Schmidt Independence Criterion (HSIC) test-statistic. We study the asymptotic properties of the test, finding sufficient conditions to ensure that our test is omnibus. The test statistic can be computed straightforwardly, and the rejection threshold is obtained via an asymptotically consistent Wild-Bootstrap procedure. We perform extensive simulations demonstrating that our testing procedure generally performs better than competing approaches in detecting complex nonlinear dependence.

@unpublished{FerGreRinSej2019,
author = {Fernandez, Tamara and Gretton, Arthur and Rindt, David and Sejdinovic, Dino},
title = {{{A Kernel Log-Rank Test of Independence for Right-Censored Data}}},
journal = {ArXiv e-prints:1912.03784},
arxiv = {https://arxiv.org/abs/1912.03784},
year = {2019}
}

10. D. Watson-Parris, S. Sutherland, M. Christensen, A. Caterini, D. Sejdinovic, and P. Stier, Detecting Anthropogenic Cloud Perturbations with Deep Learning, ArXiv e-prints:1911.13061, 2019.

One of the most pressing questions in climate science is that of the effect of anthropogenic aerosol on the Earth’s energy balance. Aerosols provide the ‘seeds’ on which cloud droplets form, and changes in the amount of aerosol available to a cloud can change its brightness and other physical properties such as optical thickness and spatial extent. Clouds play a critical role in moderating global temperatures and small perturbations can lead to significant amounts of cooling or warming. Uncertainty in this effect is so large it is not currently known if it is negligible, or provides a large enough cooling to largely negate present-day warming by CO2. This work uses deep convolutional neural networks to look for two particular perturbations in clouds due to anthropogenic aerosol and assess their properties and prevalence, providing valuable insights into their climatic effects.

@unpublished{WatSutChrCatSejSti2019,
author = {Watson-Parris, Duncan and Sutherland, Samuel and Christensen, Matthew and Caterini, Anthony and Sejdinovic, Dino and Stier, Philip},
title = {{{Detecting Anthropogenic Cloud Perturbations with Deep Learning}}},
journal = {ArXiv e-prints:1911.13061},
arxiv = {https://arxiv.org/abs/1911.13061},
year = {2019}
}

11. Z. Li, A. Perez-Suay, G. Camps-Valls, and D. Sejdinovic, Kernel Dependence Regularizers and Gaussian Processes with Applications to Algorithmic Fairness, ArXiv e-prints:1911.04322, 2019.

Current adoption of machine learning in industrial, societal and economical activities has raised concerns about the fairness, equity and ethics of automated decisions. Predictive models are often developed using biased datasets and thus retain or even exacerbate biases in their decisions and recommendations. Removing the sensitive covariates, such as gender or race, is insufficient to remedy this issue since the biases may be retained due to other related covariates. We present a regularization approach to this problem that trades off predictive accuracy of the learned models (with respect to biased labels) for the fairness in terms of statistical parity, i.e. independence of the decisions from the sensitive covariates. In particular, we consider a general framework of regularized empirical risk minimization over reproducing kernel Hilbert spaces and impose an additional regularizer of dependence between predictors and sensitive covariates using kernel-based measures of dependence, namely the Hilbert-Schmidt Independence Criterion (HSIC) and its normalized version. This approach leads to a closed-form solution in the case of squared loss, i.e. ridge regression. Moreover, we show that the dependence regularizer has an interpretation as modifying the corresponding Gaussian process (GP) prior. As a consequence, a GP model with a prior that encourages fairness to sensitive variables can be derived, allowing principled hyperparameter selection and studying of the relative relevance of covariates under fairness constraints. Experimental results in synthetic examples and in real problems of income and crime prediction illustrate the potential of the approach to improve fairness of automated decisions.

@unpublished{LiPerCamSej2019,
author = {Li, Zhu and Perez-Suay, Adrian and Camps-Valls, Gustau and Sejdinovic, Dino},
title = {{{Kernel Dependence Regularizers and Gaussian Processes with Applications to Algorithmic Fairness}}},
journal = {ArXiv e-prints:1911.04322},
arxiv = {https://arxiv.org/abs/1911.04322},
year = {2019}
}

12. D. Rindt, D. Sejdinovic, and D. Steinsaltz, Nonparametric Independence Testing for Right-Censored Data using Optimal Transport, ArXiv e-prints:1906.03866, 2019.

We propose a nonparametric test of independence, termed OPT-HSIC, between a covariate and a right-censored lifetime. Because the presence of censoring creates a challenge in applying the standard permutation-based testing approaches, we use optimal transport to transform the censored dataset into an uncensored one, while preserving the relevant dependencies. We then apply a permutation test using the kernel-based dependence measure as a statistic to the transformed dataset. The type 1 error is proven to be correct in the case where censoring is independent of the covariate. Experiments indicate that OPT-HSIC has power against a much wider class of alternatives than Cox proportional hazards regression and that it has the correct type 1 control even in the challenging cases where censoring strongly depends on the covariate.

@unpublished{RinSejSte2019,
author = {Rindt, David and Sejdinovic, Dino and Steinsaltz, David},
title = {{{Nonparametric Independence Testing for Right-Censored Data using Optimal Transport}}},
journal = {ArXiv e-prints:1906.03866},
arxiv = {https://arxiv.org/abs/1906.03866},
year = {2019}
}

13. J.-F. Ton, L. Chan, Y. W. Teh, and D. Sejdinovic, Noise Contrastive Meta-Learning for Conditional Density Estimation using Kernel Mean Embeddings, ArXiv e-prints:1906.02236, 2019.

Current meta-learning approaches focus on learning functional representations of relationships between variables, i.e. on estimating conditional expectations in regression. In many applications, however, we are faced with conditional distributions which cannot be meaningfully summarized using expectation only (due to e.g. multimodality). Hence, we consider the problem of conditional density estimation in the meta-learning setting. We introduce a novel technique for meta-learning which combines neural representation and noise-contrastive estimation with the established literature of conditional mean embeddings into reproducing kernel Hilbert spaces. The method is validated on synthetic and real-world problems, demonstrating the utility of sharing learned representations across multiple conditional density estimation tasks.

@unpublished{TonChaTehSej2019,
author = {Ton, Jean-Francois and Chan, Leung and Teh, Yee Whye and Sejdinovic, Dino},
title = {{{Noise Contrastive Meta-Learning for Conditional Density Estimation using Kernel Mean Embeddings}}},
journal = {ArXiv e-prints:1906.02236},
arxiv = {https://arxiv.org/abs/1906.02236},
year = {2019}
}

14. M. Kanagawa, P. Hennig, D. Sejdinovic, and B. K. Sriperumbudur, Gaussian Processes and Kernel Methods: A Review on Connections and Equivalences, ArXiv e-prints:1807.02582, 2018.

This paper is an attempt to bridge the conceptual gaps between researchers working on the two widely used approaches based on positive definite kernels: Bayesian learning or inference using Gaussian processes on the one side, and frequentist kernel methods based on reproducing kernel Hilbert spaces on the other. It is widely known in machine learning that these two formalisms are closely related; for instance, the estimator of kernel ridge regression is identical to the posterior mean of Gaussian process regression. However, they have been studied and developed almost independently by two essentially separate communities, and this makes it difficult to seamlessly transfer results between them. Our aim is to overcome this potential difficulty. To this end, we review several old and new results and concepts from either side, and juxtapose algorithmic quantities from each framework to highlight close similarities. We also provide discussions on subtle philosophical and theoretical differences between the two approaches.

@unpublished{KanHenSejSri2018,
author = {Kanagawa, M. and Hennig, P. and Sejdinovic, D. and Sriperumbudur, B.K.},
title = {{{Gaussian Processes and Kernel Methods: A Review on Connections and Equivalences}}},
journal = {ArXiv e-prints:1807.02582},
arxiv = {https://arxiv.org/abs/1807.02582},
year = {2018}
}

15. H. Strathmann, D. Sejdinovic, and M. Girolami, Unbiased Bayes for Big Data: Paths of Partial Posteriors, ArXiv e-prints:1501.03326, 2015.

Key quantities of interest in Bayesian inference are expectations of functions with respect to a posterior distribution. Markov Chain Monte Carlo is a fundamental tool to consistently compute these expectations via averaging samples drawn from an approximate posterior. However, its feasibility is being challenged in the era of so-called Big Data as all data needs to be processed in every iteration. Realising that such simulation is an unnecessarily hard problem if the goal is estimation, we construct a computationally scalable methodology that allows unbiased estimation of the required expectations – without explicit simulation from the full posterior. The scheme’s variance is finite by construction and straightforward to control, leading to algorithms that are provably unbiased and naturally arrive at a desired error tolerance. This is achieved at an average computational complexity that is sub-linear in the size of the dataset and its free parameters are easy to tune. We demonstrate the utility and generality of the methodology on a range of common statistical models applied to large-scale benchmark and real-world datasets.

@unpublished{StrSejGir2015,
author = {Strathmann, H. and Sejdinovic, D. and Girolami, M.},
title = {{{Unbiased Bayes for Big Data: Paths of Partial Posteriors}}},
journal = {ArXiv e-prints:1501.03326},
arxiv = {http://arxiv.org/abs/1501.03326},
year = {2015}
}


### Published / In Press

1. T. G. J. Rudner, D. Sejdinovic, and Y. Gal, Inter-domain Deep Gaussian Processes with RKHS Fourier Features, in International Conference on Machine Learning (ICML), 2020, to appear.

Inter-domain Gaussian processes (GPs) allow for high flexibility and low computational cost when performing approximate inference in GP models. They are particularly suitable for modeling data exhibiting global function behavior but are limited to stationary covariance functions and thus fail to model non-stationary data effectively. We propose Inter-domain Deep Gaussian Processes with RKHS Fourier Features, an extension of shallow inter-domain GPs that combines the advantages of inter-domain and deep Gaussian processes (DGPs) and demonstrate how to leverage existing approximate inference approaches to perform simple and scalable approximate inference on Inter-domain Deep Gaussian Processes. We assess the performance of our method on a wide range of prediction problems and demonstrate that it outperforms inter-domain GPs and DGPs on challenging large-scale and high-dimensional real-world datasets exhibiting both global behavior as well as a high-degree of non-stationarity.

@inproceedings{RudSejGal2020,
author = {Rudner, T.G.J. and Sejdinovic, D. and Gal, Y.},
title = {{{Inter-domain Deep Gaussian Processes with RKHS Fourier Features}}},
booktitle = {International Conference on Machine Learning (ICML)},
pages = {to appear},
year = {2020}
}

2. D. Sejdinovic, Discussion of ‘Functional models for time-varying random objects’ by Dubey and Müller, Journal of the Royal Statistical Society: Series B, vol. 82, no. 2, 312–313, 2020.

The discussion focuses on metric covariance, a new association measure between paired random objects in a metric space, developed by Dubey and Müller, and on its relationship with other similar concepts which have previously appeared in the literature, including distance covariance by Székely et al, as well as its generalisations which rely on the formalism of reproducing kernel Hilbert spaces (RKHS).

@article{Sej2020-discussion,
author = {Sejdinovic, Dino},
title = {{{Discussion of `Functional models for time-varying random objects' by Dubey and M\"uller}}},
journal = {Journal of the Royal Statistical Society: Series B},
volume = {82},
number = {2},
pages = {312--313},
arxiv = {https://arxiv.org/abs/2001.03267},
year = {2020}
}

3. J. Runge, P. Nowack, M. Kretschmer, S. Flaxman, and D. Sejdinovic, Detecting and Quantifying Causal Associations in Large Nonlinear Time Series Datasets, Science Advances, vol. 5, no. 11, 2019.

Identifying causal relationships and quantifying their strength from observational time series data are key problems in disciplines dealing with complex dynamical systems such as the Earth system or the human body. Data-driven causal inference in such systems is challenging since datasets are often high dimensional and nonlinear with limited sample sizes. Here, we introduce a novel method that flexibly combines linear or nonlinear conditional independence tests with a causal discovery algorithm to estimate causal networks from large-scale time series datasets. We validate the method on time series of well-understood physical mechanisms in the climate system and the human heart and using large-scale synthetic datasets mimicking the typical properties of real-world data. The experiments demonstrate that our method outperforms state-of-the-art techniques in detection power, which opens up entirely new possibilities to discover and quantify causal networks from time series across a range of research fields.

@article{RunNowKreFlaSej2019,
author = {Runge, Jakob and Nowack, Peer and Kretschmer, Marlene and Flaxman, Seth and Sejdinovic, Dino},
title = {{{Detecting and Quantifying Causal Associations in Large Nonlinear Time Series Datasets}}},
journal = {Science Advances},
volume = {5},
number = {11},
arxiv = {https://arxiv.org/abs/1702.07007},
code = {https://github.com/jakobrunge/tigramite},
year = {2019}
}

4. H. C. L. Law, P. Zhao, L. Chan, J. Huang, and D. Sejdinovic, Hyperparameter Learning via Distributional Transfer, in Advances in Neural Information Processing Systems (NeurIPS), 2019.

Bayesian optimisation is a popular technique for hyperparameter learning but typically requires initial exploration even in cases where similar prior tasks have been solved. We propose to transfer information across tasks using learnt representations of training datasets used in those tasks. This results in a joint Gaussian process model on hyperparameters and data representations. Representations make use of the framework of distribution embeddings into reproducing kernel Hilbert spaces. The developed method has a faster convergence compared to existing baselines, in some cases requiring only a few evaluations of the target objective.

@inproceedings{LawZhaChaHuaSej2019,
author = {Law, Ho Chung Leon and Zhao, Peilin and Chan, Leung and Huang, Junzhou and Sejdinovic, Dino},
title = {{{Hyperparameter Learning via Distributional Transfer}}},
arxiv = {https://arxiv.org/abs/1810.06305},
year = {2019},
url = {https://papers.nips.cc/paper/8905-hyperparameter-learning-via-distributional-transfer},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)}
}

5. A. Raj, H. C. L. Law, D. Sejdinovic, and M. Park, A Differentially Private Kernel Two-Sample Test, in European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), 2019, vol. 11906, 697–724.

Kernel two-sample testing is a useful statistical tool in determining whether data samples arise from different distributions without imposing any parametric assumptions on those distributions. However, raw data samples can expose sensitive information about individuals who participate in scientific studies, which makes the current tests vulnerable to privacy breaches. Hence, we design a new framework for kernel two-sample testing conforming to differential privacy constraints, in order to guarantee the privacy of subjects in the data. Unlike existing differentially private parametric tests that simply add noise to data, kernel-based testing imposes a challenge due to a complex dependence of test statistics on the raw data, as these statistics correspond to estimators of distances between representations of probability measures in Hilbert spaces. Our approach considers finite dimensional approximations to those representations. As a result, a simple chi-squared test is obtained, where a test statistic depends on a mean and covariance of empirical differences between the samples, which we perturb for a privacy guarantee. We investigate the utility of our framework in two realistic settings and conclude that our method requires only a relatively modest increase in sample size to achieve a similar level of power to the non-private tests in both settings.

@inproceedings{RajLawSejPar2019,
author = {Raj, Anant and Law, Ho Chung Leon and Sejdinovic, Dino and Park, Mijung},
title = {{{A Differentially Private Kernel Two-Sample Test}}},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD)},
arxiv = {https://arxiv.org/abs/1808.00380},
url = {https://doi.org/10.1007/978-3-030-46150-8_41},
doi = {10.1007/978-3-030-46150-8_41},
series = {Lecture Notes in Computer Science},
volume = {11906},
pages = {697--724},
year = {2019}
}

6. Z. Li, J.-F. Ton, D. Oglic, and D. Sejdinovic, Towards A Unified Analysis of Random Fourier Features, in International Conference on Machine Learning (ICML), 2019, PMLR 97:3905–3914.

Random Fourier features is a widely used, simple, and effective technique for scaling up kernel methods. The existing theoretical analysis of the approach, however, remains focused on specific learning tasks and typically gives pessimistic bounds which are at odds with the empirical results. We tackle these problems and provide the first unified risk analysis of learning with random Fourier features using the squared error and Lipschitz continuous loss functions. In our bounds, the trade-off between the computational cost and the expected risk convergence rate is problem specific and expressed in terms of the regularization parameter and the number of effective degrees of freedom. We study both the standard random Fourier features method for which we improve the existing bounds on the number of features required to guarantee the corresponding minimax risk convergence rate of kernel ridge regression, as well as a data-dependent modification which samples features proportional to ridge leverage scores and further reduces the required number of features. As ridge leverage scores are expensive to compute, we devise a simple approximation scheme which provably reduces the computational cost without loss of statistical efficiency.

@inproceedings{LiTonOglSej2019,
author = {Li, Z. and Ton, J.-F. and Oglic, D. and Sejdinovic, D.},
title = {{{Towards A Unified Analysis of Random Fourier Features}}},
booktitle = {International Conference on Machine Learning (ICML)},
pages = {PMLR 97:3905--3914},
arxiv = {https://arxiv.org/abs/1806.09178},
url = {http://proceedings.mlr.press/v97/li19k.html},
year = {2019}
}

7. G. Camps-Valls, D. Sejdinovic, J. Runge, and M. Reichstein, A Perspective on Gaussian Processes for Earth Observation, National Science Review, vol. 6, no. 4, 616–618, 2019.

Earth observation (EO) by airborne and satellite remote sensing and in-situ observations plays a fundamental role in monitoring our planet. In the last decade, machine learning, and Gaussian processes (GPs) in particular, have attained outstanding results in the estimation of bio-geo-physical variables from the acquired images at local and global scales in a time-resolved manner. GPs provide not only accurate estimates but also principled uncertainty estimates for the predictions, can easily accommodate multimodal data coming from different sensors and from multitemporal acquisitions, allow the introduction of physical knowledge, and a formal treatment of uncertainty quantification and error propagation. Despite great advances in forward and inverse modelling, GP models still have to face important challenges that are reviewed in this perspective paper. GP models should evolve towards data-driven physics-aware models that respect signal characteristics, be consistent with elementary laws of physics, and move from pure regression to observational causal inference.

@article{CamSejRunRei2019,
author = {Camps-Valls, G. and Sejdinovic, D. and Runge, J. and Reichstein, M.},
title = {{{A Perspective on Gaussian Processes for Earth Observation}}},
journal = {National Science Review},
arxiv = {https://arxiv.org/abs/2007.01238},
volume = {6},
number = {4},
pages = {616--618},
doi = {10.1093/nsr/nwz028},
year = {2019},
url = {https://doi.org/10.1093/nsr/nwz028}
}

8. F.-X. Briol, C. J. Oates, M. Girolami, M. A. Osborne, and D. Sejdinovic, Probabilistic Integration: A Role in Statistical Computation? (with Discussion and Rejoinder), Statistical Science, vol. 34, no. 1, 1–22; rejoinder: 38–42, 2019.

A research frontier has emerged in scientific computation, wherein discretisation error is regarded as a source of epistemic uncertainty that can be modelled. This raises several statistical challenges, including the design of statistical methods that enable the coherent propagation of probabilities through a (possibly deterministic) computational work-flow, in order to assess the impact of discretisation error on the computer output. This paper examines the case for probabilistic numerical methods in routine statistical computation. Our focus is on numerical integration, where a probabilistic integrator is equipped with a full distribution over its output that reflects the fact that the integrand has been discretised. Our main technical contribution is to establish, for the first time, rates of posterior contraction for one such method. Several substantial applications are provided for illustration and critical evaluation, including examples from statistical modelling, computer graphics and a computer model for an oil reservoir.

@article{BriOatGirOsbSej2019,
author = {Briol, F.-X. and Oates, C.J. and Girolami, M. and Osborne, M.A. and Sejdinovic, D.},
title = {{{Probabilistic Integration: A Role in Statistical Computation? (with Discussion and Rejoinder)}}},
journal = {Statistical Science},
arxiv = {http://arxiv.org/abs/1512.00933},
volume = {34},
number = {1},
pages = {1--22; rejoinder: 38--42},
year = {2019},
url = {https://projecteuclid.org/euclid.ss/1555056025},
doi = {10.1214/18-STS660}
}

9. A. L. Caterini, A. Doucet, and D. Sejdinovic, Hamiltonian Variational Auto-Encoder, in Advances in Neural Information Processing Systems (NeurIPS), 2018.

Variational Auto-Encoders (VAEs) have become very popular techniques to perform inference and learning in latent variable models as they allow us to leverage the rich representational power of neural networks to obtain flexible approximations of the posterior of latent variables as well as tight evidence lower bounds (ELBOs). Combined with stochastic variational inference, this provides a methodology scaling to large datasets. However, for this methodology to be practically efficient, it is necessary to obtain low-variance unbiased estimators of the ELBO and its gradients with respect to the parameters of interest. While the use of Markov chain Monte Carlo (MCMC) techniques such as Hamiltonian Monte Carlo (HMC) has been previously suggested to achieve this, the proposed methods require specifying reverse kernels which have a large impact on performance. Additionally, the resulting unbiased estimator of the ELBO for most MCMC kernels is typically not amenable to the reparameterization trick. We show here how to optimally select reverse kernels in this setting and, by building upon Hamiltonian Importance Sampling (HIS), we obtain a scheme that provides low-variance unbiased estimators of the ELBO and its gradients using the reparameterization trick. This allows us to develop a Hamiltonian Variational Auto-Encoder (HVAE). This method can be reinterpreted as a target-informed normalizing flow which, within our context, only requires a few evaluations of the gradient of the sampled likelihood and trivial Jacobian calculations at each iteration.

@inproceedings{CatDouSej2018,
author = {Caterini, A.L. and Doucet, A. and Sejdinovic, D.},
title = {{{Hamiltonian Variational Auto-Encoder}}},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
url = {https://papers.nips.cc/paper/8039-hamiltonian-variational-auto-encoder},
arxiv = {https://arxiv.org/abs/1805.11328},
year = {2018}
}

10. H. C. L. Law, D. Sejdinovic, E. Cameron, T. C. D. Lucas, S. Flaxman, K. Battle, and K. Fukumizu, Variational Learning on Aggregate Outputs with Gaussian Processes, in Advances in Neural Information Processing Systems (NeurIPS), 2018.

While a typical supervised learning framework assumes that the inputs and the outputs are measured at the same levels of granularity, many applications, including global mapping of disease, only have access to outputs at a much coarser level than that of the inputs. Aggregation of outputs makes generalization to new inputs much more difficult. We consider an approach to this problem based on variational learning with a model of output aggregation and Gaussian processes, where aggregation leads to intractability of the standard evidence lower bounds. We propose new bounds and tractable approximations, leading to improved prediction accuracy and scalability to large datasets, while explicitly taking uncertainty into account. We develop a framework which extends to several types of likelihoods, including the Poisson model for aggregated count data. We apply our framework to a challenging and important problem, the fine-scale spatial modelling of malaria incidence, with over 1 million observations.

@inproceedings{LawSejCamLucFlaBatFuk2018,
author = {Law, Ho Chung Leon and Sejdinovic, D. and Cameron, E. and Lucas, T. C. D. and Flaxman, S. and Battle, K. and Fukumizu, K.},
title = {{{Variational Learning on Aggregate Outputs with Gaussian Processes}}},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
url = {https://papers.nips.cc/paper/7847-variational-learning-on-aggregate-outputs-with-gaussian-processes},
arxiv = {https://arxiv.org/abs/1805.08463},
year = {2018},
code = {https://github.com/hcllaw/VBAgg}
}

11. J. Mitrovic, D. Sejdinovic, and Y. W. Teh, Causal Inference via Kernel Deviance Measures, in Advances in Neural Information Processing Systems (NeurIPS), 2018.

Discovering the causal structure among a set of variables is a fundamental problem in many areas of science. In this paper, we propose Kernel Conditional Deviance for Causal Inference (KCDC), a fully nonparametric causal discovery method based on purely observational data. From a novel interpretation of the notion of asymmetry between cause and effect, we derive a corresponding asymmetry measure using the framework of reproducing kernel Hilbert spaces. Based on this, we propose three decision rules for causal discovery. We demonstrate the wide applicability of our method across a range of diverse synthetic datasets. Furthermore, we test our method on real-world time series data and the real-world benchmark dataset Tübingen Cause-Effect Pairs where we outperform existing state-of-the-art methods.

@inproceedings{MitSejTeh2018,
author = {Mitrovic, Jovana and Sejdinovic, Dino and Teh, Yee Whye},
title = {{{Causal Inference via Kernel Deviance Measures}}},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
url = {https://papers.nips.cc/paper/7930-causal-inference-via-kernel-deviance-measures},
arxiv = {https://arxiv.org/abs/1804.04622},
year = {2018}
}

12. J.-F. Ton, S. Flaxman, D. Sejdinovic, and S. Bhatt, Spatial Mapping with Gaussian Processes and Nonstationary Fourier Features, Spatial Statistics, vol. 28, 59–78, 2018.

The use of covariance kernels is ubiquitous in the field of spatial statistics. Kernels allow data to be mapped into high-dimensional feature spaces and can thus extend simple linear additive methods to nonlinear methods with higher order interactions. However, until recently, there has been a strong reliance on a limited class of stationary kernels such as the Matérn or squared exponential, limiting the expressiveness of these modelling approaches. Recent machine learning research has focused on spectral representations to model arbitrary stationary kernels and introduced more general representations that include classes of nonstationary kernels. In this paper, we exploit the connections between Fourier feature representations, Gaussian processes and neural networks to generalise previous approaches and develop a simple and efficient framework to learn arbitrarily complex nonstationary kernel functions directly from the data, while taking care to avoid overfitting using state-of-the-art methods from deep learning. We highlight the very broad array of kernel classes that could be created within this framework. We apply this to a time series dataset and a remote sensing problem involving land surface temperature in Eastern Africa. We show that without increasing the computational or storage complexity, nonstationary kernels can be used to improve generalisation performance and provide more interpretable results.

@article{TonFlaSejBha2018,
author = {Ton, J.-F. and Flaxman, S. and Sejdinovic, D. and Bhatt, S.},
title = {{{Spatial Mapping with Gaussian Processes and Nonstationary Fourier Features}}},
journal = {Spatial Statistics},
volume = {28},
arxiv = {https://arxiv.org/abs/1711.05615},
doi = {10.1016/j.spasta.2018.02.002},
year = {2018},
url = {https://doi.org/10.1016/j.spasta.2018.02.002},
pages = {59--78}
}

13. H. C. L. Law, D. J. Sutherland, D. Sejdinovic, and S. Flaxman, Bayesian Approaches to Distribution Regression, in International Conference on Artificial Intelligence and Statistics (AISTATS), 2018, PMLR 84:1167–1176.

Distribution regression has recently attracted much interest as a generic solution to the problem of supervised learning where labels are available at the group level, rather than at the individual level. Current approaches, however, do not propagate the uncertainty in observations due to sampling variability in the groups. This effectively assumes that small and large groups are estimated equally well, and should have equal weight in the final regression. We account for this uncertainty with a Bayesian distribution regression formalism, improving the robustness and performance of the model when group sizes vary. We frame our models in a neural network style, allowing for simple MAP inference using backpropagation to learn the parameters, as well as MCMC-based inference which can fully propagate uncertainty. We demonstrate our approach on illustrative toy datasets, as well as on a challenging problem of predicting age from images.

@inproceedings{LawSutSejFla2018,
author = {Law, Ho Chung Leon and Sutherland, Dougal J. and Sejdinovic, Dino and Flaxman, Seth},
title = {{{Bayesian Approaches to Distribution Regression}}},
booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
arxiv = {https://arxiv.org/abs/1705.04293},
url = {http://proceedings.mlr.press/v84/law18a.html},
pages = {PMLR 84:1167--1176},
year = {2018}
}

14. Q. Zhang, S. Filippi, A. Gretton, and D. Sejdinovic, Large-Scale Kernel Methods for Independence Testing, Statistics and Computing, vol. 28, no. 1, 113–130, Jan. 2018.

Representations of probability measures in reproducing kernel Hilbert spaces provide a flexible framework for fully nonparametric hypothesis tests of independence, which can capture any type of departure from independence, including nonlinear associations and multivariate interactions. However, these approaches come with an at least quadratic computational cost in the number of observations, which can be prohibitive in many applications. Arguably, it is exactly in such large-scale datasets that capturing any type of dependence is of interest, so striking a favourable tradeoff between computational efficiency and test performance for kernel independence tests would have a direct impact on their applicability in practice. In this contribution, we provide an extensive study of the use of large-scale kernel approximations in the context of independence testing, contrasting block-based, Nyström and random Fourier feature approaches. Through a variety of synthetic data experiments, it is demonstrated that our novel large scale methods give comparable performance with existing methods whilst using significantly less computation time and memory.

@article{ZhaFilGreSej2018,
author = {Zhang, Q. and Filippi, S. and Gretton, A. and Sejdinovic, D.},
title = {{{Large-Scale Kernel Methods for Independence Testing}}},
journal = {Statistics and Computing},
arxiv = {http://arxiv.org/abs/1606.07892},
doi = {10.1007/s11222-016-9721-7},
year = {2018},
month = jan,
volume = {28},
number = {1},
pages = {113--130},
code = {https://github.com/oxmlcs/kerpy}
}

15. S. Flaxman, Y. W. Teh, and D. Sejdinovic, Poisson Intensity Estimation with Reproducing Kernels, Electronic Journal of Statistics, vol. 11, no. 2, 5081–5104, 2017.

Despite the fundamental nature of the inhomogeneous Poisson process in the theory and application of stochastic processes, and its attractive generalizations (e.g. Cox process), few tractable nonparametric modeling approaches of intensity functions exist, especially when observed points lie in a high-dimensional space. In this paper we develop a new, computationally tractable Reproducing Kernel Hilbert Space (RKHS) formulation for the inhomogeneous Poisson process. We model the square root of the intensity as an RKHS function. Whereas RKHS models used in supervised learning rely on the so-called representer theorem, the form of the inhomogeneous Poisson process likelihood means that the representer theorem does not apply. However, we prove that the representer theorem does hold in an appropriately transformed RKHS, guaranteeing that the optimization of the penalized likelihood can be cast as a tractable finite-dimensional problem. The resulting approach is simple to implement, and readily scales to high dimensions and large-scale datasets.

@article{FlaTehSej2017ejs,
author = {Flaxman, Seth and Teh, Yee Whye and Sejdinovic, Dino},
title = {{{Poisson Intensity Estimation with Reproducing Kernels}}},
journal = {Electronic Journal of Statistics},
year = {2017},
volume = {11},
number = {2},
pages = {5081--5104},
doi = {10.1214/17-EJS1339SI},
url = {https://projecteuclid.org/euclid.ejs/1513306868}
}

16. H. C. L. Law, C. Yau, and D. Sejdinovic, Testing and Learning on Distributions with Symmetric Noise Invariance, in Advances in Neural Information Processing Systems (NeurIPS), 2017, 1343–1353.

Kernel embeddings of distributions and the Maximum Mean Discrepancy (MMD), the resulting distance between distributions, are useful tools for fully nonparametric two-sample testing and learning on distributions. However, it is rare that all possible differences between samples are of interest – discovered differences can be due to different types of measurement noise, data collection artefacts or other irrelevant sources of variability. We propose distances between distributions which encode invariance to additive symmetric noise, aimed at testing whether the assumed true underlying processes differ. Moreover, we construct invariant features of distributions, leading to learning algorithms robust to the impairment of the input distributions with symmetric additive noise. Such features lend themselves to a straightforward neural network implementation and can thus also be learned given a supervised signal.

@inproceedings{LawYauSej2017,
author = {Law, H. C. L. and Yau, C. and Sejdinovic, D.},
title = {{{Testing and Learning on Distributions with Symmetric Noise Invariance}}},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
arxiv = {https://arxiv.org/abs/1703.07596},
code = {https://github.com/hcllaw/Fourier-Phase-Neural-Network},
year = {2017},
url = {https://papers.nips.cc/paper/6733-testing-and-learning-on-distributions-with-symmetric-noise-invariance},
pages = {1343--1353}
}

17. I. Schuster, H. Strathmann, B. Paige, and D. Sejdinovic, Kernel Sequential Monte Carlo, in European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), 2017, vol. 10534, 390–409.

Bayesian posterior inference with Monte Carlo methods has a fundamental role in statistics and probabilistic machine learning. Target posterior distributions arising in increasingly complex models often exhibit high degrees of nonlinearity and multimodality and pose substantial challenges to traditional samplers. We propose the Kernel Sequential Monte Carlo (KSMC) framework for building emulator models of the current particle system in a Reproducing Kernel Hilbert Space and use the emulator’s geometry to inform local proposals. KSMC is applicable when gradients are unknown or prohibitively expensive and inherits the superior performance of SMC on multi-modal targets and its ability to estimate model evidence. Strengths of the proposed methodology are demonstrated on a series of challenging synthetic and real-world examples.

@inproceedings{SchStrPaiSej2017,
author = {Schuster, I. and Strathmann, H. and Paige, B. and Sejdinovic, D.},
title = {{{Kernel Sequential Monte Carlo}}},
arxiv = {http://arxiv.org/abs/1510.03105},
url = {https://doi.org/10.1007/978-3-319-71249-9_24},
doi = {10.1007/978-3-319-71249-9_24},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD)},
series = {Lecture Notes in Computer Science},
volume = {10534},
pages = {390--409},
year = {2017}
}

18. Q. Zhang, S. Filippi, S. Flaxman, and D. Sejdinovic, Feature-to-Feature Regression for a Two-Step Conditional Independence Test, in Uncertainty in Artificial Intelligence (UAI), 2017.

The algorithms for causal discovery and more broadly for learning the structure of graphical models require well calibrated and consistent conditional independence (CI) tests. We revisit the CI tests which are based on two-step procedures and involve regression with subsequent (unconditional) independence test (RESIT) on regression residuals and investigate the assumptions under which these tests operate. In particular, we demonstrate that when going beyond simple functional relationships with additive noise, such tests can lead to an inflated number of false discoveries. We study the relationship of these tests with those based on dependence measures using reproducing kernel Hilbert spaces (RKHS) and propose an extension of RESIT which uses RKHS-valued regression. The resulting test inherits the simple two-step testing procedure of RESIT, while giving correct Type I control and competitive power. When used as a component of the PC algorithm, the proposed test is more robust to the case where hidden variables induce a switching behaviour in the associations present in the data.

@inproceedings{ZhaFilFlaSej2017,
author = {Zhang, Q. and Filippi, S. and Flaxman, S. and Sejdinovic, D.},
title = {{{Feature-to-Feature Regression for a Two-Step Conditional Independence Test}}},
booktitle = {Uncertainty in Artificial Intelligence (UAI)},
url = {http://auai.org/uai2017/proceedings/papers/250.pdf},
supplements = {http://auai.org/uai2017/proceedings/supplements/250.pdf},
code = {https://github.com/oxmlcs/kerpy},
year = {2017}
}

19. J. Mitrovic, D. Sejdinovic, and Y. W. Teh, Deep Kernel Machines via the Kernel Reparametrization Trick, in International Conference on Learning Representations (ICLR) - Workshop Track, 2017.

While deep neural networks have achieved state-of-the-art performance on many tasks across varied domains, they still remain black boxes whose inner workings are hard to interpret and understand. In this paper, we develop a novel method for efficiently capturing the behaviour of deep neural networks using kernels. In particular, we construct a hierarchy of increasingly complex kernels that encode individual hidden layers of the network. Furthermore, we discuss how our framework motivates a novel supervised weight initialization method that discovers highly discriminative features already at initialization.

@inproceedings{MitSejTeh2017,
author = {Mitrovic, Jovana and Sejdinovic, Dino and Teh, Yee Whye},
title = {{{Deep Kernel Machines via the Kernel Reparametrization Trick}}},
openreview = {https://openreview.net/forum?id=Bkiqt3Ntg&noteId=Bkiqt3Ntg},
booktitle = {International Conference on Learning Representations (ICLR) - Workshop Track},
year = {2017}
}

20. S. Flaxman, Y. W. Teh, and D. Sejdinovic, Poisson Intensity Estimation with Reproducing Kernels, in International Conference on Artificial Intelligence and Statistics (AISTATS), 2017, PMLR 54:270–279.

Despite the fundamental nature of the inhomogeneous Poisson process in the theory and application of stochastic processes, and its attractive generalizations (e.g. Cox process), few tractable nonparametric modeling approaches of intensity functions exist, especially in high dimensional settings. In this paper we develop a new, computationally tractable Reproducing Kernel Hilbert Space (RKHS) formulation for the inhomogeneous Poisson process. We model the square root of the intensity as an RKHS function. The modeling challenge is that the usual representer theorem arguments no longer apply due to the form of the inhomogeneous Poisson process likelihood. However, we prove that the representer theorem does hold in an appropriately transformed RKHS, guaranteeing that the optimization of the penalized likelihood can be cast as a tractable finite-dimensional problem. The resulting approach is simple to implement, and readily scales to high dimensions and large-scale datasets.

@inproceedings{FlaTehSej2017,
author = {Flaxman, Seth and Teh, Yee Whye and Sejdinovic, Dino},
title = {{{Poisson Intensity Estimation with Reproducing Kernels}}},
booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
year = {2017},
arxiv = {https://arxiv.org/abs/1610.08623},
url = {http://proceedings.mlr.press/v54/flaxman17a.html},
code = {https://bitbucket.org/flaxter/kernel-poisson},
pages = {PMLR 54:270--279}
}

21. D. Vukobratovic, D. Jakovetic, V. Skachek, D. Bajovic, D. Sejdinovic, G. Karabulut Kurt, C. Hollanti, and I. Fischer, CONDENSE: A Reconfigurable Knowledge Acquisition Architecture for Future 5G IoT, IEEE Access, vol. 4, 3360–3378, 2016.

In forthcoming years, the Internet of Things (IoT) will connect billions of smart devices generating and uploading a deluge of data to the cloud. If successfully extracted, the knowledge buried in the data can significantly improve the quality of life and foster economic growth. However, a critical bottleneck for realising the efficient IoT is the pressure it puts on the existing communication infrastructures, requiring transfer of enormous data volumes. Aiming at addressing this problem, we propose a novel architecture dubbed Condense (reconfigurable knowledge acquisition systems), which integrates the IoT-communication infrastructure into data analysis. This is achieved via the generic concept of network function computation: Instead of merely transferring data from the IoT sources to the cloud, the communication infrastructure should actively participate in the data analysis by carefully designed en-route processing. We define the Condense architecture, its basic layers, and the interactions among its constituent modules. Further, from the implementation side, we describe how Condense can be integrated into the 3rd Generation Partnership Project (3GPP) Machine Type Communications (MTC) architecture, as well as the prospects of making it a practically viable technology in a short time frame, relying on Network Function Virtualization (NFV) and Software Defined Networking (SDN). Finally, from the theoretical side, we survey the relevant literature on computing “atomic” functions in both analog and digital domains, as well as on function decomposition over networks, highlighting challenges, insights, and future directions for exploiting these techniques within practical 3GPP MTC architecture.

@article{VukJaketal2016a,
author = {Vukobratovic, D. and Jakovetic, D. and Skachek, V. and Bajovic, D. and Sejdinovic, D. and Karabulut Kurt, G. and Hollanti, C. and Fischer, I.},
title = {{{CONDENSE: A Reconfigurable Knowledge Acquisition Architecture for Future 5G IoT}}},
year = {2016},
journal = {{IEEE Access}},
volume = {4},
doi = {10.1109/ACCESS.2016.2585468},
url = {http://dx.doi.org/10.1109/ACCESS.2016.2585468},
pages = {3360--3378}
}

22. D. Vukobratovic, D. Jakovetic, V. Skachek, D. Bajovic, and D. Sejdinovic, Network Function Computation as a Service in Future 5G Machine Type Communications, in International Symposium on Turbo Codes & Iterative Information Processing (ISTC), 2016, 365–369.

The 3GPP machine type communications (MTC) service is expected to contribute a dominant share of the IoT traffic via the upcoming fifth generation (5G) mobile cellular systems. MTC aims to connect billions of devices to communicate their data to MTC applications for further processing and data analysis. However, for the majority of applications, collecting all the MTC generated data is inefficient as the data is typically fed into application-dependent functions whose outputs determine the application actions. In this paper, we present a novel MTC architecture that, instead of collecting raw large-volume MTC data, offers the network function computation (NFC) as a service. For a given application demand (function to be computed), different modules (atomic nodes) of the communication infrastructure are orchestrated into a (reconfigurable) directed network topology, and each module is assigned an appropriately defined (reconfigurable) atomic function over the input data, such that the desired global network function is evaluated over the MTC data and a requested MTC-NFC service is delivered. We detail practical viability of incorporating MTC-NFC within the existing 3GPP architecture relying on emerging concepts of Network Function Virtualization and Software Defined Networking. Finally, throughout the paper, we point to the theoretical foundations that inspired the presented architecture highlighting challenges and future directions for designing 3GPP MTC-NFC service.

@inproceedings{VukJaketal2016b,
author = {Vukobratovic, D. and Jakovetic, D. and Skachek, V. and Bajovic, D. and Sejdinovic, D.},
title = {{{Network Function Computation as a Service in Future 5G Machine Type Communications}}},
booktitle = {International Symposium on Turbo Codes \& Iterative Information Processing (ISTC)},
year = {2016},
pages = {365--369},
doi = {10.1109/ISTC.2016.7593138},
url = {http://dx.doi.org/10.1109/ISTC.2016.7593138}
}

23. J. Mitrovic, D. Sejdinovic, and Y. W. Teh, DR-ABC: Approximate Bayesian Computation with Kernel-Based Distribution Regression, in International Conference on Machine Learning (ICML), 2016, PMLR 48:1482–1491.

Performing exact posterior inference in complex generative models is often difficult or impossible due to an expensive to evaluate or intractable likelihood function. Approximate Bayesian computation (ABC) is an inference framework that constructs an approximation to the true likelihood based on the similarity between the observed and simulated data as measured by a predefined set of summary statistics. Although the choice of appropriate problem-specific summary statistics crucially influences the quality of the likelihood approximation and hence also the quality of the posterior sample in ABC, there are only a few principled general-purpose approaches to the selection or construction of such summary statistics. In this paper, we develop a novel framework for this task using kernel-based distribution regression. We model the functional relationship between data distributions and the optimal choice (with respect to a loss function) of summary statistics using kernel-based distribution regression. We show that our approach can be implemented in a computationally and statistically efficient way using the random Fourier features framework for large-scale kernel learning. In addition, our framework shows superior performance when compared to related methods on toy and real-world problems.

@inproceedings{MitSejTeh2016,
author = {Mitrovic, Jovana and Sejdinovic, Dino and Teh, Yee Whye},
title = {{{DR-ABC: Approximate Bayesian Computation with Kernel-Based Distribution Regression}}},
booktitle = {International Conference on Machine Learning (ICML)},
arxiv = {http://arxiv.org/abs/1602.04805},
url = {http://proceedings.mlr.press/v48/mitrovic16.html},
year = {2016},
pages = {PMLR 48:1482--1491}
}

24. G. Franchi, J. Angulo, and D. Sejdinovic, Hyperspectral Image Classification with Support Vector Machines on Kernel Distribution Embeddings, in IEEE International Conference on Image Processing (ICIP), 2016, 1898–1902.

We propose a novel approach for pixel classification in hyperspectral images, leveraging both the spatial and spectral information in the data. The introduced method relies on a recently proposed framework for learning on distributions – by representing them with mean elements in reproducing kernel Hilbert spaces (RKHS) and formulating a classification algorithm therein. In particular, we associate each pixel to an empirical distribution of its neighbouring pixels, a judicious representation of which in an RKHS, in conjunction with the spectral information contained in the pixel itself, gives a new explicit set of features that can be fed into a suite of standard classification techniques – we opt for a well established framework of support vector machines (SVM). Furthermore, the computational complexity is reduced via the random Fourier features formalism. We study the consistency and the convergence rates of the proposed method and the experiments demonstrate strong performance on hyperspectral data with gains in comparison to the state-of-the-art results.

@inproceedings{FraAngSej2016,
author = {Franchi, G. and Angulo, J. and Sejdinovic, D.},
title = {{{Hyperspectral Image Classification with Support Vector Machines on Kernel Distribution Embeddings}}},
booktitle = {IEEE International Conference on Image Processing (ICIP)},
year = {2016},
arxiv = {http://arxiv.org/abs/1605.09136},
url = {http://dx.doi.org/10.1109/ICIP.2016.7532688},
doi = {10.1109/ICIP.2016.7532688},
pages = {1898--1902}
}

25. B. Paige, D. Sejdinovic, and F. Wood, Super-Sampling with a Reservoir, in Uncertainty in Artificial Intelligence (UAI), 2016, 567–576.

We introduce an alternative to reservoir sampling, a classic and popular algorithm for drawing a fixed-size subsample from streaming data in a single pass. Rather than draw a random sample, our approach performs an online optimization which aims to select the subset which provides the best overall approximation to the full data set, as judged using a kernel two-sample test. This produces subsets which minimize the worst-case relative error when computing expectations of functions in a specified function class, using just the samples from the subset. Kernel functions are approximated using random Fourier features, and the subset of samples itself is stored in a random projection tree, allowing for an algorithm which runs in a single pass through the whole data set, with only a logarithmic time complexity in the size of the subset at each iteration. These “super-samples” subsampled from the full data provide a concise summary, as demonstrated empirically on mixture models and the MNIST dataset.

@inproceedings{PaiSejWoo2016,
author = {Paige, B. and Sejdinovic, D. and Wood, F.},
title = {{{Super-Sampling with a Reservoir}}},
booktitle = {Uncertainty in Artificial Intelligence (UAI)},
year = {2016},
pages = {567--576},
url = {http://www.auai.org/uai2016/proceedings/papers/293.pdf}
}

26. S. Flaxman, D. Sejdinovic, J. P. Cunningham, and S. Filippi, Bayesian Learning of Kernel Embeddings, in Uncertainty in Artificial Intelligence (UAI), 2016, 182–191.

Kernel methods are one of the mainstays of machine learning, but the problem of kernel learning remains challenging, with only a few heuristics and very little theory. This is of particular importance in methods based on estimation of kernel mean embeddings of probability measures. For characteristic kernels, which include most commonly used ones, the kernel mean embedding uniquely determines its probability measure, so it can be used to design a powerful statistical testing framework, which includes nonparametric two-sample and independence tests. In practice, however, the performance of these tests can be very sensitive to the choice of kernel and its lengthscale parameters. To address this central issue, we propose a new probabilistic model for kernel mean embeddings, the Bayesian Kernel Embedding model, combining a Gaussian process prior over the Reproducing Kernel Hilbert Space containing the mean embedding with a conjugate likelihood function, thus yielding a closed form posterior over the mean embedding. The posterior mean of our model is closely related to recently proposed shrinkage estimators for kernel mean embeddings, while the posterior uncertainty is a new, interesting feature with various possible applications. Critically for the purposes of kernel learning, our model gives a simple, closed form marginal pseudolikelihood of the observed data given the kernel hyperparameters. This marginal pseudolikelihood can either be optimized to inform the hyperparameter choice or fully Bayesian inference can be used.

@inproceedings{FlaSejCunFil2016,
author = {Flaxman, S. and Sejdinovic, D. and Cunningham, J.P. and Filippi, S.},
title = {{{Bayesian Learning of Kernel Embeddings}}},
booktitle = {Uncertainty in Artificial Intelligence (UAI)},
arxiv = {http://arxiv.org/abs/1603.02160},
year = {2016},
pages = {182--191},
url = {http://www.auai.org/uai2016/proceedings/papers/145.pdf},
supplements = {http://www.auai.org/uai2016/proceedings/supp/145_supp.pdf}
}

27. M. Park, W. Jitkrittum, and D. Sejdinovic, K2-ABC: Approximate Bayesian Computation with Kernel Embeddings, in International Conference on Artificial Intelligence and Statistics (AISTATS), 2016, PMLR 51:398–407.

Complicated generative models often result in a situation where computing the likelihood of observed data is intractable, while simulating from the conditional density given a parameter value is relatively easy. Approximate Bayesian Computation (ABC) is a paradigm that enables simulation-based posterior inference in such cases by measuring the similarity between simulated and observed data in terms of a chosen set of summary statistics. However, there is no general rule to construct sufficient summary statistics for complex models. Insufficient summary statistics will “leak” information, which leads to ABC algorithms yielding samples from an incorrect (partial) posterior. In this paper, we propose a fully nonparametric ABC paradigm which circumvents the need for manually selecting summary statistics. Our approach, K2-ABC, uses maximum mean discrepancy (MMD) to construct a dissimilarity measure between the observed and simulated data. The embedding of an empirical distribution of the data into a reproducing kernel Hilbert space plays the role of the summary statistic and is sufficient whenever the corresponding kernels are characteristic. Experiments on a simulated scenario and a real-world biological problem illustrate the effectiveness of the proposed algorithm.

@inproceedings{ParJitSej2016,
author = {Park, M. and Jitkrittum, W. and Sejdinovic, D.},
title = {{{K2-ABC: Approximate Bayesian Computation with Kernel Embeddings}}},
booktitle = {International Conference on Artificial Intelligence and Statistics (AISTATS)},
arxiv = {http://arxiv.org/abs/1502.02558},
url = {http://proceedings.mlr.press/v51/park16.html},
year = {2016},
pages = {PMLR 51:398--407},
code = {https://github.com/wittawatj/k2abc}
}

28. H. Strathmann, D. Sejdinovic, S. Livingstone, Z. Szabo, and A. Gretton, Gradient-free Hamiltonian Monte Carlo with Efficient Kernel Exponential Families, in Advances in Neural Information Processing Systems (NeurIPS), vol. 28, 2015, 955–963.

We propose Kamiltonian Monte Carlo (KMC), a gradient-free adaptive MCMC algorithm based on Hamiltonian Monte Carlo (HMC). On target densities where HMC is unavailable due to intractable gradients, KMC adaptively learns the target’s gradient structure by fitting an exponential family model in a Reproducing Kernel Hilbert Space. Computational costs are reduced by two novel efficient approximations to this gradient. While being asymptotically exact, KMC mimics HMC in terms of sampling efficiency and offers substantial mixing improvements over state-of-the-art gradient-free samplers. We support our claims with experimental studies on both toy and real-world applications, including Approximate Bayesian Computation and exact-approximate MCMC.

@incollection{StrSejLivSzaGre2015,
author = {Strathmann, H. and Sejdinovic, D. and Livingstone, S. and Szabo, Z. and Gretton, A.},
title = {{{Gradient-free Hamiltonian Monte Carlo with Efficient Kernel Exponential Families}}},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
volume = {28},
year = {2015},
pages = {955--963},
arxiv = {http://arxiv.org/abs/1506.02564}
}

29. K. Chwialkowski, A. Ramdas, D. Sejdinovic, and A. Gretton, Fast Two-Sample Testing with Analytic Representations of Probability Measures, in Advances in Neural Information Processing Systems (NeurIPS), vol. 28, 2015, 1981–1989.

We propose a class of nonparametric two-sample tests with a cost linear in the sample size. Two tests are given, both based on an ensemble of distances between analytic functions representing each of the distributions. The first test uses smoothed empirical characteristic functions to represent the distributions; the second uses distribution embeddings in a reproducing kernel Hilbert space. Analyticity implies that differences in the distributions may be detected almost surely at a finite number of randomly chosen locations/frequencies. The new tests are consistent against a larger class of alternatives than the previous linear-time tests based on the (non-smoothed) empirical characteristic functions, while being much faster than the current state-of-the-art quadratic-time kernel-based or energy distance-based tests. Experiments on artificial benchmarks and on challenging real-world testing problems demonstrate that our tests give a better power/time tradeoff than competing approaches, and in some cases, better outright power than even the most expensive quadratic-time tests. This performance advantage is retained even in high dimensions, and in cases where the difference in distributions is not observable with low order statistics.

@incollection{ChwRamSejGre2015,
author = {Chwialkowski, K. and Ramdas, A. and Sejdinovic, D. and Gretton, A.},
title = {{{Fast Two-Sample Testing with Analytic Representations of Probability Measures}}},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
volume = {28},
year = {2015},
pages = {1981--1989},
url = {http://papers.nips.cc/paper/5685-fast-two-sample-testing-with-analytic-representations-of-probability-measures},
arxiv = {http://arxiv.org/abs/1506.04725}
}

30. D. Vukobratovic, D. Sejdinovic, and A. Pizurica, Compressed Sensing Using Sparse Binary Measurements: A Rateless Coding Perspective, in IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), 2015.

Compressed Sensing (CS) methods using sparse binary measurement matrices and iterative message-passing recovery procedures have been recently investigated due to their low computational complexity and excellent performance. Drawing much inspiration from sparse-graph codes such as Low-Density Parity-Check (LDPC) codes, these studies use analytical tools from modern coding theory to analyze CS solutions. In this paper, we consider and systematically analyze the CS setup inspired by a class of efficient, popular and flexible sparse-graph codes called rateless codes. The proposed rateless CS setup is asymptotically analyzed using tools such as Density Evolution and EXIT charts and fine-tuned using degree distribution optimization techniques.

@inproceedings{VukSejPiz2015,
author = {Vukobratovic, Dejan and Sejdinovic, Dino and Pizurica, Aleksandra},
title = {{{Compressed Sensing Using Sparse Binary Measurements: A Rateless Coding Perspective}}},
booktitle = {IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC)},
year = {2015},
doi = {10.1109/SPAWC.2015.7227005},
url = {http://dx.doi.org/10.1109/SPAWC.2015.7227005}
}

31. Z. Kurth-Nelson, G. Barnes, D. Sejdinovic, R. Dolan, and P. Dayan, Temporal Structure in Associative Retrieval, eLife, vol. 4, no. e04919, 2015.

Electrophysiological data disclose rich dynamics in patterns of neural activity evoked by sensory objects. Retrieving objects from memory reinstates components of this activity. In humans, the temporal structure of this retrieved activity remains largely unexplored, and here we address this gap using the spatiotemporal precision of magnetoencephalography (MEG). In a sensory preconditioning paradigm, ‘indirect’ objects were paired with ‘direct’ objects to form associative links, and the latter were then paired with rewards. Using multivariate analysis methods we examined the short-time evolution of neural representations of indirect objects retrieved during reward-learning about direct objects. We found two components of the evoked representation of the indirect stimulus, 200 ms apart. The strength of retrieval of one, but not the other, representational component correlated with generalization of reward learning from direct to indirect stimuli. We suggest the temporal structure within retrieved neural representations may be key to their function.

@article{KNeBarSejDolDay2015,
author = {Kurth-Nelson, Zeb and Barnes, Gareth and Sejdinovic, Dino and Dolan, Ray and Dayan, Peter},
title = {{{Temporal Structure in Associative Retrieval}}},
volume = {4},
number = {e04919},
year = {2015},
doi = {10.7554/eLife.04919},
publisher = {eLife Sciences Publications Limited},
journal = {{eL}ife},
url = {http://dx.doi.org/10.7554/eLife.04919}
}

32. W. Jitkrittum, A. Gretton, N. Heess, S. M. A. Eslami, B. Lakshminarayanan, D. Sejdinovic, and Z. Szabó, Kernel-Based Just-In-Time Learning for Passing Expectation Propagation Messages, in Uncertainty in Artificial Intelligence (UAI), 2015.

We propose an efficient nonparametric strategy for learning a message operator in expectation propagation (EP), which takes as input the set of incoming messages to a factor node, and produces an outgoing message as output. This learned operator replaces the multivariate integral required in classical EP, which may not have an analytic expression. We use kernel-based regression, which is trained on a set of probability distributions representing the incoming messages, and the associated outgoing messages. The kernel approach has two main advantages: first, it is fast, as it is implemented using a novel two-layer random feature representation of the input message distributions; second, it has principled uncertainty estimates, and can be cheaply updated online, meaning it can request and incorporate new training data when it encounters inputs on which it is uncertain. In experiments, our approach is able to solve learning problems where a single message operator is required for multiple, substantially different data sets (logistic regression for a variety of classification problems), where the ability to accurately assess uncertainty and to efficiently and robustly update the message operator are essential.

@inproceedings{JitGreHeeEslLakSejSza2015,
author = {Jitkrittum, Wittawat and Gretton, Arthur and Heess, Nicolas and Eslami, S. M. Ali and Lakshminarayanan, Balaji and Sejdinovic, Dino and Szab\'{o}, Zolt\'{a}n},
booktitle = {Uncertainty in Artificial Intelligence (UAI)},
title = {{{Kernel-Based Just-In-Time Learning for Passing Expectation Propagation Messages}}},
arxiv = {http://arxiv.org/abs/1503.02551},
year = {2015},
code = {https://github.com/wittawatj/kernel-ep},
url = {http://auai.org/uai2015/proceedings/papers/235.pdf},
supplements = {http://auai.org/uai2015/proceedings/supp/239_supp.pdf}
}

33. K. Chwialkowski, D. Sejdinovic, and A. Gretton, A Wild Bootstrap for Degenerate Kernel Tests, in Advances in Neural Information Processing Systems (NeurIPS), vol. 27, 2014, 3608–3616.

A wild bootstrap method for nonparametric hypothesis tests based on kernel distribution embeddings is proposed. This bootstrap method is used to construct provably consistent tests that apply to random processes, for which the naive permutation-based bootstrap fails. It applies to a large group of kernel tests based on V-statistics, which are degenerate under the null hypothesis, and non-degenerate elsewhere. To illustrate this approach, we construct a two-sample test, an instantaneous independence test and a multiple lag independence test for time series. In experiments, the wild bootstrap gives strong performance on synthetic examples, on audio data, and in performance benchmarking for the Gibbs sampler.

@incollection{ChwSejGre2014,
title = {{{A Wild Bootstrap for Degenerate Kernel Tests}}},
author = {Chwialkowski, Kacper and Sejdinovic, Dino and Gretton, Arthur},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
volume = {27},
pages = {3608--3616},
year = {2014},
url = {http://papers.nips.cc/paper/5452-a-wild-bootstrap-for-degenerate-kernel-tests.pdf},
code = {https://github.com/kacperChwialkowski/wildBootstrap},
video = {http://research.microsoft.com/apps/video/?id=240378}
}

34. D. Sejdinovic, H. Strathmann, M. G. Lomeli, C. Andrieu, and A. Gretton, Kernel Adaptive Metropolis-Hastings, in International Conference on Machine Learning (ICML), 2014, 1665–1673.

A Kernel Adaptive Metropolis-Hastings algorithm is introduced, for the purpose of sampling from a target distribution with strongly nonlinear support. The algorithm embeds the trajectory of the Markov chain into a reproducing kernel Hilbert space (RKHS), such that the feature space covariance of the samples informs the choice of proposal. The procedure is computationally efficient and straightforward to implement, since the RKHS moves can be integrated out analytically: our proposal distribution in the original space is a normal distribution whose mean and covariance depend on where the current sample lies in the support of the target distribution, and adapts to its local covariance structure. Furthermore, the procedure requires neither gradients nor any other higher order information about the target, making it particularly attractive for contexts such as Pseudo-Marginal MCMC. Kernel Adaptive Metropolis-Hastings outperforms competing fixed and adaptive samplers on multivariate, highly nonlinear target distributions, arising in both real-world and synthetic examples.

@inproceedings{SejStrGarAndGre14,
author = {Sejdinovic, D. and Strathmann, H. and Lomeli, M.G. and Andrieu, C. and Gretton, A.},
title = {{{Kernel Adaptive Metropolis-Hastings}}},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2014},
pages = {1665--1673},
code = {https://github.com/karlnapf/kameleon-mcmc},
arxiv = {http://arxiv.org/abs/1307.5302},
url = {http://jmlr.org/proceedings/papers/v32/sejdinovic14.pdf},
supplements = {http://jmlr.org/proceedings/papers/v32/sejdinovic14-supp.zip}
}

35. O. Johnson, D. Sejdinovic, J. Cruise, R. Piechocki, and A. Ganesh, Non-Parametric Change-Point Estimation using String Matching Algorithms, Methodology and Computing in Applied Probability, vol. 16, no. 4, 987–1008, 2014.

Given the output of a data source taking values in a finite alphabet, we wish to estimate change-points, that is, times when the statistical properties of the source change. Motivated by ideas of match lengths in information theory, we introduce a novel non-parametric estimator which we call CRECHE (CRossings Enumeration CHange Estimator). We present simulation evidence that this estimator performs well, both for simulated sources and for real data formed by concatenating text sources. For example, we show that we can accurately estimate the point at which a source changes from a Markov chain to an IID source with the same stationary distribution. Our estimator requires no assumptions about the form of the source distribution, and avoids the need to estimate its probabilities. Further, we establish consistency of the CRECHE estimator under a related toy model, by establishing a fluid limit and using martingale arguments.

@article{JohSejCruPieGan2014,
title = {{{Non-Parametric Change-Point Estimation using String Matching Algorithms}}},
year = {2014},
issn = {1387-5841},
journal = {Methodology and Computing in Applied Probability},
volume = {16},
number = {4},
doi = {10.1007/s11009-013-9359-2},
url = {http://dx.doi.org/10.1007/s11009-013-9359-2},
publisher = {Springer US},
author = {Johnson, Oliver and Sejdinovic, Dino and Cruise, James and Piechocki, Robert and Ganesh, Ayalvadi},
pages = {987--1008}
}

36. D. Sejdinovic, A. Gretton, and W. Bergsma, A Kernel Test for Three-Variable Interactions, in Advances in Neural Information Processing Systems (NeurIPS), vol. 26, 2013, 1124–1132.

We introduce kernel nonparametric tests for Lancaster three-variable interaction and for total independence, using embeddings of signed measures into a reproducing kernel Hilbert space. The resulting test statistics are straightforward to compute, and are used in powerful three-variable interaction tests, which are consistent against all alternatives for a large family of reproducing kernels. We show the Lancaster test to be sensitive to cases where two independent causes individually have weak influence on a third dependent variable, but their combined effect has a strong influence. This makes the Lancaster test especially suited to finding structure in directed graphical models, where it outperforms competing nonparametric tests in detecting such V-structures.

@incollection{SejGreBer2013,
title = {{{A Kernel Test for Three-Variable Interactions}}},
author = {Sejdinovic, Dino and Gretton, Arthur and Bergsma, Wicher},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
volume = {26},
pages = {1124--1132},
year = {2013},
url = {http://papers.nips.cc/paper/4893-a-kernel-test-for-three-variable-interactions.pdf},
supplements = {http://papers.nips.cc/paper/4893-a-kernel-test-for-three-variable-interactions-supplemental.zip},
code = {http://www.gatsby.ucl.ac.uk/%7Egretton/interact/threeWayInteract.htm},
arxiv = {http://arxiv.org/abs/1306.2281},
video = {http://research.microsoft.com/apps/video/default.aspx?id=206943}
}

37. D. Sejdinovic, B. Sriperumbudur, A. Gretton, and K. Fukumizu, Equivalence of distance-based and RKHS-based statistics in hypothesis testing, Annals of Statistics, vol. 41, no. 5, 2263–2291, Oct. 2013.

We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, maximum mean discrepancies (MMD), that is, distances between embeddings of distributions to reproducing kernel Hilbert spaces (RKHS), as established in machine learning. In the case where the energy distance is computed with a semimetric of negative type, a positive definite kernel, termed distance kernel, may be defined such that the MMD corresponds exactly to the energy distance. Conversely, for any positive definite kernel, we can interpret the MMD as energy distance with respect to some negative-type semimetric. This equivalence readily extends to distance covariance using kernels on the product space. We determine the class of probability distributions for which the test statistics are consistent against all alternatives. Finally, we investigate the performance of the family of distance kernels in two-sample and independence tests: we show in particular that the energy distance most commonly employed in statistics is just one member of a parametric family of kernels, and that other choices from this family can yield more powerful tests.

@article{SejSriGreFuk2013,
author = {Sejdinovic, Dino and Sriperumbudur, Bharath and Gretton, Arthur and Fukumizu, Kenji},
doi = {10.1214/13-AOS1140},
journal = {Annals of Statistics},
month = oct,
volume = {41},
number = {5},
pages = {2263--2291},
title = {{{Equivalence of distance-based and RKHS-based statistics in hypothesis testing}}},
url = {http://dx.doi.org/10.1214/13-AOS1140},
year = {2013},
arxiv = {http://arxiv.org/abs/1207.6076}
}

38. A. Gretton, B. K. Sriperumbudur, D. Sejdinovic, H. Strathmann, S. Balakrishnan, M. Pontil, and K. Fukumizu, Optimal Kernel Choice for Large-Scale Two-Sample Tests, in Advances in Neural Information Processing Systems (NeurIPS), vol. 25, 2012, 1205–1213.

Given samples from distributions p and q, a two-sample test determines whether to reject the null hypothesis that p=q, based on the value of a test statistic measuring the distance between the samples. One choice of test statistic is the maximum mean discrepancy (MMD), which is a distance between embeddings of the probability distributions in a reproducing kernel Hilbert space. The kernel used in obtaining these embeddings is thus critical in ensuring the test has high power, and correctly distinguishes unlike distributions with high probability. A means of parameter selection for the two-sample test based on the MMD is proposed. For a given test level (an upper bound on the probability of making a Type I error), the kernel is chosen so as to maximize the test power, and minimize the probability of making a Type II error. The test statistic, test threshold, and optimization over the kernel parameters are obtained with cost linear in the sample size. These properties make the kernel selection and test procedures suited to data streams, where the observations cannot all be stored in memory. In experiments, the new kernel selection approach yields a more powerful test than earlier kernel selection heuristics.

@incollection{GreSriSejStrBalPonFuk2012,
title = {{{Optimal Kernel Choice for Large-Scale Two-Sample Tests}}},
author = {Gretton, Arthur and Sriperumbudur, Bharath K. and Sejdinovic, Dino and Strathmann, Heiko and Balakrishnan, Sivaraman and Pontil, Massimiliano and Fukumizu, Kenji},
booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
volume = {25},
pages = {1205--1213},
year = {2012},
url = {http://papers.nips.cc/paper/4727-optimal-kernel-choice-for-large-scale-two-sample-tests.pdf}
}

39. D. Sejdinovic, A. Gretton, B. K. Sriperumbudur, and K. Fukumizu, Hypothesis Testing Using Pairwise Distances and Associated Kernels, in International Conference on Machine Learning (ICML), 2012, 1111–1118.

We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, distances between embeddings of distributions to reproducing kernel Hilbert spaces (RKHS), as established in machine learning. The equivalence holds when energy distances are computed with semimetrics of negative type, in which case a kernel may be defined such that the RKHS distance between distributions corresponds exactly to the energy distance. We determine the class of probability distributions for which kernels induced by semimetrics are characteristic (that is, for which embeddings of the distributions to an RKHS are injective). Finally, we investigate the performance of this family of kernels in two-sample and independence tests: we show in particular that the energy distance most commonly employed in statistics is just one member of a parametric family of kernels, and that other choices from this family can yield more powerful tests.

@inproceedings{SejGreSriFuk12,
title = {{{Hypothesis Testing Using Pairwise Distances and Associated Kernels}}},
author = {Sejdinovic, D. and Gretton, Arthur and Sriperumbudur, Bharath K. and Fukumizu, Kenji},
booktitle = {International Conference on Machine Learning (ICML)},
year = {2012},
pages = {1111--1118},
url = {http://machinelearning.wustl.edu/mlpapers/paper_files/ICML2012Sejdinovic_575.pdf},
arxiv = {http://arxiv.org/abs/1205.0411},
video = {http://techtalks.tv/talks/57320/}
}

40. R. Piechocki and D. Sejdinovic, Combinatorial Channel Signature Modulation for Wireless ad-hoc Networks, in IEEE International Conference on Communications (ICC), 2012.

In this paper we introduce a novel modulation and multiplexing method which facilitates highly efficient and simultaneous communication between multiple terminals in wireless ad-hoc networks. We term this method Combinatorial Channel Signature Modulation (CCSM). The CCSM method is particularly efficient in situations where communicating nodes operate in highly time dispersive environments. This is all achieved with a minimal MAC layer overhead, since all users are allowed to transmit and receive at the same time/frequency (full simultaneous duplex). The CCSM method has its roots in sparse modelling and the receiver is based on compressive sampling techniques. Towards this end, we develop a new low complexity algorithm termed Group Subspace Pursuit. Our analysis suggests that CCSM at least doubles the throughput when compared to the state-of-the-art.

@inproceedings{PieSej2012,
title = {{{Combinatorial Channel Signature Modulation for Wireless ad-hoc Networks}}},
author = {Piechocki, R. and Sejdinovic, D.},
booktitle = {IEEE International Conference on Communications (ICC)},
year = {2012},
doi = {10.1109/ICC.2012.6363956},
url = {http://dx.doi.org/10.1109/ICC.2012.6363956},
arxiv = {http://arxiv.org/abs/1201.5608},
file = {pdf/2012ICC.pdf}
}

41. A. Muller, D. Sejdinovic, and R. Piechocki, Approximate Message Passing under Finite Alphabet Constraints, in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2012.

In this paper we consider Basis Pursuit De-Noising (BPDN) problems in which the sparse original signal is drawn from a finite alphabet. To solve this problem we propose an iterative message passing algorithm, which capitalises not only on the sparsity but also, by means of a prior distribution, on the discrete nature of the original signal. In our numerical experiments we test this algorithm in combination with a Rademacher measurement matrix and a measurement matrix derived from the random demodulator, which enables compressive sampling of analogue signals. In both cases, our results show significant performance gains over a linear programming based approach to the considered BPDN problem. We also compare the proposed algorithm to a similar message passing based algorithm without prior knowledge and observe an even larger performance improvement.

@inproceedings{MulSejPie2012,
title = {{{Approximate Message Passing under Finite Alphabet Constraints}}},
author = {Muller, A. and Sejdinovic, D. and Piechocki, R.},
booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
year = {2012},
doi = {10.1109/ICASSP.2012.6288590},
url = {http://dx.doi.org/10.1109/ICASSP.2012.6288590},
arxiv = {http://arxiv.org/abs/1201.4949},
file = {pdf/2012ICASSP.pdf}
}

42. W. Dai, D. Sejdinovic, and O. Milenkovic, Gaussian Dynamic Compressive Sensing, in International Conference on Sampling Theory and Applications (SampTA), 2011.

We consider the problem of estimating a discrete-time sequence of sparse signals with Gaussian innovations. Instances of such problems arise in networking and imaging, in particular, dynamic and interventional MRI imaging. Our approach combines Kalman filtering and compressive sensing (CS) techniques by introducing a sparse MAP estimator for Gaussian signals, and then developing a CS-type algorithm for solving the sparse MAP problem. Despite the underlying assumption that the sequence of sparse signals is Gaussian, our approach also allows for efficient tracking of sparse non-Gaussian signals obtained via non-linear mappings, using only one sample/observation per time instance.

@inproceedings{DaiSejMil2011,
title = {{{Gaussian Dynamic Compressive Sensing}}},
author = {Dai, W. and Sejdinovic, D. and Milenkovic, O.},
booktitle = {International Conference on Sampling Theory and Applications (SampTA)},
year = {2011}
}

43. D. Sejdinovic and O. Johnson, Note on Noisy Group Testing: Asymptotic Bounds and Belief Propagation Reconstruction, in 48th Annual Allerton Conference on Communication, Control, and Computing, 2010, 998–1003.

An information theoretic perspective on group testing problems has recently been proposed by Atia and Saligrama, in order to characterise the optimal number of tests. Their results hold in the noiseless case, in the case where only false positives occur, and in the case where only false negatives occur. We extend their results to a model containing both false positives and false negatives, developing simple information theoretic bounds on the number of tests required. Based on these bounds, we obtain an improved order of convergence in the case of false negatives only. Since these results are based on (computationally infeasible) joint typicality decoding, we propose a belief propagation algorithm for the detection of defective items and compare its actual performance to the theoretical bounds.

@inproceedings{SejJoh2010,
title = {{{Note on Noisy Group Testing: Asymptotic Bounds and Belief Propagation Reconstruction}}},
author = {Sejdinovic, D. and Johnson, O.},
booktitle = {48th Annual Allerton Conference on Communication, Control, and Computing},
pages = {998--1003},
year = {2010},
doi = {10.1109/ALLERTON.2010.5707018},
url = {http://dx.doi.org/10.1109/ALLERTON.2010.5707018},
arxiv = {http://arxiv.org/abs/1010.2441}
}

44. D. Sejdinovic, R. Piechocki, A. Doufexi, and M. Ismail, Decentralised Distributed Fountain Coding: Asymptotic Analysis and Design, IEEE Communications Letters, vol. 14, no. 1, 42–44, 2010.

A class of generic decentralised distributed fountain coding schemes is introduced and tools for analysing the performance of such schemes are presented. It is demonstrated that the developed approach can be used to formulate a robust code design methodology in a number of instances. We show that two non-standard applications of fountain codes, fountain codes for distributed source coding and fountain codes for unequal error protection, lie within this decentralised distributed fountain coding framework.

@article{SejPieDouIsm2010,
title = {{{Decentralised Distributed Fountain Coding: Asymptotic Analysis and Design}}},
author = {Sejdinovic, D. and Piechocki, R. and Doufexi, A. and Ismail, M.},
journal = {IEEE Communications Letters},
volume = {14},
number = {1},
pages = {42--44},
year = {2010},
file = {pdf/2010CommLetter.pdf},
doi = {10.1109/LCOMM.2010.01.091541},
url = {http://dx.doi.org/10.1109/LCOMM.2010.01.091541}
}

45. D. Sejdinovic, C. Andrieu, and R. Piechocki, Bayesian Sequential Compressed Sensing in Sparse Dynamical Systems, in 48th Annual Allerton Conference on Communication, Control, and Computing, 2010, 1730–1736.

While the theory of compressed sensing provides means to reliably and efficiently acquire a sparse high-dimensional signal from a small number of its linear projections, sensing of dynamically changing sparse signals is still not well understood. We pursue a Bayesian approach to the problem of sequential compressed sensing and develop methods to recursively estimate the full posterior distribution of the signal.

@inproceedings{SejAndPie2010,
title = {{{Bayesian Sequential Compressed Sensing in Sparse Dynamical Systems}}},
author = {Sejdinovic, D. and Andrieu, C. and Piechocki, R.},
booktitle = {48th Annual Allerton Conference on Communication, Control, and Computing},
pages = {1730--1736},
year = {2010},
doi = {10.1109/ALLERTON.2010.5707125},
url = {http://dx.doi.org/10.1109/ALLERTON.2010.5707125}
}

46. D. Sejdinovic, D. Vukobratovic, A. Doufexi, V. Senk, and R. Piechocki, Expanding Window Fountain Codes for Unequal Error Protection, IEEE Transactions on Communications, vol. 57, no. 9, 2510–2516, 2009.

A novel approach to provide unequal error protection (UEP) using rateless codes over erasure channels, named Expanding Window Fountain (EWF) codes, is developed and discussed. EWF codes use a windowing technique rather than a weighted (non-uniform) selection of input symbols to achieve UEP property. The windowing approach introduces additional parameters in the UEP rateless code design, making it more general and flexible than the weighted approach. Furthermore, the windowing approach provides better performance of UEP scheme, which is confirmed both theoretically and experimentally.

@article{SejVukDouSenPie2009,
title = {{{Expanding Window Fountain Codes for Unequal Error Protection}}},
author = {Sejdinovic, D. and Vukobratovic, D. and Doufexi, A. and Senk, V. and Piechocki, R.},
journal = {IEEE Transactions on Communications},
volume = {57},
number = {9},
pages = {2510--2516},
year = {2009},
doi = {10.1109/TCOMM.2009.09.070616},
url = {http://dx.doi.org/10.1109/TCOMM.2009.09.070616},
file = {pdf/2009TransComm.pdf}
}

47. D. Vukobratovic, V. Stankovic, D. Sejdinovic, L. Stankovic, and Z. Xiong, Scalable Video Multicast Using Expanding Window Fountain Codes, IEEE Transactions on Multimedia, vol. 11, no. 6, 1094–1104, 2009.

Fountain codes were introduced as an efficient and universal forward error correction (FEC) solution for data multicast over lossy packet networks. They have recently been proposed for large scale multimedia content delivery in practical multimedia distribution systems. However, standard fountain codes, such as LT or Raptor codes, are not designed to meet unequal error protection (UEP) requirements typical in real-time scalable video multicast applications. In this paper, we propose recently introduced UEP expanding window fountain (EWF) codes as a flexible and efficient solution for real-time scalable video multicast. We demonstrate that the design flexibility and UEP performance make EWF codes ideally suited for this scenario, i.e., EWF codes offer a number of design parameters to be tuned at the server side to meet the different reception criteria of heterogeneous receivers. The performance analysis using both analytical results and simulation experiments of H.264 scalable video coding (SVC) multicast to heterogeneous receiver classes confirms the flexibility and efficiency of the proposed EWF-based FEC solution.

@article{VukStaSejStaXio2009,
title = {{{Scalable Video Multicast Using Expanding Window Fountain Codes}}},
author = {Vukobratovic, D. and Stankovic, V. and Sejdinovic, D. and Stankovic, L. and Xiong, Z.},
journal = {IEEE Transactions on Multimedia},
volume = {11},
number = {6},
pages = {1094--1104},
year = {2009},
doi = {10.1109/TMM.2009.2026087},
url = {http://dx.doi.org/10.1109/TMM.2009.2026087},
file = {pdf/2009TransMultimedia.pdf}
}

48. D. Sejdinovic, R. Piechocki, A. Doufexi, and M. Ismail, Fountain Code Design for Data Multicast with Side Information, IEEE Transactions on Wireless Communications, vol. 8, no. 10, 5155–5165, 2009.

Fountain codes are a robust solution for data multicasting to a large number of receivers which experience variable channel conditions and different packet loss rates. However, the standard fountain code design becomes inefficient if all receivers have access to some side information correlated with the source information. We focus our attention on the cases where the correlation of the source and side information can be modelled by a binary erasure channel (BEC) or by a binary input additive white Gaussian noise channel (BIAWGNC). We analyse the performance of fountain codes in data multicasting with side information for these cases, derive bounds on their performance and provide a fast and robust linear programming optimization framework for code parameters. We demonstrate that systematic Raptor code design can be employed as a possible solution to the problem at the cost of higher encoding/decoding complexity, as it reduces the side information scenario to a channel coding problem. However, our results also indicate that a simpler solution, non-systematic LT and Raptor codes, can be designed to perform close to the information theoretic bounds.

@article{SejPieDouIsm2009,
title = {{{Fountain Code Design for Data Multicast with Side Information}}},
author = {Sejdinovic, D. and Piechocki, R. and Doufexi, A. and Ismail, M.},
journal = {IEEE Transactions on Wireless Communications},
volume = {8},
number = {10},
pages = {5155--5165},
year = {2009},
publisher = {IEEE},
doi = {10.1109/TWC.2009.081076},
url = {http://dx.doi.org/10.1109/TWC.2009.081076},
file = {pdf/2009TransWirelessComm.pdf}
}

49. D. Sejdinovic, R. J. Piechocki, and A. Doufexi, AND-OR Tree Analysis of Distributed LT Codes, in IEEE Information Theory Workshop (ITW), 2009, 261–265.

In this contribution, we consider the design of distributed LT codes, i.e., independent rateless encodings of multiple sources which communicate to a common relay, where the relay is able to combine incoming packets from the sources and forward them to receivers. We provide density evolution formulae for distributed LT codes, which allow us to formulate the distributed LT code design problem and prove the equivalence of performance of distributed LT codes and LT codes with related parameters in the asymptotic regime. Furthermore, we demonstrate that allowing LT coding apparatus at both the sources and the relay may prove advantageous to coding only at the sources and coding only at the relay.

@inproceedings{SejPieDou2009ITW,
title = {{{AND-OR Tree Analysis of Distributed LT Codes}}},
author = {Sejdinovic, D. and Piechocki, R.J. and Doufexi, A.},
booktitle = {IEEE Information Theory Workshop (ITW)},
pages = {261--265},
year = {2009},
doi = {10.1109/ITWNIT.2009.5158583},
url = {http://dx.doi.org/10.1109/ITWNIT.2009.5158583},
file = {pdf/2009ITW.pdf}
}

50. D. Vukobratovic, V. Stankovic, L. Stankovic, and D. Sejdinovic, Precoded EWF Codes for Unequal Error Protection of Scalable Video, in International ICST Mobile Multimedia Communications Conference (MOBIMEDIA), 2009.

Rateless codes are forward error correcting (FEC) codes of linear encoding-decoding complexity and asymptotically capacity-approaching performance over erasure channels with any erasure statistics. They have been recently recognized as a simple and efficient solution for packetized video transmission over networks with packet erasures. However, to adapt the error correcting capabilities of rateless codes to the unequal importance of scalable video, unequal error protection (UEP) rateless codes are proposed as an alternative to standard rateless codes. In this paper, we extend our recent work on UEP rateless codes called Expanding Window Fountain (EWF) codes in order to improve their UEP performance. We investigate the design of precoded EWF codes, where precoding is done using high-rate Low-Density Parity-Check (LDPC) codes, following the similar reasoning applied in the design of Raptor codes. The obtained results are presented in the context of UEP error correcting performance of EWF codes and applied on scalable video coded (SVC) transmission over erasure networks.

@inproceedings{VukStaStaSej2009,
title = {{{Precoded EWF Codes for Unequal Error Protection of Scalable Video}}},
author = {Vukobratovic, D. and Stankovic, V. and Stankovic, L. and Sejdinovic, D.},
booktitle = {International ICST Mobile Multimedia Communications Conference (MOBIMEDIA)},
year = {2009},
url = {http://portal.acm.org/citation.cfm?id=1653559},
doi = {10.4108/ICST.MOBIMEDIA2009.7407},
file = {pdf/2009MobimediaB.pdf}
}

51. D. Sejdinovic, R. J. Piechocki, and A. Doufexi, Rateless Distributed Source Code Design, in International ICST Mobile Multimedia Communications Conference (MOBIMEDIA), 2009.

Over the past decade, rateless codes, i.e., digital fountain codes, have emerged as an efficient and robust solution for reliable data transmission over packet erasure networks and a particularly suitable one for multicasting and broadcasting applications where users may experience variable channel conditions and packet loss rates, such as mobile environments. Luby Transform (LT) and Raptor codes are practical fountain codes with a capacity approaching performance and a low computational cost. In addition to their channel coding applications, the use of fountain codes for various kinds of distributed source compression and distributed joint-source channel coding has been extensively studied lately, and with promising results. However, a systematic treatise of the code design and optimization considerations for such non-standard applications of fountain codes is still absent. In this contribution, we overview the main results concerned with rateless codes for distributed source coding and outline several examples of data dissemination protocols where carefully designed fountain codes can provide strikingly simple, yet robust solutions yielding both distributed source coding and channel coding gains.

@inproceedings{SejPieDou2009,
title = {{{Rateless Distributed Source Code Design}}},
author = {Sejdinovic, D. and Piechocki, R.J. and Doufexi, A.},
booktitle = {International ICST Mobile Multimedia Communications Conference (MOBIMEDIA)},
year = {2009},
url = {http://portal.acm.org/citation.cfm?id=1653578},
doi = {10.4108/ICST.MOBIMEDIA2009.7455},
file = {pdf/2009MobimediaA.pdf}
}

52. D. Sejdinovic, Topics in Fountain Coding, PhD thesis, University of Bristol, 2009.

The invention of the sparse graph codes, error correction codes with low complexity and rates close to capacity, has had an unrivaled impact on digital communication systems. A recent advance in the sparse graph codes, fountain coding, due to its natural rate adaptivity, is becoming an error correction coding scheme of choice for many multicasting and broadcasting systems. This thesis studies the use of fountain codes for several non-standard coding problems commonly occurring in communications. Generic decentralised distributed fountain coding schemes for networked communications are developed, discussed and analysed, where many non-cooperating source nodes communicate possibly correlated data to a large number of receivers. Several results concerning the generalised asymptotic analysis of the fountain decoder in this decentralised and distributed coding setting are presented. The problem of fountain codes with unequal error protection property is explored, where a novel class of fountain codes, Expanding Window Fountain (EWF) codes, is proposed, analysed and shown to offer competitive performance applicable to scalable video multicasting. Further, asymptotic analysis, code design and optimisation are derived for both symmetric and asymmetric Slepian-Wolf coding with fountain codes. It is shown how one can obtain both channel coding and distributed source coding gains with the same fountain coding scheme, by a judicious choice of the code parameters. The developed methods of asymptotic analysis are extended to the problem of independent fountain encodings at multiple source nodes which communicate to a common relay. It is shown that the re-encoding of the multiple fountain encoded bitstreams at the relay node with another fountain code may reduce the number of required transmissions, and the overall code optimisation methods of such schemes are derived. Finally, dual fountain codes are introduced and equipped with a low complexity quantisation algorithm for a lossy source coding problem dual to binary erasure channel coding.

@phdthesis{Sej2009,
title = {{{Topics in Fountain Coding}}},
author = {Sejdinovic, D.},
year = {2009},
school = {University of Bristol},
file = {pdf/PhD_TopicsInFountainCoding.pdf}
}

53. D. Vukobratovic, V. Stankovic, D. Sejdinovic, L. Stankovic, and Z. Xiong, Expanding Window Fountain Codes for Scalable Video Multicast, in IEEE International Conference on Multimedia and Expo (ICME), 2008, 77–80.

Digital Fountain (DF) codes have recently been suggested as an efficient forward error correction (FEC) solution for video multicast to heterogeneous receiver classes over lossy packet networks. However, to adapt DF codes to low-delay constraints and varying importance of scalable multimedia content, unequal error protection (UEP) DF schemes are needed. Thus, in this paper, Expanding Window Fountain (EWF) codes are proposed as a FEC solution for scalable video multicast. We demonstrate that the design flexibility and UEP performance make EWF codes ideally suited for this scenario, i.e., EWF codes offer a number of design parameters to be "tuned" at the server side to meet the different reception conditions of heterogeneous receivers. Performance analysis of H.264 Scalable Video Coding (SVC) multicast to heterogeneous receiver classes confirms the flexibility and efficiency of the proposed EWF-based FEC solution.

@inproceedings{VukStaSejStaXio2008,
title = {{{Expanding Window Fountain Codes for Scalable Video Multicast}}},
author = {Vukobratovic, D. and Stankovic, V. and Sejdinovic, D. and Stankovic, L. and Xiong, Z.},
booktitle = {IEEE International Conference on Multimedia and Expo (ICME)},
pages = {77--80},
year = {2008},
doi = {10.1109/ICME.2008.4607375},
url = {http://dx.doi.org/10.1109/ICME.2008.4607375}
}

54. D. Sejdinovic, R. J. Piechocki, A. Doufexi, and M. Ismail, Fountain Coding with Decoder Side Information, in IEEE International Conference on Communications (ICC), 2008, 4477–4482.

In this contribution, we consider the application of digital fountain (DF) codes to the problem of data transmission when side information is available at the decoder. The side information is modelled as a "virtual" channel output when the original information sequence is the input. For two cases of the system model, which model both the virtual and the actual transmission channel either as a binary erasure channel or as a binary input additive white Gaussian noise (BIAWGN) channel, we propose methods of enhancing the design of standard non-systematic DF codes by optimizing their output degree distribution based on the side information assumption. In addition, a systematic Raptor design has been employed as a possible solution to the problem.

@inproceedings{SejPieDouIsm2008ICC,
title = {{{Fountain Coding with Decoder Side Information}}},
author = {Sejdinovic, D. and Piechocki, R.J. and Doufexi, A. and Ismail, M.},
booktitle = {IEEE International Conference on Communications (ICC)},
pages = {4477--4482},
year = {2008},
doi = {10.1109/ICC.2008.840},
url = {http://dx.doi.org/10.1109/ICC.2008.840}
}

55. D. Sejdinovic, V. Ponnampalam, R. J. Piechocki, and A. Doufexi, The Throughput Analysis of Different IR-HARQ Schemes based on Fountain Codes, in IEEE Wireless Communications and Networking Conference (WCNC), 2008, 267–272.

In this contribution, we construct two novel IR-HARQ (automatic repeat request) schemes based on fountain codes, which combine the punctured and rateless IR-HARQ schemes, in order to attain the advantageous properties of both: nearly optimal performance of the former at the high signal-to-noise ratio (SNR) region and ratelessness of the latter. The preliminary simulation results indicate that these schemes are particularly suitable for scenarios where the transmission is originally assumed to occur at the very high SNR region, but resilience to severe deterioration of channel conditions is required.

@inproceedings{SejPonPieDou2008,
title = {{{The Throughput Analysis of Different IR-HARQ Schemes based on Fountain Codes}}},
author = {Sejdinovic, D. and Ponnampalam, V. and Piechocki, R.J. and Doufexi, A.},
booktitle = {IEEE Wireless Communications and Networking Conference (WCNC)},
pages = {267--272},
year = {2008},
doi = {10.1109/WCNC.2008.52},
url = {http://dx.doi.org/10.1109/WCNC.2008.52},
file = {pdf/2008WCNC.pdf}
}

56. D. Sejdinovic, R. J. Piechocki, A. Doufexi, and M. Ismail, Rate Adaptive Binary Erasure Quantization with Dual Fountain Codes, in IEEE Global Telecommunications Conference (GLOBECOM), 2008.

In this contribution, duals of fountain codes are introduced and their use for lossy source compression is investigated. It is shown both theoretically and experimentally that the source coding dual of the binary erasure channel coding problem, binary erasure quantization, is solved at a nearly optimal rate with application of duals of LT and raptor codes by a belief propagation-like algorithm which amounts to a graph pruning procedure. Furthermore, this quantizing scheme is rate adaptive, i.e., its rate can be modified on-the-fly in order to adapt to the source distribution, very much like LT and raptor codes are able to adapt their rate to the erasure probability of a channel.

@inproceedings{SejPieDouIsm2008,
title = {{{Rate Adaptive Binary Erasure Quantization with Dual Fountain Codes}}},
author = {Sejdinovic, D. and Piechocki, R.J. and Doufexi, A. and Ismail, M.},
booktitle = {IEEE Global Telecommunications Conference (GLOBECOM)},
year = {2008},
doi = {10.1109/GLOCOM.2008.ECP.238},
url = {http://dx.doi.org/10.1109/GLOCOM.2008.ECP.238},
file = {pdf/2008Globecom.pdf}
}

57. D. Vukobratovic, V. Stankovic, D. Sejdinovic, L. Stankovic, and Z. Xiong, Scalable Data Multicast Using Expanding Window Fountain Codes, in 45th Annual Allerton Conference on Communication, Control, and Computing, 2007.

Digital Fountain (DF) codes were introduced as an efficient and universal Forward Error Correction (FEC) solution for data multicast over lossy packet networks. However, in real-time applications, the DF encoder cannot make use of the “rateless” property as it was proposed in the DF framework, due to its delay constraints. In this scenario, many receivers might not be able to collect enough encoded symbols (packets) to perform successful decoding of the source data block (e.g., they are connected as low bit-rate receivers to a high bit-rate source stream, or they are affected by severe channel conditions). This paper proposes an application of recently introduced Expanding Window Fountain (EWF) codes as a scalable and efficient solution for real-time multicast data transmission. We show that, by carefully optimizing EWF code design parameters, it is possible to design a flexible DF solution that is capable of satisfying multicast data receivers over a wide range of data rates and/or erasure channel conditions.

@inproceedings{VukStaSejStaXio2007,
title = {{{Scalable Data Multicast Using Expanding Window Fountain Codes}}},
author = {Vukobratovic, D. and Stankovic, V. and Sejdinovic, D. and Stankovic, L. and Xiong, Z.},
booktitle = {45th Annual Allerton Conference on Communication, Control, and Computing},
year = {2007},
file = {pdf/2007Allerton.pdf}
}

58. D. Sejdinovic, D. Vukobratovic, A. Doufexi, V. Senk, and R. Piechocki, Expanding Window Fountain Codes for Unequal Error Protection, in Asilomar Conference on Signals, Systems and Computers, 2007, 1020–1024.

A novel approach to provide unequal error protection (UEP) using rateless codes over erasure channels, named Expanding Window Fountain (EWF) codes, is developed and discussed. EWF codes use a windowing technique rather than a weighted (non-uniform) selection of input symbols to achieve UEP property. The windowing approach introduces additional parameters in the UEP rateless code design, making it more general and flexible than the weighted approach. Furthermore, the windowing approach provides better performance of UEP scheme, which is confirmed both theoretically and experimentally.

@inproceedings{SejVukDouSenPie2007,
title = {{{Expanding Window Fountain Codes for Unequal Error Protection}}},
author = {Sejdinovic, D. and Vukobratovic, D. and Doufexi, A. and Senk, V. and Piechocki, R.},
booktitle = {Asilomar Conference on Signals, Systems and Computers},
doi = {10.1109/ACSSC.2007.4487375},
url = {http://dx.doi.org/10.1109/ACSSC.2007.4487375},
year = {2007},
pages = {1020--1024},
file = {pdf/2007Asilomar.pdf}
}


### Selected Workshop Papers and Abstracts

1. A. Caterini, R. Cornish, D. Sejdinovic, and A. Doucet, Variational Inference with Continuously-Indexed Normalizing Flows, in ICML 2020 Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models, 2020.

Continuously-indexed flows (CIFs) have recently achieved improvements over baseline normalizing flows in a variety of density estimation tasks. In this paper, we adapt CIFs to the task of variational inference (VI) through the framework of auxiliary VI, and demonstrate that the advantages of CIFs over baseline flows can also translate to the VI setting for both sampling from posteriors with complicated topology and performing maximum likelihood estimation in latent-variable models.

@inproceedings{CatCorSejDou2020,
author = {Caterini, Anthony and Cornish, Rob and Sejdinovic, Dino and Doucet, Arnaud},
title = {{{Variational Inference with Continuously-Indexed Normalizing Flows}}},
booktitle = {ICML 2020 Workshop on Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models},
year = {2020},
keywords = {nonarch}
}

2. D. Watson-Parris, S. Sutherland, M. Christensen, A. Caterini, D. Sejdinovic, and P. Stier, A Large-Scale Analysis of Pockets of Open Cells Enabled by Deep Learning, in American Geophysical Union Fall Meeting Abstracts, 2019, A11L–2769.

Pockets of Open Cells (POCs) have been a source of interest since they were first described 15 years ago (Bretherton et al., 2004) due to their complex nature (Glassmeier and Feingold 2017) and the importance of stratocumulus decks on the global climate. Indeed, it has been proposed that, by suppressing precipitation, anthropogenic aerosol could significantly reduce the occurrence of POCs and, through the increased cloud fraction, provide a large cooling effect on the Earth (Rosenfeld et al., 2006). To date however, no large-scale analysis of their properties or spatial and temporal prevalence has been performed. Machine learning is transforming many areas of science by providing new tools to analyse and understand the huge volumes of data that observations and models can provide. Climate science, with its wealth of data, is at the cusp of a similar transformation. One particular area where machine learning has made rapid progress is in object detection. Robust techniques are now available which can quickly identify objects in images without any need for the thresholding or edge detection which have been used in the past and often struggle with inhomogeneous features. Using a deep convolutional neural network trained on a small hand-logged dataset we have created a unique and comprehensive dataset of POCs from across the Californian, Peruvian and Namibian stratocumulus decks, spanning the whole lifetime of the MODIS mission. We have detected and analysed 8,491 POCs, quantifying their spatial and temporal distributions, as well as investigating their microphysical properties. POCs show a large, and remarkably consistent, difference in droplet effective radius compared to the surrounding clouds, but negligible difference in liquid water path. Further, we find that the global radiative effect of POCs, and hence the maximum forcing through any aerosol perturbation, is approximately 20 mW m⁻². Therefore, the potential for strong anthropogenic perturbations appears small.

@inproceedings{Watetal2019b,
author = {Watson-Parris, Duncan and Sutherland, Sam and Christensen, Matthew and Caterini, Anthony and Sejdinovic, Dino and Stier, Philip},
title = {{{A Large-Scale Analysis of Pockets of Open Cells Enabled by Deep Learning}}},
booktitle = {American Geophysical Union Fall Meeting Abstracts},
year = {2019},
pages = {A11L-2769},
keywords = {nonarch}
}

3. D. Watson-Parris, S. Sutherland, M. Christensen, A. Caterini, D. Sejdinovic, and P. Stier, Detecting Anthropogenic Cloud Perturbations with Deep Learning, in ICML 2019 Workshop on Climate Change: How Can AI Help?, 2019.

One of the most pressing questions in climate science is that of the effect of anthropogenic aerosol on the Earth’s energy balance. Aerosols provide the ‘seeds’ on which cloud droplets form, and changes in the amount of aerosol available to a cloud can change its brightness and other physical properties such as optical thickness and spatial extent. Clouds play a critical role in moderating global temperatures and small perturbations can lead to significant amounts of cooling or warming. Uncertainty in this effect is so large it is not currently known if it is negligible, or provides a large enough cooling to largely negate present-day warming by CO2. This work uses deep convolutional neural networks to look for two particular perturbations in clouds due to anthropogenic aerosol and assess their properties and prevalence, providing valuable insights into their climatic effects.

@inproceedings{Watetal2019a,
author = {Watson-Parris, Duncan and Sutherland, Sam and Christensen, Matthew and Caterini, Anthony and Sejdinovic, Dino and Stier, Philip},
title = {{{Detecting Anthropogenic Cloud Perturbations with Deep Learning}}},
booktitle = {ICML 2019 Workshop on Climate Change: How Can AI Help?},
arxiv = {https://arxiv.org/abs/1911.13061},
year = {2019},
keywords = {nonarch}
}

4. S. Cohen and D. Sejdinovic, On the Gromov-Wasserstein Distance and Coupled Deep Generative Models, in NeurIPS 2019 Workshop on Optimal Transport & Machine Learning, 2019.

Recent advances in optimal transport enabled to reformulate the problem as an adversarial optimization which results in the computation of Wasserstein distances and the training of coupled deep generative models. However, designing a sensible cost across potentially incomparable spaces is extremely challenging. The Gromov-Wasserstein approach instead considers the intra-relational geometries of the compared measures which alleviates the incomparability issue. In most previous works, the Gromov cost function is a Euclidean metric measuring the discrepancy between pairwise distances in the different spaces, which is highly sensitive to the scale of the different metric spaces. We thus propose the m-Gromov-Wasserstein distance, which enables the introduction of the Hilbert-Schmidt independence criterion (HSIC) as cost function. We show that this formulation is trivially extendable to the task of learning multiple couplings, relies on dependence instead of distance, and has Gromov-Wasserstein as a special case. We then devise a scalable algorithm for computing this distance based on coupled Wasserstein GANs for general choices of cost function and apply it to learning couplings of multiple deep generative models across incomparable spaces dimensionally and intrinsically. We then show that the classic Gromov-Wasserstein approaches may suffer from symmetries within individual metric spaces, and we devise a semi-supervised algorithm to break the symmetries.

@inproceedings{CohSej2019,
author = {Cohen, Samuel and Sejdinovic, Dino},
title = {{{On the Gromov-Wasserstein Distance and Coupled Deep Generative Models}}},
booktitle = {NeurIPS 2019 Workshop on Optimal Transport \& Machine Learning},
year = {2019},
keywords = {nonarch}
}

5. V. Nguyen, D. T. Lennon, H. Moon, N. M. van Esbroeck, D. Sejdinovic, M. A. Osborne, G. A. D. Briggs, and N. Ares, Controlling Quantum Dot Devices using Deep Reinforcement Learning, in NeurIPS 2019 Workshop on Deep Reinforcement Learning, 2019.

A robust qubit implementation will form the building block of a quantum computer. Such an implementation is typically in a quantum physical device by controlling the electrostatic potential of quantum dots. However, controlling these quantum dot devices can be challenging due to unavoidable device variability. In this paper, we develop an elegant application of deep reinforcement learning for controlling quantum dot devices. Specifically, we present a computer-automated algorithm that controls and sets voltages to the gate electrodes of a gate-defined semiconductor double quantum dot. Our approach requires no human intervention and reduces the number of measurements. This work alleviates the user effort required to control multiple quantum dot devices, each with multiple gate electrodes.

@inproceedings{Nguyenetal2019,
author = {Nguyen, Vu and Lennon, Dominic T. and Moon, Hyungil and van Esbroeck, Nina M. and Sejdinovic, Dino and Osborne, Michael A. and Briggs, G. Andrew D. and Ares, Natalia},
title = {{{Controlling Quantum Dot Devices using Deep Reinforcement Learning}}},
booktitle = {NeurIPS 2019 Workshop on Deep Reinforcement Learning},
year = {2019},
keywords = {nonarch}
}

6. J.-F. Ton, L. Chan, Y. W. Teh, and D. Sejdinovic, Noise Contrastive Meta-Learning for Conditional Density Estimation using Kernel Mean Embeddings, in NeurIPS 2019 Workshop on Meta Learning, 2019.

Current meta-learning approaches focus on learning functional representations of relationships between variables, i.e. estimating conditional expectations in regression. In many applications, however, the conditional distributions cannot be meaningfully summarized solely by expectation (due to e.g. multimodality). We introduce a novel technique for meta-learning conditional densities, which combines neural representation and noise contrastive estimation together with established literature in conditional mean embeddings into reproducing kernel Hilbert spaces. The method is validated on synthetic and real-world data, demonstrating the utility of sharing learned representations across multiple conditional density estimation tasks.

@inproceedings{Tonetal2019,
author = {Ton, Jean-Francois and Chan, Leung and Teh, Yee Whye and Sejdinovic, Dino},
title = {{{Noise Contrastive Meta-Learning for Conditional Density Estimation using Kernel Mean Embeddings}}},
booktitle = {NeurIPS 2019 Workshop on Meta Learning},
arxiv = {https://arxiv.org/abs/1906.02236},
year = {2019},
keywords = {nonarch}
}

7. H. C. L. Law, P. Zhao, J. Huang, and D. Sejdinovic, Hyperparameter Learning via Distributional Transfer, in NeurIPS 2018 Workshop on Meta Learning, 2018.

Bayesian optimisation is a popular technique for hyperparameter learning but typically requires initial ‘exploration’ even in cases where potentially similar prior tasks have been solved. We propose to transfer information across tasks using kernel embeddings of distributions of training datasets used in those tasks. The resulting method has a faster convergence compared to existing baselines, in some cases requiring only a few evaluations of the target objective.

@inproceedings{LawZhaHuaSej2018,
author = {Law, Ho Chung Leon and Zhao, Peilin and Huang, Junzhou and Sejdinovic, Dino},
title = {{{Hyperparameter Learning via Distributional Transfer}}},
booktitle = {NeurIPS 2018 Workshop on Meta Learning},
arxiv = {https://arxiv.org/abs/1810.06305},
year = {2018},
keywords = {nonarch}
}

8. J. Mitrovic, P. Wirnsberger, C. Blundell, D. Sejdinovic, and Y. W. Teh, Infinitely Deep Infinite-Width Networks, in NeurIPS 2018 Workshop on Bayesian Deep Learning, 2018.

Infinite-width neural networks have been extensively used to study the theoretical properties underlying the extraordinary empirical success of standard, finite-width neural networks. Nevertheless, until now, infinite-width networks have been limited to at most two hidden layers. To address this shortcoming, we study the initialisation requirements of these networks and show that the main challenge for constructing them is defining the appropriate sampling distributions for the weights. Based on these observations, we propose a principled approach to weight initialisation that correctly accounts for the functional nature of the hidden layer activations and facilitates the construction of arbitrarily many infinite-width layers, thus enabling the construction of arbitrarily deep infinite-width networks. The main idea of our approach is to iteratively reparametrise the hidden-layer activations into appropriately defined reproducing kernel Hilbert spaces and use the canonical way of constructing probability distributions over these spaces for specifying the required weight distributions in a principled way. Furthermore, we examine the practical implications of this construction for standard, finite-width networks. In particular, we derive a novel weight initialisation scheme for standard, finite-width networks that takes into account the structure of the data and information about the task at hand. We demonstrate the effectiveness of this weight initialisation approach on the MNIST, CIFAR-10 and Year Prediction MSD datasets.

@inproceedings{Mitetal2018,
author = {Mitrovic, Jovana and Wirnsberger, Peter and Blundell, Charles and Sejdinovic, Dino and Teh, Yee Whye},
title = {{{Infinitely Deep Infinite-Width Networks}}},
booktitle = {NeurIPS 2018 Workshop on Bayesian Deep Learning},
year = {2018},
keywords = {nonarch}
}

9. T. G. J. Rudner and D. Sejdinovic, Inter-Domain Deep Gaussian Processes, in NeurIPS 2017 Workshop on Bayesian Deep Learning, 2017.

We propose a novel variational inference method for deep Gaussian processes (GPs), which combines doubly stochastic variational inference with variational Fourier features, an inter-domain approach that replaces inducing points-based inference with a framework that harnesses RKHS Fourier features. First experiments have shown that inter-domain deep Gaussian processes are able to achieve levels of predictive performance superior to shallow GPs and alternative deep GP models.

@inproceedings{RudSej2017,
author = {Rudner, Tim G. J. and Sejdinovic, Dino},
title = {{{Inter-Domain Deep Gaussian Processes}}},
booktitle = {NeurIPS 2017 Workshop on Bayesian Deep Learning},
year = {2017},
keywords = {nonarch},
url = {http://bayesiandeeplearning.org/2017/papers/68.pdf}
}

10. H. C. L. Law, D. J. Sutherland, D. Sejdinovic, and S. Flaxman, Bayesian Approaches to Distribution Regression, in NeurIPS 2017 Workshop: Learning on Distributions, Functions, Graphs and Groups, 2017.

Distribution regression has recently attracted much interest as a generic solution to the problem of supervised learning where labels are available at the group level, rather than at the individual level. Current approaches, however, do not propagate the uncertainty in observations due to sampling variability in the groups. This effectively assumes that small and large groups are estimated equally well, and should have equal weight in the final regression. We construct a Bayesian distribution regression formalism that accounts for this uncertainty, improving the robustness and performance of the model when group sizes vary. We can obtain MAP estimates for some models with backpropagation, while the full propagation of uncertainty requires MCMC-based inference. We demonstrate our approach on an illustrative toy dataset as well as a challenging age prediction problem.

@inproceedings{LawSutSejFla2017,
author = {Law, Ho Chung Leon and Sutherland, Dougal J. and Sejdinovic, Dino and Flaxman, Seth},
title = {{{Bayesian Approaches to Distribution Regression}}},
booktitle = {NeurIPS 2017 Workshop: Learning on Distributions, Functions, Graphs and Groups},
year = {2017},
keywords = {nonarch}
}

11. J. Mitrovic, D. Sejdinovic, and Y. W. Teh, Causal Inference via Kernel Deviance Measures, in NeurIPS 2017 Workshop on Causal Inference and Machine Learning for Intelligent Decision Making: From ‘What If?’ To ‘What Next?’, 2017.

Identifying causal relationships among a set of variables is a fundamental problem in many areas of science. In this paper, we present a novel general-purpose causal inference method, Kernel Conditional Deviance for Causal Inference (KCDC), for inferring causal relationships from observational data. In particular, we propose a novel interpretation of the well-established notion of asymmetry between cause and effect. Based on this, we derive an asymmetry measure using the framework of representing conditional distributions in reproducing kernel Hilbert spaces, thus providing the basis for causal discovery. We demonstrate the versatility and robustness of our method across several synthetic datasets. Furthermore, we test our method on the real-world benchmark dataset Tübingen Cause-Effect Pairs, where it outperforms existing state-of-the-art methods.

@inproceedings{MitSejTeh2017-kcdcworkshop,
author = {Mitrovic, Jovana and Sejdinovic, Dino and Teh, Yee Whye},
title = {{{Causal Inference via Kernel Deviance Measures}}},
booktitle = {NeurIPS 2017 Workshop on Causal Inference and Machine Learning for Intelligent Decision Making: From 'What If?' To 'What Next?'},
year = {2017},
keywords = {nonarch}
}

12. J. Runge, D. Sejdinovic, and S. Flaxman, Overcoming Autocorrelation Biases for Causal Inference in Large Nonlinear Geoscientific Time Series Datasets, in Geophysical Research Abstracts, 2017, vol. 19, EGU2017–11366.

Causal discovery methods for geoscientific time series datasets aim at detecting potentially causal statistical associations that cannot be explained by other variables in the dataset. A large-scale complex system like the Earth presents major challenges for methods such as Granger causality. In particular, its high dimensionality and strong autocorrelations lead to low detection power, distorting biases, and unreliable hypothesis tests. Here we introduce a reliable method that outperforms current approaches in detection power and overcomes detection biases, making it suitable to detect even weak causal signals in large-scale geoscientific datasets. We illustrate the method’s capabilities on the global surface-pressure system where we unravel spurious associations and find several potentially causal links that are difficult to detect with standard methods, focusing in particular on drivers of the NAO.

@inproceedings{RunSejFla2017,
author = {Runge, J. and Sejdinovic, D. and Flaxman, S.},
title = {{{Overcoming Autocorrelation Biases for Causal Inference in Large Nonlinear Geoscientific Time Series Datasets}}},
booktitle = {Geophysical Research Abstracts},
year = {2017},
volume = {19},
pages = {EGU2017--11366},
keywords = {nonarch}
}

13. D. Sejdinovic, Kernel Embeddings and Bayesian Quadrature, in Dagstuhl Reports: New Directions for Learning with Kernels and Gaussian Processes (Dagstuhl Seminar 16481), 2017, vol. 6, no. 11, 157.
@inproceedings{Sej2017a,
author = {Sejdinovic, Dino},
title = {{{Kernel Embeddings and Bayesian Quadrature}}},
booktitle = {Dagstuhl Reports: New Directions for Learning with Kernels and Gaussian Processes (Dagstuhl Seminar 16481)},
editor = {Gretton, Arthur and Hennig, Philipp and Rasmussen, Carl Edward and Sch{\"o}lkopf, Bernhard},
pages = {157},
volume = {6},
number = {11},
year = {2017},
keywords = {nonarch}
}

14. D. Sejdinovic, Connections and Differences between Kernels and GPs, in Dagstuhl Reports: New Directions for Learning with Kernels and Gaussian Processes (Dagstuhl Seminar 16481), 2017, vol. 6, no. 11, 166.
@inproceedings{Sej2017b,
author = {Sejdinovic, Dino},
title = {{{Connections and Differences between Kernels and GPs}}},
booktitle = {Dagstuhl Reports: New Directions for Learning with Kernels and Gaussian Processes (Dagstuhl Seminar 16481)},
editor = {Gretton, Arthur and Hennig, Philipp and Rasmussen, Carl Edward and Sch{\"o}lkopf, Bernhard},
pages = {166},
volume = {6},
number = {11},
year = {2017},
keywords = {nonarch}
}

15. D. Sejdinovic, Kernel Hypothesis Tests on Dependent Data, in Dagstuhl Reports: Machine Learning with Interdependent and Non-identically Distributed Data (Dagstuhl Seminar 15152), 2015, vol. 5, no. 4, 50–51.
@inproceedings{Sej2015,
author = {Sejdinovic, Dino},
title = {{{Kernel Hypothesis Tests on Dependent Data}}},
booktitle = {Dagstuhl Reports: Machine Learning with Interdependent and Non-identically Distributed Data (Dagstuhl Seminar 15152)},
editor = {Darrell, Trevor and Kloft, Marius and Pontil, Massimiliano and R{\"a}tsch, Gunnar and Rodner, Erik},
pages = {50--51},
volume = {5},
number = {4},
year = {2015},
keywords = {nonarch}
}
