Mihai Cucuringu

Associate Professor

Department of Statistics
Mathematical Institute
University of Oxford

Stipendiary Lecturer
Merton College

Turing Fellow
The Alan Turing Institute


[Homepage]      [Research]     [CV]      [Personal]     

  (25) A. d'Aspremont, M. Cucuringu, H. Tyagi, Ranking and synchronization from pairwise measurements via SVD, (arXiv) (2019)

Given a measurement graph G = (V,E) and an unknown signal r ∈ Rn, we investigate algorithms for recovering r from pairwise measurements of the form ri - rj; {i,j} ∈ E. This problem arises in a variety of applications, such as ranking teams in sports data and time synchronization of distributed networks. Framed in the context of ranking, the task is to recover the ranking of n teams (induced by r) given a small subset of noisy pairwise rank offsets. We propose a simple SVD-based algorithmic pipeline for both the problem of time synchronization and ranking. We provide a detailed theoretical analysis in terms of robustness against both sampling sparsity and noise perturbations with outliers, using results from matrix perturbation and random matrix theory. Our theoretical findings are complemented by a detailed set of numerical experiments on both synthetic and real data, showcasing the competitiveness of our proposed algorithms with other state-of-the-art methods.

Keywords: ranking, group synchronization, spectral algorithms, matrix perturbation theory, singular value decomposition, low-rank matrix completion.

  (24) M. Cucuringu, A. Pizzoferrato, Y. van Gennip, An MBO scheme for clustering and semi-supervised clustering of signed networks, (arXiv) (2019)

We introduce a principled method for the signed clustering problem, where the goal is to partition a graph whose edge weights take both positive and negative values, such that edges within the same cluster are mostly positive, while edges spanning across clusters are mostly negative. Our method relies on a graph-based diffuse interface model formulation utilizing the Ginzburg-Landau functional, based on an adaptation of the classic numerical Merriman-Bence-Osher (MBO) scheme for minimizing such graph-based functionals. The proposed objective function aims to minimize the total weight of inter-cluster positively-weighted edges, while maximizing the total weight of the inter-cluster negatively-weighted edges. Our method scales to large sparse networks, and can be easily adjusted to incorporate labelled data information, as is often the case in the context of semi-supervised learning. We tested our method on a number of both synthetic stochastic block models and real-world data sets (including financial correlation matrices), and obtained promising results that compare favourably against a number of state-of-the-art approaches from the recent literature.

Keywords: MBO, clustering, signed networks, graph Laplacians, spectral methods, time series, correlation clustering

  (23) A. Elliott, M. Cucuringu, M. M. Luaces, P. Reidy, and G. Reinert, Anomaly detection in networks with application to financial transaction networks, (arXiv) (2018)

This paper is motivated by the task of detecting anomalies in networks of financial transactions, with accounts as nodes and a directed weighted edge between two nodes denoting a money transfer. The weight of the edge is the transaction amount. Examples of anomalies in networks include long paths of large transaction amounts, rings of large payments, and cliques of accounts. There are many methods available which detect such specific structures in networks. Here we introduce a method which is able to detect previously unspecified anomalies in networks. The method is based on a combination of features from network comparison and spectral analysis as well as local statistics, yielding 140 main features. We then use a simple feature sum method, as well as a random forest method, in order to classify nodes as normal or anomalous. We test the method first on synthetic networks which we generated, and second on a set of synthetic networks which were generated without the methods team having access to the ground truth. The first set of synthetic networks was split in a training set of 70 percent of the networks, and a test set of 30 percent of the networks. The resulting classifier was then applied to the second set of synthetic networks. We compare our method with Oddball, a widely used method for anomaly detection in networks, as well as to random classification. While Oddball outperforms random classification, both our feature sum method and our random forest method outperform Oddball. On the test set, the random forest outperforms feature sum, whereas on the second synthetic data set, initially feature sum tends to pick up more anomalies than random forest, with this behaviour reversing for lower-scoring anomalies. In all cases, the top 2 percent of flagged anomalies contained on average over 90 percent of the planted anomalies.

Keywords: anomaly detection, spectral methods, networkcomparison, planted subgraph

  (22) M. Cucuringu, P. Davies, A. Glielmo, H. Tyagi, "SPONGE: A generalized eigenproblem for clustering signed networks", AISTATS 2019 (code)

We introduce a principled and theoretically sound spectral method for k-way clustering in signed graphs, where the affinity measure between nodes takes either positive or negative values. Our approach is motivated by social balance theory, where the task of clustering aims to decompose the network into disjoint groups such that individuals within the same group are connected by as many positive edges as possible, while individuals from different groups are connected by as many negative edges as possible. Our algorithm relies on a generalized eigenproblem formulation inspired by recent work on constrained clustering. We provide theoretical guarantees for our approach in the setting of a signed stochastic block model, by leveraging tools from matrix perturbation theory and random matrix theory. An extensive set of numerical experiments on both synthetic and real data shows that our approach compares favorably with state-of-the-art methods for signed clustering, especially for large number of clusters.

Keywords: signed clustering, graph Laplacians, stochastic block models, spectral methods, time series, clustering correlation matrices

  (21) M. Cucuringu, H. Tyagi "Provably robust estimation of modulo 1 samples of a smooth function with applications to phase unwrapping", (arXiv), (2018) accepted in Journal of Machine Learning Research (JMLR)

Consider an unknown smooth function f : [0, 1] → R, and assume we are given n noisy mod 1 samples of f, i.e., yi = (f(xi ) + ηi) mod 1 for xi ∈ [0,1], where ηi denotes noise. Given the samples (xi,yi), i =1,...,n, our goal is to recover smooth, robust estimates of the clean samples f(xi) mod 1. We formulate a natural approach for solving this problem, which works with angular embeddings of the noisy mod 1 samples over the unit circle, inspired by the angular synchronization framework. This amounts to solving a smoothness regularized least-squares problem - a quadratically constrained quadratic program (QCQP) - where the variables are constrained to lie on the unit circle. Our proposed approach is based on solving its relaxation, which is a trust-region sub-problem and hence solvable efficiently. We provide theoretical guarantees demonstrating its robustness to noise for adversarial, as well as random Gaussian and Bernoulli noise models. To the best of our knowledge, these are the first such theoretical results for this problem. We demonstrate the robustness and efficiency of our proposed approach via extensive numerical simulations on synthetic data, along with a simple least-squares based solution for the unwrapping stage, that recovers the original samples of f (up to a global shift). It is shown to perform well at high levels of noise, when taking as input the denoised modulo 1 samples.

Finally, we also consider two other approaches for denoising the modulo 1 samples that leverage tools from Riemannian optimization on manifolds, including a Burer-Monteiro approach for a semidefinite programming relaxation of our formulation. For the two-dimensional version of the problem, which has applications in synthetic aperture radar interferometry (InSAR), we are able to solve instances of real-world data with a million sample points in under 10 seconds, on a personal laptop.

Keywords: quadratically constrained quadratic programming (QCQP), trust-region sub-problem, angular embedding, phase unwrapping, semidefinite programming, angular synchronization.

  (20) A. Tsokos, S. Narayanan, I. Kosmidis, G. Baio, M. Cucuringu, G. Whitaker and F. J. Király, "Modeling outcomes of soccer matches", (arXiv), Machine Learning 2018

Abstract: We compare various extensions of the Bradley-Terry model and a hierarchical Poisson log-linear model in terms of their performance in predicting the outcome of soccer matches (win, draw, or loss). The parameters of the Bradley-Terry extensions are estimated by maximizing the log-likelihood, or an appropriately penalized version of it, while the posterior densities of the parameters of the hierarchical Poisson log-linear model are approximated using integrated nested Laplace approximations. The prediction performance of the various modeling approaches is assessed using a novel, context-specific framework for temporal validation that is found to deliver accurate estimates of the test error. The direct modeling of outcomes via the various Bradley-Terry extensions and the modeling of match scores using the hierarchical Poisson log-linear model demonstrate similar behavior in terms of predictive performance.

Keywords: Bradley-Terry model; Poisson log-linear hierarchical model; Maximum penalized likelihood; Integrated Nested Laplace Approximation; Temporal validation

  (19) M. Cucuringu, H. Tyagi "On denoising modulo 1 samples of a function", (arXiv), AISTATS 2018

Consider an unknown smooth function f : [0, 1] → R, and say we are given n noisy mod 1 samples of f, i.e., yi = (f(xi ) + ηi) mod 1 for xi ∈ [0,1], where ηi denotes noise. Given the samples (xi,yi), i =1,...,n, our goal is to recover smooth, robust estimates of the clean samples f(xi) mod 1. We formulate a natural approach for solving this problem which works with angular embeddings of the noisy mod 1 samples over the unit complex circle, inspired by the angular synchronization framework. Our approach amounts to solving a quadratically constrained quadratic program (QCQP) which is NP-hard in its basic form, and therefore we consider its relaxation which is a trust region sub-problem and hence solvable efficiently. We demonstrate its robustness to noise via extensive numerical simulations on several synthetic examples, along with a detailed theoretical analysis. To the best of our knowledge, we provide the first algorithm for denoising mod 1 samples of a smooth function, which comes with robustness guarantees.

Keywords: Angular embedding, QCQP, phase unwrapping, angular synchronization.

  (18) M. Cucuringu, R. Erban "ADM-CLE approach for detecting slow variables in continuous time Markov chains and dynamic data", SIAM Journal on Scientific Computing, 39(1), B76-B101 (2017)

In this paper we build on the anisotropic diffusion map (ADM) framework for detecting intrinsic slow variables in high-dimensional dynamic data. We develop and analyze a new ADM-Langevin approach, which has the potential of being more efficient than the existing ADM method for a large class of chemical systems. The contribution of our paper is three-fold. First, we are able to improve the computational efficiency of the existing ADM approach and avoid the very computationally intensive step of running local bursts simulations by using an approximation of the chemical Langevin equation. Second, we exploit the spectrum of each local covariance matrix, and locally build a sparse ellipsoid-like neighborhood graph at each point in the data set, as opposed to computing the similarity between all pairs of points in the domain, thus making our approach scalable to networks with thousands or even tens of thousands of nodes. As our third and main contribution, we propose a robust approximation method for the mapping from the observable space to the dynamically meaningful inaccessible space, which allows us to investigate a Markov-Chain based approach to estimating the stationary distribution of the detected slow variable, without any a-priori knowledge of it. Along the way, we derive analytically an expression for the conditional distribution of the fast variable given the slow variable, thus rendering our approach entirely deterministic, as opposed to existing approaches that rely on stochastic simulation algorithms.

Keywords: Anisotropic diffusion maps, random-walk Laplacian, eigenvectors, chemical networks, intrinsic variables, stochastic simulation algorithms, separation of scale, slow variables, stationary distribution, high dimensional data, manifold learning, Langevin equation, complex dynamical systems.

  (17) M. Cucuringu, C. Marshak, D. Montag, P. Rombach "Rank Aggregation for Course Sequence Discovery", Complex Networks (2017)

Abstract: This work extends the rank aggregation framework for the setting of discovering optimal course sequences at the university level, and contributes to the literature on educational applications of network analysis. Each student provides a partial ranking of the courses taken throughout her or his undergraduate career. We build a network of courses by computing pairwise rank comparisons between courses based on the order students typically take them, and aggregate the results over the entire student population, to obtain a proxy for the rank offset between pairs of courses. We extract a global ranking of the courses via several state-of-the art al- gorithms for ranking with pairwise noisy information, including SerialRank, Rank Centrality, and the recent SyncRank based on the group synchronization problem. We test this application of rank aggregation on 15 years of student data from the Department of Mathematics at the University of California, Los Angeles (UCLA). Furthermore, we experiment with the above approach on different subsets of the stu- dent population conditioned on final GPA, and highlight several differences in the obtained rankings that uncover potential hidden pre-requisites in the Mathematics curriculum. Keywords: Temporal rank aggregation, spectral methods, educational data, ranking algorithms.
  (16) M. Cucuringu, "Sync-Rank: Robust Ranking, Constrained Ranking and Rank Aggregation via Eigenvector and SDP Synchronization", IEEE Transactions on Network Science and Engineering, 3 (1): 58-79, (2016)

We consider the classic problem of establishing a statistical ranking of a set of n items given a set of inconsistent and incomplete pairwise comparisons between such items. Instantiations of this problem occur in numerous applications in data analysis (e.g., ranking teams in sports data), computer vision, and machine learning. We formulate the above problem of ranking with incomplete noisy information as an instance of the group synchronization problem over the group SO(2) of planar rotations, whose usefulness has been demonstrated in numerous applications in recent years in computer vision and graphics, sensor network localization and structural biology. Its least squares solution can be approximated by either a spectral or a semidefinite programming (SDP) relaxation, followed by a rounding procedure. We show extensive numerical simulations on both synthetic and real-world data sets (Premier League soccer games, a Halo 2 game tournament and NCAA College Basketball games), which show that our proposed method compares favorably to other ranking methods from the recent literature. Existing theoretical guarantees on the group synchronization problem imply lower bounds on the largest amount of noise permissible in the data while still achieving exact recovery of the ground truth ranking. We propose a similar synchronization-based algorithm for the rank-aggregation problem, which integrates in a globally consistent ranking many pairwise rank-offsets or partial rankings, given by different rating systems on the same set of items, an approach which yields significantly more accurate results than other aggregation methods, including Rank-Centrality, a recent state-of-the-art algorithm. Furthermore, we discuss the problem of semi-supervised ranking when there is available information on the ground truth rank of a subset of players, and propose an algorithm based on SDP which is able to recover the ranking of the remaining players, subject to such hard constraints. Finally, synchronization-based ranking, combined with a spectral technique for the densest subgraph problem, makes it possible to extract locally-consistent partial rankings, in other words, to identify the rank of a small subset of players whose pairwise rank comparisons are less noisy than the rest of the data, which other methods are not able to identify. We discuss a number of related open questions and variations of the ranking problem in other settings, which we defer for future investigation.

Keywords: ranking, angular synchronization, spectral algorithms, semidefinite programming, rank aggregation, partial rankings, least squares, singular value decomposition, densest subgraph problem.

  (15) M. Cucuringu, I. Koutis, S. Chawla, G. Miller, and R. Peng, "Simple and Scalable Constrained Clustering: A Generalized Spectral Method", AISTATS 2016 (Artificial Intelligence and Statistics Conference) (2016)

We present a principled spectral approach to the well-studied constrained clustering problem. It reduces clustering to a generalized eigenvalue problem on Laplacians. The method works in nearly-linear time and provides concrete guarantees for the quality of the clusters, at least for the case of 2-way partitioning. In practice this translates to a very fast implementation that consistently outperforms existing spectral approaches. We support this claim with experiments on various data sets: our approach recovers correct clusters in examples where previous methods fail, and handles data sets with millions of data points - two orders of magnitude larger than before.

Keywords: constrained clustering, generalized eigenvalues problem, Laplacian linear systems

  (14) M. Cucuringu, M. P. Rombach, S. H. Lee, M. A. Porter, "Detection of Core-Periphery Structure in Networks Using Spectral Methods and Geodesic Paths", European Journal of Applied Mathematics, Vol. 27, No. 6: 846-887 (2016)

We introduce several novel and computationally ecient methods for detecting “core-periphery structure” in networks. Core- periphery structure is a type of meso-scale structure that includes densely-connected core vertices and sparsely-connected peripheral vertices. Core vertices are well-connected both among themselves and to peripheral vertices, which are not well-connected to any vertices. Our first method, which is based on transportation in networks, aggregates information from many geodesic paths in a network and yields a score for each vertex that reflects the likelihood that that vertex is a core vertex. Our second method is based on a low-rank approximation of the adjacency matrix of the network, which can often be expressed as a tensor-product matrix. Our third approach uses the bottom eigenvector of the random-walk Laplacian to infer a coreness score and a classification into core and peripheral vertices. Additionally, we design an objective function to (1) help classify vertices into core or peripheral vertices and (2) provide a goodness-of-fit criterion for classifications into core versus peripheral vertices. To examine the performance of our methods, we apply our algorithms to both synthetically-generated networks and a variety of real-world data sets.

Keywords: Networks, core-periphery structure, shortest-path algorithms, low-rank matrix approximations, graph Laplacians, spectral algorithms.

  (13) M. Cucuringu, J. Woodworth, "Point Localization and Density Estimation from Ordinal kNN Graphs Using Synchronization", 2015 IEEE Machine Learning for Signal Processing Workshop (Short version) (2015)

We consider the problem of embedding unweighted, directed k-nearest neighbor graphs in low-dimensional Euclidean space. The k-nearest neighbors of each vertex provides ordinal information on the distances between points, but not the distances themselves. We use this ordinal information along with the low-dimensionality to recover the coordinates of the points up to arbitrary similarity transformations (rigid transformations and scaling). Furthermore, we also illustrate the possibility of robustly recovering the underlying density via the Total Variation Maximum Penalized Likelihood Estimation (TV-MPLE) method. We make existing approaches scalable by using an instance of a local-to-global algorithm based on group synchronization, recently proposed in the literature in the context of sensor network localization and structural biology, which we augment with a scaling synchronization step. We demonstrate the scalability of our approach on large graphs, and show how it compares to the Local Ordinal Embedding (LOE) algorithm, which was recently proposed for recovering the configuration of a cloud of points from pairwise ordinal comparisons between a sparse set of distances.

Keywords: k-nearest-neighbor graphs, ordinal constraints, graph embeddings, density estimation, eigenvector synchronization, linear programming, divide-and-conquer.

 (12) M. Cucuringu, "Synchronization over Z2 and community detection in multiplex signed networks with constraints", Journal of Complex Networks, 3 (3):469-506 (2015)

Finding group elements from noisy measurements of their pairwise ratios is also known as the synchronization problem, first introduced in the context of the group SO(2) of planar rotations. The usefulness of synchronization over the group Z2 has been demonstrated in recent algorithms for localization of sensor networks and three-dimensional structuring of molecules. In this paper, we focus on synchronization over Z2 and consider the problem of identifying communities in a multiplex network when the interaction between the nodes is described by a signed (and possibly weighted) measure of similarity, and when the multiplex network has a natural separation into two components. In the setting when one has the additional information that certain subsets of nodes represent the same (unknown) group element, we consider and compare several algorithms for the synchronization problem over Z2, based on spectral and semidefinite programming relaxations (SDP), and message passing algorithms. In other words, all nodes within such a subset represent the same unknown group element, and one has available noisy pairwise measurements between pairs of nodes that belong to different non-overlapping subsets. Following a recent analysis of the eigenvector method for synchronization over SO(2), we analyze the robustness to noise of the eigenvector method for synchronization over Z2, when the underlying graph of pairwise measurements is the Erdos-Renyi random graph, using results from random matrix theory on the largest eigenvalue of rank-1 deformation of large random matrices. We also propose a message passing synchronization algorithm, inspired by the standard belief propagation algorithm for approximating marginal distributions, that outperforms the existing eigenvector synchronization algorithm only for certain classes of graphs and noise models, and enjoys the flexibility of incorporating additional (possibly non-convex) constraints that may not be easily accommodated by any of the other spectral or SDP-based methods. We apply the synchronization methods both to synthetic models and a real data set of roll call voting patterns in the U.S. Congress across time, to identify the two existing communities, i.e., the Democratic and Republican parties, and discuss a number of related open problems.
  (11) S. H. Lee, M. Cucuringu, M. A. Porter, "Density-Based and Transport-Based Core-Periphery Structures in Networks", Physical Review E, Vol. 89, No. 3: 032810 (2014)

Networks often possess mesoscale structures, and studying them can yield insights into both structure and function. It is most common to study community structure, but numerous other types of mesoscale structures also exist. In this paper, we examine core-periphery structures based on both density and transport. In such structures, core network components are well-connected both among themselves and to peripheral components, which are not well-connected to anything. We examine core-periphery structures in a wide range of examples of transportation, social, and financial networks—including road networks in large urban areas, a rabbit warren, a dolphin social network, a European interbank network, and a migration network between counties in the United States. We illustrate that a recently developed transport-based notion of node coreness is very useful for characterizing transportation networks. We also generalize this notion to examine core versus peripheral edges, and we show that the resulting diagnostic is also useful for transportation networks. To examine the properties of transportation networks further, we develop a family of generative models of roadlike networks. We illustrate the effect of the dimensionality of the embedding space on transportation networks, and we demonstrate that the correlations between different measures of coreness can be very different for different types of networks.

 (10) M. Cucuringu, A. Singer, D. Cowburn, "Eigenvector Synchronization, Graph Rigidity and the Molecule Problem", Information and Inference: A Journal of the IMA, 1 (1), pp. 2167 (2012)

The graph realization problem has received a great deal of attention in recent years, due to its importance in applications such as wireless sensor networks and structural biology. In this paper, we extend on previous work and propose the 3D-ASAP algorithm, for the graph realization problem in $\mathbb{R}^3$, given a sparse and noisy set of distance measurements. 3D-ASAP is a divide and conquer, non-incremental and non-iterative algorithm, which integrates local distance information into a global structure determination. Our approach starts with identifying, for every node, a subgraph of its 1-hop neighborhood graph, which can be accurately embedded in its own coordinate system. In the noise-free case, the computed coordinates of the sensors in each patch must agree with their global positioning up to some unknown rigid motion, that is, up to translation, rotation and possibly reflection. In other words, to every patch there corresponds an element of the Euclidean group Euc(3) of rigid transformations in $\mathbb{R}^3$, and the goal is to estimate the group elements that will properly align all the patches in a globally consistent way. Furthermore, 3D-ASAP successfully incorporates information specific to the molecule problem in structural biology, in particular information on known substructures and their orientation. In addition, we also propose 3D-SP-ASAP, a faster version of 3D-ASAP, which uses a spectral partitioning algorithm as a preprocessing step for dividing the initial graph into smaller subgraphs. Our extensive numerical simulations show that 3D-ASAP and 3D-SP-ASAP are very robust to high levels of noise in the measured distances and to sparse connectivity in the measurement graph, and compare favorably to similar state-of-the art localization algorithms.

Keywords: Graph realization, distance geometry, eigenvectors, synchronization, spectral graph theory, rigidity theory, SDP, the molecule problem, divide-and-conquer.

 (9) M. Cucuringu, V. Blondel, P. Van Dooren, "Extracting spatial information from networks with low-order eigenvectors", Physical Review E 87, 032803 (2013)

We consider the problem of inferring meaningful spatial information in networks from incomplete information on the connection intensity between the nodes of the network. We consider two spatially distributed networks: a population migration flow network within the US, and a network of mobile phone calls between cities in Belgium. For both networks we use the eigenvectors of the Laplacian matrix constructed from the link intensities to obtain informative visualizations and capture natural geographical subdivisions. We observe that some low order eigenvectors localize very well and seem to reveal small geographically cohesive regions that match remarkably well with political and administrative boundaries. We discuss possible explanations for this observation by describing diffusion maps and localized eigenfunctions. In addition, we discuss a possible connection with the weighted graph cut problem, and provide numerical evidence supporting the idea that lower order eigenvectors point out local cuts in the network. However, we do not provide a formal and rigorous justification for our observations.

Keywords: Eigenvector localization, diffusion maps, visualization of large data sets, social-economic networks, community detection, gravity model.

 (8) M. Cucuringu, Y. Lipman , A. Singer, "Sensor network localization by eigenvector synchronization over the Euclidean group", ACM Transactions on Sensor Networks, 8 (3), pp. 1-42 (2012)

We present a new approach to localization of sensors from noisy measurements of a subset of their Euclidean distances. Our algorithm starts by finnding, embedding and aligning uniquely realizable subsets of neighboring sensors called patches. In the noise-free case, each patch agrees with its global positioning up to an unknown rigid motion of translation, rotation and possibly reflection. The reflections and rotations are estimated using the recently developed eigenvector synchronization algorithm, while the translations are estimated by solving an overdetermined linear system. The algorithm is scalable as the number of nodes increases, and can be implemented in a distributed fashion. Extensive numerical experiments show that it compares favorably to other existing algorithms in terms of robustness to noise, sparse connectivity and running time. While our approach is applicable to higher dimensions, in the current paper we focus on the two dimensional case.

Keywords: Sensor networks, distance geometry, eigenvectors, synchronization, rigidity theory, spectral graph theory.

 (7) M. Cucuringu, M. W. Mahoney, "Localization on low-order eigenvectors of data matrices", Technical Report (arXiv) (2011)

Eigenvector localization refers to the situation when most of the components of an eigenvector are zero or near-zero. This phenomenon has been observed on eigenvectors associated with extremal eigenvalues, and in many of those cases it can be meaningfully interpreted in terms of "structural heterogeneities" in the data. For example, the largest eigenvectors of adjacency matrices of large complex networks often have most of their mass localized on high-degree nodes; and the smallest eigenvectors of the Laplacians of such networks are often localized on small but meaningful community-like sets of nodes. Here, we describe localization associated with low-order eigenvectors, i.e., eigenvectors corresponding to eigenvalues that are not extremal but that are \buried" further down in the spectrum. Although we have observed it in several unrelated applications, this phenomenon of low-order eigenvector localization defies common intuitions and simple explanations, and it creates serious difficulties for the applicability of popular eigenvector-based machine learning and data analysis tools. After describing two examples where low-order eigenvector localization arises, we present a very simple model that qualitatively reproduces several of the empirically-observed results. This model suggests certain coarse structural similarities among the seemingly-unrelated applications where we have observed low-order eigenvector localization, and it may be used as a diagnostic tool to help extract insight from data graphs when such low-order eigenvector localization is present.
 (6) F. Blanchet-Sadri, E. Allen, C. Byrum, M. Cucuringu and R. Mercas, "Counting Bordered Partial Words by Critical Positions" , The Electronic Journal of Combinatorics, Vol. 18, pp. 138 (2011)

A partial word, sequence over a finite alphabet that may have some undefined positions or holes, is bordered if one of its proper prefixes is compatible with one of its suffixes. The number theoretical problem of enumerating all bordered full words (the ones without holes) of a fixed length n over an alphabet of a fixed size k is well known. In this paper, we investigate the number of bordered partial words having h holes with the parameters k, n. It turns out that all borders of a full word are simple, and so every bordered full word has a unique minimal border no longer than half its length. Counting bordered partial words is made extremely more difficult by the failure of that combinatorial property since there is now the possibility of a minimal border that is nonsimple. Here, we give recursive formulas based on our approach of the so-called simple and nonsimple critical positions.

Keywords: Theory of formal languages; Number theory; Combinatorics on words; Partial words; Bordered partial words; Simple border; Simply bordered partial words; Critical positions

 (5) F. Blanchet-Sadri, M. Cucuringu, "Counting primitive partial words", Journal of Automata, Languages and Combinatorics 15, 3/4, 199-227 (2010)

A word is primitive if it is not a power of another word. The number of primitive words of a fixed length over an alphabet of a fixed size is well known and relates to the M¨obius function. In this paper, we investigate the number of primitive partial words which are strings that may contain “do not know” symbols.

Keywords: Combinatorics on words; Words; Partial words; Primitive words; Primitive partial words; Mobius function; Periods; Exact periods.

 (4) M. Cucuringu, J. Puente, and D. Shue, "Model Selection in Undirected Graphical Models with Elastic Net ", Technical Report, (arXiv) (2010)

Structure learning in random fields has attracted considerable attention due to its difficulty and importance in areas such as remote sensing, computational biology, natural language processing, protein networks, and social network analysis. We consider the problem of estimating the probabilistic graph structure associated with a Gaussian Markov Random Field (GMRF), the Ising model and the Potts model, by extending previous work on l1 regularized neighborhood estimation to include the elastic net l1 + l2 penalty. Additionally, we show numerical evidence that the edge density plays a role in the graph recovery process. Finally, we introduce a novel method for augmenting neighborhood estimation by leveraging pair-wise neighborhood union estimates.

 (3) A. Singer, M. Cucuringu, "Uniqueness of Low-Rank Matrix Completion by Rigidity Theory", SIAM Journal on Matrix Analysis and Applications, 31 (4), pp. 1621-1641 (2010)

The problem of completing a low-rank matrix from a subset of its entries is often encountered in the analysis of incomplete data sets exhibiting an underlying factor model with applications in collaborative filtering, computer vision, and control. Most recent work has been focused on constructing efficient algorithms for exact or approximate recovery of the missing matrix entries and proving lower bounds for the number of known entries that guarantee a successful recovery with high probability. A related problem from both the mathematical and algorithmic points of view is the distance geometry problem of realizing points in a Euclidean space from a given subset of their pairwise distances. Rigidity theory answers basic questions regarding the uniqueness of the realization satisfying a given partial set of distances. We observe that basic ideas and tools of rigidity theory can be adapted to determine uniqueness of low-rank matrix completion, where inner products play the role that distances play in rigidity theory. This observation leads to efficient randomized algorithms for testing necessary and sufficient conditions for local completion and for testing sufficient conditions for global completion. Crucial to our analysis is a new matrix, which we call the completion matrix, that serves as the analogue of the rigidity matrix.

Keywords: low-rank matrices, missing values, rigidity theory, iterative methods, collaborative filtering

 (2) M. Cucuringu, R. Strichartz, "Infinitesimal Resistance Metrics on Sierpinski Gasket Type Fractals", Analysis, Vol. 28, Issue 3, pp. 319-331 (2008)

We prove the existence of an infinitesimal resistance metric on the Sierpinksi gasket (SG) at boundary points, junction points and periodic points. This is a renormalized limit of the effective resistance metric as we zoom in on the point, and satisfies a self-similar identity. We obtain similar results on PCF fractals with three boundary points.
 (1) M. Cucuringu, R. Strichartz, "Self-Similar Energy Forms on the Sierpinski Gasket with Twists", Potential Analysis, Volume 27, Issue 1, pp. 45-60 (2007)

By introducing twists into the iterated function system that defines the Sierpinski gasket, we are able to construct a unique regular energy form that satisfies a self-similar identity with any prescribed projective weights. Our construction is explicit (involving finding a root of a 4th order polynomial), and we are able to find explicitly a polynomial identity for the algebraic variety containing the smooth manifold of admissible weights. Without the twists, there are obstructions to existence, and a complete description due to Sabot is quite complicated.