Mihai Cucuringu Alan Turing Research Fellow 
[Homepage] [Research] [CV] [Personal] [Links] 
(18) M. Cucuringu, R. Erban "ADMCLE approach for detecting slow variables in continuous time Markov chains and dynamic data", SIAM Journal on Scientific Computing, 39(1), B76B101 (2017), (arXiv:1504.01786)
In this paper we build on the anisotropic diffusion map (ADM) framework for detecting intrinsic slow variables in highdimensional dynamic data. We develop and analyze a new ADMLangevin approach, which has the potential of being more efficient than the existing ADM method for a large class of chemical systems. The contribution of our paper is threefold. First, we are able to improve the computational efficiency of the existing ADM approach and avoid the very computationally intensive step of running local bursts simulations by using an approximation of the chemical Langevin equation. Second, we exploit the spectrum of each local covariance matrix, and locally build a sparse ellipsoidlike neighborhood graph at each point in the data set, as opposed to computing the similarity between all pairs of points in the domain, thus making our approach scalable to networks with thousands or even tens of thousands of nodes. As our third and main contribution, we propose a robust approximation method for the mapping from the observable space to the dynamically meaningful inaccessible space, which allows us to investigate a MarkovChain based approach to estimating the stationary distribution of the detected slow variable, without any apriori knowledge of it. Along the way, we derive analytically an expression for the conditional distribution of the fast variable given the slow variable, thus rendering our approach entirely deterministic, as opposed to existing approaches that rely on stochastic simulation algorithms. Key words. Anisotropic diffusion maps, randomwalk Laplacian, eigenvectors, chemical networks, intrinsic variables, stochastic simulation algorithms, separation of scale, slow variables, stationary distribution, high dimensional data, manifold learning, Langevin equation, complex dynamical systems. 

(17) M. Cucuringu, "SyncRank: Robust Ranking, Constrained Ranking and Rank Aggregation via Eigenvector and SDP Synchronization", IEEE Transactions on Network Science and Engineering, 3 (1): 5879, (2016), arXiv:1504.01070, (arXiv:1504.01070)
We consider the classic problem of establishing a statistical ranking of a set of n items given a set of inconsistent and incomplete pairwise comparisons between such items. Instantiations of this problem occur in numerous applications in data analysis (e.g., ranking teams in sports data), computer vision, and machine learning. We formulate the above problem of ranking with incomplete noisy information as an instance of the group synchronization problem over the group SO(2) of planar rotations, whose usefulness has been demonstrated in numerous applications in recent years in computer vision and graphics, sensor network localization and structural biology. Its least squares solution can be approximated by either a spectral or a semidefinite programming (SDP) relaxation, followed by a rounding procedure. We show extensive numerical simulations on both synthetic and realworld data sets (Premier League soccer games, a Halo 2 game tournament and NCAA College Basketball games), which show that our proposed method compares favorably to other ranking methods from the recent literature. Existing theoretical guarantees on the group synchronization problem imply lower bounds on the largest amount of noise permissible in the data while still achieving exact recovery of the ground truth ranking. We propose a similar synchronizationbased algorithm for the rankaggregation problem, which integrates in a globally consistent ranking many pairwise rankoffsets or partial rankings, given by different rating systems on the same set of items, an approach which yields significantly more accurate results than other aggregation methods, including RankCentrality, a recent stateoftheart algorithm. Furthermore, we discuss the problem of semisupervised ranking when there is available information on the ground truth rank of a subset of players, and propose an algorithm based on SDP which is able to recover the ranking of the remaining players, subject to such hard constraints. Finally, synchronizationbased ranking, combined with a spectral technique for the densest subgraph problem, makes it possible to extract locallyconsistent partial rankings, in other words, to identify the rank of a small subset of players whose pairwise rank comparisons are less noisy than the rest of the data, which other methods are not able to identify. We discuss a number of related open questions and variations of the ranking problem in other settings, which we defer for future investigation. Key words. ranking, angular synchronization, spectral algorithms, semidefinite programming, rank aggregation, partial rankings, least squares, singular value decomposition, densest subgraph problem. 

(16) M. Cucuringu, I. Koutis, S. Chawla, G. Miller, and R. Peng, "Simple and Scalable Constrained Clustering: A Generalized Spectral Method", AISTATS 2016 (Artificial Intelligence and Statistics Conference) (arXiv:1601.04746)
We present a principled spectral approach to the wellstudied constrained clustering problem. It reduces clustering to a generalized eigenvalue problem on Laplacians. The method works in nearlylinear time and provides concrete guarantees for the quality of the clusters, at least for the case of 2way partitioning. In practice this translates to a very fast implementation that consistently outperforms existing spectral approaches. We support this claim with experiments on various data sets: our approach recovers correct clusters in examples where previous methods fail, and handles data sets with millions of data points  two orders of magnitude larger than before. Key words. constrained clustering, generalized eigenvalues problem, Laplacian linear systems 

(15) M. Cucuringu, M. P. Rombach, S. H. Lee, M. A. Porter, "Detection of CorePeriphery Structure in Networks Using Spectral Methods and Geodesic Paths", European Journal of Applied Mathematics, Vol. 27, No. 6: 846887 (2016), (arXiv:1410.6572), (arXiv:1410.6572)
We introduce several novel and computationally ecient methods for detecting “coreperiphery structure” in networks. Core periphery structure is a type of mesoscale structure that includes denselyconnected core vertices and sparselyconnected peripheral vertices. Core vertices are wellconnected both among themselves and to peripheral vertices, which are not wellconnected to any vertices. Our first method, which is based on transportation in networks, aggregates information from many geodesic paths in a network and yields a score for each vertex that reflects the likelihood that that vertex is a core vertex. Our second method is based on a lowrank approximation of the adjacency matrix of the network, which can often be expressed as a tensorproduct matrix. Our third approach uses the bottom eigenvector of the randomwalk Laplacian to infer a coreness score and a classification into core and peripheral vertices. Additionally, we design an objective function to (1) help classify vertices into core or peripheral vertices and (2) provide a goodnessoffit criterion for classifications into core versus peripheral vertices. To examine the performance of our methods, we apply our algorithms to both syntheticallygenerated networks and a variety of realworld data sets. Key words. Networks, coreperiphery structure, shortestpath algorithms, lowrank matrix approximations, graph Laplacians, spectral algorithms. 

(14) M. Cucuringu, J. Woodworth, "Point Localization and Density Estimation from Ordinal kNN graphs using Synchronization", submitted, (2015), (arXiv:1504.00722); Conference version accepted to 2015 IEEE Machine Learning for Signal Processing Workshop (Boston, 2015)
We consider the problem of embedding unweighted, directed knearest neighbor graphs in lowdimensional Euclidean space. The knearest neighbors of each vertex provides ordinal information on the distances between points, but not the distances themselves. We use this ordinal information along with the lowdimensionality to recover the coordinates of the points up to arbitrary similarity transformations (rigid transformations and scaling). Furthermore, we also illustrate the possibility of robustly recovering the underlying density via the Total Variation Maximum Penalized Likelihood Estimation (TVMPLE) method. We make existing approaches scalable by using an instance of a localtoglobal algorithm based on group synchronization, recently proposed in the literature in the context of sensor network localization and structural biology, which we augment with a scaling synchronization step. We demonstrate the scalability of our approach on large graphs, and show how it compares to the Local Ordinal Embedding (LOE) algorithm, which was recently proposed for recovering the configuration of a cloud of points from pairwise ordinal comparisons between a sparse set of distances. Key words. knearestneighbor graphs, ordinal constraints, graph embeddings, density estimation, eigenvector synchronization, linear programming, divideandconquer. 

(13) M. Cucuringu, "Synchronization over Z_{2} and community detection in multiplex signed networks with constraints", Journal of Complex Networks, 3 (3):469506, (2015) Finding group elements from noisy measurements of their pairwise ratios is also known as the synchronization problem, first introduced in the context of the group SO(2) of planar rotations. The usefulness of synchronization over the group Z_{2} has been demonstrated in recent algorithms for localization of sensor networks and threedimensional structuring of molecules. In this paper, we focus on synchronization over Z_{2} and consider the problem of identifying communities in a multiplex network when the interaction between the nodes is described by a signed (and possibly weighted) measure of similarity, and when the multiplex network has a natural separation into two components. In the setting when one has the additional information that certain subsets of nodes represent the same (unknown) group element, we consider and compare several algorithms for the synchronization problem over Z_{2}, based on spectral and semidefinite programming relaxations (SDP), and message passing algorithms. In other words, all nodes within such a subset represent the same unknown group element, and one has available noisy pairwise measurements between pairs of nodes that belong to different nonoverlapping subsets. Following a recent analysis of the eigenvector method for synchronization over SO(2), we analyze the robustness to noise of the eigenvector method for synchronization over Z_{2}, when the underlying graph of pairwise measurements is the ErdosRenyi random graph, using results from random matrix theory on the largest eigenvalue of rank1 deformation of large random matrices. We also propose a message passing synchronization algorithm, inspired by the standard belief propagation algorithm for approximating marginal distributions, that outperforms the existing eigenvector synchronization algorithm only for certain classes of graphs and noise models, and enjoys the flexibility of incorporating additional (possibly nonconvex) constraints that may not be easily accommodated by any of the other spectral or SDPbased methods. We apply the synchronization methods both to synthetic models and a real data set of roll call voting patterns in the U.S. Congress across time, to identify the two existing communities, i.e., the Democratic and Republican parties, and discuss a number of related open problems. 

(12) S. H. Lee, M. Cucuringu, M. A. Porter, "DensityBased and TransportBased CorePeriphery Structures in Networks", Physical Review E, Vol. 89, No. 3: 032810 (2014), (arXiv:1310.8387) Networks often possess mesoscale structures, and studying them can yield insights into both structure and function. It is most common to study community structure, but numerous other types of mesoscale structures also exist. In this paper, we examine coreperiphery structures based on both density and transport. In such structures, core network components are wellconnected both among themselves and to peripheral components, which are not wellconnected to anything. We examine coreperiphery structures in a wide range of examples of transportation, social, and financial networks—including road networks in large urban areas, a rabbit warren, a dolphin social network, a European interbank network, and a migration network between counties in the United States. We illustrate that a recently developed transportbased notion of node coreness is very useful for characterizing transportation networks. We also generalize this notion to examine core versus peripheral edges, and we show that the resulting diagnostic is also useful for transportation networks. To examine the properties of transportation networks further, we develop a family of generative models of roadlike networks. We illustrate the effect of the dimensionality of the embedding space on transportation networks, and we demonstrate that the correlations between different measures of coreness can be very different for different types of networks. 


(11) M. Cucuringu, A. Singer, D. Cowburn,
"Eigenvector Synchronization, Graph Rigidity and the Molecule Problem", Information and Inference: A Journal of the IMA, 1 (1), pp. 2167 (2012), arXiv:1111.3304, (arXiv)
The graph realization problem has received a great deal of attention in recent years, due to its importance in applications such as wireless sensor networks and structural biology. In this paper, we extend on previous work and propose the 3DASAP algorithm, for the graph realization problem in $\mathbb{R}^3$, given a sparse and noisy set of distance measurements. 3DASAP is a divide and conquer, nonincremental and noniterative algorithm, which integrates local distance information into a global structure determination. Our approach starts with identifying, for every node, a subgraph of its 1hop neighborhood graph, which can be accurately embedded in its own coordinate system. In the noisefree case, the computed coordinates of the sensors in each patch must agree with their global positioning up to some unknown rigid motion, that is, up to translation, rotation and possibly reflection. In other words, to every patch there corresponds an element of the Euclidean group Euc(3) of rigid transformations in $\mathbb{R}^3$, and the goal is to estimate the group elements that will properly align all the patches in a globally consistent way. Furthermore, 3DASAP successfully incorporates information specific to the molecule problem in structural biology, in particular information on known substructures and their orientation. In addition, we also propose 3DSPASAP, a faster version of 3DASAP, which uses a spectral partitioning algorithm as a preprocessing step for dividing the initial graph into smaller subgraphs. Our extensive numerical simulations show that 3DASAP and 3DSPASAP are very robust to high levels of noise in the measured distances and to sparse connectivity in the measurement graph, and compare favorably to similar stateofthe art localization algorithms. Keywords: Graph realization, distance geometry, eigenvectors, synchronization, spectral graph theory, rigidity theory, SDP, the molecule problem, divideandconquer. 

(10) M. Cucuringu, V. Blondel, P. Van Dooren, "Extracting spatial information from networks with loworder eigenvectors", Physical Review E 87, 032803 (2013), arXiv:1111.0920, (arXiv) We consider the problem of inferring meaningful spatial information in networks from incomplete information on the connection intensity between the nodes of the network. We consider two spatially distributed networks: a population migration flow network within the US, and a network of mobile phone calls between cities in Belgium. For both networks we use the eigenvectors of the Laplacian matrix constructed from the link intensities to obtain informative visualizations and capture natural geographical subdivisions. We observe that some low order eigenvectors localize very well and seem to reveal small geographically cohesive regions that match remarkably well with political and administrative boundaries. We discuss possible explanations for this observation by describing diffusion maps and localized eigenfunctions. In addition, we discuss a possible connection with the weighted graph cut problem, and provide numerical evidence supporting the idea that lower order eigenvectors point out local cuts in the network. However, we do not provide a formal and rigorous justification for our observations. Keywords: Eigenvector localization, diffusion maps, visualization of large data sets, socialeconomic networks, community detection, gravity model. 

(9) M. Cucuringu, Y. Lipman , A. Singer, "Sensor network localization by eigenvector synchronization over the Euclidean group", ACM Transactions on Sensor Networks, 8 (3), pp. 142 (2012) We present a new approach to localization of sensors from noisy measurements of a subset of their Euclidean distances. Our algorithm starts by finnding, embedding and aligning uniquely realizable subsets of neighboring sensors called patches. In the noisefree case, each patch agrees with its global positioning up to an unknown rigid motion of translation, rotation and possibly reflection. The reflections and rotations are estimated using the recently developed eigenvector synchronization algorithm, while the translations are estimated by solving an overdetermined linear system. The algorithm is scalable as the number of nodes increases, and can be implemented in a distributed fashion. Extensive numerical experiments show that it compares favorably to other existing algorithms in terms of robustness to noise, sparse connectivity and running time. While our approach is applicable to higher dimensions, in the current paper we focus on the two dimensional case. Keywords: Sensor networks, distance geometry, eigenvectors, synchronization, rigidity theory, spectral graph theory. 

(8) M. Cucuringu, M. W. Mahoney, "Localization on loworder eigenvectors of data matrices", Technical Report, arXiv:1109.1355 (2011) (arXiv) Eigenvector localization refers to the situation when most of the components of an eigenvector are zero or nearzero. This phenomenon has been observed on eigenvectors associated with extremal eigenvalues, and in many of those cases it can be meaningfully interpreted in terms of "structural heterogeneities" in the data. For example, the largest eigenvectors of adjacency matrices of large complex networks often have most of their mass localized on highdegree nodes; and the smallest eigenvectors of the Laplacians of such networks are often localized on small but meaningful communitylike sets of nodes. Here, we describe localization associated with loworder eigenvectors, i.e., eigenvectors corresponding to eigenvalues that are not extremal but that are \buried" further down in the spectrum. Although we have observed it in several unrelated applications, this phenomenon of loworder eigenvector localization defies common intuitions and simple explanations, and it creates serious difficulties for the applicability of popular eigenvectorbased machine learning and data analysis tools. After describing two examples where loworder eigenvector localization arises, we present a very simple model that qualitatively reproduces several of the empiricallyobserved results. This model suggests certain coarse structural similarities among the seeminglyunrelated applications where we have observed loworder eigenvector localization, and it may be used as a diagnostic tool to help extract insight from data graphs when such loworder eigenvector localization is present. 
(7) F. BlanchetSadri, E. Allen, C. Byrum, M. Cucuringu and R. Mercas, "Counting Bordered Partial Words by Critical Positions" , The Electronic Journal of Combinatorics, Vol. 18, 2011, #P138. A partial word, sequence over a finite alphabet that may have some undefined positions or holes, is bordered if one of its proper prefixes is compatible with one of its suffixes. The number theoretical problem of enumerating all bordered full words (the ones without holes) of a fixed length n over an alphabet of a fixed size k is well known. In this paper, we investigate the number of bordered partial words having h holes with the parameters k, n. It turns out that all borders of a full word are simple, and so every bordered full word has a unique minimal border no longer than half its length. Counting bordered partial words is made extremely more difficult by the failure of that combinatorial property since there is now the possibility of a minimal border that is nonsimple. Here, we give recursive formulas based on our approach of the socalled simple and nonsimple critical positions. Keywords: Theory of formal languages; Number theory; Combinatorics on words; Partial words; Bordered partial words; Simple border; Simply bordered partial words; Critical positions 

(6) F. BlanchetSadri, M. Cucuringu, "Counting primitive partial words", Journal of Automata, Languages and Combinatorics 15 (2010) 3/4, 199227. (Chapter 6 in book by F. BlanchetSadri, "Algorithmic Combinatorics on Partial Words", Chapman & Hall/CRC Press, Boca Raton, FL, 2008) A word is primitive if it is not a power of another word. The number of primitive words of a fixed length over an alphabet of a fixed size is well known and relates to the M¨obius function. In this paper, we investigate the number of primitive partial words which are strings that may contain “do not know” symbols. Keywords: Combinatorics on words; Words; Partial words; Primitive words; Primitive partial words; Mobius function; Periods; Exact periods. 


(5) M. Cucuringu, J. Puente, and D. Shue,
"Model Selection in Undirected Graphical Models with Elastic Net ", Technical Report, 2010, arXiv: 1111.0559 (arXiv) Structure learning in random fields has attracted considerable attention due to its difficulty and importance in areas such as remote sensing, computational biology, natural language processing, protein networks, and social network analysis. We consider the problem of estimating the probabilistic graph structure associated with a Gaussian Markov Random Field (GMRF), the Ising model and the Potts model, by extending previous work on l1 regularized neighborhood estimation to include the elastic net l1 + l2 penalty. Additionally, we show numerical evidence that the edge density plays a role in the graph recovery process. Finally, we introduce a novel method for augmenting neighborhood estimation by leveraging pairwise neighborhood union estimates. 

(4) A. Singer, M. Cucuringu, "Uniqueness of LowRank Matrix Completion by Rigidity Theory", SIAM Journal on Matrix Analysis and Applications, 31 (4), pp. 16211641 (2010) (arXiv) The problem of completing a lowrank matrix from a subset of its entries is often encountered in the analysis of incomplete data sets exhibiting an underlying factor model with applications in collaborative filtering, computer vision, and control. Most recent work has been focused on constructing efficient algorithms for exact or approximate recovery of the missing matrix entries and proving lower bounds for the number of known entries that guarantee a successful recovery with high probability. A related problem from both the mathematical and algorithmic points of view is the distance geometry problem of realizing points in a Euclidean space from a given subset of their pairwise distances. Rigidity theory answers basic questions regarding the uniqueness of the realization satisfying a given partial set of distances. We observe that basic ideas and tools of rigidity theory can be adapted to determine uniqueness of lowrank matrix completion, where inner products play the role that distances play in rigidity theory. This observation leads to efficient randomized algorithms for testing necessary and sufficient conditions for local completion and for testing sufficient conditions for global completion. Crucial to our analysis is a new matrix, which we call the completion matrix, that serves as the analogue of the rigidity matrix. Keywords: lowrank matrices, missing values, rigidity theory, iterative methods, collaborative ﬁltering 
(3) F. BlanchetSadri, M. Cordier, M. Cucuringu and R. Kirsch, "Combinatorics on Border Correlations of Partial Words", International Conference on Automata, Languages and Related Topics, Debrecen, Hungary, October 2124, 2008 In this paper, we consider the border sets of partial words of length n over a finite alphabet, and study the combinatorics of specific representations of them, called border correlations, which are binary vectors of length n indicating the borders. We characterize precisely which of these vectors are valid border correlations. We establish a onetoone correspondence between the set of valid border correlations and the set of valid ternary correlations of a given length, the latter being ternary vectors representing the strong and strictly weak period sets. This gives an efficient algorithm for generating the valid ternary correlations of a given length. It turns out that the set of border correlations of partial words over an arbitrary alphabet of cardinality at least two is the same as the set of border correlations of partial words over the binary alphabet. We also give a correspondence between the ternary correlation of a partial word and its refined border correlation which specifies the lengths of all the word’s bordered cyclic shifts’ minimal borders. Finally, we investigate the population size, that is, the number of partial words sharing a given border correlation, and obtain formulas to compute it. Our approach is based on the fact that the sets of all border correlations of a given length form distributive lattices under suitably defined partial orderings. Keywords: Theory of formal languages; Combinatorics on words; Partial words; Border correlations; Refined border correlations; Ternary correlations; Population size. 

(2) M. Cucuringu, R. Strichartz, "Infinitesimal Resistance Metrics on Sierpinski Gasket Type Fractals" Analysis , Vol. 28, Issue 3 (2008), page 319331 We prove the existence of an infinitesimal resistance metric on the Sierpinksi gasket (SG) at boundary points, junction points and periodic points. This is a renormalized limit of the effective resistance metric as we zoom in on the point, and satisfies a selfsimilar identity. We obtain similar results on PCF fractals with three boundary points. 

(1) M. Cucuringu, R. Strichartz, "SelfSimilar Energy Forms on the Sierpinski Gasket with Twists"  Potential Analysis, 27 ( Aug 2006), page 4560 By introducing twists into the iterated function system that defines the Sierpinski gasket, we are able to construct a unique regular energy form that satisfies a selfsimilar identity with any prescribed projective weights. Our construction is explicit (involving finding a root of a 4th order polynomial), and we are able to find explicitly a polynomial identity for the algebraic variety containing the smooth manifold of admissible weights. Without the twists, there are obstructions to existence, and a complete description due to Sabot is quite complicated. 
© 2017 Mihai Cucuringu