Continuous Average Straightness in Spatial Graphs

Vincent Labatut

Friday, May 12th, 2023, 11am, room 26-00/228, LIP6, Sorbonne Université

Slides

Straightness is a measure designed to characterize a pair of vertices in a spatial graph. In practice, it is often averaged over the whole graph, or a part of it. The standard approach consists in: 1) discretizing the graph edges, 2) computing the vertex-to-vertex Straightness over the additional vertices resulting from this discretization, and 3) averaging the obtained values. However, this discrete approximation can be computationally expensive on large graphs, and its precision has not been clearly assessed. In this work, we adopt a continuous approach to average the Straightness over the edges of spatial graphs. This allows us to derive five distinct measures able to characterize precisely the accessibility of the whole graph, as well as of individual vertices and edges. Our method is generic and could be applied to other measures designed for spatial graphs. We perform an experimental evaluation of our continuous average Straightness measures and show how they behave differently from the traditional vertex-to-vertex ones. Moreover, we also study their discrete approximations, and show that our approach is globally less demanding in terms of both processing time and memory usage.
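To make the discrete baseline concrete, here is a minimal sketch (not the paper's code) of steps 1-3, assuming Straightness(u, v) is the ratio of the Euclidean distance to the shortest-path distance between u and v; all function names are illustrative.

```python
# Illustrative sketch of the standard discrete approximation of average
# Straightness, using networkx. Not the paper's continuous method.
import itertools, math
import networkx as nx

def average_straightness(g, pos):
    """Average vertex-to-vertex Straightness over all reachable pairs."""
    dist = dict(nx.all_pairs_dijkstra_path_length(g, weight="weight"))
    vals = [math.dist(pos[u], pos[v]) / dist[u][v]
            for u, v in itertools.combinations(g, 2)
            if dist[u].get(v)]                  # skip unreachable pairs
    return sum(vals) / len(vals)

def discretized(g, pos, step=0.1):
    """Step 1: subdivide every edge into segments of length <= step,
    adding intermediate vertices at the corresponding positions."""
    h, p = nx.Graph(), dict(pos)
    for u, v in g.edges():
        length = math.dist(pos[u], pos[v])
        n = max(1, math.ceil(length / step))
        chain = [u] + [(u, v, i) for i in range(1, n)] + [v]
        for i in range(1, n):                   # positions of new vertices
            t = i / n
            p[chain[i]] = tuple(a + t * (b - a)
                                for a, b in zip(pos[u], pos[v]))
        for a, b in zip(chain, chain[1:]):
            h.add_edge(a, b, weight=math.dist(p[a], p[b]))
    return h, p

# g: nx.Graph with planar positions pos = {node: (x, y)};
# average_straightness(*discretized(g, pos)) approximates the continuous
# average: a smaller `step` is more precise but more expensive.
```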

Thinking engineering and its software tools in strong-sustainability contexts: the case of environmental analysis in design

Lou Grimal

Tuesday, May 9th, 2023, 11am, room 26-00/428, LIP6, Sorbonne Université

A transition of socio-technical systems seems necessary in order to limit the overshoot of planetary boundaries. Engineering, an activity shaped by the historical and epistemological context of its time, can be a lever for this transition. My thesis proposes a theoretical framework to understand how engineering can exist in strong-sustainability contexts, and explores the software tools that can be deployed in such contexts. This framework is composed of four characteristics (ethics, objective, approach, expertise) and addresses both the level of engineering itself and the level of the interactions between engineers and their software tools. The Value Sensitive Design methodology was applied to explore this theoretical framework. Two experiments were conducted to test a first mediation of perma-engineering in a software tool, namely a tool implementing a collaborative environmental analysis process. These experiments revealed a need for a conceptual framework for engineering in strong-sustainability contexts, as well as a lack of established practice among the actors expressing this need. Three contributions were identified: (1) the formalization of the perma-engineering framework; (2) a human-computer interaction approach to address questions of cultural transition and value change; (3) the impossibility of transforming a weak-sustainability engineering tool into a strong-sustainability one.

Hearing the shape of a robotic arena: Spectral shape analysis by Kilobots

Main author and presenter: Leo Cazenille
Other authors: Nicolas Lobato-Dauzier, Alessia Loi, Mika Ito, Olivier Marchal, Nathanael Aubert-Kato, Nicolas Bredeche, Anthony J. Genot

April 18th, 2023, 11am
Room: 24-25/405

Biological swarms have showcased their extraordinary capabilities in tackling geometric challenges while relying on only limited perception and mobility. They achieve this feat by internally diffusing information to bridge the gap between local and global scales, ultimately facilitating collective consensus and decision-making, even when individual agents only have access to local information. In this study, we strive to adapt this paradigm to robotic swarms, which consist of small robots with constrained sensing and computational abilities.

Our bio-inspired approach leverages spectral shape analysis, enabling the robotic swarms to identify the shape of a given arena. By estimating the second eigenvalue of the Laplacian collectively through information exchange, the robots can effectively obtain a fingerprint of the arena’s geometry. This metric, known as algebraic connectivity, proves invaluable in the context of swarm-based problem-solving and coordination.
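As a point of reference, the quantity the swarm estimates collectively can be computed centrally. The sketch below (illustrative, not the robots' distributed algorithm) builds a grid graph over a binary arena mask and returns the second-smallest eigenvalue of its Laplacian.

```python
# Centralized baseline for the swarm's distributed estimate: the
# algebraic connectivity (lambda_2 of the Laplacian) of a graph
# discretizing the arena. Illustrative sketch only.
import numpy as np
import networkx as nx

def algebraic_connectivity(shape_mask):
    """shape_mask: 2D boolean array, True where the arena is walkable.
    Builds the 4-neighbour grid graph of walkable cells."""
    g = nx.Graph()
    rows, cols = shape_mask.shape
    for i in range(rows):
        for j in range(cols):
            if not shape_mask[i, j]:
                continue
            for ni, nj in ((i + 1, j), (i, j + 1)):
                if ni < rows and nj < cols and shape_mask[ni, nj]:
                    g.add_edge((i, j), (ni, nj))
    lap = nx.laplacian_matrix(g).toarray().astype(float)
    return np.linalg.eigvalsh(lap)[1]      # second-smallest eigenvalue

# Two arenas of equal area but different shape get different fingerprints:
square = np.ones((8, 8), dtype=bool)
strip = np.ones((2, 32), dtype=bool)
print(algebraic_connectivity(square), algebraic_connectivity(strip))
```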

To evaluate the performance of our proposed method, we conducted experiments involving 25 real robots as well as simulations using Kilombo, a state-of-the-art simulator for Kilobots. Our objective was to assess the efficacy of our approach by attempting to classify a set of 8 shapes with varying geometric properties. The results of these experiments and simulations indicate that the diffusion-based spectral analysis can indeed empower robotic swarms to accurately sense and classify the geometry of their environment.

In conclusion, our innovative approach offers a promising avenue for advancing swarm-based problem-solving and coordination by drawing inspiration from the remarkable capabilities of biological swarms in addressing geometric challenges with limited perception and mobility.

(Presentation in French with slides in English).

Efficient enumeration of maximal cliques in massive real-world link streams

Alexis Baudin, Clémence Magnien, Lionel Tabourier

EGC 2023, vol. RNTI-E-39, pp. 139-150

Link streams provide a formalism for describing interactions over time: a link is a pair of vertices that interact over a time interval. A clique is a set of vertices together with a time interval during which they are all pairwise connected; it is maximal if neither its vertex set nor its time interval can be extended. Existing algorithms for enumerating these structures cannot handle real-world datasets of more than a few hundred thousand interactions, yet access to ever more massive data requires tools that scale. We therefore propose an algorithm that enumerates maximal cliques in massive real-world temporal networks with up to more than 100 million links. We show experimentally that it improves on the state of the art by several orders of magnitude.
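For readers unfamiliar with the formalism, the following minimal sketch (illustrative only, far from the paper's enumeration algorithm) encodes links as (u, v, b, e) tuples and checks the clique condition for a candidate vertex set and interval.

```python
# Minimal sketch of the link-stream clique definition: a link
# (u, v, b, e) means u and v interact over the whole interval [b, e].
from itertools import combinations

def covers(intervals, b, e):
    """True if the union of the intervals covers [b, e]."""
    reached = b
    for lb, le in sorted(intervals):
        if lb > reached:
            return False
        reached = max(reached, le)
        if reached >= e:
            return True
    return reached >= e

def is_clique(links, vertices, b, e):
    """(vertices, [b, e]) is a clique if every pair of vertices is
    linked at every instant of [b, e]. Maximality check omitted."""
    per_pair = {}
    for u, v, lb, le in links:
        per_pair.setdefault(frozenset((u, v)), []).append((lb, le))
    return all(covers(per_pair.get(frozenset(p), []), b, e)
               for p in combinations(vertices, 2))

links = [("a", "b", 0, 10), ("b", "c", 2, 8),
         ("a", "c", 1, 5), ("a", "c", 4, 9)]
print(is_clique(links, {"a", "b", "c"}, 2, 8))   # True
```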

Download

Compact representation of distances in sparse graphs: a tour around 2-hop labelings

Laurent Viennot

Wednesday, January 25th, 2023, room 26-00/332, LIP6, Sorbonne Université

Slides

A 2-hop labeling (a.k.a. hub labeling) consists in assigning to each node of a graph a small subset of nodes called hubs, so that any pair of nodes has a common hub lying on a shortest path joining them. Such labelings turn out to provide a very efficient representation of distances in practical road networks, where surprisingly small hub sets can be computed. A graph parameter called skeleton dimension explains this behaviour. Connecting any two nodes through a common hub can be seen as a 2-hop shortest path in a super-graph of the original graph. A natural extension is to consider more hops; this is related to the notion of hopsets introduced in the parallel computation of shortest paths. Surprisingly, a 3-hop construction leads to a data structure for representing distances that is asymptotically both smaller and faster than 2-hop labelings in graphs of bounded skeleton dimension. Another natural question is whether 2-hop labelings can be efficient in sparse graphs more generally. Unfortunately, this is not the case: there exist bounded-degree graphs that require quasi-linear average hub set size. The construction of such difficult graphs is related to that of dense graphs with n nodes that can be decomposed into n induced matchings, as introduced by Ruzsa and Szemerédi in the seventies.
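The query side of a 2-hop labeling is simple enough to sketch. Assuming labels are precomputed as {hub: distance} maps satisfying the cover property above, the distance is the minimum over common hubs (a hedged illustration with toy labels, not a construction algorithm).

```python
# Illustrative 2-hop (hub) labeling distance query. Each node stores a
# label {hub: distance}; correctness requires the cover property.
def hub_distance(label_u, label_v):
    """d(u, v) = min over common hubs h of d(u, h) + d(h, v)."""
    common = label_u.keys() & label_v.keys()
    if not common:
        return float("inf")            # no common hub: not connected
    return min(label_u[h] + label_v[h] for h in common)

# Toy example on the path a - b - c (b serves as the shared hub):
labels = {
    "a": {"a": 0, "b": 1},
    "b": {"b": 0},
    "c": {"b": 1, "c": 0},
}
print(hub_distance(labels["a"], labels["c"]))   # 2
```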

Tailored vertex ordering for faster triangle listing in large graphs

Fabrice Lécuyer, Louis Jachiet, Clémence Magnien, Lionel Tabourier

ALENEX 2023

Listing triangles is a fundamental graph problem with many applications, and large graphs require fast algorithms. Vertex ordering allows the orientation of edges from lower to higher vertex indices, and state-of-the-art triangle listing algorithms use this to accelerate their execution and to bound their time complexity. Yet, only basic orderings have been tested. In this paper, we show that studying the precise cost of algorithms instead of their bounded complexity leads to faster solutions. We introduce cost functions that link ordering properties with the running time of a given algorithm. We prove that their minimization is NP-hard and propose heuristics to obtain new orderings with different trade-offs between cost reduction and ordering time. Using datasets with up to two billion edges, we show that our heuristics accelerate the listing of triangles by an average of 38% when the ordering is already given as an input, and 16% when the ordering time is included.
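The generic ordering-based scheme that such algorithms share can be sketched as follows; the paper's contribution is how to choose the ordering, not this baseline.

```python
# Illustrative ordering-based triangle listing: edges are oriented from
# lower to higher rank, so each triangle is listed exactly once.
def list_triangles(adj, order):
    """adj: {u: set of neighbours}; order: list of vertices."""
    rank = {u: i for i, u in enumerate(order)}
    out = {u: {v for v in adj[u] if rank[v] > rank[u]} for u in adj}
    for u in adj:
        for v in out[u]:
            for w in out[u] & out[v]:   # common higher-ranked neighbours
                yield (u, v, w)

adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
print(list(list_triangles(adj, order=[1, 2, 3])))   # [(1, 2, 3)]
```

The running time depends on the sizes of the `out` sets being intersected, which is exactly what the ordering controls; the paper's cost functions quantify this.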

Download

A combinatorial link between labelled graphs and increasingly labelled Schröder trees

Antoine Genitrini, Mehdi Naima, Olivier Bodini

The 15th Latin American Theoretical Informatics Symposium (LATIN 2022)

In this paper we study a model of Schr ̈oder trees whose labelling is increasing along the branches. Such tree family is useful in the context of phylogenetic. The tree nodes are of arbitrary arity (i.e. out-degree) and the node labels can be repeated throughout different branches of the tree. Once a formal construction of the trees is formalized, we then turn to the enumeration of the trees inspired by a renormalisation due to Stanley on acyclic orientations of graphs. We thus exhibit links between our tree model and labelled graphs and prove a one-to-one correspondence between a subclass of our trees and labelled graphs. As a by-product we obtain a new natural combinatorial interpretation of Stanley’s renormalising factor. We then study different combinatorial characteristics of our tree model and finally, we design an efficient uniform random sampler for our tree model which allows to generate uniformly Erdös-Renyi graph with a constant number of rejections on
average.

Download

Compressing bipartite graphs with a dual reordering scheme

Maximilien Danisch, Ioannis Panagiotas, Lionel Tabourier

arXiv:2209.12062

In order to manage massive graphs in practice, it is often necessary to resort to graph compression, which aims at reducing the memory used when storing and processing the graph. Efficient compression methods have been proposed in the literature, especially for web graphs. In most cases, they are combined with a vertex reordering pre-processing step which significantly improves the compression rate. However, these techniques are not as efficient when considering other kinds of graphs. In this paper, we focus on the class of bipartite graphs and adapt the vertex reordering phase to their specific structure by proposing a dual reordering scheme. By reordering each group of vertices with the purpose of minimizing a specific score, we show that we can reach better compression rates. We also suggest that this approach can be further refined to make the node orderings better suited to the compression phase that follows.
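As a hedged illustration of why ordering matters (the paper defines its own scores, which may differ from this one), a common proxy in reordering-based compression is the delta-encoding cost of adjacency lists; reordering one side of a bipartite graph changes the gaps in the other side's lists, hence the dual scheme.

```python
# Illustrative proxy score: each adjacency list is delta-encoded, so its
# cost is roughly the bits needed for the gaps between consecutive
# neighbour identifiers under the chosen ordering.
import math

def delta_cost(adj, order):
    """adj: {left_node: iterable of right neighbours};
    order: {right_node: new integer id}. Returns an estimated size."""
    total = 0.0
    for neighbours in adj.values():
        ids = sorted(order[v] for v in neighbours)
        prev = 0
        for x in ids:
            total += math.log2(x - prev + 1) + 1   # ~bits for this gap
            prev = x
    return total
```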

Download

A Fast Algorithm for Ranking Users by their Influence in Online Social Platforms

Nouamane Arhachoui, Esteban Bautista, Maximilien Danisch, Anastasios Giovanidis

ASONAM 2022

Measuring the influence of users in social networks is key for numerous applications. A recently proposed influence metric, coined as the $\psi$-score, goes beyond traditional centrality metrics, which only assess structural graph importance, by further incorporating the rich information provided by the posting and re-posting activity of users. The $\psi$-score is shown in fact to generalize PageRank for non-homogeneous node activity. Despite its significance, it scales poorly to large datasets: for a network of N users it requires solving N linear systems of equations of size N. To address this problem, this work introduces a novel scalable algorithm for the fast approximation of the $\psi$-score, named Power-$\psi$. The proposed algorithm is based on a novel equation indicating that it suffices to solve one system of equations of size N to compute the $\psi$-score. Our algorithm then exploits the fact that such a system can be recursively and distributedly approximated to any desired error. This allows the $\psi$-score, which summarizes both structural and behavioral information for the nodes, to run as fast as PageRank. We validate the effectiveness of the proposed algorithm on several real-world datasets.
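The exact Power-$\psi$ recursion is given in the paper; the sketch below only illustrates the generic principle it rests on: a size-N fixed-point system of the form x = αPx + b can be approximated iteratively, PageRank-style, at a cost proportional to the number of edges per sweep.

```python
# Generic fixed-point iteration for x = alpha * P @ x + b, the kind of
# recursive approximation Power-psi relies on (not its exact equations).
import numpy as np

def iterate_linear_system(P, b, alpha=0.85, tol=1e-9, max_iter=1000):
    """Converges for alpha < 1 when P is stochastic, since the spectral
    radius of alpha * P is then below 1."""
    x = b.copy()
    for _ in range(max_iter):
        x_new = alpha * P @ x + b
        if np.linalg.norm(x_new - x, 1) < tol:
            return x_new
        x = x_new
    return x
```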

Download

Efficient data stream mining

Maroua Bahri

June 27th, 2022, 11am

Room: 24-25/405

Slides

Data streams pose several challenges for learning algorithms, including, but not limited to, restricted resources (in terms of memory and processing time), high dimensionality, and concept drift. To process these evolving data, we need efficient and accurate techniques and strategies, such as window models and summarization techniques, which restrict storage to a part of the stream, or to synopsis information drawn from it, instead of maintaining it in its entirety. This talk will present how such challenges can be addressed and how we can reduce machine learning algorithms' resource costs while maintaining good accuracy.
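One concrete example of the summarization strategies mentioned above is reservoir sampling, sketched here (a textbook technique, not specific to this talk): it keeps a fixed-size uniform sample of an unbounded stream, so memory stays constant regardless of stream length.

```python
# Reservoir sampling (Algorithm R): after processing i items, each item
# seen so far is in the reservoir with probability k / i.
import random

def reservoir_sample(stream, k):
    """Return k items drawn uniformly at random from the stream."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = random.randint(0, i)   # replace with decreasing prob.
            if j < k:
                reservoir[j] = item
    return reservoir
```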

A local updating algorithm for Personalized PageRank via Chebyshev Polynomials

Esteban Bautista, Matthieu Latapy

In Social Network Analysis and Mining, 2022, vol. 12, no. 1, pp. 1-11.

The personalized PageRank algorithm is one of the most versatile tools for the analysis of networks. In spite of its ubiquity, maintaining personalized PageRank vectors when the underlying network constantly evolves is still a challenging task. To address this limitation, this work proposes a novel distributed algorithm to locally update personalized PageRank vectors when the graph topology changes. The proposed algorithm is based on the use of Chebyshev polynomials and a novel update equation that encompasses a large family of PageRank-based methods. In particular, the algorithm has the following advantages: (i) it has faster convergence speed than state-of-the-art alternatives for local PageRank updating; and (ii) it can update the solution of recent generalizations of PageRank for which no updating algorithms have been developed. Experiments in a real-world temporal network of an autonomous system validate the effectiveness of the proposed algorithm.
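The update equation itself is in the paper; as background, here is a hedged sketch of the standard Chebyshev machinery such methods build on: approximating f(L)y with the three-term recurrence, shown for an illustrative PageRank-like filter f(λ) = α/(α + λ). The filter choice and all names are assumptions for illustration.

```python
# Chebyshev approximation of f(L) @ y for a symmetric Laplacian L with
# spectrum in [0, lam_max]. Each step costs one sparse product L @ x.
import numpy as np

def chebyshev_apply(L, y, f, lam_max, order=30, n_pts=200):
    # Chebyshev coefficients of f mapped onto [-1, 1] (cosine quadrature)
    theta = np.pi * (np.arange(n_pts) + 0.5) / n_pts
    lam = lam_max * (np.cos(theta) + 1.0) / 2.0
    c = (2.0 / n_pts) * np.array(
        [np.dot(f(lam), np.cos(k * theta)) for k in range(order + 1)])
    # Rescale L so that its spectrum lies in [-1, 1]
    Lt = (2.0 / lam_max) * L - np.eye(L.shape[0])
    # Three-term recurrence: T_{k+1} y = 2 Lt (T_k y) - T_{k-1} y
    t_prev, t_cur = y, Lt @ y
    out = 0.5 * c[0] * t_prev + c[1] * t_cur
    for k in range(2, order + 1):
        t_prev, t_cur = t_cur, 2.0 * (Lt @ t_cur) - t_prev
        out += c[k] * t_cur
    return out

# Illustrative PageRank-like smoothing of a seed vector:
# x = chebyshev_apply(L, seed, lambda l: 0.1 / (0.1 + l), lam_max)
```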

Download

The Role of Network Topology on Community-aware Centrality Measures

Stephany Rajeh

April 4th, 2022, 11am
Room: 24-25/405

While there is a great deal of work on designing centrality measures, the mainstream does not exploit the network’s community structure. Nevertheless, communities are pervasive in many real-world networks. A community is generally apprehended as a group of nodes densely connected to each other and sparsely connected to the rest of the network. As communities play a significant role in understanding how nodes behave in networks, a research area concerned with the relation between community structure and the importance of nodes has recently emerged in network science. These works have shown that incorporating community structure information allows designing more effective centrality measures. We refer to them as “community-aware” centrality measures. In this talk, we shed light on how classical (i.e., community-agnostic) centrality measures relate to community-aware centrality measures given a network’s macroscopic and mesoscopic topology. Then, we show the subtlety of using these measures in different dynamic models, namely the Susceptible-Infected-Recovered (SIR) model and the Linear Threshold (LT) model. Additionally, while there are plenty of works on detecting overlapping communities, few make use of the overlapping community structure to identify critical nodes. Indeed, in many situations nodes may belong to several communities, indicating an overlapping community structure. We propose a framework, inspired by the concept of vitality, to target influential nodes in networks with an overlapping community structure. Finally, owing to the significance of communities in real-world networks, we present a backbone extraction method that maintains the network’s modularity while substantially reducing its original size.
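As an illustration of the kind of evaluation mentioned above (a textbook sketch, not the speaker's code), one can seed a discrete-time SIR process on the top-ranked nodes of a centrality measure and compare outbreak sizes across measures.

```python
# Discrete-time SIR spreading seeded on top-ranked nodes; the final
# outbreak size is a standard benchmark for centrality measures.
import random
import networkx as nx

def sir_outbreak(g, seeds, beta=0.1, gamma=1.0):
    """Return the number of ever-infected nodes. beta: infection
    probability per contact; gamma: recovery probability per step."""
    infected, recovered = set(seeds), set()
    while infected:
        new_inf = {v for u in infected for v in g[u]
                   if v not in infected and v not in recovered
                   and random.random() < beta}
        new_rec = {u for u in infected if random.random() < gamma}
        infected = (infected | new_inf) - new_rec
        recovered |= new_rec
    return len(recovered)

# Rank nodes by any (community-aware or not) centrality, seed the top:
g = nx.karate_club_graph()
top = sorted(g, key=lambda u: g.degree(u), reverse=True)[:3]
print(sir_outbreak(g, top))
```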

Randomized reference models for complex networks

Christian Lyngby Vestergaard, PhD

March 25th, 2022, 11am
Room: 26-00/332

Video: https://nuage.lip6.fr/s/db5bsrFMssEAoQD

Many dynamical systems can be successfully analyzed by representing them as networks. Empirically measured networks and the dynamic processes taking place on them show heterogeneous, non-Markovian, and intrinsically correlated topologies and dynamics. This makes their analysis particularly challenging. Randomized reference models (RRMs) have emerged as a general and versatile toolbox for studying such systems. Defined as random networks with given features constrained to match those of an input (empirical) network, they may for example be used to identify important features of empirical networks and their effects on dynamical processes unfolding on the network. RRMs are typically implemented as procedures that reshuffle an empirical network, making them very generally applicable. However, the effects of most shuffling procedures on network features remain poorly understood, rendering their use non-trivial and susceptible to misinterpretation. I will describe a unified framework for the important class of RRMs generated by uniform shuffling procedures, which, by analogy with statistical physics, we name microcanonical RRMs (MRRMs). MRRMs constrain chosen features to take exactly the same value as in the empirical network but are otherwise maximally random. Our framework lets us build a taxonomy of MRRMs that orders them and deduces their effects on important network features. It additionally tells us how we may generate new MRRMs by composing existing ones. I will show examples of how the framework can be applied to unravel the influence of different features of an empirical network of mobile-phone calls on the spread of information, and to characterize structural circuit motifs in the brain wiring diagrams (connectomes) of small animals. If time permits, I will finally discuss how we may use graph compression techniques to alleviate the statistical problems associated with classical null-hypothesis testing for network motif discovery.
Reference: hal.archives-ouvertes.fr/hal-01817633v4
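A concrete example of a microcanonical shuffling procedure (a standard one, not necessarily among those discussed in the talk) is the double-edge swap, which preserves every node's degree exactly while randomizing the rest of the structure.

```python
# Degree-preserving randomization by uniform double-edge swaps:
# (u, v), (x, y) -> (u, y), (x, v), rejecting self-loops and multi-edges.
import random

def degree_preserving_shuffle(edges, n_swaps):
    """edges: list of (u, v) tuples of a simple graph; returns a
    rewired edge list with the same degree sequence."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = 0
    while done < n_swaps:
        i, j = random.sample(range(len(edges)), 2)
        (u, v), (x, y) = edges[i], edges[j]
        if len({u, v, x, y}) < 4:
            continue                       # would create a self-loop
        a, b = frozenset((u, y)), frozenset((x, v))
        if a in present or b in present:
            continue                       # would create a multi-edge
        present -= {frozenset((u, v)), frozenset((x, y))}
        present |= {a, b}
        edges[i], edges[j] = (u, y), (x, v)
        done += 1
    return edges
```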

Complex Systems approach to the emergence of socio-economic phenomena

Yérali Gandica

February 3rd, 2022, 2pm. Room: 26-00/332

Slides

In this talk, I will present some of my publications using Complex Systems approaches to understand the emergence of socio-economic phenomena. Complex Systems science aims to study the phenomena that emerge from the interactions between constituents and thus cannot be understood by studying a single, isolated component. The field has incorporated concepts and methods derived from many areas, ranging from statistical physics and biology to economics and sociology, which, in a constant process of cross-fertilization, have given rise to new types of questions framed within the field of Complex Systems. The study of interacting particle systems has long been an essential subject of physics. The use of statistical methods has allowed significant advances in this area by providing a bridge between microscopic interactions and the large-scale collective behaviour of the system. The application of the methods of statistical physics to social phenomena, where the interacting particles are now interacting human beings, has proven very fruitful in allowing for the understanding of many features of social collective behaviour. The success of statistical physics approaches in explaining social data is currently attracting much interest, as demonstrated by the rapidly increasing number of publications in the field from both natural and social scientists. Some of the works that I will present are based on simulations, while others are based on real data; in some cases, a comparison between simulations and real data is achieved. I will also describe some works in progress and projects for the medium and long term, to give a flavour of the scientific path I plan to follow and to illustrate the coherence of my line of research.

Random Spanning Forests on Graphs for Fast Laplacian-Based Computations

Nicolas Tremblay

Monday, January 17th, 2022, LIP6, Sorbonne Université

Graphs are ubiquitous tools to represent networks, be they networks modeling data from neurosciences, sociology, molecular biology, chemistry, etc. A cornerstone of the analysis of graphs is the Laplacian matrix L, which encodes their structure. From a linear algebra point of view, the analysis of L offers fundamental insights into key properties of the graph: the diffusion speed of a piece of information or a disease on a network, the vulnerability of a network to targeted or random attacks, the redundancy of certain parts of the network, and the network’s organisation into more or less independent modules are all examples of characteristics one may extract from the Laplacian matrix.

In this work, we concentrate on two specific problems that often arise in the context of graph-based data: i/ computing inverse traces of the form Tr( (L+qI)^(-1) ); ii/ computing smoothing operations of the form (L+qI)^(-1) y, where q > 0 and y is a vector defined over the nodes of the graph. These two problems arise in many well-known graph-based algorithms, such as semi-supervised learning, label propagation, graph Tikhonov regularization, graph inpainting, etc.

In the context of large graphs, the required inverse, which scales as O(n^3) in the worst case, is often too expensive in practice. Many approaches have been developed in the state of the art to circumvent this problem: polynomial approximation and (preconditioned) conjugate gradient are the two most well-known.

In this work, we develop a new class of techniques based on random spanning forests. We show that these forests are natural candidates to provide original, efficient, and easy-to-implement estimators.
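Following the ideas presented here (see the paper for precise statements and better estimators), a hedged sketch of the forest-based approach: sample a random spanning forest by loop-erased random walks with killing rate q, then read the estimates off the roots. All names below are illustrative.

```python
# Wilson-style sampling of a random spanning forest with killing rate q.
# Known results for this forest: the expected number of roots equals
# Tr( q (L + qI)^(-1) ), and propagating each root's value of y gives an
# unbiased estimate of q (L + qI)^(-1) y.
import random

def sample_forest(adj, q):
    """adj: {u: list of neighbours}. Returns {node: root of its tree}."""
    root, next_node = {}, {}
    for start in adj:
        u = start
        while u not in root:               # random walk until absorbed
            if random.random() < q / (q + len(adj[u])):
                root[u] = u                # killed: u becomes a root
            else:
                next_node[u] = random.choice(adj[u])
                u = next_node[u]           # overwriting = loop erasure
        path, u = [], start                # retrace the loop-erased path
        while u not in root:
            path.append(u)
            u = next_node[u]
        for w in path:
            root[w] = root[u]
    return root

def forest_estimates(adj, y, q):
    """One-forest estimates (average over many forests in practice)."""
    root = sample_forest(adj, q)
    x_hat = {u: y[root[u]] for u in adj}   # ~ q (L + qI)^(-1) y
    n_roots = sum(1 for u, r in root.items() if u == r)
    return x_hat, n_roots                  # n_roots ~ Tr( q (L+qI)^(-1) )
```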

This is joint work with Pierre-Olivier Amblard, Luca Avena, Simon Barthelmé, Alexandre Gaudillière and Yusuf Yigit Pilavci.

Clique percolation method: memory efficient almost exact communities

Alexis Baudin, Maximilien Danisch, Sergey Kirgizov, Clémence Magnien

International Conference on Advanced Data Mining and Applications (ADMA), 2021

Automatic detection of relevant groups of nodes in large real-world graphs, i.e. community detection, has applications in many fields and has received a lot of attention in the last twenty years. The most popular method designed to find overlapping communities (where a node can belong to several communities) is perhaps the clique percolation method (CPM). This method formalizes the notion of community as a maximal union of k-cliques that can be reached from each other through a series of adjacent k-cliques, where two k-cliques are adjacent if and only if they overlap on k-1 nodes. Despite much effort, CPM has not been scaled to large graphs for medium values of k. Recent work has shown that it is possible to efficiently list all k-cliques in very large real-world graphs for medium values of k. We build on top of this work and scale up CPM. In cases where this first algorithm faces memory limitations, we propose another algorithm, CPMZ, that provides a solution close to the exact one, using more time but less memory.
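For reference, the CPM community definition is implemented in networkx; this toy usage (the reference implementation, not the scalable algorithms of the paper) shows the communities found on a ring of cliques.

```python
# CPM communities via networkx's reference implementation.
import networkx as nx
from networkx.algorithms.community import k_clique_communities

g = nx.ring_of_cliques(4, 5)        # 4 cliques of size 5, joined in a ring
for community in k_clique_communities(g, 4):
    print(sorted(community))        # each 5-clique forms one community
```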

Download

A logical approach for temporal and multiplex networks analysis

Esteban Bautista, Matthieu Latapy

In 10th International Conference on Complex Networks and their Applications, Madrid (Spain), December 2021 (Poster)

Many systems generate data as a set of triplets (a,b,c): they may represent that user a called b at time c, or that customer a purchased product b in store c. These datasets are traditionally studied as networks with an extra dimension (time or layer), for which the fields of temporal and multiplex networks have extended graph theory to account for the new dimension [1]. However, such frameworks detach one variable from the others and allow one same concept to be extended in many ways, making it hard to capture patterns across all dimensions and to identify the best definitions for a given dataset. This work overrides this vision and proposes a direct processing of the set of triplets. While [2] also approaches triplets directly, it focuses on specific patterns and applications. Our work shows that a more general analysis is possible by partitioning the data and building categorical propositions (CPs) that encode informative patterns. We show that several concepts from graph theory can be framed under this formalism and leverage such insights to extend the concepts to data triplets. Lastly, we propose an algorithm to list CPs satisfying specific constraints and apply it to a real-world dataset.

Download

Shared-memory implementation of the Karp-Sipser kernelization process

Johannes Langguth, Ioannis Panagiotas, Bora Uçar

28th edition of the IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2021), Dec 2021, Bangalore, India

We investigate the parallelization of the Karp-Sipser kernelization technique, which constitutes the central part of the well-known Karp-Sipser heuristic for the maximum cardinality matching problem. The technique reduces a given problem instance to a smaller but equivalent one, by repeated applications of two operations: vertex removal, and merging two vertices. The operation of merging two vertices poses the principal challenge in parallelizing the technique. We describe an algorithm that minimizes the need for synchronization and present an efficient shared-memory parallel implementation of the kernelization technique for bipartite graphs. Using extensive experiments on a variety of multicore CPUs, we show that our implementation scales well up to 32 cores on one socket.
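A sequential sketch of the two kernelization rules may help fix ideas (the paper's contribution, their shared-memory parallelization, is not shown; names and the omitted recovery of matched partners for merged vertices are mine).

```python
# Sequential Karp-Sipser kernelization: the degree-1 rule matches a
# pendant vertex with its neighbour; the degree-2 rule merges the two
# neighbours of a degree-2 vertex. adj maps vertices to neighbour sets
# and is reduced in place.
def karp_sipser_kernel(adj):
    matching = []
    queue = [u for u in adj if len(adj[u]) <= 2]
    while queue:
        u = queue.pop()
        if u not in adj:
            continue                       # already removed or merged
        deg = len(adj[u])
        if deg == 0:                       # isolated vertex: drop it
            del adj[u]
        elif deg == 1:                     # degree-1 rule: match u with v
            v = next(iter(adj[u]))
            matching.append((u, v))
            _remove(adj, u, queue)
            _remove(adj, v, queue)
        elif deg == 2:                     # degree-2 rule: merge v and w
            v, w = adj[u]
            _remove(adj, u, queue)
            _merge(adj, v, w, queue)
        # deg > 2: u stays; re-enqueued later if its degree drops
    return matching, adj                   # partial matching + kernel

def _remove(adj, u, queue):
    for v in adj.pop(u):
        adj[v].discard(u)
        if len(adj[v]) <= 2:
            queue.append(v)

def _merge(adj, v, w, queue):
    for x in adj.pop(w):                   # w's neighbours move to v
        adj[x].discard(w)
        if x != v:
            adj[x].add(v)
            adj[v].add(x)
        if len(adj[x]) <= 2:
            queue.append(x)
    adj[v].discard(v)
    if len(adj[v]) <= 2:
        queue.append(v)
```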

Download

Full Bitcoin Blockchain Data Made Easy

Jules Azad Emery, Matthieu Latapy

ASONAM, 2021

Despite being publicly available, the full bitcoin blockchain is not trivial to collect and process. Its mere size, history, and other features raise quite specific challenges, which we address in this paper. The strengths of our approach are the following: it relies on very basic and standard tools, which makes the procedure reliable and easily reproducible; it is a purely lossless procedure, ensuring that we catch and preserve all existing data; and it provides additional indexing that makes it easy to further process the whole dataset and select appropriate subsets of it. We present our procedure in detail and illustrate its added value on large-scale use cases, such as address clustering. We provide an implementation online, as well as the obtained dataset.

Download

Assessing conservation of alternative splicing with evolutionary splicing graphs

Diego Javier Zea, Sofya Laskina, Alexis Baudin, Hugues Richard and Elodie Laine

Genome Research, 2021

Understanding how protein function has evolved and diversified is of great importance for human genetics and medicine. Here, we tackle the problem of describing the whole transcript variability observed in several species by generalising the definition of the splicing graph. We provide a practical solution for constructing parsimonious evolutionary splicing graphs, where each node is a minimal transcript building block defined across species. We show a clear link between the functional relevance, tissue regulation, and conservation of alternative transcripts on a set of 50 genes. By scaling up to the whole human protein-coding genome, we identify a few thousand genes where alternative splicing modulates the number and composition of pseudo-repeats. We have implemented our approach in ThorAxe, an efficient, versatile, robust and freely available computational tool.

Download