Efficient and simple generation of random simple connected graphs with prescribed degree sequence

Fabien Viger and Matthieu Latapy

Journal of Complex Networks (2015) 4 (1): 15-37

We address here the problem of generating random graphs uniformly from the set of simple connected graphs having a prescribed degree sequence. Our goal is to provide an algorithm designed for practical use both because of its ability to generate very large graphs (efficiency) and because it is easy to implement (simplicity). We focus on a family of heuristics for which we introduce optimality conditions, and show how this optimality can be reached in practice. We then propose a different approach, specifically designed for real-world degree distributions, which outperforms the first one. Based on a conjecture which we argue rigorously and which was confirmed by strong empirical evidence, we finally reduce the best asymptotic complexity bound known so far.
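The family of heuristics discussed here builds on degree-preserving double edge swaps (the connectivity maintenance that the paper's algorithms additionally enforce is omitted). A minimal sketch in Python, assuming the graph is simple and stored as a set of frozenset edges:

```python
import random

def double_edge_swap(edges, n_swaps, seed=0):
    """Apply degree-preserving double edge swaps to a simple graph.

    edges: set of frozenset({u, v}) edges. Each accepted swap replaces
    (a, b), (c, d) by (a, d), (c, b), which leaves every degree
    unchanged; swaps that would create a self-loop or a multi-edge
    are rejected.
    """
    rng = random.Random(seed)
    edges = set(edges)
    for _ in range(n_swaps):
        e, f = rng.sample(sorted(edges, key=sorted), 2)
        a, b = tuple(e)
        c, d = tuple(f)
        if len({a, b, c, d}) < 4:
            continue  # would create a self-loop
        e1, e2 = frozenset({a, d}), frozenset({c, b})
        if e1 in edges or e2 in edges:
            continue  # would create a multi-edge
        edges -= {e, f}
        edges |= {e1, e2}
    return edges
```

Repeatedly applying such swaps is the standard Markov-chain way of shuffling a graph while keeping its degree sequence fixed.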

RankMerging: Apprentissage supervisé de classements pour la prédiction de liens dans les grands réseaux sociaux

Lionel Tabourier, Anne-Sophie Libert et Renaud Lambiotte

EGC 2015, 15ème conférence internationale sur l’extraction et la gestion des connaissances

Finding the missing links in a large social network is a difficult task: such networks are sparse, and links may correspond to varied structural environments. In this paper, we describe RankMerging, a simple supervised learning method that combines the information obtained from different ranking methods. To illustrate its interest, we apply it to a network of mobile phone users and show how an operator can detect links between the customers of its competitors. We show that RankMerging outperforms the available methods for predicting a variable number of links in a large sparse graph.

Revealing intricate properties of communities in the bipartite structure of online social networks

Raphaël Tackx, Jean-Loup Guillaume and Fabien Tarissan

In IEEE Ninth International Conference on Research Challenges in Information Science (RCIS’15), Athens, Greece, 2015

Many real-world networks based on human activities exhibit a bipartite structure. Although bipartite graphs seem appropriate to analyse and model their properties, it has been shown that standard metrics fail to reproduce intricate patterns observed in real networks. In particular, the overlapping of the neighbourhoods of communities is difficult to capture precisely. In this work, we tackle this issue by analysing the structure of four real-world networks coming from online social activities. We first analyse their structure using standard metrics. Surprisingly, the clustering coefficient turns out to be less relevant than the redundancy coefficient to account for overlapping patterns. We then propose new metrics, namely the dispersion and the monopoly coefficients, and show that they help refine the study of bipartite overlaps. Finally, we compare the results obtained on real networks with the ones obtained on random bipartite models. This shows that the patterns captured by the redundancy and the dispersion coefficients are strongly related to the real nature of the observed overlaps.
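The redundancy coefficient mentioned above can be illustrated in a few lines; a minimal sketch assuming an adjacency-set representation of the bipartite graph, not the authors' implementation:

```python
from itertools import combinations

def redundancy(adj, v):
    """Redundancy coefficient of node v in a bipartite graph.

    adj: dict mapping each node to the set of its neighbours (which all
    lie on the other side of the bipartite graph). The coefficient is
    the fraction of pairs of neighbours of v that share at least one
    neighbour other than v, i.e. the extent to which v's links are
    'redundant' with the rest of the structure.
    """
    nbrs = sorted(adj[v])
    if len(nbrs) < 2:
        return 0.0
    redundant = sum(
        1 for a, b in combinations(nbrs, 2)
        if (adj[a] & adj[b]) - {v}
    )
    return redundant / (len(nbrs) * (len(nbrs) - 1) / 2)
```

Intuitively, a high value means that removing v would leave its neighbours still connected through other nodes, which is why this coefficient captures community overlap where a clustering coefficient cannot.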

Download

Temporal properties of legal decision networks: a case study from the International Criminal Court

Fabien Tarissan and Raphaëlle Nollez-Goldbach

In 28th International Conference on Legal Knowledge and Information Systems (JURIX’15), Braga, Portugal, 2015.

Many studies have proposed to apply artificial intelligence techniques to legal networks, whether for highlighting legal reasoning, resolving conflicts or extracting information from legal databases. In this context, a new line of research has recently emerged, which consists in considering legal decisions as elements of complex networks and conducting a structural analysis of the relations between the decisions. It has proved to be efficient for detecting important decisions in legal rulings. In this paper, we follow this approach and propose to extend structural analyses with temporal properties. We define in particular the notions of relative in-degree, temporal distance and average longevity, and use those metrics to rank the legal decisions of the first two trials of the International Criminal Court. The results presented in this paper highlight non-trivial temporal properties of those legal networks, such as the presence of decisions with an unexpectedly high longevity, and show the relevance of the proposed relative in-degree property for detecting landmark decisions. We validate the outcomes by comparing the results to those obtained with the standard in-degree property, and provide juridical explanations of the decisions identified as important by our approach.

Download

Augmenter les retweets sur Twitter : comment tirer parti des mentions ?

Soumajit Pramanik, Qinna Wang, Maximilien Danisch, Mohit Sharma, Sumanth Bandi, Jean-Loup Guillaume, Stéphane Raux and Bivas Mitra

6ème conférence sur les Modèles et l’Analyse des Réseaux : Approches Mathématiques et Informatique (MARAMI), Paris, 2015

While Twitter has become unavoidable, the propagation of tweets and hashtags is still largely misunderstood. Information propagation on Twitter is mainly due to retweets and mentions but, whereas retweets only reach the followers of an individual, mentions can reach anyone directly. Many studies have shown that mentions are widely used on Twitter and, above all, that they are fundamental to increasing the popularity of tweets and hashtags. Automatic methods for choosing the right users to mention could therefore increase the visibility of tweets. In this paper, we propose a real-time mention recommendation system to increase the popularity of a tweet. The system is based on a model of tweet propagation in a multiplex graph built from a study of real data. It clearly distinguishes propagation due to mentions from propagation due to follower relationships. Simulations of the model yield results similar to the empirical observations and are also supported by analytical results. Using these results, we propose an effective recommendation strategy and an associated Twitter application.

Download

Déplier la structure communautaire d’un réseau en mesurant la proximité aux représentants de communauté

Maximilien Danisch, Jean-Loup Guillaume and Bénédicte Le Grand

6ème conférence sur les Modèles et l’Analyse des Réseaux : Approches Mathématiques et Informatique (MARAMI), Paris, 2015

We propose an algorithm to unfold the community structure of large real-world graphs. The algorithm is based on detecting the community of each community representative: a node that belongs to a single community and is important within it. This detection relies on a recently developed proximity-measure approach. Comparing with other state-of-the-art methods, we show that our algorithm performs as well or better, and is able to handle the largest real-world graphs.

Download

Détection de communautés dans les flots de liens par optimisation de la modularité

Emmanuel Orsini

6ème conférence sur les Modèles et l’Analyse des Réseaux : Approches Mathématiques et Informatique (MARAMI), Paris, 2015.

This paper proposes to give a meaning to modularity in link streams, and thus to benefit from some of its properties and from the heuristics that optimize it. After a few simplifications, this notion of modularity leads to an algorithm able to compute a partition on a dataset of 400,000 emails. To this end, we build a new model in which time is fully continuous, on which modularity is defined naturally and in a relevant way. This model provides a new interpretation of dynamic networks, intended to be general enough to fit different types of data.

Download

Temporal Patterns of Pedophile Activity in a P2P Network: First Insights about User Profiles from Big Data

Raphaël Fournier and Matthieu Latapy

International Journal of Internet Science, 2015, 10 (1), ISSN 1662-5544

Recent studies have shown that child abuse material is shared through peer-to-peer (P2P) networks, which allow users to exchange files without a central server. Obtaining knowledge on the extent of this activity has major consequences for child protection, policy making and Internet regulation. Previous works have developed tools and analyses to provide overall figures in temporally-limited measurements. Offenders’ behavior is mostly studied through small-scale interviews, and there is little information on the times at which they engage in such activity. Here we show that the proportion of search-engine queries for pedophile content has gradually grown by a factor of almost 3 in three years. We also find that, during the day, certain hours are on average privileged by seekers. Our results demonstrate that P2P networks are actively used to search for pedophile content, and we find new and large-scale results on pedophile offenders’ profiles, indicating that a substantial proportion is well-integrated into family life and professional work activities.

Download

Densest subgraph computation and applications in finding events on social media

Oana Balalau

Vendredi 27 novembre 2015 à 11h, salle 24-25/405

Slides

Finding dense subgraphs in large graphs is a key primitive in a variety of real-world application domains, encompassing social network analytics, event detection, biology, and finance. In most such applications, one typically aims at finding several (possibly overlapping) dense subgraphs, which might correspond to communities in social networks or to interesting events. In this talk, we present a natural generalization of the densest subgraph problem, where the main goal is to find at most k subgraphs with maximum total aggregate density, while satisfying an upper bound on the pairwise Jaccard coefficient between the sets of nodes of the subgraphs. We will also illustrate how finding dense subgraphs can be an important subroutine for event detection in social media. Social media has great potential since, apart from the traditional media sources, many users post updates on different events. The highly dynamic nature of social networks gives the benefit of timely updates, and the huge amount of content the benefit of diversity and large coverage. However, finding events also presents non-trivial challenges, given the large amount of noisy and irrelevant data present in social media.
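The two ingredients of the generalized problem, average-degree density and pairwise Jaccard overlap, are simple to state in code. The greedy selection below is only a hedged illustration of the constraints, not the algorithm presented in the talk:

```python
def density(nodes, edges):
    """Average-degree density |E(S)| / |S| of the subgraph induced by nodes."""
    nodes = set(nodes)
    internal = sum(1 for u, v in edges if u in nodes and v in nodes)
    return internal / len(nodes)

def jaccard(s1, s2):
    """Jaccard coefficient between two node sets."""
    s1, s2 = set(s1), set(s2)
    return len(s1 & s2) / len(s1 | s2)

def select_subgraphs(candidates, edges, k, max_jaccard):
    """Greedily keep at most k candidate node sets in decreasing density,
    skipping any set whose Jaccard overlap with an already kept set
    exceeds max_jaccard. A naive heuristic sketch of the constraints."""
    kept = []
    for s in sorted(candidates, key=lambda s: density(s, edges), reverse=True):
        if len(kept) < k and all(jaccard(s, t) <= max_jaccard for t in kept):
            kept.append(s)
    return kept
```

The Jaccard bound is what forces the k returned subgraphs to be genuinely distinct rather than k near-copies of the single densest one.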

Graph analysis of functional brain networks: theory, applications and issues

Fabrizio De Vico Fallani

Vendredi 4 décembre 2015 à 11h, salle 26-00/332

Slides

We have known for at least 100 years that the brain is organized as a network of connections between neuronal ensembles. However, only in the last 10 years has there been rapid growth in our ability to quantify the complex topology of brain networks, using mathematical tools derived from graph theory. In this talk, I will present recent developments of graph-theoretical approaches to analyze brain networks and to model the (re)organizational principles of the neural function underlying behavior and outcome, in healthy and diseased conditions. The final part will be devoted to highlighting some of the current issues related to complex brain network analysis.

Inferring synaptic connections from spike data of multiple neurons

Ryota Kobayashi

Vendredi 6 novembre 2015 à 11h, salle 26-00/332

Slides

Correlations in neuronal activity define "functional connections" between pairs of neurons. It is still unclear how this functional connectivity is related to the underlying (physiological) synaptic connectivity. Here, we develop a coupled escape rate model (CERM) to infer synaptic connections from the spike data of multiple neurons. The estimation performance of the proposed method was compared to that of previous methods using simulated data generated by a realistic cortical network model consisting of thousands of detailed model neurons (Kitano & Fukai, 2007). We conclude that the CERM method is a promising method to infer synaptic connections from multiple neural spike data. This is joint work with Katsunori Kitano.

Dependable Issues Resolved with Distributed Streams

Yann Busnel

Mardi 6 octobre 2015 à 11h, salle 24-25/405

The analysis of massive data streams is fundamental in many monitoring applications (e.g., Internet routers). For network operators, it is a recurrent and crucial issue to determine whether the huge data streams received at their monitored devices are correlated or not, as this may reveal the presence of attacks. First, we propose a metric, called codeviation, that allows one to evaluate the correlation between distributed streams. This metric is inspired by classical metrics in statistics and probability theory, and as such makes it possible to understand how observed quantities change together, and in which proportion. We then propose to estimate the codeviation in the data stream model. In this model, functions are estimated on a huge sequence of data items, in an online fashion, and with a very small amount of memory with respect to both the size of the input stream and the domain from which data items are drawn. We give upper and lower bounds on the quality of the codeviation, and provide both local and distributed algorithms that additively approximate the codeviation among data streams using sub-linear space. Second, we consider the problem of identifying global iceberg attacks in massive and physically distributed streams. A global iceberg is a distributed denial-of-service attack in which some elements globally recur many times across the distributed streams but, locally, do not appear as a denial of service. A natural solution to defend against global iceberg attacks is to rely on multiple routers that locally scan their network traffic and regularly provide monitoring information to a server in charge of collecting and aggregating all the monitored information. Any relevant solution to this problem must minimise the communication between the routers and the coordinator, and the space required by each node to analyse its stream.
We propose a distributed algorithm that tracks global icebergs on the fly with guaranteed error bounds and limited memory and processing requirements. We present a thorough analysis of our algorithm's performance. In particular, we derive an optimal upper bound on the number of bits communicated between the multiple routers and the coordinator in the presence of an oblivious adversary. Finally, we present the main results of the experiments we have run on a cluster of single-board computers. These experiments confirm the efficiency and accuracy of our algorithm in tracking global icebergs hidden in very large input data streams exhibiting different shapes.
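For reference, the exact quantity that such sub-linear-space algorithms approximate can be computed directly when memory is not constrained. Here codeviation is taken, as suggested by its statistical inspiration, to be the covariance between the item-frequency vectors of two streams; this reading is an assumption of the sketch, not the paper's formal definition:

```python
from collections import Counter

def codeviation(stream_a, stream_b, domain):
    """Exact codeviation between two data streams, read here as the
    covariance between their item-frequency vectors over a common
    domain. The paper's contribution is to *approximate* this kind of
    quantity in sub-linear space; this computes it exactly for reference."""
    fa, fb = Counter(stream_a), Counter(stream_b)
    n = len(domain)
    mean_a = sum(fa[x] for x in domain) / n
    mean_b = sum(fb[x] for x in domain) / n
    return sum((fa[x] - mean_a) * (fb[x] - mean_b) for x in domain) / n
```

A negative value means the two streams favour different items, which is exactly the kind of joint variation the metric is designed to expose.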

Modèles de génération des graphes de collaboration multi-niveaux

Ghislain Romaric Meleu

Vendredi 2 octobre 2015 à 11h, salle 24-25/405

Slides

The headers of scientific papers allow the construction of three collaboration networks: the networks of authors, of laboratories and of institutions. These networks are correlated because of the affiliation relations between the actors of the three networks. We call such networks hierarchical (or multi-level) networks; the higher-level networks are deduced from the co-publication network of the authors. We study a generalization of these hierarchical networks in which an actor is affiliated to an organization, which may itself be affiliated to a higher-level organization, and so on. Relations between entities at a given level are deduced from those between entities at the level below. Being able to generate such networks artificially requires understanding both how the actors (at the lowest level) interact and the dynamics of affiliations to organizations. We propose a first analysis of such networks. We start by building a generation model for (level-0) collaboration graphs based on the arrival of small cliques. We observe experimentally that these networks, as well as the higher-level networks deduced from a preferential-attachment affiliation model, are small-world and scale-free. Proofs are given for level 0.
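A loose sketch of the clique-arrival idea described above; the exact arrival and affiliation rules are assumptions here, not the model of the talk:

```python
import random

def clique_arrival_graph(steps, clique_size=3, p_new=0.5, seed=0):
    """Grow a collaboration graph by the arrival of small cliques.

    Starts from one clique. At each step a new clique of clique_size
    member slots arrives: each slot is a brand-new node with probability
    p_new, otherwise an existing node picked by preferential attachment
    (proportionally to its current degree).
    """
    rng = random.Random(seed)
    edges = {frozenset({u, v}) for u in range(clique_size) for v in range(u)}
    degree = {v: clique_size - 1 for v in range(clique_size)}
    next_id = clique_size
    for _ in range(steps):
        members = []
        for _ in range(clique_size):
            if rng.random() < p_new:
                members.append(next_id)
                next_id += 1
            else:
                # preferential attachment: pick a node with prob. ~ degree
                pool = [v for v, d in degree.items() for _ in range(d)]
                members.append(rng.choice(pool))
        members = list(dict.fromkeys(members))  # drop duplicate picks
        for i, u in enumerate(members):
            for v in members[:i]:
                if frozenset({u, v}) not in edges:
                    edges.add(frozenset({u, v}))
                    degree[u] = degree.get(u, 0) + 1
                    degree[v] = degree.get(v, 0) + 1
        for u in members:
            degree.setdefault(u, 0)
    return edges
```

Higher-level (laboratory, institution) networks would then be obtained by mapping each node to an organization and projecting the edges, which this sketch leaves out.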

Compact routing, main results and techniques

Christian Glacet

Vendredi 25 septembre 2015 à 11h, salle 24-25/405

Slides

Message routing is a central activity in any interconnection network. Route efficiency and memory requirements are two central parameters in the design of a routing scheme. Routing along short paths is clearly desirable, and the storage of the routing information at each node must also be limited to allow quick routing decisions, fast updates, and scalability. There is a trade-off between route efficiency (measured in terms of stretch) and memory requirements (measured by the size of the routing tables). For an n-node network, the classical shortest-path routing scheme achieves stretch 1 with n entries per node (router). Compact routing techniques offer different trade-offs by allowing routing detours. It is known that in order to guarantee that every route has a stretch strictly lower than 2k+1, the routing tables must have size of order n^(1/k) at least. Is it actually possible to design routing schemes that attain these lower bounds? This is the question I will answer during my talk, explaining along the way the main algorithmic ideas used to achieve the currently best known trade-offs. Finally, I will also introduce the more specific problem of compact routing in "internet-like" graphs.
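The notion of stretch can be made concrete in a few lines. In this minimal sketch, the routing scheme is abstracted as a table `route_len` of route lengths (an assumed interface for illustration, not a compact routing scheme itself):

```python
from collections import deque

def bfs_dist(adj, src):
    """Shortest-path (hop) distances from src in an unweighted graph."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def stretch(adj, route_len):
    """Stretch of a routing scheme: the worst ratio, over all pairs,
    between the scheme's route length route_len[u][v] and the
    shortest-path distance between u and v."""
    worst = 1.0
    for u in adj:
        dist = bfs_dist(adj, u)
        for v, d in dist.items():
            if v != u:
                worst = max(worst, route_len[u][v] / d)
    return worst
```

For instance, routing a 4-cycle along one of its spanning trees has stretch 3: a scheme with smaller tables pays for it with longer detours, which is the trade-off the talk quantifies.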

Strategic Analysis and Design of Robust and Resilient Interdependent Power and Communication Networks with a New Model of Interdependency

Arunabha Sen

Vendredi 11 septembre 2015 à 11h, salle 26-00/332

Slides

The critical infrastructures of the nation, such as the power grid and the communication network, are highly interdependent. Recognizing the need for a deeper understanding of the interdependency in a multi-layered network, significant efforts have been made in the research community in the last few years to achieve this goal. Accordingly, a number of models have been proposed and analyzed. Unfortunately, most of these models are oversimplified and, as such, fail to capture the complex interdependency that exists between entities of power grid and communication networks, involving a combination of conjunctive and disjunctive relations. To overcome the limitations of existing models, we have recently proposed a new model that is able to capture such complex interdependency relations. Utilizing this model, we have studied a number of problems, including (i) identification of the k most vulnerable nodes of an interdependent network, (ii) the entity hardening problem, (iii) the progressive recovery problem, (iv) the targeted attack problem, and several others. In this talk, we first present the new model and then discuss several problems that have been studied utilizing it.

Time Evolution of the Importance of Nodes in Dynamic Networks

Clémence Magnien and Fabien Tarissan.

In proceedings of the International Symposium on Foundations and Applications of Big Data Analytics (FAB), in conjunction with ASONAM, 2015.

For a long time now, researchers have worked on defining different metrics able to characterize the importance of nodes in networks. Among them, centrality measures have proved to be pertinent, as they relate the position of a node in the structure to its ability to diffuse information efficiently. The case of dynamic networks, in which nodes and links appear and disappear over time, has led the community to propose extensions of these classical measures. Yet, they do not account for the fact that the network structure evolves, and that node importance may evolve accordingly. In the present paper, we propose temporal extensions of notions of centrality, which take into account the paths existing at any given time, in order to study the time evolution of nodes' importance in dynamic networks. We apply this approach to two datasets and show that the importance of nodes does indeed vary greatly with time. We also show that in some cases it might be meaningless to try to identify nodes that are consistently important over time, thus strengthening the interest of temporal extensions of centrality measures.
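A key ingredient of such temporal extensions is the time-respecting path. A minimal sketch of earliest-arrival computation in a link stream follows; the exact definitions used in the paper may differ:

```python
def earliest_arrival(links, src, t_start):
    """Earliest-arrival times from src along time-respecting paths.

    links: iterable of (t, u, v) triplets meaning u and v interacted at
    time t (undirected, instantaneous). A link at time t can extend a
    path only if one of its endpoints was already reached at some
    time <= t. One possible temporal notion of distance.
    """
    arrival = {src: t_start}
    for t, u, v in sorted(links, key=lambda l: l[0]):
        for a, b in ((u, v), (v, u)):
            if a in arrival and arrival[a] <= t:
                if t < arrival.get(b, float("inf")):
                    arrival[b] = t
    return arrival
```

Recomputing such arrival times at different starting instants is what lets a temporal centrality track how a node's importance changes as the structure evolves.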

Download

A reliable and evolutive web application to detect social capitalists

Nicolas Dugué, Anthony Perez, Maximilien Danisch, Florian Bridoux, Amélie Daviau, Tennessy Kolubako, Simon Munier and Hugo Durbano.

IEEE/ACM International Conference on Advances in Social Network Analysis and Mining (ASONAM), 2015, Paris. (Demo track paper)

On Twitter, social capitalists use dedicated hashtags and mutual subscriptions to one another in order to gain followers and to be retweeted. Their methods are successful enough to make them appear as influential users. Indeed, applications dedicated to influence measurement, such as Klout and Kred, give high scores to most of these users. Meanwhile, their high numbers of retweets and followers are due not to the relevance of the content they tweet, but to their social-capitalism techniques. In order to detect these users, we train a classifier on a dataset of social capitalists and regular users. We then implement this classifier in a web application that we call DDP. DDP allows users to test whether a Twitter account is a social capitalist or not, and to visualize the data we use to make the prediction. It allows administrators to crawl data from many users automatically. Furthermore, administrators can manually label Twitter accounts as social capitalists or regular users to add them to the dataset. Finally, administrators can train new classifiers in order to take into account the Twitter accounts newly added to the dataset, thus making the classifier evolve with the recently collected data. The web application is thus a way to collect data, improve the knowledge about social capitalists, and keep detecting them efficiently.

Download

Revealing contact patterns among high-school students using maximal cliques in link streams

Tiphaine Viard, Matthieu Latapy, Clémence Magnien

First International Workshop on Dynamics in Networks (DyNo), in conjunction with ASONAM, 2015.

Interaction traces between humans are usually rich in information concerning the patterns and habits of individuals. Such datasets have recently been made available, and more and more researchers address the new questions raised by this data. A link stream is a sequence of triplets (t, u, v) indicating that an interaction occurred between u and v at time t, and as such is a natural representation of these data. We generalize the classical notion of cliques in graphs to such link streams: for a given Δ, a Δ-clique is a set of nodes and a time interval such that all pairs of nodes in this set interact at least every Δ during this time interval. We proceed to compute the maximal Δ-cliques of a real-world dataset of contacts among students, and show how they can bring new interpretation to patterns of contact.
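The Δ-clique definition above can be checked directly; a minimal sketch, where the handling of the interval borders is an assumption:

```python
from itertools import combinations

def is_delta_clique(links, nodes, t0, t1, delta):
    """Check whether (nodes, [t0, t1]) forms a delta-clique of a link stream.

    links: iterable of (t, u, v) triplets. Following the definition in
    the abstract, every pair of nodes in `nodes` must interact at least
    once in every sub-interval of length delta of [t0, t1], i.e. no gap
    between consecutive interactions (or to the borders) may exceed delta.
    """
    nodes = set(nodes)
    times = {}
    for t, u, v in links:
        if u in nodes and v in nodes and t0 <= t <= t1:
            times.setdefault(frozenset({u, v}), []).append(t)
    for pair in combinations(sorted(nodes), 2):
        ts = sorted(times.get(frozenset(pair), []))
        checkpoints = [t0] + ts + [t1]  # gaps to the borders count too
        if any(b - a > delta for a, b in zip(checkpoints, checkpoints[1:])):
            return False
    return True
```

Enumerating the *maximal* Δ-cliques, as the paper does, requires a dedicated algorithm; this predicate only verifies a given candidate.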

Download

Classes of digraphs defined by forbidding induced subdigraphs and their chromatic number

Pierre Aboulker

Jeudi 09 juillet 2015 à 11h, salle 26-00/332

Slides

A class of graphs is $\chi$-bounded if there exists a function f such that for any graph G in the class, $\chi(G) \le f(\omega(G))$. Gyárfás conjectured that for any tree T, the class of graphs that do not contain T as an induced subgraph is $\chi$-bounded. We investigate the oriented analogue of this conjecture. This is joint work with J. Bang-Jensen, N. Bousquet, P. Charbit, F. Havet, F. Maffray, S. Thomassé and J. Zamora.

Inhomogeneous Hypergraphs

Élie de Panafieu

Jeudi 02 juillet 2015 à 11h, salle 24-25/405

Slides

We introduce the inhomogeneous hypergraph model. Each edge can contain an arbitrary number of vertices, the vertices are colored, and each edge receives a weight which depends on the colors of the vertices it contains. This model provides a uniform setting to solve problems arising from various domains of computer science and mathematics. We will focus on applications to the enumeration of satisfied and satisfiable instances of Constraint Satisfaction Problems (CSP): we compute the limit probability for a random graph to be bipartite and the limit probability of satisfiability of systems of equations, enumerate properly k-colored graphs, and investigate some graphs coming from social networks. We will present results on the asymptotics of inhomogeneous hypergraphs and their typical structure. Our main tool is analytic combinatorics.