Diffusion Cascades: Spreading Phenomena in Blog Network Communities

Abdelhamid Salah Brahim, Bénédicte Le Grand and Matthieu Latapy

Parallel Processing Letters 22(1): (2012)

A diffusion cascade occurs when information spreads from one node to the rest of the network through a succession of diffusion events. So far diffusion phenomena have been mostly considered at a macroscopic scale i.e. by studying all nodes of the network. We give a complementary way to analyse network interactions by considering the problem at different scales. To that purpose, we use the community structure of the network to characterize diffusion between nodes (and between communities) and to identify interactions behaviour patterns.

Download

Modèles de graphes aléatoires pour l’analyse de réseaux

Pierre Latouche

Jeudi 14 Juin 2012 à 11h, salle 26-00/101

Slides

Les réseaux sont largement utilisés en sciences sociales afin de décrire les intéractions entre individus. Dans ce contexte, de nombreuses méthodes non-supervisées de clustering ont été développées afin d’extraire des informations, à partir de la topologie des réseaux. La plupart d’entre elles partitionne les noeuds dans des classes disjointes, en fonction de leurs profils de connection. Récemment, des études ont mis en évidence les limites de ces techniques. En effet, elles ont montré qu’un grand nombre de réseaux « réels » contenaient des noeuds connus pour appartenir à plusieurs groupes simultanément. Pour répondre à ce problème, nous proposons le modèle à blocs stochastiques chevauchants, Overlapping Stochastic Block Model (OSBM) en anglais. Cette approche autorise les noeuds à appartenir à plus d’une classe et généralise le très connu Stochastic Block Model, sous certaines hypothèses. Nous proposons un algorithme d’inférence permettant de classer les nouds d’un réseau, ainsi qu’un critère de sélection de modèles pour estimer le nombre de classes. Nous utilisons ces travaux pour analyser la blogosphère politique française.

Deciding on the type of the degree distribution of a graph from traceroute-like measurements

Xiaomin Wang, Matthieu Latapy, Michèle Soria

International Journal of Computer Networks & Communications (IJCNC), May 2012, Volume 4. Number 3

The degree distribution of the Internet topology is considered as one of its main properties. However, it is only known through a measurement procedure which gives a biased estimate. This measurement may in first approximation be modeled by a BFS (Breadth-First Search) tree. We explore here our ability to infer the type (Poisson or power-law) of the degree distribution from such a limited knowledge. We design procedures which estimate the degree distribution of a graph from a BFS of it, and show experimentally (on models and real-world data) that this approach succeeds in making the difference between Poisson and power-law degree distributions.

Download

Relevance of SIR Model for Real-world Spreading Phenomena: Experiments on a Large-scale P2P System

Daniel F. Bernardes, Matthieu Latapy, Fabien Tarissan

Proceedings of the 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012), Istanbul, Turkey

Understanding the spread of information on complex networks is a key issue from a theoretical and applied perspective. Despite the effort in developing theoretical models for this phenomenon, gauging them with large-scale real-world data remains an important challenge due to the scarcity of open, extensive and detailed data. In this paper, we explain how traces of peer-to-peer file sharing may be used to this goal. We also perform simulations to assess the relevance of the standard SIR model to mimic key properties of spreading cascade. We examine the impact of the network topology on observed properties and finally turn to the evaluation of two heterogeneous versions of the SIR model. We conclude that all the models tested failed to reproduce key properties of such cascades: typically real spreading cascades are relatively “elongated” compared to simulated ones. We have also observed some interesting similarities common to all SIR models tested.

Download

Outskewer: Using Skewness to Spot Outliers in Samples and Time Series

Sébastien Heymann, Matthieu Latapy, Clémence Magnien

Proceedings of the 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012), Istanbul, Turkey

Finding outliers in datasets is a classical problem of high interest for (dynamic) social network analysis. However, most methods rely on assumptions which are rarely met in practice, such as prior knowledge of some outliers or about normal behavior. We propose here Outskewer, a new approach based on the notion of skewness (a measure of the symmetry of a distribution) and its evolution when extremal values are removed one by one. Our method is easy to set up, it requires no prior knowledge on the system, and it may be used on-line. We illustrate its performance on two data sets representative of many use-cases: evolution of ego-centered views of the internet topology, and logs of queries entered into a search engine.

Download

Complex Networks approach to Mutualistic Ecosystems

Laura Hernandez

Jeudi 24 mai 2012 à 11h, salle 25-26/101

Mutualistic ecosystems are usually groups of animals and plants, helping each other to fulfil essential biological functions such as feeding or reproduction as in seed dispersal or pollination networks. Such systems may be described in terms of a complex network, where the nodes represent the animal or plant species and the links represent the existence of a contact between a plant and an animal species. As only contacts between nodes belonging to different guilds are allowed, the corresponding network is bipartite. Coding this information in a bipartite adjacency matrix, it is observed that real ecosystems are not a random collection of interacting species, but they display instead, a high degree of internal organization. Different hypothesis are discussed in the ecological literature to explain this particular order. It is fairly obvious that a detailed explanation of the interaction behaviour of individual species can be of little help to understand the generalized pattern that is found across ecological systems of very different sizes and types, that involve plants of different nature and animals that range from insects to birds. The tools commonly used by ecologists to study these systems are based on the statistical analysis of observed data. In this talk I will present an alternative way to study this problem, by introducing an algorithm that allows us to try different supposed hypothesis in the form of a Contact Preference Rule (CPR) that governs the dynamics of the system. Starting from a random configuration the system is evolved under the studied CPR and the comparison of the order state reached by this artificial system with the order observed in real systems allows us to decide whether a CPR may be considered or not as responsible for the observed order. In particular, I will introduce a new way to measure the order of mutualistic ecosystems and I will discuss about the relationship between the phylogenetic proximity of the members of each guild and the observed order.

How to detect causality effects on large dynamical communication networks: A case study

Tabourier, L. and Stoica, A. and Peruani, F.

Communication Systems and Networks (COMSNETS), 2012 Fourth International Conference on

Here we propose a set of dynamical measures to detect causality effects on communication datasets. Using appropriate comparison models, we are able to enumerate patterns containing causality relationships. This approach is illustrated on a large cellphone call dataset: we show that specific patterns such as short chain-like trees and directed loops are more frequent in real networks than in comparison models at short time scales. We argue that these patterns – which involve a node and its close neighborhood – constitute indirect evidence of active spreading of information only at a local level. This suggests that mobile phone networks are used almost exclusively to communicate information to a closed group of individuals. Furthermore, our study reveals that the bursty activity of the callers promotes larger patterns at small time scales.

Download

Classifying Relationships in Social Networks

Aline Carneiro Viana

Lundi 14 mai 2012 à 11h, salle 25-26/101

Slides

The constant advancement of information systems has allowed more data to be generated and stored from the most diverse situations. It is fascinating that, behind these records, we see the reflection of the environment itself, since every record represents a decision made by some entity. In this work, we modeled real-world scenarios of mobility from using temporal complex networks. The analysis assumes that these systems are composed of entities able to interact in a rational manner, reflecting their interests and activity dynamic. In this direction, we propose a technique for analyzing mobility scenarios from random graphs. This technique examines how the real system would evolve if the agents decisions were random, and from there, you can check, for example, which edges are random and which are derived from social relationships, such as friendship or professional.

Stable community cores in complex networks

Massoud Seifi, Jean-Loup Guillaume, Ivan Junier, Jean-Baptiste Rouquier and Svilen Iskrov

Proceedings of the 3rd Workshop on Complex Networks (CompleNet 2012), Melbourne, Florida

Complex networks are generally composed of dense sub-networks called communities. Many algorithms have been proposed to automatically detect such communities. However, they are often unstable and behave non-deterministically. We propose here to use this non-determinism in order to compute groups of nodes on which community detection algorithms agree most of the time.We show that these groups of nodes, called community cores, are more similar to Ground Truth than communities in real and artificial networks. Furthermore, we show that in contrary to the classical approaches, we can reveal the absence of community structure in random graphs.

Download

Community Cores in Evolving Networks

Massoud Seifi and Jean-Loup Guillaume

Proceedings of the Mining Social Network Dynamic 2012 Workshop (MSND), Inconjunction with the international conference World Wide Web WWW 2012, Lyon,France, pp. 1173-1180

Community structure is a key property of complex networks.Many algorithms have been proposed to automatically detect communities in static networks but few studies haveconsidered the detection and tracking of communities in anevolving network. Tracking the evolution of a given community over time requires a clustering algorithm that producesstable clusters. However, most community detection algorithms are very unstable and therefore unusable for evolvingnetworks. In this paper, we apply the methodology proposedin [14] to detect what we call community cores in evolvingnetworks. We show that cores are much more stable than »classical » communities and that we can overcome the disadvantages of the stabilized methods.

Download

Intrinsically dynamic communities from evolving, directed network data

Bivas Mitra, Lionel Tabourier and Camille Roth

Computer Networks, Vol. 56(3), 2012.

Community finding algorithms for networks have recently been extended to dynamic data. Most of these recent methods aim at exhibiting community partitions from successive graph snapshots and thereafter connecting or smoothing these partitions using clever time-dependent features and sampling techniques. These approaches are nonetheless achieving longitudinal rather than dynamic community detection. We assume that commu- nities are fundamentally defined by the repetition of interactions among a set of nodes over time. According to this definition, analyzing the data by considering successive snapshots induces a significant loss of information: we suggest that it blurs essentially dynamic phe- nomena—such as communities based on repeated inter-temporal interactions, nodes switching from a community to another across time, or the possibility that a community survives while its members are being integrally replaced over a longer time period. We propose a formalism which aims at tackling this issue in the context of time-directed data- sets (such as citation networks), and present several illustrations on both empirical and synthetic dynamic networks. We eventually introduce intrinsically dynamic metrics to qualify temporal community structure and emphasize their possible role as an estimator of the quality of the community detection—taking into account the fact that various empir- ical contexts may call for distinct ‘community’ definitions and detection criteria.

Download

Impact of clustering on epidemics in random networks

Emilie Coupechoux

Lundi 2 avril 2012 à 14h, salle 55-65/211

Slides

Motivated by the analysis of social networks, we study a model of network that has both a given degree distribution and a tunable clustering coefficient. We analyze two types of epidemic processes on this random graph model: a diffusion process, which is characterized by an infection probability, each neighbor transmitting the epidemic independently, and a contagion model, which is inspired by a simple coordination game played on the network. Both types of processes have been used to model spread of new ideas, technologies, viruses or worms and results have been obtained for random graphs with no clustering. In this talk, we are interested in the impact of clustering on the growth processes. In both cases, we characterize conditions under which a global cascade is possible, and compute the cascade size explicitly, as a function of the degree distribution and the clustering coefficient. While clustering inhibits the diffusion process (in power-law and regular graphs), its impact for the contagion process is more subtle and depends on the connectivity of the graph: in a low connectivity regime, clustering also inhibits the contagion, while in a high connectivity regime, clustering favors the appearance of global cascades but reduces their size.

File Diffusion in a Dynamic Peer-to-peer Network

Alice Albano, Jean-Loup Guillaume, and Bénédicte Le Grand

Proceedings of the  Mining Social Network Dynamic 2012 Workshop (MSND), In conjunction with the international conference World Wide Web WWW 2012, Lyon, France

Many studies have been made on diffusion in the field of epidemiology, and in the last few years, the development of social networking has induced new types of diffusion. In this paper, we focus on file diffusion on a peer-to-peer dynamic network using eDonkey protocol. On this network, we observe a linear behavior of the actual file diffusion. This result is interesting, because most diffusion models exhibit exponential behaviors. In this paper, we propose a new model of diffusion, based on the SI (Susceptible / Infected) model, which produces results close to the linear behavior of the observed diffusion. We then justify the linearity of this model, and we study its behavior in more details.

Download

Quantifying Paedophile Activity in a Large P2P System

Matthieu Latapy, Clémence Magnien et Raphaël Fournier

in Information Processing and Management, Volume 49, Issue 1, January 2013, Pages 248–263

Increasing knowledge of paedophile activity in P2P systems is a crucial societal concern, with important consequences on child protection, policy making, and internet regulation. Because of a lack of traces of P2P exchanges and rigorous analysis methodology, however, current knowledge of this activity remains very limited. We consider here a widely used P2P system, eDonkey, and focus on two key statistics: the fraction of paedophile queries entered in the system and the fraction of users who entered such queries. We collect hundreds of millions of keyword-based queries; we design a paedophile query detection tool for which we establish false positive and false negative rates using assessment by experts; with this tool and these rates, we then estimate the fraction of paedophile queries in our data; finally, we design and apply methods for quantifying users who entered such queries. We conclude that approximately 0.25% of queries are paedophile, and that more than 0.2% of users enter such queries. These statistics are by far the most precise and reliable ever obtained in this domain.

Download

Dynamics on and of subway networks

Camille Roth

Vendredi 2 mars 2012 à 14h, salle 25/26-101

Slides

Subway networks shape, to some extent, the structure of movements of individuals across a city; similarly, they are being partially shaped by the presence of these individuals in the city. This talk will present two complementary studies describing the dynamic processes which subway networks both host and undergo. The first analysis focuses on dynamics processes occurring on the subway network of a large city (London) in terms of its commuting patterns. It uses the large scale, real-time electronic ticketing data from the Oyster Card system, introduced less than a decade ago, to reveal a part of the structure and organization of the city. More precisely, this study shows that patterns of intraurban movement are strongly heterogeneous in terms of volume, but not in terms of distance travelled, and that there is a polycentric structure composed of large flows organized around a limited number of activity centers. For smaller flows, the pattern of connections becomes richer and more complex and is not strictly hierarchical since it mixes different levels consisting of different orders of magnitude. The second study investigates the temporal evolution of the major subway networks in the world over the last century. The main result is that most of these networks tend to converge to a shape which shares some generic features, despite their geographical and economical differences. These features include a core with branches radiating from it to cover about twice the average radial extension of the core. The core generally includes about 60% of the network stations and exhibits an average degree of order 2.5. Interestingly, core and branches define two distinct and universal regimes in terms of the number of stations at a given distance from the barycenter. This result which was difficult to interpret in the framework of fractal geometry finds here a natural explanation. More broadly, these two types of studies open the way to more integrated analyses of the coevolution between the dynamics on and of subway networks.

Local community identification in social networks

Blaise Ngonmang

Jeudi 22 mars 2012 à 11h, salle 25-26/101

Slides

In social networks, the detection of communities has gained considerable interest because it can be used for instance for visualization, recommendation in business applications or the analysis of the spread of infectious diseases. Many methods proposed in the litera- ture for the solution of this problem, assume that the structure of the entire network is known, which is not realistic for very large and dynamic networks. For this reason, approaches have been introduced recently to find the local community of a node. Most of these methods often fail when the starting node is at the boundary of a community. In addition, they are not able to detect overlapping communities. In this work, we propose new methods to find local communities that don’t have these drawbacks. Experiences on real and computer generated social networks such as Netscience, Amazon 2006 and Lan- cichinetti et al.’s benchmark show that these methods perform better than the solutions with which the comparisons were performed.

Impact of power-law topology on IP-level routing dynamics: simulation results

Amélie Medem, Clémence Magnien and Fabien Tarissan

Fourth International Workshop on Network Science for Communication Networks (NetSciCom), 2012

This paper focuses on the Internet IP-level routing topology and proposes relevant explanations to its apparent dynamics.We first represent this topology as a power-law random graph. Then, we incorporate to the graph two well known factors responsible for the observed dynamics, which are load balancing and route evolution. Finally, we simulate on the graph traceroute-like measurements. Repeating the process many times, we obtain several graph instances that we use to model the dynamics. Our results show that we are able to capture on power-law graphs the dynamic behaviors observed on the Internet. We find that the results on power-law graphs, while qualitatively similar to the one of  Erdös-Rényi  random graphs, highly differ quantitatively; for instance, the rate of discovery of new nodes in power-law graphs is extremely low compared to the rate in Erdös-Rényi graphs.

Download

Internal links and pairs as a new tool for the analysis of bipartite complex networks

Oussama Allali, Lionel Tabourier, Clémence Magnien, Matthieu Latapy

Social Network Analysis and Mining, March 2013, Volume 3, Issue 1, pp 85-91.

Many real-world complex networks are best modeled as bipartite (or 2-mode) graphs, where nodes are divided into two sets with links connecting one side to the other. However, there is currently a lack of methods to analyze properly such graphs as most existing measures and methods are suited to classical graphs. A usual but limited approach consists in deriving 1-mode graphs (called projections) from the underlying bipartite structure, though it causes important loss of information and data storage issues. We introduce here internal links and pairs as a new notion useful for a bipartite analysis, and which gives insights on the information lost by projecting the bipartite graph. We illustrate the relevance of theses concepts on several real-world instances illustrating how it enables to discriminate behaviors among various cases when we compare them to a benchmark of random graphs. Then, we show that we can draw benefit from this concept for both modeling complex networks and storing them in a compact format.

Download

On Pagerank, teleportation and modelling dynamics in complex networks

Renaud Lambiotte

Jeudi 16 février 2012 à 11h – salle 55-65/211

Slides

In this talk, I will present recent results from 2 recent papers. i) Random teleportation is a necessary evil for ranking and clustering directed networks based on random walks. Teleportation enables ergodic solutions, but the solutions must necessarily depend on the exact implementation and parametrization of the teleportation. For example, in the commonly used PageRank algorithm, the teleportation rate must trade off a heavily biased solution with a uniform solution. Here we show that teleportation to links rather than nodes enables a much smoother trade-off and effectively more robust results. We also show that, by not recording the teleportation steps of the random walker, we can further reduce the effect of teleportation with dramatic effects on clustering. ii) The traditional way of studying temporal networks is to aggregate the dynamics of the edges to create a static weighted network. This implicitly assumes that the edges are governed by Poisson processes, which is not typically the case in empirical temporal networks. Consequently, we examine the effects of non-Poisson inter-event statistics on the dynamics of edges, and we apply the concept of a generalized master equation to the study of continuous-time random walks on networks. We show that the equation reduces to the standard rate equations when the underlying process is Poisson and that the stationary solution is determined by an effective transition matrix whose leading eigenvector is easy to calculate. We discuss the implications of our work for dynamical processes on temporal networks and for the construction of network diagnostics that take into account their nontrivial stochastic nature.

Dynamics on networks for communities, centralities and consensus

Jean-Charles Delvenne

Lundi 6 février 2012 à 11h – salle 25-26/105

Dynamical systems taking place on networks, such as opinion dynamics, synchronization, consensus or random walks, reveal a lot about their structure. In particular we show, through a dynamical reinterpretation of well-known concepts, how centrality measures (such as pagerank, eigencentrality, etc.) and community detection quality functions (such as modularity, Potts, model, stability, etc.) are intimately related. The dynamical interpretation allows to design new centrality or community detection measures tailored for every particular application.