Quantifying paedophile queries in a large P2P system

Matthieu Latapy, Clémence Magnien and Raphaël Fournier

IEEE Infocom Mini-Conference 2011, Shanghai

Increasing knowledge of paedophile activity in P2P systems is a crucial societal concern, with important consequences on child protection, policy making, and internet regulation. Because of a lack of traces of P2P exchanges and rigorous analysis methodology, however, current knowledge of this activity remains very limited. We consider here a widely used P2P system, eDonkey, and focus on two key statistics: the fraction of paedophile queries entered in the system and the fraction of users who entered such queries. We collect hundreds of millions of keyword-based queries; we design a paedophile query detection tool for which we establish false positive and false negative rates using assessment by experts; with this tool and these rates, we then estimate the fraction of paedophile queries in our data; finally, we design and apply methods for quantifying users who entered such queries. We conclude that approximately 0.25 % of queries are paedophile, and that more than 0.2 % of users enter such queries. These statistics are by far the most precise and reliable ever obtained in this domain.

Download

Optimisation locale multi-niveaux de la modularité

Thomas Aynaud, Vincent Blondel, Jean-Loup Guillaume and Renaud Lambiotte

in Partitionnement de graphe : optimisation et applications, Traité IC2, Hermes-Lavoisier 2011

Dans ce chapitre, nous présentons une méthode gloutonne pour optimiser la modularité d’un graphe. Cette méthode de partionnement permet de traiter avec une excellente précision des systèmes de taille inégalée, allant jusqu’à plusieurs milliards de liens. Notre algorithme a de surcroît l’avantage de ne pas être limité à l’optimisation de la modularité puisqu’il peut être généralisé à d’autres fonctions de qualité, et de découvrir des communautés à différentes échelles. Les performances de l’algorithme sont évaluées sur des graphes artificiels pour lesquels la structure communautaire est connue, ainsi que sur des graphes de terrain réels.

Long range community detection

Thomas Aynaud and Jean-Loup Guillaume

Latin-American Workshop on Dynamic Networks (LAWDN), Buenos Aires, 2010

Complex networks can usually be divided in dense subnetworks called communities. In evolving networks, the usual way to detect communities is to find several partitions independently, one for each time step. However, this generally causes troubles when trying to track communities from one time step to the next. We propose here a new method to detect only one decomposition in communities that is good for (almost) every time step. We show that this unique partition can be computed with a modification of the Louvain method and that the loss of quality at each time step is generally low despite the constraint of global maximization. We also show that some specific modifications of the networks topology can be identified using this unique partition in the case of the Internet topology.

Download

Détection de communautés à long terme dans les graphes dynamiques

Thomas Aynaud and Jean-Loup Guillaume

Journée thématique Fouille de grands graphes, en conjonction avec la première conférence sur les Modèles et l’Analyse des Réseaux : Approches Mathématiques et Informatique (MARAMI), Toulouse, France, 2010

La plupart des graphes de terrain peuvent être décomposés en sous graphes denses appelés communautés. Habituellement, dans des graphes dynamiques, les communautés sont détectées pour chaque instant indépendamment ce qui pose de nombreux problèmes tels que la stabilité ou le suivi de des communautés entre deux décompositions successives. Nous proposons ici une méthode pour trouver une partition unique, de qualité, couvrant une longue période. Cette décomposition peut être trouvée efficacement via une adaptation de la méthode de Louvain et la perte de qualité à chaque instant due à la contrainte de détecter des communautés globales s’avère assez faible.

Download

Termination of Multipartite Graph Series Arising from Complex Network Modelling

Matthieu Latapy, Thi Ha Duong Phan, Christophe Crespelle, Thanh Qui Nguyen

COCOA 2010

An intense activity is nowadays devoted to the definition of models capturing the properties of complex networks. Among the most promising approaches, it has been proposed to model these graphs via their clique incidence bipartite graphs. However, this approach has, until now, severe limitations resulting from its incapacity to reproduce a key property of this object: the overlapping nature of cliques in complex networks. In order to get rid of these limitations we propose to encode the structure of clique overlaps in a network thanks to a process consisting in iteratively factorising the maximal bicliques between the upper level and the other levels of a multipartite graph. We show that the most natural definition of this factorising process leads to infinite series for some instances. Our main result is to design a restriction of this process that terminates for any arbitrary graph. Moreover, we show that the resulting multipartite graph has remarkable combinatorial properties and is closely related to another fundamental combinatorial object. Finally, we show that, in practice, this multipartite graph is computationally tractable and has a size that makes it suitable for complex network modelling.

Download

Evaluation of a new method for measuring the internet degree distribution: Simulation results

Christophe Crespelle and Fabien Tarissan

in Complex Networks, special issue of Computer Communications, 34-5, pp. 635-648 (2011), DOI: 10.1016/j.comcom.2010.06.006

Many contributions rely on the degree distribution of the Internet topology. However, current knowledge of this property is based on biased and erroneous measurements and is subject to much debate. Recently, a new approach, referred to as the Neighborhood Flooding method, was proposed to avoid issues raised by classical measurements. It aims at measuring the neighborhood of Internet core routers by sending traceroute probes from many monitors distributed in the Internet towards a given target router. In this paper, we investigate the accuracy of this method with simulations. Our results show that Neighborhood Flooding is free from the bias highlighted in the classical approach and is able to observe properly the exact degree of a vast majority of nodes in the core of the network. We show how the quality of the estimation depends on the number of monitors used and we carefully examine the influence of parameters of the simulations on our results. We also point out some limitations of the Neighborhood Flooding method and discuss their impact on the observed distribution.

Download

Static community detection algorithms for evolving networks

Thomas Aynaud and Jean-Loup Guillaume

Proceedings of International Workshop on Dynamic Networks (WDN), in conjunction with WiOpt 2010, pages 508-514

Complex networks can often be divided in dense sub-networks called communities. We study, using a partition edit distance, how three community detection algorithms transform their outputs if the input network is sligthly modified. The instabilities appear to be important and we propose a modification of one algorithm to stabilize it and to allow the tracking of the communities in a dynamic network. This modification has one parameter which is a tradeoff between stability and quality. The resulting algorithm appears to be very effective. We finally use it on a dynamic network of blogs.

Download

Some Insight on Dynamics of Posts and Citations in Different Blog Communities

Abdelhamid Salah Brahim, Bénédicte Le Grand and Matthieu Latapy

IEEE ICC 2010 workshop « SocNets », Cape Town, South Africa, May 2010

This paper explores new approaches and methods to characterize post and citation dynamics in different blog communities. In particular, evolution of post popularity over time is studied, as well as information spreading cascades. This methodology goes beyond traditional approaches by defining classes of dynamic behaviors based on topological features of the post network, and by investigating the impact of topical communities on post popularity dynamics and on information spreading cascades. This methodology has been applied to a corpus of active French blogs monitored during 4 months.

Download

Impact of Sources and Destinations on the Observed Properties of the Internet Topology

Frédéric Ouédraogo, Clémence Magnien

Computer Communications, 34, 670-679, 2011

Maps of the internet topology are generally obtained by measuring the routes from a given set of sources to a given set of destinations (with tools such as traceroute). It has been shown that this approach misses some links and nodes. Worse, in some cases it can induce a bias in the obtained data, i.e. the properties of the obtained maps are significantly different from those of the real topology. In order to reduce this bias, the general approach consists in increasing the number of sources. Some works have studied the relevance of this approach. Most of them have used theoretical results, or simulations on network models. Some papers have used real data obtained from actual measurement procedures to evaluate the importance of the number of sources and destinations, but no work to our knowledge has studied extensively the importance of the choice of sources or destinations. Here, we use real data from internet topology measurements to study this question: by comparing partial measurements to our complete data, we can evaluate the impact of adding sources or destinations on the observed properties. We show that the number of sources and destinations used plays a role in the observed properties, but that their choice, and not only their number, also has a strong influence on the observations. We then study common statistics used to describe the internet topology, and show that they behave differently: some can be trusted once the number of sources and destinations are not too small, while others are difficult to evaluate.

Download

Age, Gender and Communication Networks

Alina Stoica, Zbigniew Smoreda, Christophe Prieur and Jean-Loup Guillaume

Proceedings of the Workshop on the Analysis of Mobile Phone Networks, satellite workshop to NetSci 2010

In this paper, we address some sociological andtopological issues associated with mobile phone communication.Based on a dataset of a few million users, we use customers’ ageand gender information to study relation between theseparameters and the average behavior of users in terms of numberof calls, number of SMS and calls duration. We also study thedataset from a networking point of view: we define differentprofiles based on the topological properties of the personalnetwork of each individual and study the relations between theseprofiles and the age of customers

Download

Structure multi-échelle de grands graphes de terrain

Thomas Aynaud and Jean-Loup Guillaume

in Technique et Science Informatiques, vol. 30/2, pp. 137-154, 2011

Most complex networks can be divided into dense sub-graphs called communities. These communities may also be divided recursively producing a hierarchical structure of communities, summarized in a tree named dendrogram. In this article we analyze this structure extracted from several complex networks. First we study the shape of the tree and, in particular, communities imbrications. Then we show that an excessive decomposition of communities can result in meaningless communities. We propose a couple of approaches to solve this problem.

Download

Detecting Events in the Dynamics of Ego-centered Measurements of the Internet Topology

Assia Hamzaoui, Matthieu Latapy and Clémence Magnien

Proceedings of International Workshop on Dynamic Networks (WDN), in conjunction with WiOpt 2010

Detecting events such as major routing changes or congestions in the dynamics of the internet topology is an important but challenging task. We explore here a top-down approach based on a notion of statistically significant events. It consists in identifying statistics which exhibit a homogeneous distribution with outliers, which correspond to events. We apply this approach to ego-centerd measurements of the internet topology (views obtained from a single monitor) and show that it succeeds in detecting meaningful events. Finally, we give some hints for the interpretation of such events in terms of network events.

Download

Estimating properties in dynamic systems: the case of churn in P2P networks

Lamia Benamara and Clémence Magnien

Proceedings of the third International Workshop on Network Science for Communication Networks (NetSciCom 2010), In conjunction with IEEE Infocom 2010.

In many systems, such as P2P systems, the dynamicity of participating elements, or churn, has a strong impact. As a consequence, many efforts have been made to characterize it, and in particular to capture the session length distribution. However in most cases, estimating it rigorously is difficult. One of the reasons is that, because the observation window is by definition finite, parts of the sessions that begin before the window and/or end after it are missed. This induces a bias. Although it tends to decrease when the observation window length increases, it is difficult to quantify its importance, or how fast it decreases. Here, we introduce a general methodology that allows us to know if the observation window is long enough to characterize a given property. This methodology is not specific to one study case and may be applied to any property in a dynamic system. We apply this methodology to the study of session lengths in a massive measurement of P2P activity in the eDonkey system. We show that the measurement needs to last for at least one week in order to obtain representative results. We also show that our methodology allows us to precisely characterize the shape of the session length distribution.

Download

Rigorous measurement of IP-level neighborhood of Internet core routers

Christophe Crespelle, Matthieu Latapy, Elie Rotenberg

Proceedings of the third International Workshop on Network Science for Communication Networks (NetSciCom 2010), In conjunction with IEEE Infocom 2010.

Many contributions use the degree distribution of IP-level internet topology. However, current knowledge of this property relies on biased and erroneous measurements, and so it is subject to much debate. We introduce here a new approach, dedicated to the core of the internet, which avoids the issues raised by classical measurements. It is based on the measurement of IP-level neighborhood of internet core routers, for which we design and implement a rigorous method. It consists in sending traceroute probes from many monitors distributed in the internet towards a given target router and carefully selecting the relevant information in collected data. Using simulations, we provide strong evidence of the accuracy of our approach. We then conduct real-world measurements illustrating the practical effectiveness of our method. This constitutes a significant step towards reliable knowledge of the IP-level degree distribution of the core of the internet.

Download

Interactive multiscale visualization of huge graphs: application to a network of weblogs

Massoud Seifi, Jean-loup Guillaume, Matthieu Latapy and Bénédicte Le Grand

Proceedings of the 8th Workshop on Visualization and Knowledge Extraction (EGC 2010)

De nombreux réseaux du monde réel peuvent être modélisés par des grands graphes. Réduire la complexité d’un graphe de manière à ce qu’il puisse être facilement interprété par l’oeil humain est une aide précieuse pour comprendre et analyser ce type de données. Nous proposons une méthodologie de visualisation interactive multi-échelle de grands graphes basée sur une classification des sommets qui nous permet de représenter ces graphes de manière lisible et interprétable. Nous appliquons notre méthodologie à un réseau de blogs francophones.

Download

Impact of Random Failures and Attacks on Poisson and Power-Law Random Networks

Clémence Magnien, Matthieu Latapy and Jean-Loup Guillaume

ACM Computing Surveys, 43 (3), 2011. Extended abstract published in LNCS, proceedings of the 8-th International Conference On Principles Of Distributed Systems OPODIS’04, 2004, Grenoble, France (title: Comparison of Failures and Attacks on Random and Scale-free Networks)

It appeared recently that the underlying degree distribution of a network may play a crucial role concerning its robustness. Empiric and analytic results have been obtained, based on asymptotic and mean-field approximations. Previous work insisted on the fact that power-law degree distributions induce a high resilience to random failure but a high sensitivity to some attack strategies, while Poisson degree distributions are quite sensitive in both cases. Then, much work has been done to extend these results. We focus here on these basic results, with the aim of deepening significantly our understanding of their origin and their limitations. We review in detail previous contributions and we give full proofs in a unified framework, in which the approximations on which these results rely are well identified. We then add to these known results a set of new ones aimed at enlightening some important aspects. We also provide extensive and rigorous experiments which make it possible to evaluate the relevance of the analytic results. We reach the conclusion that, even if it is clear that the basic results of the field are true and important, they are in practice much less striking than generally thought. The differences between random failures and attacks are not so huge and can be explained with simple facts. Likewise, the differences in the behaviors induced by power-law and Poisson distributions are not as striking as often claimed.

Download

Mobile IPv6 Deployments: Graph-based Analysis and practical Guidelines

Guillaume Valadon, Clémence Magnien and Ryuji Wakikawa

Computer Communications, 32, 2009

The Mobile IPv6 protocol is a major solution to supply mobility services on the Internet. Many networking vendors have already implemented it in their operating systems and equipments. Moreover, it was recently selected to provide permanent IP addresses to end users of WiMAX and 3GPP2. Mobile IPv6 relies on a specific router called the home agent that hides location changes of the mobile nodes from the rest of the Internet. To do so, the mobile nodes’ traffic must flow through the home agent. This mandatory deviation produces longer paths and higher communication delays. In order to solve these problems, we describe a new approach to address deployments of Mobile IPv6 based on graph theory and could be applied to any operator’s network. In particular, we use notions of centrality in graphs to quantify increases of communication distances induced by dogleg routing and identify relevant home agents locations. We evaluate this approach using real-world network topologies and show that the obtained Mobile IPv6 performance could be close to direct paths ones. The proposed algorithm is generic and can be used to achieve efficient deployments of Mobile IPv6 as well as Home Agent Migration: a new Mobile IPv6 architecture using several distributed home agents.

Download

Efficient Measurement of Complex Networks Using Link Queries

Fabien Tarissan, Matthieu Latapy and Christophe Prieur

In Proceedings of the IEEE International Workshop on Network Science For Communication Networks (NetSciCom’09)

Complex networks are at the core of an intense research activity. However, in most cases, intricate and costly measurement procedures are needed to explore their structure. In some cases, these measurements rely on link queries: given two nodes, it is possible to test the existence of a link between them. These tests may be costly, and thus minimizing their number while maximizing the number of discovered links is a key issue. This is a challenging task, though, as initially no information is known on the network. This paper studies this problem: we observe that properties classically observed on real-world complex networks give hints for their efficient measurement; we derive simple principles and several measurement strategies based on this, and experimentally evaluate their efficiency on real-world cases. In order to do so, we introduce methods to evaluate the efficiency of strategies. We also explore the bias that different measurement strategies may induce.

Download

Ten weeks in the life of an eDonkey server

Frédéric Aidouni, Matthieu Latapy and Clémence Magnien

Sixth International Workshop on Hot Topics in Peer-to-Peer Systems (Hot-P2P 2009), May 29, 2009, Rome, Italy

This paper presents a capture of the queries managed by an eDonkey server during almost 10 weeks, leading to the observation of almost 9 billion messages involving almost 90 million users and more than 275 million distinct files. Acquisition and management of such data raises several challenges, which we discuss as well as the solutions we developed. We obtain a very rich dataset, orders of magnitude larger than previously avalaible ones, which we provide for public use. We finally present basic analysis of the obtained data, which already gives evidence of non-trivial features.

Download

Fast dynamics in Internet topology: preliminary observations and explanations

Clémence Magnien, Frédéric Ouedraogo, Guillaume Valadon, Matthieu Latapy

Fourth International Conference on Internet Monitoring and Protection (ICIMP 2009), May 24-28, 2009, Venice, Italy

By focusing on what can be observed by running traceroute-like measurements at a high frequency from a single monitor to a fixed destination set, we show that the observed view of the topology is constantly evolving at a pace much higher than expected. Repeated measurements discover new IP addresses at a constant rate, for long period of times (up to several months). In order to provide explanations, we study this phenomenon both at the IP, and at the Autonomous System levels. We show that this renewal of IP addresses is partially caused by a BGP routing dynamics, altering paths between existing ASes. Furthermore, we conjecture that an intra AS routing dynamics is another cause of this phenomenon.

Download