Measurement of eDonkey Activity with Distributed Honeypots

Oussama Allali, Matthieu Latapy and Clémence Magnien

Sixth International Workshop on Hot Topics in Peer-to-Peer Systems (Hot-P2P 2009), May 29, 2009, Rome, Italy

Collecting information about user activity in peer-to-peer systems is a key but challenging task. We describe here a distributed platform for doing so on the eDonkey network, relying on a group of honeypot peers which claim to have certain files and log queries they receive for these files. We then conduct some measurements with typical scenarios and use the obtained data to analyze the impact of key parameters like measurement duration, number of honeypots involved, and number of advertised files. This illustrates both the possible uses of our measurement system, and the kind of data one may collect using it.

Download

Empreintes conceptuelles et spatiales pour la caractérisation des réseaux sociaux

Bénédicte Le Grand, Marie-Aude Aufaure et Michel Soto

Conférence EGC 2009 (Extraction et Gestion des Connaissances), Strasbourg, France, 27-30 janvier 2009

In this paper, Formal Concept Analysis and Galois lattices are used for the analysis of complex datasets, online social networks in particular. Lattice-inspired statistics computed on the objects of the lattice provide their « conceptual distribution ». An experimentation conducted on four social networks’ samples shows how these statistics may be used to characterize these networks and filter them automatically.

Download

Main-memory Triangle Computations for Very Large (Sparse (Power-Law)) Graphs

Matthieu Latapy

Theoretical Computer Science (TCS) 407 (1-3), pages 458-473, 2008

Finding, counting and/or listing triangles (three vertices with three edges) in massive graphs are natural fundamental problems, which received recently much attention because of their importance in complex network analysis. We provide here a detailed survey of proposed main-memory solutions to these problems, in an unified way. We note that previous authors paid surprisingly little attention to space complexity of main-memory solutions, despite its both fundamental and practical interest. We therefore detail space complexities of known algorithms and discuss their implications. We also present new algorithms which are time optimal for triangle listing and beats previous algorithms concerning space needs. They have the additional advantage of performing better on power-law graphs, which we also detail. We finally show with an experimental study that these two algorithms perform very well in practice, allowing to handle cases which were previously out of reach.

Download

A Radar for the Internet

Matthieu Latapy, Clémence Magnien and Frédéric Ouédraogo

Proceedings of ADN’08: 1st International Workshop on Analysis of Dynamic Networks, in conjunction with IEEE ICDM 2008

In contrast with most internet topology measurement research, our concern here is not to obtain a map as complete and precise as possible of the whole internet. Instead, we claim that each machine’s view of this topology, which we call ego-centered view, is an object worth of study in itself. We design and implement an ego-centered measurement tool, and perform radar-like measurements consisting of repeated measurements of such views of the internet topology. We conduct long-term (several weeks) and high-speed (one round every few minutes) measurements of this kind from more than one hundred monitors, and we provide the obtained data. We also show that these data may be used to detect events in the dynamics of internet topology.

Download

Fast Computation of Empirically Tight Bounds for the Diameter of Massive Graphs

Clémence Magnien, Matthieu Latapy and Michel Habib

ACM Journal of Experimental Algorithmics (JEA), 13, 2009

The diameter of a graph is among its most basic parameters. Since a few years, it moreover became a key issue to compute it for massive graphs in the context of complex network analysis. However, known algorithms, including the ones producing approximate values, have too high a time and/or space complexity to be used in such cases. We propose here a new approach relying on very simple and fast algorithms that compute (upper and lower) bounds for the diameter. We show empirically that, on various real-world cases representative of complex networks studied in the literature, the obtained bounds are very tight (and even equal in some cases). This leads to rigorous and very accurate estimations of the actual diameter in cases which were previously untractable in practice.

Download

Fast unfolding of communities in large networks

Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre

J. Stat. Mech. (october 2008) P10008

We propose a simple method to extract the community structure of large networks. Our method is a heuristic method that is based on modularity optimization. It is shown to outperform all other known community detection method in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2.6 million customers and by analyzing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad-hoc modular networks

Download

Description and simulation of dynamic mobility networks

Pierre Borgnat, Eric Fleury, Jean-Loup Guillaume, Céline Robardet and Antoine Scherrer

Computer Networks 52 (2008), pp. 2842-2858

During the last decade, the study of large scale complex networks has attracted a substantial amount of attention and works from several domains: sociology, biology, computer science, epidemiology. Most of such complex networks are inherently dynamic, with new vertices and links appearing while some old ones disappear. Until recently, the dynamic of these networks was less studied and there is a strong need for dynamic network models in order to sustain protocol performance evaluations and fundamental analyzes in all the research domains listed above. We propose in this paper a novel framework for the study of dynamic mobility networks. We address the characterization of dynamics by proposing an in-depth description and analysis of two real-world data sets. We show in particular that links creation and deletion processes are independent of other graph properties and that such networks exhibit a large number of possible configurations, from sparse to dense. From those observations, we propose simple yet very accurate models that allow to generate random mobility graphs with similar temporal behavior as the one observed in experimental data.

Download

Evolving networks

Pierre Borgnat, Eric Fleury, Jean-Loup Guillaume, Clémence Magnien, Céline Robardet et Antoine Scherrer

Proceedings of NATO Advanced Study Institute on Mining Massive Data Sets for Security, IOS Press, 2008

Most real networks often evolve through time: changes of topology can occur if some nodes and/or edges appear and/or disappear, and the types or weights of nodes and edges can also change even if the topology stays static. Mobile devices with wireless capabilities (mobile phones, laptops, etc.) are a typical example of evolving networks where nodes or users are spread in the environment and connections between users can only occur if they are near each other. This whois- near-whom network evolves every time users move and communication services (such as the spread of any information) will deeply rely on the mobility and on the characteristics of the underlying network. This paper presents some recent results concerning the characterization of the dynamics of complex networks through three different angles: evolution of some parameters on snapshots of the network, parameters describing the evolution itself, and intermediate approaches consisting in the study of specific phenomena or users of interest through time.

Download

Basic Notions for the Analysis of Large Two-mode Networks

Matthieu Latapy, Clémence Magnien and Nathalie Del Vecchio

Social Networks (2008), vol. 30, no1, pp. 31-48

Many large real-world networks actually have a 2-mode nature: their nodes may be separated into two classes, the links being between nodes of different classes only. Despite this, and despite the fact that many ad-hoc tools have been designed for the study of special cases, very few exist to analyse (describe, extract relevant information) such networks in a systematic way. We propose here an extension of the most basic notions used nowadays to analyse large 1-mode networks (the classical case) to the 2-mode case. To achieve this, we introduce a set of simple statistics, which we discuss by comparing their values on a representative set of real-world networks and on their random versions. This makes it possible to evaluate their relevance in capturing properties of interest in 2-mode networks.

Download

Detection, Understanding, and Prevention of Traceroute Measurement Artifacts

Fabien Viger, Brice Augustin, Xavier Cuvellier, Clémence Magnien, Matthieu Latapy, Timur Friedman, and Renata Teixeira

Computer Networks 52-5 (2008), pp. 998-1018. Extended abstract published in the proceedings of the 6-th Internet Measurement Conference IMC’06, 2006, Rio de Janeiro, Brazil

Test of time award IMC 2022

Traceroute is widely used: from the diagnosis of network problems to the assemblage of internet maps. Unfortunately, there are a number of problems with traceroute methodology, which lead to the inference of erroneous routes. This paper studies particular structures arising in nearly all traceroute measurements. We characterize them as « loops », « cycles », and « diamonds ». We identify load balancing as a possible cause for the appearance of false loops, cycles and diamonds, i.e., artifacts that do not represent the internet topology. We provide a new publicly-available traceroute, called Paris traceroute, which, by controlling the packet header contents, provides a truer picture of the actual routes that packets follow. We performed measurements, from the perspective of a single source tracing towards multiple destinations, and Paris traceroute allowed us to show that many of the particular structures we observe are indeed traceroute measurement artifacts.

Complex Network Measurements: Estimating the Relevance of Observed Properties

Matthieu Latapy and Clémence Magnien

Infocom’08 Proceedings, Phoenix, USA

Complex networks, modeled as large graphs, received much attention during these last years. However, data on such networks is only available through intricate measurement procedures. Until recently, most studies assumed that these procedures eventually lead to samples large enough to be representative of the whole, at least concerning some key properties. This has crucial impact on network modeling and simulation, which rely on these properties. Recent contributions proved that this approach may be misleading, but no solution has been proposed. We provide here the first practical way to distinguish between cases where it is indeed misleading, and cases where the observed properties may be trusted. It consists in studying how the properties of interest evolve when the sample grows, and in particular whether they reach a steady state or not. In order to illustrate this method and to demonstrate its relevance, we apply it to data-sets on complex network measurements that are representative of the ones commonly used. The obtained results show that the method fulfills its goals very well. We moreover identify some properties which seem easier to evaluate in practice, thus opening interesting perspectives.

Download

Local leaders in random networks

Vincent D. Blondel, Jean-Loup Guillaume, Julien M. Hendrickx, Cristobald de Kerchove and Renaud Lambiotte

Phys. Rev. E 77, 036114 (2008)

We consider local leaders in random uncorrelated networks, i.e., nodes whose degree is higher than or equal to the degree of all their neighbors. An analytical expression is found for the probability for a node of degree k to be a local leader. This quantity is shown to exhibit a transition from a situation where high-degree nodes are local leaders to a situation where they are not, when the tail of the degree distribution behaves like the power law ~k^gamma_c with gamma_c=3. Theoretical results are verified by computer simulations, and the importance of finite-size effects is discussed.

Download

Distance distribution in random graphs and application to network exploration

Vincent D. Blondel, Jean-Loup Guillaume, Julien M. Hendrickx, and Raphaël M. Jungers

Phys. Rev. E 76, 066101 (2007)

We consider the problem of determining the proportion of edges that are discovered in an ErdsRényigraph when one constructs all shortest paths from a given source node to all other nodes. This problem is equivalent to the one of determining the proportion of edges connecting nodes that are at identical distance from the source node. The evolution of this quantity with the probability of existence of the edges exhibits intriguing oscillatory behavior. In order to perform our analysis, we introduce a different way of computing the distribution of distances between nodes. Our method outperforms previous similar analyses and leads to estimates that coincide remarkably well with numerical simulations. It allows us to characterize the phase transitions appearing when the connectivity probability varies.

Download

Multi-level analysis of an interaction network between individuals in a mailing-list

Rémi Dorat, Matthieu Latapy, Bernard Conein and Nicolas Auray

Annals of Telecommunications (2007), vol. 62, no3-4, pp. 325-349

It is well known now that most real-world complex networks have some properties which make them very different from random networks. In the case of interactions between authors of messages in a mailing-list, however, a multi-level structure may be responsible for some of these properties. We propose here a rigorous but simple formalism to investigate this question, and we apply it to an archive of the Debian user mailing-list. This leads to the identification of some properties which may indeed be explained this way, and of some properties which need deeper analysis.

Download

Describing and simulating routes on the Internet

Jérémie Leguay, Matthieu Latapy, Timur Friedman, Kavé Salamatian

Computer Networks 51, pages 2067-2087, 2007. Extended abstract published in LNCS, proceedings of the 4-th IFIP international conference on Networking, 2005, Waterloo, Canada

This contribution deals with actual routes followed by packets on the internet at IP level. We first propose a set of statistical properties to analyse such routes, which brings detailed information on them. We then use the obtained results to suggest and evaluate methods for generating artificial routes suitable for simulation purposes. This also makes it possible to evaluate various network models. This work is based on large data sets provided mainly by CAIDA’s skitter infrastructure.

Download

Bipartite graphs as Models of Complex Networks

Jean-Loup Guillaume and Matthieu Latapy

Physica A 371, pages 795-813, 2006. Extended abstract published in LNCS, proceedings of the 1-st international workshop on Combinatorial and Algorithmic Aspects of Networking CAAN’04, 2004, Banff, Canada

It appeared recently that the classical random graph model used to represent real-world complex networks does not capture their main properties. Since then, various attempts have been made to provide accurate models. We study here the first model which achieves the following challenges: it produces graphs which have the three main wanted properties (clustering, degree distribution, average distance), it is based on some real-world observations, and it is sufficiently simple to make it possible to prove its main properties. This model consists in sampling a random bipartite graph with prescribed degree distribution. Indeed, we show that any complex network can be viewed as a bipartite graph with some specific characteristics, and that its main properties can be viewed as consequences of this underlying structure. We also propose a growing model based on this observation.

Download

Dynamics of three-state excitable units on Poisson vs power-law random networks

Anne-Ruxandra Carvunis, Matthieu Latapy, Annick Lesne, Clémence Magnien and Laurent Pezard

Physica A 367, pages 585-612, 2006

The influence of the network topology on the dynamics of systems of coupled excitable units is studied numerically and demonstrates a lower dynamical variability for power-law networks than for Poisson ones. This effect which reflects a robust collective excitable behavior is however lower than that observed for diffusion processes or network robustness. Instead, the presence (and number) of triangles and larger loops in the networks appears as a parameter with strong influence on the considered dynamics.

Download

Computing communities in large networks using random walks

Pascal Pons and Matthieu Latapy

Journal of Graph Algorithms and Applications (JGAA) vol. 10, no. 2, pages 191-218, 2006. Extended abstract published in LNCS, proceedings of the 20-th International Symposium on Computer and Information Sciences ISCIS’05, 2005, Istambul, Turquie

Dense subgraphs of sparse graphs (communities), which appear in most real-world complex networks, play an important role in many contexts. Computing them however is generally expensive. We propose here a measure of similarities between vertices based on random walks which has several important advantages: it captures well the community structure in a network, it can be computed efficiently, and it can be used in an agglomerative algorithm to compute efficiently the community structure of a network. We propose such an algorithm, called Walktrap, which runs in time O(mn) and space O(n) in the worst case, and in time O(n log n) and space O(n) in most real-world cases (n and m are respectively the number of vertices and edges in the input graph). Extensive comparison tests show that our algorithm surpasses previously proposed ones concerning the quality of the obtained community structures and that it stands among the best ones concerning the running time.

Download

Clustering in P2P exchanges and consequences on performances

Stevens Le Blond, Jean-Loup Guillaume and Matthieu Latapy

LNCS, proceedings of the 4-th International workshop on Peer-to-Peer Systems IPTPS’05, 2005, Ithaka, New York, USA

We propose here an analysis of a rich dataset which gives an exhaustive and dynamic view of the exchanges processed in a running eDonkey system. We focus on correlation in term of data exchanged by peers having provided or queried at least one data in common. We introduce a method to capture these correlations (namely the data clustering), and study it in detail. We then use it to propose a very simple and efficient way to group data into clusters and show the impact of this underlying structure on search in typical P2P systems. Finally, we use these results to evaluate the relevance and limitations of a model proposed in a previous publication. We indicate some realistic values for the parameters of this model, and discuss some possible improvements.

Download