Computing communities in large networks using random walks

Pascal Pons and Matthieu Latapy

Journal of Graph Algorithms and Applications (JGAA) vol. 10, no. 2, pages 191-218, 2006. Extended abstract published in LNCS, proceedings of the 20-th International Symposium on Computer and Information Sciences ISCIS’05, 2005, Istambul, Turquie

Dense subgraphs of sparse graphs (communities), which appear in most real-world complex networks, play an important role in many contexts. Computing them however is generally expensive. We propose here a measure of similarities between vertices based on random walks which has several important advantages: it captures well the community structure in a network, it can be computed efficiently, and it can be used in an agglomerative algorithm to compute efficiently the community structure of a network. We propose such an algorithm, called Walktrap, which runs in time O(mn) and space O(n) in the worst case, and in time O(n log n) and space O(n) in most real-world cases (n and m are respectively the number of vertices and edges in the input graph). Extensive comparison tests show that our algorithm surpasses previously proposed ones concerning the quality of the obtained community structures and that it stands among the best ones concerning the running time.

Download

Clustering in P2P exchanges and consequences on performances

Stevens Le Blond, Jean-Loup Guillaume and Matthieu Latapy

LNCS, proceedings of the 4-th International workshop on Peer-to-Peer Systems IPTPS’05, 2005, Ithaka, New York, USA

We propose here an analysis of a rich dataset which gives an exhaustive and dynamic view of the exchanges processed in a running eDonkey system. We focus on correlation in term of data exchanged by peers having provided or queried at least one data in common. We introduce a method to capture these correlations (namely the data clustering), and study it in detail. We then use it to propose a very simple and efficient way to group data into clusters and show the impact of this underlying structure on search in typical P2P systems. Finally, we use these results to evaluate the relevance and limitations of a model proposed in a previous publication. We indicate some realistic values for the parameters of this model, and discuss some possible improvements.

Download

Combining the use of clustering and scale-free nature of user exchanges into a simple and efficient P2P system

Pierre Fraigniaud, Philippe Gauron and Matthieu Latapy

LNCS, proceedings of the 11-th international conference Euro-Par, 2005, Lisbonne, Portugal

It appeared recently that user interests in a P2P system possess clustering properties that may be used to reduce significantly the amount of traffic of flooding-based search strategies. It was also observed that they possess scale-free properties that may be used for the design of efficient routing-based search strategies. In this paper, we show that the combination of these two properties make it possible to design an efficient and simple fully decentralized search strategy. Further, simulations processed on real-world traces show that other unidentified properties hidden in actual queries make our protocol even more efficient, performing searches in logarithmic expected number of steps.

Download

Complex Network Metrology

Jean-Loup Guillaume and Matthieu Latapy

Complex Systems 16, pages 83-94, 2005

In order to study some complex networks like the Internet, the Web, social networks or biological networks, one first has to explore them. This gives a partial and biased view of the real object, which is generally assumed to be representative of the whole. However, up to now nobody knows how and how much the measure influences the results. Using the example of the Internet and a rough model of its exploration process, we show that the way a given complex network is explored may strongly influence the observed properties. This leads us to argue for the necessity of developing a science of metrology of complex networks. Its aim would be to study how the partial and biased view of a network relates to the properties of the whole network.

Download

Relevance of Massively Distributed Explorations of the Internet Topology: Qualitative Results

Jean-Loup Guillaume, Matthieu Latapy and Damien Magoni

Computer Networks 50, pages 3197-3224, 2006. Extended abstract published in the proceedings of the 24-th IEEE international conference Infocom’05, 2005, Miami, USA

Internet maps are generally constructed using the traceroute tool from a few sources to many destinations. It appeared recently that this exploration process gives a partial and biased view of the real topology, which leads to the idea of increasing the number of sources to improve the quality of the maps. In this paper, we present a set of experiments we have conducted to evaluate the relevance of this approach. It appears that the statistical properties of the underlying network have a strong influence on the quality of the obtained maps, which can be improved using massively distributed explorations. Conversely, some statistical properties are very robust, and so the known values for the Internet may be considered as reliable. We validate our analysis using real-world data and experiments, and we discuss its implications.

Download

Random generation of large connected simple graphs with prescribed degree distribution

Fabien Viger and Matthieu Latapy

Extended abstract published in LNCS, proceedings of the 11-th international conference on Computing and Combinatorics COCOON’05, 2005, Kunming, Yunnan, Chine

We address here the problem of generating random graphs uniformly from the set of simple connected graphs having a prescribed degree sequence. Our goal is to provide an algorithm suitable for practical use both because of its ability to generate very large graphs (efficiency) and because it is easy to implement (simplicity). We focus on a family of heuristics for which we introduce optimality conditions, and show how this optimality can be reached in practice. We then propose a different approach, specifically designed for real-world degree distributions, which outperforms the first one. Based on a conjecture which we discuss rigorously and study empirically, we finally reduce the best asymptotic complexity bound known so far.

Download

Statistical analysis of a P2P query graph based on degrees and their time-evolution

Jean-Loup Guillaume, Matthieu Latapy and Stevens Le-Blond

LNCS, proceedings of the 6-th International Workshop on Distributed Computing IWDC’04, 2004, Kolkata, India

Despite their crucial impact on the performances of P2P systems, very few is known on peers behaviors in such networks. We propose here a study of these behaviors in a running environment using a semicentralised P2P system (eDonkey). To achieve this, we use a trace of the queries made to a large server managing up to fifty thousands peers simultaneously, and a few thousands queries per second. We analyse these data using complex network methods, and focus in particular on the degrees, their correlations, and their time-evolution. Results show a large variety of observed phenomena, including the variety of peers behaviors and heterogeneity of data queries, which should be taken into account when designing P2P systems.

Download

Bipartite Structure of all Complex Networks

Jean-Loup Guillaume and Matthieu Latapy

Information Processing Letters (IPL) 90:5, pages 215-221, 2004

The analysis and modelling of various complex networks has received much attention in the last few years. Some such networks display a natural bipartite structure: two kinds of nodes coexist with links only between nodes of different kinds. This bipartite structure has not been deeply studied until now, mainly because it appeared to be specific to only a few complex networks. However, we show here that all complex networks can be viewed as bipartite structures sharing some important statistics, like degree distributions. The basic properties of complex networks can be viewed as consequences of this underlying bipartite structure. This leads us to propose the first simple and intuitive model for complex networks which captures the main properties met in practice.

Download

Efficient and Simple Encodings for the Web Graph

Jean-Loup Guillaume, Matthieu Latapy and Laurent Viennot

LNCS, proceedings of the 3-rd international conference Web-Age Information Management WAIM’02, 2002, Beijing, Chine. Abstract published in the proceedings of the 11-th international conference World Wide Web WWW’02, 2002, Honolulu, Hawaï

In this paper, we propose a set of simple and efficient methods based on standard, free and widely available tools, to store and manipulate large sets of URLs and large parts of the Web graph. Our aim is both to store efficiently the URLs list and the graph in order to manage all the computations in a computer central memory. We also want to make the conversion between URLs and their identifiers as fast as possible, and to obtain all the successors of an URL in the Web graph efficiently. The methods we propose make it possible to obtain a good compromise between these two challenges, and make it possible to manipulate large parts of the Web graph.

Download