Partitionnement des Liens d’un Graphe : Critères et Mesures

Noé Gaumont, François Queyroi

ALGOTEL 2014 — 16èmes Rencontres Francophones sur les Aspects Algorithmiques des Télécommunications, Jun 2014, Le Bois-Plage-en-Ré, France. pp.1-4

La recherche de communautés chevauchantes est un enjeu important pour l’analyse des réseaux complexes. Une piste souvent envisagée est la recherche d’un partitionnement des arêtes du graphe. L’évaluation de cette décomposition tient cependant rarement compte du fait que les communautés recherchées correspondent à des groupes d’arêtes. Nous discutons dans ce papier l’utilisation de nouveaux critères pouvant répondre à ce problème. Nous proposons de comparer le nombre de sommets incidents à un groupe d’arêtes au nombre attendu dans un graphe aléatoire. Un optimum local de la mesure dérivée de ce concept peut être obtenu par un algorithme glouton. Nous présentons les premiers résultats obtenus à travers une analyse de la mesure et des tests empiriques.

Download

Duplication of Time-Varying Graphs

François Queyroi

5ème conférence sur les Modèles et l’Analyse des Réseaux : Approches Mathématiques et Informatique (MARAMI), Paris, 2014.

Nous présentons une transformation de graphes temporels, appelée -duplication, permettant de réduire l’hétérogénéité temporelle dans l’analyse de réseaux dynamiques. Au lieu de construire une séquence d’instantanées à partir d’un découpage global du temps, nous utilisons une approche centrée sur les individus en considérant un sommet sur plusieurs sessions i.e. des périodes durant lesquelles il se connecte au moins tous les . Cette note décrit précisément le concept de -duplication et fournit des pistes quant à son utilisation pour l’analyse de réseaux complexes. En particulier, nous proposons une généralisation du concept de k-cores aux graphes temporels.

Structures biparties et communautés recouvrantes des graphes de terrains

Tackx Raphaël, Maximilien Danisch et Fabien Tarissan

In Acte de la 5ème Conférence sur les Modèles et l’Analyse des Réseaux : Approches Mathématiques et Informatique (MARAMI’14), Paris, France, 2014.

De nombreux réseaux rencontrés en pratique se prêtent naturellement à la formalisation sous forme de graphes pour analyser et modéliser leur structure. Cette représentation plate des réseaux s’est montrée cependant peu efficace pour rendre compte de propriétés importantes et non triviales liées à la structure bipartie des réseaux. Des travaux récents ont montré notamment que des propriétés de recouvrements semblaient être présentes dans la plupart des réseaux réels et qu’elles permettaient de mieux expliquer des propriétés observées sur dans les graphes simples. Le présent travail entend poursuivre cette problématique en étudiant les propriétés liées aux recouvrements dans les structures communautaire des réseaux sociaux. Nous conduisons pour cela une étude basée sur le réseau des pages et catégories WIKIPEDIA et nous montrons notamment que parmi les métriques proposées récemment pour rendre compte de ces recouvrements complexes entre communautés, le coefficient de redondance était plus pertinent que le populaire coefficient de clustering biparti étudié généralement en pratique.

Download

Comparing overlapping properties of real bipartite networks

Fabien Tarissan

In ISCS 2014: Interdisciplinary Symposium on Complex Systems, Emergence, Complexity and Computation, 14:309-318, Springer, 2014.

Many real-world networks lend themselves to the use of graphs for analysing and modelling their structure. But such a simple representation has proven to miss some important and non trivial properties hidden in the bipartite structure of the networks. Recent papers have shown that overlapping properties seem to be present in bipartite networks and that it could explain better the properties observed in simple graphs. This work intends to investigate this question by studying two proposed metrics to account for overlapping structures in bipartite networks. The study, conducted on four dataset stemming from very different contexts (computer science, juridical science and social science), shows that the most popular metrics, the clustering coefficient, turns out to be less relevant that the recent redundancy coefficient to analyse intricate overlapping properties of real networks.

Download

Data-driven traffic and diffusion modeling in peer-to-peer networks : A real case study.

Romain Hollanders, Daniel Bernardes, Bivas Mitra, Raphael Jungers, Jean-Charles Delvenne, Fabien Tarissan.

In Journal of Network Science, 2(3):341-266, Cambridge University Press, 2014.

Peer-to-peer (p2p) systems have driven a lot of attention in the past decade as they have become a major source of Internet traffic. The amount of data flowing through the p2p network is huge and hence challenging both to comprehend and to control. In this work, we take advantage of a new and rich dataset recording p2p activity at a remarkable scale to address these difficult problems. After extracting the relevant and measurable properties of the network from the data, we develop two models that aim to make the link between the low-level properties of the network, such as the proportion of peers that do not share content (i.e., free riders) or the distribution of the files among the peers, and its high-level properties, such as the Quality of Service or the diffusion of content, which are of interest for supervision and control purposes. We observe a significant agreement between the high-level properties measured on the real data and on the synthetic data generated by our models, which is encouraging for our models to be used in practice as large-scale prediction tools. Relying on them, we demonstrate that spending efforts to reduce the amount of free-riders indeed helps to improve the availability of files on the network. We observe however a saturation of this phenomenon after 65% of free-riders.

Download

Measuring the Degree Distribution of Routers in the Core Internet

Matthieu Latapy, Elie Rotenberg, Christophe Crespelle, Fabien Tarissan

13th IFIP International Conference on Networking – Networking 2014, 2014, Trondheim, Norway. IEEE, pp.1-9

Most current models of the internet rely on knowledge of the degree distribution of its core routers, which plays a key role for simulation purposes. In practice, this distribution is usually observed directly on maps known to be partial, biased and erroneous. This raises serious concerns on the true knowledge one may have of this key property. Here, we design an original measurement approach targeting reliable estimation of the degree distribution of core routers, without resorting to any map. It consists in sampling random core routers and precisely estimate their degree thanks to probes sent from many distributed monitors. We run and assess a large-scale measurement following this approach, carefully controlling and correcting bias and errors encountered in practice. The estimate we obtain is much more reliable than previous knowledge, and it shows that the true degree distribution is very different from all current assumptions.

Download

UDP Ping: a dedicated tool for improving measurements of the Internet topology

Fabien Tarissan, Elie Rotenberg, Matthieu Latapy, Christophe Crespelle

IEEE 22nd International Symposium on Modeling Analysis and Simulation of Computer and Telecomunication Systems (MASCOTS’14), At Paris, France

The classical approach for Internet topology measurement consists in distributively collecting as much data as possible and merging it into one single piece of topology on which are conducted subsequent analysis. Although this approach may seem reasonable, in most cases network measurements performed in this way suffer from some or all of the following limitations: they give only partial views of the networks under concern, these views may be intrinsically biased, and they contain erroneous data due to the measurement tools. Here we present a new tool, named UDP Ping , that relies on a very different approach for the measurement of the Internet topology. Its basic principle is to measure the interface of a given target directed toward a monitor which sends the measurement probe. We demonstrate how to use it to deploy real world-wide measurements that provide reliable (i.e. bias and error free) knowledge of the Internet topology, namely the degree distribution of routers in the core Internet in our example.

Download

Prendre en compte le capitalisme social dans la mesure de linfluence sur Twitter

Maximilien Danisch, Nicolas Dugué, Anthony Perez

MARAMI 2014

L’influence sur Twitter est un sujet particulièrement discuté avec l’explosion de l’utilisation de ce service de micro-blogging. En effet, afin de fouiller efficacement dans la masse de tweets produite par les millions d’utilisateurs de Twitter, de déterminer les tendances et les informations pertinentes, il est important de pouvoir détecter les utilisateurs influents. Ainsi, plusieurs outils fournissant un score d’influence ont été proposés et font référence. Cependant, les algorithmes utilisés par les sociétés qui les développent restent secrets. Dans des travaux récents, il a été montré que des comptes automatiques peuvent obtenir des scores élevés sans raison. De façon à étendre et compléter ces travaux, nous montrons que ces outils sont incapables de distinguer les utilisateurs réels de ceux appelés capitalistes sociaux, qui obtiennent à tort des scores d’influence élevés. Afin de résoudre ce problème, nous définissons un classifieur qui réalise cet objectif et rétablit ainsi des scores réalistes pour les capitalistes sociaux. Pour réaliser ce classifieur, nous avons réuni un jeu de données contenant des exemples de capitalistes sociaux et d’utilisateurs réguliers du réseau ainsi que leurs informations de profils et d’utilisation. Pour finir, nous avons développé une application en ligne qui utilise ce classifieur.

Download

On the Use of Intrinsic Time Scale for Dynamic Community Detection and Visualization in Social Networks

Alice Albano, Jean-Loup Guillaume, and Bénédicte Le Grand

Proceedings of the 8th IEEE International Conference on Research Challenges in Information Science (RCIS 2014)

The analysis of social networks is a challenging research area, in particular because of their dynamic features. In this paper, we study such evolving graphs through the evolution of their community structure. More specifically, we build on existing approaches for the identification of stable communities over time. This paper presents two contributions. We first propose a new way to compute such stable communities, using a different time scale, called intrinsic time. This intrinsic time is related to the dynamics of the graph (e.g., in terms of link appearance or disappearance) and independent from traditional (extrinsic) time units, like the second. We then show how visualization both at intrinsic and extrinsic time scales can help validating and interpreting the obtained communities. Our results are illustrated on a social network made of contacts among the participants of the 2006 edition of the Infocom conference.

Download

RankMerging: Learning to rank in large-scale social networks

Lionel Tabourier, Anne-Sophie Libert, and Renaud Lambiotte

2014, DyNakII, 2nd International Workshop on Dynamic Networks and Knowledge Discovery (PKDD 2014 workshop)

In this work, we consider the issue of unveiling unknown links in a social network, one of the difficulties of this problem being the small number of unobserved links in comparison of the total number of pairs of nodes. We define a simple supervised learning-to-rank framework, called RankMerging, which aims at combining information provided by various unsupervised rankings. As an illustration, we apply the method to the case of a cell phone service provider, which uses the network among its contractors as a learning set to discover links existing among users of its competitors. We show that our method substantially improves the performance of unsupervised metrics of classification. Finally, we discuss how it can be used with additional sources of data, including temporal or semantic information

Download

Link Prediction and Threads in Email Networks

Qinna Wang

2014 International Conference on Data Science and Advanced Analytics (DSAA2014)

We tackle the problem of predicting future links in dynamic networks. For this, we work with the Debian Mailing Lists. In this dataset, a user can post a question to the debian list and other users can reply it by email forming a thread. We show that the number of threads shared in the past between users is a better feature to predict future email exchanges than classical features, like the number of common neighbors. We also show that the structure of a thread do not match the traditional definition of a community, particularly a thread does not have many triangles and has many outgoing connections. While the number of shared (detected) communities is also a better feature to predict future email exchanges than traditional features, is not as good as the number of shared threads. We believe our work should raise interests in characterizing and detecting thread-like structures in dynamic networks.

Download

On the importance of considering social capitalism when measuring influence on Twitter

Maximilien Danisch, Nicolas Dugué and Anthony Perez

2014 International Conference on Behavioral, Economic, and Socio-Cultural Computing (BESC 2014)

Influence on Twitter has drawn a lot of attention these past few years since this microblogging service is used to share, seek or debate about any kind of information. Several tools providing so-called influential scores have thus been proposed. However, the algorithms behind them are kept secret and it is not clear how they consider influence. Yet, many users rely on such tools to evaluate and even try to improve their influence in the Twitter network. In a recent work, it has been shown that automatic accounts can obtain high influential scores with no intuitive reason. Extending and completing this work, we show that such measures fail at distinguishing so-called social capitalists from real, truthful users. This enlights the fact that actual scores do not seem to consider the way followers and interactions are obtained on the network. To overcome this issue, we define a classifier that discriminates social capitalists from truthful users. To that aim, we crawled the Twitter network to gather examples of certified social capitalists and regular users and obtained features related to the profile and behavior of each user. We then use such a classifier to balance Klout’s score to adjust influential scores. We also developed an application that allows using our classifier online. We believe our work should raise the question of the legitimacy of influence on Twitter, and lead to significant improvements in the way it is measured.

Download

Learning a Proximity Measure to Complete a Community

Maximilien Danisch, Jean-Loup Guillaume and Bénédicte Le Grand

2014 International Conference on Data Science and Advanced Analytics (DSAA2014)

In large-scale online complex networks (Wikipedia, Facebook, Twitter, etc.) finding nodes related to a specific topic is a strategic research subject. This article focuses on two central notions in this context: communities (groups of highly connected nodes) and proximity measures (indicating whether nodes are topologically close). We propose a parameterized proximity measure which, given a set of nodes belonging to a community, learns the optimal parameters and identifies the other nodes of this community, called multi-ego-centered community as it is centered on a set of nodes. We validate our results on a large dataset of categorized Wikipedia pages and on benchmarks, we also show that our approach performs better than existing ones. Our main contributions are (i) a new ergonomic parametrized proximity measure, (ii) the automatic tuning of the proximity’s parameters and (iii) the unsupervised detection of community boundaries.

Download

Direct Generation of Random Graphs Exactly Realising a Prescribed Degree Sequence

Darko Obradovi, Maximilien Danisch

The 6th International Conference on Computational Aspects of Social Networks (CASON2014)

This paper intends to extend the possibilites available to researchers for the evaluation of directed networks with the use of randomly generated graphs. The direct generation of a simple network with a prescribed degree sequence still seems to be an open issue, since the prominent configuration model usually does not realise the degree distribution exactly. We propose such an algorithm using a heuristic for node prioritisation. We demonstrate that the algorithm samples approximately uniformly. In comparison to the switching Markov Chain Monte Carlo algorithms, the direct generation of edges allows an easy modification of the linking behaviour in the random graph, introducing for example degree correlations, mixing patterns or community structure. That way, more specific random graphs can be generated (non-uniformly) in order to test hypotheses on the question, whether specific network features are due to a specific linking behaviour only. Or it can be used to generate series of synthetic benchmark networks with a specific community structure, including hierarchies and overlaps.

Download

Mining bipartite graphs to improve semantic paedophile activity detection

Raphaël Fournier, Maximilien Danisch

IEEE International Conference on Research Challenges in Information Science 2014 (RCIS2014)

Peer-to-peer (P2P) networks are popular to exchange large volumes of data through the Internet. Pedophile activity is a very important topic for our society and some< works have recently attempted to gauge the extent of pedophile exchanges on P2P networks. A key issue is to obtain an efficient detection tool, which may decide if a sequence of keywords is related to the topic or not. We propose to use social network analysis in a large dataset from a P2P network to improve a state-of-the-art filter for pedophile queries. We obtain queries and thus combinations of words which are not tagged by the filter but should be. We also perform some experiments to explore if the original four categories of paedophile queries were to be found by topological measures only.

Download

Comparing Pedophile Activity in Different P2P Systems

Raphaël Fournier, Thibault Cholez, Matthieu Latapy, Isabelle Chrisment, Clémence Magnien, Olivier Festor and Ivan Daniloff

Soc. Sci. 2014, 3(3), 314-325

Peer-to-peer (P2P) systems are widely used to exchange content over the Internet. Knowledge of pedophile activity in such networks remains limited, despite having important social consequences. Moreover, though there are different P2P systems in use, previous academic works on this topic focused on one system at a time and their results are not directly comparable. We design a methodology for comparing KAD and eDonkey, two P2P systems among the most prominent ones and with different anonymity levels. We monitor two eDonkey servers and the KAD network during several days and record hundreds of thousands of keyword-based queries. We detect pedophile-related queries with a previously validated tool and we propose, for the first time, a large-scale comparison of pedophile activity in two different P2P systems. We conclude that there are significantly fewer pedophile queries in KAD than in eDonkey (approximately 0.09% vs. 0.25%).

Download

Network Decomposition into Fixed Points of Degree Peeling

James Abello and François Queyroi

Social Network Analysis and Mining, Springer, 2014, 4 (1), pp.191

Degree peeling is used to study complex networks. It is a decomposition of the network into vertex groups of increasing minimum degree. However, the peeling value of a vertex is non-local in this context since it relies on the number of connections the vertex has to groups above it. We explore a different way to decompose a network into edge layers such that the local peeling value of the vertices on each layer does not depend on their non-local connections with the other layers. This corresponds to the decomposition of a graph into subgraphs that are invariant with respect to degree peeling, i.e. they are fixed points. We introduce a general method to partition the edges of an arbitrary graph into fixed points of degree peeling, called the iterative-edge-core decomposition. Information from this decomposition is used to formulate a novel notion of vertex diversity based on Shannon’s entropy. We illustrate the usefulness of this decomposition on a variety of social networks including weighted graphs. Our method can be used as a preprocessing step for community detection and graph visualization.

Download

Identifying Roles in an IP Network with Temporal and Structural Density

Tiphaine Viard and Matthieu Latapy

NetSciCom 2014

Captures of IP traffic contain much information on very different kinds of activities like file transfers, users interacting with remote systems, automatic backups, or distributed computations. Identifying such activities is crucial for an appropriate analysis, modeling and monitoring of the traffic. We propose here a notion of density that captures both temporal and structural features of interactions, and generalizes the classical notion of clustering coefficient. We use it to point out important differences between distinct parts of the traffic, and to identify interesting nodes and groups of nodes in terms of roles in the network.

Download

Measuring Routing Tables in the Internet

Élie Rotenberg, Christophe Crespelle and Matthieu Latapy

NetSciCom 2014

The most basic function of an Internet router is to decide, for a given packet, which of its interfaces it will use to forward it to its next hop. To do so, routers maintain a routing table, in which they look up for a prefix of the destination address. The routing table associates an interface of the router to this prefix, and this interface is used to forward the packet. We explore here a new measurement method based upon distributed UDP probing to estimate this routing table for Internet routers.

Download

Multi-ego-centred communities

Maximilien Danisch, Jean-Loup Guillaume and Bénédicte Le Grand

in « Complex Networks », pages 76-111, H. Cherifi (ed), Cambridge Scholars Publishing, 2014

The community structure of a graph is defined in various ways in the literature: partition, where nodes can belong to only one community. This vision is unrealistic and may lead to poor results because most nodes belong to several communities in real-world networks; overlapping community structure, which is the most natural view, but is often very difficult to identify in practice due to the complex structure of real-world networks and the huge potential number of such communities; egocentered community structure which focuses on individual nodes’ communities and seems to be a good compromise. In this chapter, the third vision is investigated; a new proximity measure based on opinion dynamics is proposed to score and select nodes according to their proximity to a node of interest. We call it the carryover opinion. In addition to be parameter-free, the carryover opinion can be calculated in a very time-efficient way and can thus be used in very large graphs. We also go further in the idea of egocentered communities by introducing the new concept of multi-egocentered communities, i.e., focusing on the communities of a set of nodes rather than of a single node. A key idea is that, although one node generally belongs to numerous communities, e.g., friends, colleagues, family, a small set of appropriate nodes can fully characterize a single community. We also show how to unfold all egocentered communities of a given node using this notion of multi-egocentered community.

Download