Problèmes sociologiques et méthodes mathématiques: la recherche du réseau squelette

Narciso Pizarro

Thursday, January 22, 2015, 11:00 a.m., room 25-26/105

When we examine the nature of the social relations that are represented with networks, we find that, in almost all empirical work, they are direct, binary relations between social actors. These actors are all too often individuals, and the direct relations between them are intersubjective interactions, which yields both very large networks and rather unreliable data. Since Simmel, we have known that a binary interaction is not yet a social relation: at least three individuals are needed to constitute the atom of the social. On the other hand, social groups are far less numerous than individuals, and yet their intersections make it possible to identify unambiguously every member of a population of size N: Lorrain proved that the minimum number of social circles C needed for this identification is C = log₂ N. Working with groups therefore does not preclude identifying their members, should this be necessary for any reason. Moreover, bipartite networks constitute a model of any network, and they make it easier to identify the structural-equivalence classes of the network's nodes. The concepts of place and of networks of places that I have proposed constitute an interesting approximation for tackling this problem. The problem of identifying cliques, which is computationally hard, can be circumvented by starting from data on social groups.
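
To make the bound concrete: with C circles, each individual is identified by the binary vector of her memberships, so 2^C distinct membership profiles are available and C = log₂ N circles suffice for N individuals. A minimal Python sketch (the rounding up to a whole number of circles for non-powers of two is my addition):

```python
import math

def min_circles(n):
    """Minimum number of social circles needed to tell n individuals apart,
    when each circle acts as one binary membership attribute
    (Lorrain's bound C = log2 N, rounded up for non-powers of two)."""
    return math.ceil(math.log2(n))

# Each individual is identified by the vector of her memberships:
# with C circles there are 2**C distinct membership profiles available.
for n in (8, 1_000, 1_000_000):
    print(n, "->", min_circles(n), "circles")
# 8 -> 3 circles, 1000 -> 10 circles, 1000000 -> 20 circles
```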

Complex contagion process in spreading of online innovation

Márton Karsai

Thursday, January 15, 2015, 11:00 a.m., room 26-00/332

Slides

Diffusion of innovation can be interpreted as a social spreading phenomena governed by the impact of media and social interactions. Although these mechanisms have been identified by quantitative theories, their role and relative importance are not entirely understood, since empirical verification has so far been hindered by the lack of appropriate data. Here we analyse a dataset recording the spreading dynamics of the world’s largest Voice over Internet Protocol service to empirically support the assumptions behind models of social contagion. We show that the rate of spontaneous service adoption is constant, the probability of adoption via social influence is linearly proportional to the fraction of adopting neighbours, and the rate of service termination is time-invariant and independent of the behaviour of peers. By implementing the detected diffusion mechanisms into a dynamical agent-based model, we are able to emulate the adoption dynamics of the service in several countries worldwide. This approach enables us to make medium-term predictions of service adoption and disclose dependencies between the dynamics of innovation spreading and the socioeconomic development of a country.
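
The three mechanisms reported above translate naturally into an agent-based simulation. The sketch below is only an illustration of that translation (it assumes the networkx library and uses made-up parameter values, not the ones fitted in the paper):

```python
import random
import networkx as nx

def simulate_adoption(g, p_spont=0.0005, alpha=0.05, p_stop=0.001, steps=200, seed=0):
    """Minimal agent-based sketch of the three detected mechanisms:
    constant spontaneous adoption, peer adoption proportional to the fraction
    of adopting neighbours, and constant, peer-independent termination.
    Parameter values are illustrative only."""
    rng = random.Random(seed)
    adopters = set()
    history = []
    for _ in range(steps):
        new_state = set(adopters)
        for node in g:
            if node in adopters:
                if rng.random() < p_stop:            # constant termination rate
                    new_state.discard(node)
            else:
                neigh = list(g[node])
                frac = sum(n in adopters for n in neigh) / len(neigh) if neigh else 0.0
                # spontaneous adoption + linear social influence
                if rng.random() < p_spont + alpha * frac:
                    new_state.add(node)
        adopters = new_state
        history.append(len(adopters))
    return history

g = nx.barabasi_albert_graph(10_000, 3)
print(simulate_adoption(g)[::20])   # adoption curve sampled every 20 steps
```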

Dynamiques des réseaux sociaux en ligne, recommandations et interaction

Stéphane Raux

Thursday, December 4, 2014, 11:00 a.m., room 26-00/332

Slides

The success of platforms such as Facebook or Twitter, which rely on interactions between their users to share information, has profoundly changed the way we use the web. This thesis proposes to combine methods from the analysis of large graphs and social networks with web mining and text analysis techniques in order to build tools and methods for analysing how these social networking sites are used. We focus in particular on two types of interactions: conversation, which we analyse through networks of comments or user mentions, and recommendation, which essentially relies on the practice of citing hypertext links. A first analysis deals with the dynamics of comments on Flickr and the way this network is built. We then propose a sampling method for Twitter that continuously captures a corpus of users centred on the French web, and a method for detecting and tracking topics from the link citations in the data thus collected. This makes it possible to build a typology of users according to their activity and to propose a method for reconstructing the cascades through which links spread on Twitter. This work was carried out within the company Linkfluence and led to the development of several programs, including the continuous Twitter message capture system and the Algopol application, which recruited more than 12,000 participants for a sociological survey and collected their Facebook profiles as part of a multidisciplinary research project.

Prendre en compte le capitalisme social dans la mesure de l'influence sur Twitter

Maximilien Danisch, Nicolas Dugué, Anthony Perez

MARAMI 2014

Influence on Twitter has become a widely discussed topic with the explosion in the use of this micro-blogging service. Indeed, in order to mine efficiently the mass of tweets produced by Twitter's millions of users and to identify trends and relevant information, it is important to be able to detect influential users. Several tools providing an influence score have thus been proposed and have become standard references. However, the algorithms used by the companies that develop them remain secret. Recent work has shown that automated accounts can obtain high scores for no sound reason. Extending and complementing this work, we show that these tools are unable to distinguish real users from so-called social capitalists, who wrongly obtain high influence scores. To address this problem, we define a classifier that achieves this goal and thus restores realistic scores for social capitalists. To build this classifier, we assembled a dataset containing examples of social capitalists and of regular users of the network, together with their profile and usage information. Finally, we developed an online application that uses this classifier.

Download

On the Use of Intrinsic Time Scale for Dynamic Community Detection and Visualization in Social Networks

Alice Albano, Jean-Loup Guillaume, and Bénédicte Le Grand

Proceedings of the 8th IEEE International Conference on Research Challenges in Information Science (RCIS 2014)

The analysis of social networks is a challenging research area, in particular because of their dynamic features. In this paper, we study such evolving graphs through the evolution of their community structure. More specifically, we build on existing approaches for the identification of stable communities over time. This paper presents two contributions. We first propose a new way to compute such stable communities, using a different time scale, called intrinsic time. This intrinsic time is related to the dynamics of the graph (e.g., in terms of link appearance or disappearance) and independent of traditional (extrinsic) time units, such as the second. We then show how visualization at both intrinsic and extrinsic time scales can help validate and interpret the obtained communities. Our results are illustrated on a social network made of contacts among the participants of the 2006 edition of the Infocom conference.
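
As a rough illustration of the intrinsic-time idea (not the authors' code), the sketch below re-indexes a timestamped link stream so that the clock advances by one unit per graph event, whatever the extrinsic gaps between events:

```python
def to_intrinsic_time(events):
    """Re-index a timestamped link stream on an intrinsic clock that ticks
    once per graph event (link appearance '+' or disappearance '-'), instead
    of once per second.  `events` is an iterable of (timestamp, u, v, kind)
    tuples.  Sketch of the idea only, not the paper's implementation."""
    reindexed = []
    for intrinsic_t, (extrinsic_t, u, v, kind) in enumerate(sorted(events)):
        reindexed.append((intrinsic_t, extrinsic_t, u, v, kind))
    return reindexed

stream = [(10.0, 'a', 'b', '+'), (10.2, 'b', 'c', '+'), (57.9, 'a', 'b', '-')]
for row in to_intrinsic_time(stream):
    print(row)
# intrinsic steps 0, 1, 2 regardless of how irregular the extrinsic gaps are
```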

Download

Controlling Information Flow in Social Networks

Soumajit Pramanik

Friday, October 3, 2014, 11:00 a.m., room 26-00/101 (Noguez)

Slides

Social information flow is basically the spread of information among socially connected people (friends, family, colleagues, etc.). In real life, this type of information flow is very hard to capture, but in the digital world the phenomenon can be investigated with the help of Online Social Networks (OSNs) such as Facebook, Twitter or Foursquare. In OSNs, whenever a user shares a piece of information, her direct neighbours (friends/followers) are automatically exposed to it and may decide whether or not to propagate it. This type of information propagation can be logged and used as a proxy for real-world social information diffusion. In information propagation on OSNs, mediators/information-brokers play a specific role by helping to spread information beyond the immediate reach of social neighbours. For instance, on Twitter, "Mention" is such a mediator utility. A mention is added to a tweet by writing "@username". All users mentioned in a tweet receive a mention notification and can retrieve the tweet from their personal "Mention" tab. By using mentions, one can draw the attention of a specific user (who may not belong to one's set of followers), or highlight a place or an organization at any time. The main research question we address is thus: how do such mediators (e.g. "Mention") facilitate information flow in an OSN (e.g. Twitter)?
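
For illustration, the "@username" mediator edges the abstract refers to are easy to recover from raw tweet text; the toy snippet below is a simplification that ignores some edge cases in Twitter's actual handle rules:

```python
import re

MENTION_RE = re.compile(r'@(\w{1,15})')   # Twitter handles: up to 15 word characters

def extract_mentions(tweet_text):
    """Return the usernames mentioned in a tweet, i.e. the mediator links a
    '@username' creates towards users who need not follow the author."""
    return MENTION_RE.findall(tweet_text)

print(extract_mentions("Great talk by @alice, cc @bob_42 #networks"))
# ['alice', 'bob_42']
```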

RankMerging: Learning to rank in large-scale social networks

Lionel Tabourier, Anne-Sophie Libert, and Renaud Lambiotte

2014, DyNakII, 2nd International Workshop on Dynamic Networks and Knowledge Discovery (PKDD 2014 workshop)

In this work, we consider the issue of unveiling unknown links in a social network, one of the difficulties of this problem being the small number of unobserved links compared with the total number of pairs of nodes. We define a simple supervised learning-to-rank framework, called RankMerging, which aims at combining the information provided by various unsupervised rankings. As an illustration, we apply the method to the case of a cell phone service provider, which uses the network among its own contractors as a learning set to discover links existing among users of its competitors. We show that our method substantially improves the performance of unsupervised classification metrics. Finally, we discuss how it can be used with additional sources of data, including temporal or semantic information.
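
The sketch below is a deliberately simplified illustration of the general idea, merging several unsupervised rankings of candidate pairs with the help of a labelled learning set; it is not the RankMerging algorithm itself, whose exact aggregation strategy is described in the paper:

```python
def greedy_merge(rankings, training_links, window=100):
    """Simplified sketch: repeatedly take the next window of candidate pairs
    from whichever unsupervised ranker scores best on the learning set.
    `rankings` maps a metric name to a list of node pairs sorted by
    decreasing score; `training_links` is the set of pairs known to be true
    links.  Pairs are assumed to use the same canonical (e.g. sorted) order
    in both structures."""
    cursors = {name: 0 for name in rankings}
    merged, seen = [], set()
    while True:
        def window_precision(name):
            i = cursors[name]
            win = rankings[name][i:i + window]
            return sum(p in training_links for p in win) / max(len(win), 1)
        best = max(cursors, key=window_precision)      # ranker with the best next window
        i = cursors[best]
        for pair in rankings[best][i:i + window]:
            if pair not in seen:
                seen.add(pair)
                merged.append(pair)
        cursors[best] = i + window
        if all(cursors[n] >= len(rankings[n]) for n in rankings):
            return merged
```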

Download

Complexité de l’exploration des graphes dynamiques T-intervalle-connexes

Ahmed Wade

Thursday, October 2, 2014, 11:00 a.m., room 25-26/101

Slides

In this talk, I will discuss the study of T-interval-connected dynamic graphs from the point of view of the time needed for their exploration by a mobile entity (agent). A dynamic graph is T-interval-connected (T >= 1) if, for every window of T time units, there exists a stable connected spanning subgraph. This property of connectivity stability over time was introduced by Kuhn, Lynch and Oshman (STOC 2010).
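
The definition can be checked directly on a sequence of snapshots. A minimal sketch, assuming the networkx library and undirected snapshots over a fixed vertex set:

```python
import networkx as nx

def is_T_interval_connected(snapshots, T):
    """Check the Kuhn-Lynch-Oshman property on a dynamic graph given as a
    list of networkx graphs over the same vertex set: for every window of T
    consecutive steps, the edges present during the whole window must form a
    connected spanning subgraph.  Illustrative sketch only."""
    nodes = set(snapshots[0].nodes())
    for start in range(len(snapshots) - T + 1):
        window = snapshots[start:start + T]
        # edges present at every step of the window
        stable_edges = set.intersection(
            *({tuple(sorted(e)) for e in g.edges()} for g in window))
        stable = nx.Graph()
        stable.add_nodes_from(nodes)
        stable.add_edges_from(stable_edges)
        if not nx.is_connected(stable):
            return False
    return True
```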

Community detection and Role extraction in networks

Arnaud Browet

Thursday, September 18, 11:00 a.m., room 25-26/101

Slides

Network clustering, also known as graph partitioning, has attracted the attention of many researchers during the last decade due to the increasing number of potential applications of network science. However, the amount of collected data makes its analysis a real challenge, and dedicated algorithms must be developed in order to make sense of it. In this talk, I will first introduce recent developments in community detection and present a fast and highly parallelizable algorithm which outperforms existing methods in terms of computational time. Yet a community structure is not always representative of the actual distribution of the nodes in a network. For example, bipartite or cycle graphs often do not contain communities and are in fact more closely represented by what is known as a role structure. In the second part of the talk, I will present a pairwise node similarity measure which makes it possible to extract those role structures, and demonstrate its performance on synthetic and real graphs.

Website: http://perso.uclouvain.be/arnaud.browet/
Thesis: http://perso.uclouvain.be/arnaud.browet/files/thesis/thesis.pdf

DNS Monitoring, looking out for anomalies using the time frame Name – IP association

Lautaro Dolberg

Thursday, October 9, 2014, 10:00 a.m., room 25-26/101

Slides

DNS is an essential service on the Internet, as it translates human-readable domain names into IP addresses. DNS traffic reflects user activities and behaviours; it is thus a helpful source of information in the context of large-scale network monitoring. In particular, passive DNS monitoring has garnered much interest from a security perspective, by highlighting the services that machines try to access. I will present a method for assessing the dynamics of the match between DNS names and IP subnetworks, using an efficient aggregation scheme combined with relevant steadiness metrics. The evaluation relies on real data collected over several months and is able to detect anomalies related to malicious domains.
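
As a rough sketch of what such an aggregation may look like (not the speaker's actual scheme), the snippet below groups passive-DNS answers per time window into domain-to-/24-subnet sets and scores the steadiness of consecutive windows with a Jaccard overlap:

```python
import ipaddress
from collections import defaultdict

def subnets_per_window(records, window=3600, prefix=24):
    """Group passive-DNS answers (timestamp, domain, ip) into time windows
    and keep, for each domain, the set of /24 subnets it resolved to.
    A crude stand-in for the aggregation scheme mentioned in the talk."""
    buckets = defaultdict(lambda: defaultdict(set))
    for ts, domain, ip in records:
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        buckets[int(ts // window)][domain].add(net)
    return buckets

def steadiness(prev_nets, curr_nets):
    """Jaccard overlap of the subnets seen in two consecutive windows:
    values near 0 indicate an unstable name-to-IP mapping worth inspecting."""
    if not prev_nets and not curr_nets:
        return 1.0
    return len(prev_nets & curr_nets) / len(prev_nets | curr_nets)
```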

Modeling time-varying multilayer networks (and beyond?)

Artur Ziviani

Thursday, October 9, 2014, 11:00 a.m., room 25-26/101

Slides

The talk will present recent ongoing work on the modeling of time-varying multilayer networks, developed collaboratively between LNCC in Brazil and ENS-Lyon/INRIA in France. The proposed model has a strong relationship with traditional directed graphs, thus leading to a useful theoretical framework for the analysis of complex networked systems. In the specific case of time-varying graphs, we show that this theoretical framework is a unifying model able to represent several previous (classes of) models for dynamic networks found in the recent literature, which in general are unable to represent each other.

Link Prediction and Threads in Email Networks

Qinna Wang

2014 International Conference on Data Science and Advanced Analytics (DSAA2014)

We tackle the problem of predicting future links in dynamic networks. For this, we work with the Debian mailing lists. In this dataset, a user can post a question to the Debian list and other users can reply to it by email, forming a thread. We show that the number of threads shared in the past between users is a better feature for predicting future email exchanges than classical features, such as the number of common neighbors. We also show that the structure of a thread does not match the traditional definition of a community: in particular, a thread does not have many triangles and has many outgoing connections. While the number of shared (detected) communities is also a better feature for predicting future email exchanges than traditional features, it is not as good as the number of shared threads. We believe our work should raise interest in characterizing and detecting thread-like structures in dynamic networks.
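
The shared-thread feature is straightforward to compute once threads are identified. A minimal sketch, assuming each thread is given as the set of users who posted in it:

```python
from collections import defaultdict
from itertools import combinations

def shared_thread_counts(threads):
    """For every pair of users, count the mailing-list threads they both
    participated in, the feature reported as the best predictor of future
    email exchanges.  `threads` maps a thread id to the set of users who
    posted in it.  Illustrative sketch only."""
    counts = defaultdict(int)
    for participants in threads.values():
        for u, v in combinations(sorted(participants), 2):
            counts[(u, v)] += 1
    return counts

threads = {"t1": {"alice", "bob", "carol"}, "t2": {"alice", "bob"}}
print(shared_thread_counts(threads)[("alice", "bob")])   # 2
```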

Download

On the importance of considering social capitalism when measuring influence on Twitter

Maximilien Danisch, Nicolas Dugué and Anthony Perez

2014 International Conference on Behavioral, Economic, and Socio-Cultural Computing (BESC 2014)

Influence on Twitter has drawn a lot of attention these past few years, since this microblogging service is used to share, seek or debate any kind of information. Several tools providing so-called influence scores have thus been proposed. However, the algorithms behind them are kept secret and it is not clear how they account for influence. Yet many users rely on such tools to evaluate and even try to improve their influence on the Twitter network. In a recent work, it has been shown that automated accounts can obtain high influence scores with no intuitive reason. Extending and completing this work, we show that such measures fail at distinguishing so-called social capitalists from real, truthful users. This highlights the fact that current scores do not seem to take into account the way followers and interactions are obtained on the network. To overcome this issue, we define a classifier that discriminates social capitalists from truthful users. To that aim, we crawled the Twitter network to gather examples of certified social capitalists and regular users, and obtained features related to the profile and behavior of each user. We then use this classifier to balance Klout's score and adjust influence scores. We also developed an application that allows using our classifier online. We believe our work should raise the question of the legitimacy of influence on Twitter, and lead to significant improvements in the way it is measured.
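
As an illustration of how such a discriminator could be set up, here is a small sketch; the feature set below is only a plausible guess (follower/followee counts, their ratio, tweet count), not the one used in the paper, and scikit-learn is assumed:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def make_features(accounts):
    # Hypothetical profile/behaviour features; the paper's exact feature set differs.
    return np.array([[a["followers"], a["followees"],
                      a["followers"] / max(a["followees"], 1),
                      a["tweets"]] for a in accounts])

def train_classifier(accounts, labels):
    """Train a simple discriminator between social capitalists (label 1) and
    regular users (label 0) from labelled account descriptions."""
    X, y = make_features(accounts), np.array(labels)
    clf = LogisticRegression(max_iter=1000)
    print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())
    return clf.fit(X, y)
```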

Download

Learning a Proximity Measure to Complete a Community

Maximilien Danisch, Jean-Loup Guillaume and Bénédicte Le Grand

2014 International Conference on Data Science and Advanced Analytics (DSAA2014)

In large-scale online complex networks (Wikipedia, Facebook, Twitter, etc.), finding nodes related to a specific topic is a strategic research subject. This article focuses on two central notions in this context: communities (groups of highly connected nodes) and proximity measures (indicating whether nodes are topologically close). We propose a parameterized proximity measure which, given a set of nodes belonging to a community, learns the optimal parameters and identifies the other nodes of this community, called a multi-ego-centered community as it is centered on a set of nodes. We validate our results on a large dataset of categorized Wikipedia pages and on benchmarks, and we show that our approach performs better than existing ones. Our main contributions are (i) a new ergonomic parameterized proximity measure, (ii) the automatic tuning of the proximity's parameters and (iii) the unsupervised detection of community boundaries.

Download

Direct Generation of Random Graphs Exactly Realising a Prescribed Degree Sequence

Darko Obradović, Maximilien Danisch

The 6th International Conference on Computational Aspects of Social Networks (CASON2014)

This paper intends to extend the possibilities available to researchers for the evaluation of directed networks through the use of randomly generated graphs. The direct generation of a simple network with a prescribed degree sequence still seems to be an open issue, since the prominent configuration model usually does not realise the degree distribution exactly. We propose such an algorithm using a heuristic for node prioritisation. We demonstrate that the algorithm samples approximately uniformly. In comparison to switching Markov Chain Monte Carlo algorithms, the direct generation of edges allows an easy modification of the linking behaviour in the random graph, introducing for example degree correlations, mixing patterns or community structure. In this way, more specific random graphs can be generated (non-uniformly) in order to test whether specific network features are due to a specific linking behaviour only. The method can also be used to generate series of synthetic benchmark networks with a specific community structure, including hierarchies and overlaps.
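
For illustration only, here is a much-simplified, undirected sketch of a direct construction with a node-prioritisation heuristic (highest remaining degree first, random partners, restart on dead ends); the paper's algorithm targets directed graphs and studies the uniformity of the sampling:

```python
import random

def direct_realisation(degree_sequence, max_tries=100, seed=None):
    """Directly build a simple undirected graph realising the given degree
    sequence: always process the node with the largest remaining degree first
    and attach it to partners chosen at random among the other nodes with
    free stubs, restarting on dead ends.  Simplified sketch, not the paper's
    directed algorithm."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        remaining = dict(enumerate(degree_sequence))
        edges, failed = set(), False
        while any(remaining.values()):
            u = max(remaining, key=remaining.get)        # highest remaining degree
            need = remaining[u]
            candidates = [v for v in remaining
                          if v != u and remaining[v] > 0
                          and (min(u, v), max(u, v)) not in edges]
            if len(candidates) < need:
                failed = True
                break
            for v in rng.sample(candidates, need):
                edges.add((min(u, v), max(u, v)))
                remaining[v] -= 1
            remaining[u] = 0
        if not failed:
            return sorted(edges)
    raise ValueError("could not realise the degree sequence")

print(direct_realisation([3, 2, 2, 2, 1], seed=1))
```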

Download

Mining bipartite graphs to improve semantic paedophile activity detection

Raphaël Fournier, Maximilien Danisch

IEEE International Conference on Research Challenges in Information Science 2014 (RCIS2014)

Peer-to-peer (P2P) networks are popular for exchanging large volumes of data over the Internet. Paedophile activity is a very important issue for our society, and some works have recently attempted to gauge the extent of paedophile exchanges on P2P networks. A key issue is to obtain an efficient detection tool, which can decide whether a sequence of keywords is related to the topic or not. We propose to use social network analysis on a large dataset from a P2P network to improve a state-of-the-art filter for paedophile queries. We obtain queries, and thus combinations of words, which are not tagged by the filter but should be. We also perform experiments to explore whether the original four categories of paedophile queries can be recovered by topological measures only.

Download

Comparing Pedophile Activity in Different P2P Systems

Raphaël Fournier, Thibault Cholez, Matthieu Latapy, Isabelle Chrisment, Clémence Magnien, Olivier Festor and Ivan Daniloff

Soc. Sci. 2014, 3(3), 314-325

Peer-to-peer (P2P) systems are widely used to exchange content over the Internet. Knowledge of pedophile activity in such networks remains limited, despite having important social consequences. Moreover, though there are different P2P systems in use, previous academic works on this topic focused on one system at a time and their results are not directly comparable. We design a methodology for comparing KAD and eDonkey, two P2P systems among the most prominent ones and with different anonymity levels. We monitor two eDonkey servers and the KAD network during several days and record hundreds of thousands of keyword-based queries. We detect pedophile-related queries with a previously validated tool and we propose, for the first time, a large-scale comparison of pedophile activity in two different P2P systems. We conclude that there are significantly fewer pedophile queries in KAD than in eDonkey (approximately 0.09% vs. 0.25%).

Download

In-Core Computation of Geometric Centralities with HyperBall: A Hundred Billion Nodes and Beyond

Sebastiano Vigna

Wednesday, May 28, 2014, 11:00 a.m., room 25-26/101

Slides

We approach the problem of computing geometric centralities, such as closeness and harmonic centrality, on very large graphs; traditionally this task requires an all-pairs shortest-path computation in the exact case, or a number of breadth-first traversals for approximated computations, but these techniques yield very weak statistical guarantees on highly disconnected graphs. We rather assume that the graph is accessed in a semi-streaming fashion, that is, that adjacency lists are scanned almost sequentially, and that a very small amount of memory (in the order of a dozen bytes) per node is available in core memory. We leverage the newly discovered algorithms based on HyperLogLog counters, making it possible to approximate a number of geometric centralities at a very high speed and with high accuracy. While the application of similar algorithms for the approximation of closeness was attempted in the MapReduce framework, our exploitation of HyperLogLog counters reduces exponentially the memory footprint, paving the way for in-core processing of networks with a hundred billion nodes using “just” 2TiB of RAM. Moreover, the computations we describe are inherently parallelizable, and scale linearly with the number of available cores.
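
For reference, harmonic centrality can be written as H(x) = sum over y != x of 1/d(y, x), which equals the telescoping sum over ball sizes sum over r >= 1 of (|B(x, r)| - |B(x, r-1)|)/r that HyperBall evaluates with HyperLogLog counters. The sketch below (assuming the networkx library, exact breadth-first distances rather than counters, and an undirected graph) computes the exact quantity being approximated:

```python
import networkx as nx

def exact_harmonic(g):
    """Exact harmonic centrality H(x) = sum over y != x of 1 / d(y, x),
    computed from plain BFS distances.  HyperBall approximates the same
    quantity through the telescoping sum over ball sizes, storing only a
    HyperLogLog counter of a few bytes per node instead of exact distances."""
    h = {}
    for x in g:
        dist = nx.single_source_shortest_path_length(g, x)
        h[x] = sum(1.0 / d for d in dist.values() if d > 0)
    return h

g = nx.karate_club_graph()
print(max(exact_harmonic(g).items(), key=lambda kv: kv[1]))  # most central node
```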

L’invention des concepts en informatique

Baptiste Mélès

Thursday, June 12, 2014, 11:00 a.m., room 25-26/101

How do computer scientists invent concepts? The computer scientist sometimes has the image of a tinkerer of a new generation, whose concepts, in permanent compromise with the contingencies of matter, would lack the purity of nobler sciences such as logic or mathematics. This is the view we will challenge. The philosopher and Resistance member Jean Cavaillès (1903-1944) proposed, in his posthumous work Sur la Logique et la théorie de la science (1947), what can be seen as a grammar of the invention of mathematical concepts, whose two main procedures are called paradigm and thematization. Drawing on the history of the concepts of file and process from CTSS to Unix, we will show that these two procedures, and perhaps more, can be found in the invention of computing concepts. The invention of computing concepts would then be just as pure, from a rational point of view, as the invention of mathematical concepts. This will lead us to characterize computer science as an a priori science in its own right, that is, a science independent of any experience, whose contribution is irreducible to logic and mathematics: rather than an a posteriori science of computers, it can be seen as the a priori science of the inscription of rational processes in matter. We will conclude by showing, through a few examples, how computer science extends far beyond the science of computers.

Network Decomposition into Fixed Points of Degree Peeling

James Abello and François Queyroi

Social Network Analysis and Mining, Springer, 2014, 4 (1), pp.191

Degree peeling is used to study complex networks. It is a decomposition of the network into vertex groups of increasing minimum degree. However, the peeling value of a vertex is non-local in this context since it relies on the number of connections the vertex has to groups above it. We explore a different way to decompose a network into edge layers such that the local peeling value of the vertices on each layer does not depend on their non-local connections with the other layers. This corresponds to the decomposition of a graph into subgraphs that are invariant with respect to degree peeling, i.e. they are fixed points. We introduce a general method to partition the edges of an arbitrary graph into fixed points of degree peeling, called the iterative-edge-core decomposition. Information from this decomposition is used to formulate a novel notion of vertex diversity based on Shannon’s entropy. We illustrate the usefulness of this decomposition on a variety of social networks including weighted graphs. Our method can be used as a preprocessing step for community detection and graph visualization.
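
The sketch below follows one plausible reading of the iterative-edge-core decomposition (repeatedly peel off the edges of the maximum core of what remains); it assumes the networkx library and should not be taken as the authors' exact procedure:

```python
import networkx as nx

def iterative_edge_core(g):
    """Sketch of an edge-layer decomposition in the spirit of the paper:
    repeatedly compute the core numbers of the remaining graph, peel off the
    edges among the vertices of the maximum core as one layer (a subgraph in
    which every vertex has the same peeling value), and iterate until no edge
    is left.  Illustrative reading only; see the paper for the exact method."""
    h = nx.Graph(g)
    h.remove_edges_from(nx.selfloop_edges(h))   # core_number rejects self-loops
    layers = []
    while h.number_of_edges() > 0:
        core = nx.core_number(h)
        kmax = max(core.values())
        top = [v for v, k in core.items() if k == kmax]
        layer_edges = list(h.subgraph(top).edges())
        layers.append((kmax, layer_edges))
        h.remove_edges_from(layer_edges)
        h.remove_nodes_from(list(nx.isolates(h)))
    return layers

print(iterative_edge_core(nx.karate_club_graph())[0][0])   # peel value of the densest layer
```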

Download