Predicting links in ego-networks using temporal information

Lionel Tabourier, Anne-Sophie Libert and Renaud Lambiotte

In EPJ Data Science (2016) 5: 1

Link prediction appears as a central problem of network science, as it calls for unfolding the mechanisms that govern the micro-dynamics of the network. In this work, we are interested in ego-networks, that is the mere information of interactions of a node to its neighbors, in the context of social relationships. As the structural information is very poor, we rely on another source of information to predict links among egos’ neighbors: the timing of interactions. We define several features to capture different kinds of temporal information and apply machine learning methods to combine these various features and improve the quality of the prediction. We demonstrate the efficiency of this temporal approach on a cellphone interaction dataset, pointing out features which prove themselves to perform well in this context, in particular the temporal profile of interactions and elapsed time between contacts.

Download

Characterizing and predicting mobile application usage

Keun-Woo Lim, Stefano Secci, Lionel Tabourier and Badis Tebbani

In Computer Communications, 2016, vol. 95, p. 82-94

In this paper, we propose data clustering techniques to predict temporal characteristics of data consumption behavior of different mobile applications via wireless communications. While most of the research on mobile data analytics focuses on the analysis of call data records and mobility traces, our analysis concentrates on mobile application usages, to characterize them and predict their behavior. We exploit mobile application usage logs provided by a Wi-Fi local area network service provider to characterize temporal behavior of mobile applications. More specifically, we generate daily profiles of “what” types of mobile applications users access and “when” users access them. From these profiles, we create usage classes of mobile applications via aggregation of similar profiles depending on data consumption rate, using three clustering techniques that we compare. Furthermore, we show that we can utilize these classes to analyze and predict future usages of each mobile application through progressive comparison using distance and similarity comparison techniques. Finally, we also detect and exploit outlying behavior in application usage profiles and discuss methods to efficiently predict them.

Download

Analysis of the temporal and structural features of threads in a mailing-list

Noé Gaumont, Tiphaine Viard, Raphaël Fournier-S’niehotta, Qinna Wang and Matthieu Latapy

In Complex Networks VII: Proceedings of the 2016 Workshop on Complex Networks.

A link stream is a collection of triplets (t,u,v) indicating that an interaction occurred between u and v at time t. Link streams model many real-world situations like email exchanges between individuals, connections between devices, and others. Much work is currently devoted to the generalization of classical graph and network concepts to link streams. In this paper, we generalize the existing notions of intra-community density and inter-community density. We focus on emails exchanges in the Debian mailing list, and show that threads of emails, like communities in graphs, are dense subsets loosely connected from a link stream perspective.

Computing maximal cliques in link streams

Tiphaine Viard, Matthieu Latapy and Clémence Magnien

Theoretical Computer Science (TCS), Volume 609, Part 1, 4 January 2016, Pages 245–252.

A link stream is a collection of triplets (t, u, v) indicating that an interaction occurred between u and v at time t. We generalize the classical notion of cliques in graphs to such link streams: for a given , a -clique is a set of nodes and a time interval such that all pairs of nodes in this set interact at least once during each sub-interval of duration . We propose an algorithm to enumerate all maximal (in terms of nodes or time interval) cliques of a link stream, and illustrate its practical relevance on a real-world contact trace.

Efficient and simple generation of random simple connected graphs with prescribed degree sequence

Fabien Viger and Matthieu Latapy

Journal of Complex Networks (2015) 4 (1): 15-37

We address here the problem of generating random graphs uniformly from the set of simple connected graphs having a prescribed degree sequence. Our goal is to provide an algorithm designed for practical use both because of its ability to generate very large graphs (efficiency) and because it is easy to implement (simplicity). We focus on a family of heuristics for which we introduce optimality conditions, and show how this optimality can be reached in practice. We then propose a different approach, specifically designed for real-world degree distributions, which outperforms the first one. Based on a conjecture which we argue rigorously and which was confirmed by strong empirical evidence, we finally reduce the best asymptotic complexity bound known so far.

RankMerging: Apprentissage supervisé de classements pour la prédiction de liens dans les grands réseaux sociaux

Lionel Tabourier, Anne-Sophie Libert et Renaud Lambiotte

EGC 2015, 15ème conférence internationale sur l’extraction et la gestion des connaissances

Trouver les liens manquants dans un grand réseau social est une tâche difficile, car ces réseaux sont peu denses, et les liens peuvent correspondre à des environnements structurels variés. Dans cet article, nous décrivons RankMerging, une méthode d’apprentissage supervisé simple pour combiner l’information obtenue par différentes méthodes de classement. Afin d’illustrer son intérêt, nous l’appliquons à un réseau d’utilisateurs de téléphones portables, pour montrer comment un opérateur peut détecter des liens entre les clients de ses concurrents. Nous montrons que RankMerging surpasse les méthodes à disposition pour prédire un nombre variable de liens dans un grand graphe épars.

Revealing intricate properties of communities in the bipartite structure of online social networks

Raphaël Tackx, Jean-Loup Guillaume and Fabien Tarissan

In IEEE Ninth International Conference on Research Challenges in Information Science (RCIS’15), Athènes, Greece, 2015

Many real-world networks based on human activities exhibit a bipartite structure. Although bipartite graphs seem appropriate to analyse and model their properties, it has been shown that standard metrics fail to reproduce intricate patterns observed in real networks. In particular, the overlapping of the neighbourhood of communities is difficult to capture precisely. In this work, we tackle this issue by analysing the structure of 4 real-world networks coming from online social activities. We first analyse their structure using standard metrics. Surprisingly, the clustering coefficient turns out to be less relevant than the redundancy coefficient to account for overlapping patterns. We then propose new metrics, namely the dispersion and the monopoly coefficients, and show that they help refining the study of bipartite overlaps. Finally, we compare the results obtained on real networks with the ones obtained on random bipartite models. This shows that the patterns captured by the redundancy and the dispersion coefficients are strongly related to the real nature of the observed overlaps.

Download

Temporal properties of legal decision networks: a case study from the International Criminal Court

Fabien Tarissan and Raphaëlle Nollez-Goldbach

In 28th International Conference on Legal Knowledge and Information Systems (JURIX’15), Braga, Portugal, 2015.

Many studies have proposed to apply artificial intelligence techniques to legal networks, whether it be for highlighting legal reasoning, resolving conflict or extracting information from legal databases. In this context, a new line of research has recently emerged which consists in considering legal decisions as elements of complex networks and conduct a structural analysis of the relations between the decisions. It has proved to be efficient for detecting important decisions in legal rulings. In this paper, we follow this approach and propose to extend structural analyses with temporal properties. We define in particular the notion of relative in-degree, temporal distance and average longevity and use those metrics to rank the legal decisions of the two first trials of the International Criminal Court. The results presented in this paper highlight non trivial temporal properties of those legal networks, such as the presence of decisions with an unexpected high longevity, and show the relevance of the proposed relative in-degree property to detect landmark decisions. We validate the outcomes by confronting the results to the one obtained with the standard in-degree property and provide juridical explanations of the decisions identified as important by our approach.

Download

Augmenter les retweets sur Twitter : comment tirer parti des mentions ?

Soumajit Pramanik, Qinna Wang, Maximilien Danisch, Mohit Sharma, Sumanth Bandi, Jean-Loup Guillaume, Stéphane Raux and Bivas Mitra

6ème conférence sur les Modèles et l’Analyse des Réseaux : Approches Mathématiques et Informatique (MARAMI), Paris, 2015

Alors que Twitter est devenu incontournable, la propagation des tweets et hashtags est toujours largement incomprise. Le propagation d’information sur Twitter est principalement due aux retweets et aux mentions mais, alors que les retweets ne permettent d’atteindre que les abonnés d’un individu, les mentions permettent d’atteindre n’importe qui directement. De nombreuses études ont montré que les mentions sont largement utilisées sur Twitter, mais surtout qu’elles sont fondamentales pour augmenter la popularité des tweets et hashtags. Des méthodes automatiques pour choisir les bons utilisateurs à mentionner pourraient donc permettre d’augmenter la visibilité des tweets. Dans cet article nous proposons un système de recommandation de mentions en temps réel pour aug- menter la popularité d’un tweet. Ce système est basé sur un modèle de propagation de tweet dans un graphe multiplexe construit à partir d’une étude de données réelles. Il permet de clairement faire la différence entre les propagations dues aux mentions et celles dues aux abonnements. Les simulations du modèle donnent des résultats similaires aux observations empiriques et sont également fondées sur des résultats analytiques. En utilisant ces différents résultats nous proposons une stratégie de recom- mandation effective et une application Twitter associée.

Download

Déplier la structure communautaire dun réseau en mesurant la proximité aux représentants de communauté

Maximilien Danisch, Jean-Loup Guillaume and Bénédicte Le Grand

6ème conférence sur les Modèles et l’Analyse des Réseaux : Approches Mathématiques et Informatique (MARAMI), Paris, 2015

Nous proposons un algorithme pour déplier la structure communautaire des grands graphes de terrain. L’algorithme est basé sur la détection de la communauté de chaque représentant communau- taire : nœud contenu dans une seule communauté et important en son sein. Cette détection est faite avec une approche à base de mesure de proximité développée récemment. Par comparaison avec d’autres méthodes de l’état de l’art nous montrons que notre algorithme a des performances équivalentes voire meilleures et est capable de traiter les plus grands graphes de terrain.

Download

Détection de communautés dans les flots de liens par optimisation de la modularité

Emmanuel Orsini

6ème conférence sur les Modèles et l’Analyse des Réseaux : Approches Mathématiques et Informatique (MARAMI), Paris, 2015.

L’article qui suit propose de donner un sens à la modularité dans les flots de liens et ainsi de bénéficier de certaines de ses propriétés, et des heuristiques qui l’optimisent. Cette no- tion de modularité aboutira après quelques simplifications à un algorithme capable de calculer une partition sur un jeu de données de 400 000 emails. Pour ce faire on construira une nou- velle modélisation où le temps est complètement continu, sur laquelle la modularité se définit naturellement et de manière pertinente. Cette modélisation apporte une nouvelle interpretation des réseaux dynamiques, qui se veut suffisamment générale pour s’adapter à différents types de données.

Download

Temporal Patterns of Pedophile Activity in a P2P Network: First Insights about User Profiles from Big Data

Raphaël Fournier and Matthieu Latapy

International Journal of Internet Science ARTICLE IN PRES S 2015, 10 (1), ISSN 1662-5544

Recent studies have shown that child abuse material is shared through peer-to-peer (P2P) networks, which allow users to exchange files without a central server. Obtaining knowledge on the extent of this activity has major consequences for child protection, policy making and Internet regulation. Previous works have developed tools and analyses to provide overall figures in temporally-limited measurements. Offenders’ behavior is mostly studied through small-scale interviews and there is few information on the times at which they engage in such activity. Here we show that the proportion of search-engine queries for pedophile content gradually has grown by a factor of almost 3 in three years. We also find that during the day, certain hours are, on average, privileged by seekers. Our results demonstrate that P2P networks are actively used to search for pedophile content and we find new and large-scale results on pedophile offenders’ profile, indicating that a substantial proportion is well-integrated into family life and professional work activities.

Download

Time Evolution of the Importance of Nodes in dynamic Networks

Clémence Magnien and Fabien Tarissan.

In proceedings of the International Symposium on Foundations and Applications of Big Data Analytics (FAB), in conjunction with ASONAM, 2015.

For a long time now, researchers have worked on defining different metrics able to characterize the importance of nodes in networks. Among them, centrality measures have proved to be pertinent as they relate the position of a node in the structure to its ability to diffuse an information efficiently. The case of dynamic networks, in which nodes and links appear and disappear over time, led the community to propose extensions of those classical measures. Yet, they do not investigate the fact that the network structure evolves and that node importance may evolve accordingly. In the present paper, we propose temporal extensions of notions of centrality, which take into account the paths existing at any given time, in order to study the time evolution of nodes’ importance in dynamic networks. We apply this to two datasets and show that the importance of nodes does indeed vary greatly with time. We also show that in some cases it might be meaningless to try to identify nodes that are consistently important over time, thus strengthening the interest of temporal extensions of centrality measures.

Download

A reliable and evolutive web application to detect social capitalists

Nicolas Dugué, Anthony Perez, Maximilien Danisch, Florian Bridoux, Amélie Daviau, Tennessy Kolubako, Simon Munier and Hugo Durbano.

IEEE/ACM International Conference on Advances in Social Network Analysis and Mining (ASONAM), 2015, Paris. (Demo track paper)

On Twitter, social capitalists use dedicated hashtags and mutual subscriptions to each other in order to gain followers and to be retweeted. Their methods are successful enough to make them appear as influent users. Indeed, applications dedicated to the influence measurement such as Klout and Kred give high scores to most of these users. Meanwhile, their high number of retweets and followers are not due to the relevance of the content they tweet, but to their social capitalism techniques. In order to be able to detect these users, we train a classifier using a dataset of social capitalists and regular users. We then implement this classifier in a web application that we call DDP. DDP allows users to test whether a Twitter account is a social capitalist or not and to visualize the data we use to make the prediction. DDP allows administrator to crawl data from a lot of users automatically. Furthermore, administrators can manually label Twitter accounts as social capitalists or regular users to add them into the dataset. Finally, administrators can train new classifiers in order to take into account the new Twitter accounts added to the dataset, and thus making evolve the classifier with these new recently collected data. The web application is thus a way to collect data, make evolve the knowledge about social capitalists and to keep detecting them efficiently.

Download

Revealing contact patterns among high-school students using maximal cliques in link streams

Tiphaine Viard, Matthieu Latapy, Clémence Magnien

First International Workshop on Dynamics in Networks (DyNo), in conjunction with ASONAM, 2015.

Interaction traces between humans are usually rich in information concerning the patterns and habits of individuals. Such datasets have been recently made available, and more and more researchers address the new questions raised by this data. A link stream is a sequence of triplets (t, u, v) indicating that an interaction occurred between u and v at time t, and as such is a natural representation of these data. We generalize the classical notion of cliques in graphs to such link streams: for a given , a -clique is a set of nodes and a time interval such that all pairs of nodes in this set interact at least every during this time interval. We proceed to compute the maximal -cliques on a real-world dataset of contact among students, and show how it can bring new interpretation to patterns of contact.

Download

Suppression Distance Computation for Hierarchical Clusterings

François Queyroi, Sergey Kirgizov

Information Processing Letters, Volume 115, Issue 9, September 2015, Pages 689–693 http://www.sciencedirect.com/science/article/pii/S0020019015000678

We discuss  the computation  of a  distance between  two hierarchical clusterings of the  same set. It is defined as  the minimum number of  elements that  have to  be removed so  the remaining  clusterings are  equal. The problem of distance  computing was extensively studied for partitions. We prove it can be  solved in polynomial time in the case of hierarchies  as it gives  birth to a  class of perfect  graphs. We  also  propose an  algorithm  based on  recursively computing  maximum assignments.

Download

Calcul de cliques maximales dans les flots de liens

Tiphaine Viard, Matthieu Latapy et Clémence Magnien

ALGOTEL 2015 — 17èmes Rencontres Francophones sur les Aspects Algorithmiques des Télécommunications, juin 2015, Beaune, France

Un flot de liens est une séquence de triplets (t,u,v), signifiant que u et v ont interagi au temps t. Nous généralisons la notion de cliques à ces flots de liens : pour un delta donné, une delta-clique est un ensemble de nœuds et un intervalle de temps, tels que toutes les paires de nœuds dans cet ensemble interagissent au moins tous les delta sur cet intervalle. Nous proposons un premier algorithme permettant d’énumérer les delta-cliques dans un flot de liens.

Download

An empirical approach towards an efficient whom to mention? Twitter app

Soumajit Pramanik, Maximilien Danisch, Qinna Wang and Bivas Mitra

[extended abstract] Twitter for Research, 1st International & Interdisciplinary Conference, 2015

We developed a Twitter app to suggest users to mention in a tweet in order to maximise the spread of an information. Users that are popular, active on Twitter and interested in the content of the tweet are targeted. The problem is mapped to the knapsack problem, the length of the screen name of a user being an important variable. Collected data (who retweets among the suggested users and features of these users) will be used to improve the app and theory/models of information spread on Online Social Networks. The application is available at: http://bit.ly/1BKZURE

Download

On the Termination of Some Biclique Operators on Multipartite Graphs.

Christophe Crespelle, Matthieu Latapy, Ha Duong Phan.

Discrete Applied Mathematics, Volume 195, 20 November 2015, Pages 59–73

We define a new graph operator, called the weak-factor graph, which comes from the context of complex network modelling. The weak-factor operator is close to the well-known clique-graph operator but it rather operates in terms of bicliques in a multipartite graph. We address the problem of the termination of the series of graphs obtained by iteratively applying the weak-factor operator starting from a given input graph. As for the clique-graph operator, it turns out that some graphs give rise to series that do not terminate. Therefore, we design a slight variation of the weak-factor operator, called clean-factor, and prove that its associated series terminates for all input graphs. In addition, we show that the multipartite graph on which the series terminates has a very nice combinatorial structure: we exhibit a bijection between its vertices and the chains of the inclusion order on the intersections of the maximal cliques of the input graph.

Download

Expected Nodes: a quality function for the detection of link communities

Noé Gaumont, François Queyroi, Clémence Magnien et Matthieu Latapy.

In Complex Networks VI (pp. 57-64). Springer International Publishing. 2015

Many studies use community detection algorithms in order to understand complex networks. Most papers study node communities, i.e. groups of nodes, which may or may not overlap. A widely used measure to evaluate the quality of a community structure is the modularity. However, sometimes it is also relevant to study link partitions rather than node partitions. In order to evaluate a link partition, we propose a new quality function: Expected Nodes. Our function is based on the same inspiration as the modularity and compares, for a given link group, the number of incident nodes to the expected one. In this short note, we discuss the advantages and drawbacks of our quality function compared to other ones on synthetics graphs. We show that Expected Nodes is able to pass some fundamental sanity criteria and is the one that best identifies the most relevant partition in a more realistic context.

Download