OTMedia+: Graphs and Information Propagation

Nicolas Hervé

July 9, 2019, 11:00. Room 26-00/332, Jussieu.

OTMedia (Observatoire TransMedia) is a software platform dedicated to research projects that makes it possible to analyse large quantities of diverse, multimodal, transmedia data related to French and French-language news. OTMedia continuously collects, processes and indexes thousands of streams from television, radio, the Web, the press, news agencies and Twitter. In the context of this project, we would like to study the propagation of information and images on the Web, using graph theory to help us extract the features/indicators that describe media events.

Degree-based Outlier Detection within IP Traffic Modelled as a Link Stream (extended version)

Audrey Wilmet, Tiphaine Viard, Matthieu Latapy and Robin Lamarche-Perrin

Computer Networks, 2019

This paper aims at precisely detecting and identifying anomalous events in IP traffic. To this end, we adopt the link stream formalism, which properly captures temporal and structural features of the data. Within this framework, we focus on finding anomalous behaviours with respect to the degree of IP addresses over time. Due to the diversity of IP profiles, this feature is typically distributed heterogeneously, which prevents us from finding anomalies directly. To deal with this challenge, we design a method to detect outliers, and to precisely identify their cause, in a sequence of similar heterogeneous distributions. We apply it to several MAWI captures of IP traffic and show that it succeeds in detecting relevant patterns of anomalous network activity.
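To make the notion of "degree of IP addresses over time" concrete, here is a minimal sketch. It is not the method of the paper: the fixed time windows, the function names and the median/MAD threshold are all assumptions chosen for illustration. A link stream is represented as a list of `(t, u, v)` triples, and the degree of a node in a window is its number of distinct neighbours in that window.

```python
from collections import defaultdict
from statistics import median


def degrees_per_window(links, window):
    """Map (window_index, node) to the number of distinct neighbours.

    `links` is an iterable of (t, u, v) triples; `window` is the window
    length in the same unit as t (an assumption of this sketch).
    """
    neighbours = defaultdict(set)
    for t, u, v in links:
        w = int(t // window)
        neighbours[(w, u)].add(v)
        neighbours[(w, v)].add(u)
    return {key: len(ns) for key, ns in neighbours.items()}


def degree_outliers(degrees, threshold=3.0):
    """Flag (window, node) pairs whose degree lies far above the median.

    Uses a median/MAD score, a generic robust statistic that stands in
    for the paper's own detection procedure.
    """
    values = list(degrees.values())
    med = median(values)
    mad = median([abs(x - med) for x in values])
    scale = mad if mad > 0 else 1.0  # avoid dividing by zero
    return sorted(k for k, d in degrees.items() if (d - med) / scale > threshold)


# Toy stream: one hypothetical "scanner" contacts ten hosts within one window.
links = [(0.1 * i, "scanner", f"h{i}") for i in range(10)]
links += [(12.0, "a", "b"), (13.0, "b", "c"), (25.0, "a", "c")]

degrees = degrees_per_window(links, window=10)
outliers = degree_outliers(degrees)
```

On real traffic the degree distribution is heterogeneous, which is exactly why a single global threshold like this one is too crude and the paper designs a dedicated method.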

Download

Drawing and Visualising Event-Based Dynamic Graphs.

Daniel Archambault

May 27th, 11:00. Room 24-25/405. UPMC – Sorbonne Université. 4 Place de Jussieu, 75005 Paris.

One of the most important types of data in data science is the graph or network. Networks encode relationships between entities: people in social networks, genes in biological networks, and many other forms of data. These networks are often dynamic and consist of a set of events, that is, edges/nodes with individual timestamps. In the complex network literature, these networks are often referred to as temporal networks. As an example, a post to a social media service creates an edge existing at a specific time, and a series of posts is a series of such events. However, the majority of dynamic graph visualisations use the timeslice, a series of snapshots of the network at given times, as a basis for visualisation. In this talk, I present two recent approaches for event-based network visualisation: DynNoSlice and the Plaid. DynNoSlice is a method for embedding these networks directly in the 2D+t space-time cube, along with methods to explore the contents of the cube. The Plaid is an interactive system for visualising dynamic networks that span long periods of time, and their interaction provenance, through interactive timeslicing.

Combining path-constrained random walks to recover link weights in heterogeneous information networks

Hong-Lan Botterman and Robin Lamarche-Perrin

CompleNet, 2019

Heterogeneous information networks (HIN) are abstract representations of systems composed of multiple types of entities and their relations. Given a pair of nodes in a HIN, this work aims at recovering the exact weight of the link incident to these two nodes, knowing some other links present in the HIN. More precisely, this weight is approximated by a linear combination of probabilities obtained from path-constrained random walks performed on the HIN, i.e., random walks where the walker is forced to follow a specific sequence of node types and edge types, commonly called a meta path. This method is general enough to compute the link weight between any types of nodes. Experiments on Twitter data show the applicability of the method.
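The idea of a path-constrained random walk can be illustrated with a small sketch. It makes several simplifying assumptions that are not taken from the paper: the network is unweighted, a meta path is given as a sequence of node types only (edge types are ignored), the walker chooses uniformly among neighbours of the required type, and the combination coefficients are hypothetical values rather than fitted ones.

```python
def pcrw_probability(adj, node_type, source, target, meta_path):
    """Probability that a walk from `source` constrained to `meta_path`
    (a sequence of node types) ends at `target`."""
    if node_type[source] != meta_path[0]:
        return 0.0
    dist = {source: 1.0}  # probability mass of the walker at each node
    for required in meta_path[1:]:
        nxt = {}
        for node, p in dist.items():
            allowed = [n for n in adj.get(node, ()) if node_type[n] == required]
            for n in allowed:
                nxt[n] = nxt.get(n, 0.0) + p / len(allowed)
        dist = nxt
    return dist.get(target, 0.0)


def combine(coefficients, probabilities):
    """Linear combination of walk probabilities (coefficients assumed given)."""
    return sum(c * p for c, p in zip(coefficients, probabilities))


# Toy network: two users linked to the hashtags they use.
node_type = {"u1": "user", "u2": "user", "h1": "hashtag", "h2": "hashtag"}
adj = {"u1": ["h1", "h2"], "u2": ["h1"], "h1": ["u1", "u2"], "h2": ["u1"]}

short_path = ("user", "hashtag", "user")
long_path = ("user", "hashtag", "user", "hashtag", "user")
p_short = pcrw_probability(adj, node_type, "u1", "u2", short_path)
p_long = pcrw_probability(adj, node_type, "u1", "u2", long_path)

# Hypothetical coefficients; in practice they would be learned from known links.
estimate = combine([0.5, 0.5], [p_short, p_long])
```

Each meta path contributes one probability, and the estimated link weight is their weighted sum.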

Download

Multidimensional Outlier Detection in Interaction Data: Application to Political Communication on Twitter

Audrey Wilmet and Robin Lamarche-Perrin

CompleNet, 2019

We introduce a method which aims at getting a better understanding of how millions of interactions may result in global events. Given a set of dimensions and a context, we find different types of outliers: a user whose activity during a given hour is abnormal compared to their usual behavior, a relationship between two users which is abnormal compared to all other relationships, etc. We apply our method to a set of retweets related to the 2017 French presidential election and show that it yields interesting insights into political organization on Twitter.

Download

RankMerging: a supervised learning-to-rank framework to predict links in large social networks

Lionel Tabourier, Daniel F. Bernardes, Anne-Sophie Libert and Renaud Lambiotte

Machine Learning, 2019

Uncovering unknown or missing links in social networks is a difficult task because of their sparsity and because links may represent different types of relationships, characterized by different structural patterns. In this paper, we define a simple yet efficient supervised learning-to-rank framework, called RankMerging, which aims at combining information provided by various unsupervised rankings. We illustrate our method on three different kinds of social networks and show that it substantially improves the performance of unsupervised ranking methods as well as of standard supervised combination strategies. We also describe various properties of RankMerging, such as its computational complexity and its robustness to feature selection and parameter estimation, and discuss its area of relevance: the prediction of an adjustable number of links on large networks.

Download

Applications through human mobility lens

Vsevolod Salnikov

Thursday, April 4, 2019, 14:00, room 26-00/332, LIP6, Sorbonne Université

Slides

In this talk I will present various data-oriented projects we have carried out recently. The general thread is human mobility sensing and the different applications of such datasets, from the more theoretical to the extremely applied, which sit on the border between research and commercial activities. Moreover, we will discuss the different stages: from data collection to models and applications, as well as the ‘in-the-field’ validation of model predictions. I will present a few ways of collecting data that made it possible to obtain impressive and reliable datasets at almost no cost. These datasets are already used for studies, but I would also be happy to discuss various applications and ways to collaborate!

Modelling users’ control over their personal data

Pablo Rauzy

Friday, April 12, 2019, 11:00, room 25-26/105, LIP6, Sorbonne Université

From the point of view of a user of an information system, privacy corresponds to the control that he or she can exercise over his or her personal data within that system. This view of privacy is essential if we want to contribute to the development of emancipatory technologies, that is, technologies that serve their users and no one else. The rigorous study and evaluation of the privacy offered by a system therefore requires a formal characterisation of this control. We propose a capability-based formal framework for specifying and reasoning about this control and its properties. We will see through examples that this notably makes it possible to compare alternative implementations of the same system (a basic social network, for which we compare three possible implementations), and thus to study and optimise privacy from the design phase.

An information-theoretic framework for the lossy compression of link streams

Robin Lamarche-Perrin

Theoretical Computer Science, 2019

Graph compression is a data analysis technique that consists in the replacement of parts of a graph by more concise structural patterns in order to reduce its description length. It notably provides interesting exploration tools for the study of real, large-scale, and complex graphs which cannot be grasped at first glance. This article proposes a framework for the compression of temporal graphs, that is for the compression of graphs that evolve with time. This framework first builds on a simple and limited scheme, exploiting structural equivalence for the lossless compression of static graphs, then generalises it to the lossy compression of link streams, a recent formalism for the study of temporal graphs. Such generalisation builds on the natural extension of (bidimensional) relational data by the addition of a third temporal dimension. Moreover, we introduce an information-theoretic measure to quantify and to control the information that is lost during compression, as well as an algebraic characterisation of the space of possible compression patterns to enhance the expressiveness of the initial compression scheme. These contributions lead to the definition of a combinatorial optimisation problem, that is the Lossy Multistream Compression Problem, for which we provide an exact algorithm.

Comparison of classification methods for identifying important nodes in dynamic graphs

Marwan Ghanem

Rencontres jeunes chercheurs en RI, 2019

Nowadays, we are interested in detecting important entities; these can be important keywords in a document or on Twitter, or important individuals in a mobility network. We can model such data as a dynamic graph and use centrality metrics such as temporal closeness centrality. Unfortunately, this can be costly. In this work, we compare the accuracy of several supervised classification methods against one another for detecting these important nodes. On sixteen datasets of different natures, we show that these methods succeed in distinguishing important nodes from insignificant ones. We also show that taking the nature of the data into account decreases the quality of the results. Finally, we compare the computation time of each of these methods with that of exact methods.

Download

Neighbour-distinguishing decompositions of graphs

Mohammed SENHAJI

Friday, March 15, 2019, 14:00, room 25-26/105, LIP6, UPMC. 4 Place Jussieu, 75005 Paris.

The main question that we explore was introduced by Karoński, Łuczak and Thomason in 2004: can we weight the edges of a graph G with weights 1, 2 and 3 such that any two adjacent vertices of G are distinguished by the sums of their incident weights? This question later became the famous 1-2-3 Conjecture. In this presentation we explore several variants of the 1-2-3 Conjecture and their links with locally irregular decompositions. We are interested in both optimisation results and algorithmic problems. We first introduce an equitable version of neighbour-sum-distinguishing edge-weightings, that is, a variant where we require every edge weight to be used the same number of times, up to a difference of 1. After that, we explore how neighbour-sum-distinguishing weightings behave if we require the sums of neighbouring vertices to differ by at least 2. Namely, we present results on the smallest maximal weight needed to construct such weightings for some classes of graphs, and study some algorithmic aspects of this problem. Due to the links between neighbour-sum-distinguishing edge-weightings and locally irregular decompositions, we also explore the locally irregular index of subcubic graphs, along with other variants of the locally irregular decomposition problem. Finally, we present more general work towards a theory unifying neighbour-sum-distinguishing edge-weightings and locally irregular decompositions.
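The question behind the 1-2-3 Conjecture is easy to state in code. The sketch below is purely illustrative and is not taken from the talk: the function names are hypothetical, and the search is an exponential brute force that is only usable on tiny graphs. It checks whether an edge weighting distinguishes every pair of adjacent vertices by the sums of their incident weights, and looks for such a weighting with weights in {1, 2, 3}.

```python
from itertools import product


def incident_sums(edges, weights):
    """Sum of incident edge weights for every vertex."""
    sums = {}
    for (u, v), w in zip(edges, weights):
        sums[u] = sums.get(u, 0) + w
        sums[v] = sums.get(v, 0) + w
    return sums


def is_neighbour_sum_distinguishing(edges, weights):
    """True if the two endpoints of every edge have different sums."""
    sums = incident_sums(edges, weights)
    return all(sums[u] != sums[v] for u, v in edges)


def find_123_weighting(edges):
    """Brute-force search over all weightings with weights 1, 2, 3.

    Returns the first valid weighting found, or None if there is none.
    """
    for weights in product((1, 2, 3), repeat=len(edges)):
        if is_neighbour_sum_distinguishing(edges, weights):
            return weights
    return None


# A path on four vertices and a single edge (K2).
path = [("a", "b"), ("b", "c"), ("c", "d")]
single_edge = [("a", "b")]

path_weighting = find_123_weighting(path)
k2_weighting = find_123_weighting(single_edge)
```

A single edge admits no valid weighting, since both endpoints always receive the same sum; this is why the conjecture concerns graphs with no isolated edge.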

Minorities in Networks

Claudia Wagner

Monday, January 28, 2018, 11:00, room 24-25/405, LIP6, UPMC. 4 Place Jussieu, 75005 Paris.

Networks are the infrastructure of our social and professional life and also of modern information systems where billions of documents and entities are interlinked. However, not all nodes are equal in these networks. Often we observe attributes (e.g. gender or ethnicity) that define the group membership of a node. In this talk I will explore the role of minorities in social networks and information networks, provide empirical evidence for the disadvantage of minorities and discuss factors that may place minorities at a disadvantage.

What graphs can contribute to a more transparent artificial intelligence

Tiphaine Viard

January 17th 2019, 14:00. Room 24-25/405, LIP6 – UPMC, Sorbonne Université. 4 Place Jussieu, 75005 Paris.

AI and machine learning are commonly described as « black boxes » that are efficient, but opaque. While complete opacity would be an exaggeration, it is true that many methods for explainability rely on forms of reverse engineering: we try to infer the model from its (partial, intermediary, final) results. These methods are typically based on large-scale, efficient matrix manipulation. Graphs and their extensions have been shown to be visualisable and interpretable, even at large scales. In their classical formulation, they are also very similar to matrices. However, few if any machine learning methods have explored what graphs could contribute to their models. This is partly due to the fact that graph computations have long been expensive, typically having polynomial running times, which is incompatible with the scale of data in most of today’s machine learning applications. However, the situation has changed: (i) the impact of AI on society makes it no longer acceptable to favour efficiency over transparency, and (ii) recent advances in algorithmic methods on graphs demonstrate that, due to the nature of real-world graphs, even some NP-hard problems become tractable. The aim of this talk is to explore this avenue of research. We will discuss the state of the art in learning from graph data, present some recent results showing that structure-based features indeed have the potential to make machine learning more transparent at no extra cost, and finally discuss future lines of research.

Easy-Mention: a model-driven mention recommendation heuristic to boost your tweet popularity

Soumajit Pramanik, Mohit Sharma, Maximilien Danisch, Qinna Wang, Jean‑Loup Guillaume, Bivas Mitra

International Journal of Data Science and Analytics, vol. 7 (2), 2018

This paper investigates the role of mentions in tweet propagation. We propose a novel tweet propagation model, SIR MF, based on a multiplex network framework, which allows us to analyze the effects of mentioning on the final retweet count. The building blocks of this model are supported by a comprehensive study of multiple real datasets, and simulations of the model show good agreement with the empirically observed tweet popularity. Studies and experiments also reveal that follower count, retweet rate and profile similarity are important factors for gaining tweet popularity, and help to better understand the impact of mention strategies on the retweet count. Interestingly, we experimentally identify a critical retweet rate regulating the role of mentions in tweet popularity. Finally, our data-driven simulations demonstrate that the proposed mention recommendation heuristic, Easy-Mention, outperforms the benchmark Whom-To-Mention algorithm.

Download

A Modular Overlapping Community Detection Algorithm: Investigating the « From Local to Global » Approach

Maximilien Danisch, Noé Gaumont, Jean‑Loup Guillaume

16th Cologne-Twente Workshop on Graphs and Combinatorial Optimization, 2018

We propose an overlapping community detection algorithm following a “from local to global” approach: our algorithm finds local communities one by one by repeatedly optimizing a quality function that measures the quality of a community. Then, as some extracted local communities can be very similar to each other, a cleaning procedure is applied to obtain the global overlapping community structure. Our algorithm depends on three modules: (i) a quality function, (ii) an optimization heuristic and (iii) a cleaning procedure. Various such modules can be independently plugged in. We show that, using default modules, our algorithm improves over a state-of-the-art method on some real-world graphs with ground-truth communities. In the future we would like to study which combination of modules performs best in practice and to make our code parallel.

Download

Pattern Matching in Link Streams: a Token-based Approach

Clément Bertrand, Hanna Klaudel, Matthieu Latapy and Frédéric Peschanski

Petri Nets, 2018

Link streams model the dynamics of interactions in complex distributed systems as sequences of links (interactions) occurring at a given time. Detecting patterns in such sequences is crucial for many applications but it raises several challenges. In particular, there is no generic approach for the specification and detection of link stream patterns in a way similar to regular expressions and automata for text patterns. To address this, we propose a novel automata framework integrating both timed constraints and finite memory together with a recognition algorithm. The algorithm uses structures similar to tokens in high-level Petri nets and includes non-determinism and concurrency. We illustrate the use of our framework in real-world cases and evaluate its practical performances.

Listing k-cliques in Sparse Real-World Graphs

Maximilien Danisch, Oana Balalau and Mauro Sozio

WWW, 2018

Motivated by recent studies in the data mining community which require efficiently listing all k-cliques, we revisit the iconic algorithm of Chiba and Nishizeki and develop the most efficient parallel algorithm for such a problem. Our theoretical analysis provides the best asymptotic upper bound on the running time of our algorithm for the case when the input graph is sparse. Our experimental evaluation on large real-world graphs shows that our parallel algorithm is faster than state-of-the-art algorithms, while boasting an excellent degree of parallelism. In particular, we are able to list all k-cliques (for any k) in graphs containing up to tens of millions of edges as well as all 10-cliques in graphs containing billions of edges, within a few minutes and a few hours respectively. Finally, we show how our algorithm can be employed as an effective subroutine for finding the k-clique core decomposition and an approximate k-clique densest subgraph in very large real-world graphs.
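As an illustration of what listing k-cliques involves, here is a minimal sequential sketch. It is not the parallel algorithm of the paper: it only shows one common ingredient of clique-listing approaches, namely orienting each edge according to a total order on the vertices so that every clique is found exactly once. The degree-based order and all names are assumptions of this sketch, and vertices are assumed to be mutually comparable (e.g. integers).

```python
def list_k_cliques(edges, k):
    """Return all k-cliques of an undirected graph, each listed once."""
    # Undirected adjacency sets, ignoring self-loops.
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)

    # Total order on vertices: by degree, ties broken by vertex label.
    order = sorted(adj, key=lambda x: (len(adj[x]), x))
    rank = {v: i for i, v in enumerate(order)}

    # Keep only edges pointing from lower rank to higher rank.
    out = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}

    cliques = []

    def extend(clique, candidates):
        if len(clique) == k:
            cliques.append(tuple(clique))
            return
        for v in sorted(candidates, key=rank.get):
            # Remaining candidates must be adjacent to every chosen vertex
            # and have a higher rank than v.
            extend(clique + [v], candidates & out[v])

    extend([], set(adj))
    return cliques


# Complete graph on vertices 0..3, plus a pendant edge (3, 4).
k4_plus_tail = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
triangles = list_k_cliques(k4_plus_tail, 3)
four_cliques = list_k_cliques(k4_plus_tail, 4)
```

Because candidates shrink to common out-neighbours at each step, sparse graphs with small out-neighbourhoods keep the recursion shallow and cheap.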

La sainte famille des Cahiers du cinéma

Olivier Alexandre

Vrin, Philosophie et cinéma, 2018

The most famous film magazine in the world, the « Cahiers » occupy a singular place in the field of criticism. From crisis to rebirth, they continue to embody a past elevated to the status of myth. Their ability to reconcile opposites, between glory and marginality, a keen sense of history and missed opportunities, reveals the tragic side of the critic: a worker without a trade, an author without a profession, neither filmmaker nor teacher, not quite a journalist nor entirely a writer. Based on a survey of contributors who have worked at the Cahiers du cinéma over the past 50 years, this book offers an answer to a question left open since the magazine’s founding: what is a critic?