Pattern Matching in Link Streams: Timed-Automata with Finite Memory

Clément Bertrand, Frédéric Peschanski, Hanna Klaudel and Matthieu Latapy

Scientific Annals of Computer Science, 2018

Link streams model the dynamics of interactions in complex distributed systems as sequences of links (interactions) occurring at a given time. Detecting patterns in such sequences is crucial for many applications but it raises several challenges. In particular, there is no generic approach for the specification and detection of link stream patterns in a way similar to regular expressions and automata for text patterns. To address this, we propose a novel automata framework integrating both timed constraints and finite memory together with a recognition algorithm. The algorithm uses structures similar to tokens in high-level Petri nets and includes non-determinism and concurrency. We illustrate the use of our framework in real-world cases and evaluate its practical performances.

Download

Degree-based Outliers Detection within IP Traffic Modelled as a Link Stream

Audrey Wilmet, Tiphaine Viard, Matthieu Latapy and Robin Lamarche-Perrin

TMA Conference 2018, Vienna

Precise detection and identification of anomalous events in IP traffic are crucial in many applications. This paper intends to address this task by adopting the link stream formalism which properly captures temporal and structural features of the data. Within this framework we focus on finding anomalous behaviours with the degree of IP addresses over time. Due to diversity in IP profiles, this feature is typically distributed heterogeneously, preventing us to find anomalies. To deal with this challenge, we design a method to detect outliers as well as precisely identify their cause in a sequence of similar heterogeneous distributions. We apply it to a MAWI capture of IP traffic and we show that it succeeds at detecting relevant patterns in terms of anomalous network activity.Degree-based Outliers Detection within IP Traffic Modelled as a Link Stream

Download

Examining Supreme Court Networks to Understand its Operation

Patrick Doreian

Friday, June 22th 2018, 14h, salle 26-00/101

The nature of the Supreme Court, its decisions, the principle of Stare Decisis are described. This is followed by a listing of the Supreme Court networks that are considered. This includes: the citation network of decisions citing earlier decisions (1789-2001); coherent clusters of decisions obtained by examining the extent to which they are co-cited by later decisions; signed two-mode networks featuring Justices and their votes for annual terms of the Court; one-mode signed networks of justices with the extent to which they vote with or against each other; and signed networks of Courts overturning decisions of prior Courts or their own decisions. Strategies for analyzing these networks are presented along with the subsequent results. Throughout, the network results are linked to substantive topics and constitutional principles. A long-term research agenda is presented.

Hierarchical Clustering Beyond the Worst-Case

Vincent Cohen-Addad

Wednesday, May 16th, 2018, 11h, salle 26-00/101, Campus Jussieu

Slides

Hiererachical clustering, that is computing a recursive partitioning of a dataset to obtain clusters at increasingly finer granularity is a fundamental problem in data analysis. Although hierarchical clustering has mostly been studied through procedures such as linkage algorithms, or top-down heuristics, rather than as optimization problems, Dasgupta recently proposed an objective function for hierarchical clustering and initiated a line of work developing algorithms that explicitly optimize an objective. In this paper, we consider a fairly general random graph model for hierarchical clustering, called the hierarchical stochastic block model (HSBM), and show that in certain regimes the SVD approach of McSherry combined with specific linkage methods results in a clustering that give an O(1) approximation to Dasguptas cost function. Finally, we report empirical evaluation on synthetic and real-world data showing that our proposed SVD-based method does indeed achieve a better cost than other widely-used heurstics and also results in a better classification accuracy when the underlying problem was that of multi-class classification.

Applications of Graph Sampling

Shweta Jain

Thursday, May 3th, 2018, 11h, salle 26-00/101, Campus Jussieu

Slides

Large graphs have become commonplace making many of the traditional graph-theoretic algorithms infeasible. Moreover, sometimes we don’t have access to the whole graph. This has led us to revisit graph algorithms and necessitated graph sampling. In this talk, I will explore 2 applications of graph sampling. In the first application, called TurnShadow, we propose a method for efficient counting of k-cliques. It uses the classic result called Turn’s theorem to give provable error bounds. We also do extensive evaluation of the method on real-world graph instances and demonstrate that it is fast and extremely accurate. In the second application, we propose a method called SADDLES to estimate the degree distribution when we are given only limited access to the graph, and accessing the graph is costly. We assume we have access to uniform at random (u.a.r ) vertex, u.a.r neighbor of a vertex and the degree of the vertex. We compare SADDLES with many other state-of-the-art methods and demonstrate that SADDLES requires far fewer samples to achieve the same degree of accuracy.    

Story Forest: Organizing Massive News Documents via AI and Natural Language Processing

Di Niu

Friday, May 4th, 2018, 11h, salle 26-00/428, Campus Jussieu

Slides

I will describe our recent experience of implementing a news content organization system in collaboration with Tencent that can discover hot events from vast streams of breaking news and connect events into stories for easy viewing. Our real-world system has distinct requirements in contrast to previous studies on document topic modeling and detection, in that 1) an event does not only contain articles of a similar topic, but is a cluster of documents that report exactly the same physical incidence; 2) we must evolve news stories in a logical and online manner. In solving these challenges, we propose Story Forest, a state-of-the-art news content organization system based on artificial intelligence and natural language processing. I will briefly describe the key enabling technologies in Story Forest, including identifying the relationship between text objects, e.g., whether they talk about the same event or whether one article is a follow-up of another, based on deep learning. Our system has been deployed in Tencent QQ Browser mobile app.

OLCPM: An Online Framework for Detecting Overlapping Communities in Dynamic Social Networks

Souâad Boudebza, Rémy Cazabet, Faiçal Azouaou and Omar Noual

Computer Communications, Elsevier, In press, 2018.

Community structure is one of the most prominent features of complex networks. Community structure detection is of great importance to provide insights into the network structure and functionalities. Most proposals focus on static networks. However, finding communities in a dynamic network is even more challenging, especially when communities overlap with each other. In this article , we present an online algorithm, called OLCPM, based on clique percolation and label propagation methods. OLCPM can detect overlapping communities and works on temporal networks with a fine granularity. By locally updating the community structure, OLCPM delivers significant improvement in running time compared with previous clique percolation techniques. The experimental results on both synthetic and real-world networks illustrate the effectiveness of the method.

Download

When Diversity Is Needed… But Not Expected!

Sylvain Castagnos

Thursday, May 17th, 2018, 11h, salle 26-00/101, Campus Jussieu

Slides

Recent studies have highlighted the correlation between users’ satisfaction and diversity within recommenders, especially the fact that diversity increases users’ confidence when choosing an item. Understanding the reasons of this positive impact on recommenders is now becoming crucial. Based on this assumption, we designed a user study that focuses on the utility of this new dimension, as well as its perceived qualities. This study has been conducted on 250 users and it compared 5 recommendation approaches, based on collaborative filtering, content-based filtering and popularity, along with various degrees of diversity. Results show that, when recommendations are made explicit, diversity may reduce users’ acceptance rate. However, it helps increasing users’ satisfaction. Moreover, this study highlights the need to build users’ preference models that are diverse enough, so as to generate good recommendations.

Enumerating maximal cliques in link streams with durations

Tiphaine Viard, Clémence Magnien, and Matthieu Latapy

Information Processing Letters 133 (2018), p. 44-48

Link streams model interactions over time, and a clique in a link stream is defined as a set of nodes and a time interval such that all pairs of nodes in this set interact permanently during this time interval. This notion was introduced recently in the case where interactions are instantaneous. We generalize it to the case of interactions with durations and show that the instantaneous case actually is a particular case of the case with durations. We propose an algorithm to detect maximal cliques that improves our previous one for instantaneous link streams, and performs better than the state of the art algorithms in several cases of interest.

Download

Discovering Patterns of Interest in IP Traffic Using Cliques in Bipartite Link Streams

Tiphaine Viard, Raphaël Fournier-S’niehotta, Clémence Magnien and Matthieu Latapy

International Conference on Complex Networks (CompleNet), 2018.

Studying IP traffic is crucial for many applications. We focus here on the detection of (structurally and temporally) dense sequences of interactions, that may indicate botnets or coordinated network scans. More precisely, we model a MAWI capture of IP traffic as a link streams, i.e. a sequence of interactions (t1,t2,u,v) meaning that devices u and v exchanged packets from time t1 to time t2 . This traffic is captured on a single router and so has a bipartite structure: links occur only between nodes in two disjoint sets. We design a method for finding interesting bipartite cliques in such link streams, i.e. two sets of nodes and a time interval such that all nodes in the first set are linked to all nodes in the second set throughout the time interval. We then explore the bipartite cliques present in the considered trace. Comparison with the MAWILab classification of anomalous IP addresses shows that the found cliques succeed in detecting anomalous network activity.

Stream Graphs and Link Streams for the Modeling of Interactions over Time

Matthieu Latapy, Tiphaine Viard and Clémence Magnien

Social Networks Analysis and Mining, 8: 61, 2018

Graph theory provides a language for studying the structure of relations, and it is often used to study interactions over time too. However, it poorly captures the both temporal and structural nature of interactions, that calls for a dedicated formalism. In this paper, we generalize graph concepts in order to cope with both aspects in a consistent way. We start with elementary concepts like density, clusters, or paths, and derive from them more advanced concepts like cliques, degrees, clustering coefficients, or connected components. We obtain a language to directly deal with interactions over time, similar to the language provided by graphs to deal with relations. This formalism is self-consistent: usual relations between different concepts are preserved. It is also consistent with graph theory: graph concepts are special cases of the ones we introduce. This makes it easy to generalize higher-level objects such as quotient graphs, line graphs, k-cores, and centralities. This paper also considers discrete versus continuous time assumptions, instantaneous links, and extensions to more complex cases.

Comparaison de mesures de centralité basées sur les plus courts chemins dans les réseaux dynamiques

Marwan Ghanem , Clémence Magnien, Fabien Tarissan

Extraction et Gestion de Connaissances (EGC 2018)

Définir l’importance des noeuds dans les réseaux statiques est une question de recherche très étudiée depuis de nombreuses années. Dernièrement, des adaptations des métriques classiques ont été proposées pour les réseaux dynamiques. Ces méthodes reposent sur des approches très différentes dans leur façon d’évaluer l’importance des noeuds à un instant donné. Il est donc nécessaire de pouvoir les évaluer et les comparer. Dans cet article, nous comparons trois approches existes pour mieux comprendre ce qui les différencie. Nous montrons que la nature des jeux de données influe grandement sur le comportement des méthodes, et que pour certains d’entre eux, la notion d’importance n’est pas toujours  pertinente.

Culture et (in)dépendance – Les enjeux de lindépendance dans les industries culturelles

Olivier Alexandre, Sophie Noël and Aurélie Pinto

ICCA – Industries culturelles, création, numérique, 2017

« Indépendant », « alternatif », « indie », « underground », « avant-garde », « de création »… Depuis les années 1970, la revendication d’indépendance a pris une importance grandissante dans les univers de production culturelle. Qu’elle se rapporte à des contenus, des méthodes de travail ou des dispositifs de médiation, cette revendication propose une alternative à la domination des groupes et des productions mainstream. Son succès conduit cependant à s’interroger sur la cohérence même d’une notion progressivement transformée en label de qualité. À travers douze contributions traitant de l’édition, du cinéma, de la musique, des médias et de la vulgarisation scientifique, cet ouvrage montre en quoi l’indépendance relève d’une construction sociale tributaire de son environnement institutionnel et marchand. Des ondes aux écrans, de l’Europe aux États-Unis, des managers aux artistes, il met en évidence le balancement entre artisanat de création et recherche d’une structuration économique pérenne. En mettant à distance la dénonciation ritualisée de l’hégémonie des majors et autres « grands groupes » et en s’appuyant sur des terrains ancrés dans différents contextes nationaux, ce livre fait le pari d’une approche transversale pour mieux saisir la manière dont l’indépendance irrigue et structure des filières trop souvent envisagées de manière cloisonnée. Il éclaire ainsi une catégorie de référence des industries culturelles paradoxalement peu étudiée par les sciences sociales, et permet de saisir l’évolution des rapports de force dans des secteurs confrontés à une rationalisation économique et à des mutations technologiques de grande ampleur.

Using network analysis to identify generic legal trends in European Human Rights Law

Henrik Palmer Olsen

Jeudi 16 Novembre 2017, 14h, salle 26-00/332, Campus Jussieu

This presentation explores how the complete collected set of judgments from the European Court of Human Rights can be presented as a network of case to references, and how this network can be analysed via the use of already known network analysis approaches. The presentation will first show how network analysis can be used to calculate the main legal content of a case (i.e. which legal right the case is mostly concerned with and for which the case is therefore a precedent). It will then move on to showing how, Page-rank of a case in the total network of cases can be compared to the Page-rank of the same case in its local network and how this comparison can be used to further calculate generic centrality in the overall network. In using this novel approach, it is possible to identify cases that have strong precedent, not for a specific right, but for more generic legal content that the Court uses in dealing with applications from citizens.

Tracking bitcoin users activity using community detection on a network of weak signals.

Rémy Cazabet, Rym Baccour and Matthieu Latapy

The 6th International Conference on Complex Networks and Their Applications, Nov 2017, Lyon, France.

Abstract Bitcoin is a cryptocurrency attracting a lot of interest both from the gen- eral public and researchers. There is an ongoing debate on the question of users? anonymity: while the Bitcoin protocol has been designed to ensure that the activ- ity of individual users could not be tracked, some methods have been proposed to partially bypass this limitation. In this article, we show how the Bitcoin transac- tion network can be studied using complex networks analysis techniques, and in particular how community detection can be efficiently used to re-identify multiple addresses belonging to a same user.

Download

Dynamic Community Detection

Cazabet, Rémy, Giulio Rossetti, and Fredéric Amblard. 

Encyclopedia of Social Network Analysis and Mining, 2017

Impact of temporal features of cattle exchanges on the size and speed of epidemic outbreaks

Aurore Payen, Lionel Tabourier and Matthieu Latapy

ICCSA 2017, Part II, LNCS 10405 proceedings

Databases recording cattle exchanges offer unique opportunities for a better understanding and fighting of disease spreading. Most studies model contacts with (sequences of) networks, but this approach neglects important dynamical features of exchanges, that are known to play a key role in spreading. We use here a fully dynamic modeling of contacts and empirically compare the spreading outbreaks obtained with it to the ones obtained with network approaches. We show that neglecting time information leads to significant over-estimates of actual sizes of spreading cascades, and that these sizes are much more heterogeneous than generally assumed. Our approach also makes it possible to study the speed of spreading, and we show that the observed speeds vary greatly, even for a same cascade size.

Download

Combining structural and dynamic information to predict activity in link streams

Thibaud Arnoux, Lionel Tabourier and Matthieu Latapy

In proceedings of the International Symposium on Foundations and Applications of Big Data Analytics (FAB), in conjunction with ASONAM, 2017.

A link stream is a sequence of triplets (t, u, v) meaning that nodes u and v have interacted at time t. Capturing both the structural and temporal aspects of interactions is crucial for many real world datasets like contact between individuals. We tackle the issue of activity prediction in link streams, that is to say predicting the number of links occurring during a given period of time and we present a protocol that takes advantage of the temporal and structural information contained in the link stream. We introduce a way to represent the information captured using different features and combine them in a prediction function which is used to evaluate the future activity of links.

Download

Ego-betweenness centrality in link streams

Marwan Ghanem, Florent Coriat and Lionel Tabourier

6th Workshop on Social Network Analysis in Applications (SNAA 2017), in conjunction with ASONAM, 2017.

The ability of a node to relay information in a network is often measured using betweenness centrality. In order to take into account the fact that the role of the nodes vary through time, several adaptations of this concept have been proposed to time- evolving networks. However, these definitions are demanding in terms of computational cost, as they call for the computation of time-ordered paths. We propose a definition of centrality in link streams which is node-centric, in the sense that we only take into account the direct neighbors of a node to compute its centrality. This restriction allows to carry out the computation in a shorter time compared to a case where any couple of nodes in the network should be considered. Tests on empirical data show that this measure is relatively highly correlated to the number of times a node would relay information in a flooding process. We suggest that this is a good indication that this measurement can be of use in practical contexts where a node has a limited knowledge of its environment, such as routing protocols in delay tolerant networks.

Download