Tracking the evolution of temporal patterns of usage in bicycle-Sharing systems using nonnegative matrix factorization on multiple sliding windows

Remy Cazabet, Pablo Jensen, Pierre Borgnat

International Journal of Urban Sciences, 2017, p. 1-15

Bicycle-Sharing Systems (BSS) are growing quickly in popularity all over the world. In this article, we propose a method based on Nonnegative Matrix Factorization to study the typical temporal patterns of usage of the BSS of Lyon, France, by studying logs of rentals. First, we show how this approach allows us to understand the spatial and temporal usage of the system. Second, we show how we can track the evolution of these temporal patterns over several years, and how this information can be used to better understand the BSS, but also changes in the city itself, by considering the stations as social sensors.

Download

Gendarmes, Voleurs et Topologie algébrique

David Ellison

Jeudi 15 Juin 2017 à 11h, Salle 24-25/405, Campus Jussieu

Le jeu du gendarme et du voleur, introduit par Alain Quilliot dans sa thèse en 1978, est un jeu à deux joueurs sur un graphe. Le gendarme commence en choisissant son point de départ sur un sommet du graphe ; puis le voleur choisit le sien. Ensuite, ils se déplacent chacun leur tour le long des arêtes du graphe. La question est de savoir si le gendarme a une stratégie qui lui permet d’attraper le voleur. Dans le cas contraire, la question devient : combien faut-il de gendarmes pour attraper le voleur ? Quilliot a démontré dans sa thèse qu’un seul gendarme suffit à attraper le voleur si et seulement si le graphe est démontable, c’est-à-dire si et seulement si on peut le réduire à un seul sommet en retirant successivement des sommets où le voleur peut être coincé. Il s’ensuit que les graphes démontables correspondent à la classe d’homotopie du point, et que certains invariants homotopiques, comme les groupes d’homologie, permettent de découvrir des propriétés structurelles des graphes où le gendarme peut attraper le voleur.

Community detection in attributed graphs

Christine Largeron

Mardi 25 Avril 2017 à 11h, Salle 24-25/405, Campus Jussieu

A wide variety of applications require data modeled by a graph of interconnected nodes, known as social networks or information networks. When nodes are described by attributes, this is an attributed network.Examples of such networks are citation or collaboration networks or social media. One fundamental property of these networks is that they tend to naturally form communities and, many mining methods have been proposed to identify these communities but these methods are typically basedsolely on relationships between nodes in the graph. This is notably the case of Louvain, a greedy agglomerative clustering method which optimizes the modularity measure. Introduced by Newman, the modularity allows to estimate the quality of a partition butwithout taking into account the attributes associated with the nodes. For this reason, we have designed a complementary measure, based on inertia and specially conceived to evaluate the quality of a partition basedon real attributes describing the vertices. We have also proposed I-Louvain, a method for community detection in attributed graphs where real attributes are associated with the vertices. Our experiments showedthat combining the relational information with the attributes allows to detect the communities more efficiently with poor data.

Structure and dynamics of multiplex networks

Federico Battiston

11 Avril 2017, 11h, Salle 24-25/405

Many real-world complex systems consist of a set of elementary units connected by relationships of different kinds. All such systems are better described in terms of multiplex networks, where the links at each layer represent a different type of interaction between the same set of nodes, rather than in terms of (single-layer) networks. In this talk I will introduce multiplex networks and several metrics to measure multiplexity at different scales. Measures are validated and applied to real-world systems with examples from collaboration networks, terrorist networks and the brain. I will also show how multiplexity can produce the emergence of qualitatively novel dynamical behavior, focusing on the case of social dynamics. Resume I am currently a PostDoc in the Aramis Lab at the Brain & Spine Institute working with Mario Chavez and Fabrizio De Vico Fallani. I am also the Chair of the Young Researchers Network on Complex Systems, the younger branch of the Complex Systems Society. I earned my PhD in applied mathematics from Queen Mary University of London working in the complex systems and networks group of Vito Latora and as a part of the European Project LASAGNE on multilayer networks. Before that, I got an undergraduate and a master degree in theoretical physics from Sapienza University of Rome. You can find more information about me at: http://www.maths.qmul.ac.uk/~battiston/

Contribution à la qualité des informations dans les réseaux sociaux: Identifier et analyser les motifs récurrents pour détecter les phénomènes sociaux

Mezghani Manel

16 Mars 2017 – Salle 24-25/405

Lanalyse des données issues des réseaux sociaux pose des questions au niveau de la qualité des informations (cest à dire des informations identifiées, fiables et exactes) car on y trouve par exemple du spam social et des rumeurs, ce qui accroît la difficulté dobtenir des informations précises et pertinentes. Mon projet de recherche est axé sur la qualité des données dans les réseaux sociaux. Cette direction de recherche est très intéressante car elle touche différents domaines et peut être utile dans beaucoup dapplications. En effet, pouvoir évaluer la qualité de linformation peut être utilisé par exemple pour éliminer des spams/spammeurs, pour éviter danalyser des rumeurs, etc. Ceci permet par exemple davoir un meilleur résultat de recherche, déviter de mauvaises recommandations de produits ou de diffuser une bonne information aux utilisateurs. Une voie pertinente pour contribuer à lamélioration de la qualité de linformation et que je compte explorer est la détection et lanalyse de motifs récurrents des phénomènes sociaux (par exemple: spam, rumeur) au cours du temps dans les graphes de données issus des réseaux sociaux. Ceci permet de filtrer des données erronées et ainsi elles contribuent à fournir des données nettoyées qui peuvent être utilisées dans dautres applications. Ce séminaire sera loccasion de présenter mes travaux de recherche associé à lanalyse des réseaux sociaux; et de discuter de mon projet de recherche.

Time Weight Content-Based Extensions of Temporal Graphs for Personalized Recommendation

Armel Jacques Nzekon Nzeko’o, Maurice Tchuente, Matthieu Latapy

In WEBIST 2017

Recommender systems are an answer to information overload on the web. They filter and present to customer, a small subset of items that he is most likely to be interested in. Since user’s interests may change over time, accurately capturing these dynamics is important, though challenging. The Session-based Temporal Graph (STG) has been proposed by Xiang et al. to provide temporal recommendations by combining long- and short-term preferences. Later, Yu et al. have introduced an extension called Topic-STG, which takes into account topics extracted from tweets’ textual information. Recently, we pushed the idea further and proposed Content-based STG. However, in all these frameworks, the importance of links does not depend on their arrival time, which is a strong limitation: at any given time, purchases made last week should have a greater influence than purchases made a year ago. In this paper, we address this problem by proposing Time Weight Content-based STG, in which we assign a time-decreasing weight to edges. Using Time-Averaged Hit Ratio, we show that this approach outperforms all previous ones in real-world situations.

Download

Using Degree Constrained Gravity Null-Models to understand the structure of journeys networks in Bicycle Sharing Systems

Remy Cazabet, Pierre Borgnat, Pablo jensen

In ESANN 2017

Bicycle Sharing Systems are now ubiquitous in large cities around the world. In most of these systems, journeys’ data can be ex- tracted, providing rich information to better understand it. Recent works have used network based-machine learning, and in particular space-corrected node clustering, to analyse such datasets. In this paper, we show that spatial-null models used in previous methods have a systematic bias, and we propose a degree-contrained null-model to improve the results. We finally apply the proposed method on the BSS of a city.

Download

Enhancing Space-Aware Community Detection Using Degree Constrained Spatial Null Model

Remy Cazabet, Pierre Borgnat, Pablo Jensen

In Complenet 2017

Null models have many applications on networks, from testing the significance of observations to the conception of algorithms such as community detection. They usually preserve some network properties , such as degree distribution. Recently, some null-models have been proposed for spatial networks, and applied to the community detection problem. In this article, we propose a new null-model adapted to spatial networks, that, unlike previous ones, preserves both the spatial structure and the degrees of nodes. We show the efficacy of this null-model in the community detection case both on synthetic and collected networks.

Download

Characterising inter and intra-community interactions in link streams using temporal motifs

Jean Creusefond, Remy Cazabet

In Complenet 2017

The analysis of dynamic networks has received a lot of attention in recent years, thanks to the greater availability of suitable datasets. One way to analyze such dataset is to study temporal motifs in link streams , i.e. sequences of links for which we can assume causality. In this article, we study the relationship between temporal motifs and communities , another important topic of complex networks. Through experiments on several real-world networks, with synthetic and ground truth community partitions, we identify motifs that are overrepresented at the frontier –or inside of– communities.

Download

Rigorous Measurement of the Internet Degree Distribution

Matthieu Latapy, Elie Rotenberg, Christophe Crespelle, Fabien Tarissan

In Complex Systems, volume 26, number 1 (2017)

The degree distribution of the internet, i.e. the fraction of routers with k links for any k, is its most studied property. It has a crucial influence on network robustness, spreading phenomena, and protocol design. In practice, however, this distribution is observed on partial, biased and erroneous maps. This raises serious concerns about the true knowledge we actually have of this key property. Here, we design and run a drastically new measurement approach for the reliable estimation of the degree distribution of the internet, without resorting to any map. It consists in sampling random core routers and precisely estimating their degree with probes sent from many monitors scattered over the internet. Our measurement shows that the true degree distribution significantly differs from classical assumptions: it is heterogeneous but it decreases sharply, in a way incompatible with a heavy-tailed power law.

Download

Un nouvel algorithme de clustering pour la détection danomalies

Damien Nogues

salle 24-25/405 à 11h00

La détection dintrusion est une problématique importante en cyber-sécurité. Cependant, de nombreuses approches classiques utilisent une approche à base de règles. Pour chaque attaque, unesignature de détection est créée. Ces approches ne permettent donc pas de détecter de nouvelles attaques, ou des attaques connues mais modifiées. Pour cette raison, nous nous sommesintéressés aux approches de détection danomalies. Ces méthodes utilisent des techniques dapprentissage non supervisé pour détecter des déviations à la norme. Ainsi, les techniques declustering sont fréquemment utilisées. Mais les algorithmes classiques de clustering, tel que k-means, ne sont pas très adaptés à ce problème. En effet, ces derniers nécessitent souvent de fixera priori le nombre de classes désiré. De plus, ces algorithmes sont en général conçus pour traiter des données quantitatives, ou parfois qualitatives. En détection dintrusions, les données sontsouvent hétérogènes. Nous détaillerons donc un nouveau critère de clustering adapté aux données hétérogènes, ainsi quun algorithme doptimisation de ce critère. Enfin, nous verronscomment il est possible dutiliser la partition obtenue pour effectuer une détection danomalies efficace sur des flux réseaux.

Towards understanding and leveraging the structure of real-world graphs

Maximilien Danisch

Campus Jussieu, Salle 26-00/124 à 11h00

Real-world graphs (a.k.a. complex networks) are ubiquitous: the web, Facebook, Twitter, brain networks, protein interaction networks, etc. Studying these graphs and solving computational problems on them (say maximum clique, partitioning or dense subgraph) has applications in many fields.I will first show that the structure of some real-world graphs is special. In particular, they are not adversarial and some difficult problems (say NP-complete or NP-hard problems) can be solved on some huge real-world graphs (say 2G edges or more).I will then present two works along the lines of « understanding and leveraging the structure of real-world graphs in order to build better algorithms »: (i) Enumerating all k-cliques and (ii) Computing the density-friendly graph decomposition.

Coalition games on interaction graphs

Nicolas Bousquet

15 Novembre 2016, salle 26-00/124

We consider cooperative games where the viability of a coalition is determined by whether or not its members have the ability to communicate amongst themselves. This necessary condition for viability was proposed by Myerson and is modeled via an interaction graph; a coalition S of vertices is then viable if and only if the graph induced graph S is connected. The non-emptiness of the core of a coalition game can be tested by a well-known covering LP. Moreover, the integrality gap of its dual packing LP defines exactly the multiplicative least-core and the relative cost of stability of the coalition game. This gap is upper bounded by the packing-covering ratio which is known to be at most the treewidth of the interaction graph plus one. We examine the packing-covering ratio and integrality gaps of graphical coalition games in more detail. First we introduce a new graph parameter, called the vinewidth (a parameter derived from the treewidth), which characterizes the worst packing-covering ratio. Then we will show that this new parameter correctly evaluates both primal and dual integrality gaps. Joint work with Zhentao Li and Adrian Vetta.

Finding remarkably dense sequences of contacts in link streams

Noé Gaumont, Clémence Magnien and Matthieu Latapy

In Social Network Analysis and Mining (2016) 6: 87. doi:10.1007/s13278-016-0396-z

A link stream is a set of quadruplets (b, e, u, v) meaning that a link exists between u and v from time b to time e. Link streams model many real-world situations like contacts between individuals, connections between devices, and others. Much work is currently devoted to the generalization of classical graph and network concepts to link streams. We argue that the density is a valuable notion for understanding and characterizing links streams. We propose a method to capture specific groups of links that are structurally and temporally densely connected and show that they are meaningful for the description of link streams. To find such groups, we use classical graph community detection algorithms, and we assess obtained groups. We apply our method to several real-world contact traces (captured by sensors) and demonstrate the relevance of the obtained structures.

Download

Mining Time-stamped Network Data Spectral Methods and Centrality Measures

Ingo Scholtes

Mercredi 19 Octobre 2016à 11h00 salle 24-25/405

Recent research has highlighted limitations of studying complex systems with time-varying topologies from the perspective of static, time-aggregated networks. Non-Markovian characteristics resulting from the specific ordering of interactions in temporal networks were identified as one important mechanism that alters causality and affects dynamical processes. So far, an analytical explanation for this phenomenon and for the significant variations observed across different systems is missing. Summarizing our recent research in this area, in this talk I will introduce a framework that allows to analyze temporal networks with non-Markovian characteristics. The framework is based on higher-order aggregate networks, a simple generalization of the commonly used static representation of temporal network data. I will show that spectral properties of such higher-order aggregate networks can explain the slow-down of diffusion processes compared to aggregate networks, which has been observed in a number of empirical data sets. I further show that we can derive an exact analytical prediction for the magnitude of this change compared to the weighted, time-aggregate network. I finally present recent results on the analysis of node centralities in non-Markovian temporal networks, concluding that this approach provides interesting perspectives for (i) temporal community detection by spectral clustering, (ii) refined measures of centrality for time-evolving networks, and (iii) analytical studies of dynamical processes in complex systems with time-evolving interaction topologies.

On the notion of hyperbolicity in graphs

David Coudert

Salle 24-25/405

The Gromov hyperbolicity is an important parameter for analyzing complex networks since it is a measure of the tree-likeness of a graph from a metric perspective. This notion has been recently applied in different contexts, such as the design of routing schemes, network security, computational biology, the analysis of graph algorithms, and the classification of complex networks. For instance, it gives bounds on the best possible stretch of some greedy-routing schemes in Internet-like graphs. The best known algorithm for computing hyperbolicity has running-time O(n^{3.69}), which is clearly prohibitive for big graphs. In this talk, we will present recent advances on the computation of this parameter. We will present an algorithm that performs well in practice. Although its time complexity is in O(n^4), it can compute the hyperbolicity of graphs with 100.000 nodes in short time. We will discuss the limitations of this algorithm and related open problems. We will also provide some results on the time complexity lower bound. In addition, we will show how to use some simple properties of the graph to get tight bounds on its hyperbolicity, e.g., being bipartite, a line graph, vertex transitive, etc. These properties are used for establishing bounds on the hyperbolicity of various interconnection network topologies proposed for large data centers.

Classification of online discussions: is tree structure sufficient enough?

Mattias Mano

Vendredi 30 Septembre à 10h30, Salle 26-00/332

Slides

Over the past years, online communities have considerably grown. Especially, a new kind of forums emerged: the « Questions and Answers » (Q&A) forums. They cover lots of different fields, from opinion questions (such as Yahoo! Answers, Reddit, Quora, …) to really technical questions in Computer Science (Stack Overflow, Bugzilla, …) or in Mathematics (Math Overflow). We study the opinion forum « Reddit – Change My View ». Participants debate on society subject. Using graph theory modeling, I wonder if structural informations of the discussion (graph) are sufficient enough to be characteristic of what happen in the discussion.

P2PTV Multi-channel Peers Analysis

Marwan Ghanem, Olivier Fourmaux, Fabien Tarissan and Takumi Miyoshi

In The 18th Asia-Pacific Network Operations and Management Symposium (APNOMS’16), Kanazawa, Japan, 2016.

After being the support of the data and voice convergence, the Internet has become one of the main video providers such as TV-stream. As an  alternative to limited or expensive technologies, P2PTV has turned out to be a promising support for such applications. This infrastructure strongly relies on the overlay composed by the peers that consume and diffuse video contents at the same time. Understanding the dynamical properties of this overlay, and in particular how the users switch from one overlay to another, appears to be a key aspect if one wants to improve the quality of P2PTV. In this paper, we investigate the question of relying on non-invasive measurement techniques to track the presence of users on several channels of P2PTV. Using two datasets obtained by using network measurement on P2PTV infrastructure, we show that such an approach contains sufficient information to track the presence of users on several channels. Besides, exploiting the view provided by sliding time windows, we are able to refine the analysis and track users that switch from one channel to another, leading to the detection of super-peers and providing explanations of the different roles they can play in the infrastructure. In addition, by comparing the results obtained on the two datasets, we show how such analyses can shed some light on the evolution of the infrastructure policy.

Download

Predicting links in ego-networks using temporal information

Lionel Tabourier, Anne-Sophie Libert and Renaud Lambiotte

In EPJ Data Science (2016) 5: 1

Link prediction appears as a central problem of network science, as it calls for unfolding the mechanisms that govern the micro-dynamics of the network. In this work, we are interested in ego-networks, that is the mere information of interactions of a node to its neighbors, in the context of social relationships. As the structural information is very poor, we rely on another source of information to predict links among egos’ neighbors: the timing of interactions. We define several features to capture different kinds of temporal information and apply machine learning methods to combine these various features and improve the quality of the prediction. We demonstrate the efficiency of this temporal approach on a cellphone interaction dataset, pointing out features which prove themselves to perform well in this context, in particular the temporal profile of interactions and elapsed time between contacts.

Download

Characterizing and predicting mobile application usage

Keun-Woo Lim, Stefano Secci, Lionel Tabourier and Badis Tebbani

In Computer Communications, 2016, vol. 95, p. 82-94

In this paper, we propose data clustering techniques to predict temporal characteristics of data consumption behavior of different mobile applications via wireless communications. While most of the research on mobile data analytics focuses on the analysis of call data records and mobility traces, our analysis concentrates on mobile application usages, to characterize them and predict their behavior. We exploit mobile application usage logs provided by a Wi-Fi local area network service provider to characterize temporal behavior of mobile applications. More specifically, we generate daily profiles of “what” types of mobile applications users access and “when” users access them. From these profiles, we create usage classes of mobile applications via aggregation of similar profiles depending on data consumption rate, using three clustering techniques that we compare. Furthermore, we show that we can utilize these classes to analyze and predict future usages of each mobile application through progressive comparison using distance and similarity comparison techniques. Finally, we also detect and exploit outlying behavior in application usage profiles and discuss methods to efficiently predict them.

Download