Temporal Patterns of Pedophile Activity in a P2P Network: First Insights about User Profiles from Big Data

Raphaël Fournier and Matthieu Latapy

International Journal of Internet Science ARTICLE IN PRES S 2015, 10 (1), ISSN 1662-5544

Recent studies have shown that child abuse material is shared through peer-to-peer (P2P) networks, which allow users to exchange files without a central server. Obtaining knowledge on the extent of this activity has major consequences for child protection, policy making and Internet regulation. Previous works have developed tools and analyses to provide overall figures in temporally-limited measurements. Offenders’ behavior is mostly studied through small-scale interviews and there is few information on the times at which they engage in such activity. Here we show that the proportion of search-engine queries for pedophile content gradually has grown by a factor of almost 3 in three years. We also find that during the day, certain hours are, on average, privileged by seekers. Our results demonstrate that P2P networks are actively used to search for pedophile content and we find new and large-scale results on pedophile offenders’ profile, indicating that a substantial proportion is well-integrated into family life and professional work activities.

Download

Densest subgraph computation and applications in finding events on social media

Oana Bllu

Vendredi 27 novembre 2015 à 11h, salle 24-25/405

Slides

Finding dense subgraphs in large graphs is a key primitive in a variety of real-world application domains, encompassing social network analytics, event detection, biology, and finance. In most such applications, one typically aims at finding several (possibly overlapping) dense subgraphs which might correspond to communities in social networks or interesting events. In this talk we present a natural generalization of the densest subgraph problem, where the main goal is to find at most k subgraphs with maximum total aggregate density, while satisfying an upper bound on the pairwise Jaccard coefficient between the sets of nodes of the subgraphs. We will also illustrate how finding dense subgraphs can be an important subroutine for event detection in social media. Social media has great potential, as apart from the traditional media sources, many users post updates on different events. The highly dynamic nature of social networks gives the benefit of timely updates and the huge amount of content the benefit of diversity and large coverage. However, finding events presents also non-trivial challenges given the large amount of noisy and irrelevant data present in social media.

Graph analysis of functional brain networks: theory, applications and issues

Fabrizio De Vico Fallani

Vendredi 4 décembre 2015 à 11h, salle 26-00/332

Slides

We have known for at least 100 years that the brain is organized as a network of connections between neuronal ensembles. However, only in the last 10 years there has been a rapid growth in our ability to quantify the complex topology of brain networks, using mathematical tools derived from graph theory. In this talk, I will present recent development of graph theoretical approaches to analyze brain networks and model (re)organizational principles of the neural function underlying behavior and outcome, in healthy and diseased conditions. The final part will be devoted to highlight some of the current issues related to complex brain network analysis.

Inferring synaptic connections from spike data of multiple neurons

Ryota Kobayashi

Vendredi 6 novembre 2015 à 11h, salle 26-00/332

Slides

Correlations in neuronal activity are defined as « functional connections » between pairs of neurons. It is still unclear how the functional connectivity is related to underlying (physiological) synaptic connectivity. Here, we develop a coupled escape rate model (CERM) to infer synaptic connections from spike data of multiple neurons. Estimation performance of the proposed method was compared to that of the previous methods by using a simulated data generated by a realistic cortical network model, which consists of thousands of detailed model neurons (Kitano & Fukai, 2007). We conclude that CERM method is a promising method to infer synaptic connections from multiple neural spike data. This is joint work with Katsunori Kitano.

Dependable Issues Resolved with Distributed Streams

Yann Busnel

Mardi 6 octobre 2015 à 11h, salle 24-25/405

The analysis of massive data streams is fundamental in many monitoring applications (e.g, Internet routers). For networks operators, it is a recurrent and crucial issue to determine whether huge data streams, received at their monitored devices, are correlated or not as it may reveal the presence of attacks. First, we propose a metric, called codeviation, that allows to evaluate the correlation between distributed streams. This metric is inspired from classical metric in statistics and probability theory, and as such enables to understand how observed quantities change together, and in which proportion. We then propose to estimate the codeviation in the data stream model. In this model, functions are estimated on a huge sequence of data items, in an online fashion, and with a very small amount of memory with respect to both the size of the input stream and the values domain from which data items are drawn. We give upper and lower bounds on the quality of the codeviation, and provide both local and distributed algorithms that additively approximates the codeviation among data streams using sub-linear space. On the other hand, we consider the problem of identifying global iceberg attacks in massive and physically distributed streams. A global iceberg is a distributed denial of service attack, where some elements globally recur many times across the distributed streams, but locally, they do not appear as a deny of service. A natural solution to defend against global iceberg attacks is to rely on multiple routers that locally scan their network traffic, and regularly provide monitoring information to a server in charge of collecting and aggregating all the monitored information. Any relevant solution to this problem must minimise the communication between the routers and the coordinator, and the space required by each node to analyse its stream. We propose a distributed algorithm that tracks global icebergs on the fly with guaranteed error bounds, limited memory and processing requirements. We present a thorough analysis of our algorithm performance. In particular we derive an optimal upper bound on the number of bits communicated between the multiple routers and the coordinator in presence of an oblivious adversary. Finally, we present the main results of the experiments we have run on a cluster of single-board computers. Those experiments confirm the efficiency and accuracy of our algorithm to track global icebergs hidden in very large input data streams exhibiting different shapes.

Modèles de génération des graphes de collaboration multi-niveaux

Ghislain Romaric Meleu

Vendredi 2 octobre 2015 à 11h, salle 24-25/405

Slides

Les entêtes des articles scientifiques permettent de construire trois réseaux de collaborations : les réseaux des auteurs, des laboratoires et des institutions. Les réseaux sont corrélés du fait des relations d’affiliations qui existent entre les acteurs des trois réseaux. Nous appelons réseaux hiérarchiques (ou multi-niveaux) de tels réseaux; les réseaux de niveaux supérieurs étant déduits du réseau de co-publication des auteurs. Nous étudions une généralisation de ces réseaux hiérarchiques où un acteur est affilié à une organisation qui peut elle aussi être affiliée à une organisation de niveau supérieur et ainsi de suite. Les relations entre entités à un même niveau sont déduites de celles existantes entre entités de niveau inférieur. tre capable de générer artificiellement de tels réseaux suppose que l’on comprenne à la fois comment les acteurs (au niveau le plus bas) interagissent, mais aussi la dynamique des affiliations aux organisations. Nous proposons une première analyse de tels réseaux. Nous commençons par construire un modèle de génération de graphes de collaboration (de niveau 0) basé sur l’arrivée de petites cliques. Nous observons expérimentalement que ces réseaux ainsi que les réseaux de niveaux supérieurs déduits à partir d’un modèle d’affiliation par attachement préférentiel sont des réseaux small-world et scale-free. Les démonstrations sont faites pour le niveau 0.

Compact routing, main results and techniques

Christian Glacet

Vendredi 25 septembre 2015 à 11h, salle 24-25/405

Slides

Message routing is a central activity in any interconnection network. Route efficiency and memory requirements are two major central parameters in the design of a routing scheme. Routing along short paths is clearly desirable, and the storage of the routing information at each node must also be limited to allow quick routing decision, fast update, and scalability. There is a trade-off between the route efficiency (measured in terms of stretch) and the memory requirements (measured by the size of the routing tables). For a n nodes network, the classical shortest path routing scheme achieve stretch 1 with n entries per node (router). Compact routing techniques offer different tradeoffs by allowing routing detours. It is known that in order to guaranty that every route has a stretch strictly lower than 2k+1 the routing tables must have size of order n^1/k at least. Is it actually possible to design routing schemes that attain these lower bounds? This is the question I will answer during my talk, explaining in the mean time the main algorithmic ideas used to achieve the currently best known trade-offs. Finally I’ll also introduce the more specific problem of compact routing in « internet-like » graphs.

Strategic Analysis and Design of Robust and Resilient Interdependent Power and Communication Networks with a New Model of Interdependency

Arunabha Sen

Vendredi 11 septembre 2015 à 11h, salle 26-00/332

Slides

The critical infrastructures of the nation such as the power grid and the communication network are highly interdependent. Recognizing the need for a deeper understanding of the interdependency in a multi-layered network, significant efforts have been made in the research community in the last few years to achieve this goal. Accordingly a number of models have been proposed and analyzed. Unfortunately, most of the models are over simplified and as such they fail to capture the complex interdependency that exists between entities of power grid and communication networks involving a combination of conjunctive and disjunctive relations. To overcome the limitations of existing models, we have recently proposed a new model that is able to capture such complex interdependency relations. Utilizing this model, we have studied a number of problems, including (i) identification the k most vulnerable nodes of an interdependent network, (ii) entity hardening problem, (iii) progressive recovery problem, (iv) targeted attack problem and several others. In this talk, we first present the new model and then discuss several problems that has been studied utilizing this model.

Time Evolution of the Importance of Nodes in dynamic Networks

Clémence Magnien and Fabien Tarissan.

In proceedings of the International Symposium on Foundations and Applications of Big Data Analytics (FAB), in conjunction with ASONAM, 2015.

For a long time now, researchers have worked on defining different metrics able to characterize the importance of nodes in networks. Among them, centrality measures have proved to be pertinent as they relate the position of a node in the structure to its ability to diffuse an information efficiently. The case of dynamic networks, in which nodes and links appear and disappear over time, led the community to propose extensions of those classical measures. Yet, they do not investigate the fact that the network structure evolves and that node importance may evolve accordingly. In the present paper, we propose temporal extensions of notions of centrality, which take into account the paths existing at any given time, in order to study the time evolution of nodes’ importance in dynamic networks. We apply this to two datasets and show that the importance of nodes does indeed vary greatly with time. We also show that in some cases it might be meaningless to try to identify nodes that are consistently important over time, thus strengthening the interest of temporal extensions of centrality measures.

Download

A reliable and evolutive web application to detect social capitalists

Nicolas Dugué, Anthony Perez, Maximilien Danisch, Florian Bridoux, Amélie Daviau, Tennessy Kolubako, Simon Munier and Hugo Durbano.

IEEE/ACM International Conference on Advances in Social Network Analysis and Mining (ASONAM), 2015, Paris. (Demo track paper)

On Twitter, social capitalists use dedicated hashtags and mutual subscriptions to each other in order to gain followers and to be retweeted. Their methods are successful enough to make them appear as influent users. Indeed, applications dedicated to the influence measurement such as Klout and Kred give high scores to most of these users. Meanwhile, their high number of retweets and followers are not due to the relevance of the content they tweet, but to their social capitalism techniques. In order to be able to detect these users, we train a classifier using a dataset of social capitalists and regular users. We then implement this classifier in a web application that we call DDP. DDP allows users to test whether a Twitter account is a social capitalist or not and to visualize the data we use to make the prediction. DDP allows administrator to crawl data from a lot of users automatically. Furthermore, administrators can manually label Twitter accounts as social capitalists or regular users to add them into the dataset. Finally, administrators can train new classifiers in order to take into account the new Twitter accounts added to the dataset, and thus making evolve the classifier with these new recently collected data. The web application is thus a way to collect data, make evolve the knowledge about social capitalists and to keep detecting them efficiently.

Download

Revealing contact patterns among high-school students using maximal cliques in link streams

Tiphaine Viard, Matthieu Latapy, Clémence Magnien

First International Workshop on Dynamics in Networks (DyNo), in conjunction with ASONAM, 2015.

Interaction traces between humans are usually rich in information concerning the patterns and habits of individuals. Such datasets have been recently made available, and more and more researchers address the new questions raised by this data. A link stream is a sequence of triplets (t, u, v) indicating that an interaction occurred between u and v at time t, and as such is a natural representation of these data. We generalize the classical notion of cliques in graphs to such link streams: for a given , a -clique is a set of nodes and a time interval such that all pairs of nodes in this set interact at least every during this time interval. We proceed to compute the maximal -cliques on a real-world dataset of contact among students, and show how it can bring new interpretation to patterns of contact.

Download

Classes of digraphs defined by forbidding induced subdigraphs and their chromatic-number

Pierre Aboulker

Jeudi 09 juillet 2015 à 11h, salle 26-00/332

Slides

A class of graphs is $chi$-bounded if there exists a function f such that for any graph G in the class, $chi(G)$ f ((G)). Gyrfas conjectured that for any tree T, the class of graphs that do not contain T as an induced subgraph is $chi$-bounded. We investigate the oriented analogue of this Conjecture. This a joint work with J. Bang-Jensen, N. Bousquet, P. Charbit, F. Havet, F. Maffray, S. Thomassé and J. Zamora

Inhomogeneous Hypergraphs

lie de Panafieu

Jeudi 02 juillet 2015 à 11h, salle 24-25/405

Slides

We introduce the inhomogeneous hypergraph model. Each edge can contain an arbitrary number of vertices, the vertices are colored, and each edge receives a weight which depends on the colors of the vertices it contains. This model provides a uniform setting to solve problems arising from various domains of computer science and mathematics. We will focus on applications to the enumeration of satisfied and satisfiable instances of Constraint Satisfaction Problems (CSP), and compute the limit probability for a random graph to be bipartit, the limit probability of satisfiability of systems of equations, the enumeration of properly k-colored graphs and investigate some graphs coming from social networks. We will present results on the asymptotics of inhomogeneous hypergraphs and their typical structure. Our main tool is analytic combinatorics.

Emergence of Superpeer Networks: A New Perspective

Bivas Mitra

Jeudi 11 juin 2015 à 11h, salle 24-25/405

Slides

In superpeer based p2p networks, the superpeer nodes are discovered through the process of bootstrapping, whereby resourceful peers get upgraded to superpeers. However, bootstrapping is influenced by several factors like limitation on the maximum number of connections a peer can have due to bandwidth constraints, limitation on the availability of information of existing peers due to cache size constraints and also by the attachment policy of the newly arriving peers to the resourceful peers. In this talk we propose an analytical framework which explains the emergence of superpeer networks on execution of the commercial peer-to-peer bootstrapping protocols by incoming nodes. Bootstrapping protocols exploit physical properties of the online peers like resource content, processing power, storage space, connectivity etc as well as take the finiteness of bandwidth of each online peer into consideration. With the help of rate equations, we show that execution of these protocols results in the emergence of superpeer nodes in the network – the exact degree distribution is evaluated. We also show that the cache parameters must also be suitably tuned so as to increase the fraction of superpeers in the network. We validate the developed framework through extensive simulation. The analysis of the results shows that the amount of superpeers produced in the network depends on the protocol as well as the properties of the joining nodes. As an application study, we show that our framework can explain the topological configuration of commercial Gnutella networks.

(Some) Social aspects of location privacy

Kévin Huguenin

Jeudi 18 juin 2015 à 11h, salle 24-25/405

Smartphones hold and collect ever-increasing amounts of personal and contextual data about their owners. This information is often shared on social networks or sent to on-line service providers, in order to benefit from social and context-aware features. This, however, raises serious privacy issues. My talk will revolve around the human and social aspect of privacy: I will discuss the privacy implications of the relationship between location, co-location (e.g., name tagging in posts and photos on mobile social networks) and social ties and the implications of the use of generalization-based privacy protection techniques on the utility of location-based services.

Suppression Distance Computation for Hierarchical Clusterings

François Queyroi, Sergey Kirgizov

Information Processing Letters, Volume 115, Issue 9, September 2015, Pages 689–693 http://www.sciencedirect.com/science/article/pii/S0020019015000678

We discuss  the computation  of a  distance between  two hierarchical clusterings of the  same set. It is defined as  the minimum number of  elements that  have to  be removed so  the remaining  clusterings are  equal. The problem of distance  computing was extensively studied for partitions. We prove it can be  solved in polynomial time in the case of hierarchies  as it gives  birth to a  class of perfect  graphs. We  also  propose an  algorithm  based on  recursively computing  maximum assignments.

Download

Analyse multi-échelles des systèmes complexes: théorie de l’information et optimisation combinatoire

Robin Lamarche-Perrin

Vendredi 15 mai 2015 à 11h, salle 24-25/405

Slides

Entre processus microscopiques et phénomènes émergents, l’analyse des systèmes complexes ne peut véritablement se passer d’une approche multi-échelles. Nous abordons en particulier deux problèmes : celui de la représentation multi-échelles de données structurées et celui de la prédiction multi-échelles des systèmes dynamiques. Premièrement, nous proposons d’exploiter des mesures classiques développées en théorie de l’information pour évaluer la pertinence des niveaux de représentation/prédiction en termes d’information et de complexité(gain d’entropie, divergence de Kullback-Leibler, Information Bottleneck). Le problème de la représentation/prédiction optimale est donc formalisé sous la forme d’un problème d’optimisation combinatoire à deux objectifs, visant à extraire et agréger l’information disponible au niveau microscopique pour apporter des éléments d’analyse macroscopique. Deuxièmement, afin d’engendrer des représentations exploitables en pratiques par les experts, nous garantissons leur cohérence avec les connaissances /a priori/ du système en contraignant l’espace de recherche du problème d’optimisation à partir des propriétés structurelles du système. Si le problème d’optimisation correspondant est NP-completen général, nous proposons néanmoins des algorithmes polynomiaux dans le cas de structures particulières (hiérarchies, ensembles d’intervalles, produit cartésien des deux structures). Cette approche est notamment appliquée à l’agrégation spatio-temporelle de données médiatiques pour l’analyse multi-échelles des relations internationales, et à la prédiction des dynamiques multi-échelles dans un modèle classique de diffusion d’opinion (Voter Model).

Calcul de cliques maximales dans les flots de liens

Tiphaine Viard, Matthieu Latapy et Clémence Magnien

ALGOTEL 2015 — 17èmes Rencontres Francophones sur les Aspects Algorithmiques des Télécommunications, juin 2015, Beaune, France

Un flot de liens est une séquence de triplets (t,u,v), signifiant que u et v ont interagi au temps t. Nous généralisons la notion de cliques à ces flots de liens : pour un delta donné, une delta-clique est un ensemble de nœuds et un intervalle de temps, tels que toutes les paires de nœuds dans cet ensemble interagissent au moins tous les delta sur cet intervalle. Nous proposons un premier algorithme permettant d’énumérer les delta-cliques dans un flot de liens.

Download

Graph Similarity Using Network Motif Profiles

Pdraig Cunningham

Jeudi 23 avril 2015 à 11h, salle 25-26/101

As we recognise the importance of information in graph or network format it is useful to be able to quantify how similar one graph is to another. This is particularly useful in the context of ego networks, the local networks around a nodes. This seminar focuses on network motif profiles (i.e. counts of specific network motifs) to characterise networks in much the same way that a bag-of-words strategy allows text documents to be compared in a vector space framework. This strategy has a long history in social science and bioinformatics and has considerable potential in social network analysis. The seminar will outline the general strategy and present case-studies from Wikipedia, YouTube spam campaigns and peer to peer lending in the Prosper marketplace.

Flash crashes, VPIN metric and market topology considerations

Alexandru Stan

Jeudi 12 mars 2015 à 11h, salle 24-25/405

This presentation aims to shed some light into the dynamics of market flash crashes by presenting several volume time models for the price evolution of financial assets. The theoretical foundation of our modeling relies upon the hypothesis that the flash crash represents the immediate repercussion of growing levels of order flow toxicity, as informed traders adversely select uninformed traders and, progressively, force them out of the market. We model the unfolding of this hostile behavior through a classic stochastic prey-predator model which segregates market transactions into toxic and harmless. Assuming that the price is following an Ito-process, we show how to predict the flash crash conditions during the flash crash unfolding. We also notice structural breaks in the dynamics of the market topology.