Detection and classification of network traffic anomalies

Johan Mazel

Mercredi 8 juin 2016 à 11h, salle 24-25/405

Internet plays a central role in our lives. However, it is an extremely diverse and complex system. Ranging from non-malicious unexpected events such as flash-crowds and failures, to network attacks such as denials-of-service and network scans, network traffic anomalies can have serious detrimental effects on the performance and integrity of the network. Anomaly detection is thus paramount in order to guarantee users’ access to Internet resources. In this talk, we will address recent advances in network traffic anomaly detection and classification, that leverage graph analysis techniques, machine learning techniques and big data.

Suppressing diffusion processes on arbitrary networks using treatment resources of limited efficiency

Argyris Kalogeratos

Lundi 13 juin 2016 à 11h, salle 26-00/332

Slides

In many real-life situations, it is critical to dynamically suppress or remove an undesired diffusion process (viruses, information, behaviors, etc.). The talk will present a framework for Dynamic Resource Allocation (DRA) assuming a continuous-time SIS epidemic model, and that a budget of treatment resources of limited efficiency are at the disposal of authorities. Special emphasis will be given on the macro- and microscopic (or local) properties of the network structure for the problem and two strategies will be presented that fall in this framework: a) a simple yet effective greedy approach, and b) a more sophisticated one that uses a precomputed priority plan of how the healing strategy should proceed on a specific network. Additionally, extensions in competitive scenarios will be discussed.

Kempe equivalence of colourings

Marthe Bonamy

Vendredi 3 juin 2016 à 11h, salle 24-25/405

Slides

Given a colouring of a graph, a Kempe change is the operation of picking a maximal bichromatic subgraph and switching the two colours in it. Two colourings are Kempe equivalent if they can be obtained from each other through a series of Kempe changes. Kempe changes were first introduced in a failed attempt to prove the Four Colour Theorem, but they proved to be a powerful tool for other colouring problems. They are also relevant for more applied questions, most notably in theoretical physics. Consider a graph with no vertex of degree more than some integer D. In 2007, Mohar conjectured that all its D-colourings are Kempe-equivalent. Feghali, Johnson and Paulusma proved in 2015 that this is true for D=3, with the exception of one single graph which disproves the conjecture in its generality. We settle the remaining cases by proving the conjecture holds for every integer D at least 4. This is a joint work with Nicolas Bousquet (LIRIS, Ecole Centrale Lyon, France), Carl Feghali (Durham University, UK) and Matthew Johnson (Durham University, UK).

Temporal density of complex networks and ego-community dynamics

Sergey Kirgizov

Lundi 04 julliet 2016 à 11h, salle 24-25/405

Slides

At first, we say that a ego-community structure is a probability measure defined on the set of network nodes. Any subset of nodes may engender its own ego-community structure around. Many community detection algorithms can be modified to yield a result of this type, for instance, the personalized pagerank. We also recall that community detection algorithms (including personalized pagerank) can be viewed from different perspectives: random walks, convergence of markov chain, spectral clustering, optimization, mincut(s), discrete cheeger inequality(ies), etc. Next, we present a continuous version of Viard-Latapy-Magnien link streams, that we call « temporal density ». Classical kernel density estimation is used to move from discrete link streams towards their continuous counterparts. Using matrix perturbation theory we can prove that ego-community structure changes smoothly when the network evolves smoothly. This is very important, for example, for visualization purposes. Combining the temporal density and personalized pagerank methods, we are able to visualize and study the evolution of the ego-community structures of complex networks with a large number of temporal links in order to extract interacting information. For example, we can detect events, trace the evolution of (ego-)community structure, etc. We illustrate and validate our approach using « Primary school temporal network data » provided by sociopatterns.org, and we show how the temporal density can be applied to the study of very large datasets, such as a collection of tweets written by European Parliament candidates during European Parliament election in 2014.

La structure communautaire : évaluation et analyse de motifs dans les flux de liens

Jean Creusefond

Vendredi 13 mai 2016 à 11h, salle 24-25/405

Slides

Cet exposé sera en deux parties relativement indépendantes : l’évaluation des structures communautaires par le biais des vérités de terrain et l’analyse des l’appartenance communautaire des motifs dans les flux de liens L’évaluation de structures communautaires de manière théorique est très délicate : de multiples propriétés structurelles sont considérées comme importantes, par conséquent considérer une structure comme meilleure qu’une autre implique des choix arbitraires sur ces préférences, matérialisé par le choix d’une fonction de qualité ou de benchmarks. Afin d’éviter ces problèmes, beaucoup de chercheurs évaluent maintenant leurs résultats par comparaison avec des structures communautaires extraites en même temps que des jeux de données, en argumentant que la proximité entre leurs résultats et la vérité de terrain est une preuve significative de pertinence. Dans cette partie, je vais discuter d’une méthodologie permettant de concilier les deux approches et d’identifier quelles vérités de terrain favorisent quelles fonctions de qualité. Je soulignerai notamment le choix de la fonction de comparaison de partitions, souvent considéré comme anodin, mais changeant en fait radicalement les résultats. Pour référence, le programme développé (incluant un grand nombre d’algorithmes de détection de communautés et de fonctions de qualité) est entièrement disponible à l’adresse suivante : https://codacom.greyc.fr/ En seconde partie, je discuterai de travaux en cours d’analyse de flots de liens : des graphe dont chaque arc est étiqueté par un temps et où les multiarcs sont possibles. Les flots de liens qui nous intéressent ici représentent des réseaux de communication, c’est-à-dire que chaque arc représente une interaction orientée entre deux utilisateurs. Fréquemment, les algorithmes de détection de communautés qui tentent de les analyser agglomèrent le réseau de communication sur des fenêtres temporelles, où des méthodes traditionnelles (ou adaptées) peuvent êtres appliquées. Dans ce cas, une information est perdue : la causalité entre les liens. Par exemple, si un ensemble de personnes ont systématiquement la même structure de communication (ex : quand « A » interagit avec « B », celui-ci intéragit ensuite systématiquement avec « C » et « D »), peut-on en déduire la structure communautaire associée? Afin d’évaluer l’impact de cette information, je me suis intéressé aux motifs : des chaînes de communication dont la causalité semble probable (la première interaction a probablement entraîné la suivante, etc.). Le lien entre ces motifs et la structure communautaire reste donc à analyser, et je présenterai les outils mis au point à ce dessein ainsi que quelques résultats préliminaires.

Degeneracy-based mining of social and information networks: dynamics and applications

Fragkiskos Malliaros

Vendredi 01 avril 2016 à 11h, salle 24-25/405

Slides

Networks have become ubiquitous as data from diverse disciplines can naturally be mapped to graph structures. The problem of extracting meaningful information from large scale graph data in an efficient and effective way has become crucial and challenging with several important applications and towards this end, graph mining and analysis methods constitute prominent tools. In this talk, I will present part of my work that builts upon computationally efficient graph mining methods in order to: (i) design models for analyzing the structure and dynamics of real-world networks towards unraveling properties that can further be used in practical applications; (ii) develop algorithmic tools for large-scale analytics on data with inherent (e.g., social networks) or without inherent (e.g., text) graph structure. Our approaches rely on the concepts of graph degeneracy and core decomposition in graphs. In particular, for the former point I will show how to model the engagement dynamics of large social networks and how to assess their vulnerability with respect to user departures from the network. In both cases, by unraveling the dynamics of real social networks, regularities and patterns about their structure and formation can be identified; such knowledge can further be used in various applications including churn prediction and anomaly detection. For the latter, I will present a core decomposition-based approach for locating influential nodes in complex networks, with direct applications to epidemic control and viral marketing.

Modelling influence and opinion evolution in online collective behaviour

Samuel Martin

Vendredi 18 mars 2016 à 11h, salle 26-00/332

Slides

Opinion evolution and judgment revision are mediated through social influence. In this talk, I will present a study based on a crowdsourced in vitro experiment. The study shows how a consensus model can be used to predict opinion evolution in online collective behaviour. The model is parametrized by the influenceability of each individuals, a factor representing to what extent individuals incorporate external judgments. Judgment revision includes unpredictable variations which limit the potential for prediction. The study also serves to measure this level of unpredictability via a specific control experiment. More than two thirds of the prediction errors are found to occur due to unpredictability of the human judgment revision process rather than to model imperfection.

Uncovering the spatial structure of mobility networks in cities – Methods and applications

Thomas Louail

Vendredi 19 février 2016 à 11h, salle 24-25/405

Slides

The increasing availability of individual geographic footprints has generated a broad scientific interest for human mobility networks, at various scales and in different geographical contexts. In this talk I will present some recent results related to urban mobility. I will first present a method we developed to extract an expressive and coarse-grained signature from a large, weighted and directed network. I will then discuss the results we obtained when we applied this method to mobility networks extracted from mobile phone data in 31 Spanish cities, in order to compare the structure of journey-to-work commuting in these cities. The method distinguishes different types of links/flows, and clearly highlights a clear relation between city size and the importance of these types of . In the second part of my talk, I will focus on the shopping trips networks extracted from credit card transactions, performed by hundreds of thousands of anonymized individuals in the two largest Spanish cities. Starting from the bipartite networks that link individuals and businesses of the city, I will show that it is possible to evenly distribute business income among neighbourhoods — and then mitigate spatial inequality — by reassigning only a very small fraction of each individual’s shopping trips. This spatial and bottom-up approach of wealth redistribution could be easily implemented in mobile apps, that would assist individuals in slightly reshaping their mobility routines. Our results hence illustrate the social benefits individuals are entitled to expect from the analysis of the data they daily produce.

A method for the Approximation of the Maximal Consensus Local Community detection problem in Complex Networks

Patricia Conde Céspedes

Vendredi 22 janvier 2016 à 11h, salle 26-00/332

Slides

Although the notion of community does not have a unanimous accepted definition, it is often related to a set of strongly interconnected nodes. Indeed, the density of links plays an important role because it measures the strength of the relationships in the community. The need of these well connected and dense communities has led to the notion of consensus community. An consensus community, is a group of nodes where each member is connected to more than a proportion of the other nodes. An consensus community is maximal if and only if adding a new node to the set breaks the rule. Consequently, an consensus community has a density greater than . Existing methods for mining consensus communities generally assume that the network is entirely known and they try to detect all such consensus communities. Detecting the local community of specific nodes is very important for applications dealing with huge networks, when iterating through all nodes would be impractical. In this paper, we propose an efficient algorithm, called RANK-NUM-NEIGHS (RNN), based on local optimizations to approximate the maximal consensus local community of a given node. The proposed method is evaluated experimentally on real and artificial complex networks in terms of quality, execution time and stability. We also provide an upper bound on the optimal solution. The experiments show that It provides better results than the existing methods.

Densest subgraph computation and applications in finding events on social media

Oana Bllu

Vendredi 27 novembre 2015 à 11h, salle 24-25/405

Slides

Finding dense subgraphs in large graphs is a key primitive in a variety of real-world application domains, encompassing social network analytics, event detection, biology, and finance. In most such applications, one typically aims at finding several (possibly overlapping) dense subgraphs which might correspond to communities in social networks or interesting events. In this talk we present a natural generalization of the densest subgraph problem, where the main goal is to find at most k subgraphs with maximum total aggregate density, while satisfying an upper bound on the pairwise Jaccard coefficient between the sets of nodes of the subgraphs. We will also illustrate how finding dense subgraphs can be an important subroutine for event detection in social media. Social media has great potential, as apart from the traditional media sources, many users post updates on different events. The highly dynamic nature of social networks gives the benefit of timely updates and the huge amount of content the benefit of diversity and large coverage. However, finding events presents also non-trivial challenges given the large amount of noisy and irrelevant data present in social media.

Graph analysis of functional brain networks: theory, applications and issues

Fabrizio De Vico Fallani

Vendredi 4 décembre 2015 à 11h, salle 26-00/332

Slides

We have known for at least 100 years that the brain is organized as a network of connections between neuronal ensembles. However, only in the last 10 years there has been a rapid growth in our ability to quantify the complex topology of brain networks, using mathematical tools derived from graph theory. In this talk, I will present recent development of graph theoretical approaches to analyze brain networks and model (re)organizational principles of the neural function underlying behavior and outcome, in healthy and diseased conditions. The final part will be devoted to highlight some of the current issues related to complex brain network analysis.

Inferring synaptic connections from spike data of multiple neurons

Ryota Kobayashi

Vendredi 6 novembre 2015 à 11h, salle 26-00/332

Slides

Correlations in neuronal activity are defined as « functional connections » between pairs of neurons. It is still unclear how the functional connectivity is related to underlying (physiological) synaptic connectivity. Here, we develop a coupled escape rate model (CERM) to infer synaptic connections from spike data of multiple neurons. Estimation performance of the proposed method was compared to that of the previous methods by using a simulated data generated by a realistic cortical network model, which consists of thousands of detailed model neurons (Kitano & Fukai, 2007). We conclude that CERM method is a promising method to infer synaptic connections from multiple neural spike data. This is joint work with Katsunori Kitano.

Dependable Issues Resolved with Distributed Streams

Yann Busnel

Mardi 6 octobre 2015 à 11h, salle 24-25/405

The analysis of massive data streams is fundamental in many monitoring applications (e.g, Internet routers). For networks operators, it is a recurrent and crucial issue to determine whether huge data streams, received at their monitored devices, are correlated or not as it may reveal the presence of attacks. First, we propose a metric, called codeviation, that allows to evaluate the correlation between distributed streams. This metric is inspired from classical metric in statistics and probability theory, and as such enables to understand how observed quantities change together, and in which proportion. We then propose to estimate the codeviation in the data stream model. In this model, functions are estimated on a huge sequence of data items, in an online fashion, and with a very small amount of memory with respect to both the size of the input stream and the values domain from which data items are drawn. We give upper and lower bounds on the quality of the codeviation, and provide both local and distributed algorithms that additively approximates the codeviation among data streams using sub-linear space. On the other hand, we consider the problem of identifying global iceberg attacks in massive and physically distributed streams. A global iceberg is a distributed denial of service attack, where some elements globally recur many times across the distributed streams, but locally, they do not appear as a deny of service. A natural solution to defend against global iceberg attacks is to rely on multiple routers that locally scan their network traffic, and regularly provide monitoring information to a server in charge of collecting and aggregating all the monitored information. Any relevant solution to this problem must minimise the communication between the routers and the coordinator, and the space required by each node to analyse its stream. We propose a distributed algorithm that tracks global icebergs on the fly with guaranteed error bounds, limited memory and processing requirements. We present a thorough analysis of our algorithm performance. In particular we derive an optimal upper bound on the number of bits communicated between the multiple routers and the coordinator in presence of an oblivious adversary. Finally, we present the main results of the experiments we have run on a cluster of single-board computers. Those experiments confirm the efficiency and accuracy of our algorithm to track global icebergs hidden in very large input data streams exhibiting different shapes.

Modèles de génération des graphes de collaboration multi-niveaux

Ghislain Romaric Meleu

Vendredi 2 octobre 2015 à 11h, salle 24-25/405

Slides

Les entêtes des articles scientifiques permettent de construire trois réseaux de collaborations : les réseaux des auteurs, des laboratoires et des institutions. Les réseaux sont corrélés du fait des relations d’affiliations qui existent entre les acteurs des trois réseaux. Nous appelons réseaux hiérarchiques (ou multi-niveaux) de tels réseaux; les réseaux de niveaux supérieurs étant déduits du réseau de co-publication des auteurs. Nous étudions une généralisation de ces réseaux hiérarchiques où un acteur est affilié à une organisation qui peut elle aussi être affiliée à une organisation de niveau supérieur et ainsi de suite. Les relations entre entités à un même niveau sont déduites de celles existantes entre entités de niveau inférieur. tre capable de générer artificiellement de tels réseaux suppose que l’on comprenne à la fois comment les acteurs (au niveau le plus bas) interagissent, mais aussi la dynamique des affiliations aux organisations. Nous proposons une première analyse de tels réseaux. Nous commençons par construire un modèle de génération de graphes de collaboration (de niveau 0) basé sur l’arrivée de petites cliques. Nous observons expérimentalement que ces réseaux ainsi que les réseaux de niveaux supérieurs déduits à partir d’un modèle d’affiliation par attachement préférentiel sont des réseaux small-world et scale-free. Les démonstrations sont faites pour le niveau 0.

Compact routing, main results and techniques

Christian Glacet

Vendredi 25 septembre 2015 à 11h, salle 24-25/405

Slides

Message routing is a central activity in any interconnection network. Route efficiency and memory requirements are two major central parameters in the design of a routing scheme. Routing along short paths is clearly desirable, and the storage of the routing information at each node must also be limited to allow quick routing decision, fast update, and scalability. There is a trade-off between the route efficiency (measured in terms of stretch) and the memory requirements (measured by the size of the routing tables). For a n nodes network, the classical shortest path routing scheme achieve stretch 1 with n entries per node (router). Compact routing techniques offer different tradeoffs by allowing routing detours. It is known that in order to guaranty that every route has a stretch strictly lower than 2k+1 the routing tables must have size of order n^1/k at least. Is it actually possible to design routing schemes that attain these lower bounds? This is the question I will answer during my talk, explaining in the mean time the main algorithmic ideas used to achieve the currently best known trade-offs. Finally I’ll also introduce the more specific problem of compact routing in « internet-like » graphs.

Strategic Analysis and Design of Robust and Resilient Interdependent Power and Communication Networks with a New Model of Interdependency

Arunabha Sen

Vendredi 11 septembre 2015 à 11h, salle 26-00/332

Slides

The critical infrastructures of the nation such as the power grid and the communication network are highly interdependent. Recognizing the need for a deeper understanding of the interdependency in a multi-layered network, significant efforts have been made in the research community in the last few years to achieve this goal. Accordingly a number of models have been proposed and analyzed. Unfortunately, most of the models are over simplified and as such they fail to capture the complex interdependency that exists between entities of power grid and communication networks involving a combination of conjunctive and disjunctive relations. To overcome the limitations of existing models, we have recently proposed a new model that is able to capture such complex interdependency relations. Utilizing this model, we have studied a number of problems, including (i) identification the k most vulnerable nodes of an interdependent network, (ii) entity hardening problem, (iii) progressive recovery problem, (iv) targeted attack problem and several others. In this talk, we first present the new model and then discuss several problems that has been studied utilizing this model.

Classes of digraphs defined by forbidding induced subdigraphs and their chromatic-number

Pierre Aboulker

Jeudi 09 juillet 2015 à 11h, salle 26-00/332

Slides

A class of graphs is $chi$-bounded if there exists a function f such that for any graph G in the class, $chi(G)$ f ((G)). Gyrfas conjectured that for any tree T, the class of graphs that do not contain T as an induced subgraph is $chi$-bounded. We investigate the oriented analogue of this Conjecture. This a joint work with J. Bang-Jensen, N. Bousquet, P. Charbit, F. Havet, F. Maffray, S. Thomassé and J. Zamora

Inhomogeneous Hypergraphs

lie de Panafieu

Jeudi 02 juillet 2015 à 11h, salle 24-25/405

Slides

We introduce the inhomogeneous hypergraph model. Each edge can contain an arbitrary number of vertices, the vertices are colored, and each edge receives a weight which depends on the colors of the vertices it contains. This model provides a uniform setting to solve problems arising from various domains of computer science and mathematics. We will focus on applications to the enumeration of satisfied and satisfiable instances of Constraint Satisfaction Problems (CSP), and compute the limit probability for a random graph to be bipartit, the limit probability of satisfiability of systems of equations, the enumeration of properly k-colored graphs and investigate some graphs coming from social networks. We will present results on the asymptotics of inhomogeneous hypergraphs and their typical structure. Our main tool is analytic combinatorics.

Emergence of Superpeer Networks: A New Perspective

Bivas Mitra

Jeudi 11 juin 2015 à 11h, salle 24-25/405

Slides

In superpeer based p2p networks, the superpeer nodes are discovered through the process of bootstrapping, whereby resourceful peers get upgraded to superpeers. However, bootstrapping is influenced by several factors like limitation on the maximum number of connections a peer can have due to bandwidth constraints, limitation on the availability of information of existing peers due to cache size constraints and also by the attachment policy of the newly arriving peers to the resourceful peers. In this talk we propose an analytical framework which explains the emergence of superpeer networks on execution of the commercial peer-to-peer bootstrapping protocols by incoming nodes. Bootstrapping protocols exploit physical properties of the online peers like resource content, processing power, storage space, connectivity etc as well as take the finiteness of bandwidth of each online peer into consideration. With the help of rate equations, we show that execution of these protocols results in the emergence of superpeer nodes in the network – the exact degree distribution is evaluated. We also show that the cache parameters must also be suitably tuned so as to increase the fraction of superpeers in the network. We validate the developed framework through extensive simulation. The analysis of the results shows that the amount of superpeers produced in the network depends on the protocol as well as the properties of the joining nodes. As an application study, we show that our framework can explain the topological configuration of commercial Gnutella networks.

(Some) Social aspects of location privacy

Kévin Huguenin

Jeudi 18 juin 2015 à 11h, salle 24-25/405

Smartphones hold and collect ever-increasing amounts of personal and contextual data about their owners. This information is often shared on social networks or sent to on-line service providers, in order to benefit from social and context-aware features. This, however, raises serious privacy issues. My talk will revolve around the human and social aspect of privacy: I will discuss the privacy implications of the relationship between location, co-location (e.g., name tagging in posts and photos on mobile social networks) and social ties and the implications of the use of generalization-based privacy protection techniques on the utility of location-based services.