Complex Systems approach to the emergence of socio-economic phenomena

Yérali Gandica

February 3rd 2022, 2pm; Room 26-00/332

Slides

In this talk, I will present some of my publications using Complex Systems approaches to understand the emergence of socio-economic phenomena. Complex System science aims to study the phenomena that emerge from the interactions between the constituents and, thus, cannot be understood by studying a singular, isolated component. The field has incorporated concepts and methods derived from many areas, ranging from statistical physics and biology to economics and sociology, which, in a constant process of cross-fertilization, have given rise to new types of questions framed into the field of Complex Systems. The study of interacting particle systems has, for along time, been an essential subject of physics. The use of statistical methods has allowed for significant advances in this area by providing a bridge between the microscopic interactions and the sizeable collective behaviour of the system. The application of the methods of statistical physics to social phenomena, where the interacting particles are now interacting human beings, has proven to be very fruitful in allowing for the understanding of many features of social collective behaviour. The success of the statistical physics approaches to explain social data is currently attracting much interest, as demonstrated by the rapidly increasing number of publications in the field from both natural and social scientists. Some of the works that I will present are based on simulations, while some others are based on real data. In some cases, the comparison between simulations and real data is achieved. I will also explain some works in progress and projects for the incoming medium and long term to give a flavour about the scientific path that I plan, to illustrate the coherence of my scientific line of research.

Random Spanning Forests on Graphs for Fast Laplacian-Based Computations

Nicolas Tremblay

lundi 17 janvier 2022, LIP6, Sorbonne Université

Graphs are ubiquitous tools to represent networks, may they be networks modeling data from neurosciences, sociology, molecular biology, chemistry, etc. A cornerstone of the analysis of graphs is the Laplacian matrix L that encodes their structure. From a linear algebra point-of-view, the analysis of L offers fundamental insights on key properties of the graph: the diffusion speed of an information or a disease on a network, the vulnerability of a network to targeted or random attacks, the redundancy of certain parts of the network, the network’s structure in more or less independent modules, are all examples of characteristics of a network one may extract from the Laplacian matrix.

In this work, we concentrate on two specific problems that often arise in the context of graph-based data: i/ compute inverse traces of the form Tr( (L+qI)^(-1) ), ii/ compute smoothing operations of the form (L+qI)^(-1) y where q>0 and y some vector defined over the nodes of the graph. These two problems arise in many well-known graph-based algorithms, such as semi-supervised learning, label propagation, graph Tikhonov regularization, graph inpainting, etc.

In the context of large graphs, the required inverse which scales as O(n^3) in the worst-case, is often too expensive in practice. Many approaches have been developed in the state-of-the-art to circumvent this problem: polynomial approximation and (preconditioned) conjugate gradient are the two most well-known.

In this work, we develop a new class of techniques based on random spanning forests. We show that these forests are natural candidates to provide original, efficient, and easy-to-implement estimators.

This is joint work with Pierre-Olivier Amblard, Luca Avena, Simon Barthelmé, Alexandre Gaudillière and Yusuf Yigit Pilavci

Presentation by Esteban Bautista on Fractional PageRank

Esteban Bautista

Lundi 25 Janvier 2021, à 14h Présentiel: salle 25-26-105,  Revoir La Présentation:

Slides

Graph-Based Semi-Supervised Learning (G-SSL) techniques learn from both labelled and unlabelled data to build better classifiers. This classification paradigm has received considerable attention since modern applications allow to collect large amounts of unlabelled but structured data, naturally encoded by a graph, in a relatively easy and inexpensive manner, while annotated data is expensive to obtain.  Despite successes, the performance of G-SSL techniques can still be improved, particularly in challenging data settings or unbalanced scenarios. To address such limitations, I will present a novel G-SSL method based on the positive γ -th powers of the graph Laplacian, referred to as the Lγ-PageRank method. I will present a theoretical analysis of the new method, showing that (i) for γ < 1, it extends the standard PageRank algorithm to Lévy processes: where random walkers can now perform far-distant jumps in a single step; and (ii) for γ > 1, it classifies data on signed graphs: where nodes belonging to one same class are more likely to share positive edges while nodes from different classes are more likely to be connected with negative edges. I will also show the existence of an optimal Laplacian power maximizing performance, for which I will propose an algorithm for its automatic estimation. Lastly, I will show the significant classification improvements allowed by the proposed approach on several real-world datasets commonly used for classification.  

Temporal Cliques Admit Sparse Spanners

Jason Schoeters

Mardi 10 Novembre 2020, à 11h en salle 25-26/105, Jussieu

Let G=(V,E) be an undirected graph on n vertices and λ:E →2^N a mapping that assigns to every edge a non-empty set of positive integer labels. These labels can be seen as discrete times when the edge is present. Such a labeled graph {\cal G}=(G,λ) is said to be temporally connected if a path exists with non-decreasing times from every vertex to every other vertex. In a seminal paper, Kempe, Kleinberg, and Kumar (STOC 2000) asked whether, given such a temporal graph, a sparse subset of edges can always be found whose labels suffice to preserve temporal connectivity—a temporal spanner. Axiotis and Fotakis (ICALP 2016) answered negatively by exhibiting a family of Θ(n^2)-dense temporal graphs which admit no temporal spanner of density o(n^2). The natural question is then whether sparse temporal spanners can always be found in some classes of dense graphs. In this paper, we answer this question affirmatively, by showing that if the underlying graph G is a complete graph, then one can always find temporal spanners of density O(n log n). The best known result for complete graphs so far was that spanners of density (n choose 2)−⌊n/4⌋=O(n^2) always exist. Our result is the first positive answer as to the existence of o(n^2) sparse spanners in adversarial instances of temporal graphs since the original question by Kempe et al., focusing here on complete graphs. The proofs are constructive and directly adaptable as an algorithm.

Modèles et algorithmes pour les graphes dynamiques

Mathilde Vernet

Jeudi 22 Octobre 2020 à 11h en salle 25-26/105, Jussieu

Slides

Les problèmes de graphes ont été largement étudiés dans le cas des graphes statiques. Cependant, ces graphes ne permettent pas de prendre en compte la dimension temporelle, qui est souvent une donnée importante pour les situations à modéliser. Les graphes dynamiques viennent combler ces lacunes en permettant de modéliser des évolutions dans le temps. On peut alors s’interroger sur ces mêmes problèmes de graphes dans un contexte dynamique. Cela passe d’abord par la définition du modèle de graphes dynamiques le plus approprié et la modélisation précise du problème sur ces graphes. Lorsque le problème ne peut pas être résolu efficacement en appliquant directement des méthodes connues sur les graphes statiques, il faut alors concevoir un algorithme de résolution spécifique aux graphes dynamiques et l’analyser théoriquement et expérimentalement. En suivant cette démarche, l’objectif de cette thèse est de s’interroger sur l’extension aux graphes dynamiques des problèmes bien connus sur les graphes statiques. Ce travail s’intéresse à plusieurs problèmes de graphes en contexte dynamique en se focalisant sur les aspects algorithmiques et en s’abstrayant des domaines d’applications. Nous nous intéressons d’abord aux problèmes de flot dans les graphes dynamiques et proposons en particulier pour le problème du flot de cout minimum un algorithme polynomial permettant de résoudre le problème de façon optimale pour un modèle de graphe dynamique spécifique. Des problèmes liés à la connexité des graphes dynamiques sont aussi étudiés. Les composantes connexes persistantes, extension des composantes connexes aux graphes dynamiques, traduisent la connexité d’un ensemble de nœuds pendant un certain nombre de pas de temps consécutifs. De façon analogue à la notion de maximalité des composantes connexes dans un graphe statique, une notion de dominance entre composantes connexes persistantes est définie. Un algorithme polynomial permettant d’identifier toutes les composantes connexes persistantes non dominées est proposé. Plusieurs extensions à la définition de composantes connexes persistantes sont étudiées. Nous proposons enfin des extensions possibles du problème de Steiner aux graphes dynamiques. Nous nous concentrons sur un cas particulier et montrons la NP-complétude de ce problème.  

The networks underlying collaborative learning and solving

Marc Santolini

Lundi 12 Octobre 2020 à 11h en salle 25-26/105, Jussieu

Slides

In this talk, I will review some of our recent work in understanding collaborative learning and solving using network approaches on large empirical datasets. First, using fine-grained quantitative data from digital lab notebooks of more than 2,000 teams who participated to the science and engineering iGEM competition in the past 10 years, I will exhibit shared aspects of team work, team structure and team dynamics, as well as features underlying team performance and team improvement throughout participations. I will then introduce our ongoing ‘iGEM TIES’ project aimed at mapping high-resolution team interactions in the lab using a bluetooth-enabled smartphone app. I will contrast these results with behavior observed in large, distributed open-source communities from GitHub. Finally, I will introduce our recent work on collaborative learning using fine-grain social data from online forums and phone call records, and show how interaction data can help predict learning outcomes and identify peer influence in performance and engagement. 

Exploration of Interactions for Influence Modelling in Online Social Networks

Monika Rakoczy

Vendredi 31 novembre 2020, à 14h, en salle 26-00/332, Jussieu

Slides

Online social networks are constantly growing in popularity. They enable users to interact with one another and mapping their relations to the virtual world. Users utilize social media platforms as a mean for a rich variety of activities, such as share and exchange information, create relations, and others. Such online human interactions take place within a dynamic environment where we can observe and distinguish many qualities related to relations between users, concerning influential, trusted or popular individuals. In this talk, we will focus on the qualities of users connected to four important concepts: influence, reputation, trust, and popularity, in the scope of SNA for influence modelling. We will examine some of the existing works utilizing these notions and emphasize the most important features that these concepts should include in order to measure them based on the SNA information. Using the notions, we will concentrate on a practical model for influence estimation, called Action-Reaction Influence Model (ARIM). This model considers the type, quality, quantity, and frequency of actions performed by users in SN, and is adaptive to different SN types. Furthermore, we will discuss a notion not yet explored much in SNA discipline — micro-influence, which targets new phenomena of users with a small but highly involved audience, who are observed to be still highly impactful. Finally, we will focus on the quantification of influence over time and representation of influence causal effect. Particularly, we will consider a specific kind of SN – citation network, which is highly time-sensitive. Accordingly, we will discuss another influence estimation model, which determines influence during a particular time period between communities within time-dependent citation networks.

Tools and Methods for Human Mobility Analysis with Mobile Phone Data

Vincent Gauthier

Mardi 19 novembre 2019, à 14h, en salle 26-00/332, Jussieu

Slides

During recent years, the study of population dynamics from mobile traffic data has proven to offer rich insights into human mobility laws, disaster recovery, infective disease epidemics, commuting patterns, urban planning, measurement of air pollution in cities, and measurement of energy consumption of cities. These studies have demonstrated how data collected by mobile network operators can effectively complement, or even replace, traditional sources of demographic data, such as censuses and surveys. We present here a series of works that we developed method to extract mobility information from mobile phone data aim for public transport authorities. With the first study, we developed an unsupervised algorithm that enables the mapping of mobile phone traces over a multimodal transport network. One of the main strengths of our work was its capability to map noisy sparse cellular multimodal trajectories over a multilayer transportation network where the layers have different physical properties and not only to map trajectories associated with a single layer. In a second study, we proposed a new approach to infer population density at urban scales, based on aggregated mobile network traffic metadata. Our approach allowed estimating both static and dynamic populations, achieved a significant improvement in terms of accuracy with respect to state-of-the-art solutions in the literature and was validated on different city scenarios.

Presentation of a library for stream-graphs

Yiannis Siglidis

10 Octobre 2019, 14:00. Salle 26-00 332, Jussieu.

Slides

The study of dynamical graphs, i.e. a data-model capable of modelling temporal-structures, becomes more and more important nowadays as the amount of data produced and the storing capabilities of the existing computational infrastructure associates information more and more with time. In their paper « Stream Graphs and Link Streams for the Modeling of Interactions over Time » of 2017, Matthieu Latapy, Tiphaine Viard, Clémence Magnien, defined a theoretical framework for modelling temporal-networks, that attempts a consistent temporal-generalisation of graph-theory. Under the founding of the ODYCCEUS(https://www.odycceus.eu/) interdisciplinary european program, a first attempt for implementing a library for stream-graphs, by Yiannis Siglidis (developer) and Robin Lamarche- Perrin (supervisor). Their implementation stands as a prototype of a generic basis that would support all existing implementations of stream-graph algorithms, while allowing enrichment and re-design of elementary data-structures. Its contribution shouldn’t only be considered as a technical one, in the sense of producing a new baseline-tool, but also as a theoretical one, as we examined and uncovered some of the immaturities of current from the view of implementation. We hope that this tool will show a small part of the computational possibilities that are provided by the formalism of stream-graphs and as so raise the importance of theoretical research in that direction as well as theoretical research around data-structures, supporting this novel approach on dynamical networks. Their current implementation supports continuous, discrete, instantaneous, weighted and unweighted instances of directed stream-graphs. The libary can be found at https://github.com/ysig/stream_graph and is licensed under GNU-GPL3.

Randomized reference models for temporal networks

Christian Lyngby Vestergaard

mercredi 18 décembre 2019 à 10h30, salle 24-25-405, Jussieu

Many dynamical systems can be successfully analyzed by representing them as networks. Empirically measured networks and dynamic processes that take place in these situations show heterogeneous, non-Markovian, and intrinsically correlated topologies and dynamics. This makes their analysis particularly challenging. Randomized reference models (RRMs) have emerged as a general and versatile toolbox for studying such systems. Defined as ensembles of random networks with given features constrained to match those of an input (empirical) network, they may for example be used to identify important features of empirical networks and their effects on dynamical processes unfolding in the network. RRMs are typically implemented as procedures that reshuffle an empirical network, making them very generally applicable. However, the effects of most shuffling procedures on network features remain poorly understood, rendering their use non-trivial and susceptible to misinterpretation. Here we propose a unified framework for classifying and understanding microcanonical RRMs (MRRMs). Focusing on temporal networks, we use this framework to build a taxonomy of MRRMs that proposes a canonical naming convention, classifies them, and deduces their effects on a range of important network features. We furthermore show that certain classes of compatible MRRMs may be applied in sequential composition to generate over a hundred new MRRMs from the existing ones surveyed in this article. We provide two tutorials showing applications of the MRRM framework to empirical temporal networks: 1) to analyze how different features of a network affect other features and 2) to analyze how such features affect a dynamic process in the network. We finally survey applications of MRRMs found in literature. Our taxonomy provides a reference for the use of MRRMs, and the theoretical foundations laid here may further serve as a base for the development of a principled and automatized way to generate and apply randomized reference models for the study of networked systems.

Mesurer le degré de polarisation de l’espace médiatique et politique sur YouTube : structure des chaînes dabonnés, similarité daudience des fans et diversité des recommandations algorithmiques.

Bilel Benbouzid

8 Octobre, 2019, 11:00hrs. Salle 25-26/105, Jussieu.

Cette présentation vise à rendre compte des résultats d’une recherche sur l’espace politique et médiatique de la plateforme YouTube. Deux questions principales motivent notre étude qui se situe encore dans une phase exploratoire.  Tout d’abord, YouTube n’étant pas un réseau social en tant que tel – c’est à la fois un réseau d’utilisateurs (chaque utilisateur est une chaîne), un espace de stockage auquel on a accès via un moteur de recherche et une plateforme de streaming qui dépend en grande partie d’un système de recommandation – comment rendre compte de sa structure et catégoriser les chaînes qui la composent? Deuxièmement, nous cherchons à mesurer le degré de polarisation de l’espace médiatique et politique sur YouTube : l’analyse relationnelle des chaînes révèle-t-elle une structure équivalente à celle de l’écosystème médiatique révélé par le Médialab de Sciences Po à partir d’une analyse des réseaux d’hyperliens des sites d’information et leur dynamique de circulation sur Twitter (Cardon, 2019) ? Ou bien, au contraire, les relations entre des chaînes sur YouTube fait-elle apparaître une structure spécifique à l’architecture de la plateforme ? Pour répondre à cette question, il faut analyser YouTube du point de vue des différents modes de mise en visibilité des chaînes, qu’ils soient humains ou algorithmique. C’est pourquoi, nous proposons de présenter la structuration des chaînes sur YouTube en trois dimensions qui sont autant de manières différentes, mais complémentaires de rendre compte de la polarisation : le réseau social qui correspond au réseau des chaînes abonnées ou recommandées par les chaînes elles-mêmes ; le réseau des chaînes qui partagent des communautés de fans ; et le réseau de chaînes formé par les vidéos recommandées conjointement par l’algorithme. Ces trois dimensions donnent à voir les facettes multiples de YouTube. Alors que la première permet de comprendre la manière dont les humains se recommandent les chaînes entre eux, la seconde montre les publics partagés par les chaînes, la troisième rend compte de l’espace médiatique et politique formé par une machine (la recommandation algorithmique). Dans cette phase exploratoire, nous avons comparé ces trois dimensions selon la diversité des contenus qu’elles valorisent (une catégorisation des chaînes a été réalisée manuellement). Dans cette présentation nous montrerons 1) les réseaux de chaînes qui se donnent à voir selon les trois dimensions, 2) les matrices de modularité en fonction des catégories de chaîne et 3) une comparaison de la distribution des catégories de chaines selon les trois dimensions à partir des résultats des expériences de marche aléatoire et des calculs de la perplexité qui leur sont associés. Nous montrons que les réseaux de chaînes produits par les humains (les réseaux de chaînes amis et celui des publics de fan) polarisent moins que le réseau produit par la recommandation algorithmique. Nous montrons aussi que si bulle de filtre il y a, elle ne se situe pas où elle est attendue : c’est moins les contenus à caractère sensationnels et « complotistes » qui sont les plus recommandés par la plateforme (ce qui est souvent dénoncé dans le débat public), mais les chaines de médias traditionnels – un des effets sans doute de l’action menée par YouTube faire évoluer le système de recommandation. Ces résultats invitent à discuter de la nature de l’espace public que construit YouTube dans ce contexte de problème de désinformation. YouTube semble accorder un soutien particulier aux contenus des professionnels de l’information, bien plus qu’à celui des chaînes alternatives. Dans cette discussion, nous proposons de débattre de la nature de la « démocratie YouTube » au travers des interprétations des métriques et des modèles d’analyse de réseaux.

OTMedia+ : Graphes et Propagation d’information

Nicolas Hervé

9 Juillet, 2019, 11:00hrs. Salle 26-00/332, Jussieu.

OTMedia (Observatoire TransMedia) est une plateforme logicielle dédiée aux projets de recherche qui permet d’analyser de grandes quantités de données diverses, multimodales, transmédia liées à l’actualité française et francophone. OTMedia collecte, traite et indexe en permanence des milliers de flux provenant de la télévision, de la radio, du Web, de la presse, des agences de presse et de Twitter. Dans le contexte de ce projet, nous souhaiterions étudier la propagation d’informations et d’images sur le Web en utilisant la théorie des graphes pour nous aider à extraire les caractéristiques/indicateurs pour décrire les événements médiatiques.

Drawing and Visualising Event-Based Dynamic Graphs.

Daniel Archambault

May 27th, 11h Room 24-25-405. UPMC – Sorbonne Université. 4 Place de Jussieu, 75005 Paris.

One of the most important types of data in data science is the graph or network. Networks encode relationships between entities:  people in social network, genes in biological network, and many others forms of data.  These networks are often dynamic and consist of a set of events — edges/nodes with individual timestamps.  In the complex network literature, these networks are often referred to as temporal networks.  As an example, a post to a social media service creates an edge existing at a specific time and a series of posts is a series of such events.  However, the majority of dynamic graph visualisations use the timeslice, a series of snapshots of the network at given times, as a basis for visualisation. In this talk, I present two recent approaches for event-based network visualisation:  DynNoSlice and the Plaid. DynNoSlice is a method for embedding these networks directly in the 2D+t space-time cube along with methods to explore the contents of the cube.  The Plaid is an interactive system for visualising long in time dynamic networks and interaction provenance through interactive timeslicing.

Applications through human mobility lens

Vsevolod Salnikov

jeudi 4 avril 2019, 14h, salle 26-00/332, LIP6, Sorbonne Université

Slides

In this talk I will present various data-oriented projects we have done recently. The general line will focus on human mobility sensing and different applications of such datasets from more theoretical ones towards extremely applied, which are on the border of research and commercial activities.Moreover we will discuss different stages: from data collection towards models and application as well as the ‘in-the-field’ validation of model predictions. I will propose few ways of data collection, which permitted to get impressive and reliable datasets with almost no cost. These datasets are already used for studies, but I would be also happy to discuss various applications and ways to collaborate!

Modélisation du contrôle des utilisateurs sur leurs données personnelles

Pablo Rauzy

Vendredi 12 avril 2019, 11h, salle 25-26/105, LIP6, Sorbonne Université

Du point de vue d’un utilisateur ou d’une utilisatrice d’un système d’informations, la privacy correspond au contrôle qu’il ou elle peut exercer sur ses données personnelles dans ce système. Cette vision de la privacy est essentielle si l’on veut contribuer au développement de technologies émancipatrices, c’est à dire aux services de leurs utilisateurs et utilisatrices seulement. L’étude et l’évaluation rigoureuse de la privacy offerte par un système nécessite donc une caractérisation formelle de ce contrôle. Nous proposons un cadre formel basé sur des capacités qui permet de spécifier et de raisonner sur ce contrôle et ses propriétés. Nous verrons au travers d’exemples que cela permet notamment la comparaison de mises en oeuvre alternatives d’un même système (un réseau social basique dont nous comparons trois implémentations possibles), et donc la possibilité d’étudier et d’optimiser la privacy dès la phase de conception.

Neighbour-distinguishing decompositions of graphs

Mohammed SENHAJI

Vendredi 15 mars 2019, 14hrs, salle 25-26/105, LIP6, UPMC. 4 Place Jussieu, 75005, Paris.

The main question that we explore was introduced by Karonski, Luczak and Thomason in 2004 : Can we weight the edges of a graph G , with weights 1 ,2 , and 3 , such that any two of adjacent vertices of G are distinguished by the sum of their incident weights ? This question later becomes the famous 1-2-3 Conjecture.In this presentation we explore several variants of the 1-2-3 Conjecture, and their links with locally irregular decompositions. We are interested in both optimisation results and algorithmic problems. We first introduce an equitable version of the neighbour-sum-distinguishing edge-weightings, that is a variant where we require every edge weight to be used the same number of times up to a difference of 1. After that we explore how neighbour-sum-distinguishing weightings behave if we require sums of neighbouring vertices to differ by at least 2. Namely, we present results on the smallest maximal weight needed to construct such weightings for some classes of graphs, and study some algorithmic aspects of this problem. Due to the links between neighbour-sum-distinguishing edge weightings and locally irregular decompositions, we also explore the locally irregular index of subcubic graphs, along with other variants of the locally irregular decomposition problem. Finally, we present a more general work toward a general theory unifying neighbour-sum-distinguishing edge-weightings and locally irregular decompositions.

Minorities in Networks

Claudia Wagner

Lundi 28 janvier 2018, 11hrs, salle 24-25/405, LIP6, UPMC. 4 Place Jussieu, 75005, Paris.

Networks are the infrastructure of our social and professional life andalso of modern information systems where billions of documents andentities are interlinked. However, not all nodes are equal in thesenetworks. Often we observe attributes (e.g. gender or ethnicity) thatdefine the group membership of a node. In this talk I will explore therole of minorities in social networks and information networks, provideempirical evidence for the disadvantage of minorities and discussfactors that may place minorities at a disadvantage.

What graphs can contribute to a more transparent artificial intelligence

Tiphaine Viard

January 17th 2019, 14:00. Salle 24-25/405, LIP6 – UMPC, Sorbonne Université. 4 Place Jussieu, 75005 Paris.

AI and machine learning are commonly described as « black boxes » that are efficient, but opaque. While complete opacity would be an exageration, it is true that many methods for explainability rely on forms of retro-engineering: we try to infer the model from its (partial, intermediary, final) results. These methods are typically based on large-scale, efficient matrix manipulation. Graphs and their extensions have shown to be visualisable and interpretable, even at large scales. In their classical formulation, they are also very similar to matrices. However, few to no machine learning method explored what graphs could contribute to its models.  This is partly due to the fact that graph computations have long been expensive, typically having polynomial running times, which is incompatible with the scale of data in most of today’s machine learning applications. However, the situation has changed: (i) the impact of AI on society makes it no longer acceptable to favour efficiency despite transparency, and (ii) recent advances in algorithmic methods on graphs demonstrates that due to the nature of real-world graphs, even some NP-hard problems become tractable. The aim of this talk is to explore this avenue of research. We will discuss the state-of-the art in learning from graph data, present some recent results showing that structure-based features indeed have the potential to make machine learning more transparent at no extra cost, and finally we will discuss future tracks of research.

Mining the Integrated Connectedness of Biomedical Systems

Prof. Dr. Natasa Przulj

December 7th 2018, 14:00. Salle 24-25/405, LIP6 – UMPC, Sorbonne Université. 4 Place Jussieu, 75005 Paris.

We are faced with a flood of molecular and clinical data. Various bio-molecules interact in a cell to perform biological function, forming large, complex systems. Large-scale patient-specific omics datasets are increasingly becoming available, providing heterogeneous, but complementary information about cells, tissues and diseases. The challenge is how to mine these interacting, complex, complementary data systems to answer fundamental biological and medical questions.  Dealing with them is nontrivial, because many questions we ask to answer from them fall into the category of computationally intractable problems, necessitating the development of heuristic methods for finding approximate solutions. We develop methods for extracting new biomedical knowledge from the wiring patterns of systems-level, heterogeneous, networked biomedical data.  Our methods link the patterns in molecular networks and the multi-scale network organization with biological function.  In this way, we translate the information hidden in the wiring patterns into domain-specific knowledge. In addition, we introduce a versatile data fusion (integration) framework that can effectively integrate the information obtained from mining molecular networks with patient-specific somatic mutation data and drug chemical data to address key challenges in precision medicine: better stratification of patients, prediction of driver genes in cancer, and re-purposing of approved drugs to particular patients and patient groups. Our new methods stem from novel network science approaches coupled with graph-regularized non-negative matrix tri-factorization, a machine learning technique for dimensionality reduction and co-clustering of heterogeneous datasets. We utilize our new framework to develop methodologies for performing other related tasks, including disease re-classification from modern, heterogeneous molecular level data, inferring new Gene Ontology relationships, and aligning multiple molecular networks.