Research Areas

The brain project uses concepts and tools from network science to understand the structural principles and functional implications of connectomes across species, from the nervous system of the model organism Caenorhabditis elegans, to the mouse, to the human. The brain is inherently multiscale and may be conceptualized as a network at each level, from individual neurons and synapses to the integration of macroscopic brain regions. Rapid advances in neuroimaging technology and large collaborative efforts are producing a wide variety of high-quality data, which demand innovative approaches to understand and combine. We aim to tease apart and explain the roles of randomness and order in the complex geometry of neural connections and the patterns within them, and to develop experimentally testable hypotheses about the fundamental principles behind the observed structure, such as the necessity for the brain to control itself and the body in order to survive.

The Foodome project is a part of a large research project dedicated to developing a systematic approach to analyzing the lifestyle factors that contribute to coronary heart disease (CHD). Our lab aims to develop the tools and computational/measurement framework to accurately detect the relation between diet and CHD.

One of the most important issues today is improving healthcare quality on a large scale. We have begun looking at administrative healthcare data from California in the form of millions of individual patient hospital visits. Our goal is to understand how healthcare quality emerges as a network property from hospital networks and the ripple effects any one hospital node can have on the system.

We are working on a number of studies that develop mathematical and theoretical models for understanding internal control mechanisms for complex self-organized systems. One can control the behavior of a large network by taking control actions on a comparatively small number of nodes because the network structure broadcasts the influence of these "driver nodes" to distant parts of the network. These findings have tremendous implications for designing, disrupting, or facilitating system capabilities, including physical systems (e.g., climate change and resilience of habitats), technological systems, and biological systems.
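As a rough illustration of the "driver node" idea, the sketch below uses a maximum-matching formulation of structural controllability for directed networks: nodes left unmatched by a maximum matching are taken as driver nodes, and at least one driver is always required. This formulation is an assumption brought in for illustration and is not stated in the text above; the function names and toy networks are hypothetical.

```python
def maximum_matching(edges, nodes):
    """Maximum matching on a directed graph, via augmenting paths.

    Each node may be used at most once as a source and at most once as a
    target. Returns a dict mapping each matched target to its source.
    """
    successors = {n: [] for n in nodes}
    for u, v in edges:
        successors[u].append(v)
    matched_by = {}  # target node -> source node of the matching edge

    def try_augment(u, visited):
        # Try to match source u to some target, re-routing earlier matches.
        for v in successors[u]:
            if v in visited:
                continue
            visited.add(v)
            if v not in matched_by or try_augment(matched_by[v], visited):
                matched_by[v] = u
                return True
        return False

    for u in nodes:
        try_augment(u, set())
    return matched_by


def driver_nodes(edges, nodes):
    """Nodes with no matched incoming edge (assumed driver nodes)."""
    matched_by = maximum_matching(edges, nodes)
    unmatched = [n for n in nodes if n not in matched_by]
    # A perfectly matched network (e.g. a cycle) still needs one input.
    return unmatched or [nodes[0]]


# Toy examples (hypothetical networks)
chain = [(1, 2), (2, 3), (3, 4)]
star = [(0, 1), (0, 2), (0, 3)]
print(driver_nodes(chain, [1, 2, 3, 4]))  # a chain needs a single driver
print(driver_nodes(star, [0, 1, 2, 3]))   # a star needs several
```

In this toy setting a directed chain can be steered from its first node alone, while a hub that fans out to many leaves needs several independent inputs, which shows how the wiring pattern, not just the size, determines how many nodes must be controlled.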

The fundamental principle behind the Network Medicine and Biological Networks project is that disease phenotypes emerge from genotypes via the network properties of interactions between the underlying biological components. These phenotypes are best conceptualized as consequences of perturbations to disease modules of the biological networks in the cell, whether at the node level (disease genes) or the link level (disease edgotypes). We integrate patient-specific gene expression and protein interaction data to elucidate the precise basis of conditions from Parkinson's to asthma to heart disease. By further analyzing drug-disease and drug-target association data, we investigate the effects, both therapeutic and undesired, of the associated medication. Understanding the molecular-level networks allows us to understand the connections between different diseases and the effects of drugs designed to target them, paving the way for personalized treatments based on one's own interactome.

The goal of the Science of Success project is to develop measures, models and predictions that offer actionable information towards a quantitative evaluation of success in a diverse range of competitive settings, from science to sports and software development. Our work is driven by the hypothesis that success can become predictable to a substantial extent if we see it not as an individual phenomenon, but rather as a collective one. For a scientific finding, an athlete, or a software product to be successful, it is not enough to be novel, fundamental or high-performing: the community must agree that it is worthy of praise and follow-up. Our aim is to understand the fundamental patterns that govern community impact by analyzing the evolution of career paths, individual and team performances, and the dynamics of impact, using large-scale data sets that provide quantitative information on performance and success.

Featured projects

The observation that disease-associated proteins often interact with each other has fueled the development of network-based approaches to elucidate the molecular mechanisms of human disease. Such approaches build on the assumption that protein interaction networks can be viewed as maps in which diseases can be identified with localized perturbations within a certain neighborhood. The identification of these neighborhoods, or disease modules, is therefore a prerequisite for a detailed investigation of a particular pathophenotype. While numerous heuristic methods exist that successfully pinpoint disease-associated modules, the basic underlying connectivity patterns remain largely unexplored. In this work we aim to fill this gap by analyzing the network properties of a comprehensive corpus of 70 complex diseases. We find that disease-associated proteins do not reside within locally dense communities and instead identify connectivity significance as the most predictive quantity. This quantity inspires the design of a novel Disease Module Detection (DIAMOnD) algorithm to identify the full disease module around a set of known disease proteins. We study the performance of the algorithm using well-controlled synthetic data and systematically validate the identified neighborhoods for a large corpus of diseases.
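The idea of ranking candidate proteins by connectivity significance can be sketched as follows. This is a minimal illustration, not the published DIAMOnD implementation: it assumes connectivity significance is the hypergeometric probability that a node with k links has at least k_s of them pointing to the current module by chance, and it greedily adds the most significant node at each step. The function names and the toy network are hypothetical.

```python
from math import comb


def connectivity_pvalue(k, ks, n_nodes, n_seeds):
    """Hypergeometric tail P(X >= ks): chance that a node with k links has
    at least ks links to the n_seeds module nodes in an n_nodes network."""
    total = comb(n_nodes, k)
    tail = sum(
        comb(n_seeds, i) * comb(n_nodes - n_seeds, k - i)
        for i in range(ks, min(k, n_seeds) + 1)
    )
    return tail / total


def grow_module(adjacency, seeds, n_added):
    """Greedily add the node whose links to the module are most significant.

    adjacency maps each node to the set of its neighbors (undirected).
    Returns the added nodes in the order they were selected.
    """
    module = set(seeds)
    added = []
    n_nodes = len(adjacency)
    for _ in range(n_added):
        best = None  # (p-value, node) of the best candidate so far
        for node, neighbors in adjacency.items():
            if node in module:
                continue
            ks = len(neighbors & module)
            if ks == 0:
                continue  # only nodes touching the module are candidates
            p = connectivity_pvalue(len(neighbors), ks, n_nodes, len(module))
            if best is None or p < best[0]:
                best = (p, node)
        if best is None:
            break
        module.add(best[1])
        added.append(best[1])
    return added


# Toy network (hypothetical): "c" has few links, all into the seed set,
# while the hub "d" touches the seed set through only one of its four links.
toy = {
    "a": {"c", "d"}, "b": {"c"}, "c": {"a", "b"},
    "d": {"a", "e", "f", "g"}, "e": {"d"}, "f": {"d"}, "g": {"d"},
}
print(grow_module(toy, {"a", "b"}, 2))
```

The toy example shows why significance differs from raw link counts or local density: a low-degree node whose links all reach the module ranks ahead of a hub that touches it only incidentally.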


How does impact change over a scientific career? Does impact, arguably the most relevant performance measure, follow predictable patterns? Can we predict the timing of a scientist's outstanding achievement? Driven by these questions, we studied the evolution of productivity and impact throughout thousands of scientific careers. We reconstructed the publication record of scientists from seven disciplines, connecting each paper with its long-term impact on the scientific community as quantified by citation metrics. We found that the highest-impact work in a scientist's career is randomly distributed within her body of work. That is, the highest-impact work has the same probability of falling anywhere in the sequence of papers published by a scientist. It could be the first publication, appear mid-career, or emerge last. This result is known as the random impact rule.

In this visualization, we show the random impact rule in all its power. You can explore careers in different disciplines, rank scientists according to different career parameters, or select a subset of them. You will always find the impact peaks occurring all over the place, from the beginning of a career on the left to the end of a career on the right.
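One way to probe the random impact rule on a set of careers is to record where each scientist's highest-impact paper falls in their publication sequence. The sketch below is illustrative only: the careers are synthetic, the impact values are drawn from an arbitrary lognormal distribution chosen for this example, and the function names are hypothetical. If the peak is equally likely anywhere in the sequence, its relative position should average close to one half.

```python
import random


def peak_position(impacts):
    """Relative position (between 1/N and 1) of the highest-impact paper
    in a chronologically ordered list of per-paper impact values."""
    n = len(impacts)
    best_index = max(range(n), key=lambda i: impacts[i])
    return (best_index + 1) / n


def mean_peak_position(careers):
    """Average relative peak position across a collection of careers."""
    return sum(peak_position(c) for c in careers) / len(careers)


# Synthetic careers (NOT real data): 2000 scientists with 50 papers each,
# with impacts drawn independently of the order of publication.
rng = random.Random(0)
synthetic = [[rng.lognormvariate(1.0, 1.0) for _ in range(50)]
             for _ in range(2000)]

print(peak_position([3, 10, 1, 2]))   # second of four papers
print(mean_peak_position(synthetic))  # expected to be close to 0.5
```

On real publication records, a mean far from one half, or a histogram of peak positions that is clearly not flat, would count as evidence against the rule.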


Reading remains the preferred leisure activity for most individuals, continuing to offer a unique path to knowledge and learning. As such, books remain an important and widely consumed cultural product. Yet, while over 3 million books are published each year, very few are read widely and fewer than 500 make it to the New York Times bestseller lists. Once there, only a handful of authors can command the lists for more than a few weeks. Here we bring a big data approach to book success by investigating the properties and sales trajectories of bestsellers. We find that book sales follow seasonal patterns, with more books sold during holidays, and that even among bestsellers, fiction books sell more copies than nonfiction books. General fiction and biographies make the list more often than any other genre, and the higher a book’s initial place in the rankings, the longer it stays on the list. Looking at patterns characterizing authors, we find that fiction writers are more productive than nonfiction writers, commonly achieving bestseller status with multiple books. Additionally, there is no gender disparity among bestselling fiction authors, but in nonfiction most bestsellers are written by male authors. Finally, we find that there is a universal pattern to book sales. Using this universality, we introduce a statistical model to explain the time evolution of sales. This model not only reproduces the entire sales trajectory of a book but also predicts the total number of copies it will sell in its lifetime, based on its early sales numbers. The analysis of bestseller characteristics and the discovery of the universal nature of sales patterns and their driving forces are crucial for our understanding of the book industry and, more generally, of how we as a society interact with cultural products.


The concept of the cosmic web - viewing the universe as a set of discrete galaxies held together by gravity - is deeply ingrained in cosmology. Yet little is known about the architecture of this network or its characteristics. Our research used data from 24,000 galaxies to construct multiple models of the cosmic web, offering complex blueprints for how galaxies fit together. These three interactive visualizations help us imagine the cosmic web, show us differences between the models, and give us insight into the fundamental structure of the universe.


A visual and data-analytic exploration of success in tennis: uncovering the relationship between performance and popularity. The life of a professional athlete is not a smooth ride: it is full of ups and downs, life-changing victories and crushing defeats, serious injuries and awe-inspiring recoveries. It is also glamorous. Athletes are cherished, admired, and often criticized as celebrities. Succeeding in the world of tennis means both excelling in the game and being popular enough to attract good endorsement deals. Here we delve deep into how success is achieved in terms of both performance and popularity, and how the two relate to each other.
