The observation that disease-associated proteins often interact with each other has fueled the development of network-based approaches to elucidate the molecular mechanisms of human disease. Such approaches build on the assumption that protein interaction networks can be viewed as maps in which diseases can be identified with localized perturbations within a certain neighborhood. The identification of these neighborhoods, or disease modules, is therefore a prerequisite for a detailed investigation of a particular pathophenotype. To this end, we analyze the network properties of a comprehensive corpus of 70 complex diseases. We find that disease-associated proteins do not reside within locally dense communities and instead identify connectivity significance as the most predictive quantity. This quantity inspires the design of a novel Disease Module Detection (DIAMOnD) algorithm to identify the full disease module around a set of known disease proteins. We study the performance of the algorithm using well-controlled synthetic data and systematically validate the identified neighborhoods for a large corpus of diseases.
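To make the idea of connectivity significance concrete, here is a minimal sketch. It assumes the significance of a candidate protein is measured by a hypergeometric tail p-value (how surprising it is that ks of its k links point to known disease proteins), and that the module grows greedily by adding the most significant candidate. The function names, the toy network, and the exact null model are illustrative assumptions, not the published implementation.

```python
from math import comb


def connectivity_pvalue(n_nodes, n_seeds, k, ks):
    """Hypergeometric tail P(X >= ks): chance that a node with k links has
    at least ks links to seeds if its links were placed at random among
    n_nodes nodes, n_seeds of which are seeds (assumed null model)."""
    upper = min(k, n_seeds)
    favorable = sum(
        comb(n_seeds, i) * comb(n_nodes - n_seeds, k - i)
        for i in range(ks, upper + 1)
    )
    return favorable / comb(n_nodes, k)


def expand_module(adj, seeds, n_add):
    """Greedily add the n_add candidates with the lowest p-value.
    adj maps each node to the set of its neighbors (undirected network)."""
    module = set(seeds)
    added = []
    for _ in range(n_add):
        best = None
        for node in sorted(adj):  # sorted for a deterministic tie-break
            if node in module:
                continue
            k = len(adj[node])
            ks = len(adj[node] & module)
            p = connectivity_pvalue(len(adj), len(module), k, ks)
            if best is None or p < best[0]:
                best = (p, node)
        if best is None:
            break
        module.add(best[1])
        added.append(best[1])
    return added


# Hypothetical toy network: "c" has few links, all to seeds; "h" is a hub.
toy = {
    "a": {"c", "h"}, "b": {"c", "h"}, "c": {"a", "b"},
    "h": {"a", "b", "d", "e", "f"},
    "d": {"h", "e"}, "e": {"h", "d"}, "f": {"h"},
}
print(expand_module(toy, {"a", "b"}, 1))
```

In this toy example both "c" and the hub "h" have two links to seeds, but the hub's two seed links are unsurprising given its five links in total, so the low-degree node "c" is ranked as more significant. Ranking by significance rather than by the raw number of seed links is the point of the sketch.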
How does impact change over a scientific career? Does impact, arguably the most relevant performance measure, follow predictable patterns? Can we predict the timing of a scientist's outstanding achievement? Driven by these questions, we studied the evolution of productivity and impact throughout thousands of scientific careers. We reconstructed the publication record of scientists from seven disciplines, connecting each paper with its long-term impact on the scientific community as quantified by citation metrics. We found that the highest-impact work in a scientist's career is randomly distributed within her body of work. That is, the highest-impact work has the same probability of falling anywhere in the sequence of papers published by a scientist. It could be the first publication, appear mid-career, or emerge last. This result is known as the random impact rule. In this visualization, we show the random impact rule in all its power. You can explore careers in different disciplines, rank scientists according to different career parameters, or select a subset of them.
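The random impact rule can be phrased as a simple statistic: the relative position of the highest-impact paper in a career should be uniformly distributed. The sketch below computes that position and illustrates it on synthetic careers. The careers are hypothetical, with impacts drawn independently from a lognormal distribution, so the uniformity here holds by construction; the code shows what the rule predicts rather than testing it on real publication records.

```python
import random


def peak_position(citations):
    """Relative position of the highest-impact paper in a career:
    (1-based index of the most cited paper) / (number of papers).
    Ties are resolved in favor of the earliest paper."""
    best = max(range(len(citations)), key=lambda i: citations[i])
    return (best + 1) / len(citations)


def simulate_positions(n_careers=2000, n_papers=20, seed=0):
    """Peak positions for synthetic careers whose paper impacts are
    independent lognormal draws (an assumed, illustrative null model)."""
    rng = random.Random(seed)
    positions = []
    for _ in range(n_careers):
        career = [rng.lognormvariate(1.0, 1.0) for _ in range(n_papers)]
        positions.append(peak_position(career))
    return positions


positions = simulate_positions()
mean_position = sum(positions) / len(positions)
# Under the rule, each of the 20 slots is equally likely to hold the peak,
# so the expected mean is (20 + 1) / (2 * 20) = 0.525.
print(round(mean_position, 3))
```

With real data, the same `peak_position` statistic would be compared against careers whose paper order has been shuffled; agreement between the two distributions is what the rule asserts.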
Reading remains the preferred leisure activity for most individuals, continuing to offer a unique path to knowledge and learning. As such, books remain an important cultural product, consumed widely. Yet, while over 3 million books are published each year, very few are read widely and fewer than 500 make it to the New York Times bestseller lists. And once there, only a handful of authors can command the lists for more than a few weeks. Here we bring a big data approach to book success by investigating the properties and sales trajectories of bestsellers. Analyzing the characteristics of bestsellers and uncovering the universal nature of their sales patterns, along with the forces that drive them, is crucial for our understanding of the book industry and, more generally, of how we as a society interact with cultural products.
The concept of the cosmic web - viewing the universe as a set of discrete galaxies held together by gravity - is deeply ingrained in cosmology. Yet little is known about the architecture of this network or its characteristics. Our research used data from 24,000 galaxies to construct multiple models of the cosmic web, offering complex blueprints for how galaxies fit together. These three interactive visualizations help us imagine the cosmic web, show us the differences between the models, and give us insight into the fundamental structure of the universe.
A visual and data-analytic exploration of success in tennis: Uncovering the relationship between performance and popularity. The life of a professional athlete is not a smooth ride: it is full of ups and downs, life-changing victories and crushing defeats, serious injuries and awe-inspiring recoveries. It is also glamorous. Athletes are cherished, admired, and often criticized as celebrities. Succeeding in the world of tennis means both excelling in the game and being popular enough to attract good endorsement deals. Here we delve into how success is achieved, in terms of both performance and popularity, and how the two relate to each other.
The brain project uses concepts and tools from network science to understand the structural principles and functional implications of connectomes across species, from the nervous system of the model organism Caenorhabditis elegans, to the mouse, to the human. The brain is inherently multiscale and may be conceptualized as a network at each level, from individual neurons and synapses to the integration of macroscopic brain regions. Recent rapid advances in neuroimaging technology and large collaborative efforts are driving an explosion of high-quality data of many kinds, which demand innovative approaches to understand and combine. We aim to tease apart and explain the roles of randomness and order in the complex geometry of, and patterns within, neural connections, and to develop experimentally testable hypotheses regarding the fundamental principles behind the observed structure, such as the necessity for the brain to control itself and the body in order to survive.
The Foodome project is a part of a large research project dedicated to developing a systematic approach to analyzing the lifestyle factors that contribute to coronary heart disease (CHD). Our lab aims to develop the tools and computational/measurement framework to accurately detect the relation between diet and CHD.
One of the most important issues today is improving healthcare quality on a large scale. We have begun looking at administrative healthcare data from California in the form of millions of individual patient hospital visits. Our goal is to understand how healthcare quality emerges as a network property from hospital networks and the ripple effects any one hospital node can have on the system.
We are working on a number of studies that develop mathematical and theoretical models for understanding internal control mechanisms in complex self-organized systems. One can control the behavior of a large network by taking control actions on a comparatively small number of nodes because the network structure broadcasts the influence of these "driver nodes" to distant parts of the network. These findings have tremendous implications for designing, disrupting, or facilitating system capabilities in physical systems (e.g., climate change and resilience of habitats), technological systems, and biological systems.
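One way to make the notion of driver nodes concrete is the structural controllability picture, in which a minimum set of driver nodes of a directed network corresponds to the nodes left unmatched by a maximum matching. The text above does not specify this method, so the sketch below is an assumption about how such a set could be computed; `driver_nodes` and the example networks are hypothetical, and the matching uses a plain augmenting-path search suited only to small graphs.

```python
def driver_nodes(nodes, edges):
    """Return one minimum set of driver nodes for a directed network,
    assuming the maximum-matching criterion of structural controllability.
    nodes: list of node labels; edges: list of (source, target) pairs."""
    out = {u: [] for u in nodes}
    for u, v in edges:
        out[u].append(v)

    matched_by = {}  # target node -> source node of its matched incoming edge

    def try_match(u, seen):
        """Augmenting-path step: try to give u a matched outgoing edge."""
        for v in out[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in matched_by or try_match(matched_by[v], seen):
                matched_by[v] = u
                return True
        return False

    for u in nodes:
        try_match(u, set())

    # Nodes with no matched incoming edge must be driven directly.
    unmatched = [v for v in nodes if v not in matched_by]
    # With a perfect matching (e.g. a directed cycle), one driver still
    # suffices and any node can serve; the first one is picked here.
    return unmatched or [nodes[0]]


chain = driver_nodes([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)])
star = driver_nodes([0, 1, 2, 3], [(0, 1), (0, 2), (0, 3)])
print(chain, star)
```

The two examples show how structure determines the control burden: a directed chain passes a single input along every link and needs one driver node, while a hub pointing to three leaves can steer only one leaf independently, so three of its four nodes must be driven.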
The fundamental principle behind the Network Medicine and Biological Networks project is that disease phenotypes emerge from genotypes via the network properties of interactions between the underlying biological components. These phenotypes are best conceptualized as consequences of perturbations to disease modules of the biological networks in the cell, whether at the node level (disease genes) or the link level (disease edgotypes). We integrate patient-specific gene expression and protein interaction data to elucidate the precise basis of conditions from Parkinson's to asthma to heart disease. With the further analysis of drug-disease association and drug-target association data, we investigate the effects - therapeutic and undesired - of the associated medication. Understanding the molecular-level networks allows us to understand the connections between different diseases and the effects of drugs designed to target them, paving the way for personalized treatments based on one's own interactome.
The goal of the Science of Success project is to develop measures, models and predictions that offer actionable information towards a quantitative evaluation of success in a diverse range of competitive settings, from science to sports and software development. Our work is driven by the hypothesis that success can become predictable to a substantial extent if we see it not as an individual phenomenon, but rather as a collective one. For a scientific finding, an athlete, or a software product to be successful, it is not enough to be novel, fundamental or high performing - the community must agree that it is worthy of praise and follow-up. Our aim is to understand the fundamental patterns that govern community impact by analyzing the evolution of career paths, of individual and team performances, and the dynamics of impact, using large-scale data sets that provide quantitative information on performance and success.