DiffRank

Ranking Differential Hubs in Homogenous Networks

Omar Odibat and Chandan K. Reddy

MOTIVATION:

Networks have been extensively used to model various complex systems such as online social networks, co-authorship and citation networks, and gene networks. Due to different kinds of variations such as temporal, spatial, topic and phenotypic variations, several variants of the same network may exist. For several practical problems, identifying the nodes that change between the networks provides vital information regarding the dynamics of the network states. Given two networks where the nodes are the same in both networks, but the edges are different, we consider the problem of identifying a set of hubs that best explain the differences between the two networks. For example, identifying the genes that change between two conditions, such as normal versus cancer, is a crucial task in understanding the causes of diseases, and it requires studying the relationships between the genes in both conditions. Differential networking has emerged as a powerful approach to detect the changes in network structure and to identify the differentially connected genes between two networks. However, current differential networking methods primarily depend on pair-wise comparisons of the genes in two networks based on their degrees. Therefore, these methods cannot capture all the topological changes in the network structure.

DIFFRANK METHOD:

Here, we propose a novel ranking algorithm, DiffRank, which ranks the nodes of two networks based on their differential behavior between the two networks. To achieve this goal, we define two novel scoring measures: a local structure measure, differential connectivity, to capture the local differences between two networks based on their weighted edges, and a global structure measure, differential betweenness centrality, to capture the global differences between two networks based on the shortest paths. These measures are optimized by propagating through the network. Finally, the nodes are ranked based on the propagated scores.
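The pipeline described above (a local score, a global score, propagation, then ranking) can be illustrated with a minimal sketch. It is not the authors' implementation: the exact formulas are given in the paper, and everything below is an assumption made for illustration. Differential connectivity is taken as the total absolute change in edge weight at a node, differential betweenness centrality as the absolute change in ordinary (unweighted) betweenness, and propagation as a PageRank-style iteration over the changed edges. The names `diffrank_sketch`, `lam` and `damping` are hypothetical. Networks are symmetric dicts of the form `{node: {neighbor: weight}}`.

```python
from collections import deque

def weight(g, u, v):
    return g.get(u, {}).get(v, 0.0)

def differential_connectivity(g1, g2, nodes):
    # Assumed local measure: total absolute edge-weight change at each node.
    return {v: sum(abs(weight(g1, v, u) - weight(g2, v, u)) for u in nodes)
            for v in nodes}

def betweenness(g, nodes):
    # Brandes' algorithm for an unweighted, undirected graph (weights ignored).
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        order, preds = [], {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0.0)
        sigma[s] = 1.0
        dist = {s: 0}
        queue = deque([s])
        while queue:                      # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in g.get(v, {}):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):         # accumulate path dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2.0 for v, b in bc.items()}  # each pair was counted twice

def normalize(scores):
    total = sum(scores.values())
    return {v: (s / total if total > 0 else 1.0 / len(scores))
            for v, s in scores.items()}

def diffrank_sketch(g1, g2, lam=0.5, damping=0.85, iters=100):
    nodes = sorted(set(g1) | set(g2))
    dc = normalize(differential_connectivity(g1, g2, nodes))
    b1, b2 = betweenness(g1, nodes), betweenness(g2, nodes)
    dbc = normalize({v: abs(b1[v] - b2[v]) for v in nodes})
    # Assumed combination: convex mix of the local and global scores.
    prior = {v: lam * dc[v] + (1 - lam) * dbc[v] for v in nodes}
    diff = {u: {v: abs(weight(g1, u, v) - weight(g2, u, v)) for v in nodes}
            for u in nodes}
    out = {u: sum(diff[u].values()) for u in nodes}
    score = dict(prior)
    for _ in range(iters):                # propagate scores along changed edges
        score = {v: (1 - damping) * prior[v] + damping * sum(
                     score[u] * diff[u][v] / out[u] for u in nodes if out[u] > 0)
                 for v in nodes}
    ranking = sorted(nodes, key=lambda v: -score[v])
    return ranking, score
```

Calling `diffrank_sketch(g1, g2)` returns the nodes ordered from most to least differential together with their propagated scores. Ignoring edge weights in the shortest-path step is a simplification chosen to keep the sketch short.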

RESULTS:

We demonstrate the effectiveness of DiffRank on synthetic and real datasets. For the synthetic dataset, we develop a simulator for generating synthetic differential scale-free networks, and we compare our method with existing methods. The comparisons show that our algorithm outperforms the existing methods. For the real datasets, we apply the proposed algorithm to collaboration networks and to five gene expression datasets.
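To make the idea of a synthetic differential scale-free network concrete, here is a minimal sketch of one way such a pair could be generated. It is an assumption for illustration, not the simulator distributed with the paper. The first network is grown by preferential attachment (which yields a scale-free degree distribution), and the second is a copy in which every edge of a few planted hubs is rewired to a new partner. The function names and the parameters `n`, `m`, `n_hubs` and `seed` are hypothetical.

```python
import random

def scale_free_network(n, m, rng):
    # Preferential attachment: each new node links to m existing nodes,
    # chosen with probability proportional to their current degree.
    edges, pool = set(), []
    for u in range(m + 1):                # seed clique on m + 1 nodes
        for v in range(u + 1, m + 1):
            edges.add((u, v))
            pool += [u, v]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))  # pool holds one entry per edge endpoint
        for t in sorted(chosen):
            edges.add((t, new))           # t < new, so edges stay ordered
            pool += [t, new]
    return edges

def differential_pair(n=50, m=2, n_hubs=3, seed=0):
    rng = random.Random(seed)
    g1 = scale_free_network(n, m, rng)
    neighbors = {v: set() for v in range(n)}
    for u, v in g1:
        neighbors[u].add(v)
        neighbors[v].add(u)
    # Plant the differential hubs at the highest-degree nodes of network 1.
    hubs = sorted(neighbors, key=lambda v: -len(neighbors[v]))[:n_hubs]
    g2 = set()
    for u, v in sorted(g1):
        if u in hubs or v in hubs:
            hub = u if u in hubs else v
            # Rewire to a non-hub node that was not a neighbor in network 1.
            candidates = [w for w in range(n)
                          if w != hub and w not in hubs
                          and w not in neighbors[hub]]
            partner = rng.choice(candidates)
            g2.add((min(hub, partner), max(hub, partner)))
        else:
            g2.add((u, v))                # untouched edges are shared
    return g1, g2, hubs
```

Because the planted hubs are known, a ranking method can be scored by how highly it places them. Two rewired edges may occasionally collapse onto the same pair, so the second network can have slightly fewer edges than the first.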

Co-authorship Networks: In scientific co-authorship networks, the nodes are authors of academic papers and the edges represent co-authorship (or collaboration) relationships between the authors. Two authors may have different relationships in two different research topics such as data mining and databases. The differential hubs in this case include the authors who are highly active in one topic but not active in the other topic, or they may include the authors who are active in both topics but with different collaborators in each topic. Differential networking can also be used to analyze two co-authorship networks that are constructed from two mutually exclusive time intervals to identify the authors whose collaborations change over time (the DBLP dataset). The networks and the results are available in the paper.

Biological Networks: Microarray studies are used to measure the expression level of thousands of genes under different conditions. These conditions could be different tissue types (normal vs cancerous), different subject types (e.g., male vs female), different group types (African-American and Caucasian American), different stages of cancer (early stage vs developed stage) or different time points. Here, the nodes are the genes, and the edges represent the interactions between the genes. Since the genes that have strongly altered connectivity play an important role in the disease phenotype, finding the differential genes can be used in several applications such as identifying disease-causing genes and examining the effects of a certain treatment.

Links to the five gene expression datasets used in the paper: the Leukemia dataset, the Medulloblastoma dataset, the Lung cancer dataset, the Colon cancer dataset, and the Gastric cancer dataset.

The results of the gene expression datasets were evaluated using the DAVID tool. In addition, we compare our results with the previously published results. We show that the proposed method provides biologically interesting rankings. The results of the gene expression data are available here.
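As an illustration of how a gene network can be built for each condition, the sketch below constructs a weighted co-expression network from an expression table. It assumes that edge weights are absolute Pearson correlations between expression profiles and that weak correlations are dropped by a cutoff. This page does not state how the edges were derived, so the weighting, the `threshold` parameter and the function names are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    # Pearson correlation of two equal-length profiles (0.0 if one is constant).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def coexpression_network(expr, threshold=0.0):
    # expr maps gene name -> list of expression values across samples.
    # Returns a symmetric dict {gene: {neighbor: weight}}.
    genes = sorted(expr)
    net = {g: {} for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            w = abs(pearson(expr[g], expr[h]))
            if w > threshold:             # keep only sufficiently strong edges
                net[g][h] = net[h][g] = w
    return net
```

Applying `coexpression_network` once to the samples of each condition (for example normal and cancerous tissue) gives two networks over the same genes, which is the input form that differential ranking expects.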

The R source code for the proposed algorithm and the simulator is available here, and readme.txt is available here.

PUBLICATIONS:

Omar Odibat and Chandan K. Reddy, "Mining Differential Hubs in Homogenous Networks", in Proceedings of the KDD Workshop on Mining and Learning with Graphs (MLG'11), San Diego, CA, 2011. [ PDF ]

Omar Odibat and Chandan K. Reddy, "Ranking Differential Genes in Co-expression Networks", Journal of Bioinformatics and Computational Biology (JBCB), 2011. Invited paper (in press). [ PDF ]