Projects :: Evolution of complex networks - how does the Internet grow?
The principal goal of the project is familiarizing with the basic concepts and models of complex networks, studying the structure of complex networks by modelling their growth, developing the software for the analysis of complex networks, application to technological and/or sociological and/or biological problems. Special emphasis is put on the description of complex networks appearing in nature and society, such as the Internet. The model of preferential attachment of Albert and Barabási, along with some of its elaborations, is studied in particular.
Complex networks serve as a new way of representing very diverse systems in nature, biology, economics and society. Their main elements are nodes and links.
Description of complex networks
- node degree k is the number of links attached to that node
- node degree distribution P(k) is the probability that a randomly chosen node in a network has the degree k
- clustering coefficient is the ratio of the number of realized links between a node's first neighbouring nodes and the maximal possible number of links between these nodes
- average shortest path length
Scale-free networks are networks which are characterized by a straight line in the log k - log P(k) plot. A large majority of complex networks in nature and society are scale-free networks.
Exponential networks are networks which have an exponential log k - log P(k) plot.
Burning algorithm is used to determine individual components of a complex network and to calculate the average shortest path. It functions like spreading of fire through the network, each node that is on fire spreads the fire to its first neighbours unless they are already on fire.
Adjacency matrices are used for representation of complex networks . In an adjacency matrix each member (Cij) equals 1 if nodes i and j are connected and 0 if they are not connected. There are also some other properties of adjacency matrices and these are: Cij = Cji and Cii = 0.
Examining various network models
We decided that the best point to start was the Albert- Barabási model. We studied available material and then created a computer program that grew an AB network. Each new node in the AB network was preferentially attached to an old node. The new node was more likely to attach itself to an existing node with more connections. Probability of attachment was therefore proportional to the degree of the old node. Then we could study the network closely and analyse it. We made probability distribution and cumulative probability distribution plots and saw a straight line in a log - log plot. After running the simulation for a few times, each time we saw a straight line. We concluded that it was one of AB network properties and that all AB networks would have a straight line in the log - log plot. This is why AB networks are scale-free networks. If we take any part of the plot, we do not know what part of the plot it is because every part looks the same (straight line).
We also studied one alternative to preferential attachment which is called random attachment. In random attachment each new node is attached to one old node, but the probability of attachment is the same for all nodes. After running the simulation and creating the cumulative distribution plot, we noticed that it was not a straight line in the log - log plot. It had exponential distribution. Therefore we concluded that preferential linking was needed in order to obtain scale-free networks.
Now after looking at these two models we decided to create a hybrid model by combining power law and exponential networks. We introduced a new parametar (p) into the simulation. Now when a new node was added, we randomly chose between the Albert-Barabási model (with probability p) and the random model (with probability 1-p). After looking at the cumulative probability distributions for various values of p, we noticed that as p was closer to 1, the network was more like a scale-free network than exponential network, and as it was closer to 0 the network was more like exponential than scale-free. If p = 1 then we had a scale-free network and if p = 0 we had an exponential network.
We also created a model in which we used generalized preferential linking to attach a new node. A new parameter b was introduced. Now the probability of a new node attaching itself to an existing node was proportional to the node degree of the old node to the power of b. We tried different values for b and noticed that as b = 0 we had an exponential network. As b approached 1, the network was more and more like a scale-free network, and if b > 1, then we noticed condensation of links to a single node (most of the nodes were connected to the same node).
In order to increase the efficiency of a network, we need to optimize it. First of all, we need to use the burning algorithm to identify individual components of a complex network. After that we calculate the average shortest path length. For example, if we are dealing with a two-component network, we have to connect one node from the first component to one node from the second component and then calculate the average shortest path length of the thus formed network. We repeat the procedure for all such pairs of nodes. In the end, the combination of one node from the first and one from the second component with the minimum average shortest path is used as best optimization of a network using the addition of one link.
Analysis of the human TNF- a /NF- K B signal transduction pathway network
While creating simulations, we also developed programs for complex network analysis. Thanks to Danijel (from the molecular biology team), who found a network we could analyse, we used these programs to analyse a real complex network. We analysed the human TNF- a /NF- K B signal transduction pathway network. The proteins in this network were nodes and protein interactions were links between the nodes. First of all, from a protein interaction list we created an adjacency matrix that our programs could then use to analyse the network. After the analysis we created a cumulative probability distribution for the node degree plot. We concluded that this network was a real network since the tail of the log k - log Pcum(k) plot was more or less a straight line. We could also observe a sharp cut-off in the tail because this is a small network. (Click here for image with better resolution.)
After the initial analysis we tried to remove individual nodes and then looked how that affected the number of components and the average shortest path of the network. We could clearly see that the number of components increased greatly only after removal of certain nodes. We could also see that the average shortest path acted in the same way. This behaviour is typical of real networks (scale-free networks). Real networks are resistant to failure of most nodes, but are not resistant to terrorist attacks (when nodes of great importance are purposely removed). It was also very interesting that after removing nodes that produced a large difference in number of components, the average shortest path increased the most and vice versa. (Click here for image with better resolution.)
The team has performed numerous simulations using the software developed by the team members. We have also created many plots, animations and graphical simulations to better present out results. A real biological network has been completely analysed by the software developed during the project.
You can download the final presentation here (PDF, 1.5 MB).
Complex networks team (from left to right): Branko Durdevic, Nino Antulov Fantulin,
Hrvoje Stefancic, Fabian Bremer. See more photos.