Real-world graphs often span a large number of nodes and interactions between them. Comparing large graphs is usually
computationally hard due to the NP-completeness of the underlying subgraph isomorphism problem.
Thus graph comparisons rely on easily computable heuristics called graph properties.
Graph properties give an overall view of the network but might not be detailed enough to capture the complex
topological characteristics of large networks.
In this section, we review some of the most popular graph properties.
Later, you will learn alternative ways to compare and analyze graphs.
As you recall, the degree of a node
refers to the number of edges incident to the node. Now, if we average the degrees over all nodes in the graph,
we have a global measure for the whole system: the average degree.
Let P(k) be the fraction of nodes of degree k in the network.
The degree distribution is the distribution of P(k) over all k.
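As a minimal sketch of these two definitions, the snippet below computes the average degree and P(k) for a small graph. The edge list and the function names are illustrative assumptions, not something taken from the lecture.

```python
from collections import Counter

def degrees(edges):
    """Return {node: degree} for an undirected simple graph given as an edge list.

    Nodes that appear in no edge are not represented in this sketch.
    """
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def degree_distribution(edges):
    """Return {k: P(k)}, the fraction of nodes that have degree k."""
    deg = degrees(edges)
    n = len(deg)
    counts = Counter(deg.values())
    return {k: c / n for k, c in sorted(counts.items())}

# Hypothetical example: a triangle 1-2-3 with a pendant node 4 attached to node 3.
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
deg = degrees(edges)
average_degree = sum(deg.values()) / len(deg)
print(average_degree)              # 2.0
print(degree_distribution(edges))  # {1: 0.25, 2: 0.5, 3: 0.25}
```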
Much of the recent research on the structure of biological networks and other real networks has focused on determining
the form of their degree distributions.
However, degree distributions are weak predictors of network structure.
For instance, two graphs G1 and G2 can have the same size and the same degree distribution
and still have very different
topologies.
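To make this concrete, here is a small sketch with two hypothetical graphs, chosen for illustration and not necessarily the ones from the lecture: a single 6-node cycle and two disjoint triangles. Every node has degree 2 in both, yet one graph is connected and the other is not.

```python
from collections import Counter, defaultdict

def degree_sequence(edges):
    """Sorted list of node degrees for an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sorted(deg.values())

def connected_components(edges):
    """Count connected components with a simple depth-first search."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, count = set(), 0
    for start in list(adj):
        if start in seen:
            continue
        count += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(adj[node] - seen)
    return count

# Hypothetical graphs: both have 6 nodes, 6 edges, and all degrees equal to 2.
g1 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]  # one 6-cycle
g2 = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]  # two triangles

print(degree_sequence(g1) == degree_sequence(g2))           # True
print(connected_components(g1), connected_components(g2))   # 1 2
```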
Clustering coefficients were introduced by Watts and Strogatz in
1998 as a way to measure how close a node and its neighbors are to being a complete subgraph.
Formally, the clustering coefficient Cv of a node v is equal to the number of edges in the neighborhood of v over the
maximum possible number of edges
in the neighborhood of v.
For a vertex v of degree 0 or 1, by definition,
Cv equals 0.
The clustering coefficient
for the entire system is the average of the clustering coefficient over all nodes in the network.
The clustering coefficient Cv is between 0 and 1 and can be viewed as the probability that two neighbors of v are
connected. In a highly clustered network, the neighbors of a given node are very likely to be themselves linked by an edge.
The clustering spectrum is the distribution of the average clustering coefficient of all nodes of degree k in the network,
over all k.
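The definitions above can be sketched in a few lines. The example graph (a triangle with one pendant node) and the function names are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

def build_adjacency(edges):
    """Adjacency sets for an undirected simple graph."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def clustering_coefficient(adj, v):
    """C_v = edges among the neighbors of v / maximum possible such edges."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # by definition for nodes of degree 0 or 1
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

def average_clustering(adj):
    """Average of C_v over all nodes."""
    nodes = list(adj)
    return sum(clustering_coefficient(adj, v) for v in nodes) / len(nodes)

def clustering_spectrum(adj):
    """Return {k: average C_v over all nodes of degree k}."""
    by_degree = defaultdict(list)
    for v in list(adj):
        by_degree[len(adj[v])].append(clustering_coefficient(adj, v))
    return {k: sum(cs) / len(cs) for k, cs in sorted(by_degree.items())}

# Hypothetical example: triangle 1-2-3 plus a pendant node 4 attached to node 3.
adj = build_adjacency([(1, 2), (2, 3), (1, 3), (3, 4)])
print(round(average_clustering(adj), 3))                              # 0.583
print({k: round(c, 3) for k, c in clustering_spectrum(adj).items()})  # {1: 0.0, 2: 1.0, 3: 0.333}
```

Node 3 has three neighbors with only one edge among them, so its coefficient is 1/3, while nodes 1 and 2 sit in a closed triangle and get 1.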
Consider this adjacency matrix. The number of possible connections in this matrix is 6, and
the clustering coefficient of v1 and of v2 is
2/3. Nodes v3 and v4 each have a clustering coefficient of
1/3, so the average of those four clustering coefficients is 0.5.
Typically the first step in studying the clustering and modular properties of a network is to calculate its average
clustering coefficient and its distribution.
It has been shown that the clustering coefficient of a metabolic network is at least an order of magnitude
higher than that of the corresponding random network.
The distance between two nodes is the smallest number of links that have to be traversed to get from one node to the other.
A path that achieves that distance is called a shortest path.
The average network diameter, also known as the average path length, is the average of the shortest path lengths over all pairs of nodes in a network.
Alternatively, one can represent a graph by the distances between its nodes, in a so-called distance matrix,
where each element dij is the distance between node i and node j.
For instance, in this graph the shortest way to go from node 1 to node 5
travels along
three links,
so d1,5 equals 3. For the same graph, if we average over all
shortest path lengths, we obtain the average network diameter.
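Here is a minimal sketch of how a distance matrix and the average over all shortest path lengths could be computed with breadth-first search. The graph is a hypothetical one, chosen so that the distance from node 1 to node 5 is 3; it is not necessarily the graph shown in the lecture, and the sketch assumes the graph is connected and unweighted.

```python
from collections import deque, defaultdict

def distance_matrix(edges):
    """Return (nodes, d) where d[i][j] is the number of links on a shortest path."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    nodes = sorted(adj)
    d = {}
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        d[source] = dist
    return nodes, d

def average_path_length(edges):
    """Average shortest path length over all unordered node pairs (connected graph)."""
    nodes, d = distance_matrix(edges)
    pairs = [(i, j) for i in nodes for j in nodes if i < j]
    return sum(d[i][j] for i, j in pairs) / len(pairs)

# Hypothetical graph with five nodes.
edges = [(1, 2), (2, 3), (3, 4), (2, 4), (4, 5)]
nodes, d = distance_matrix(edges)
print(d[1][5])                     # 3
print(average_path_length(edges))  # 1.6
```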
In a sense, the average path length in a network is an indicator of how readily
information can be transmitted through it.
In biological networks,
it has been observed that only a small number of intermediate steps are necessary for any one protein, gene, or
metabolite to influence the characteristics or behavior of another.
We often have reason to believe that the elements at the center of a system are very important. The problem of
identifying the most important nodes in a large complex network is of fundamental importance in a number of application areas
including biology,
communication, sociology, and management.
Today, several measures have been devised for ranking the nodes in a complex network and quantifying their relative importance.
For instance,
centrality measures have been used to predict the essentiality of a gene or protein based on network topology.
A gene or protein is said to be essential for an organism if the organism cannot survive without it.
In this section,
we describe three standard centrality measures that capture a wide range of notions of importance in a network: degree,
closeness, and betweenness.
The most intuitive notion of centrality focuses on degree.
Nodes with a large number of neighbors have high centrality. Therefore, we define the degree centrality as
Cd(v) = deg(v).
Nodes with significantly higher degree are called hub nodes. The removal of these hub nodes has a far greater impact on the
topology and connectedness of the network than the removal of a node of low degree.
Degree centrality,
however, can be deceiving because it is a purely local measure.
A second measure of centrality is closeness centrality. A node is considered important if it is relatively close to all other nodes.
Closeness centrality is defined in terms of the geodesic distance between nodes in a graph or network.
Nodes with short paths to all other nodes in the network have high closeness centrality.
The closeness of a node is formally defined as the inverse of the sum of
its distances to every other node in the network.
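As a rough sketch of this definition, closeness can be computed from breadth-first-search distances. The path graph used here and the function name are illustrative assumptions, and the code assumes a connected, unweighted graph.

```python
from collections import deque, defaultdict

def closeness_centrality(edges):
    """Closeness of v = 1 / (sum of shortest-path distances from v to all other nodes)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    closeness = {}
    for source in list(adj):
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives geodesic distances
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total = sum(dist.values())
        closeness[source] = 1 / total if total > 0 else 0.0
    return closeness

# Hypothetical example: the path a - b - c - d - e.
closeness = closeness_centrality([("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")])
print(max(closeness, key=closeness.get))  # c
```

The middle node c is at total distance 6 from the others, while the end node a is at total distance 10, so c has the highest closeness.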
Last but not least, we introduce betweenness centrality.
Freeman in 1978
introduced the concept of betweenness centrality as a means of quantifying an
individual's influence within a social network.
The idea behind this centrality measure is that an
important node will lie on a high proportion of the paths between other nodes in a graph.
Formally, the betweenness centrality of a node j counts the shortest paths between pairs of nodes i and k that node j resides on.
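The sketch below is one way to compute this. It uses a common convention, which is an assumption beyond what is stated above: when a pair of nodes has several shortest paths, each pair contributes the fraction of its shortest paths that pass through j. The function names and example graph are hypothetical, and the brute-force loop over pairs is meant for small graphs only.

```python
from collections import deque, defaultdict
from itertools import combinations

def bfs_counts(adj, source):
    """Return (dist, sigma): distances and numbers of shortest paths from source."""
    dist = {source: 0}
    sigma = defaultdict(int)
    sigma[source] = 1
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
            if dist[nb] == dist[node] + 1:
                sigma[nb] += sigma[node]  # extend every shortest path that reaches node
    return dist, sigma

def betweenness_centrality(edges):
    """For each node j, sum over pairs (s, t) the fraction of shortest s-t paths through j."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    nodes = sorted(adj)
    info = {s: bfs_counts(adj, s) for s in nodes}
    score = {j: 0.0 for j in nodes}
    for s, t in combinations(nodes, 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue  # s and t are not connected
        for j in nodes:
            if j == s or j == t:
                continue
            dist_j, sigma_j = info[j]
            # j lies on a shortest s-t path exactly when the distances add up.
            if j in dist_s and t in dist_j and dist_s[j] + dist_j[t] == dist_s[t]:
                score[j] += sigma_s[j] * sigma_j[t] / sigma_s[t]
    return score

# Hypothetical example: the path a - b - c - d - e.
path = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
print(betweenness_centrality(path))
# {'a': 0.0, 'b': 3.0, 'c': 4.0, 'd': 3.0, 'e': 0.0}
```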
Consider this graph.
As we can see, different nodes are important according to different centrality measures.
D has the highest degree centrality, F and G have the highest closeness
centrality, and H has the highest betweenness centrality.
Eigenvalues are a special set of scalars associated with a matrix.
Determining the eigenvalues and eigenvectors of a system is extremely important in physics and engineering.
Each eigenvalue is paired with a corresponding so-called
eigenvector.
To obtain the eigenvalues of a matrix A, we solve the equation Aν = λν,
where λ is an eigenvalue and ν is the corresponding eigenvector.
If we solve this equation for the adjacency matrix of a graph, we obtain the eigenvalues of that graph.
The set of all such scalars is called the graph spectrum.
However, in physics the set of eigenvalues of the Laplacian matrix of a graph is called the graph spectrum.
The graph Laplacian, denoted by L, is constructed by subtracting the adjacency matrix from the diagonal degree matrix.
The smallest eigenvalue of L is 0, and the corresponding eigenvector is the constant all-ones vector.
L has n non-negative real-valued
eigenvalues.
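A short derivation shows why these statements hold, assuming an undirected graph and writing L = D − A, where D is the diagonal degree matrix and A is the adjacency matrix. The symbols D, x, d_i and the edge set E are notation introduced here for illustration.

```latex
x^{\top} L x
  = \sum_{i} d_i x_i^{2} - \sum_{i,j} A_{ij}\, x_i x_j
  = \sum_{(i,j) \in E} (x_i - x_j)^{2} \;\ge\; 0,
\qquad
L\,\mathbf{1} = D\,\mathbf{1} - A\,\mathbf{1} = \mathbf{0}.
```

Because L is real and symmetric, it has n real eigenvalues. The first relation shows that none of them is negative, and the second shows that the all-ones vector is an eigenvector with eigenvalue 0.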
Two graphs are called cospectral if their adjacency matrices have the same multiset of eigenvalues.
Cospectral graphs do not need to be isomorphic, but isomorphic graphs are always cospectral.
Here is an example of two cospectral graphs.
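One way to check cospectrality without a numerical library is to compare characteristic polynomials, since two matrices have the same multiset of eigenvalues exactly when their characteristic polynomials agree. The sketch below uses the Faddeev-LeVerrier recurrence on a well-known cospectral pair, the star on five nodes and a 4-cycle plus an isolated node. This pair is chosen for illustration and is not necessarily the example shown in the lecture.

```python
def adjacency_matrix(n, edges):
    """Symmetric 0/1 adjacency matrix for an undirected graph on nodes 0..n-1."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1
    return a

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def characteristic_polynomial(a):
    """Coefficients [1, c_{n-1}, ..., c_0] of det(xI - A), via Faddeev-LeVerrier."""
    n = len(a)
    coeffs = [1]
    m = [[0] * n for _ in range(n)]  # M_0 is the zero matrix
    for k in range(1, n + 1):
        # M_k = A M_{k-1} + c_{n-k+1} I, where c_{n-k+1} is the latest coefficient.
        am = mat_mul(a, m)
        m = [[am[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        # c_{n-k} = -trace(A M_k) / k; the division is exact for integer matrices.
        a_mk = mat_mul(a, m)
        trace = sum(a_mk[i][i] for i in range(n))
        coeffs.append(-trace // k)
    return coeffs

star = adjacency_matrix(5, [(0, 1), (0, 2), (0, 3), (0, 4)])
cycle_plus_isolated = adjacency_matrix(5, [(0, 1), (1, 2), (2, 3), (3, 0)])

print(characteristic_polynomial(star))                 # [1, 0, -4, 0, 0, 0]
print(characteristic_polynomial(cycle_plus_isolated))  # [1, 0, -4, 0, 0, 0]
# The degree sequences differ, so the two graphs cannot be isomorphic.
print(sorted(map(sum, star)), sorted(map(sum, cycle_plus_isolated)))
# [1, 1, 1, 1, 4] [0, 2, 2, 2, 2]
```

Both polynomials equal x^5 − 4x^3, so both graphs have eigenvalues 2, 0, 0, 0, −2, even though one is connected and the other is not.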
The final aspect of network structure which we shall discuss here is concerned with small topological patterns.
Milo and others showed that networks from diverse fields, biological and non-biological,
contain several small
topological patterns that are so frequent that they are unlikely to occur by chance.
Moreover, different networks tend to have different sets of such frequent local structures.
These patterns, referred to as network motifs, are recognized as the simple building blocks of complex networks.
An algorithm for finding n-node network motifs goes as follows: first, find all n-node circuits (subgraphs) in the real graph.
For example, there are 13 possible types of connected directed 3-node subgraphs,
199 types of 4-node subgraphs, and so on.
Then find all n-node subgraphs in a set of randomized graphs with the same distribution
of incoming and outgoing arrows.
Finally, assign a p-value to each n-node subgraph: the probability that it occurs at least as often in a randomized graph as in the real
graph.
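The three steps can be sketched for a single subgraph type. The code below is a simplified illustration under several assumptions that go beyond the description above: it counts only one of the 13 three-node patterns (the feed-forward loop a→b, b→c, a→c), it randomizes by repeated edge swaps, which is one common way to keep every node's numbers of incoming and outgoing arrows fixed, and it estimates the p-value empirically. All names, parameters, and the example network are hypothetical.

```python
import random

def count_feed_forward_loops(edges):
    """Count ordered triples (a, b, c) with edges a->b, b->c and a->c."""
    edge_set = set(edges)
    successors = {}
    for u, v in edge_set:
        successors.setdefault(u, set()).add(v)
    count = 0
    for a, b in edge_set:
        for c in successors.get(b, ()):
            if c != a and (a, c) in edge_set:
                count += 1
    return count

def randomize(edges, n_swaps, rng):
    """Degree-preserving randomization: swap the targets of two random edges."""
    edge_list = list(edges)
    edge_set = set(edge_list)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        # a->b, c->d would become a->d, c->b.
        if a == d or c == b or (a, d) in edge_set or (c, b) in edge_set:
            continue  # skip swaps that create a self-loop or a duplicate edge
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list

def motif_p_value(edges, n_random=200, n_swaps=100, seed=0):
    """Return (count in real graph, fraction of random graphs with count >= real)."""
    rng = random.Random(seed)
    real = count_feed_forward_loops(edges)
    hits = sum(
        count_feed_forward_loops(randomize(edges, n_swaps, rng)) >= real
        for _ in range(n_random)
    )
    return real, hits / n_random

# Hypothetical directed network containing four feed-forward loops.
network = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4),
           (4, 5), (5, 6), (4, 6), (6, 7), (7, 0), (6, 0)]
real_count, p_value = motif_p_value(network)
print(real_count)  # 4
print(p_value)     # empirical estimate; depends on the seed and the parameters
```

A small p-value suggests that the pattern is over-represented relative to this particular randomization scheme. A different choice of random graph model could give a different answer.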
So a network motif is a small,
over-represented partial subgraph of a real network.
Here, over-represented means that it is
over-represented when compared to a network coming from a random graph model. But what is expected at random?
In other words, which network null
model should we use to identify
motifs?
