Explain how dead ends are handled in Page Rank.

 

A dead end is a Web page with no links out. The presence of dead ends causes the Page Rank of some or all of the pages to go to 0 in the iterative computation, including pages that are not themselves dead ends.
Dead ends can be eliminated before undertaking a Page Rank calculation by recursively dropping nodes with no arcs out. Note that dropping one node can cause another node, which linked only to it, to become a dead end, so the process must be repeated until no dead ends remain.
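The following is a minimal sketch (not from the original text) of this recursive removal, assuming the Web graph is represented as a dictionary mapping each page to the set of pages it links to:

    def remove_dead_ends(graph):
        # Work on a copy so the caller's graph is untouched.
        graph = {node: set(links) for node, links in graph.items()}
        while True:
            dead = [node for node, links in graph.items() if not links]
            if not dead:
                break
            for node in dead:
                del graph[node]                # drop the dead end itself
            for links in graph.values():
                links.difference_update(dead)  # drop arcs into the dead ends
        return graph

    # Example: C is a dead end; removing it leaves A and B, which link to each other.
    g = {'A': {'B', 'C'}, 'B': {'A'}, 'C': set()}
    print(remove_dead_ends(g))   # {'A': {'B'}, 'B': {'A'}}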
There are two approaches to dealing with dead ends:

  1. We can drop the dead ends from the graph, along with their incoming arcs. Doing so may create more dead ends, which also have to be dropped recursively (as sketched above). Eventually, however, we wind up with a strongly-connected component (SCC) none of whose nodes are dead ends. Recursive deletion of dead ends will remove parts of the out-component, tendrils, and tubes, but leave the SCC and the in-component, as well as parts of any small isolated components.
  2. We can modify the process by which random surfers are assumed to move about the Web. This method, which we refer to as 'taxation', also solves the problem of spider traps. Here, we modify the calculation of Page Rank by allowing each random surfer a small probability of teleporting to a random page, rather than following an out-link from their current page. The iterative step, in which we compute a new Page Rank estimate v' from the current estimate v and the transition matrix M, is (see the sketch after this list):
     v' = βMv + (1-β)e/n
     where β is a chosen constant, usually in the range 0.8 to 0.9,
    e is a vector of all 1's with the appropriate number of components,
    n is the number of nodes in the Web graph.
Because the term (1-β)e/n does not depend on the sum of the components of the vector v, there will always be some fraction of a surfer operating on the Web. That is, when there are dead ends, the sum of the components of v may be less than 1, but it will never reach 0.
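Below is a minimal sketch (not from the original text) of the taxation iteration v' = βMv + (1-β)e/n, assuming column j of M holds the out-link probabilities of page j and using β = 0.85 as an assumed value in the 0.8-0.9 range mentioned above:

    import numpy as np

    def pagerank_taxation(M, beta=0.85, iterations=50):
        n = M.shape[0]
        v = np.full(n, 1.0 / n)            # start from a uniform distribution
        teleport = (1.0 - beta) / n        # the (1-beta)e/n term, per component
        for _ in range(iterations):
            v = beta * (M @ v) + teleport  # v' = beta*M*v + (1-beta)e/n
        return v

    # Example: page 2 is a dead end (its column of M is all zeros), yet every
    # component of v stays positive; their sum drops below 1 but never reaches 0.
    M = np.array([[0.0, 0.5, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.5, 0.0]])
    print(pagerank_taxation(M))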