Page Ranking Algorithms – A Comparison Part 2

B. Weighted Page Rank

The weighted page rank algorithm [3] was proposed by Wenpu Xing and Ali Ghorbani as an extension of the PageRank algorithm. Instead of dividing the rank value of a page evenly among the pages it links to, Weighted Page Rank allocates higher rank values to more significant pages: each outgoing link gets a value proportional to the significance of the page it points to.

A brief explanation of the weighted page rank algorithm is given below. In the weighted page rank algorithm, more important (popular) web pages are assigned larger page rank values.

The popularity of a web page depends on the number of its inlinks and outlinks, and each web page gets a proportional page rank value. The popularity of each page can be obtained using the in and out weights, as given below:
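For reference, the two weights are commonly written as follows in the weighted page rank literature. The symbols I_u and O_u (the number of inlinks and outlinks of page u) and the superscripts "in" and "out" are notation assumed here; R(v) is the reference page list of page v:

$$
W^{in}_{(v,u)} = \frac{I_u}{\sum_{p \in R(v)} I_p},
\qquad
W^{out}_{(v,u)} = \frac{O_u}{\sum_{p \in R(v)} O_p}
$$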

Here R(v) is the set of all Web pages that have links from node v (the reference page list of page v). The in-weight depends on the number of inlinks of page u relative to the total number of inlinks of all reference pages of page v; the out-weight depends in the same way on the number of outlinks. The initial page rank for each of the n Web pages is given by PR0 = (PR0(1), PR0(2), …, PR0(n)), with each value set to 1. The formula for computing the weighted page rank of Web page u is given by
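A common way to write this formula, assuming W^in and W^out denote the in and out weights of the link from page v to page u, is:

$$
PR(u) = (1 - d) + d \sum_{v \in B(u)} PR(v)\, W^{in}_{(v,u)}\, W^{out}_{(v,u)}
$$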

Where B(u) is the set of all web pages that point to u, and d denotes the damping factor.
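The computation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `weighted_pagerank`, the dictionary-based graph format, the default damping factor of 0.85, and the fixed iteration count are all assumptions made for the example.

```python
def weighted_pagerank(graph, d=0.85, iterations=50):
    """Sketch of weighted page rank.

    graph maps each page to the list of pages it links to (assumed format).
    """
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    # Outlinks per page, with duplicate links removed.
    out_links = {p: list(dict.fromkeys(graph.get(p, []))) for p in pages}
    # Inlinks per page, derived from the outlinks.
    in_links = {p: [] for p in pages}
    for v, targets in out_links.items():
        for u in targets:
            in_links[u].append(v)

    def w_in(v, u):
        # Inlinks of u relative to the inlinks of all reference pages of v.
        total = sum(len(in_links[p]) for p in out_links[v])
        return len(in_links[u]) / total if total else 0.0

    def w_out(v, u):
        # Outlinks of u relative to the outlinks of all reference pages of v.
        total = sum(len(out_links[p]) for p in out_links[v])
        return len(out_links[u]) / total if total else 0.0

    pr = {p: 1.0 for p in pages}  # initial rank of every page is 1
    for _ in range(iterations):
        # Synchronous update: every new value is computed from the old ranks.
        pr = {
            u: (1 - d) + d * sum(pr[v] * w_in(v, u) * w_out(v, u)
                                 for v in in_links[u])
            for u in pages
        }
    return pr


example = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = weighted_pagerank(example)
```

In this toy graph, page A passes more of its rank to C than to B, because C has more inlinks than B among the pages A references.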

Advantages:

• The quality of the pages returned by this algorithm is higher than that of the PageRank algorithm.
• It is more efficient than PageRank because the rank value of a page is divided among its outlink pages according to the importance of each of those pages.

Disadvantages:

• As this algorithm considers only the link structure and not the content of the page, it may return pages that are less relevant to the user query.

C. HITS Algorithm

Kleinberg developed a Web Structure Mining (WSM) based algorithm named Hyperlink-Induced Topic Search (HITS). It presumes that for every query given by the user, there is a set of authority pages that are relevant to and recognized for the query topic, and a set of hub pages that contain useful links to relevant pages/sites, including links to many authorities. Thus, a good hub page for a subject points to many authoritative pages on that subject, and a good authority page is pointed to by many good hub pages on the same subject. Hubs and Authorities are shown in Fig. 5.

Kleinberg states that a page may be a good hub and a good authority at the same time. This circular relationship leads to the definition of an iterative algorithm called HITS (Hyperlink-Induced Topic Search). The HITS algorithm treats the WWW as a directed graph G(V, E), where V is a set of vertices representing pages and E is a set of edges that correspond to links.

There are two major steps in the HITS algorithm. The first step is the Sampling step and the second step is the Iterative step. In the Sampling step, a set of relevant pages for the given query is collected, i.e. a sub-graph S of G is retrieved which is rich in authority pages. The algorithm starts with a root set R, from which a set S is obtained, keeping in mind that S is comparatively small, rich in pages relevant to the query, and contains most of the good authorities. The second step, the Iterative step, finds hubs and authorities from the output of the Sampling step using:
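With I(p) taken as the set of pages that p links to and B(p) as the set of pages that link to p (the reference and referrer pages of p), the two update rules of the Iterative step are usually written as:

$$
H_p = \sum_{q \in I(p)} A_q,
\qquad
A_p = \sum_{q \in B(p)} H_q
$$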

Where Hp is the hub weight, Ap is the authority weight, and I(p) and B(p) denote the sets of reference and referrer pages of page p, that is, the pages p links to and the pages that link to p. A page's authority weight is proportional to the sum of the hub weights of the pages that link to it; similarly, a page's hub weight is proportional to the sum of the authority weights of the pages that it links to. Fig. 6 shows an example of the calculation of authority and hub scores.
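The Iterative step can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `hits`, the dictionary-based graph format, the fixed number of iterations, and the normalization of the scores to unit length after each round are choices made for the example, not details given in this article.

```python
import math


def hits(graph, iterations=50):
    """Sketch of the HITS iterative step.

    graph maps each page to the pages it links to (assumed format).
    Returns two dictionaries: hub weights and authority weights.
    """
    pages = set(graph) | {q for targets in graph.values() for q in targets}
    out_links = {p: set(graph.get(p, ())) for p in pages}   # I(p)
    in_links = {p: set() for p in pages}                     # B(p)
    for p, targets in out_links.items():
        for q in targets:
            in_links[q].add(p)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority weight: sum of hub weights of the pages linking to p.
        auth = {p: sum(hub[q] for q in in_links[p]) for p in pages}
        # Hub weight: sum of authority weights of the pages p links to.
        hub = {p: sum(auth[q] for q in out_links[p]) for p in pages}
        # Normalize both score vectors so the values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


example = {"h1": ["a1", "a2"], "h2": ["a1"]}
hub_scores, auth_scores = hits(example)
```

In this toy graph, a1 is linked from both hubs and therefore ends up with the highest authority weight, while h1 links to both authorities and ends up with the highest hub weight.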

Advantages of HITS

• HITS scores well because of its ability to rank pages according to the query string, resulting in relevant authority and hub pages.
• The ranking may also be combined with other information retrieval-based rankings.
• HITS is sensitive to user queries (as compared to PageRank).
• Important pages are obtained based on the calculated authority and hub values.
• HITS is a general algorithm for calculating authorities and hubs to rank the retrieved data.
• HITS induces a Web graph by finding a set of pages with a search on a given query string.
• Results demonstrate that HITS calculates authority and hub scores correctly.

Drawbacks of the HITS algorithm

• Since HITS is a query-dependent algorithm, the query-time evaluation is expensive.
• The authority and hub scores can be inflated by mistakes made by web page designers. HITS assumes that when a user creates a web page, he links from his page to an authority page because he honestly believes that the authority page is in some way related to his page (hub); links created for other reasons violate this assumption.
• A page that contains links to a large number of separate topics may receive a high hub rank that is not relevant to the given query. Though this page is not the most relevant source for any information, it still has a very high hub rank if it points to highly ranked authorities.
• HITS emphasizes mutual reinforcement between authority and hub web pages. A good hub is a page that points to many good authorities, and a good authority is a page that is pointed to by many good hubs.
• Topic drift occurs when there are irrelevant pages in the root set and they are strongly connected. Since the root set itself contains non-relevant pages, this carries over to the pages in the base set. The web graph constructed from the pages in the base set will then lack the most relevant nodes, and as a result the algorithm will not be able to find the highest-ranked authorities and hubs for a given query.
• HITS invokes a traditional search engine to obtain a set of pages relevant to the query, expands this set with its inlinks and outlinks, and then attempts to find two types of pages: hubs (pages that point to many pages of high quality) and authorities (pages of high quality).
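The expansion of the root set mentioned in the last point can be sketched as follows. This is only an illustration: the function name `expand_root_set`, the dictionary inputs, and the `max_in` cap on the number of inlinks added per page are assumptions for the example, and the root set is presumed to come from an ordinary search engine query that is not shown here.

```python
def expand_root_set(root, out_links, in_links, max_in=50):
    """Sketch: grow a root set into a base set for HITS.

    root      -- pages returned by a search engine for the query (assumed given)
    out_links -- maps a page to the pages it links to
    in_links  -- maps a page to the pages that link to it
    max_in    -- hypothetical cap on inlinks added per root page
    """
    base = set(root)
    for page in root:
        # Add every page the root page points to.
        base.update(out_links.get(page, ()))
        # Add a bounded number of pages pointing to the root page, so that
        # a very popular page does not flood the base set.
        base.update(sorted(in_links.get(page, ()))[:max_in])
    return base


base_set = expand_root_set(
    root={"r"},
    out_links={"r": ["x"]},
    in_links={"r": ["y", "z"]},
    max_in=1,
)
```

If the root set already contains irrelevant pages, their neighbours are pulled into the base set as well, which is how the topic drift described above can arise.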

CONCLUSIONS

Web mining is the data mining technique that automatically discovers or extracts information from web documents. Page Rank and Weighted Page Rank algorithms are used in Web Structure Mining to rank the relevant pages. Standard search engines usually return a large number of pages in response to users' queries, while the user always wants to get the best results in a short time. Page ranking algorithms, which are an application of web mining, play a major role in making it easier for the user to navigate the results of search engines. The PageRank and Weighted Page Rank algorithms give importance to links rather than to the content of the pages, whereas the HITS algorithm is concerned with the content of web pages as well as their links. After an exhaustive analysis of web page ranking algorithms against parameters such as methodology, input parameters, relevancy of results, and importance of the outcome, it is concluded that the existing techniques have limitations, particularly in terms of response time, accuracy of results, importance of the outcome, and relevancy of results. An efficient web page ranking algorithm should meet these challenges while remaining compatible with the global principles of web technology.