Parse tree validation is a phishing-detection technique in cyber security that can also reveal the targets of hackers.
In the parsing phase, traversal starts from the root node and follows the Depth-First Search (DFS) algorithm to check whether any child node has the same value as the root node.
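Below is a minimal sketch of this DFS pass. It assumes the tree is held in a hypothetical `Node` class whose `value` field stores a domain name. The function counts every node in the tree and how many descendants repeat the root's value. The text does not say whether the root itself is included in these counts. Here it is counted in the total but not as a repetition; that choice is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: a domain name plus its child links."""
    value: str
    children: list["Node"] = field(default_factory=list)


def count_root_repetitions(root: Node) -> tuple[int, int]:
    """Depth-first walk returning (total_nodes, root_repetitions).

    The root is included in total_nodes but is not counted as a
    repetition of itself (an assumption; the source does not say).
    """
    total = 0
    repeats = 0
    stack = [root]  # an explicit stack gives depth-first order
    while stack:
        node = stack.pop()
        total += 1
        if node is not root and node.value == root.value:
            repeats += 1
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return total, repeats


tree = Node("example.com", [
    Node("example.com"),
    Node("other.net", [Node("example.com")]),
])
print(count_root_repetitions(tree))  # (4, 2)
```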
How to identify a legitimate or phishing website?
The number of times the root node is repeated in the tree is used to estimate the probability that a website is legitimate or phishing.
To calculate this probability, first count the total number of nodes in the tree.
If the number of repetitions of the root node is greater than half the total number of nodes, the probability of legitimacy is high.
If the number of repetitions of the root node is equal to half the total number of nodes, the probability of legitimacy is medium.
Otherwise, if the number of repetitions is less than half, the probability of phishing is high, and the website is suspected of being a phishing site.
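The three-way rule can be written as a small function. This sketch takes the two counts directly, and the returned labels are illustrative strings. It compares `2 * repeats` with the total using integers, so the "exactly half" case is tested without floating-point rounding.

```python
def classify(total_nodes: int, root_repeats: int) -> str:
    """Apply the half-of-total-nodes rule to a parse tree's counts."""
    if total_nodes <= 0:
        raise ValueError("tree must contain at least one node")
    doubled = 2 * root_repeats  # compare repeats with total / 2 exactly
    if doubled > total_nodes:
        return "legitimate (high probability)"
    if doubled == total_nodes:
        return "legitimate (medium probability)"
    return "phishing (high probability)"


print(classify(54, 0))  # phishing (high probability)
```

A tree like the paypai.com example discussed below (54 nodes, no repetitions of the root) lands in the phishing branch.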
Legitimate Website – Targets of Hackers
The tree constructed for the legitimate website Onlinesbi.com is shown in the picture. Onlinesbi.com is the root node, and the next levels are the top ten results related to onlinesbi.com.
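The construction described above can be sketched as a recursive builder. The source does not say how the "top ten results" are obtained or how many levels are expanded. The `related` lookup and the `depth` parameter below are therefore hypothetical, and a toy dictionary stands in for real search results.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    """Hypothetical tree node: a domain name plus its child links."""
    value: str
    children: list["Node"] = field(default_factory=list)


def build_tree(
    domain: str,
    related: Callable[[str], list[str]],
    depth: int,
    top_k: int = 10,
) -> Node:
    """Build a tree whose children are the top_k related results of each node.

    `related` is a hypothetical lookup (e.g. a wrapper around search results
    or a page's hyperlinks); `depth` limits how many levels are expanded.
    """
    node = Node(domain)
    if depth > 0:
        for child in related(domain)[:top_k]:
            node.children.append(build_tree(child, related, depth - 1, top_k))
    return node


# Toy stand-in for real search results.
links = {
    "example.com": ["example.com", "news.example.com", "example.com"],
    "news.example.com": ["example.com"],
}
tree = build_tree("example.com", lambda d: links.get(d, []), depth=2)
print(len(tree.children))  # 3
```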
In this example, the total number of nodes is 56, and the root node Onlinesbi.com is repeated 28 times.
Since 28 is exactly half of 56, the rule above gives a medium probability of legitimacy, well outside the phishing band. Hence, it is concluded that Onlinesbi.com is a legitimate website.
The next website considered is paypai.com. The tree constructed for paypai.com is shown in the picture.
Here the total number of nodes is 54, but the root node paypai.com is not repeated anywhere in the inner hyperlinks. So the probability that it is a phishing website is high.
Moreover, the genuine website paypal.com occurs three times. When www.paypal.com is checked against the whitelist, it is found there, so paypal.com is the phishing target of paypai.com.
Discovery of targets of hackers
The tree construction method not only detects phishing websites but also reveals the targets of hackers. The target is found using the following steps.
First, each node is matched against the whitelist; if a node matches a whitelist entry, that node is the phishing target website.
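Step one amounts to a set lookup. This sketch assumes the tree has already been flattened into a list of node values, for instance by the DFS walk described earlier. `whitelist` is a hypothetical set of known legitimate domains. Stripping a leading `www.` and lowercasing mirror the paypal.com example, but how entries are normalised is an assumption.

```python
def whitelist_targets(node_values: list[str], whitelist: set[str]) -> list[str]:
    """Return node values found in the whitelist, in first-seen order."""
    found = []
    seen = set()
    for value in node_values:
        # Normalise case and a leading "www." before comparing (an assumption).
        key = value.lower().removeprefix("www.")
        if key in whitelist and key not in seen:
            seen.add(key)
            found.append(key)
    return found


nodes = ["paypai.com", "www.paypal.com", "paypal.com", "example.org"]
print(whitelist_targets(nodes, {"paypal.com"}))  # ['paypal.com']
```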
Second, if none of the child nodes has the value of the root node, but one particular value occurs more often than any other among the child nodes, that value is the phishing target.
For example, in the tree for paypai.com, no child node has the value paypai.com, while paypal.com is the value that occurs most often.
Hence, the phishing target of paypai.com is paypal.com.
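Step two can be sketched with `collections.Counter`. Following the paypai.com example, the "particular node" is read here as the most frequent value among the child nodes. The sample values in the usage line are made up for illustration. Ties are broken in favour of the value seen first.

```python
from collections import Counter
from typing import Optional


def majority_target(root_value: str, child_values: list[str]) -> Optional[str]:
    """Return the most frequent child value when no child repeats the root."""
    if not child_values or root_value in child_values:
        return None  # step two applies only when the root never reappears
    value, _count = Counter(child_values).most_common(1)[0]
    return value


children = ["paypal.com", "ebay.com", "paypal.com", "google.com"]
print(majority_target("paypai.com", children))  # paypal.com
```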
If either of these two steps finds a target, text matching is then performed between the phishing target and the root node, i.e., the phishing site.
In the example above, the text of paypai.com is matched against that of paypal.com.
Text matching in the discovery of phishing targets
A phishing webpage usually uses text that is similar, or even identical, to that of its target page in order to deceive visitors.
If the text on a suspicious webpage is very similar to that on an associated well-known webpage, but the two webpages have different domain names, the suspicious page is classified as a phishing webpage.
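The source does not name a similarity measure. This sketch therefore substitutes the standard library's `difflib.SequenceMatcher` ratio with an arbitrary 0.8 threshold; both choices are assumptions. The page texts are passed in as strings, because fetching and extracting them is outside the scope of the example.

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse


def is_phishing_by_text(
    suspect_url: str,
    suspect_text: str,
    target_url: str,
    target_text: str,
    threshold: float = 0.8,  # hypothetical cut-off, not from the source
) -> bool:
    """Flag a page whose text closely matches the target's but whose domain differs."""
    suspect_host = (urlparse(suspect_url).hostname or "").removeprefix("www.")
    target_host = (urlparse(target_url).hostname or "").removeprefix("www.")
    if suspect_host == target_host:
        return False  # same domain: not an impersonation
    similarity = SequenceMatcher(None, suspect_text, target_text).ratio()
    return similarity >= threshold


print(is_phishing_by_text(
    "http://paypai.com/login", "Log in to your PayPal account",
    "https://www.paypal.com/signin", "Log in to your PayPal account",
))  # True
```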
Some sample phishing data are shown below:

| URL | Reason | Status |
|---|---|---|
| google.com.net | Phishing URL | Not Allowed |
| paypai.com | Phishing URL | Not Allowed |
| convert.money.net | Contains a phishing hyperlink | Not Allowed |
| 184.108.40.206 | Phishing IP address in a hyperlink | Not Allowed |
| Facebook-cgi.com | Phishing string | Not Allowed |
| wordnet.com/index/www.skfi.net/root/login.jsp | Fake URL in the URL path | Not Allowed |
| gaccount.com/upate/rDir=http://www.update.net/login.jsp | Redirects the link to another website | Not Allowed |
| onlineindianbank.net | Phishing site that looks legitimate | Not Allowed |
In this article, we have discussed parse tree validation for determining whether a website is phishing or legitimate. The method also helps identify the targets of hackers, that is, which website the hackers are trying to imitate.