Client-side phishing filtering techniques include URL verification, parse tree validation, and the behavioural model approach.
1. URL-based client-side phishing filtering techniques
The URL plays the main role in client-side phishing.
Faked URLs are used to disguise phishing sites as the original ones. The figure shows an example of a phishing page: the URL used is http://www.paypal.login.hu, whereas the original URL is www.paypal.com. This example is taken from netcraft.com.
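To make the deception concrete, this short sketch extracts the hostname of each URL and keeps its last two labels as a naive stand-in for the registered domain. It shows that the page in the figure is controlled by login.hu, with "paypal" reduced to a subdomain label. The two-label rule is an assumption made for brevity; it fails for suffixes such as .co.uk, where a Public Suffix List lookup would be needed.

```python
from urllib.parse import urlsplit


def naive_registered_domain(url: str) -> str:
    """Return the last two labels of the URL's hostname.

    Assumption: treating the last two labels as the registered domain is
    a simplification; it is wrong for multi-label suffixes such as .co.uk.
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


for url in ("http://www.paypal.login.hu", "http://www.paypal.com"):
    print(url, "->", naive_registered_domain(url))
# http://www.paypal.login.hu -> login.hu
# http://www.paypal.com -> paypal.com
```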
The first client-side phishing filtering technique is URL verification, which is implemented and tested on a set of URLs.
This technique consists of three components: a URL validator, a blacklist verifier, and a hyperlink validator.
In the URL validator, when a user enters a URL in the browser's address bar, the requested URL is intercepted and tokenized to verify its structure.
If the tokenized structure contains a non-standard part, the request is not for a legitimate website.
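A minimal sketch of the tokenize-and-check step, assuming Python's urllib.parse does the tokenizing. The source does not list which parts it treats as non-standard, so the rules here (a non-HTTP scheme, embedded user info, a raw IP address as host, or a hostname label containing anything other than letters, digits and inner hyphens) are illustrative assumptions, and `structure_issues` is a hypothetical helper name.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Hypothetical rule: a hostname label is letters/digits with optional inner hyphens.
LABEL_RE = re.compile(r"^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$")


def structure_issues(url: str) -> list:
    """Tokenize a URL and list its non-standard parts (empty list = standard)."""
    parts = urlsplit(url)
    issues = []
    if parts.scheme not in ("http", "https"):
        issues.append("scheme")
    if "@" in parts.netloc:
        issues.append("userinfo")  # e.g. http://paypal.com@host/ hides the real host
    host = parts.hostname or ""  # urlsplit lower-cases the hostname
    if not host:
        issues.append("missing host")
        return issues
    try:
        ipaddress.ip_address(host)
        issues.append("ip host")
    except ValueError:
        if not all(LABEL_RE.match(label) for label in host.split(".")):
            issues.append("bad label")
    return issues
```

For example, `structure_issues("http://paypal.com@192.168.0.1/login")` returns `["userinfo", "ip host"]`, while http://www.paypai.com passes every check, which is exactly the gap a blacklist has to close.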
However, a URL such as http://www.paypai.com has a standard structure but is not legitimate. Phishing sites of this kind are caught by the blacklist verifier, which uses phishtank.com as its blacklist database.
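Because requested URLs are matched against the blacklist by domain name, the lookup can be sketched as a set intersection. `BLACKLIST` is a hypothetical local snapshot: fetching and parsing the phishtank.com feed is not shown, and checking parent domains as well as the exact host is an added design choice.

```python
from urllib.parse import urlsplit

# Hypothetical local snapshot of blacklisted domains. In the described system
# the entries come from phishtank.com; downloading that feed is not shown here.
BLACKLIST = {"paypai.com", "paypal.login.hu"}


def is_blacklisted(url: str, blacklist: set = BLACKLIST) -> bool:
    """Match a requested URL against the blacklist by domain name.

    The host and each of its parent domains are checked, so
    www.paypai.com matches the entry paypai.com.
    """
    labels = (urlsplit(url).hostname or "").split(".")
    candidates = {".".join(labels[i:]) for i in range(len(labels))}
    return not candidates.isdisjoint(blacklist)
```

With this snapshot, `is_blacklisted("http://www.paypai.com")` is `True` and `is_blacklisted("https://www.paypal.com/signin")` is `False`.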
URLs requested by the user are matched against the blacklisted URLs by domain name. The hyperlink validator then reads the entire source code of the web page at the requested URL and intercepts the hyperlinks on that page.
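The hyperlink-interception step might look like the following sketch, which uses Python's built-in html.parser to collect each anchor's target and visible text from page source that has already been fetched (for example with urllib.request). `LinkCollector` is a hypothetical name, and resolving relative links against the page URL is an added assumption so that every link carries a full scheme and domain.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (absolute href, visible text) pairs for every <a> tag on a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)  # resolve relative links
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)  # text shown to the user for this link

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


page = ('<p><a href="/signin">Sign in</a> '
        '<a href="http://www.paypal.login.hu/x">www.paypal.com</a></p>')
collector = LinkCollector("http://www.paypal.com/")
collector.feed(page)
print(collector.links)
# [('http://www.paypal.com/signin', 'Sign in'), ('http://www.paypal.login.hu/x', 'www.paypal.com')]
```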
The hyperlinks are gathered, and the suspicious ones are parsed and examined by scheme, domain name, path, and fragment identifier to create an XML (Extensible Markup Language) file; this works because well-formed links share a standard structure.
The XML file has a standard structure and records the scheme (protocol), the actual domain together with the visible link, and the corresponding path and parameters for the actual domain.
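As an illustration, one hyperlink could be turned into such an XML record with xml.etree.ElementTree. The element names (`link`, `scheme`, `domain`, `visible`, `path`, `parameters`, `fragment`) and the helper `link_to_xml` are hypothetical, since the source does not publish its file format.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def link_to_xml(href: str, visible_text: str) -> str:
    """Describe one hyperlink as an XML record (hypothetical element names)."""
    parts = urlsplit(href)
    link = ET.Element("link")
    ET.SubElement(link, "scheme").text = parts.scheme         # the protocol
    ET.SubElement(link, "domain").text = parts.hostname or ""  # actual domain
    ET.SubElement(link, "visible").text = visible_text        # text the user sees
    ET.SubElement(link, "path").text = parts.path
    ET.SubElement(link, "parameters").text = parts.query
    ET.SubElement(link, "fragment").text = parts.fragment
    return ET.tostring(link, encoding="unicode")


print(link_to_xml("http://www.paypal.login.hu/x?id=1#top", "www.paypal.com"))
# <link><scheme>http</scheme><domain>www.paypal.login.hu</domain><visible>www.paypal.com</visible><path>/x</path><parameters>id=1</parameters><fragment>top</fragment></link>
```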
The generated XML file is validated against an XML schema. If the file validates successfully, the hyperlink is not suspicious.
Otherwise, the hyperlink is treated as leading to a phishing website. A limitation of this technique is that well-structured phishing URLs that are not in the blacklist are not identified as phishing URLs.
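Python's standard library cannot validate against an XML Schema (the third-party lxml package provides `lxml.etree.XMLSchema` for that), so this sketch substitutes hand-written checks on a record shaped like the one described above. The rules (a parse failure, a missing required element, a non-HTTP scheme, or visible link text naming a different domain from the actual one) are assumptions about what such a schema would enforce, not the source's schema; `hyperlink_is_suspicious` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hypothetical required elements; the real schema is not given in the source.
REQUIRED = ("scheme", "domain", "visible", "path")


def visible_domain(text: str) -> str:
    """Return the domain a link's visible text claims, or "" if it names none."""
    text = text.strip().lower()
    if "://" in text:
        return urlsplit(text).hostname or ""
    if "." in text and " " not in text:
        return text.split("/")[0]
    return ""


def hyperlink_is_suspicious(link_xml: str) -> bool:
    """Stand-in for schema validation: True means the record fails a check."""
    try:
        link = ET.fromstring(link_xml)
    except ET.ParseError:
        return True  # the record could not be parsed at all
    fields = {child.tag: (child.text or "") for child in link}
    if any(name not in fields for name in REQUIRED):
        return True
    if fields["scheme"] not in ("http", "https"):
        return True
    claimed = visible_domain(fields["visible"])
    return bool(claimed) and claimed != fields["domain"].lower()
```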
Moreover, only the main hyperlinks are collected and evaluated, not all the inner-level ones.
To address these limitations, the second client-side phishing filtering technique can be used.
2. Client-side phishing filtering technique using parse tree validation
The domain name is the keyword used to determine whether the requested site is a phishing site.
The domain name is therefore extracted from the URL and submitted to the Google search engine through the Google API, and the top ten results, as ranked by Google's PageRank algorithm, are identified.
These results are assumed to be the most relevant to the domain name. A parse tree is then constructed from their hyperlinks.
The domain name is the root node of the parse tree, and the top ten results, together with all the internal hyperlinks of each result, are used to build the tree beneath the root.
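Assuming the top-ten result URLs and the hyperlinks on each result page have already been fetched (the Google API call and page crawl are not shown), the tree can be assembled as follows. `Node`, `build_parse_tree` and the `results` mapping are hypothetical names, and stripping a leading "www." so that hosts compare equal to the bare domain is an added normalization.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class Node:
    value: str  # a domain name
    children: list = field(default_factory=list)


def domain_of(url: str) -> str:
    """Lower-cased host with any leading "www." removed (added normalization)."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def build_parse_tree(domain: str, results: dict) -> Node:
    """Root: the queried domain. Level 1: the top-ten result pages.
    Level 2: the internal hyperlinks found on each result page.

    `results` maps each result URL to the hyperlinks on that page,
    in rank order (dicts preserve insertion order).
    """
    root = Node(domain.lower())
    for result_url, links in list(results.items())[:10]:
        page = Node(domain_of(result_url))
        page.children = [Node(domain_of(link)) for link in links]
        root.children.append(page)
    return root
```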
Depth-first search (DFS) is a suitable method for finding every child node that has the same value as the root node.
If the root node's value appears more than n times in the tree, the probability that the website is legitimate is high. If it appears between 1 and n times, the probability is medium, and if it appears fewer than one time (not at all), the probability that the website is a phishing site is high.
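The counting rule can be sketched as an iterative depth-first traversal with an explicit stack. Here a tree is a hypothetical `(value, children)` tuple so the snippet stands alone, and `n` is the tunable threshold from the rule; the source does not say which value it uses. The three return strings mirror the high, medium and low bands described above.

```python
def count_root_matches(tree: tuple) -> int:
    """Depth-first traversal counting descendants whose value equals the root's."""
    root_value, children = tree
    matches = 0
    stack = list(reversed(children))  # reversed so children pop left to right
    while stack:
        value, grandchildren = stack.pop()
        if value == root_value:
            matches += 1
        stack.extend(reversed(grandchildren))
    return matches


def classify(tree: tuple, n: int) -> str:
    """Map the match count to the probability bands of the parse tree technique."""
    matches = count_root_matches(tree)
    if matches > n:
        return "legitimate (high probability)"
    if matches >= 1:
        return "medium probability"
    return "phishing (high probability)"


tree = ("paypal.com", [
    ("paypal.com", [("paypal.com", []), ("example.org", [])]),
    ("wikipedia.org", [("paypal.com", [])]),
])
print(count_root_matches(tree), classify(tree, n=2))
# 3 legitimate (high probability)
```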
The phishing target is identified during the tree traversal. If the text on a suspicious web page is very similar to that on the targeted web page,
but the domain names of the two pages differ, the suspicious web page is very likely a phishing page.
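The source does not specify its similarity measure. This sketch assumes difflib's `SequenceMatcher` ratio as the text-similarity score and an arbitrary threshold of 0.8; `likely_phishing` is a hypothetical name.

```python
from difflib import SequenceMatcher


def likely_phishing(suspect_text: str, suspect_domain: str,
                    target_text: str, target_domain: str,
                    threshold: float = 0.8) -> bool:
    """True when two pages have near-identical text but different domains.

    The 0.8 threshold is an assumed value, not one taken from the source.
    """
    similarity = SequenceMatcher(None, suspect_text.lower(), target_text.lower()).ratio()
    return similarity >= threshold and suspect_domain.lower() != target_domain.lower()
```

A copied login page hosted on paypal.login.hu would score a ratio near 1.0 against the real page while its domain differs from paypal.com, so it is flagged.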
The analytical results indicate that this client-side phishing filtering technique achieves better performance.
Neither of the above two client-side phishing filtering techniques validates the behavioural response of the user or the forms.
To reduce the false-negative rate, the third client-side phishing filtering technique therefore uses the behavioural response of the user.
3. Behavioural model for client-side phishing filtering
The third client-side phishing filtering technique, the behavioural response approach, is built on heuristic techniques.
A DOM parser first parses all the content of the website and retrieves all the form fields and redirected links,
then stores them in a database. Random inputs are supplied to the form fields to find where the maximum number of errors occurs.
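A rough sketch of the collection step, assuming Python's event-based html.parser in place of a full DOM parser and an in-memory SQLite table in place of the unnamed database. `FormScanner`, `store` and `random_inputs` are hypothetical names. Submitting the random values and recording the errors the site returns is site-specific and is not shown.

```python
import random
import sqlite3
import string
from html.parser import HTMLParser


class FormScanner(HTMLParser):
    """Collect form field names and link/redirect targets from a page."""

    def __init__(self):
        super().__init__()
        self.fields = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in ("input", "textarea", "select") and a.get("name"):
            self.fields.append(a["name"])
        elif tag == "a" and a.get("href"):
            self.links.append(a["href"])
        elif tag == "meta" and (a.get("http-equiv") or "").lower() == "refresh":
            self.links.append(a.get("content") or "")  # e.g. "0; url=..." redirect


def store(db: sqlite3.Connection, url: str, scanner: FormScanner) -> None:
    """Persist the scanned fields and links for one page."""
    db.execute("CREATE TABLE IF NOT EXISTS item (url TEXT, kind TEXT, value TEXT)")
    rows = [(url, "field", f) for f in scanner.fields]
    rows += [(url, "link", link) for link in scanner.links]
    db.executemany("INSERT INTO item VALUES (?, ?, ?)", rows)
    db.commit()


def random_inputs(fields: list, length: int = 12, rng: random.Random = None) -> dict:
    """Generate one random alphanumeric value per form field."""
    rng = rng or random.Random()
    alphabet = string.ascii_letters + string.digits
    return {name: "".join(rng.choice(alphabet) for _ in range(length)) for name in fields}
```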
Three types of heuristics are identified:
- Link analysis heuristics
- Decision-based heuristics
- Content-based heuristics
The heuristics recognize patterns in the URL or page content that are characteristic of suspicious websites, and each heuristic is based on particular features of the website.
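The source does not list its individual heuristics, so the three functions below are hypothetical examples, one per category, of the kind of feature each might compute: the share of links leaving the page's host (link analysis), a form that submits to another host (decision-based), and a password field on the page (content-based). Reading "decision-based" this way is an assumption.

```python
import re
from urllib.parse import urljoin, urlsplit


def _host(url: str) -> str:
    return (urlsplit(url).hostname or "").lower()


def external_link_ratio(page_url: str, hrefs: list) -> float:
    """Link analysis (hypothetical): fraction of links pointing to another host."""
    if not hrefs:
        return 0.0
    page_host = _host(page_url)
    external = sum(_host(urljoin(page_url, h)) != page_host for h in hrefs)
    return external / len(hrefs)


def form_posts_elsewhere(page_url: str, form_action: str) -> bool:
    """Decision-based (hypothetical): does the form submit to a different host?"""
    return _host(urljoin(page_url, form_action)) != _host(page_url)


PASSWORD_FIELD = re.compile(r"""type\s*=\s*["']?password""", re.IGNORECASE)


def asks_for_password(html: str) -> bool:
    """Content-based (hypothetical): does the page contain a password input?"""
    return bool(PASSWORD_FIELD.search(html))
```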
Eight features are used as inputs to a machine learning algorithm that classifies websites as legitimate or phishing.
Supervised learning analyzes labelled training data to build a predictive model that makes reasonable predictions for new data.
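The source does not name its learning algorithm. As a stand-in, this sketch trains a perceptron, a simple linear supervised classifier, on vectors of eight 0/1 features (label 1 = phishing, 0 = legitimate). The four training rows are made-up toy values for illustration only, not data from the evaluation.

```python
def predict(weights: list, bias: float, x: list) -> int:
    """1 (phishing) if the weighted feature sum exceeds zero, else 0 (legitimate)."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(samples: list, labels: list, epochs: int = 20, lr: float = 1.0):
    """Classic perceptron rule: nudge the weights toward each misclassified sample."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)
            if error:
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
    return weights, bias


# Toy, made-up feature vectors (eight 0/1 features each); not real measurements.
X = [
    [1, 1, 1, 0, 1, 1, 0, 1],
    [1, 0, 1, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
]
y = [1, 1, 0, 0]
weights, bias = train_perceptron(X, y)
print([predict(weights, bias, x) for x in X])
# [1, 1, 0, 0]
```

A perceptron is guaranteed to converge only when the classes are linearly separable; with real feature sets a regularized model such as logistic regression would usually be preferred.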
With the third client-side phishing filtering technique, all the websites in the evaluation were correctly identified, giving a zero false-negative rate.
The performance of all three client-side phishing filtering techniques was evaluated, and the analytical results show that these techniques provide better performance for identifying client-side phishing.