Email phishing is a type of phishing that attempts to fraudulently acquire personal information, such as your account password or credit card details.
The Email appears to come from a legitimate source, but it does not.
Once attackers obtain your information, they may create new user credentials or install malware on your system to steal sensitive data.
Blacklists are a common countermeasure: they typically block the IP address of the sending Email (SMTP) server, the sender's domain, or even the entire Email domain of a sender.
This article describes a robust three-stage classification model for Email phishing.
In this approach, the mail server automatically detects phishing messages and identifies the entity they impersonate.
To do this, existing classifier algorithms are arranged into a multi-tier classification process that classifies phishing Emails, and their order is adjusted to find the optimum schedule.
Three-Stage Classification Model for Email Phishing
The classification model discussed here detects Email phishing in three stages, applied one after another.
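The cascade can be pictured as an ordered list of checks in which the first stage that flags a mail ends the evaluation and sends the mail to spam. The Python sketch below illustrates only that control flow; the three stage functions are hypothetical placeholders, not the article's actual classifiers.

```python
from typing import Callable, Iterable

# A stage takes a mail (here a plain dict) and returns True if it looks like phishing.
Stage = Callable[[dict], bool]

def classify(mail: dict, stages: Iterable[Stage]) -> str:
    """Run the stages in order; the first stage that flags the mail sends it to spam."""
    for stage in stages:
        if stage(mail):
            return "Spam"
    return "Inbox"

# Hypothetical placeholder stages standing in for the three real classifiers.
def subject_stage(mail: dict) -> bool:
    return "verify account" in mail.get("subject", "").lower()

def content_stage(mail: dict) -> bool:
    return "password" in mail.get("body", "").lower()

def ip_stage(mail: dict) -> bool:
    return mail.get("sender_ip") in {"192.0.2.1"}  # stand-in for a blacklist lookup

pipeline = [subject_stage, content_stage, ip_stage]
print(classify({"subject": "Team lunch", "body": "See you at noon",
                "sender_ip": "198.51.100.7"}, pipeline))  # prints Inbox
```

Because later stages run only on mail that passed the earlier ones, cheap checks (subject keywords) can filter most spam before the more expensive network lookup in stage three.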
Stage-One Classifier for Email Phishing
The stage-one classifier validates the text of the mail subject: it extracts the subject text and checks it against a set of predefined keywords. Based on the keyword match, the mail is marked as either legitimate or spam.
If the mail is illegitimate, it is moved to the spam or junk folder; if it is found to be good, it is passed to the stage-two classifier.
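A minimal sketch of a subject keyword check, assuming a case-insensitive whole-word match. The article does not publish its predefined keyword list, so `SUBJECT_KEYWORDS` below is a hypothetical list borrowed from the phishing words mentioned later in this article.

```python
import re

# Hypothetical keyword list; the article's predefined keywords are not published.
SUBJECT_KEYWORDS = ["verify account", "verify email", "bank", "debit", "login", "update"]

def stage_one(subject: str, keywords=SUBJECT_KEYWORDS) -> str:
    """Return 'spam' if the subject contains a predefined keyword, else 'pass'."""
    text = subject.lower()
    for kw in keywords:
        # \b word boundaries keep "bank" from matching inside words such as "embankment"
        if re.search(r"\b" + re.escape(kw) + r"\b", text):
            return "spam"
    return "pass"
```

A mail returning `'pass'` would go on to stage two. Whole-word matching reduces accidental hits, but as the article notes later, keyword presence alone is only a hint of phishing.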
Stage-Two Classifier for Email Phishing
This stage checks the mail content for legitimacy: the body is scanned for phishing keywords, and the images embedded in it are checked as well.
The output is either a good mail or a spam mail. Spam mails are moved to the spam or junk folder.
Legitimate mails are passed as input to the stage-three classifier.
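The sketch below shows one way to scan a raw message body with Python's standard `email` package. The keyword list is hypothetical, and because the article does not say how embedded images are judged, this sketch only counts image parts and leaves the verdict to the keyword match.

```python
from email import message_from_string

CONTENT_KEYWORDS = ["verify your account", "password", "click here", "login"]  # hypothetical

def stage_two(raw_mail: str, keywords=CONTENT_KEYWORDS) -> tuple[str, int]:
    """Return ('spam' or 'pass', number of embedded image parts)."""
    msg = message_from_string(raw_mail)
    text_parts, images = [], 0
    for part in msg.walk():  # walks every MIME part, including the top-level message
        ctype = part.get_content_type()
        if ctype.startswith("image/"):
            images += 1
        elif ctype in ("text/plain", "text/html"):
            payload = part.get_payload(decode=True) or b""  # undoes base64/quoted-printable
            charset = part.get_content_charset() or "utf-8"
            text_parts.append(payload.decode(charset, errors="replace"))
    body = " ".join(text_parts).lower()
    verdict = "spam" if any(kw in body for kw in keywords) else "pass"
    return verdict, images
```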
Stage-Three Classifier for Email Phishing
This stage labels the message as good or spam by validating its IP address: the IP address from which the mail was received is looked up in the real-time blacklist of Spamhaus.org.
If the address is listed, the mail is marked as spam and moved to the spam or junk folder.
Otherwise, the mail is legitimate and is delivered directly to the inbox.
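DNS-based blacklists such as Spamhaus are queried by reversing the IPv4 octets and appending the list's zone name; an answer in 127.0.0.x means the address is listed, and no answer (NXDOMAIN) means it is not. The sketch below assumes the `zen.spamhaus.org` zone (the article names only Spamhaus.org) and takes the resolver as a parameter so it can be tested offline.

```python
import ipaddress
import socket

DNSBL_ZONE = "zen.spamhaus.org"  # assumption: the article only says "Spamhaus.org"

def dnsbl_query_name(ip: str, zone: str = DNSBL_ZONE) -> str:
    """Build the DNSBL lookup name: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def stage_three(ip: str, resolve=socket.gethostbyname) -> str:
    """Return 'spam' if the IP is listed, 'pass' if not; `resolve` is injectable for tests."""
    try:
        answer = resolve(dnsbl_query_name(ip))
    except socket.gaierror:
        return "pass"  # NXDOMAIN: the address is not listed
    if answer.startswith("127.0.0."):
        return "spam"  # 127.0.0.x return codes mean "listed"
    # 127.255.255.x answers are Spamhaus error codes (e.g. query refused), not listings
    raise RuntimeError(f"DNSBL lookup error: {answer}")
```

Spamhaus refuses queries arriving through large public DNS resolvers, which is why error answers are raised rather than silently treated as "not listed". By DNSBL convention, 127.0.0.2 is listed as a test entry, so it can be used to check a live setup.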
Mail can be checked for phishing on as many accounts as needed, and the user accounts can be configured for any mail server, such as Gmail or Yahoo.
For example, Gmail is configured with the IMAP host imap.gmail.com. Many user accounts can be checked on each configured mail server.
The accounts to be checked are configured in the credentials.xml file. The user id and password must be encoded and then added to credentials.xml, separated by a semicolon.
The folder to which illegitimate mails are moved must also be specified for each user account.
The folder name can be Spam, Junk, or any name the user prefers. For security, the user credentials can be encoded using the encoder/decoder.exe file.
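The article does not show the layout of credentials.xml or the encoding used by encoder/decoder.exe, so the sketch below is hypothetical on both counts: it assumes one `<account>` element per account with `host` and `folder` attributes, and uses Base64 to encode the user id and password separately before joining them with a semicolon. Base64 is an encoding, not encryption, so it only hides credentials from casual view.

```python
import base64
import xml.etree.ElementTree as ET

def _b64(text: str) -> str:
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

def encode_credentials(user: str, password: str) -> str:
    # Assumption: Base64 stands in for whatever encoder/decoder.exe actually does.
    # Base64 output never contains ';', so the separator stays unambiguous.
    return _b64(user) + ";" + _b64(password)

def load_accounts(xml_text: str) -> list[dict]:
    """Parse the hypothetical <accounts><account host=... folder=...> layout."""
    accounts = []
    for node in ET.fromstring(xml_text).iter("account"):
        enc_user, enc_password = node.text.strip().split(";")
        accounts.append({
            "host": node.get("host"),
            "folder": node.get("folder", "Spam"),  # spam folder configured per account
            "user": base64.b64decode(enc_user).decode("utf-8"),
            "password": base64.b64decode(enc_password).decode("utf-8"),
        })
    return accounts

sample = ('<accounts><account host="imap.gmail.com" folder="Junk">'
          + encode_credentials("alice@example.com", "s3cret") + "</account></accounts>")
print(load_accounts(sample))
```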
A log needs to be maintained. The logger is simply a console application that displays the details of every mail checked.
Error messages such as “unable to connect to Gmail host” and “invalid user-id or password” are shown in the console window.
If there are no new mails, the message “Email box is empty or no new mails” is displayed.
If a mail is invalid, the message “Mail with such subject is illegitimate and thus moved to spam” is written to the log.
All the details of the checks are grouped in the console, including the mail subject/content, the legitimacy check result, and spam information.
Illegitimate Emails are highlighted in red, while legitimate mails are shown in the default colour, as shown in the picture below.
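One way to reproduce the red highlighting in a terminal is with ANSI escape codes, as in the sketch below. This is an assumption: the original console application may set colours differently, and older Windows consoles may need virtual terminal processing enabled before ANSI codes render.

```python
RED = "\033[31m"   # ANSI escape code: switch foreground colour to red
RESET = "\033[0m"  # ANSI escape code: restore the default colour

def format_log_line(subject: str, legitimate: bool) -> str:
    """Legitimate mails use the default colour; illegitimate ones are wrapped in red."""
    if legitimate:
        return f"Mail with subject '{subject}' is legitimate"
    return f"{RED}Mail with subject '{subject}' is illegitimate and thus moved to spam{RESET}"

print(format_log_line("Team lunch", True))
print(format_log_line("Verify Account now", False))
```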
Features considered for detecting Email Phishing
The three-stage classifier considers fifteen features when checking the subject header and the content of the mail. The fifteen features are listed below:
A phishing Email may contain forms, or links to compromised websites. For example, the attacker may include scripts that create a popup and then load a form in that popup, to trick the user into entering sensitive data.
The presence of a popup therefore suggests that the mail may be an attempt to phish sensitive data.
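A simple heuristic for this feature is to search the HTML body for JavaScript calls that open a new window. The sketch below looks only for `window.open(`, which is an assumption about how popups are created; obfuscated scripts would need a real JavaScript analysis.

```python
import re

# Heuristic: a JavaScript call that opens a new browser window (popup).
POPUP_PATTERN = re.compile(r"\bwindow\.open\s*\(", re.IGNORECASE)

def has_popup_script(html: str) -> bool:
    """Return True if the HTML appears to contain a popup-opening script."""
    return bool(POPUP_PATTERN.search(html))
```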
(ii) Text “Verify Account”
If an Email contains the text “Verify Account”, “Verify Email”, “Bank”, “Debit”, “fwd”, “reply”, “Click”, “Here”, “login”, “update”, or any of their variants, it is worth checking the Email for further symptoms of phishing.
The presence of these words does not necessarily indicate a phishing attempt, but such wording is an easy way to lure people into clicking malicious links.
Such text can therefore be used to trick users in various ways.
(iv) onClick attribute for Email Phishing
The onClick attribute can make an HTML element that is not normally clickable (such as a div or an image) clickable, and redirect the user to another URL when it is clicked.
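This feature can be computed by counting the HTML elements that carry an onClick attribute. The sketch below uses Python's standard `html.parser`, which lower-cases attribute names, so `onClick`, `onclick`, and `ONCLICK` are all caught.

```python
from html.parser import HTMLParser

class OnClickCounter(HTMLParser):
    """Counts start tags that carry an onclick attribute."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with names already lower-cased
        if any(name == "onclick" for name, _ in attrs):
            self.count += 1

def count_onclick(html: str) -> int:
    parser = OnClickCounter()
    parser.feed(html)
    parser.close()
    return parser.count
```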
(v) Change of window status for Email Phishing
(vi) IP address in URLs for Email Phishing
Some phishing sites are hosted on PCs infected with viruses or malware, and the attack is delivered through a phishing link. Because such machines usually have no domain name, the only way to link to them is by their IP address. Legitimate Emails seldom contain links that use an IP address.
Such a link is one whose host is an IP address (e.g. http://101.56.3.48/login.facebook.com/login).
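To detect this feature, each URL's host can be extracted and tested with Python's `ipaddress` module. The URL-matching regular expression below is a simplifying assumption; it only finds http and https links written out in the text.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Simplified URL matcher: http(s) links up to whitespace, quotes, or angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

def has_ip_url(text: str) -> bool:
    """Return True if any link in the text uses a literal IP address as its host."""
    for url in URL_PATTERN.findall(text):
        host = urlsplit(url).hostname  # lower-cased, brackets stripped for IPv6
        if host is None:
            continue
        try:
            ipaddress.ip_address(host)
            return True
        except ValueError:
            continue  # an ordinary domain name, not an IP address
    return False
```

With the example above, the host of http://101.56.3.48/login.facebook.com/login is `101.56.3.48`; the text facebook.com appears only in the path, which is exactly the deception this feature catches.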
(vii) ReplyTo modification
The attacker may set the ‘Reply-To’ field of the Email to the address of the legitimate company, so that any reply goes to the legitimate company and the user does not become suspicious about the sender’s identity.
It is therefore worth checking whether the sender address and the ‘Reply-To’ address differ. Addresses from different domains help identify phishing attempts.
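A minimal sketch of this check using `email.utils.parseaddr` from Python's standard library: it extracts the address from each header and compares the domains case-insensitively. A missing Reply-To header is treated as no mismatch.

```python
from email.utils import parseaddr
from typing import Optional

def domain_of(header_value: str) -> str:
    """Extract the lower-cased domain from a header such as 'Bank <alerts@bank.example>'."""
    _, addr = parseaddr(header_value)
    return addr.rpartition("@")[2].lower()

def reply_to_mismatch(from_header: str, reply_to_header: Optional[str]) -> bool:
    """True when a Reply-To header is present and its domain differs from the sender's."""
    if not reply_to_header:
        return False
    return domain_of(from_header) != domain_of(reply_to_header)
```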
(viii) Number of unique domains in URLs
Legitimate Emails usually contain links to only one or two domains. If the number of distinct domains is high, the Email is probably an attempt to phish data from the recipient.
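The sketch below counts distinct link hosts with `urllib.parse.urlsplit`. Note that it counts host names, so www.example.com and mail.example.com count as two; grouping by registered domain would need a public-suffix list, which is left out here. The URL regular expression is a simplifying assumption.

```python
import re
from urllib.parse import urlsplit

# Simplified URL matcher: http(s) links up to whitespace, quotes, or angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

def count_unique_domains(text: str) -> int:
    """Number of distinct (lower-cased) host names among the links in the text."""
    hosts = {urlsplit(url).hostname for url in URL_PATTERN.findall(text)}
    hosts.discard(None)  # ignore links without a parsable host
    return len(hosts)
```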
(ix) Number of words in Subject
Most legitimate Emails have fewer than five to ten words in their subject. A large number of words in the subject therefore suggests that the Email may be an attempt to phish sensitive data from the user.
(x) Richness of the Vocabulary
Phishing Emails typically repeat the same words in different forms, which reduces the richness of the content. Richness can be measured with the type-token ratio (TTR):
TTR = Types / Tokens
Types: number of distinct word forms
Tokens: total number of words
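The type-token ratio can be computed in a few lines. In this sketch, tokens are all words and types are the distinct words; the tokenising regular expression and the lower-casing (which merges "Verify" and "verify" into one type) are assumptions about how words are counted.

```python
import re

def type_token_ratio(text: str) -> float:
    """Distinct words divided by total words; lower values mean a poorer vocabulary."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0  # avoid division by zero on empty text
    return len(set(tokens)) / len(tokens)
```

For example, "Verify verify VERIFY account" has four tokens but only two types, giving a ratio of 0.5.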
(xi) Number of Periods in URL
Legitimate URLs can also contain several dots, so dots alone do not make a URL a phishing URL. This feature is simply the maximum number of periods (‘.’) in any of the links in the Email, and it is a continuous feature.
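A sketch of this feature, counting the dots in every link (host and path) and keeping the maximum, with 0 for an Email without links. As in the earlier URL sketches, the regular expression is a simplifying assumption.

```python
import re

# Simplified URL matcher: http(s) links up to whitespace, quotes, or angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)

def max_periods_in_urls(text: str) -> int:
    """Maximum number of '.' characters in any single link; 0 if there are no links."""
    return max((url.count(".") for url in URL_PATTERN.findall(text)), default=0)
```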
(xii) Link in Image
Wrapping an image in a link makes many of the deceptions seen in phishing attacks possible. Launching an attack with a plain link is difficult for a phisher, because users are less likely to click a plain-text link.
An attractive image, by contrast, lures the user into clicking, which lets the attacker redirect the user to a phishing site.
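The sketch below detects this feature with Python's `html.parser` by counting `<img>` tags that appear inside an `<a>` element with an `href`. A stack of open anchors handles nesting and ignores anchors without an `href` (such as named bookmarks).

```python
from html.parser import HTMLParser

class ImageLinkDetector(HTMLParser):
    """Counts <img> tags nested inside <a href=...> elements."""

    def __init__(self):
        super().__init__()
        self.link_stack = []  # one entry per open <a>: True if it has an href
        self.image_links = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_stack.append(bool(dict(attrs).get("href")))
        elif tag == "img" and any(self.link_stack):
            self.image_links += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.link_stack:
            self.link_stack.pop()

def count_image_links(html: str) -> int:
    parser = ImageLinkDetector()
    parser.feed(html)
    parser.close()
    return parser.image_links
```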
(xiii) Number of hyperlinks
It denotes the total number of hyperlinks in the content.
(xiv) Cascading Style Sheet (CSS)
It denotes the CSS applied in the content of the message.
(xv) Number of words in subject with at least fifteen characters
Together, these features form the feature vector X = <F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12, F13, F14, F15>, which can be used to differentiate between phishing and legitimate Emails.
These are the major features that need to be considered for Email phishing detection.
Email phishing is a major challenge today, so end users need to be very careful when receiving Emails from unknown senders.