Phishing Emails: How Do You Detect Them? Here Is the Solution

Email phishing is an attempt to fraudulently acquire personal information, such as your account password or credit card details. It is one of several types of phishing.

A phishing Email may look as if it comes from a legitimate source, but it does not.

Typically, the blacklists block the IP address of the Email (SMTP) server, the sender domain, or even the whole e-mail address domain of a sender.

Once your information is obtained, hackers create new user credentials or install malware into your system to steal sensitive data.

In this article, we use a robust three-stage classification model.

Web servers automatically detect phishing messages and discover the impersonated entity in those messages.

Hence, the existing classifier algorithms are rescheduled as a multi-tier classification process to classify the phishing Emails and to find out the optimum scheduling.

Three Stage Classification Model for Email Phishing


There are three stages in the classification model discussed here to detect Email Phishing.

Stage-One Classifier for Email Phishing

The stage-one classifier validates the text in the mail subject. It extracts the text and checks it against a set of predefined keywords. Based on the keyword match, the mail is marked as either legitimate or spam.

Then, the mails are moved to the spam or junk folder, if illegitimate. If it is found to be good, it is then passed to the stage-two classifier.
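As a minimal sketch of the stage-one check, the subject match could look like the Python below. The keyword list and the function name stage_one are assumptions made for illustration; the article does not publish its predefined keywords.

```python
# Hypothetical keyword list; the article's actual predefined keywords are not given.
SUBJECT_KEYWORDS = {"verify account", "verify email", "login", "update"}


def stage_one(subject: str) -> str:
    """Return 'spam' if the subject contains a predefined keyword, else 'legitimate'."""
    # Lowercase and collapse whitespace so "VERIFY   account" still matches.
    normalized = " ".join(subject.lower().split())
    if any(keyword in normalized for keyword in SUBJECT_KEYWORDS):
        return "spam"
    return "legitimate"
```

A mail labelled 'legitimate' here would then be handed on to the stage-two classifier.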

Stage-two Classifier for Email Phishing

The mails are checked for their legitimacy in content. The content is checked for phishing keywords as well as the embedded images in it.

The outputs may either be a good mail or a spam mail. If invalid, it is moved to the spam or junk folder.

If legitimate, the outputs are fed as input to the stage-three classifier.

Stage-three Classifier for Email Phishing

This algorithm labels the message as either good or spam after validating the sender’s IP address. To do so, the IP address from which the mail was received is checked against a real-time blacklist.

If the received mail is marked as spam, it moves to the spam or junk folder.

Otherwise, the output message of the algorithm will directly be sent to the inbox, as the mail is legitimate.
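The three stages described above can be sketched as a simple chain in which the first failing stage sends the mail to spam. This is only an illustration under stated assumptions: the mail is a plain dictionary, the three check functions are simplified stand-ins, and an in-memory set replaces the real-time blacklist lookup.

```python
from typing import Callable, Iterable

# A stage takes a mail (a plain dict in this sketch) and returns True if it looks legitimate.
Stage = Callable[[dict], bool]

# Placeholder for a real-time blacklist; a real system would query a live service.
BLACKLISTED_IPS = {"203.0.113.7"}


def subject_ok(mail: dict) -> bool:
    # Stage one (simplified): keyword check on the subject.
    return "verify account" not in mail["subject"].lower()


def content_ok(mail: dict) -> bool:
    # Stage two (simplified): keyword check on the body.
    return "click here" not in mail["body"].lower()


def ip_ok(mail: dict) -> bool:
    # Stage three (simplified): sender IP must not be blacklisted.
    return mail["sender_ip"] not in BLACKLISTED_IPS


def classify(mail: dict, stages: Iterable[Stage] = (subject_ok, content_ok, ip_ok)) -> str:
    """Run the stages in order; the first failing stage moves the mail to spam."""
    for stage in stages:
        if not stage(mail):
            return "spam"
    return "inbox"
```

Keeping each stage as a separate function makes it easy to reorder the stages, which matters because the article treats the order of classifiers as something to optimise.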

Phishing detection can be applied to as many Emails as possible. The user accounts can be configured for mail servers such as Gmail and Yahoo.

For example, if Gmail is the configured mail server, any number of Gmail user accounts can be set up for phishing detection.

The accounts whose mails are to be checked are configured in the credentials.xml file. The user id and password must be encoded and then entered in the credentials.xml file, separated by a semicolon.

Also, the folder where the illegitimate mails are to be moved should be mentioned for each and every user account.

Thus, the folder name can be like Spam, Junk, or any user convenient name. The user credentials can be encoded for security reasons, using the encoder/decoder.exe file.
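To make the configuration step concrete, here is a rough Python sketch of reading such a file. Everything about the layout is an assumption: the element name account, the folder attribute, and the use of Base64 as the encoding are hypothetical stand-ins, since the real credentials.xml schema and the scheme used by the encoder/decoder tool are not documented here.

```python
import base64
import xml.etree.ElementTree as ET


def encode(value: str) -> str:
    # Base64 is only a stand-in for the article's unspecified encoder.
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def decode(value: str) -> str:
    return base64.b64decode(value).decode("utf-8")


def load_accounts(xml_text: str) -> list:
    """Parse a hypothetical credentials.xml into a list of account dicts."""
    accounts = []
    for node in ET.fromstring(xml_text).iter("account"):
        # The encoded user id and password are separated by a semicolon.
        user_enc, password_enc = node.text.strip().split(";", 1)
        accounts.append({
            "user": decode(user_enc),
            "password": decode(password_enc),
            "folder": node.get("folder", "Spam"),  # where illegitimate mails go
        })
    return accounts


# Build a sample file instead of hand-writing encoded strings.
sample_xml = (
    '<credentials><account folder="Junk">'
    f'{encode("alice@example.com")};{encode("s3cret")}'
    "</account></credentials>"
)
```

Calling load_accounts(sample_xml) returns one entry per configured account, each with its own target folder.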

Logger File

A logger file needs to be maintained. The logger is nothing but a console application. The console is used to display all the details of the mails checked.

Error messages like “unable to connect to Gmail host”, “invalid user-id or password” are shown in the console window.

If there is no new mail, the message “Email box is empty or no new mails” is displayed.

If the mail is invalid, the message “Mail with such subject is illegitimate and thus moved to spam” is written to the log.

Hence, all the details for illegitimacy are grouped in the console. The details include mail subject/content, legitimacy check, and spam info.

Illegitimate Emails are highlighted in red, and legitimate mails are shown in the default colour, as shown in the figure below.

Email Phishing_1
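The colour highlighting could be reproduced roughly as follows. This sketch assumes a terminal that understands ANSI escape codes, which may not match the console used by the original logger; the function name format_log_line is made up for illustration.

```python
RED = "\033[31m"   # ANSI escape code: switch the text colour to red
RESET = "\033[0m"  # ANSI escape code: restore the default colour


def format_log_line(subject: str, legitimate: bool) -> str:
    """Build one console line; illegitimate mails are wrapped in red."""
    if legitimate:
        return f"Mail with subject '{subject}' is legitimate"
    return (
        f"{RED}Mail with subject '{subject}' is illegitimate "
        f"and thus moved to spam{RESET}"
    )


if __name__ == "__main__":
    print(format_log_line("Team lunch", legitimate=True))
    print(format_log_line("Verify Account", legitimate=False))
```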

Features considered for detecting Email Phishing

In the three-stage classifier, fifteen features are considered when checking the subject header and the content of the mail. The fifteen features are listed below.

(i) Popup

Phishing attacks can be found in Emails if the attacker inserts any forms or links to the compromised websites.  Hence, the attacker may include scripts to create a popup and then load a form in that popup, to trick the user into entering sensitive data.

So, finding the presence of a popup suggests the possibility of the mail being an attempt to phish sensitive data.

(ii) Text “Verify Account”

If an Email contains the text “Verify Account”, “Verify Email”, “Bank”, “Debit”, “fwd”, “reply”, “Click”, “Here”, “login”, “update” or any of their variants, it is worth checking the Email for further symptoms of phishing.

The presence of these texts does not necessarily indicate a phishing attempt, yet such wording is an easy way to lure people into clicking malicious links.

(iii) JavaScript

JavaScript is normally used to validate forms on websites. Its presence in an Email suggests that the Email is likely malicious, because JavaScript can be used to change the text of a document.

Thus, it can be used to trick users in various ways.

(iv) onClick attribute for Email Phishing

The onClick attribute can be used to make an HTML element clickable and to redirect the user to another URL, which the element would not normally do.

(v)  Change of window status for Email Phishing

The text in the browser’s status bar can be changed through the window.status property in JavaScript. This can be used to give the user false information, for example by loading content from other websites while showing the legitimate website’s address in the status bar.
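The script-related features (popup, JavaScript, onClick and window status) can be extracted from an HTML mail body along the following lines. This is a sketch using Python's standard html.parser module, not the article's implementation; treating a window.open( call as the sign of a popup is an assumption, and the class and key names are invented for the example.

```python
from html.parser import HTMLParser


class ScriptFeatureParser(HTMLParser):
    """Collect script-related phishing indicators from an HTML body."""

    def __init__(self):
        super().__init__()
        self.has_javascript = False
        self.has_onclick = False
        self.has_popup = False
        self.changes_status = False
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.has_javascript = True
            self._in_script = True
        for name, value in attrs:
            value = value or ""
            if name == "onclick":
                self.has_onclick = True
                self._scan(value)
            elif name == "href" and value.strip().lower().startswith("javascript:"):
                self.has_javascript = True
                self._scan(value)

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Only text inside <script> elements is treated as code.
        if self._in_script:
            self._scan(data)

    def _scan(self, code):
        if "window.open(" in code:  # assumed marker of a popup
            self.has_popup = True
        if "window.status" in code:
            self.changes_status = True


def script_features(html: str) -> dict:
    parser = ScriptFeatureParser()
    parser.feed(html)
    parser.close()
    return {
        "javascript": parser.has_javascript,
        "onclick": parser.has_onclick,
        "popup": parser.has_popup,
        "window_status": parser.changes_status,
    }
```

Simple substring checks like these are easy to evade, so the resulting flags are best treated as inputs to the classifier rather than as verdicts.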

(vi) IP address in URLs for Email Phishing

Some phishing pages are hosted on PCs infected with a virus or other malware. Such machines may not have a DNS entry, so the only way to link to them is by their IP address. Legitimate Emails seldom use links with an IP address.

This feature flags any link in the Email whose host is an IP address (e.g. http://101.56.3.48/login.facebook.com/login).
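A check for this feature might look like the sketch below, which uses the standard urllib.parse and ipaddress modules. The function name host_is_ip is an illustrative choice.

```python
import ipaddress
from urllib.parse import urlsplit


def host_is_ip(url: str) -> bool:
    """Return True if the host part of the URL is a literal IP address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # Not parseable as IPv4 or IPv6, so it is a host name.
        return False
    return True
```

Note that only the host is inspected: a legitimate-looking name such as login.facebook.com appearing later in the path does not change the result.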

(vii) ReplyTo modification

The attacker may modify the ‘replyto’ field in the Email, with the Email address of the legitimate company, so that the user can reply back to the legitimate company, and thus not become suspicious about the sender’s identity.

Hence, checking if the sender address and the ‘reply to’ address are different is important. If they are from different domains, it will help in identifying phishing attempts.
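The domain comparison can be sketched with Python's standard email package. The function names are illustrative, and treating a missing Reply-To header as "no mismatch" is an assumption of this sketch.

```python
from email import message_from_string
from email.utils import parseaddr


def domain_of(header_value) -> str:
    """Extract the lowercased domain from a header such as 'Name <user@host>'."""
    address = parseaddr(header_value or "")[1]
    return address.rpartition("@")[2].lower()


def reply_to_differs(raw_message: str) -> bool:
    """Return True if Reply-To is present and its domain differs from the From domain."""
    msg = message_from_string(raw_message)
    reply_to = msg.get("Reply-To")
    if not reply_to:
        return False
    return domain_of(msg.get("From")) != domain_of(reply_to)
```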

(viii) Number of unique domains in URLs

Legitimate Emails usually contain links to only one or two domains. If the number is high, the Email is probably an attempt to phish user data from the receiver.
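Counting the domains is straightforward once the links have been extracted. The sketch below assumes the URLs are already available as a list and, as a simplification, counts distinct host names rather than registered domains.

```python
from urllib.parse import urlsplit


def unique_domain_count(urls) -> int:
    """Count the distinct host names that appear in a list of URLs."""
    hosts = set()
    for url in urls:
        host = urlsplit(url).hostname
        if host:
            hosts.add(host.lower())
    return len(hosts)
```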

(ix) Number of words in Subject

Most legitimate Emails have fewer than five to ten words in their subjects. Hence, a large number of words in the subject suggests that the Email may be an attempt to phish sensitive data from the user.

(x) Richness of the Vocabulary

Phishing Emails normally repeat the same words in different forms, which reduces the richness of the content. This can be measured by the type-token ratio:

Type-token ratio = Types / Tokens

Types: number of different (unique) words

Tokens: total number of words
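A small Python sketch of the type-token ratio follows, assuming the standard definition (unique words divided by total words) and a very simple tokenizer; the regular expression used to split words is an illustrative choice.

```python
import re


def type_token_ratio(text: str) -> float:
    """Unique words (types) divided by total words (tokens); 0.0 for empty text."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)
```

For instance, "verify your account verify now" has five tokens and four types, giving a ratio of 0.8; heavily repetitive text scores lower.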

(xi) Number of Periods in URL

Legitimate URLs can also contain several dots, so a high count does not by itself make a URL a phishing URL. This feature is simply the maximum number of periods (‘.’) contained in any of the links present in the Email, and it is a continuous feature.
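Given the list of links already extracted from a mail, this feature reduces to one line of Python. The function name is an illustrative choice.

```python
def max_periods(urls) -> int:
    """Largest number of '.' characters found in any single URL (0 if no URLs)."""
    return max((url.count(".") for url in urls), default=0)
```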

(xii) Link in Image

By linking an image with a URL, many of the deceptions seen in phishing attacks are possible. For a phisher to launch an attack with a plain link is difficult, because the user is less likely to click a link with a plain text.

But the attacker can lure the user with attractive images to click it, and thus the attacker can redirect the user to a phishing site.

(xiii) Number of hyperlinks

It denotes the total number of hyperlinks which are available in the content.
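Both the hyperlink count and the link-in-image feature can be gathered in one pass over the HTML body. The sketch below uses Python's standard html.parser; the class name and the rule that only anchors with an href count as hyperlinks are assumptions made for this example.

```python
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Count hyperlinks and images that are wrapped in a hyperlink."""

    def __init__(self):
        super().__init__()
        self.hyperlinks = 0
        self.image_links = 0
        self._open_anchors = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a" and dict(attrs).get("href"):
            self.hyperlinks += 1
            self._open_anchors += 1
        elif tag == "img" and self._open_anchors > 0:
            # An image inside an open <a href=...> is a clickable image.
            self.image_links += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._open_anchors > 0:
            self._open_anchors -= 1


def link_features(html: str) -> dict:
    counter = LinkCounter()
    counter.feed(html)
    counter.close()
    return {"hyperlinks": counter.hyperlinks, "image_links": counter.image_links}
```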

(xiv) Cascading Style Sheet (CSS)

It denotes the CSS applied in the content of the message.

(xv) Number of words in subject with at least fifteen characters
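The two subject-based counts, features (ix) and (xv), can be computed together. This sketch splits the subject on whitespace, which is a simplifying assumption about what counts as a word.

```python
def subject_word_features(subject: str) -> dict:
    """Count the words in the subject and the words of at least fifteen characters."""
    words = subject.split()
    return {
        "word_count": len(words),
        "long_word_count": sum(1 for word in words if len(word) >= 15),
    }
```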

These are the features X = <F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12, F13, F14, F15> which can be used to differentiate between phishing and legitimate Emails.

These fifteen features are the major ones that need to be considered for Email phishing detection.

Final Words

Email phishing is a major challenge nowadays. Hence, end users need to be very careful when receiving Emails from unknown senders.
