Detecting spam profiles on Twitter is a key task in online social networks. Millions of people around the world use Twitter to communicate with friends and relatives.
Online social networks succeed in building a network of trust. Spammers exploit this trust by spreading spam messages that promote personal blogs, advertisements, phishing and scams.
Spamming is the practice of indiscriminately sending unsolicited bulk messages, especially advertisements. Sharing information through shortened URLs is an important feature of online social networks, and one that spammers can abuse.
Twitter plays an important role in today’s online social networking scenario. It has about a billion registered users, and over 500 million tweets are sent on average every day. A number of features are used to detect spam profiles.
Twitter Data and Characteristics for Spam Profile Detection
A set of profiles has to be collected from the Twitter social network, including manually classified benign and spam profiles. The maximum number of tweets is considered for analysis.
The main characteristics of Twitter from which specific features can be identified and collected include:
- Tweets: messages of no more than 140 characters, used to share information by posting a link or writing text.
- @mentions: used to address another user.
- Hashtags: the popular topics in tweets. The topic is preceded by a hash (#), hence the name.
- URLs: links can be shared in tweets.
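As a rough illustration of how these elements might be pulled out of a tweet's text, the sketch below uses simple regular expressions. The patterns and the function name `extract_entities` are assumptions made for this example; they are deliberately simplified and are not Twitter's official parsing rules.

```python
import re

# Simplified patterns (assumptions for illustration only).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@(\w+)")
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_entities(tweet_text):
    """Return the URLs, @mentions and hashtags found in one tweet."""
    urls = URL_RE.findall(tweet_text)
    # Remove URLs first so that '#' or '@' inside a link is not miscounted.
    text_without_urls = URL_RE.sub(" ", tweet_text)
    return {
        "urls": urls,
        "mentions": MENTION_RE.findall(text_without_urls),
        "hashtags": HASHTAG_RE.findall(text_without_urls),
    }


entities = extract_entities("Hi @alice check #deal http://t.co/abc")
```

A production system would more likely rely on the entity metadata supplied with each tweet, where available, than on hand-written patterns.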
Spam Profile Detection based on Features
Classifying a profile as spam or legitimate takes several steps, based on the features considered. This section describes the main characteristics of the features that help in labelling a profile as malicious or legitimate.
These features, which help in characterizing a profile, are identified in the Twitter network, and the details are crawled using an HTML parser.
The statistical features take different kinds of values, which typically indicate whether the profile is spam or not.
Which features should be considered?
- URL related features
- Age related features
- @mention related features
- Interaction related features
The interaction-related features consider an account's followers and following. They describe the user's friendship relations and the user's popularity with other users on Twitter.
Thus, a large number of following combined with a small number of followers marks the account as suspicious.
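One way to turn this observation into a single number is sketched below. The function name `follow_imbalance` and the exact formula (following divided by followers plus following) are assumptions chosen for this example, not a definition taken from the system described here. Values close to 1 mean the account follows many users but has few followers.

```python
def follow_imbalance(followers, following):
    """Share of an account's connections that are 'following' links.

    Returns a value in [0, 1]; values near 1 are more suspicious.
    """
    total = followers + following
    if total == 0:
        # No connections at all: nothing to measure.
        return 0.0
    return following / total


balanced = follow_imbalance(100, 100)      # 0.5
suspicious = follow_imbalance(20, 1980)    # close to 1
```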
URL-related features should be considered based on the API URL ratio. A higher API URL ratio implies that the tweets an account sends through the API are more likely to contain URLs, which makes the account more suspicious.
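The text does not give an exact formula for the API URL ratio. A minimal sketch, assuming it is the fraction of API-sent tweets that contain at least one URL, could look as follows. The input format (pairs of tweet text and a flag saying whether the tweet was sent through the API) and the name `api_url_ratio` are hypothetical.

```python
import re

URL_RE = re.compile(r"https?://\S+")  # simplified URL pattern (assumption)


def api_url_ratio(tweets):
    """tweets: iterable of (text, sent_via_api) pairs.

    Returns the fraction of API-sent tweets that contain a URL.
    """
    api_texts = [text for text, sent_via_api in tweets if sent_via_api]
    if not api_texts:
        return 0.0
    with_url = sum(1 for text in api_texts if URL_RE.search(text))
    return with_url / len(api_texts)


sample = [
    ("buy now http://x.co/1", True),
    ("hello world", True),
    ("my photo http://y.co/2", False),  # not sent via the API, so ignored
]
ratio = api_url_ratio(sample)
```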
The number of unique URLs will be very small for a malicious profile. This feature, together with the total number of URLs shared by the profile, models the profile's malicious behaviour.
A spammer will have a small number of URLs that spread spam content. These spam URLs are circulated widely by malicious profiles.
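The two URL counts described above can be computed together, as in the following sketch. The function name `url_counts` and the simplified URL pattern are assumptions for illustration. A profile with many shared URLs but few unique ones fits the spamming pattern described here.

```python
import re

URL_RE = re.compile(r"https?://\S+")  # simplified URL pattern (assumption)


def url_counts(tweet_texts):
    """Return (total_urls, unique_urls) over all tweets of one profile."""
    urls = [url for text in tweet_texts for url in URL_RE.findall(text)]
    return len(urls), len(set(urls))


total_urls, unique_urls = url_counts([
    "great offer http://s.co/1",
    "do not miss http://s.co/1 http://s.co/1",
    "last chance http://s.co/2",
])
```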
Age-related features refer to the age of a particular Twitter account. Most spammers create Twitter accounts that are used for only a short span of time.
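Account age can be expressed in days, as in this small sketch. The name `account_age_days` is hypothetical, and the example assumes the account creation time is available as a timezone-aware `datetime`.

```python
from datetime import datetime, timezone


def account_age_days(created_at, now=None):
    """Whole days between account creation and `now` (default: current UTC time)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - created_at).days


created = datetime(2020, 1, 1, tzinfo=timezone.utc)
reference = datetime(2020, 1, 31, tzinfo=timezone.utc)
age = account_age_days(created, now=reference)
```

Passing `now` explicitly keeps the feature reproducible when the same dataset is analysed again later.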
A large number of @mentions indicates that the user addresses a large number of people many times. Legitimate users usually do not do this, as they communicate with and address only a small number of people. Thus, a large value suggests that the profile is malicious.
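A possible way to quantify @mention behaviour is sketched below. The feature names and the simplified mention pattern are assumptions for this example. The sketch counts total mentions, distinct mentioned users, and mentions per tweet for one profile.

```python
import re

MENTION_RE = re.compile(r"@(\w+)")  # simplified mention pattern (assumption)


def mention_features(tweet_texts):
    """tweet_texts: list of tweet strings belonging to one profile."""
    # Lower-case the names so that @Alice and @alice count as the same user.
    mentions = [
        name.lower()
        for text in tweet_texts
        for name in MENTION_RE.findall(text)
    ]
    n_tweets = len(tweet_texts)
    return {
        "total_mentions": len(mentions),
        "unique_mentioned_users": len(set(mentions)),
        "mentions_per_tweet": len(mentions) / n_tweets if n_tweets else 0.0,
    }


features = mention_features(["@a @b hi", "@A again", "no mention here"])
```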
In this system, we used 550 profiles, of which 297 were identified as legitimate and 253 as malicious.
The detected spam profiles have to be blacklisted. Blacklisting and removing them prevents harm to other accounts.