Data Analysis Using Hadoop
analyze the Facebook data using Hadoop for the purpose of better decision
making in the business.
Smart phone without social
media usage in daily lifestyle of people is unthinkable. That much effect has
been creaed in the lifestyle of people by smartphone and social media. There are
many social media such as Facebook, Twitter, etc., As per 2017 statistics, nearly
1.37 billion daily active users for Facebook. Every user contributes some type
of data to in structured or semi-structured or unstructured data format. Business
owners wants this data to understand customer
need and their behaviour to make profit in their busiess. Facebook data
analysis is the process of collecting, analysing facebook data and visualizing
extracted results to the end user.
The user data is
collected from Facebook based on their activities. User behavior, number of
likes, number of posts, type of posts, their comments, etc. are stored by the
database server. Comments by the user in unstructured formats, while other data
in structured and semi-structured format. Petabytes of data is generated by
Facebook users. So Hadoop, MapReduce and related big data concepts used in this
project to analyze the data.
The proposed system focuses on analyzing sentiments of Facebook
users using Hadoop. The user sentiments collected are categorized into
positive, negative, neutral. The proposed system architecture is
shown in the figure.
Collect Data based on Query
Proposed System Architecture
1: Facebook API
Facebook API is used as an
authentication API to extract the user contents related to the query requested.
2: Data Pre-Processing
Data Collection: The data are collected
from Facebook using Hadoop through the Facebook API based on the requested
Data Preparation: The collected data consists
of different emotions, stop words, acronyms, etc. But during analysis this type
of data needs to be converted into the proper format to extract sentiments from
the user behavior.
Stop Words Dictionary
Consider one of the Facebook posts
regarding new mobile features. Users opinion about the new phone might be positive
or negative or neutral.
Example for Positive Sentiment:
Looks are awesome. Battery backup is excellent.
Camera is good. The display light quality is good.
Example for Neutral Sentiment:
Although this is good mobile, looks
good, but Problem is that it doesn’t provide separate Space for dual SIM &
memory card together.
Example for Negative Sentiment:
Not good one as expected. Camera quality
Comments extracted from Facebook are divided
into tokens. This is known as tokenization process. For example, ‘Looks are
awesome. Battery backup is excellent. Camera is good. The display light quality
is good.’ is divided down into ‘Looks’, ‘are’, ‘awesome’, ‘.’, ‘Battery’, ‘backup’,
‘is’, ‘excellent’, ‘.’, ‘Camera’, ‘is’, ‘good’, ‘.’, ‘The’, ‘display’, ‘light’,
‘quality’, ‘is’, ‘good’, ‘.’
Acronym Dictionary: It
is used to give the required acronym for the words, if needed.
Stop Words Dictionary: It
is used to remove the unrelated words in the sentiment analysis process.
Example: A, An, The, Has, Are, Is.
Emoticon: This is used to detect
the emoticons for the purpose of classifying the comment as positive or
negative or neutral.
3: Sentiment Analysis
The user sentiments collected from the Facebook
are categorized into positive, negative, neutral. This sentiment analysis can
be performed for different purposes based on the business objectives.
4: Data Visualization
After the Facebook sentiment analysis,
the extracted and analyzed sentiments are visualized using Tableau.
ü Linux OS
ü Hadoop & MapReduce
ü Facebook API
ü Hard Disk – 1 TB or Above
ü RAM required – 4 GB or Above
ü Processor – Core i3 or Above
ü Big Data – Hadoop