# Q.1. iii. regression algorithm · When to use

Q.1. How will you choose the right algorithm? (with justification)

-It mainly depends on what type of problem is to be solved, so the problem statement plays a major role.


-To create a problem statement, the input and the output of the problem to be solved need to be defined. The problem statement should also be clear and well structured, not jumbled up.

-Once the problem statement is identified, one of the algorithm families below can be chosen:

i. Classification algorithm
ii. Clustering algorithm
iii. Regression algorithm

· When to use classification

Use classification when the problem statement is to divide or assign items into predefined classes, for example (he is fat or thin).

Classification is a supervised learning technique: it learns from examples whose class labels are already known.

Examples of classification algorithms:

i. K-Nearest Neighbours
ii. Decision Trees
iii. Bayesian Classifier
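The sketch below makes the first algorithm in this list concrete. It is a minimal from-scratch implementation of k-nearest neighbours in Python that uses only the standard library. The height/weight data and the "fat"/"thin" labels are hypothetical and were chosen to mirror the example above. `knn_predict` and `k=3` are illustrative choices, not a prescribed implementation.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by a majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Vote among the labels of the k closest points
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: (height in cm, weight in kg) -> label
X = [(170, 95), (160, 85), (175, 100), (180, 70), (165, 55), (172, 60)]
y = ["fat", "fat", "fat", "thin", "thin", "thin"]
print(knn_predict(X, y, (168, 90)))  # fat
```

Because k-NN stores the labelled examples and compares new points against them, it is a supervised method. In practice the features would usually be scaled first so that one unit dominates less.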

· When to use clustering algorithms

Clustering algorithms are used when a large data set has to be divided into clusters, i.e. groups of similar items, without predefined labels.

Clustering is an unsupervised learning technique.

Examples of clustering algorithms:

i. K-Means Algorithm
ii. Expectation Maximization
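Here is a minimal sketch of k-means (Lloyd's algorithm). It assumes the points are numeric tuples and that the initial centroids are picked at random from the data. `k_means`, the iteration cap and the seed are hypothetical choices made for illustration.

```python
import math
import random


def k_means(points, k, iterations=100, seed=0):
    """Plain (Lloyd's) k-means: returns (centroids, cluster index per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct data points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: the centroids stopped moving
        centroids = new_centroids
    return centroids, assignment


points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, labels = k_means(points, k=2)
print(labels)
```

No labels are given to the algorithm. It only groups points by distance, which is what makes it unsupervised.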

· When to use regression algorithms

Regression can be applied when the input data are numerical and the output to be predicted is also a continuous numerical value.

Regression is a supervised learning technique.

Examples of regression algorithms:

i. Linear Regression
ii. Logistic Regression (despite its name, it predicts the probability of a class, so it is normally used for classification)
iii. Polynomial Regression, etc.
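The sketch below illustrates the numeric-in, numeric-out case. It fits simple linear regression (one input feature) by ordinary least squares. The house-size and price figures are made up for the example.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one input feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: house size (m^2) -> price (thousands)
sizes = [50, 60, 80, 100, 120]
prices = [150, 180, 240, 300, 360]
b0, b1 = fit_simple_linear_regression(sizes, prices)
print(b0 + b1 * 90)  # 270.0
```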

Q.2. Write in detail about the different steps that you will perform to develop a machine learning application for the following scenarios:

1. Spam Detection
2. Recommendations
3. Stock market prediction
4. Automated braking

1. For Spam Detection

– Since spam detection requires classifying each email as spam or not spam, a classification algorithm is to be applied.

The spam detection algorithm involves five stages:

i.
ii. Pre-processing,
iii. Extracting the features,
iv. Training the classifier, and
v. Evaluating the classifier.

A machine learning system works in two modes: training and testing.

Training: During training, the machine learning system is given labelled data from a training data set. For example, the labelled training data is a large set of emails that are labelled spam or not spam (ham). During the training process, the classifier (the part of the machine learning system that predicts the labels of future emails) learns from the training data by determining the connections between the features of an email and its label.

Testing: During testing, the machine learning system is given unlabelled data. For example, this data consists of emails without the spam/ham label. Depending on the features of an email, the classifier predicts whether the email is spam or ham. This prediction is compared with the true spam/ham label to measure performance.

Stage 2: Pre-processing

Before feeding the emails to our classifiers, we have to pre-process them. The goal is to build a feature matrix with rows being the emails and columns being the features. After removing HTML tags and extracting the relevant content, additional pre-processing must be done to create the feature matrix.

After preliminary pre-processing (removing HTML tags and headers from the emails in the data set), we take the following steps:

Tokenize – We create “tokens” from each word in the email by removing punctuation.

Remove stop words – Stop words such as “the”, “is” and “of” should be removed. They do not give meaningful information to the classifier, and they increase the dimensionality of the feature matrix. In addition to the stop words, we removed words longer than 12 characters and words shorter than three characters.

Stem – Each remaining word is converted to its “stem”. Similar words are reduced to a common stem in order to form a better feature matrix, which allows words with similar meanings to be treated the same. For example, history, histories and historic will be considered the same word in the feature matrix. Each stem is placed into our “bag of words”, which is just a list of every stem used in the data set.

Create the feature matrix – After building the “bag of words” from all of the stems, we create a feature matrix. The feature matrix is constructed so that the entry in row i and column j is the number of times that token j occurs in email i.
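The four steps above can be sketched roughly as follows. The sketch makes three assumptions: emails are plain text (HTML already removed), the stop-word list is a tiny illustrative one, and `crude_stem` is a deliberately crude, hypothetical suffix stripper standing in for a real stemmer such as Porter's. Unlike a real stemmer, it will not map "historic" onto "history".

```python
import re

# Small illustrative stop-word list (assumption; real systems use a longer list)
STOP_WORDS = {"the", "is", "of", "and", "a", "to", "in", "for", "you", "your", "are", "this"}


def crude_stem(word):
    """Hypothetical suffix-stripping stemmer, a stand-in for e.g. the Porter stemmer."""
    for suffix in ("ies", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)] + ("y" if suffix == "ies" else "")
    return word


def preprocess(email_text):
    """Tokenize, drop stop words and too-short/too-long words, then stem."""
    tokens = re.findall(r"[a-z]+", email_text.lower())  # tokenize, dropping punctuation
    kept = [t for t in tokens if t not in STOP_WORDS and 3 <= len(t) <= 12]
    return [crude_stem(t) for t in kept]


def feature_matrix(emails):
    """Rows = emails, columns = stems; entry [i][j] = count of stem j in email i."""
    processed = [preprocess(e) for e in emails]
    vocabulary = sorted({stem for stems in processed for stem in stems})  # the bag of words
    index = {stem: j for j, stem in enumerate(vocabulary)}
    matrix = [[0] * len(vocabulary) for _ in processed]
    for i, stems in enumerate(processed):
        for stem in stems:
            matrix[i][index[stem]] += 1
    return vocabulary, matrix


vocab, matrix = feature_matrix(["Win FREE prizes, win now!", "The history of histories"])
print(vocab)   # ['free', 'history', 'now', 'priz', 'win']
print(matrix)  # [[1, 0, 1, 1, 2], [0, 2, 0, 0, 0]]
```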

Stage 3: Extracting the features

Once the text is pre-processed, you can extract the features characterizing spam and ham emails. The first thing to notice is that some words, such as “the”, “is” or “of”, appear in all emails and don’t carry much content. These words will not help you distinguish spam from ham. Such words are called stop words, and they can be disregarded during classification.

To extract the features (words that can tell the program whether the email is spam or ham), you’ll have to do the following:

i. Read in the text of the email.

ii. Pre-process it using the pre-processing steps described in Stage 2.

iii. For each word that isn’t in the stop word list, either calculate how frequently it occurs in the text, or simply register the fact that the word occurs in the email.

The former approach is known as bag-of-words (BoW), and it allows the classifier to notice that certain keywords may occur in both kinds of emails but with different frequencies.
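The sketch below shows the two options in step iii side by side, assuming a toy stop-word list. `bag_of_words` counts how often each word occurs, while `word_presence` only records that it occurs. Both function names are hypothetical.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "is", "of", "a", "and", "to"}  # illustrative list (assumption)


def bag_of_words(email_text):
    """Frequency features: how often each non-stop word occurs in the email."""
    words = re.findall(r"[a-z]+", email_text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def word_presence(email_text):
    """Binary features: only records that each non-stop word occurs at all."""
    return {w: 1 for w in bag_of_words(email_text)}


email = "Free money! Claim the free offer and the free gift"
print(bag_of_words(email)["free"])   # 3
print(word_presence(email)["free"])  # 1
```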

Stage 4: Training a classifier

Now that the data is in the right format, you can split it into a training set that will be used to train the classifier and a test set that will be used to evaluate it. Typically, 80% of the data is used for training and the remaining 20% for testing.
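Here is a minimal sketch of this stage, assuming the emails have already been turned into token lists. It performs an 80/20 shuffled split and then trains a multinomial naive Bayes classifier. Naive Bayes is one kind of Bayesian classifier; it is chosen here as an assumption, since the text does not name the classifier. The data and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def train_test_split(rows, labels, test_fraction=0.2, seed=42):
    """Shuffle the examples, then hold out `test_fraction` of them for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(indices) * test_fraction)
    train_idx, test_idx = indices[n_test:], indices[:n_test]
    return ([rows[i] for i in train_idx], [labels[i] for i in train_idx],
            [rows[i] for i in test_idx], [labels[i] for i in test_idx])


def train_naive_bayes(docs, labels):
    """Multinomial naive Bayes with add-one (Laplace) smoothing; returns a predict function."""
    classes = sorted(set(labels))
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    word_counts = {c: Counter() for c in classes}
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for c in classes for w in word_counts[c]}
    total_words = {c: sum(word_counts[c].values()) for c in classes}

    def predict(tokens):
        scores = {}
        for c in classes:
            score = log_prior[c]
            for w in tokens:
                if w in vocab:  # words never seen in training are ignored
                    score += math.log((word_counts[c][w] + 1) / (total_words[c] + len(vocab)))
            scores[c] = score
        return max(scores, key=scores.get)

    return predict


# Hypothetical, already pre-processed emails (token lists) and their labels
docs = [
    ["free", "money", "win"], ["win", "prize", "free"], ["claim", "free", "prize"],
    ["cheap", "money", "offer"], ["free", "offer", "click"],
    ["meeting", "tomorrow", "agenda"], ["project", "report", "meeting"],
    ["lunch", "tomorrow", "team"], ["report", "agenda", "team"], ["team", "meeting", "notes"],
]
labels = ["spam"] * 5 + ["ham"] * 5

X_train, y_train, X_test, y_test = train_test_split(docs, labels)  # 8 train, 2 test
classify = train_naive_bayes(X_train, y_train)
print([classify(d) for d in X_test], y_test)
```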

Stage 5: Evaluating your classifier’s performance

Check whether your classifier is doing a good job at detecting spam. This is the last step, and it determines the performance of the classifier.
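One common way to quantify "a good job" is to compare predictions with the true labels on the test set. The sketch below does this with accuracy, precision and recall for the spam class. The choice of these metrics and the toy labels are assumptions made for illustration.

```python
def evaluate(true_labels, predicted_labels, positive="spam"):
    """Accuracy, plus precision and recall for the `positive` class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)  # spam caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # ham flagged as spam
    fn = sum(t == positive and p != positive for t, p in pairs)  # spam that slipped through
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical test-set labels and predictions
y_true = ["spam", "spam", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "spam", "spam"]
print(evaluate(y_true, y_pred))  # accuracy 0.6, precision and recall 2/3
```

For spam filtering, precision is often watched closely, because a low precision means legitimate email is being thrown away.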

2. Steps for Recommendations

i. Define the problem statement
ii. Choose a data set
iii. Generate a popularity model
iv. Proceed with a collaborative filtering model
v. Evaluate the engine

Step 1 - Define the problem statement based on the type of recommendation engine that needs to be built.

Step 2 - There are multiple data sets on the web that one can use during the evaluation step. Depending on your model and the auxiliary information used (tags, timestamps, ratings, etc.), you should choose the data set closest to your needs.

Step 3 - Build a popularity-based model, i.e. one where all users get the same recommendations, based on the most popular choices.
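Here is a rough sketch of such a popularity model. It assumes ratings arrive as hypothetical (user, item, rating) triples and that "popular" means "rated most often", with ties broken by mean rating.

```python
from collections import Counter, defaultdict


def popularity_recommender(ratings, top_n=3):
    """Same list for every user: most-rated items first, ties broken by mean rating."""
    counts = Counter(item for _user, item, _rating in ratings)
    totals = defaultdict(float)
    for _user, item, rating in ratings:
        totals[item] += rating

    def rank_key(item):
        # Negate so that sorted() puts larger counts / means first; the title breaks exact ties
        return (-counts[item], -totals[item] / counts[item], item)

    return sorted(counts, key=rank_key)[:top_n]


# Hypothetical (user, movie, rating) triples
ratings = [
    ("ann", "Matrix", 5), ("bob", "Matrix", 4), ("cat", "Matrix", 5),
    ("ann", "Titanic", 3), ("bob", "Titanic", 4),
    ("cat", "Up", 5), ("dan", "Up", 4), ("dan", "Heat", 2),
]
print(popularity_recommender(ratings))  # ['Matrix', 'Up', 'Titanic']
```

This model ignores who the user is, which makes it a useful baseline for judging whether a personalised model actually helps.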

Step 4 - Proceed with an item-based collaborative filtering model. The core idea works in 2 steps:

i. Find similar items by using a similarity metric.
ii. For a user, recommend the items most similar to the items (s)he already likes.

To give a high-level overview, this is done by building an item-item matrix in which we keep a record of the pairs of items that were rated together.

In this case, an item is a movie. Once we have the matrix, we use it to determine the best recommendations for a user based on the movies he has already rated. Note that there are a few more things to take care of in an actual implementation, which would require a deeper mathematical treatment; I’ll skip them for now.
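The two steps can be sketched as item-based collaborative filtering with cosine similarity. Cosine similarity is one common similarity metric; using it here is an assumption, since the text does not name one. Missing ratings are treated as 0, the sparse `sims` dictionary plays the role of the item-item matrix, and the ratings data is hypothetical.

```python
import math
from collections import defaultdict


def item_similarities(ratings):
    """Cosine similarity between items, treating a missing rating as 0."""
    by_item = defaultdict(dict)  # item -> {user: rating}
    for user, item, rating in ratings:
        by_item[item][user] = rating
    norms = {item: math.sqrt(sum(r * r for r in users.values())) for item, users in by_item.items()}
    sims = defaultdict(dict)  # the item-item matrix, stored sparsely
    for a in by_item:
        for b in by_item:
            common = by_item[a].keys() & by_item[b].keys()  # users who rated both items
            if a == b or not common:
                continue
            dot = sum(by_item[a][u] * by_item[b][u] for u in common)
            sims[a][b] = dot / (norms[a] * norms[b])
    return sims


def recommend(user, ratings, top_n=2):
    """Score each unseen item by sum(similarity * rating) over the items the user has rated."""
    sims = item_similarities(ratings)
    seen = {item: r for u, item, r in ratings if u == user}
    scores = defaultdict(float)
    for item, r in seen.items():
        for other, s in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += s * r
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Hypothetical (user, movie, rating) triples
ratings = [
    ("ann", "Matrix", 5), ("ann", "Inception", 5), ("ann", "Titanic", 1),
    ("bob", "Matrix", 4), ("bob", "Inception", 5),
    ("cat", "Titanic", 5), ("cat", "Notebook", 5),
    ("dan", "Matrix", 5), ("dan", "Inception", 4), ("dan", "Interstellar", 5),
    ("eve", "Titanic", 4), ("eve", "Notebook", 5),
]
print(recommend("bob", ratings))  # ['Interstellar', 'Titanic']
```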

Step 5 - Check whether your model works properly and correctly.
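One possible check is to hide some of each user's liked items and measure how many of them appear in the top-k recommendations (precision@k). This check is an assumption made here, not something taken from the text. The function names and data are hypothetical.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that the user actually liked (held out)."""
    top_k = recommended[:k]
    return sum(item in relevant for item in top_k) / k


def mean_precision_at_k(recs_by_user, held_out_by_user, k=3):
    """Average precision@k over all users that have held-out items."""
    users = [u for u in recs_by_user if held_out_by_user.get(u)]
    if not users:
        return 0.0
    return sum(precision_at_k(recs_by_user[u], held_out_by_user[u], k) for u in users) / len(users)


# Hypothetical recommendations and held-out liked items
recs = {"u1": ["a", "b", "c"], "u2": ["x", "y", "z"]}
held_out = {"u1": {"a"}, "u2": {"x", "y", "z"}}
print(mean_precision_at_k(recs, held_out, k=3))
```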

3. Stock prediction

i. Define the problem statement
ii. Data preparation
iii. Train / test split: since we always want to predict the future, we take the latest 10% of the data as the test data
iv. Normalization
v. Model construction
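Because the test set must lie in the future, the split is done by position rather than by shuffling. Here is a minimal sketch with a hypothetical function name and made-up prices.

```python
def chronological_split(prices, test_fraction=0.1):
    """Split a time-ordered series without shuffling: the latest 10% becomes the test set."""
    n_test = max(1, round(len(prices) * test_fraction))
    return prices[:-n_test], prices[-n_test:]


closes = list(range(100, 120))  # 20 hypothetical daily closing prices, oldest first
train, test = chronological_split(closes)
print(len(train), len(test))  # 18 2
```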

Step 1 - Define the problem statement based on the type of engine that needs to be built.

Step 2 - The stock prices form a time series of length $N$, defined as $p_0, p_1, \dots, p_{N-1}$, in which $p_i$ is the close price on day $i$, $0 \le i < N$. Imagine that we have a sliding window of a fixed size $w$ (later, we refer to this as input_size), and every time we move the window to the right by $w$, so that there is no overlap between the data in the sliding windows.
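Here is a minimal sketch of the non-overlapping windows. The name `input_size` follows the one used above. `window_pairs`, which predicts window t+1 from window t, is a hypothetical way to turn the windows into training examples.

```python
def sliding_windows(prices, input_size):
    """Cut the series into consecutive, non-overlapping windows of length input_size."""
    n_windows = len(prices) // input_size  # trailing prices that do not fill a window are dropped
    return [prices[t * input_size:(t + 1) * input_size] for t in range(n_windows)]


def window_pairs(windows):
    """Hypothetical supervised pairs: use window t as input and window t+1 as the target."""
    return [(windows[t], windows[t + 1]) for t in range(len(windows) - 1)]


prices = [10, 11, 12, 13, 14, 15, 16]
print(sliding_windows(prices, 3))  # [[10, 11, 12], [13, 14, 15]]
```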

Step 3 - The S&P 500 index increases over time. As a result, most values in the test set are out of the scale of the training set, and the model has to predict numbers it has never seen before. Sadly, and unsurprisingly, it does a tragic job. To solve the out-of-scale issue, I normalize the prices in each sliding window, so the task becomes predicting relative change rates instead of absolute values. Let $p_i$ be the close price on day $i$ and $W_t = (p_{tw}, p_{tw+1}, \dots, p_{(t+1)w-1})$ the $t$-th sliding window of size $w$. In the normalized window $W'_t$, all the values are divided by the last known price, i.e. the last price in $W_{t-1}$:

$$W'_t = \left(\frac{p_{tw}}{p_{tw-1}}, \frac{p_{tw+1}}{p_{tw-1}}, \dots, \frac{p_{(t+1)w-1}}{p_{tw-1}}\right)$$
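The normalization can be sketched as below, assuming windows are lists of floats. The function names are hypothetical. Dropping the first window, which has no predecessor, is also an assumption; other conventions exist.

```python
def normalize_windows(windows):
    """Divide every value in window t by the last price of window t-1.

    The first window has no predecessor, so it is dropped here (an assumption;
    other choices, such as dividing by its own first price, are possible).
    """
    return [
        [p / windows[t - 1][-1] for p in windows[t]]
        for t in range(1, len(windows))
    ]


def denormalize(normalized_window, last_known_price):
    """Turn predicted relative values back into absolute prices."""
    return [r * last_known_price for r in normalized_window]


windows = [[50.0, 100.0], [150.0, 200.0], [100.0, 300.0]]
print(normalize_windows(windows))  # [[1.5, 2.0], [0.5, 1.5]]
```

Because every window is expressed relative to the price just before it, the values stay on a similar scale even when the index keeps rising.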

Step 4 - The training requires max_epoch epochs in total; an epoch is a single full pass over all the training data points. In one epoch, the training data points are split into mini-batches of a fixed size.
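Here is a skeleton of that loop. It uses `max_epoch` from the text. `batch_size` and the helper names are hypothetical, and the actual model update is left as a comment.

```python
def mini_batches(data, batch_size):
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


def run_training(data, max_epoch, batch_size):
    """Skeleton training loop: max_epoch full passes, each split into mini-batches."""
    updates = 0
    for _epoch in range(max_epoch):  # one epoch = one full pass over `data`
        for batch in mini_batches(data, batch_size):
            # A real model would compute the loss on `batch` and update its weights here.
            updates += 1
    return updates


print(run_training(list(range(10)), max_epoch=3, batch_size=4))  # 9
```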

4. Automated braking

i. Define the problem statement
ii. Load the training data
iii. Generate a general model
iv. Proceed with a deeper model
v. Evaluate the engine

Step 1 - The problem statement should match the application that needs to be built.

Step 2 - Load the training data of the application you want to build. Every application comes with its own data sets.

Step 3 - Generate a general model that is built and compared on common data sets.

Step 4 - Once the general model is built, proceed with a deeper model.

Step 5 - After completing the engine, evaluate it, i.e. analyse whether your engine is up to the mark.