An Analysis of Crop Yield Prediction by Data Mining Techniques using optimal

VARA LAKSHMI G, Associate Professor, Aurora’s Technological and Research
Institute, Dept of CSE


ABSTRACT: The agricultural sector is an emerging area of research because
agriculture plays a vital role in the Indian economy. As the sector moves
towards digitalization, the government and researchers are increasingly focused
on improving crop yield using the latest technologies, such as Big Data
Analytics and IoT, and on identifying optimal parameters for prediction.
Different data mining techniques are used to extract patterns for predicting
future crop production. We observe that the type of cultivation (Traditional
cultivation, SRI cultivation or God Rice cultivation) is an important parameter
and should be considered one of the optimal parameters for yield prediction.

KEYWORDS: Big Data, IoT, Chameleon, Random Forest, Support Vector Machine,
Regression




Many countries are applying the latest technologies, such as Big Data and IoT,
to digitalize agricultural data and are applying advanced data mining
techniques to the resulting datasets. In India, reduced food production and the
higher cost of food products are mostly due to a deficiency of primary
nutrients in agricultural soil. At present, soil fertility is on a declining
trend due to the use of fertilizers, pesticides and insecticides, as well as
salinity, unscientific cultivation and urbanization. A lot of research is being
done with the help of bioinformatics, biotechnology and data analytics.


We apply technology to increase production while limiting the impact on soil
fertility, air pollution, water pollution and the health of the people who
consume the food products.



Data Mining Techniques for Yield Prediction


Clustering is an unsupervised learning technique. Clustering techniques can be
categorized into partitioning methods, hierarchical methods (which follow one
of two approaches, agglomerative or divisive), density-based methods,
grid-based methods and model-based methods.


Classification techniques for discovering knowledge include rule-based
classifiers, Bayesian networks, nearest neighbour, support vector machines,
decision trees, artificial neural networks, rough sets, fuzzy logic and genetic
algorithms.




Paper [1] focuses on finding optimal parameters to maximize crop production
using data mining techniques. The parameters considered are year, state
(Karnataka, 28 districts), district, crop (cotton, groundnut, jowar, rice and
wheat), season (kharif, rabi, summer), area (in hectares), production (in
tonnes), average temperature (°C), average rainfall (mm), soil, pH value, major
fertilizers, nitrogen (kg/ha), phosphorus (kg/ha), potassium (kg/ha), minimum
rainfall and minimum temperature. DBSCAN is used to cluster districts that have
similar temperature, rainfall and soil type, while PAM and CLARA are used to
cluster the districts with maximum crop production. The authors compared PAM,
CLARA and DBSCAN using external quality metrics: purity, homogeneity,
completeness, V-measure, Rand index, precision, recall and F-measure. To
predict the annual crop yield they used multiple linear regression.
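Two of the external quality metrics named above, purity and the Rand index,
can be illustrated with a minimal sketch. The district labels and cluster
assignments below are invented for illustration and do not come from paper
[1]; the function names are likewise hypothetical.

```python
from collections import Counter
from itertools import combinations

def purity(true_labels, cluster_ids):
    """Fraction of points that belong to the majority class of their cluster."""
    clusters = {}
    for label, cid in zip(true_labels, cluster_ids):
        clusters.setdefault(cid, []).append(label)
    majority_total = sum(Counter(members).most_common(1)[0][1]
                         for members in clusters.values())
    return majority_total / len(true_labels)

def rand_index(true_labels, cluster_ids):
    """Fraction of point pairs on which the two groupings agree."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j]) == (cluster_ids[i] == cluster_ids[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Hypothetical toy data: six districts with a reference grouping and the
# cluster ids produced by some clustering algorithm.
reference = ["dry", "dry", "dry", "wet", "wet", "wet"]
clustering = [0, 0, 1, 1, 1, 1]

print("purity:", purity(reference, clustering))
print("rand index:", rand_index(reference, clustering))
```

Both metrics need a reference grouping, which is why they are called external;
values closer to 1 indicate closer agreement with that reference.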



In paper [2], the authors collected agricultural data on rice production from
several sources; the dataset covers the years 2005 to 2013. They considered 30
districts in Karnataka state and used 1200 rows of data with 18 parameters. The
input dataset consists of 9 years of data with the following parameters: year,
state (Karnataka, 30 districts), district, crop (rice), area (in hectares),
production (in tonnes), yield, average rainfall (mm), soil, canals, wells,
water (cusec), nitrogen (kg/ha), phosphorus (kg/ha), potassium (kg/ha), actual
rainfall, zone and insecticides.

They used the following data mining techniques. Chameleon, a two-phase
clustering algorithm, was used to derive the best soil for rice and ways to
improve soil fertility. Random forest was used to derive the yield to expect
for the available water and fertilizers. Multiple regression was used to
estimate the yield to expect for a selected set of parameters and how the yield
can be maximized as the parameters increase. Logistic regression, through
different plots, summarizes how the yield is affected by parameters such as
water, nitrogen, phosphorus and potassium.


In this paper, the authors used data mining techniques such as Multiple Linear
Regression (MLR) and a density-based clustering technique to predict crop
production, and the results obtained were verified and analyzed. The data were
collected from 1955 to 2009 for East Godavari district and cover 8 parameters:
year, rainfall, area of sowing, yield, fertilizers (nitrogen, potassium and
phosphorus) and production.
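As a rough illustration of how multiple linear regression maps several
parameters to production, the sketch below fits coefficients by solving the
normal equations in plain Python. The numbers are invented (they are not the
East Godavari data) and the helper names fit_mlr and predict are hypothetical.

```python
def fit_mlr(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    X = [[1.0] + list(r) for r in rows]          # prepend an intercept column
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def predict(coeffs, row):
    """Apply the fitted intercept and slopes to one observation."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))

# Invented observations: (rainfall, area of sowing) -> production.
# The targets were generated as 2 + 0.5 * rainfall + 3 * area.
rows = [(100, 10), (150, 12), (120, 20), (180, 15), (90, 18)]
production = [82.0, 113.0, 122.0, 137.0, 101.0]

coeffs = fit_mlr(rows, production)
print("intercept and slopes:", coeffs)
print("prediction for (130, 14):", predict(coeffs, (130, 14)))
```

The sketch assumes the parameters are not perfectly correlated with each
other; with real data a numerical library would normally be preferred over
hand-written elimination.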


In paper [4], a framework is proposed in which data in the form of pictures
captured with a smartphone are sent to an agricultural bank. The agricultural
bank holds the tools needed to analyze the data, and within a short period the
farmer receives solutions to problems concerning pesticide usage, seed usage,
crop diagnosis, temperature and climate, loans and rainfall.


In this paper, an architecture named the crop yield prediction model is
proposed. Its input module takes the crop name, land area, soil type, soil pH,
pest details, weather, water level and seed type. A feature selection module
selects a subset of attributes from the crop details. The crop yield prediction
model is then used to predict plant growth and plant diseases. This opens new
research possibilities for applying new classification methodologies to the
problem of yield prediction.


A study on the effect of temperature and rainfall on the agricultural
production of rice has been done previously. Bangladesh grows several varieties
of rice in different cropping seasons [6]. The authors took temperature and
rainfall and performed regression analysis; temperature plays a vital role in
crop production. The data were taken from the Bangladesh Agricultural Research
Council (BARC) for the past 20 years, with 7 attributes: rainfall, maximum and
minimum temperature, sunlight, wind speed, humidity and cloud coverage. During
pre-processing, the complete dataset was divided into three seasonal phases
(March to June, July to October, November to February). This was done for each
rice variety. For each phase, the average of every attribute was computed and
associated with it.
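The phase-wise averaging described above might look like the following sketch.
It assumes monthly records stored as dictionaries with a month number; the
record layout, the attribute names and the values are all invented, not taken
from the BARC data.

```python
from collections import defaultdict

# The three phases named in the text, as month numbers 1-12.
PHASES = {
    "Mar-Jun": {3, 4, 5, 6},
    "Jul-Oct": {7, 8, 9, 10},
    "Nov-Feb": {11, 12, 1, 2},
}

def phase_of(month):
    """Return the name of the phase that contains the given month."""
    for name, months in PHASES.items():
        if month in months:
            return name
    raise ValueError(f"invalid month: {month}")

def phase_averages(records):
    """Average every numeric attribute within each phase.

    records: iterable of dicts holding 'month' plus numeric attributes.
    Assumes every record carries the same set of attributes.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for rec in records:
        phase = phase_of(rec["month"])
        counts[phase] += 1
        for key, value in rec.items():
            if key != "month":
                sums[phase][key] += value
    return {phase: {k: total / counts[phase] for k, total in attrs.items()}
            for phase, attrs in sums.items()}

# Invented monthly records for one rice variety.
records = [
    {"month": 3, "rainfall": 40.0, "max_temp": 32.0},
    {"month": 4, "rainfall": 60.0, "max_temp": 34.0},
    {"month": 7, "rainfall": 300.0, "max_temp": 31.0},
    {"month": 12, "rainfall": 10.0, "max_temp": 25.0},
    {"month": 1, "rainfall": 6.0, "max_temp": 23.0},
]

averages = phase_averages(records)
print(averages)
```

This sketch pools all years together; keeping one average per phase per year
would need the year added to the grouping key, with November to February
spanning two calendar years.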


In this work, the data mining techniques used are the Support Vector Machine
(SVM), a black-box technique that can be applied to both classification and
regression problems, and the Artificial Neural Network (ANN). The data came
from ISRIC-World Soil Information. The source has more than 50 attributes, of
which 10 were selected, including depth, pH, organic carbon, available
nitrogen, available phosphorus, available potassium, porosity and water holding
capacity. The ANN achieved its highest performance of 55 percent with 7 hidden
nodes. SVM was applied with three different kernels: polynomial, radial basis
and hyperbolic tangent. SVM achieved much better results, reaching 74% with the
radial basis kernel.
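The three kernels mentioned above have standard textbook forms, sketched here
in plain Python. The hyperparameter values (degree, gamma, coef0) and the two
sample vectors are assumptions chosen for illustration; the settings used in
the cited work are not given in the text.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def polynomial_kernel(u, v, degree=3, gamma=1.0, coef0=1.0):
    """(gamma * <u, v> + coef0) ** degree"""
    return (gamma * dot(u, v) + coef0) ** degree

def rbf_kernel(u, v, gamma=0.5):
    """exp(-gamma * ||u - v||^2), the radial basis function kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def tanh_kernel(u, v, gamma=0.1, coef0=0.0):
    """tanh(gamma * <u, v> + coef0), the hyperbolic tangent kernel."""
    return math.tanh(gamma * dot(u, v) + coef0)

# Two invented soil samples with two scaled attributes each.
sample_a = [1.0, 2.0]
sample_b = [3.0, 4.0]

print("polynomial:", polynomial_kernel(sample_a, sample_b, degree=2))
print("rbf:", rbf_kernel(sample_a, sample_b))
print("tanh:", tanh_kernel(sample_a, sample_b))
```

A kernel measures similarity between two samples; an SVM uses these values in
place of plain inner products, which is what lets it learn non-linear decision
boundaries.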


In this work, the authors designed an algorithm, the Crop Selection Method
(CSM), to select a sequence of crops over the season so as to maximize the net
yield rate. The inputs are the crop name, sowing period, harvesting period,
growing (plantation) days and the predicted yield rate, which is influenced by
different parameters. The method gives high performance and accuracy provided
the predicted values are accurate.
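The idea of choosing a sequence of crops from sowing dates, growing days and
predicted yield rates can be sketched as a weighted interval scheduling
problem. This is a generic formulation, not the authors' CSM algorithm: it
assumes a single plot, that crops may not overlap in time, and that the
predicted yield rates are simply added up. The crop figures are invented.

```python
from bisect import bisect_right

def best_crop_sequence(crops):
    """Pick non-overlapping crops with the largest total predicted yield.

    crops: list of (name, sow_day, growing_days, predicted_yield).
    Returns (total_yield, [crop names in harvest order]).
    """
    # Sort by harvest day so each crop only needs to look back in time.
    jobs = sorted((sow + days, sow, y, name) for name, sow, days, y in crops)
    ends = [job[0] for job in jobs]
    best = [(0.0, [])]          # best[i]: optimum over the first i jobs
    for i, (end, sow, y, name) in enumerate(jobs):
        # Number of earlier crops already harvested by this sowing day.
        k = bisect_right(ends, sow, 0, i)
        take = (best[k][0] + y, best[k][1] + [name])
        skip = best[i]
        best.append(take if take[0] > skip[0] else skip)
    return best[-1]

# Invented candidates: (name, sowing day, growing days, predicted yield rate).
candidates = [
    ("rice", 0, 120, 5.0),
    ("wheat", 120, 110, 3.0),
    ("cotton", 0, 180, 6.0),
    ("groundnut", 180, 100, 2.5),
]

total, sequence = best_crop_sequence(candidates)
print(total, sequence)
```

As the text notes, any such selection is only as good as the predicted yield
rates fed into it.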


The C4.5 decision tree algorithm is used to build a Rice Disease Classification
(RDC) model based on symptoms. The experiment was carried out on Indian rice
disease data. The C4.5 decision tree is used to acquire knowledge automatically
from this empirical data. An advantage of C4.5 is that it is interpretable. The
algorithm can effectively build a tree with high predictive power and gives
more accurate results on the test dataset.
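C4.5 chooses the attribute to split on by its gain ratio, that is, the
information gain divided by the split information. The sketch below computes
this for two symptom attributes; the symptom names, disease labels and records
are invented for illustration and are not from the rice disease dataset.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())

def gain_ratio(attribute_values, labels):
    """Information gain of the attribute divided by its split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(attribute_values)
    return info_gain / split_info if split_info > 0 else 0.0

# Invented records: two symptoms and the diagnosed class for four plants.
leaf_spots = ["yes", "yes", "no", "no"]
yellowing = ["yes", "no", "yes", "no"]
disease = ["blast", "blast", "healthy", "healthy"]

print("leaf_spots:", gain_ratio(leaf_spots, disease))
print("yellowing:", gain_ratio(yellowing, disease))
```

In this toy data the leaf_spots attribute separates the classes perfectly and
so would be chosen for the root split, while yellowing carries no information
about the class.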


Machine learning techniques are used for crop disease classification and
prediction. Several techniques, such as the C4.5 decision tree algorithm, the
support vector machine and the artificial neural network, have been studied for
developing agricultural applications.