Suppose you have data about two fruits: apples and oranges. For each fruit you have its weight and its texture. Weight is in grams and texture is either smooth or rough. These are called Features. You must also associate the Features with Labels. In our case the label is Apple (denoted by ‘0’) or Orange (denoted by ‘1’).
Now we use an ML algorithm called a Decision Tree Classifier to predict whether a given set of Features (weight and texture) represents an Apple or an Orange.
So, here’s the Python program:
A> from sklearn import tree
This imports the tree module from sklearn (a library called scikit-learn). The tree module contains the decision tree classes and helper functions.
B> features = [[140,1],[130,1],[150,0],[170,0],[190,0]]
labels = [0,0,1,1,1]
This is our dataset. Each entry in features holds the values for one fruit. For example [140,1] means 140 grams weight and Smooth texture (which is the ‘1’ after ‘140’). Similarly [170,0] means 170 grams weight and Rough texture (which is the ‘0’ after ‘170’).
By now you may have noticed that, in this dataset, a texture of ‘1’ (smooth) always belongs to an Apple and a texture of ‘0’ (rough) always belongs to an Orange. Do not confuse these texture values with the labels, which use their own numbering.
Next we provide the labels for these features. In the labels list, ‘0’ indicates an Apple and ‘1’ indicates an Orange.
Note that there are 5 data points (in both features and labels). The first 2 are Apples and the remaining 3 are Oranges.
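To see how each feature lines up with its label, here is a small sketch that prints the dataset in words. The two lookup dictionaries and the describe function are names made up for this illustration; they are not part of sklearn or of the program above.

```python
# Same dataset as in the article: [weight in grams, texture]
features = [[140, 1], [130, 1], [150, 0], [170, 0], [190, 0]]
labels = [0, 0, 1, 1, 1]

# Hypothetical lookup tables, added only to make the printout readable
TEXTURE_NAMES = {0: "rough", 1: "smooth"}
FRUIT_NAMES = {0: "Apple", 1: "Orange"}


def describe(features, labels):
    """Return one readable line per data point."""
    lines = []
    # zip pairs the i-th feature entry with the i-th label
    for (weight, texture), label in zip(features, labels):
        lines.append(f"{weight} g, {TEXTURE_NAMES[texture]} -> {FRUIT_NAMES[label]}")
    return lines


for line in describe(features, labels):
    print(line)
```

The first line printed is "140 g, smooth -> Apple", which matches the first entry in features and the first entry in labels.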
C> clf = tree.DecisionTreeClassifier()
Here we create a variable called ‘clf’ (short for classifier) and assign it a new, untrained decision tree created by tree.DecisionTreeClassifier()
D> clf = clf.fit(features,labels)
Now we fit the classifier to the Features and Labels. This is the training step, where the decision tree learns rules from our data. So now ‘clf’ is ready for prediction.
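If you are curious about what the tree learned during fit, sklearn's tree.export_text function can print the rules as plain text. This is a minimal sketch; the names "weight" and "texture" passed as feature_names are labels chosen here for readability. Because either weight or texture alone separates this tiny dataset perfectly, the rule you see may differ from run to run.

```python
from sklearn import tree

features = [[140, 1], [130, 1], [150, 0], [170, 0], [190, 0]]
labels = [0, 0, 1, 1, 1]

clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)

# export_text returns the learned if/else rules as a string.
# The feature names are assumptions made for this sketch.
rules = tree.export_text(clf, feature_names=["weight", "texture"])
print(rules)
```

With only 5 data points, a single question is enough to split the apples from the oranges, so the printed tree has just one level.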
E> wt = int(input('Enter weight : '))
texture = int(input('Enter texture 0 or 1 :'))
wt and texture are two variables where we capture the weight and texture entered by the user. The input function returns text, so we convert each value to a number with int(). We will use these values to predict whether the fruit is an ‘Apple’ or an ‘Orange’, based on the 5-point dataset that we have.
F> if clf.predict([[wt,texture]]) == 1:
    print('Orange')
else:
    print('Apple')
Now this is where the magic happens. The clf.predict function predicts whether this fruit is an Orange or an Apple. If the ‘predict’ function returns ‘1’ then the fruit is an Orange, or else it’s an Apple.
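Putting the steps together, here is one possible way to arrange the same logic so that it can be tried without typing input each time. The function names train_classifier and predict_fruit are made up for this sketch; the sklearn calls are the same ones used in the steps above.

```python
from sklearn import tree


def train_classifier():
    """Train a decision tree on the 5-point fruit dataset."""
    features = [[140, 1], [130, 1], [150, 0], [170, 0], [190, 0]]
    labels = [0, 0, 1, 1, 1]
    clf = tree.DecisionTreeClassifier()
    return clf.fit(features, labels)


def predict_fruit(clf, wt, texture):
    """Return 'Orange' if the predicted label is 1, otherwise 'Apple'."""
    # predict expects a list of data points and returns one label per point
    prediction = clf.predict([[wt, texture]])[0]
    return "Orange" if prediction == 1 else "Apple"


clf = train_classifier()
print(predict_fruit(clf, 135, 1))  # light and smooth -> Apple
print(predict_fruit(clf, 180, 0))  # heavy and rough -> Orange
```

Both test fruits resemble the training data closely, so the answer is the same whether the tree chose to split on weight or on texture.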
This is a program of under 15 lines that makes predictions. All the functions it uses come from Python libraries.
We have just used a Supervised Machine Learning algorithm called a Decision Tree. In general, the more data you have in your dataset (we have used only 5 data points), the better the predictions tend to be.
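To get a feel for why 5 data points are very few, consider a fruit that looks like nothing in the dataset: heavy (like our oranges) but smooth (like our apples). The sketch below trains the tree several times with different random_state values, which is a real DecisionTreeClassifier parameter that controls how ties between equally good splits are broken. The choice of 10 seeds and the 180-gram example are assumptions made for illustration.

```python
from sklearn import tree

features = [[140, 1], [130, 1], [150, 0], [170, 0], [190, 0]]
labels = [0, 0, 1, 1, 1]

# A heavy but smooth fruit: nothing like it appears in the dataset
unseen_fruit = [[180, 1]]

predictions = []
for seed in range(10):
    # Each seed may lead the tree to split on weight or on texture
    clf = tree.DecisionTreeClassifier(random_state=seed)
    clf = clf.fit(features, labels)
    predictions.append(int(clf.predict(unseen_fruit)[0]))

print(predictions)
```

The printed list may contain a mix of 0s and 1s, because the data gives the tree no way to know which feature matters more. A larger dataset with more varied examples would help settle that question.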
So, go ahead and try this example. (I used Spyder, an IDE, in Anaconda, a platform.)
Hope you liked this program. If you have any questions, let me know.