Dataset
To begin, we need a dataset, since every ML/AI model learns from data.
Data Splitting
The first step in our classification task is to randomly split our data into 3 independent sets:
Training Set:
The dataset that we feed to our model so it can learn underlying patterns and relationships.
Validation Set:
The dataset that we use to understand our model's performance
and tune it accordingly.
Test Set:
The dataset that we use to assess how the model will perform on unseen, real-world data.
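The three-way split described above can be sketched in a few lines of Python. This is only an illustration: the 60/20/20 proportions, the `split_dataset` helper, and the toy `animals` list are assumptions made for the example, not values taken from this walkthrough.

```python
import random

def split_dataset(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle samples and cut them into train, validation, and test lists.

    The fractions are illustrative defaults; whatever is left after the
    training and validation portions becomes the test set.
    """
    if train_frac <= 0 or val_frac <= 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must be positive and leave room for a test set")
    shuffled = list(samples)                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)    # seeded so the split is reproducible
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Hypothetical data: 20 labeled animals, each identified only by an index.
animals = [(i, "cat" if i % 2 == 0 else "dog") for i in range(20)]
train, val, test = split_dataset(animals)
print(len(train), len(val), len(test))
```

Because the three slices are taken from a single shuffled copy, no example can end up in more than one set.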
Training the Model
Now, let’s go ahead and train our model on the training dataset.
Here, we teach the AI model to make predictions or perform specific tasks by learning from examples.
But wait, there are a plethora of classification algorithms available:
- Logistic Regression
- Support Vector Machines (SVM)
- Random Forest
- Naive Bayes
Let’s use the Logistic Regression model for today!
Building the Model
What you are performing here is supervised learning, where the model learns from labeled examples to make predictions.
Drag each animal in the training set to a new position to see how the model updates the decision boundary!
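To make the idea of a learned decision boundary concrete, here is a minimal logistic regression sketch trained with plain gradient descent. It is not the implementation behind this page: the two-feature points, the learning rate, and the number of epochs are all hypothetical choices made for illustration.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(points, labels, lr=0.1, epochs=1000):
    """Fit weights w and bias b by batch gradient descent on the log loss."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(points)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for (x1, x2), y in zip(points, labels):
            error = sigmoid(w[0] * x1 + w[1] * x2 + b) - y   # prediction minus label
            grad_w[0] += error * x1
            grad_w[1] += error * x2
            grad_b += error
        # Step against the average gradient.
        w[0] -= lr * grad_w[0] / n
        w[1] -= lr * grad_w[1] / n
        b -= lr * grad_b / n
    return w, b

def predict(w, b, point):
    """Return 1 (dog) if the predicted probability is at least 0.5, else 0 (cat)."""
    x1, x2 = point
    return 1 if sigmoid(w[0] * x1 + w[1] * x2 + b) >= 0.5 else 0

# Hypothetical training set: two centred features per animal, 0 = cat, 1 = dog.
points = [(-2.0, -1.0), (-1.5, -2.0), (-1.0, -1.0), (-2.0, -2.5),
          (1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (2.5, 2.5)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic_regression(points, labels)
# The decision boundary is the line where w[0]*x1 + w[1]*x2 + b == 0.
print("weights:", w, "bias:", b)
```

Changing the coordinates of any entry in `points` and retraining gives different values of `w` and `b`, which is the code-level counterpart of dragging an animal and watching the boundary move.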
Validating the Model
Now that we have trained the model, we will assess its performance using the validation set.
Based on this assessment, you can tweak the model's parameters to try to reach the desired performance.
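One way to picture this tuning loop is to try several candidate settings and keep whichever scores best on the validation set. The sketch below tunes a single setting, the probability threshold above which an animal is labeled a dog. The scores, labels, and candidate thresholds are hypothetical, and a real model may expose quite different parameters to tune.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def pick_threshold(val_scores, val_labels, candidates):
    """Return the candidate threshold with the highest validation accuracy."""
    best_threshold, best_acc = None, -1.0
    for t in candidates:
        preds = [1 if s >= t else 0 for s in val_scores]
        acc = accuracy(preds, val_labels)
        if acc > best_acc:               # ties keep the earliest candidate
            best_threshold, best_acc = t, acc
    return best_threshold, best_acc

# Hypothetical validation data: predicted probability of "dog" and true label
# (0 = cat, 1 = dog) for eight animals.
val_scores = [0.10, 0.35, 0.40, 0.55, 0.60, 0.65, 0.80, 0.90]
val_labels = [0, 0, 0, 0, 1, 1, 1, 1]

threshold, val_acc = pick_threshold(val_scores, val_labels, [0.3, 0.4, 0.5, 0.6, 0.7])
print(threshold, val_acc)
```

The test set plays no part in this loop. It is held back so that the final accuracy is measured on data that never influenced any tuning decision.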
Testing the Model
Great! We have tested our model and reached an accuracy of 75%.
This means we can expect the model to classify cats and dogs correctly about 3 out of 4 times.
We can now assess the model performance using a confusion matrix:
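As a rough illustration of how such a matrix is tallied, the sketch below counts each (actual, predicted) pair and derives accuracy from the diagonal. The eight test labels are hypothetical, chosen only so that the accuracy comes out at the 75% quoted above. They are not this model's actual predictions.

```python
from collections import Counter

def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]

def accuracy_from_matrix(matrix):
    """Correct predictions sit on the diagonal of the matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical test results: 6 of the 8 predictions are right.
actual    = ["cat", "cat", "cat", "cat", "dog", "dog", "dog", "dog"]
predicted = ["cat", "cat", "cat", "dog", "dog", "dog", "dog", "cat"]

matrix = confusion_matrix(actual, predicted, ["cat", "dog"])
print(matrix)                        # [[3, 1], [1, 3]]
print(accuracy_from_matrix(matrix))  # 0.75
```

Unlike a single accuracy figure, the off-diagonal cells show which kind of mistake the model makes: here, one cat labeled as a dog and one dog labeled as a cat.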