In our previous post, we saw what n-grams are and how they are useful. Before that post, we built a simple text classifier using Facebook’s fastText library. In this post, we’ll see how we can optimise that model for better accuracy.
Precision and Recall
Precision and recall are two metrics we need to know to better understand the accuracy of our models, and neither is difficult to understand. Precision is the fraction of the labels predicted by the fastText model that are correct, and recall is the fraction of the correct labels that the model successfully predicted. That might be a bit confusing, so let’s look at an example to understand it better.
Suppose we give the model a sentence from our Stack Exchange cooking sample, and it predicts the labels food-safety, baking, equipment, substitutions, and bread, while the actual labels on Stack Exchange are equipment, cleaning, and knives. Here, out of the top five labels that the model predicted, only one is correct. So the precision becomes 1 / 5, or 0.20. Also, out of the three correct labels, the model correctly predicted only one label (equipment), so the recall is 1 / 3, or 0.33. That is what precision and recall mean.
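The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch for a single example; the function name precision_recall is made up for illustration and is not part of fastText.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for one example's predicted and true labels."""
    predicted_set = set(predicted)
    actual_set = set(actual)
    correct = predicted_set & actual_set
    # Precision: share of the predicted labels that are correct.
    precision = len(correct) / len(predicted_set) if predicted_set else 0.0
    # Recall: share of the true labels that were predicted.
    recall = len(correct) / len(actual_set) if actual_set else 0.0
    return precision, recall


predicted = ["food-safety", "baking", "equipment", "substitutions", "bread"]
actual = ["equipment", "cleaning", "knives"]

precision, recall = precision_recall(predicted, actual)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
# precision = 0.20, recall = 0.33
```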
There’s a way for us to test the precision and recall of our model using a simple command in fastText. At this point, make sure you have gone through the intro to fastText post I wrote earlier, because I’ll be using the same example from that post here. Assuming you’ve done that and still have the data from that post, run the following command in the root directory to get the precision and recall:
./fasttext test cooking_question_classification_model.bin testing_data.txt
Once you run this command, you should get an output similar to the following:
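N 3080
P@1 0.139
R@1 0.0602
As you can see from the output, we have the P@1 and R@1 results, which are the precision at 1 and the recall at 1, that is, the precision and recall computed when only the model’s top predicted label is considered for each example. N is the number of examples in the test file. We’ll see how we can improve these numbers in this post.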
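If you want to compare several runs, it can help to turn this output into numbers. The following is a small sketch that assumes the output is a sequence of whitespace-separated name and value pairs, as shown above; parse_test_output is a hypothetical helper, not a fastText function.

```python
def parse_test_output(text):
    """Parse fastText test output such as 'N 3080 P@1 0.139 R@1 0.0602'."""
    tokens = text.split()
    metrics = {}
    # Tokens alternate between a metric name and its value.
    for name, value in zip(tokens[0::2], tokens[1::2]):
        metrics[name] = int(value) if name == "N" else float(value)
    return metrics


metrics = parse_test_output("N 3080 P@1 0.139 R@1 0.0602")
print(metrics)
# {'N': 3080, 'P@1': 0.139, 'R@1': 0.0602}
```

Because the text is split on any whitespace, the same helper works whether the three values arrive on one line or on separate lines.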
Cleaning the data
If you look at the data file we have, you can see that it contains some uppercase letters. Case is not significant for our model, and we can get rid of it to improve the performance to some extent. But we can’t go through the entire data and clean it by hand. So we’ll use a simple command that puts spaces around punctuation marks and converts all uppercase letters to lowercase. Run the following command to do that:
cat cooking.stackexchange.txt | sed -e "s/\([.\!?,'/()]\)/ \1 /g" | tr "[:upper:]" "[:lower:]" > cooking.preprocessed.txt
In this command, we are cat-ing the file to print the data to the standard output, then using a pipe to pass that data to the sed command, whose regular expression puts a space on either side of each punctuation mark so that it becomes a separate token. Another pipe passes this new output to the tr (translate) command, which converts all uppercase letters to lowercase. We’re redirecting this final output to a file called cooking.preprocessed.txt. This is again a simple example provided on the official fastText website. In a real-life production scenario, this might not be such a simple task. Anyway, once we have this new pre-processed file, let’s see what it has.
➜ head cooking.preprocessed.txt
__label__sauce __label__cheese how much does potato starch affect a cheese sauce recipe ?
__label__food-safety __label__acidity dangerous pathogens capable of growing in acidic environments
__label__cast-iron __label__stove how do i cover up the white spots on my cast iron stove ?
__label__restaurant michelin three star restaurant; but if the chef is not there
__label__knife-skills __label__dicing without knife skills , how can i quickly and accurately dice vegetables ?
__label__storage-method __label__equipment __label__bread what ' s the purpose of a bread box ?
__label__baking __label__food-safety __label__substitutions __label__peanuts how to seperate peanut oil from roasted peanuts at home ?
__label__chocolate american equivalent for british chocolate terms
__label__baking __label__oven __label__convection fan bake vs bake
__label__sauce __label__storage-lifetime __label__acidity __label__mayonnaise regulation and balancing of readymade packed mayonnaise and other sauces
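If you would rather do the same cleaning in Python, the sed and tr steps can be approximated as below. This is a sketch, not an exact port: the function name preprocess is made up, and Python’s lower() also lowercases non-ASCII letters, which tr may not do depending on your locale.

```python
import re

# The punctuation characters handled by the sed expression: . ! ? , ' / ( )
PUNCTUATION = re.compile(r"([.!?,'/()])")


def preprocess(line):
    """Put spaces around punctuation marks, then lowercase the whole line."""
    return PUNCTUATION.sub(r" \1 ", line).lower()


sample = "__label__equipment What's the purpose of a bread box?"
print(preprocess(sample).split())
```

Labels such as __label__food-safety pass through unchanged apart from lowercasing, because underscores and hyphens are not in the character class.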
As you can see, the data is much cleaner now. Now, we have to again split this to test and train datasets. We’ll run the following two commands to do that:
➜ head -n 12324 cooking.preprocessed.txt > preprocessed_training_data.txt
➜ tail -n 3080 cooking.preprocessed.txt > preprocessed_testing_data.txt
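The head and tail commands simply take the first and the last lines of the file. Here is a rough Python equivalent on a tiny stand-in list; head_tail_split is a hypothetical name, and the sketch assumes the real file has at least 12324 + 3080 = 15404 lines so that the two parts do not overlap.

```python
def head_tail_split(lines, n_train, n_test):
    """Mimic head -n n_train and tail -n n_test on a list of lines."""
    train = lines[:n_train]
    test = lines[-n_test:] if n_test > 0 else []
    return train, test


# Tiny stand-in for the real file.
lines = [f"__label__demo example {i}" for i in range(10)]
train, test = head_tail_split(lines, n_train=8, n_test=2)
print(len(train), len(test))
# 8 2
```

Note that this kind of split keeps the original order of the file, so it is only a fair split if the examples are not sorted by topic or date.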
We have to again train our model on this new data because we have changed the data. To do that, we’ll run the following command and the output should be something similar to what you see here:
➜ ./fasttext supervised -input preprocessed_training_data.txt -output cooking_question_classification_model
Read 0M words
Number of words: 8921
Number of labels: 735
Progress: 100.0% words/sec/thread: 47747 lr: 0.000000 avg.loss: 10.379300 ETA: 0h 0m 0s
To check the precision and recall, we’ll test this model on the new test data:
➜ ./fasttext test cooking_question_classification_model.bin preprocessed_testing_data.txt
N 3080
P@1 0.171
R@1 0.0743
As you can see, both the precision and the recall have improved a bit. One more thing to observe here is that when we trained the model with the new data, we saw only 8921 words, whereas the last time, we saw 14492 words. So the model had multiple variations of the same words due to uppercase and lowercase variations, which could decrease the precision to some extent.
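To see why lowercasing shrinks the vocabulary, here is a toy illustration with three made-up sentences (they are not taken from the dataset). It counts distinct whitespace-separated tokens before and after lowercasing.

```python
def vocabulary_size(lines):
    """Count distinct whitespace-separated tokens across all lines."""
    return len({token for line in lines for token in line.split()})


raw = ["How do I bake Bread", "how do i bake bread", "Bread vs bread flour"]
lowered = [line.lower() for line in raw]

print(vocabulary_size(raw), vocabulary_size(lowered))
# 10 7
```

In the raw text, Bread and bread count as two different words, so the model has to learn each of them from fewer examples.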
If you have a software development background, you probably associate the word epoch with time. In this context, though, an epoch is one full pass over the training data, so the number of epochs is the number of times the model sees each example input. By default, the model sees each example five times, i.e., epoch = 5. Because our dataset has only around 12k samples, 5 epochs is too few. We can increase that to 25 using the -epoch option to make the model ‘see’ each example sentence 25 times, which can help the model learn better. Let’s try that now:
➜ ./fasttext supervised -input preprocessed_training_data.txt -output cooking_question_classification_model -epoch 25
Read 0M words
Number of words: 8921
Number of labels: 735
Progress: 100.0% words/sec/thread: 43007 lr: 0.000000 avg.loss: 7.383627 ETA: 0h 0m 0s
You might have noticed that it took a bit longer for the process to finish now, which is expected as we increased the number of epochs. Anyway, let’s now test our model for precision:
➜ ./fasttext test cooking_question_classification_model.bin preprocessed_testing_data.txt
N 3080
P@1 0.518
R@1 0.225
As you can see, the improvement is significant: P@1 went from 0.171 to 0.518 and R@1 from 0.0743 to 0.225. That’s good.
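To make the idea of an epoch concrete, here is a toy training loop. It is not how fastText works internally; it is a generic sketch that fits a single weight with plain stochastic gradient descent, with made-up data, just to show that more passes over the same examples can bring the model closer to the right answer.

```python
def train(examples, epochs, lr=0.1):
    """Fit a single weight w so that w * x approximates y, using plain SGD."""
    w = 0.0
    for _ in range(epochs):  # one epoch = one full pass over the data
        for x, y in examples:
            error = w * x - y
            w -= lr * error * x  # gradient of 0.5 * error ** 2 with respect to w
    return w


examples = [(1.0, 2.0), (2.0, 4.0)]  # generated by y = 2 * x, so the ideal w is 2
errors = {}
for epochs in (5, 25):
    w = train(examples, epochs)
    errors[epochs] = abs(w - 2.0)
    print(epochs, "epochs, distance from the ideal weight:", errors[epochs])
```

In this toy problem the distance keeps shrinking with every pass. On real data, the gain from extra epochs eventually levels off, so it is worth checking the test metrics after each change.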
Learning Rate of the algorithm
The learning rate of an algorithm indicates how much the model changes after each example sentence is processed. We can both increase and decrease it. A learning rate of 0 means that the model doesn’t change at all, so it learns nothing. A usual learning rate is between 0.1 and 1. For our example here, we’ll set the learning rate to 1 and re-train our model, leaving the number of epochs at its default. We’ll use the -lr option for this:
➜ ./fasttext supervised -input preprocessed_training_data.txt -output cooking_question_classification_model -lr 1.0
Read 0M words
Number of words: 8921
Number of labels: 735
Progress: 100.0% words/sec/thread: 47903 lr: 0.000000 avg.loss: 6.398750 ETA: 0h 0m 0s
We’ll test the model again to see if there’s any improvement after changing the learning rate:
➜ ./fasttext test cooking_question_classification_model.bin preprocessed_testing_data.txt
N 3080
P@1 0.572
R@1 0.248
There definitely is an improvement over the default settings: with only the learning rate changed, P@1 rose from 0.171 to 0.572. But what would happen if we increase the epochs and the learning rate together?
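The role of the learning rate described in this section can be sketched as a single update step. This is the generic gradient-descent update, written with made-up numbers; it is an illustration of the idea, not fastText’s actual code, and sgd_step is a hypothetical name.

```python
def sgd_step(weight, gradient, lr):
    """Move the weight against the gradient, scaled by the learning rate."""
    return weight - lr * gradient


weight, gradient = 0.5, 2.0
updated = {lr: sgd_step(weight, gradient, lr) for lr in (0.0, 0.1, 1.0)}
for lr, new_weight in updated.items():
    print("lr =", lr, "-> new weight:", new_weight)
```

With a learning rate of 0 the weight stays where it was, and with a learning rate of 1 it moves ten times further than with 0.1.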
Increase Epochs and Learning Rate together
Now, we’ll set the number of epochs to 25 and the learning rate to 1. Let’s see what happens to the precision and the recall:
➜ ./fasttext supervised -input preprocessed_training_data.txt -output cooking_question_classification_model -epoch 25 -lr 1.0
Read 0M words
Number of words: 8921
Number of labels: 735
Progress: 100.0% words/sec/thread: 41933 lr: 0.000000 avg.loss: 4.297409 ETA: 0h 0m 0s
Let’s test the model now:
➜ ./fasttext test cooking_question_classification_model.bin preprocessed_testing_data.txt
N 3080
P@1 0.583
R@1 0.253
This is our best result so far: P@1 is 0.583 and R@1 is 0.253, slightly better than what we got from either change on its own.
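To compare all the runs side by side, the numbers reported in this post can be collected in a short script. The configuration names are my own shorthand; the P@1 and R@1 values are copied from the test outputs above.

```python
# (configuration, P@1, R@1) copied from the test outputs in this post.
results = [
    ("raw data, defaults", 0.139, 0.0602),
    ("preprocessed, defaults", 0.171, 0.0743),
    ("preprocessed, -epoch 25", 0.518, 0.225),
    ("preprocessed, -lr 1.0", 0.572, 0.248),
    ("preprocessed, -epoch 25 -lr 1.0", 0.583, 0.253),
]

# Pick the configuration with the highest precision at 1.
best = max(results, key=lambda row: row[1])

for name, p_at_1, r_at_1 in results:
    print(f"{name:<34} P@1={p_at_1:.3f} R@1={r_at_1:.4f}")
print("best P@1:", best[0])
```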
So, we learned a lot in this post (I hope). There’s more to do though (such as n-grams). We’ll see this in future posts. If you have anything to add to this, please leave a comment below.