Hi everyone! As a follow-up to my last article, Facebook/Udacity PyTorch Challenge 2018-2019 Complete!, I want to go over the more AI-focused takeaways from the course. They range from beginner to intermediate tips, so if you’re new to the field, feel free to skim the more technical parts.
It’s not magic, it’s just math.
AI may seem like this magical, complicated thing, but at its core, it’s just a lot of math. Given a big enough dataset, enough computational power (CPU or GPU) and enough weights, you can train an algorithm to fit almost any dataset your heart desires. From there, what separates the men from the boys (or ladies from the girls) is generalization. If you curate your dataset in a way that allows the algorithm to generalize well, then new data it hasn’t seen before should also be predicted well. If your algorithm only predicts well on the data it was trained on, that’s called overfitting, and that is no bueno.
What model do I use?
In this course we learned about two types of AI models, Convolutional and Recurrent Neural Networks (CNNs and RNNs). Although there are many more types and subtypes, I’ll just go over the difference between these two as highlighted by the course. In reality, new models are being developed every day, so following popular data scientists on Twitter or their blogs is the best way to stay current on the hottest trends.
CNNs are great for image classification tasks because they are very good at extracting major features while reducing noise and dimensionality. This means that by the time the data reaches the final fully connected layers, it is much smaller than the original input. This keeps the computational cost low compared to many other AI models. My advice is to always try CNNs first. The computational efficiency is just too good to pass up if you can make it work.
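To make the size reduction concrete, here is a minimal sketch of the standard output-size formula for a convolution or pooling layer, floor((n + 2p - k) / s) + 1. The 224-pixel input and the layer settings are illustrative assumptions, not the architecture used in the course.

```python
def output_size(n, kernel, stride=1, padding=0):
    """Spatial size after a conv or pooling layer (standard formula, dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1


# Hypothetical stack: each entry is (kernel, stride, padding).
layers = [
    (3, 1, 1),  # 3x3 conv with padding 1, keeps the size
    (2, 2, 0),  # 2x2 max pool with stride 2, halves the size
    (3, 1, 1),
    (2, 2, 0),
    (3, 1, 1),
    (2, 2, 0),
]

size = 224  # assumed input width/height in pixels
sizes = [size]
for kernel, stride, padding in layers:
    size = output_size(size, kernel, stride, padding)
    sizes.append(size)

print(sizes)  # [224, 224, 112, 112, 56, 56, 28]
```

After three pooling steps, each feature map holds 28 x 28 = 784 positions instead of 224 x 224 = 50,176. Real networks usually add channels as they go deeper, so the total saving depends on the architecture.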
RNNs are great for problems that require context to make a prediction. They feed the output of the previous step (t-1) back in as additional input for the current step (t). This makes them very good for prediction problems over time-series data like stock prices and inventory forecasting. A variation called Long Short-Term Memory (LSTM) is especially popular because, rather than keeping only (t-1) data as additional input for the next step, it can keep information from further back. This makes it ideal for natural language processing problems where the next-word prediction could rely on context further back in a paragraph rather than just the word before.
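The recurrence can be sketched in a few lines of plain Python. This is a scalar toy version of a vanilla RNN cell, h_t = tanh(w_x * x_t + w_h * h_(t-1) + b). The weights are arbitrary values picked for illustration; a real RNN learns weight matrices over vectors, and an LSTM adds gates and a separate cell state that are not shown here.

```python
import math


def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One step of a scalar vanilla RNN: h_t = tanh(w_x*x_t + w_h*h_prev + b)."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)


def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Feed a sequence through the cell, carrying the hidden state forward."""
    h = 0.0  # initial hidden state
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states


# Both sequences end with the same input (0.0), but the final state differs
# because the hidden state still carries information about earlier inputs.
after_ones = run_rnn([1.0, 1.0, 0.0])[-1]
after_zeros = run_rnn([0.0, 0.0, 0.0])[-1]
print(after_ones, after_zeros)
```

The last input is identical in both runs, yet the final hidden state is positive in the first and zero in the second. That carried-over state is the “context” the network uses for its prediction.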
MacOS, Windows or Linux?
I’m still in the exploratory phase on this topic. All the feedback from the community seems to point to Linux, Ubuntu specifically. I initially used MacOS because that is what my laptop runs. The problem is that MacOS doesn’t officially support NVIDIA video cards, at least as of the latest Mojave 10.14.2 version. This means there’s no way to reliably use GPU acceleration with CUDA on MacOS. Having a beefy GPU can reduce your training time from months to hours depending on the dataset size, so it’s very important to have access to one, either locally or in the cloud. That led me to my Windows 10 desktop with dual GeForce GTX 1080 Ti cards (a positive side effect of being a gamer). Single-GPU performance was great compared to CPU, but multi-GPU performance was not what I was expecting. Further research points to PyTorch officially supporting distributed GPU training only under Linux, so installing Ubuntu Server 18.04.1 LTS is next on my list.
Final Lab Challenge Project
For the final lab, we were tasked with a flower classification problem with 102 types of flowers. Luckily, we were able to leverage pre-trained models from other amazing AI research teams (Microsoft’s ResNet, DenseNet, etc.), so re-training the final layer with a dataset of 102 flowers only took a couple of hours. In the end I was able to achieve 95% accuracy on my test set. I actually hit 98% on my first training run, but I wasn’t able to duplicate that result thereafter, so I’ll chalk up that fluke to the mathematical lotto. A couple of key takeaways from the lab are:
- Train, Validate, Test, Repeat – Make sure you shuffle your training data so your algorithm doesn’t learn the order of the samples, while keeping your validation and test sets consistent (no shuffling or random augmentation, only the deterministic resizing and normalization the model expects). Keep your Train, Validation and Test sets separate for best results.
- Hyper-parameter Tuning – Getting to 95-98% accuracy requires fine-tuning multiple variables, including image transforms, number of workers, batch size, optimizer, loss function, number of epochs and learning rate. Try different combinations and save the best result.
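- Number of Workers on Windows – I wasn’t able to get Number of Workers greater than 0 to work on Windows until I put my main code under the following guard:
if __name__ == "__main__":
The likely reason is that Windows starts each data-loading worker by spawning a fresh Python process that re-imports the main script, so any code outside the guard runs again in every worker. On MacOS this was not an issue, so I assume Linux is also a non-issue; both forked their worker processes by default at the time of writing.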
- One fully connected layer with pre-trained models – Pre-trained models have already done the hard part, especially if your categories exist in the dataset they were trained on. So generally you’ll only need one fully connected layer at the end of the model. Adding more layers won’t increase performance that much. Others also advised that unfreezing additional layers in a pre-trained model and re-training their weights could add 3-5% towards that coveted 98-99% accuracy.
- Add timestamps to checkpoint filenames – To prevent overwriting the same checkpoint file over and over, I automatically appended a timestamp to the filename, along with the number of epochs and the pre-trained model name. This made it simple to keep a running history of all the saved models and easy to distinguish them at a glance.
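The timestamped checkpoint naming from the last tip could look something like the sketch below. The function name, the order of the parts and the .pth extension are my assumptions for illustration; the actual project code may name its files differently.

```python
from datetime import datetime


def checkpoint_filename(model_name, epochs, now=None):
    """Build a name like 'densenet121_e20_20190110-153000.pth'."""
    now = now or datetime.now()  # a fixed time can be passed in for testing
    stamp = now.strftime("%Y%m%d-%H%M%S")
    return f"{model_name}_e{epochs}_{stamp}.pth"


# Fixed timestamp so the result is reproducible; omit `now` in real use.
name = checkpoint_filename("densenet121", 20, now=datetime(2019, 1, 10, 15, 30, 0))
print(name)  # densenet121_e20_20190110-153000.pth
```

Putting the date before the time in year-month-day order means an alphabetical sort of the files for one model and epoch count is also a chronological one. The resulting name can then be passed to torch.save along with the model’s state dict.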
Share your code, but make it beautiful.
PEP 8 is a style guideline for writing easy-to-understand Python code. It should be followed at all times, especially when collaborating and sharing code. Thankfully you can enable automatic style checking in Spyder by going to Python>Preferences>Editor>Code Introspection/Analysis and enabling “Real-time code style analysis”. Once you’ve got the code running reliably, share it with the world! Create a GitHub account and upload your .py files so you can add your achievement to your resume, LinkedIn and/or Facebook. I opted to also include a link to the dataset in the readme file so others can duplicate my results if needed.
Link to the Python style guide (PEP 8) walkthrough: https://realpython.com/python-pep8/
Link to the GitHub project code: https://github.com/davidhn112/pytorch-challenge-2019