Using GPU acceleration with PyTorch to make your algorithms 2000% faster
Most developers are aware that some algorithms can be run on a GPU, instead of a CPU, and see orders of magnitude speedups. However, many people assume that:
1. Only specialist areas like deep learning are suitable for GPU
2. Learning to program a GPU takes years of developing specialist knowledge
It turns out that neither assumption is true! Nearly any non-recursive algorithm that operates on datasets of 1000+ items can be accelerated by a GPU. And recent libraries like PyTorch make it nearly as simple to write a GPU-accelerated algorithm as a regular CPU algorithm.
In this talk we’ll explain what the mean-shift clustering algorithm is, and why it’s important for many data science applications. We’ll first implement it in Python (with NumPy), then show how to port it to PyTorch, achieving a 20x performance improvement in the process.
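The NumPy starting point the talk describes might look like the following minimal sketch: each mean-shift step moves every point toward the Gaussian-weighted mean of all points. Function names, the bandwidth default, and the fixed iteration count are illustrative choices, not the speaker's actual code.

```python
import numpy as np

def mean_shift_step(X, bandwidth=1.0):
    """One mean-shift step: move each point toward the
    Gaussian-weighted mean of its neighbours."""
    # Pairwise squared distances, shape (n, n)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # Gaussian kernel weights
    w = np.exp(-d2 / (2 * bandwidth ** 2))
    # Weighted mean of all points, per point
    return (w @ X) / w.sum(1, keepdims=True)

def mean_shift(X, bandwidth=1.0, n_iter=30):
    """Run a fixed number of mean-shift steps; points in the
    same cluster collapse toward a common mode."""
    X = np.asarray(X, dtype=np.float64)
    for _ in range(n_iter):
        X = mean_shift_step(X, bandwidth)
    return X
```

The PyTorch port is largely mechanical — swap `np` for `torch` and move the data to the GPU (e.g. with `.cuda()`), which is the kind of 20x speedup story the abstract promises; the batched pairwise-distance computation is exactly the shape of work GPUs excel at.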
Illia Polosukhin, TensorFlow
Optimizing Distributed TensorFlow
TensorFlow supports distributed training, but making the most of your hardware still takes a lot of work. In this talk, you will learn:
- How to set up distributed TensorFlow across multiple CPUs and GPUs.
- How to analyze the TensorFlow timeline to find bottlenecks.
- How to tune the components of the training stack for optimal training speed.
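The distributed setup step typically revolves around the `TF_CONFIG` environment variable, which tells each TensorFlow process its role in the cluster. Below is a minimal sketch; the host names and ports are made up for illustration.

```python
import json
import os

# Hypothetical cluster layout: two workers and one parameter
# server (addresses are placeholders, not real hosts).
cluster = {
    "cluster": {
        "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
        "ps": ["ps0.example.com:2222"],
    },
    # Identifies which job this particular process performs.
    "task": {"type": "worker", "index": 0},
}

# TensorFlow reads the cluster layout from TF_CONFIG at startup.
os.environ["TF_CONFIG"] = json.dumps(cluster)
```

Each process in the cluster gets the same `cluster` spec but a different `task` entry, which is how TensorFlow decides what each machine should do.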
Jonathan Hseu, Google
Training TensorFlow models on Cloud TPUs
Cloud TPUs are a new accelerator that Google is offering in alpha. In this talk, we'll discuss:
- A description of the TPUs and what they offer for training models.
- The modifications you need to make to your TensorFlow model to work with TPUs.
- Best practices to get the best performance out of these devices.
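One commonly cited best practice for TPU (and other matrix-unit) hardware is to pad tensor dimensions such as batch size up to a multiple of the hardware's native tile width, so the matrix units stay fully utilized. The helper below is a hypothetical illustration of that idea; the choice of 128 reflects the often-quoted TPU matrix-unit width, but check the official performance guide for your device.

```python
def pad_to_multiple(n, multiple=128):
    """Round n up to the nearest multiple of `multiple`.
    Illustrative helper: pads e.g. a batch size up to the
    (assumed) 128-wide TPU matrix-unit tile."""
    return -(-n // multiple) * multiple  # ceil-divide, then scale back up
```

For example, a batch of 100 examples would be padded to 128, wasting 28 slots but keeping the matrix units full — usually a net win on this class of hardware.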