Knowledge Distillation (using TensorFlow)

This is an implementation of the basic idea behind Hinton's knowledge distillation paper (Hinton et al., "Distilling the Knowledge in a Neural Network", 2015). We do not reproduce the paper's exact results but rather show that the idea works.

A few other implementations are available, but their code flow is not very intuitive. Here, the soft targets are generated from the teacher online, during training of the student network, rather than being precomputed.
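For concreteness, here is a minimal sketch of one such online distillation step using the TensorFlow 2 Keras-style API. The temperature, loss weight, and model handles are illustrative assumptions, not the exact values or names used in this repo:

```python
import tensorflow as tf

# Illustrative hyperparameters (not this repo's exact values).
T = 4.0      # softmax temperature for soft targets
ALPHA = 0.9  # weight on the soft-target (distillation) term

def distillation_step(teacher, student, optimizer, images, labels):
    """One training step: soft targets come from the teacher on the fly."""
    # Teacher runs in inference mode; its weights stay frozen.
    teacher_logits = teacher(images, training=False)
    soft_targets = tf.nn.softmax(teacher_logits / T)

    with tf.GradientTape() as tape:
        student_logits = student(images, training=True)
        # Soft loss: cross-entropy against the teacher's softened distribution.
        soft_loss = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(
                labels=soft_targets, logits=student_logits / T))
        # Hard loss: standard cross-entropy against the true labels.
        hard_loss = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                labels=labels, logits=student_logits))
        # T**2 rescales the soft-term gradients, as suggested in the paper.
        loss = ALPHA * (T ** 2) * soft_loss + (1.0 - ALPHA) * hard_loss

    grads = tape.gradient(loss, student.trainable_variables)
    optimizer.apply_gradients(zip(grads, student.trainable_variables))
    return loss
```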

While this may or may not be the best way to implement the distillation architecture, it leads to a clear improvement in the (small) student model.

Links