This post is a direct continuation of Part 1, so please go through that one before proceeding. In this post, I will cover low-rank transforms, efficient network architectures, and knowledge distillation. Low-rank transform techniques decompose a convolution filter into lower-rank parts, decreasing the overall computational and storage complexity. Knowledge distillation, also known as the student-teacher approach, uses a larger model (the teacher) to train a smaller model (the student); the smaller model inherits the ‘knowledge’ of the larger model.
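Before diving in, here is a minimal sketch of the low-rank idea applied to a single weight matrix using truncated SVD. This is my own illustration rather than anything from the original papers; the layer sizes and the chosen rank are made-up values for demonstration.

```python
import numpy as np

# Illustrative sizes and rank -- these are assumptions, not values from the post.
out_features, in_features, rank = 256, 512, 32
W = np.random.randn(out_features, in_features)

# Truncated SVD: keep only the top-`rank` singular components of W.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :rank] * S[:rank]   # shape (out_features, rank)
B = Vt[:rank, :]             # shape (rank, in_features)

# The original mapping y = W @ x is replaced by two smaller ones: y = A @ (B @ x).
x = np.random.randn(in_features)
y_full = W @ x
y_low = A @ (B @ x)

params_full = W.size
params_low = A.size + B.size
print(f"relative error: {np.linalg.norm(y_full - y_low) / np.linalg.norm(y_full):.3f}")
print(f"parameters: {params_full} -> {params_low} "
      f"({params_low / params_full:.0%} of the original)")
```

With these numbers the two factored matrices hold roughly a fifth of the original parameters, at the cost of some approximation error; the same trade-off drives the convolution-filter decompositions discussed below.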