Submitted by Ananth_A_007 t3_zgpmtn in MachineLearning
I am aware that a 1x1 convolution is needed for depthwise separable convolution, but when else is it useful? I see it used in MobileNetV2 before the depthwise separable convolution in the bottleneck, but I'm not sure why. I also see it used with stride 2 where max pooling could be used instead. Could someone please explain the logic behind this? Thanks.
MathChief t1_izjarfb wrote
A 1x1 conv is essentially a linear transformation (over the number of channels), as the other redditor suggests, the same as `nn.Linear` in PyTorch.

What I would like to add is that in PyTorch the 1x1 conv by default accepts tensors of shape `(B, C, *)`, for example `(B, C, H, W)` in 2d, which is convenient for implementation purposes. If you use `nn.Linear`, the channel dimension first has to be permuted to the last position, then the linear transformation applied, and then permuted back. With the 1x1 conv, which is essentially a wrapper for the C function that does the einsum automatically, it is a single line, so the code is cleaner and less error prone.
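A minimal sketch of the equivalence described above, with arbitrary placeholder sizes (`B`, `C_in`, `C_out`, `H`, `W` are illustrative, not from the original posts): a 1x1 `nn.Conv2d` and an `nn.Linear` with the same weights produce the same output, but the `nn.Linear` path needs the permute-apply-permute dance.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumed, not from the thread)
B, C_in, C_out, H, W = 2, 8, 16, 32, 32
x = torch.randn(B, C_in, H, W)

# 1x1 conv: a per-pixel linear map over the channel dimension
conv1x1 = nn.Conv2d(C_in, C_out, kernel_size=1, bias=True)

# Equivalent nn.Linear, copying the same weights for comparison
linear = nn.Linear(C_in, C_out, bias=True)
with torch.no_grad():
    linear.weight.copy_(conv1x1.weight.view(C_out, C_in))  # conv weight is (C_out, C_in, 1, 1)
    linear.bias.copy_(conv1x1.bias)

# Conv path: accepts (B, C, H, W) directly, one line
y_conv = conv1x1(x)

# Linear path: channels must go last, then back
y_lin = linear(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

print(torch.allclose(y_conv, y_lin, atol=1e-6))  # True up to float tolerance
```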