Paper Notes: Mask R-CNN

xiaoxiao, 2021-04-18

New work from Kaiming He, apparently submitted to ICCV? Paper: https://arxiv.org/abs/1703.06870

Goal

Design a strong network for instance segmentation.

Our goal in this work is to develop a comparably enabling framework for instance segmentation.

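Why instance segmentation is hard

It is not enough to find each object; each one also has to be segmented at the pixel level.

It combines elements from the classical computer vision task of object detection and semantic segmentation.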

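Network design

The network is a modification of Faster R-CNN. Put simply, after RoIPooling a segmentation is predicted on every RoI, and the annotation of the original image serves directly as the ground truth. The loss therefore becomes: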

$L = L_{cls} + L_{box} + L_{mask}$. In keeping with the usual R-CNN style, this is a multi-task loss.

L_mask

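To compute $L_{mask}$, the mask branch produces a $Km^2$-dimensional output for each RoI. Each mask has size m × m, and there are K of them, one per class.

The mask branch has a $Km^2$-dimensional output for each RoI, which encodes K binary masks of resolution m × m, one for each of the K classes.

During training, not all K masks are trained. The class label selects a single mask (the ground-truth class during training, the class predicted by the classification branch at inference), and only that mask is learned as a 0/1 map. Concretely, it is compared with the ground truth using cross-entropy.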

To this we apply a per-pixel sigmoid, and define L_mask as the average binary cross-entropy loss.
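The selection of a single class mask and the averaged per-pixel loss can be sketched in plain Python. This is only an illustration under assumptions: the function name `mask_loss`, the nested-list layout of the logits, and the clamping constant are hypothetical and not taken from the paper's code.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def mask_loss(mask_logits, gt_mask, k):
    """Average binary cross-entropy on the k-th mask only.

    mask_logits: K nested lists, each an m x m grid of raw scores (assumed layout)
    gt_mask:     m x m grid of 0/1 ground-truth labels
    k:           class index that selects which of the K masks is trained
    """
    total, count = 0.0, 0
    for row_logits, row_gt in zip(mask_logits[k], gt_mask):
        for z, y in zip(row_logits, row_gt):
            p = sigmoid(z)                        # per-pixel sigmoid
            p = min(max(p, 1e-12), 1.0 - 1e-12)   # avoid log(0)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count

# Two classes (K = 2), 2 x 2 masks (m = 2).
logits = [
    [[0.0, 0.0], [0.0, 0.0]],        # class 0: uninformative scores
    [[10.0, -10.0], [-10.0, 10.0]],  # class 1: confident and correct
]
gt = [[1, 0], [0, 1]]
loss_class0 = mask_loss(logits, gt, 0)
loss_class1 = mask_loss(logits, gt, 1)
```

Because only `mask_logits[k]` enters the sum, the scores of the other classes have no effect on the loss, so the masks of different classes do not compete with each other.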

Mask Representation

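Because spatial layout has to be preserved, the authors do not flatten the feature map into a vector; they predict an m × m mask instead, which is then compared with the ground truth point by point. The rounding in RoIPooling makes the extracted region differ slightly from the true RoI, so the two have to be aligned, and the authors designed RoIAlign for this.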
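A small arithmetic sketch shows how large the rounding error can be. The stride of 16 and the coordinate value are assumptions chosen for illustration, and the helper names are hypothetical.

```python
import math

STRIDE = 16  # assumed feature-map stride, for illustration only

def quantized_coord(x, stride=STRIDE):
    # RoIPooling-style mapping: divide by the stride, then round down.
    return math.floor(x / stride)

def aligned_coord(x, stride=STRIDE):
    # Mapping without rounding: keep the fractional position.
    return x / stride

x = 100.0                  # RoI boundary in image pixels (made-up value)
q = quantized_coord(x)     # integer cell on the feature map
a = aligned_coord(x)       # exact position on the feature map
shift = (a - q) * STRIDE   # misalignment, measured back in image pixels
```

Under these assumptions the boundary moves by a quarter of a feature-map cell, which corresponds to 4 pixels in the image. That matters little for a class label but it matters for a pixel-level mask.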

RoIAlign

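Because RoIPooling rounds down, mapping back to the input no longer gives a clean point-to-point correspondence. The authors solve this with bilinear interpolation, which works as follows. The values at Q12, Q22, Q11 and Q21 are known, and the point to be interpolated is P. First interpolate along the x axis to obtain the two points R1 and R2, which is straightforward, and then interpolate between R1 and R2 to obtain P. This is what is meant by bilinear interpolation.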
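The two-step procedure can be written as a short function. The original figure is not available here, so the coordinate convention is an assumption: Q11 = (x1, y1), Q21 = (x2, y1), Q12 = (x1, y2), Q22 = (x2, y2), with R1 on the line y = y1 and R2 on the line y = y2. The function name is hypothetical.

```python
def bilinear_interpolate(x, y, x1, y1, x2, y2, q11, q21, q12, q22):
    """Interpolate the value at P = (x, y) from four known corner values.

    Assumed convention: q11 is the value at (x1, y1), q21 at (x2, y1),
    q12 at (x1, y2) and q22 at (x2, y2).
    """
    # Step 1: interpolate along x on both horizontal edges.
    t = (x - x1) / (x2 - x1)
    r1 = (1 - t) * q11 + t * q21   # value at R1 = (x, y1)
    r2 = (1 - t) * q12 + t * q22   # value at R2 = (x, y2)
    # Step 2: interpolate along y between R1 and R2.
    u = (y - y1) / (y2 - y1)
    return (1 - u) * r1 + u * r2

# Corner values sampled from f(x, y) = 2x + 3y + 1 on the unit square.
value = bilinear_interpolate(0.25, 0.5, 0, 0, 1, 1,
                             q11=1.0, q21=3.0, q12=4.0, q22=6.0)
```

For a function that is linear in x and y, such as the one in this example, the interpolated value equals the true value, which makes the function easy to check by hand.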

Results

Everyone now competes on the COCO dataset. Nobody reports on VOC anymore /(ㄒoㄒ)/~~ which makes me want to cry.
