FaceNet directly learns a mapping from face images to a Euclidean space, so that facial similarity can be measured directly by distance. Each face is represented by a 128-dimensional embedding, constrained to unit norm: $\|f(x)\|_2 = 1$.
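As a concrete illustration (not the paper's code), the unit-norm constraint can be enforced by L2-normalizing the raw network outputs; the function name and batch size here are illustrative, only the dimension 128 comes from the text:

```python
import numpy as np

def l2_normalize(embeddings, eps=1e-10):
    """Scale each row to unit L2 norm so that ||f(x)||_2 = 1."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / (norms + eps)

# Four hypothetical raw 128-D network outputs, projected onto the unit hypersphere.
raw = np.random.randn(4, 128)
emb = l2_normalize(raw)
```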
Objective constraint
$\|f(x_i^a) - f(x_i^p)\|_2^2 + \alpha < \|f(x_i^a) - f(x_i^n)\|_2^2$

What we want is that every point's distance to same-identity points is smaller than its distance to different-identity points by at least a margin $\alpha$.

Triplet loss function
$L = \sum_i^N \left[\, \|f(x_i^a) - f(x_i^p)\|_2^2 - \|f(x_i^a) - f(x_i^n)\|_2^2 + \alpha \,\right]_+$

The definition of the loss shows that it only cares about the *constraint-violating points*, i.e. those whose gap between the same-identity distance and the different-identity distance fails to meet the threshold $\alpha$; triplets that already satisfy the constraint contribute nothing. This is somewhat similar to the SVM, which focuses on samples near the boundary: the aim is for all points to satisfy the constraint, rather than to minimize some aggregate distance.
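A minimal NumPy sketch of this hinge loss over a batch of triplets (the function name and the margin value 0.2 are illustrative, not taken from the paper):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Sum of [d(a,p)^2 - d(a,n)^2 + alpha]_+ over the batch.

    Triplets that already satisfy the margin contribute zero."""
    d_ap = np.sum((anchor - positive) ** 2, axis=1)  # squared anchor-positive distances
    d_an = np.sum((anchor - negative) ** 2, axis=1)  # squared anchor-negative distances
    return float(np.sum(np.maximum(d_ap - d_an + alpha, 0.0)))

# A satisfied triplet (negative well outside the margin) adds nothing:
a = np.array([[0.0, 0.0]])
p = np.array([[0.0, 0.0]])
n = np.array([[1.0, 0.0]])
loss = triplet_loss(a, p, n, alpha=0.2)  # d_ap=0, d_an=1 -> max(0 - 1 + 0.2, 0) = 0
```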
Triplet Selection

The natural approach: given a point, find the farthest same-identity point and the nearest different-identity point. But this can lead to poor training results, because low-quality face images or mislabeled examples may dominate the selection.

Instead, the argmin and argmax are computed only within a mini-batch. In the experiments, the training data is sampled so that around 40 faces are selected per identity per mini-batch; randomly sampled negative faces are then added to each mini-batch. Rather than picking the hardest positive, all anchor-positive pairs in a mini-batch are used, while still selecting hard negatives. The all-anchor-positive method was found to be more stable and to converge slightly faster at the beginning of training.

Selecting the hardest negatives can in practice lead to bad local minima early in training; in particular, it can result in a collapsed model (i.e. $f(x) = 0$). To mitigate this, it helps to select $x_i^n$ such that $\|f(x_i^a) - f(x_i^p)\|_2^2 < \|f(x_i^a) - f(x_i^n)\|_2^2$. These negative exemplars are called semi-hard, as they are farther from the anchor than the positive exemplar, but still hard because their squared distance is close to the anchor-positive distance. These negatives lie inside the margin $\alpha$.
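The semi-hard condition can be sketched as follows (an illustrative NumPy routine, not the paper's implementation): among a mini-batch's negatives, keep those farther from the anchor than the positive, then take the closest of them.

```python
import numpy as np

def semi_hard_negative(anchor, positive, negatives):
    """Index of the semi-hard negative for one anchor, or None.

    Keeps negatives with d(a,n)^2 > d(a,p)^2 (farther than the positive),
    then returns the closest such negative -- still "hard" because its
    squared distance is near the anchor-positive distance."""
    d_ap = np.sum((anchor - positive) ** 2)
    d_an = np.sum((negatives - anchor) ** 2, axis=1)
    valid = np.where(d_an > d_ap)[0]
    if valid.size == 0:
        return None  # no usable negative in this mini-batch
    return int(valid[np.argmin(d_an[valid])])

a = np.array([0.0, 0.0])
p = np.array([1.0, 0.0])          # d_ap = 1
negs = np.array([[0.5, 0.0],      # d = 0.25, closer than the positive: excluded
                 [1.2, 0.0],      # d = 1.44, semi-hard: just past the positive
                 [3.0, 0.0]])     # d = 9.0, far away: an easy negative
idx = semi_hard_negative(a, p, negs)  # picks index 1, the semi-hard one
```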