Understanding and Diagnosing Visual Tracking Systems

    xiaoxiao 2026-01-10

    Several benchmark datasets for visual tracking research have been proposed in recent years. Despite their usefulness, whether they are sufficient for understanding and diagnosing the strengths and weaknesses of different trackers remains questionable. 

    To address this issue, we propose a framework that breaks a tracker down into five constituent parts, namely the motion model, feature extractor, observation model, model updater, and ensemble post-processor.

    We then conduct ablative experiments on each component to study how it affects the overall result. Surprisingly, our findings contradict some common beliefs in the visual tracking research community. We find that the feature extractor plays the most important role in a tracker.

     On the other hand, although the observation model is the focus of many studies, we find that it often brings no significant improvement. Moreover, the motion model and model updater contain many details that could affect the result. Also, the ensemble post-processor can improve the result substantially when the constituent trackers have high diversity. 

    Based on our findings, we put together some very elementary building blocks to build a basic tracker whose performance is competitive with state-of-the-art trackers. We believe our framework can provide a solid baseline for controlled experiments in visual tracking research.

    A tracking system usually works by initializing the observation model with the given bounding box of the target in the first frame. In each of the following frames, the motion model first generates candidate regions or proposals for testing based on the estimation from the previous frame. The candidate regions or proposals are fed into the observation model to compute their probability of being the target. The one with the highest probability is then selected as the estimation result of the current frame. Based on the output of the observation model, the model updater decides whether the observation model needs any update and, if needed, the update frequency. Finally, if there are multiple trackers, the bounding boxes returned by the trackers are combined by the ensemble post-processor to obtain a more accurate estimate. This pipeline is illustrated in Fig. 1.

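    The single-tracker part of this pipeline can be sketched as one loop. This is a minimal illustration under stated assumptions, not the paper's implementation: frames are plain lists of grayscale rows, the motion model is a sliding window, the feature extractor returns raw grayscale pixels, and the observation model is a hypothetical template matcher scored by negative sum of squared differences (not one of the paper's classifiers). The updater here simply blends the template toward each new estimate at a fixed rate. All function names are illustrative, and the ensemble step is omitted because only one tracker runs.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels
Frame = List[List[float]]        # grayscale image stored as rows of intensities


def sliding_window(prev: Box, frame: Frame, radius: int = 3) -> List[Box]:
    """Motion model: every same-size box within `radius` px of the previous estimate."""
    x0, y0, w, h = prev
    rows, cols = len(frame), len(frame[0])
    return [(x, y, w, h)
            for y in range(max(0, y0 - radius), min(rows - h, y0 + radius) + 1)
            for x in range(max(0, x0 - radius), min(cols - w, x0 + radius) + 1)]


def raw_grayscale(frame: Frame, box: Box) -> List[float]:
    """Feature extractor: the raw pixel intensities inside the box, row by row."""
    x, y, w, h = box
    return [frame[r][c] for r in range(y, y + h) for c in range(x, x + w)]


def track(frames: Sequence[Frame], init: Box,
          motion: Callable[[Box, Frame], List[Box]] = sliding_window,
          extract: Callable[[Frame, Box], List[float]] = raw_grayscale,
          rate: float = 0.1) -> List[Box]:
    """Run one tracker; returns one box per frame (the first is the given box)."""
    # Observation model (hypothetical): a template initialised from the first-frame box.
    template = extract(frames[0], init)
    boxes = [init]
    for frame in frames[1:]:
        candidates = motion(boxes[-1], frame)

        def confidence(box: Box) -> float:
            # Higher is better: negative sum of squared differences to the template.
            return -sum((a - b) ** 2 for a, b in zip(extract(frame, box), template))

        best = max(candidates, key=confidence)  # most target-like candidate
        boxes.append(best)
        # Model updater: a fixed-rate blend toward the newly selected patch.
        patch = extract(frame, best)
        template = [(1 - rate) * t + rate * p for t, p in zip(template, patch)]
    return boxes
```

    Because each stage is passed in as a function, swapping one component (for example, a different motion model) leaves the rest of the loop untouched, which mirrors the kind of controlled, one-component-at-a-time comparison the framework is meant to support.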

     

    Five constituent parts:

    1. Motion Model: Based on the estimation from the previous frame, the motion model generates a set of candidate regions or bounding boxes which may contain the target in the current frame.

    2. Feature Extractor: The feature extractor represents each candidate in the candidate set using some features.

    3. Observation Model: The observation model judges whether a candidate is the target based on the features extracted from the candidate.

    4. Model Updater: The model updater controls the strategy and frequency of updating the observation model. It has to strike a balance between model adaptation and drift.

    5. Ensemble Post-processor: When a tracking system consists of multiple trackers, the ensemble post-processor takes the outputs of the constituent trackers and uses ensemble learning to combine them into the final result.

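    The model updater's decision of when to update can be expressed as simple confidence gates. The sketch below encodes the two criteria listed under Model Updater in the details section. Assumptions, not values from the paper: confidences lie in [0, 1], the thresholds 0.5 and 0.2 are placeholders, and "the confidence of the background examples" is summarised by the highest background score. The function names are hypothetical.

```python
from typing import List, Sequence


def should_update_abs(target_conf: float, threshold: float = 0.5) -> bool:
    """Criterion 1: update when the target's own confidence falls below `threshold`."""
    return target_conf < threshold


def should_update_margin(target_conf: float, background_confs: Sequence[float],
                         margin: float = 0.2) -> bool:
    """Criterion 2: update when the target barely beats the background.

    Assumption: the background's confidence is summarised by its highest score.
    With no background samples there is nothing to compare, so no update.
    """
    if not background_confs:
        return False
    return target_conf - max(background_confs) < margin


def update_frames(confs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Indices of the frames at which criterion 1 would trigger an update."""
    return [i for i, c in enumerate(confs) if should_update_abs(c, threshold)]


print(update_frames([0.9, 0.7, 0.45, 0.6, 0.3]))  # [2, 4]
```

    The threshold effectively sets the update frequency: raising it triggers updates more often, which favours adaptation, while lowering it keeps the model fixed for longer.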

     

    Details of all these components

    1. Motion Model: Particle Filter, Sliding Window, Radius Sliding Window

    2. Feature Extractor: Raw Grayscale, Raw Color, Haar-like Features, HOG, HOG + Raw Color

    3. Observation Model: Logistic Regression, Ridge Regression, SVM, Structured Output SVM (SO-SVM)

    4. Model Updater:

    ① Update the model whenever the confidence of the target falls below a threshold.

    ② Update the model whenever the difference between the confidence of the target and that of the background examples is below a threshold.

    5. Ensemble Post-processor:

    ① A loss function for bounding-box majority voting, later extended to incorporate tracker weights, trajectory continuity, and removal of bad trackers.

    ② A factorial hidden Markov model that considers the temporal smoothness between frames.
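    To make the majority-voting idea concrete, here is a simplified, hypothetical stand-in: each tracker's box is scored by its weighted overlap (intersection-over-union) with all trackers' boxes, and the best-supported box wins. It is not the loss function referred to in ①, and it ignores trajectory continuity, bad-tracker removal, and the temporal smoothing that ② models. The names `iou` and `vote` and the uniform default weights are assumptions for illustration.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))  # overlap width
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))  # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def vote(boxes: Sequence[Box], weights: Optional[Sequence[float]] = None) -> Box:
    """Return the tracker output with the largest weighted IoU support.

    support(i) = sum_j weights[j] * IoU(boxes[i], boxes[j]), including j == i,
    so each tracker also votes for its own box.
    """
    if not boxes:
        raise ValueError("need at least one tracker output")
    if weights is None:
        weights = [1.0] * len(boxes)
    if len(weights) != len(boxes):
        raise ValueError("one weight per tracker is required")

    def support(i: int) -> float:
        return sum(w * iou(boxes[i], b) for b, w in zip(boxes, weights))

    return boxes[max(range(len(boxes)), key=support)]
```

    With boxes (10, 10, 20, 20), (12, 10, 20, 20), (11, 10, 20, 20) and an outlier at (100, 100, 20, 20), `vote` returns (11, 10, 20, 20), the box that overlaps the other two most. Giving the outlier a weight of 10 flips the result, which shows why per-tracker weights matter when some trackers are more reliable than others.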

    Source: http://arxiv.org/abs/1504.06055
