Using local outlier factors, we can detect outliers that lie far away from most of the samples. Before outlining the algorithm, a few concepts are needed.

Reachability Distance
$$\mathrm{RD}_k(x,x') = \max\left(\|x - x^{(k)}\|,\ \|x - x'\|\right)$$

where $x^{(k)}$ stands for the $k$-th nearest point to $x$ in the training set $\{x_i\}_{i=1}^{n}$. Note that $k$ is chosen manually.

Local Reachability Density

$$\mathrm{LRD}_k(x) = \left(\frac{1}{k}\sum_{i=1}^{k}\mathrm{RD}_k(x^{(i)}, x)\right)^{-1}$$

Local Outlier Factor

$$\mathrm{LOF}_k(x) = \frac{\frac{1}{k}\sum_{i=1}^{k}\mathrm{LRD}_k(x^{(i)})}{\mathrm{LRD}_k(x)}$$

Evidently, the larger $\mathrm{LOF}_k(x)$ is, the more likely it is that $x$ is an outlier. The algorithm is conceptually simple and its principle is intuitive, but when $n$ is very large it requires a substantial amount of computation. Here is a simple example:
n=100; x=[(rand(n/2,2)-0.5)*20; randn(n/2,2)]; x(n,1)=14;
k=3; x2=sum(x.^2,2);
[s,t]=sort(sqrt(repmat(x2,1,n)+repmat(x2',n,1)-2*x*x'),2);

for i=1:k+1
  for j=1:k
    RD(:,j)=max(s(t(t(:,i),j+1),k), s(t(:,i),j+1));
  end
  LRD(:,i)=1./mean(RD,2);
end
LOF=mean(LRD(:,2:k+1),2)./LRD(:,1);

figure(1); clf; hold on
plot(x(:,1),x(:,2),'rx');
for i=1:n
  plot(x(i,1),x(i,2),'bo','MarkerSize',LOF(i)*10);
end

In unsupervised learning problems, there is usually little information about the outliers. However, when a set of known normal samples $\{x'_{i'}\}_{i'=1}^{n'}$ is given, we can identify outliers in the test set $\{x_i\}_{i=1}^{n}$ with some confidence. The Kullback-Leibler (KL) divergence, also known as relative entropy, is a powerful tool for estimating the probability density ratio of normal samples to test samples,

$$w(x) = \frac{p'(x)}{p(x)},$$

where $p'(x)$ is the probability density of the normal samples and $p(x)$ is that of the test samples, while avoiding direct estimation of the two densities. For a normal sample the ratio is close to 1, while for an outlier it deviates from 1.

To begin with, we model the density ratio with a parameterized linear model:

$$w_{\alpha}(x) = \sum_{j=1}^{b}\alpha_j\psi_j(x) = \alpha^T\psi(x),$$

where $\alpha = (\alpha_1,\cdots,\alpha_b)^T$ is the parameter vector and $\psi(x) = (\psi_1(x),\cdots,\psi_b(x))^T$ is a vector of non-negative basis functions. Then $w_{\alpha}(x)p(x)$ can be seen as an estimate of $p'(x)$. Define the similarity between $w_{\alpha}(x)p(x)$ and $p'(x)$ as the KL divergence, i.e.

$$\mathrm{KL}\left(p' \,\|\, w_{\alpha}p\right) = \int p'(x)\log\frac{p'(x)}{w_{\alpha}(x)p(x)}\,dx.$$

In general, the KL divergence is non-negative and equals zero only if $w_{\alpha}p = p'$. When the KL divergence is sufficiently small, $w_{\alpha}p$ can be regarded as close to $p'$. To guarantee that $w_{\alpha}p$ is a well-defined density, we impose the following constraints:

$$\int w_{\alpha}(x)p(x)\,dx = 1, \qquad w_{\alpha}(x)p(x)\ge 0\ \ \forall x.$$

Since the terms of the KL divergence that do not involve $w_{\alpha}$ are independent of $\alpha$, minimizing it amounts to maximizing $\int p'(x)\log w_{\alpha}(x)\,dx$. Approximating the expectations by sample averages turns the estimation into the following optimization problem:

$$\max_{\alpha}\ \frac{1}{n'}\sum_{i'=1}^{n'}\log w_{\alpha}(x'_{i'}) \quad \text{s.t.} \quad \frac{1}{n}\sum_{i=1}^{n}w_{\alpha}(x_i)=1, \quad \alpha_1,\ldots,\alpha_b\ge 0.$$

We briefly summarize the estimation process:
1. Initialize $\alpha$.
2. Repeat the following updates until $\alpha$ reaches a suitable precision:

$$\alpha \leftarrow \alpha + \epsilon A^T\left(1./(A\alpha)\right)$$
$$\alpha \leftarrow \alpha + \frac{(1 - b^T\alpha)\,b}{b^T b}$$
$$\alpha \leftarrow \max(0,\alpha)$$
$$\alpha \leftarrow \alpha / (b^T\alpha)$$

Here $A$ is the matrix whose $(i',j)$-th element is $\psi_j(x'_{i'})$, $b$ is the vector whose $j$-th element is $\frac{1}{n}\sum_{i=1}^{n}\psi_j(x_i)$, $\epsilon$ is a small step size (0.01 in the code below), and $1./(A\alpha)$ denotes the element-wise reciprocal of $A\alpha$.
Here is an example (Gaussian kernel model):
function [ a ] = KLIEP( k, r )
% Estimate the density-ratio parameters alpha (save as KLIEP.m).
a0=rand(size(k,2),1); b=mean(r)'; c=sum(b.^2);
for o=1:1000
  a=a0+0.01*k'*(1./(k*a0));        % gradient step
  a=a+b*(1-sum(b.*a))/c;           % satisfy the equality constraint
  a=max(0,a);                      % non-negativity
  a=a/sum(b.*a);                   % normalization
  if norm(a-a0)<0.001, break, end
  a0=a;
end
end

n=100; x=randn(n,1); y=randn(n,1); y(n)=5;
hhs=2*[1,5,10].^2; m=5;
x2=x.^2; xx=repmat(x2,1,n)+repmat(x2',n,1)-2*(x*x');
y2=y.^2; yx=repmat(y2,1,n)+repmat(x2',n,1)-2*y*x';
u=floor(m*(0:n-1)/n)+1; u=u(randperm(n));
for hk=1:length(hhs)
  hh=hhs(hk); k=exp(-xx/hh); r=exp(-yx/hh);
  for i=1:m
    g(hk,i)=mean(k(u==i,:)*KLIEP(k(u~=i,:),r));
  end
end
[gh,ggh]=max(mean(g,2)); HH=hhs(ggh);
k=exp(-xx/HH); r=exp(-yx/HH); s=r*KLIEP(k,r);
figure(1); clf; hold on; plot(y,s,'rx');

Furthermore, outlier detection can be done using support vector machine techniques. Due to the limited time, we only outline the main structure of the algorithm. A typical SVM outlier detector finds a hyper-ball that contains nearly all the sample points; a point lying outside the hyper-ball can then be regarded as an outlier. Concretely, we obtain the center $c$ and radius $R$ by solving the following optimization problem:
$$\min_{c,R,\xi}\ \left(R^2 + C\sum_{i=1}^{n}\xi_i\right) \quad \text{s.t.} \quad \|x_i - c\|^2 \le R^2 + \xi_i,\ \ \xi_i\ge 0,\ \ \forall i=1,2,\ldots,n.$$

It can be solved using Lagrange multipliers:

$$L(c,R,\xi,\alpha,\beta) = R^2 + C\sum_{i=1}^{n}\xi_i - \sum_{i=1}^{n}\alpha_i\left(R^2 + \xi_i - \|x_i - c\|^2\right) - \sum_{i=1}^{n}\beta_i\xi_i.$$

Its dual problem can be formulated as

$$\max_{\alpha,\beta}\ \inf_{c,R,\xi} L(c,R,\xi,\alpha,\beta) \quad \text{s.t.} \quad \alpha\ge 0,\ \beta\ge 0.$$

The KKT conditions give

$$\frac{\partial L}{\partial c}=0 \ \Rightarrow\ c = \frac{\sum_{i=1}^{n}\alpha_i x_i}{\sum_{i=1}^{n}\alpha_i}, \qquad \frac{\partial L}{\partial R}=0 \ \Rightarrow\ \sum_{i=1}^{n}\alpha_i = 1, \qquad \frac{\partial L}{\partial \xi_i}=0 \ \Rightarrow\ \alpha_i + \beta_i = C,\ \forall i=1,2,\ldots,n.$$

Substituting these back into the Lagrangian, the dual problem reduces to

$$\hat{\alpha} = \arg\max_{\alpha}\ \left(\sum_{i=1}^{n}\alpha_i x_i^T x_i - \sum_{i,j=1}^{n}\alpha_i\alpha_j x_i^T x_j\right) \quad \text{s.t.} \quad \sum_{i=1}^{n}\alpha_i = 1,\ \ 0\le\alpha_i\le C,\ \forall i=1,2,\ldots,n.$$

This is a typical quadratic programming problem. After solving it, we can recover $c$ and $R$:

$$\hat{c} = \sum_{i=1}^{n}\hat{\alpha}_i x_i, \qquad \hat{R}^2 = \left\|x_i - \sum_{j=1}^{n}\hat{\alpha}_j x_j\right\|^2,$$

where $x_i$ is any support vector with $0 < \alpha_i < C$, which lies exactly on the boundary $\|x_i - c\|^2 = R^2$. Hence, a sample point $x$ satisfying $\|x - \hat{c}\|^2 > \hat{R}^2$ can be viewed as an outlier.
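As a rough sketch of how this quadratic program might be solved numerically, the MATLAB snippet below uses quadprog from the Optimization Toolbox. The toy data, the choice $C=0.1$, and the numerical tolerances are illustrative assumptions and not part of the derivation above.

% Sketch (illustrative assumptions): solve the dual above with quadprog.
n=100; x=[randn(n-1,2); 8 8];             % toy data with one planted outlier
C=0.1;                                    % trade-off parameter
K=x*x';                                   % Gram matrix, K(i,j)=x_i'*x_j

% Dual: max  sum_i a_i*K(i,i) - a'*K*a   s.t. sum(a)=1, 0<=a_i<=C.
% quadprog minimizes 0.5*a'*H*a + f'*a, so set H=2*K and f=-diag(K).
a=quadprog(2*K, -diag(K), [], [], ones(1,n), 1, zeros(n,1), C*ones(n,1));

c=x'*a;                                   % center c = sum_i a_i x_i
sv=find(a>1e-6 & a<C-1e-6);               % boundary support vectors (0<a_i<C)
if isempty(sv), sv=find(a>1e-6); end      % fallback if none are strictly interior
R2=mean(sum((x(sv,:)-repmat(c',numel(sv),1)).^2,2));   % squared radius

d2=sum((x-repmat(c',n,1)).^2,2);          % squared distance of each sample to c
outliers=find(d2>R2);                     % points outside the hyper-ball

figure(1); clf; hold on
plot(x(:,1),x(:,2),'bo');
plot(x(outliers,1),x(outliers,2),'rx','MarkerSize',12);

The squared radius is estimated from the boundary support vectors, i.e. those with multipliers strictly between 0 and $C$, in line with the expression for $\hat{R}^2$ above.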