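Let's get to the point.

The floating "like" hearts in our live-streaming room are drawn with a SurfaceView. When the live-room Activity goes to the background (and the SurfaceView's surface is destroyed), an ANR shows up in rare cases. Reading the SurfaceView source, I found a pitfall: many people use SurfaceView the wrong way, and the only reason they have not hit this ANR is luck.

1. How to get the ANR log

As soon as the ANR appeared, my first thought was to get the ANR log. It can be pulled with the following command:

    adb pull /data/anr/traces.txt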
This downloads the ANR log to the computer.

2. Analyzing the ANR log

Open the ANR log and look at the stack of the main thread:

    "main" prio=5 tid=1 Waiting
      | group="main" sCount=1 dsCount=0 obj=0x76c353e8 self=0xf4a64500
      | sysTid=7047 nice=-11 cgrp=default sched=0/0 handle=0xf7276b4c
      | state=S schedstat=( 0 0 0 ) utm=2795 stm=388 core=7 HZ=100
      | stack=0xff030000-0xff032000 stackSize=8MB
      | held mutexes=
      at java.lang.Object.wait!(Native method)
      - waiting on <0x03fd06cb> (a java.lang.Object)
      at java.lang.Thread.parkFor$(Thread.java:1220)
      - locked <0x03fd06cb> (a java.lang.Object)
      at sun.misc.Unsafe.park(Unsafe.java:299)
      at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:810)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:843)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1172)
      at java.util.concurrent.locks.ReentrantLock$FairSync.lock(ReentrantLock.java:196)
      at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:257)
      at android.view.SurfaceView.updateWindow(SurfaceView.java:638)
      at android.view.SurfaceView.onWindowVisibilityChanged(SurfaceView.java:316)
      at android.view.View.dispatchWindowVisibilityChanged(View.java:10434)
      at android.view.ViewGroup.dispatchWindowVisibilityChanged(ViewGroup.java:1328)
      at android.view.ViewGroup.dispatchWindowVisibilityChanged(ViewGroup.java:1328)
      at android.view.ViewGroup.dispatchWindowVisibilityChanged(ViewGroup.java:1328)
      ... repeated 1 times
      at android.view.ViewRootImpl.performTraversals(ViewRootImpl.java:1750)
      at android.view.ViewRootImpl.doTraversal(ViewRootImpl.java:1437)
      at android.view.ViewRootImpl$TraversalRunnable.run(ViewRootImpl.java:7397)
      at android.view.Choreographer$CallbackRecord.run(Choreographer.java:920)
      at android.view.Choreographer.doCallbacks(Choreographer.java:695)
      at android.view.Choreographer.doFrame(Choreographer.java:631)
      at android.view.Choreographer$FrameDisplayEventReceiver.run(Choreographer.java:906)
      at android.os.Handler.handleCallback(Handler.java:739)
      at android.os.Handler.dispatchMessage(Handler.java:95)
      at android.os.Looper.loop(Looper.java:158)
      at android.app.ActivityThread.main(ActivityThread.java:7237)
      at java.lang.reflect.Method.invoke!(Native method)
      at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:1230)
      at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1120)

3. Analyzing the SurfaceView source

The log shows that the ANR comes from SurfaceView, which is exactly what the floating-hearts animation in the live room is built on. The main thread is stuck in ReentrantLock.lock(), called from SurfaceView.updateWindow, and never gets past it, hence the ANR.

Opening the SurfaceView source, updateWindow does indeed call mSurfaceLock.lock(). mSurfaceLock is declared like this:

    final ReentrantLock mSurfaceLock = new ReentrantLock();

So somewhere the lock must have been taken without a matching unlock, and every later lock() call waits forever. That reminded me of the Canvas: it is obtained by locking, and it is the developer's job to unlock it promptly.
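To make the "Waiting" state in the trace concrete, here is a minimal plain-Java sketch (no Android classes) of a thread stuck in ReentrantLock.lock(). The class name BlockedOnLockDemo and the thread name "fake-main" are made up for illustration. The point is only that a thread parked in lock() reports the WAITING state rather than BLOCKED, which matches the park frames at the top of the stack above.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class, not part of Android or of the app discussed here.
class BlockedOnLockDemo {

    /** Returns the state of a thread that calls lock() on a lock someone else is holding. */
    static Thread.State stateOfBlockedThread() {
        ReentrantLock lock = new ReentrantLock();
        lock.lock(); // the calling thread holds the lock for the whole experiment
        try {
            // "fake-main" plays the role of the main thread entering updateWindow().
            Thread waiter = new Thread(lock::lock, "fake-main");
            waiter.setDaemon(true);
            waiter.start();

            // Poll (for at most two seconds) until the waiter has parked inside lock().
            long deadline = System.nanoTime() + 2_000_000_000L;
            while (waiter.getState() != Thread.State.WAITING
                    && System.nanoTime() < deadline) {
                Thread.yield();
            }
            return waiter.getState();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("state of the thread stuck in lock(): " + stateOfBlockedThread());
    }
}
```

On Android the main thread sits in this state until the system gives up and reports an ANR.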
The code that draws on the canvas looked fine, and unlocking in finally is the right thing to do:

    Canvas canvas = mHolder.lockCanvas();
    if (canvas != null) {
        try {
            for (Heart heart : mHeartArray) {
                canvas.drawBitmap(heart.bitmap, null, heart.dst, mPaint);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            mHolder.unlockCanvasAndPost(canvas);
        }
    }
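I kept switching the Activity between foreground and background, because the surface of a SurfaceView is destroyed when the view becomes invisible and created again when it becomes visible. Eventually the ANR reproduced, together with an exception:

    Surface has already been released.

So I went through the source in detail, starting with the implementation of unlockCanvasAndPost, since the unlock was the likely place for things to go wrong.

    // Implementation of SurfaceHolder inside SurfaceView
    @Override
    public void unlockCanvasAndPost(Canvas canvas) {
        mSurface.unlockCanvasAndPost(canvas);
        mSurfaceLock.unlock();
    }

    // Surface class
    public void unlockCanvasAndPost(Canvas canvas) {
        synchronized (mLock) {
            checkNotReleasedLocked();
            //...
        }
    }

    // This is where the exception is thrown. If it is thrown here,
    // mSurfaceLock.unlock() is never executed, and the next lock() ends in an ANR.
    // The exception is thrown when mNativeObject == 0, so the next question is
    // when mNativeObject gets set to 0.
    private void checkNotReleasedLocked() {
        if (mNativeObject == 0) {
            throw new IllegalStateException("Surface has already been released.");
        }
    }

    // This method can set mNativeObject to 0; next, find out who calls it.
    private void setNativeObjectLocked(long ptr) {
        //...
        mNativeObject = ptr;
        //...
    }

    // A search shows that setNativeObjectLocked(0) is called here.
    @Deprecated
    public void transferFrom(Surface other) {
        if (other != this) {
            //...
            other.setNativeObjectLocked(0);
            //...
        }
    }

    // SurfaceView calls transferFrom (abbreviated excerpt)
    /** @hide */
    protected void updateWindow(boolean force, boolean redrawNeeded) {
        mSurfaceLock.lock();
        try {
            // ....
        } finally {
            mSurfaceLock.unlock();
        }
        try {
            // ....
            if (mSurfaceCreated && (surfaceChanged || (!visible && visibleChanged))) {
                mSurfaceCreated = false;
                if (mSurface.isValid()) {
                    callbacks = getSurfaceCallbacks();
                    for (SurfaceHolder.Callback c : callbacks) {
                        c.surfaceDestroyed(mSurfaceHolder);
                    }
                }
            }
            mSurface.transferFrom(mNewSurface);
            // ....
        } finally {
            // ....
        }
    }

The SurfaceView lifecycle callbacks are:

    surfaceCreated: called when the view goes from invisible to visible
    surfaceChanged: called when the size changes
    surfaceDestroyed: called when the view goes from visible to invisible

In the steps that reproduce the bug, tapping the chat button opens the chat page. The live room is then invisible, so the SurfaceView's surface is destroyed and surfaceDestroyed is called.

As the code above shows, surfaceDestroyed is called back first and mSurface.transferFrom(mNewSurface) runs afterwards, which is the point where mNativeObject becomes 0. If unlockCanvasAndPost happens to be called right after that, it throws, mSurfaceLock.unlock() is skipped, and the next time the surface is created the main thread hits the ANR.

Cause of the ANR, in short: the surface was destroyed while the drawing thread was between lockCanvas and unlockCanvasAndPost, so the unlock failed and the lock was never released.

The whole sequence:

    Step 1: mHolder.lockCanvas() runs and acquires the lock.
    Step 2: At that moment the surface happens to be destroyed: surfaceDestroyed runs and mNativeObject is set to 0.
    Step 3: unlockCanvasAndPost is called, but because mNativeObject is 0 it throws and the lock is not released.
    Step 4: The surface is created again and updateWindow tries to lock. The previous lock was never released, so it waits forever.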
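The four steps above can be replayed in plain Java, without any Android classes. The sketch below is only a model: FakeSurfaceHolder, LeakedLockDemo and their methods are hypothetical stand-ins that mimic the ordering of the excerpt (check first, unlock second). To keep the run deterministic, the "surface destroyed" step is performed inline on the drawing thread rather than by a real main thread, and the final lock attempt uses a timeout instead of blocking forever.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the SurfaceHolder inside SurfaceView.
class FakeSurfaceHolder {
    private final ReentrantLock surfaceLock = new ReentrantLock(); // role of mSurfaceLock
    private volatile long nativeObject = 1;                        // role of mNativeObject

    Object lockCanvas() {
        surfaceLock.lock();
        return new Object(); // stands in for the Canvas
    }

    void unlockCanvasAndPost(Object canvas) {
        if (nativeObject == 0) {
            // Same ordering as the excerpt: the check throws before unlock() is reached.
            throw new IllegalStateException("Surface has already been released.");
        }
        surfaceLock.unlock();
    }

    void release() {
        nativeObject = 0;
    }

    /** What updateWindow() does with the lock, but with a timeout so the demo terminates. */
    boolean tryUpdateWindow(long timeoutMs) throws InterruptedException {
        if (surfaceLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            surfaceLock.unlock();
            return true;
        }
        return false;
    }
}

class LeakedLockDemo {

    /** Returns true if the lock can still be taken after the failed unlock. */
    static boolean mainThreadCanLockAfterRace() {
        FakeSurfaceHolder holder = new FakeSurfaceHolder();
        Thread drawThread = new Thread(() -> {
            Object canvas = holder.lockCanvas();     // step 1: lock acquired
            holder.release();                        // step 2: surface destroyed (simulated inline)
            try {
                holder.unlockCanvasAndPost(canvas);  // step 3: throws, lock stays held
            } catch (IllegalStateException e) {
                // swallowed, like the catch block in the drawing code
            }
        });
        drawThread.start();
        try {
            drawThread.join();
            return holder.tryUpdateWindow(200);      // step 4: the next lock attempt
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("lock available after race: " + mainThreadCanLockAfterRace());
    }
}
```

A ReentrantLock is not released when its owning thread ends, so in this model the lock stays held even after the drawing thread has finished. That is why the real main thread never gets out of lock().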
The fix has two parts.

1. Wrap the canvas drawing in a synchronized block, so the whole lock/draw/unlock sequence runs as one unit:

    // `this` must be the same object that surfaceDestroyed synchronizes on below.
    synchronized (this) {
        if (mDrawFlag) {
            Canvas canvas = mHolder.lockCanvas();
            if (canvas != null) {
                try {
                    for (Heart heart : mHeartArray) {
                        canvas.drawBitmap(heart.bitmap, null, heart.dst, mPaint);
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                } finally {
                    try {
                        mHolder.unlockCanvasAndPost(canvas);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }
        }
    }

2. Take the same lock in the surfaceDestroyed callback. This guarantees that mNativeObject cannot be set to 0 between lockCanvas and unlockCanvasAndPost:

    @Override
    public void surfaceDestroyed(SurfaceHolder holder) {
        synchronized (this) {
            mDrawFlag = false;
        }
    }
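The effect of the two synchronized blocks can be checked with another plain-Java model. Everything in it (GuardedSurfaceModel, GuardedSurfaceDemo, drawFrame, destroySurface) is a hypothetical stand-in, and the model assumes, as in the excerpt of updateWindow, that the surface is released only after the surfaceDestroyed callback has returned. It races a drawing loop against the destroy path many times and counts the rounds in which the lock was leaked or an unlock would have thrown.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the fixed drawing code; no Android classes involved.
class GuardedSurfaceModel {
    private final ReentrantLock surfaceLock = new ReentrantLock(); // role of mSurfaceLock
    private volatile long nativeObject = 1;                        // role of mNativeObject
    private boolean drawFlag = true;                               // role of mDrawFlag
    final AtomicInteger failedUnlocks = new AtomicInteger();

    /** One frame of drawing; returns false once drawing has been stopped. */
    boolean drawFrame() {
        synchronized (this) {
            if (!drawFlag) {
                return false;
            }
            surfaceLock.lock();                      // lockCanvas()
            try {
                Thread.yield();                      // "drawing"; gives the other thread a chance
            } finally {
                if (nativeObject == 0) {
                    failedUnlocks.incrementAndGet(); // where the real code would throw
                } else {
                    surfaceLock.unlock();            // unlockCanvasAndPost()
                }
            }
            return true;
        }
    }

    /** Models the surfaceDestroyed callback followed by the surface being released. */
    void destroySurface() {
        synchronized (this) {                        // waits for a frame in progress
            drawFlag = false;
        }
        nativeObject = 0;                            // transferFrom(...) runs after the callback
    }

    boolean isSurfaceLockHeld() {
        return surfaceLock.isLocked();
    }
}

class GuardedSurfaceDemo {

    /** Races drawing against destruction; returns the number of rounds that went wrong. */
    static int runRace(int rounds) {
        int bad = 0;
        for (int i = 0; i < rounds; i++) {
            GuardedSurfaceModel model = new GuardedSurfaceModel();
            Thread drawer = new Thread(() -> {
                while (model.drawFrame()) {
                    // keep drawing until drawFrame() reports that drawing has stopped
                }
            });
            drawer.start();
            model.destroySurface();
            try {
                drawer.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            if (model.isSurfaceLockHeld() || model.failedUnlocks.get() > 0) {
                bad++;
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        System.out.println("bad rounds: " + runRace(200));
    }
}
```

The design point is that the destroy path has to wait on the same monitor as the drawing code. The flag alone would not be enough, because a frame that has already called lockCanvas must be allowed to finish its unlock before the surface goes away.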
In short, the ANR is fixed by preventing the surface from being destroyed between lockCanvas and unlockCanvasAndPost, which is what the two synchronized blocks above do.