Thursday, November 15, 2012

Skia performance optimization - part 2


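Before Android 3.0 (Android 2.2 included), users often complained about the performance of the native UI. Part of the reason is that every graphic in the UI was rendered by the CPU. Besides offloading rendering to dedicated hardware such as a GPU or a 2D engine, another way to speed up UI rendering is to optimize the code itself, for example by rewriting the performance bottlenecks with more efficient instructions provided by the processor. Of course, graphics acceleration is only part of the story: the touch panel and the performance of event dispatch in the system also affect the overall responsiveness of the UI.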
This article covers the optimization of drawBitmap. The icons in Figure 1 (one of them is outlined in orange) are drawn with SkDraw::drawBitmap() and its related functions in Skia.
Figure 1
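After modifying the code, the result is shown in Figure 2. It reveals the actual extent of each icon (bitmap), which corresponds to the amount of data Skia's drawBitmap has to process for that icon.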
Figure 2
Now let's look at the drawBitmap-related code, starting with SkDraw::drawBitmap, located in \external\skia\src\core\SkDraw.cpp:
01 void SkDraw::drawBitmap(const SkBitmap& bitmap, const SkMatrix& prematrix,
02                         const SkPaint& paint) const
03 {
04     SkDEBUGCODE(this->validate();)
05 
06     // nothing to draw
07     if (fClip->isEmpty() ||
08             bitmap.width() == 0 || bitmap.height() == 0 ||
09             bitmap.getConfig() == SkBitmap::kNo_Config ||
10             (paint.getAlpha() == 0 && paint.getXfermode() == NULL))
11     {
12         return;
13     }
14 
15     // run away on too-big bitmaps for now (exceed 16.16)
16     if (bitmap.width() > 32767 || bitmap.height() > 32767)
17     {
18         return;
19     }
20 
21     SkAutoPaintStyleRestore restore(paint, SkPaint::kFill_Style);
22 
23     SkMatrix matrix;
24     if (!matrix.setConcat(*fMatrix, prematrix))
25     {
26         return;
27     }
28 
29     if (clipped_out(matrix, *fClip, bitmap.width(), bitmap.height()))
30     {
31         return;
32     }
33 
34     if (fBounder && just_translate(matrix, bitmap))
35     {
36         SkIRect ir;
37         int32_t ix = SkScalarRound(matrix.getTranslateX());
38         int32_t iy = SkScalarRound(matrix.getTranslateY());
39         ir.set(ix, iy, ix + bitmap.width(), iy + bitmap.height());
40         if (!fBounder->doIRect(ir))
41         {
42             return;
43         }
44     }
45 
46     // only lock the pixels if we passed the clip and bounder tests
47     SkAutoLockPixels alp(bitmap);
48     // after the lock, check if we are valid
49     if (!bitmap.readyToDraw())
50     {
51         return;
52     }
53 
54     if (bitmap.getConfig() != SkBitmap::kA8_Config &&
55             just_translate(matrix, bitmap))
56     {
57         int         ix = SkScalarRound(matrix.getTranslateX());
58         int         iy = SkScalarRound(matrix.getTranslateY());
59         uint32_t    storage[kBlitterStorageLongCount];
60         SkBlitter*  blitter = SkBlitter::ChooseSprite(*fBitmap, paint,
61                               bitmap, ix, iy,
62                               storage, sizeof(storage));
63         if (blitter)
64         {
65             SkAutoTPlacementDelete<SkBlitter>   ad(blitter, storage);
66 
67             SkIRect    ir;
68             ir.set(ix, iy, ix + bitmap.width(), iy + bitmap.height());
69 
70             SkRegion::Cliperator iter(*fClip, ir);
71             const SkIRect&       cr = iter.rect();
72 
73             for (; !iter.done(); iter.next())
74             {
75                 SkASSERT(!cr.isEmpty());
76                 //LOGE("SKBITMAP0 : blit\n");
77                 blitter->blitRect(cr.fLeft, cr.fTop,
78                                   cr.width(), cr.height());
79             }
80             return;
81         }
82     }
83 
84     // now make a temp draw on the stack, and use it
85     //
86     SkDraw draw(*this);
87     draw.fMatrix = &matrix;
88 
89     if (bitmap.getConfig() == SkBitmap::kA8_Config)
90     {
91         draw.drawBitmapAsMask(bitmap, paint);
92     }
93     else
94     {
95         SkAutoBitmapShaderInstall   install(bitmap, &paint);
96 
97         SkRect  r;
98         r.set(0, 0, SkIntToScalar(bitmap.width()),
99               SkIntToScalar(bitmap.height()));
100         // is this ok if paint has a rasterizer?
101 
102         draw.drawRect(r, paint);
103     }
104 }

The key parts are lines 60-62:

SkBlitter*  blitter = SkBlitter::ChooseSprite(*fBitmap, paint, bitmap, ix, iy, storage, sizeof(storage));

and lines 77-78:

blitter->blitRect(cr.fLeft, cr.fTop, cr.width(), cr.height());
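A minimal sketch of what those two pieces do together, assuming a toy rectangle type (IRect) and a plain list of clip rectangles in place of SkIRect and SkRegion::Cliperator: the sprite's rectangle is intersected with each clip rectangle, and every non-empty piece becomes one blitRect() call. All names here are hypothetical; this is not Skia's code.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical rectangle type standing in for SkIRect:
// [left, right) x [top, bottom), so right/bottom are exclusive.
struct IRect {
    int left, top, right, bottom;
    bool isEmpty() const { return left >= right || top >= bottom; }
};

static IRect intersect(const IRect& a, const IRect& b) {
    return IRect{std::max(a.left, b.left), std::max(a.top, b.top),
                 std::min(a.right, b.right), std::min(a.bottom, b.bottom)};
}

// Sketch of the sprite fast path in drawBitmap(): the bitmap covers
// [ix, ix + w) x [iy, iy + h), and every clip rectangle overlapping it yields
// one blitRect(left, top, width, height) call. 'clipRects' plays the role of
// the rectangles SkRegion::Cliperator visits (assumed non-overlapping).
// Returns the number of blitRect calls made.
template <typename BlitRectFn>
static int blitSpriteClipped(int ix, int iy, int w, int h,
                             const std::vector<IRect>& clipRects,
                             BlitRectFn blitRect) {
    assert(w >= 0 && h >= 0);
    const IRect sprite{ix, iy, ix + w, iy + h};
    int calls = 0;
    for (const IRect& clip : clipRects) {
        const IRect cr = intersect(sprite, clip);
        if (cr.isEmpty())
            continue;  // only non-empty pieces reach the blitter
        blitRect(cr.left, cr.top, cr.right - cr.left, cr.bottom - cr.top);
        ++calls;
    }
    return calls;
}
```

A 20x20 sprite straddling two clip rectangles is blitted in two pieces; a clip rectangle that misses it entirely produces no call.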
Let's look at SkBlitter::ChooseSprite() first; it is in \external\skia\src\core\SkBlitter_Sprite.cpp:
SkBlitter* SkBlitter::ChooseSprite( const SkBitmap& device,
                                    const SkPaint& paint,
                                    const SkBitmap& source,
                                    int left, int top,
                                    void* storage, size_t storageSize)
{
    SkSpriteBlitter* blitter;

    switch (device.getConfig())
    {
    case SkBitmap::kRGB_565_Config:
        //LOGI("SKBITMAP0 : dst=565\n");
        blitter = SkSpriteBlitter::ChooseD16(source, paint,
                                             storage, storageSize);
        //blitter = NULL;
        break;
    case SkBitmap::kARGB_8888_Config:
        //LOGI("SKBITMAP0 : dst=8888\n");
        blitter = SkSpriteBlitter::ChooseD32(source, paint,
                                             storage, storageSize);
        //blitter = NULL;
        break;
    default:
        blitter = NULL;
        break;
    }

    if (blitter)
        blitter->setup(device, left, top, paint);
    return blitter;
}
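The key part of this function is the check of the device color format; here the device is the launcher's framebuffer. Two formats are handled, RGB565 and ARGB8888 (for any other format ChooseSprite() returns NULL, and drawBitmap() falls back to the shader-based drawRect() path at lines 86-103). For RGB565, SkSpriteBlitter::ChooseD16() is called; its code is in \external\skia\src\core\SkSpriteBlitter_RGB16.cpp: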
001 SkSpriteBlitter* SkSpriteBlitter::ChooseD16(const SkBitmap& source,
002         const SkPaint& paint,
003         void* storage, size_t storageSize)
004 {
005     if (paint.getMaskFilter() != NULL)   // may add cases for this
006     {
007         return NULL;
008     }
009     if (paint.getXfermode() != NULL)   // may add cases for this
010     {
011         return NULL;
012     }
013     if (paint.getColorFilter() != NULL)   // may add cases for this
014     {
015         return NULL;
016     }
017 
018     SkSpriteBlitter* blitter = NULL;
019     unsigned alpha = paint.getAlpha();
020 
021     switch (source.getConfig())
022     {
023     case SkBitmap::kARGB_8888_Config:
024         SK_PLACEMENT_NEW_ARGS(blitter, Sprite_D16_S32_BlitRowProc,
025                               storage, storageSize, (source));
026         break;
027     case SkBitmap::kARGB_4444_Config:
028         if (255 == alpha)
029         {
030             SK_PLACEMENT_NEW_ARGS(blitter, Sprite_D16_S4444_Opaque,
031                                   storage, storageSize, (source));
032         }
033         else
034         {
035             SK_PLACEMENT_NEW_ARGS(blitter, Sprite_D16_S4444_Blend,
036                                   storage, storageSize, (source, alpha >> 4));
037         }
038         break;
039     case SkBitmap::kRGB_565_Config:
040         if (255 == alpha)
041         {
042             SK_PLACEMENT_NEW_ARGS(blitter, Sprite_D16_S16_Opaque,
043                                   storage, storageSize, (source));
044         }
045         else
046         {
047             SK_PLACEMENT_NEW_ARGS(blitter, Sprite_D16_S16_Blend,
048                                   storage, storageSize, (source, alpha));
049         }
050         break;
051     case SkBitmap::kIndex8_Config:
052         if (paint.isDither())
053         {
054             // we don't support dither yet in these special cases
055             break;
056         }
057         if (source.isOpaque())
058         {
059             if (255 == alpha)
060             {
061                 SK_PLACEMENT_NEW_ARGS(blitter, Sprite_D16_SIndex8_Opaque,
062                                       storage, storageSize, (source));
063             }
064             else
065             {
066                 SK_PLACEMENT_NEW_ARGS(blitter, Sprite_D16_SIndex8_Blend,
067                                       storage, storageSize, (source, alpha));
068             }
069         }
070         else
071         {
072             if (255 == alpha)
073             {
074                 SK_PLACEMENT_NEW_ARGS(blitter, Sprite_D16_SIndex8A_Opaque,
075                                       storage, storageSize, (source));
076             }
077             else
078             {
079                 SK_PLACEMENT_NEW_ARGS(blitter, Sprite_D16_SIndex8A_Blend,
080                                       storage, storageSize, (source, alpha));
081             }
082         }
083         break;
084     default:
085         break;
086     }
087     return blitter;
088 }
SkSpriteBlitter::ChooseD16() creates a suitable blitter according to the color format of the source bitmap. As its name suggests, a Skia blitter performs the computation and data movement between the source bitmap and the destination bitmap. In my case the source and destination formats are ARGB8888 and RGB565 respectively, so lines 23-26 create a blitter of type Sprite_D16_S32_BlitRowProc, which SkSpriteBlitter::ChooseD16() returns. Back in SkBlitter::ChooseSprite(), the blitter's setup() function is called. The setup() function of the Sprite_D16_S32_BlitRowProc class is also in \external\skia\src\core\SkSpriteBlitter_RGB16.cpp:

virtual void setup(const SkBitmap& device, int left, int top,
                   const SkPaint& paint)
{
    this->INHERITED::setup(device, left, top, paint);

    unsigned flags = 0;

    if (paint.getAlpha() < 0xFF)
    {
        flags |= SkBlitRow::kGlobalAlpha_Flag;
    }
    if (!fSource->isOpaque())
    {
        flags |= SkBlitRow::kSrcPixelAlpha_Flag;
    }
    if (paint.isDither())
    {
        //flags |= SkBlitRow::kDither_Flag;
    }
    fProc = SkBlitRow::Factory(flags, SkBitmap::kRGB_565_Config);
}

The key line is fProc = SkBlitRow::Factory(flags, SkBitmap::kRGB_565_Config);

SkBlitRow::Factory() picks a row-blit function, and the blitter stores it in its function pointer fProc. Later, the blitter's blitRect() calls fProc to carry out the computation and data movement between the source bitmap and the destination bitmap.

SkBlitRow::Factory() is in \external\skia\src\core\SkBlitRow_D16.cpp:

SkBlitRow::Proc SkBlitRow::Factory(unsigned flags, SkBitmap::Config config)
{
    SkASSERT(flags < SK_ARRAY_COUNT(gDefault_565_Procs));
    // just so we don't crash
    flags &= kFlags16_Mask;

    SkBlitRow::Proc proc = NULL;

    switch (config)
    {
    case SkBitmap::kRGB_565_Config:
        proc = PlatformProcs565(flags);
        if (NULL == proc)
        {
            proc = gDefault_565_Procs[flags];
        }
        break;
    case SkBitmap::kARGB_4444_Config:
        proc = PlatformProcs4444(flags);
        if (NULL == proc)
        {
            proc = SkBlitRow_Factory_4444(flags);
        }
        break;
    default:
        break;
    }
    return proc;
}
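The path from setup() to the chosen row proc can be sketched as below. The flag bit values, the table order and the PlatformHook type are assumptions made for illustration (check SkBlitRow.h and SkBlitRow_D16.cpp for the real definitions); strings stand in for function pointers so the selection is easy to inspect.

```cpp
#include <cassert>
#include <string>

// Assumed flag bits for this sketch; check SkBlitRow.h for the real values.
enum : unsigned {
    kGlobalAlpha_Flag   = 1u << 0,  // paint alpha < 255
    kSrcPixelAlpha_Flag = 1u << 1,  // source bitmap is not opaque
    kDither_Flag        = 1u << 2,  // dithering (never set by the setup() above)
    kFlags16_Mask       = 7u
};

// Mirrors Sprite_D16_S32_BlitRowProc::setup(): the dither flag is never set,
// because that line is commented out in the listing.
static unsigned computeFlags(unsigned paintAlpha, bool srcIsOpaque) {
    unsigned flags = 0;
    if (paintAlpha < 0xFF)
        flags |= kGlobalAlpha_Flag;
    if (!srcIsOpaque)
        flags |= kSrcPixelAlpha_Flag;
    return flags;
}

// Default table indexed directly by 'flags'. The order follows the assumed
// bit values above; strings stand in for the row-proc function pointers.
static const char* const kDefault565Procs[8] = {
    "S32_D565_Opaque",         "S32_D565_Blend",
    "S32A_D565_Opaque",        "S32A_D565_Blend",
    "S32_D565_Opaque_Dither",  "S32_D565_Blend_Dither",
    "S32A_D565_Opaque_Dither", "S32A_D565_Blend_Dither",
};

// Hypothetical hook standing in for PlatformProcs565(); may be null.
using PlatformHook = const char* (*)(unsigned flags);

// Factory-style selection: a platform-specific proc wins if there is one,
// otherwise the portable default from the table is used.
static std::string factory565(unsigned flags, PlatformHook platformProc) {
    assert(flags < 8);
    flags &= kFlags16_Mask;
    const char* proc = platformProc ? platformProc(flags) : nullptr;
    if (proc == nullptr)
        proc = kDefault565Procs[flags];
    return proc;
}
```

For an icon with per-pixel alpha drawn with paint alpha 255, only the source-alpha flag is set, which selects S32A_D565_Opaque unless a platform hook supplies something else.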

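On my current platform the function selected is S32A_D565_Opaque: the icons have per-pixel alpha and the paint alpha is 255, so only kSrcPixelAlpha_Flag is set, and the default table entry is used. It is in \external\skia\src\core\SkBlitRow_D16.cpp: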

static void S32A_D565_Opaque(uint16_t* SK_RESTRICT dst,
                             const SkPMColor* SK_RESTRICT src,
                             int count,
                             U8CPU alpha, int /*x*/, int /*y*/)
{
    SkASSERT(255 == alpha);

    if (count > 0)
    {
        do
        {
            SkPMColor c = *src++;
            SkPMColorAssert(c);
//            if (__builtin_expect(c!=0, 1))
            if (c)
            {
                *dst = SkSrcOver32To16(c, *dst);
            }
            dst += 1;
        }
        while (--count != 0);
    }
}

Each call to S32A_D565_Opaque processes only one line of the bitmap. To understand it, read it together with Sprite_D16_S32_BlitRowProc::blitRect(), which is in \external\skia\src\core\SkSpriteBlitter_RGB16.cpp:

virtual void blitRect(int x, int y, int width, int height)
{
    SK_RESTRICT uint16_t* dst = fDevice->getAddr16(x, y);
    const SK_RESTRICT SkPMColor* src = fSource->getAddr32(x - fLeft,
                                       y - fTop);
    uint32_t* src32 = fSource->getAddr32(x - fLeft, y - fTop);
    unsigned dstRB = fDevice->rowBytes();
    unsigned srcRB = fSource->rowBytes();
    SkBlitRow::Proc proc = fProc;
    U8CPU alpha = fPaint->getAlpha();

    while (--height >= 0)
    {
        proc(dst, src, width, alpha, x, y);
        y += 1;
        dst = (SK_RESTRICT uint16_t*)((char*)dst + dstRB);
        src = (const SK_RESTRICT SkPMColor*)((const char*)src + srcRB);
    }
}

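The key point of this function is that, row by row, it computes the source and destination memory addresses of the row and calls S32A_D565_Opaque on them.

blitRect() is only responsible for control flow; it does no pixel computation or data movement itself.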
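The control flow of blitRect() boils down to the loop below. This is a sketch with a simplified row-proc signature (no x, y or alpha arguments) and a hypothetical toy proc, copyLow16; the point is that each pointer advances by its own row stride in bytes, so the source and destination may have different widths.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified row-proc signature, modeled loosely on SkBlitRow::Proc
// (the x, y and alpha arguments are omitted in this sketch).
using RowProc = void (*)(uint16_t* dst, const uint32_t* src, int count);

// Toy row proc for illustration only: stores the low 16 bits of each source word.
static void copyLow16(uint16_t* dst, const uint32_t* src, int count) {
    for (int i = 0; i < count; ++i)
        dst[i] = static_cast<uint16_t>(src[i]);
}

// Same control flow as blitRect(): call the row proc once per row, then step
// each pointer by its own row stride in bytes (rowBytes() in Skia).
static void blitRectRows(uint16_t* dst, size_t dstRowBytes,
                         const uint32_t* src, size_t srcRowBytes,
                         int width, int height, RowProc proc) {
    while (--height >= 0) {
        proc(dst, src, width);
        dst = reinterpret_cast<uint16_t*>(
            reinterpret_cast<char*>(dst) + dstRowBytes);
        src = reinterpret_cast<const uint32_t*>(
            reinterpret_cast<const char*>(src) + srcRowBytes);
    }
}
```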

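Back to S32A_D565_Opaque. This function alpha-blends each 32bpp source pixel onto the corresponding 16bpp destination pixel, one pixel at a time. The optimization here is to access destination pixels in 32-bit units, two pixels per access, while source pixels are still read one 32bpp pixel at a time. This change has to take two conditions into account:

  (1)  whether the line width is odd or even;

  (2)  whether the destination bitmap memory address is 32-bit aligned or only 16-bit aligned.

Together, the two conditions give four cases.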
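The four cases can be summarized by how a row splits into at most one leading pixel, a run of 32-bit pairs, and at most one trailing pixel. This is a sketch with hypothetical names (RowSplit, splitRow). Note that a misaligned row of width 2 yields no pair at all, so a pair loop must check its count before the first iteration.

```cpp
#include <cassert>
#include <cstdint>

// How one row is split when the destination is accessed 32 bits
// (two RGB565 pixels) at a time.
struct RowSplit {
    int lead;   // single pixel blended first when dst is only 16-bit aligned
    int pairs;  // pixel pairs handled with one 32-bit load/store each
    int tail;   // single pixel left over at the end
};

// Covers all four combinations of condition (1), odd/even width, and
// condition (2), 32-bit vs 16-bit aligned destination address.
static RowSplit splitRow(uintptr_t dstAddr, int count) {
    assert(dstAddr % 2 == 0);  // RGB565 pixels are always 16-bit aligned
    RowSplit s{0, 0, 0};
    if (count <= 0)
        return s;
    if (dstAddr & 0x3) {       // only 16-bit aligned: peel one pixel
        s.lead = 1;
        --count;
    }
    s.pairs = count / 2;
    s.tail = count & 1;        // odd remainder after the alignment step
    return s;
}
```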

The optimized code is as follows:

static void S32A_D565_Opaque(uint16_t* SK_RESTRICT dst,
                             const SkPMColor* SK_RESTRICT src, int count,
                             U8CPU alpha, int /*x*/, int /*y*/)
{
    SkASSERT(255 == alpha);
    if (count <= 0)
        return;

    // Condition (2): dst is only 16-bit aligned, so blend one pixel on its
    // own; afterwards dst is 32-bit aligned.
    if ((uintptr_t)dst & 0x3)
    {
        SkPMColor c = *src++;
        if (c)
            *dst = SrcOver32To16(c, *dst);
        dst++;
        count--;
    }

    // Main loop: one 32-bit load/store covers two RGB565 pixels.
    // Little-endian: the low half-word is the pixel at the lower address.
    // Testing count >= 2 up front also covers rows with no full pair
    // (e.g. width 2 with a 16-bit aligned dst).
    uint32_t* dst32 = (uint32_t*)dst;
    while (count >= 2)
    {
        SkPMColor srcL = *src++;
        SkPMColor srcH = *src++;
        if (srcL | srcH)    // skip the load/store if both are fully transparent
        {
            uint32_t dst2 = *dst32;
            uint32_t resultL = dst2 & 0xffff;
            uint32_t resultH = dst2 >> 16;
            // SrcOver32To16(0, d) scales d by 31/32, so zero pixels are left as-is
            if (srcL)
                resultL = SrcOver32To16(srcL, resultL);
            if (srcH)
                resultH = SrcOver32To16(srcH, resultH);
            *dst32 = (resultH << 16) | resultL;
        }
        dst32++;
        count -= 2;
    }

    // One pixel remains when the width left after the alignment step is odd.
    if (count)
    {
        SkPMColor c = *src;
        if (c)
        {
            dst = (uint16_t*)dst32;
            *dst = SrcOver32To16(c, *dst);
        }
    }
}
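To check a paired loop like this one against the plain one-pixel-at-a-time loop, here is a portable sketch. It assumes a little-endian CPU (so the low half-word of a 32-bit load is the pixel at the lower address), uses memcpy for the 32-bit load and store to stay clear of C++ aliasing rules, and takes the blend function as a parameter; Blend, blendRowScalar and blendRowPaired are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// One source/destination pixel blend; stands in for SrcOver32To16.
using Blend = uint16_t (*)(uint32_t src, uint16_t dst);

// Reference: one 16-bit destination access per pixel, like the original loop.
static void blendRowScalar(uint16_t* dst, const uint32_t* src, int count,
                           Blend blend) {
    for (int i = 0; i < count; ++i)
        if (src[i])  // fully transparent source pixel: leave dst untouched
            dst[i] = blend(src[i], dst[i]);
}

// Paired version: peel one pixel if dst is not 32-bit aligned, then load and
// store two RGB565 pixels per 32-bit access, then finish an odd pixel.
// Assumes little-endian byte order.
static void blendRowPaired(uint16_t* dst, const uint32_t* src, int count,
                           Blend blend) {
    if (count <= 0) return;
    if (reinterpret_cast<uintptr_t>(dst) & 0x3) {
        if (*src) *dst = blend(*src, *dst);
        ++dst; ++src; --count;
    }
    while (count >= 2) {
        uint32_t srcL = src[0], srcH = src[1];
        if (srcL | srcH) {
            uint32_t d;
            std::memcpy(&d, dst, sizeof d);  // one 32-bit load
            uint32_t lo = d & 0xFFFF, hi = d >> 16;
            if (srcL) lo = blend(srcL, uint16_t(lo));
            if (srcH) hi = blend(srcH, uint16_t(hi));
            d = (hi << 16) | lo;
            std::memcpy(dst, &d, sizeof d);  // one 32-bit store
        }
        dst += 2; src += 2; count -= 2;
    }
    if (count && *src) *dst = blend(*src, *dst);
}
```

Running both functions over every width from 0 to 10 at both alignments and comparing whole buffers also catches writes past the end of the row.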

Figure 3 shows the performance measured before and after the optimization. The vertical axis is the time to blit one bitmap, in µs.
Figure 3

Optimizing the per-pixel operation

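With the memory-alignment optimization done, what remains is the per-pixel operation. The SrcOver32To16 used in the optimized S32A_D565_Opaque is a modified version of the original SkSrcOver32To16, which uses SkMul16ShiftRound for alpha blending with rounding. This is another place to optimize: for example, a more efficient alpha blend and dropping the rounding both help performance. The optimized code is as follows: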
static inline U16CPU SrcOver32To16(SkPMColor src, uint16_t dst)
{
    // ARGB8888 -> RGB565: keep the top 5/6/5 bits of each source channel
    unsigned sr = GetPackedR32(src);
    unsigned sg = GetPackedG32(src);
    unsigned sb = GetPackedB32(src);

    unsigned isa = (255>>3) - GetPackedA32(src);   // 5-bit inverse alpha (0..31)
    uint16_t src_rgb565 = SkPackRGB16(sr, sg, sb);

    // Spread the 565 fields so that one multiply scales R, G and B:
    // G moves to bits 21-26, R and B stay in bits 11-15 and 0-4.
    uint32_t D = dst;
    D = D | (D << 16);
    D &= 0x07e0f81f;
    D = (D * isa) >> 5;        // roughly dst * (1 - alpha), truncated (no rounding)
    D &= 0x07e0f81f;
    D = D | (D >> 16);         // fold G back into bits 5-10

    // src is premultiplied, so src + dst * (1 - alpha) is the src-over result
    uint16_t result_rgb565 = (uint16_t)((uint16_t)D + src_rgb565);

    return result_rgb565;
}
The macro definitions and the inline function it uses are:
// Extract the top 5 (A, R, B) or 6 (G) bits of an 8-bit channel of a SkPMColor
#define GetPackedA32(packed) ((uint32_t)((packed)<<(24-SK_A32_SHIFT))>>27)
#define GetPackedR32(packed) ((uint32_t)((packed)<<(24-SK_R32_SHIFT))>>27)
#define GetPackedG32(packed) ((uint32_t)((packed)<<(24-SK_G32_SHIFT))>>26)
#define GetPackedB32(packed) ((uint32_t)((packed)<<(24-SK_B32_SHIFT))>>27)


static inline uint16_t SkPackRGB16(unsigned r,
                                   unsigned g,
                                   unsigned b)
{
    SkASSERT(r <= SK_R16_MASK);
    SkASSERT(g <= SK_G16_MASK);
    SkASSERT(b <= SK_B16_MASK);

    return SkToU16((r << SK_R16_SHIFT) |
                   (g << SK_G16_SHIFT) |
                   (b << SK_B16_SHIFT));
}
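With the macros inlined, the arithmetic of SrcOver32To16 can be checked on a desktop compiler. This sketch assumes the common Android SkPMColor layout (SK_R32_SHIFT = 0, SK_G32_SHIFT = 8, SK_B32_SHIFT = 16, SK_A32_SHIFT = 24); topBits and srcOver32To16Fast are hypothetical names. The last assertion shows that a fully transparent source still scales the destination by 31/32, which is why zero source pixels are skipped rather than blended.

```cpp
#include <cassert>
#include <cstdint>

// Top 'bits' bits of the 8-bit channel stored at bit position 'shift'
// (same result as the GetPacked*32 macros under the assumed layout).
static inline unsigned topBits(uint32_t c, unsigned shift, unsigned bits) {
    return ((c >> shift) & 0xFF) >> (8 - bits);
}

// Premultiplied ARGB8888 source over RGB565 destination, truncating version.
static inline uint16_t srcOver32To16Fast(uint32_t src, uint16_t dst) {
    unsigned sr = topBits(src, 0, 5);
    unsigned sg = topBits(src, 8, 6);
    unsigned sb = topBits(src, 16, 5);
    unsigned isa = 31 - topBits(src, 24, 5);         // 5-bit inverse alpha
    uint32_t srcRGB565 = (sr << 11) | (sg << 5) | sb;

    uint32_t d = dst;
    d = (d | (d << 16)) & 0x07E0F81Fu;   // G -> bits 21..26; R, B stay in the low half
    d = ((d * isa) >> 5) & 0x07E0F81Fu;  // scale all three fields with one multiply
    d = (d | (d >> 16)) & 0xFFFFu;       // fold G back between R and B
    return uint16_t(d + srcRGB565);
}
```

The spread mask leaves five zero bits above each field, so multiplying by a 5-bit factor cannot carry one channel into the next; that is what lets a single multiply replace three per-channel ones.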

 END
