Android Control Traversal Based on Image Recognition and Positional Relationships

Background

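An earlier article, 《基于图像识别的Android控件遍历》 (Android Control Traversal Based on Image Recognition), described a way to traverse UI controls by comparing screenshots of each control's region. That approach has a serious flaw: it cannot reliably tell apart controls that look very similar, such as the Wallpaper, Display Brightness, and Mobile Network entries in the figure below.

Improving accuracy requires additional criteria. This article tries using positional relationships to assist recognition.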

Principle

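Every control on screen is presented through a layout (here meaning the layout used to draw the interface). Most Android apps today use either a vertical structure (scrolling mainly up and down, as in news readers) or a horizontal structure (scrolling mainly left and right, as in photo editing apps). Layouts with stacked layers are not considered for now; handling for them will be added later. Controls therefore have positional relationships with one another, such as above/below or left/right. When we traverse the controls in order, we always go from the root node to the child nodes, which in these two structures means going left to right or top to bottom. From this we can go one step further and assume that if a control we obtain lies further left or further up than a control we already know, it has already been traversed.
On a static screen, determining the relative position of two controls is simple: compare their coordinates.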

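The figure above shows every possible positional relationship between two controls other than overlap. Taking the central control Z as the reference, consider how each of the other controls relates to it. Let the top-left corner be (x₀, y₀) and the bottom-right corner be (x₁, y₁). Android places the origin at the top-left of the screen, with x increasing to the right and y increasing downward.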

z and a: x_z0 > x_a1; y_z0 > y_a1
z and b: y_z0 > y_b1
z and c: x_z1 < x_c0; y_z0 > y_c1
z and d: x_z0 > x_d1
z and e: x_z1 < x_e0
z and f: x_z0 > x_f1; y_z1 < y_f0
z and g: y_z1 < y_g0
z and h: x_z1 < x_h0; y_z1 < y_h0
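
The eight relations above can be turned into a small classifier. The following is a minimal sketch, not code from the original project: the `Rect` type, the `relative_position` function, and the direction labels are names introduced here for illustration. It uses the same strict inequalities as the list, so rectangles that merely touch are reported as `None`, just like overlapping ones.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    """Axis-aligned box: (x0, y0) is the top-left corner, (x1, y1) the bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float


def relative_position(z: Rect, other: Rect) -> Optional[str]:
    """Return where `other` lies relative to `z`, or None if they overlap or touch."""
    # Horizontal relation: compare z's left/right edges with other's right/left edges.
    if z.x0 > other.x1:
        horizontal = "left"
    elif z.x1 < other.x0:
        horizontal = "right"
    else:
        horizontal = ""
    # Vertical relation: y grows downward, so a smaller y means "above".
    if z.y0 > other.y1:
        vertical = "above"
    elif z.y1 < other.y0:
        vertical = "below"
    else:
        vertical = ""
    if not horizontal and not vertical:
        return None
    return "-".join(part for part in (vertical, horizontal) if part)


z = Rect(100, 100, 200, 200)
print(relative_position(z, Rect(0, 0, 50, 50)))         # control a
print(relative_position(z, Rect(100, 250, 200, 300)))   # control g
print(relative_position(z, Rect(150, 150, 250, 250)))   # overlapping control
```

With the sample rectangles, control a is classified as `above-left`, control g as `below`, and the overlapping rectangle yields `None`.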
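
Implementation

A pattern emerges from the list above: if any one of the relations holds, the corresponding positional relationship can be identified (the touching case, where one pair of coordinates is equal, is not considered). Seen from the other side, if none of the relations holds, the two controls must overlap.

In code: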
```
def isOverlap(x,y,w,h,x1,y1,w1,h1):
    # Overlap requires each rectangle to start before the other one ends, on both axes.
    return ((y1 + h1) > y) and ((y + h) > y1) and ((x1 + w1) > x) and ((x + w) > x1)
```
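Here x, y are the coordinates of the top-left corner, and w, h are the width and height, used to compute the bottom-right corner. The first rectangle is (x, y, w, h) and the second is (x1, y1, w1, h1).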
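
To see how the check behaves at the boundary, here is a small self-contained sketch. `is_overlap` restates the function above under a different name so the example runs on its own; the rectangle values are made up for illustration. Because the comparisons are strict, two rows that only share an edge are not reported as overlapping.

```python
def is_overlap(x, y, w, h, x1, y1, w1, h1):
    """Same test as isOverlap above, restated so this example is standalone."""
    return (y1 + h1) > y and (y + h) > y1 and (x1 + w1) > x and (x + w) > x1


# Two list rows stacked vertically: they share an edge but no area.
row_a = (0, 100, 1080, 150)   # x, y, w, h
row_b = (0, 250, 1080, 150)
print(is_overlap(*row_a, *row_b))          # False

# The same row seen again after the screen scrolled up by 40 px.
row_a_shifted = (0, 60, 1080, 150)
print(is_overlap(*row_a, *row_a_shifted))  # True
```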

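Why do we need an overlap test? In the earlier article, I cropped each control's region and compared it against the previous crops; if they were similar, they were taken to be the same control. That method has the flaw mentioned at the start. With an overlap test, the previous logic can be improved.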

```
for item in freeze().offspring():
    attrs = self.getUseableAttrs(item)
    if(attrs):
        # Convert the normalized size and position into pixel coordinates.
        width = item.attr('size')[0] * screen_width
        height = item.attr('size')[1] * screen_height
        anchorX = item.attr('anchorPoint')[0] * width
        anchorY = item.attr('anchorPoint')[1] * height
        x0 = (item.attr('pos')[0] * screen_width) - anchorX
        y0 = (item.attr('pos')[1] * screen_height) - anchorY
        # Skip controls that are off-screen or entirely above the last one.
        if(x0 < 0 or y0 < 0 or (y0 + height) < last_py):
            continue
        # Skip controls that overlap the last one: treated as the same control.
        if(last_width > 0 and uic.isOverlap(last_px, last_py, last_width, last_height, x0, y0, width, height)):
            continue
        last_px = x0
        last_py = y0
        last_width = width
        last_height = height
        path = self.__saveCropScreen(screen, x0, y0, width, height, page, index)
```

This way, every cropped region belongs to a control that is neither a duplicate nor already traversed.
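
The effect of the two skip conditions can be tried without a device by running the same logic over plain rectangles. This is a sketch only: `select_new_controls` and the sample rectangles are hypothetical, and `is_overlap` restates the overlap test so the block is standalone.

```python
def is_overlap(x, y, w, h, x1, y1, w1, h1):
    """Strict overlap test, identical in logic to isOverlap."""
    return (y1 + h1) > y and (y + h) > y1 and (x1 + w1) > x and (x + w) > x1


def select_new_controls(rects, last=(0, 0, 0, 0)):
    """Yield the (x, y, w, h) rectangles the traversal loop would crop, in order."""
    last_px, last_py, last_w, last_h = last
    for x0, y0, w, h in rects:
        # Off-screen, or entirely above the last accepted control: skip.
        if x0 < 0 or y0 < 0 or (y0 + h) < last_py:
            continue
        # Overlapping the last accepted control: treated as the same control.
        if last_w > 0 and is_overlap(last_px, last_py, last_w, last_h, x0, y0, w, h):
            continue
        last_px, last_py, last_w, last_h = x0, y0, w, h
        yield (x0, y0, w, h)


rects = [
    (0, -20, 1080, 100),   # partly scrolled off the top
    (0, 80, 1080, 150),    # first full row
    (20, 100, 60, 60),     # icon nested inside that row
    (0, 230, 1080, 150),   # next row
]
print(list(select_new_controls(rects)))
```

Only the two full rows survive: the first rectangle starts above the screen, and the nested icon overlaps the row that was just accepted.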

```
width = item.attr('size')[0] * screen_width
height = item.attr('size')[1] * screen_height
anchorX = item.attr('anchorPoint')[0] * width
anchorY = item.attr('anchorPoint')[1] * height
x0 = (item.attr('pos')[0] * screen_width) - anchorX
y0 = (item.attr('pos')[1] * screen_height) - anchorY
```

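This code converts the control's relative coordinates into absolute ones. The attributes obtained through Airtest are all relative coordinates, normalized to the screen size, while cropping needs absolute screen coordinates, so a conversion is required. Subtracting the anchor offset moves from the control's anchor point to its top-left corner.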
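
The conversion can be isolated into a pure function and checked with concrete numbers. This is a sketch under the same assumptions as the snippet above (normalized `pos`, `size`, and `anchorPoint` values); the function name `to_absolute_rect` and the sample values are introduced here for illustration.

```python
def to_absolute_rect(pos, size, anchor, screen_width, screen_height):
    """Convert normalized pos/size/anchorPoint into pixel (x0, y0, width, height)."""
    width = size[0] * screen_width
    height = size[1] * screen_height
    # pos locates the anchor point, so step back by the anchor offset
    # to reach the top-left corner.
    x0 = pos[0] * screen_width - anchor[0] * width
    y0 = pos[1] * screen_height - anchor[1] * height
    return x0, y0, width, height


# A control centred on a 1080x1920 screen, half as wide and a quarter as tall,
# anchored at its own centre (0.5, 0.5).
print(to_absolute_rect((0.5, 0.5), (0.5, 0.25), (0.5, 0.5), 1080, 1920))
# (270.0, 720.0, 540.0, 480.0)
```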

```
if(x0 < 0 or y0 < 0 or (y0 + height) < last_py):
    continue
if(last_width > 0 and uic.isOverlap(last_px, last_py, last_width, last_height, x0, y0, width, height)):
    continue
```

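With everything in absolute coordinates, positions can be compared. A control that lies off-screen or above the last control is treated as already traversed, and one that overlaps the last control is treated as the same control. Whatever remains can then be cropped: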

```
def __saveCropScreen(self, screen, x, y, width, height, page, index):
    # Crop the control's rectangle out of the full screenshot.
    cropscreen = aircv.crop_image(screen, [x,y,x+width,y+height])
    # A page name such as "a-b" maps to the nested directory a/b.
    path = self.datapath
    for part in page.split('-'):
        path = os.path.join(path, part)
    if(not os.path.exists(path)):
        os.makedirs(path)
    path = os.path.join(path, '{}.jpg'.format(index))
    aircv.imwrite(path, cropscreen, ST.SNAPSHOT_QUALITY, ST.IMAGE_MAXSIZE)
    return path
```

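Once every control on the current screen has been traversed, the screen has to be swiped vertically or horizontally to check whether new controls exist. Swipe too far and untraversed controls may be skipped; swipe too little and there may not be enough room to reveal the next control. So each swipe starts at the last control and covers only that control's height. This distance is really an empirical value and may need adjusting for different apps.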

```
self.poco.swipe([last_px/screen_width, last_py/screen_height], direction=[0, -last_height/screen_height])
```
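
Since the swipe distance is an empirical value, it can help to compute the arguments separately and expose a tuning parameter. The sketch below is an illustration, not part of the original code: `swipe_args` and its `factor` parameter are hypothetical, and a `factor` of 1.0 reproduces the rule described above.

```python
def swipe_args(last_px, last_py, last_height, screen_width, screen_height, factor=1.0):
    """Return (start, direction) in normalized coordinates for an upward swipe.

    `factor` is a hypothetical tuning knob: 1.0 swipes by exactly one control
    height, smaller values swipe less, larger values swipe more.
    """
    start = [last_px / screen_width, last_py / screen_height]
    direction = [0, -factor * last_height / screen_height]
    return start, direction


start, direction = swipe_args(0, 960, 192, 1080, 1920)
print(start, direction)
# The result would then be passed on as:
# poco.swipe(start, direction=direction)
```

Keeping the arithmetic in one place makes it easier to adjust the distance per app without touching the traversal loop.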

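After swiping, the screenshot of the last control found is used to locate that control on the new screen. The match updates last_px and last_py, along with the other attributes tied to the last control. New controls are then compared against the last control's coordinates, just as before.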

```
screen = G.DEVICE.snapshot(os.path.join(self.datapath, page,"snap.png"))
if(last_path is not None):
    pos = uic.air_match_in(last_path, screen)
    if(pos is not None):
        # pos is the centre of the match; shift to the top-left corner.
        last_px = pos[0] - last_width * 0.5
        last_py = pos[1] - last_height * 0.5
    else:
        # Last control not found: reset so the overlap check is skipped.
        last_px = 0
        last_py = 0
        last_width = 0
        last_height = 0
    if(last_px < 0):
        last_px = 0
```
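
The bookkeeping after a match can be expressed as a pure function, which makes the two outcomes easy to check. This is a sketch: `relocate_last` is a hypothetical helper, and it assumes, as the code above does, that the match result is the centre point of the matched region or `None` when nothing matched.

```python
def relocate_last(pos, last_width, last_height):
    """Map a match centre `pos` back to the top-left corner of the last control.

    Returns (last_px, last_py, last_width, last_height). All four are zero when
    the template was not found, which disables the overlap check on the next pass.
    """
    if pos is None:
        return 0, 0, 0, 0
    last_px = max(0, pos[0] - last_width * 0.5)   # clamp to the left screen edge
    last_py = pos[1] - last_height * 0.5
    return last_px, last_py, last_width, last_height


print(relocate_last((540, 300), 1080, 150))   # match found after scrolling
print(relocate_last(None, 1080, 150))         # match lost
```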

air_match_in is a region image search function wrapped around Airtest. Its code is as follows:

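```
from airtest.core.cv import Template

def air_match_in(srcpath, screen):
    # Returns the position of the match in screen, or None if not found.
    template = Template(srcpath)
    return template.match_in(screen)
```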

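When should the traversal stop? Once the program can no longer find any new control. To guard against special cases, it exits only after two consecutive swipes have turned up nothing new. The complete code is as follows: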

```
from .confidence import UIConfidence as uic
from airtest.core.api import *
from airtest.aircv import *
import os


class UIManager:
    def __init__(self, poco, datapath):
        self.poco = poco
        self.datapath = datapath
        # Stop after this many consecutive scans without a new control.
        self.MAX_NOT_FOUND_TIMES = 2
        if(not os.path.exists(datapath)):
            os.makedirs(datapath)

    def getUseableAttrs(self, item):
        # Collect the interaction attributes that make a control worth visiting.
        attrsArray = []
        if(item.attr('touchable')):
            attrsArray.append('touchable')
        if(item.attr('editalbe')):
            attrsArray.append('editalbe')
        return attrsArray

    def __getScreenSize(self):
        # Always return (short side, long side), i.e. portrait orientation.
        width = G.DEVICE.display_info['width']
        height = G.DEVICE.display_info['height']
        if(height > width):
            return width,height
        return height,width

    def __saveCropScreen(self, screen, x, y, width, height, page, index):
        cropscreen = aircv.crop_image(screen, [x,y,x+width,y+height])
        # A page name such as "a-b" maps to the nested directory a/b.
        path = self.datapath
        for part in page.split('-'):
            path = os.path.join(path, part)
        if(not os.path.exists(path)):
            os.makedirs(path)
        path = os.path.join(path, '{}.jpg'.format(index))
        aircv.imwrite(path, cropscreen, ST.SNAPSHOT_QUALITY, ST.IMAGE_MAXSIZE)
        return path

    def parseLayout(self, page):
        index = 1
        last_px = 0
        last_py = 0
        last_width = 0
        last_height = 0
        last_path = None
        screen_width, screen_height = self.__getScreenSize()
        not_found_times = 0
        while(not_found_times < self.MAX_NOT_FOUND_TIMES):
            sleep(5)
            freeze = self.poco.freeze()
            if(not os.path.exists(os.path.join(self.datapath, page))):
                os.makedirs(os.path.join(self.datapath, page))
            screen = G.DEVICE.snapshot(os.path.join(self.datapath, page,"snap.png"))
            # Relocate the last control on the new screen after the swipe.
            if(last_path is not None):
                pos = uic.air_match_in(last_path, screen)
                if(pos is not None):
                    print('match in', pos, last_px, last_py, last_width, last_height)
                    last_px = pos[0] - last_width * 0.5
                    last_py = pos[1] - last_height * 0.5
                else:
                    last_px = 0
                    last_py = 0
                    last_width = 0
                    last_height = 0
                if(last_px < 0):
                    last_px = 0
            for item in freeze().offspring():
                attrs = self.getUseableAttrs(item)
                if(attrs):
                    width = item.attr('size')[0] * screen_width
                    height = item.attr('size')[1] * screen_height
                    if(width == 0 or height == 0):
                        continue
                    anchorX = item.attr('anchorPoint')[0] * width
                    anchorY = item.attr('anchorPoint')[1] * height
                    x0 = (item.attr('pos')[0] * screen_width) - anchorX
                    y0 = (item.attr('pos')[1] * screen_height) - anchorY
                    # Off-screen or above the last control: already traversed.
                    if(x0 < 0 or y0 < 0 or (y0 + height) < last_py):
                        continue
                    # Overlapping the last control: the same control.
                    if(last_width > 0 and uic.isOverlap(last_px, last_py, last_width, last_height, x0, y0, width, height)):
                        continue
                    last_px = x0
                    last_py = y0
                    last_width = width
                    last_height = height
                    path = self.__saveCropScreen(screen, x0, y0, width, height, page, index)
                    last_path = path
                    # -1 marks that this pass found at least one new control.
                    not_found_times = -1
                    # TODO: recursive handling

                    index += 1
            if(not_found_times == -1):
                not_found_times = 0
            else:
                not_found_times += 1
            # Swipe up by the height of the last control, starting at that control.
            self.poco.swipe([last_px/screen_width, last_py/screen_height], direction=[0, -last_height/screen_height])
```
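
The stopping rule in parseLayout can be studied in isolation by replacing the device calls with stand-ins. This is a sketch only: `traverse`, `scan_screen`, and `swipe` are hypothetical names, and the scan results are simulated numbers rather than real screens.

```python
def traverse(scan_screen, swipe, max_not_found_times=2):
    """Scan and swipe until `max_not_found_times` consecutive scans find nothing.

    `scan_screen` returns how many new controls the current screen yielded and
    `swipe` scrolls on; both stand in for the Poco/Airtest calls in parseLayout.
    """
    total = 0
    not_found_times = 0
    while not_found_times < max_not_found_times:
        found = scan_screen()
        if found > 0:
            total += found
            not_found_times = 0   # any new control resets the counter
        else:
            not_found_times += 1
        swipe()
    return total


# Simulated run: two scans with controls separated by one empty scan,
# then only empty scans.
results = iter([4, 0, 2, 0, 0, 0])
swipes = []
print(traverse(lambda: next(results), lambda: swipes.append(1)))  # 6
print(len(swipes))                                                # 5
```

The single empty scan in the middle does not end the run; only the two consecutive empty scans at the end do, so the last simulated value is never consumed.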

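External callers only need to call parseLayout. The code still contains a TODO: the recursive traversal part is missing, and the code will be updated later. The code has been archived in a Gitee repository.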
