Disassembling libc.so to analyze a crash inside memcpy

Problem

A customer reported an SDK crash log that showed up during monkey testing. The trace is as follows:

04-16 23:58:41.861 18741 18741 F DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x7740a1d000
04-16 23:58:41.861 18741 18741 F DEBUG : x0 0000007740a1d000 x1 0000007791aa9d40 x2 0000000000f30000 x3 0000007740a1d000
04-16 23:58:41.861 18741 18741 F DEBUG : x4 00000077929d9d40 x5 000000774194d000 x6 0707070808080808 x7 0808080808080707
04-16 23:58:41.861 18741 18741 F DEBUG : x8 0000000000f30000 x9 0000000000000000 x10 0000000042ffc000 x11 0000000000f30000
04-16 23:58:41.861 18741 18741 F DEBUG : x12 0909090909090909 x13 0807080708070809 x14 0000000000000d80 x15 0000000000001200
04-16 23:58:41.861 18741 18741 F DEBUG : x16 000000785d698418 x17 00000078f2ad4380 x18 0000000000000076 x19 0000007853f80878
04-16 23:58:41.861 18741 18741 F DEBUG : x20 0000007853f804d0 x21 0000007740a1d000 x22 0000007791aa9d40 x23 0000000000001200
04-16 23:58:41.861 18741 18741 F DEBUG : x24 0000007853f8d020 x25 00000078156b0f80 x26 0000007853f80878 x27 000000780e20d1f0
04-16 23:58:41.861 18741 18741 F DEBUG : x28 0000000026deaf4d x29 0000007853f807d0
04-16 23:58:41.861 18741 18741 F DEBUG : sp 0000007853f801f0 lr 000000785ce7b90c pc 00000078f2ad4308
04-16 23:58:41.884 18741 18741 F DEBUG :
04-16 23:58:41.884 18741 18741 F DEBUG : backtrace:
04-16 23:58:41.884 18741 18741 F DEBUG : #00 pc 000000000007f308 /apex/com.android.runtime/lib64/bionic/libc.so (__memcpy+248) (BuildId: bf43d263cea44574bf5be4629e4f0a8f)
04-16 23:58:41.884 18741 18741 F DEBUG : #01 pc 000000000007b908 /vendor/lib64/libanc_nightshot.so (BuildId: eefc88116d665da17ed0d5d62b2403d72474a995)
04-16 23:58:41.884 18741 18741 F DEBUG : #02 pc 0000000000004e80 /vendor/lib64/camera/components/com.anc.node.nightshot.so (NightShotNodeProcRequest(ChiNodeProcessRequestInfo*)+1992) (BuildId: a999c7381a9344a26858c43375292779)
04-16 23:58:41.884 18741 18741 F DEBUG : #03 pc 00000000006d4020 /vendor/lib64/hw/camera.qcom.so (CamX::ChiNodeWrapper::ExecuteProcessRequest(CamX::ExecuteProcessRequestData*)+2640) (BuildId: cfa7472586f48d4f52f2b5d0d8629b60)
04-16 23:58:41.884 18741 18741 F DEBUG : #04 pc 000000000073d730 /vendor/lib64/hw/camera.qcom.so (CamX::Node::ProcessRequest(CamX::NodeProcessRequestData*, unsigned long)+9832) (BuildId: cfa7472586f48d4f52f2b5d0d8629b60)
04-16 23:58:41.884 18741 18741 F DEBUG : #05 pc 00000000006def80 /vendor/lib64/hw/camera.qcom.so (CamX::DeferredRequestQueue::DeferredWorkerWrapper(void*)+368) (BuildId: cfa7472586f48d4f52f2b5d0d8629b60)
04-16 23:58:41.884 18741 18741 F DEBUG : #06 pc 0000000000636a18 /vendor/lib64/hw/camera.qcom.so (CamX::ThreadCore::WorkerThreadBody(void*)+1992) (BuildId: cfa7472586f48d4f52f2b5d0d8629b60)
04-16 23:58:41.884 18741 18741 F DEBUG : #07 pc 00000000000e6b60 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+36) (BuildId: bf43d263cea44574bf5be4629e4f0a8f)
04-16 23:58:41.884 18741 18741 F DEBUG : #08 pc 0000000000084b6c /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+64) (BuildId: bf43d263cea44574bf5be4629e4f0a8f)

Trimmed down, the parts we need to focus on are:

04-16 23:58:41.861 18741 18741 F DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x7740a1d000
04-16 23:58:41.861 18741 18741 F DEBUG : x0 0000007740a1d000 x1 0000007791aa9d40 x2 0000000000f30000 x3 0000007740a1d000
04-16 23:58:41.861 18741 18741 F DEBUG : x4 00000077929d9d40 x5 000000774194d000 x6 0707070808080808 x7 0808080808080707
04-16 23:58:41.861 18741 18741 F DEBUG : x8 0000000000f30000 x9 0000000000000000 x10 0000000042ffc000 x11 0000000000f30000
04-16 23:58:41.861 18741 18741 F DEBUG : x12 0909090909090909 x13 0807080708070809 x14 0000000000000d80 x15 0000000000001200
04-16 23:58:41.861 18741 18741 F DEBUG : x16 000000785d698418 x17 00000078f2ad4380 x18 0000000000000076 x19 0000007853f80878
04-16 23:58:41.861 18741 18741 F DEBUG : x20 0000007853f804d0 x21 0000007740a1d000 x22 0000007791aa9d40 x23 0000000000001200
04-16 23:58:41.861 18741 18741 F DEBUG : x24 0000007853f8d020 x25 00000078156b0f80 x26 0000007853f80878 x27 000000780e20d1f0
04-16 23:58:41.861 18741 18741 F DEBUG : x28 0000000026deaf4d x29 0000007853f807d0
04-16 23:58:41.861 18741 18741 F DEBUG : sp 0000007853f801f0 lr 000000785ce7b90c pc 00000078f2ad4308
04-16 23:58:41.884 18741 18741 F DEBUG :
04-16 23:58:41.884 18741 18741 F DEBUG : backtrace:
04-16 23:58:41.884 18741 18741 F DEBUG : #00 pc 000000000007f308 /apex/com.android.runtime/lib64/bionic/libc.so (__memcpy+248) (BuildId: bf43d263cea44574bf5be4629e4f0a8f)
04-16 23:58:41.884 18741 18741 F DEBUG : #01 pc 000000000007b908 /vendor/lib64/libanc_nightshot.so (BuildId: eefc88116d665da17ed0d5d62b2403d72474a995)
04-16 23:58:41.884 18741 18741 F DEBUG : #02 pc 0000000000004e80 /vendor/lib64/camera/components/com.anc.node.nightshot.so (NightShotNodeProcRequest(ChiNodeProcessRequestInfo*)+1992) (BuildId: a999c7381a9344a26858c43375292779)

Only one line involves the SDK: #01 pc 000000000007b908 /vendor/lib64/libanc_nightshot.so. Running addr2line on offset 7b908 shows that this line of code uses memcpy to copy YUV data into the output image buffer supplied by the customer. That matches the frame directly above it in the trace, #00 pc 000000000007f308 /apex/com.android.runtime/lib64/bionic/libc.so (__memcpy+248).
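Picking the SDK frame out of a backtrace by hand is fine for one log, but it can also be scripted. The following is a minimal sketch, assuming the frame lines have had their logcat timestamp prefix removed; `parse_frames` and `FRAME_RE` are hypothetical helper names, not part of any Android tool.

```python
import re

# Two frames copied from the trace above (timestamp prefix dropped).
FRAMES = """\
#00 pc 000000000007f308 /apex/com.android.runtime/lib64/bionic/libc.so (__memcpy+248) (BuildId: bf43d263cea44574bf5be4629e4f0a8f)
#01 pc 000000000007b908 /vendor/lib64/libanc_nightshot.so (BuildId: eefc88116d665da17ed0d5d62b2403d72474a995)
"""

# Captures: frame number, pc offset inside the library, library path.
FRAME_RE = re.compile(r"#(\d+) pc ([0-9a-f]+)\s+(\S+)")

def parse_frames(text):
    """Return (frame_no, pc_offset, library_path) for each backtrace line."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append((int(m.group(1)), int(m.group(2), 16), m.group(3)))
    return frames

frames = parse_frames(FRAMES)
sdk_frames = [f for f in frames if f[2].endswith("libanc_nightshot.so")]
for no, pc, lib in sdk_frames:
    # This offset is what gets fed to addr2line with the unstripped library.
    print(f"frame #{no:02d}: {lib} offset {pc:#x}")
```

The offset it prints for the SDK library is the value passed to addr2line together with the unstripped build of that library.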

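In crashes we had seen before, the trace usually contained several SDK frames, and the cause was generally inside the SDK.

This time there is only one SDK frame, and the crash itself happens inside the C library function memcpy.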

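Analysis

Since there was nothing more to learn from our own library, I wondered whether libc.so could be the starting point: what exactly happens at libc.so offset 7f308, the instruction that crashed with fault addr 0x7740a1d000?

1. Using the path shown in the trace, pull libc.so from the customer's phone and disassemble it with objdump: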

aarch64-linux-android-objdump -dS libc.so > libc.dump

2. Use the offset 7f308 from the trace to locate the crashing instruction in libc.dump. The instruction at 7f308 is:

7f308: a900340c stp x12, x13, [x0]
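The lookup itself is just a text search on the address column of the dump. A minimal sketch, assuming the dump uses the `address: encoding mnemonic operands` layout shown in this article; `find_instruction` is a hypothetical helper, and the sample text is copied from the __memcpy listing rather than read from a real libc.dump:

```python
# A few lines in objdump's output format, copied from the listing in this article.
DUMP = """\
7f300: 8b090042 add x2, x2, x9
7f304: a9411c26 ldp x6, x7, [x1,#16]
7f308: a900340c stp x12, x13, [x0]
7f30c: a9422428 ldp x8, x9, [x1,#32]
"""

def find_instruction(dump_text, offset):
    """Return the disassembly line whose address column equals `offset`, or None."""
    wanted = f"{offset:x}:"
    for line in dump_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(wanted):
            return stripped
    return None

hit = find_instruction(DUMP, 0x7f308)
print(hit)
```

For a real dump, the same function can be given the contents of libc.dump read from disk.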

How can we double-check that this instruction really is the one that crashed? There is a simple sanity check. This is the memcpy frame from the trace:

#00 pc 000000000007f308 /apex/com.android.runtime/lib64/bionic/libc.so (__memcpy+248)

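From the offset at the start of the line and the displacement at the end, the start address of __memcpy works out to 7f210 (hexadecimal 0x7f308 minus decimal 248, which is 0xf8, equals hexadecimal 0x7f210).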
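The subtraction mixes a hexadecimal offset with a decimal displacement, which is easy to get wrong by hand. A short check, using only the two numbers from the backtrace frame (the variable names are illustrative):

```python
pc_offset = 0x7f308    # "#00 pc 000000000007f308" from the backtrace
symbol_offset = 248    # the "+248" in (__memcpy+248), printed in decimal

func_start = pc_offset - symbol_offset
print(hex(symbol_offset))  # 0xf8
print(hex(func_start))     # 0x7f210
```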

Below is the complete disassembly of __memcpy (too long; feel free to skip). As expected, 7f210 is where __memcpy starts:

000000000007f210 <__memcpy>:
7f210: f9800020 prfm pldl1keep, [x1]
7f214: 8b020024 add x4, x1, x2
7f218: 8b020005 add x5, x0, x2
7f21c: f100405f cmp x2, #0x10
7f220: 54000209 b.ls 7f260 <__memcpy+0x50>
7f224: f101805f cmp x2, #0x60
7f228: 54000648 b.hi 7f2f0 <__memcpy+0xe0>
7f22c: d1000449 sub x9, x2, #0x1
7f230: a9401c26 ldp x6, x7, [x1]
7f234: 37300469 tbnz w9, #6, 7f2c0 <__memcpy+0xb0>
7f238: a97f348c ldp x12, x13, [x4,#-16]
7f23c: 362800a9 tbz w9, #5, 7f250 <__memcpy+0x40>
7f240: a9412428 ldp x8, x9, [x1,#16]
7f244: a97e2c8a ldp x10, x11, [x4,#-32]
7f248: a9012408 stp x8, x9, [x0,#16]
7f24c: a93e2caa stp x10, x11, [x5,#-32]
7f250: a9001c06 stp x6, x7, [x0]
7f254: a93f34ac stp x12, x13, [x5,#-16]
7f258: d65f03c0 ret
7f25c: d503201f nop
7f260: f100205f cmp x2, #0x8
7f264: 540000e3 b.cc 7f280 <__memcpy+0x70>
7f268: f9400026 ldr x6, [x1]
7f26c: f85f8087 ldur x7, [x4,#-8]
7f270: f9000006 str x6, [x0]
7f274: f81f80a7 stur x7, [x5,#-8]
7f278: d65f03c0 ret
7f27c: d503201f nop
7f280: 361000c2 tbz w2, #2, 7f298 <__memcpy+0x88>
7f284: b9400026 ldr w6, [x1]
7f288: b85fc087 ldur w7, [x4,#-4]
7f28c: b9000006 str w6, [x0]
7f290: b81fc0a7 stur w7, [x5,#-4]
7f294: d65f03c0 ret
7f298: b4000102 cbz x2, 7f2b8 <__memcpy+0xa8>
7f29c: d341fc49 lsr x9, x2, #1
7f2a0: 39400026 ldrb w6, [x1]
7f2a4: 385ff087 ldurb w7, [x4,#-1]
7f2a8: 38696828 ldrb w8, [x1,x9]
7f2ac: 39000006 strb w6, [x0]
7f2b0: 38296808 strb w8, [x0,x9]
7f2b4: 381ff0a7 sturb w7, [x5,#-1]
7f2b8: d65f03c0 ret
7f2bc: d503201f nop
7f2c0: a9412428 ldp x8, x9, [x1,#16]
7f2c4: a9422c2a ldp x10, x11, [x1,#32]
7f2c8: a943342c ldp x12, x13, [x1,#48]
7f2cc: a97e0881 ldp x1, x2, [x4,#-32]
7f2d0: a97f0c84 ldp x4, x3, [x4,#-16]
7f2d4: a9001c06 stp x6, x7, [x0]
7f2d8: a9012408 stp x8, x9, [x0,#16]
7f2dc: a9022c0a stp x10, x11, [x0,#32]
7f2e0: a903340c stp x12, x13, [x0,#48]
7f2e4: a93e08a1 stp x1, x2, [x5,#-32]
7f2e8: a93f0ca4 stp x4, x3, [x5,#-16]
7f2ec: d65f03c0 ret
7f2f0: 92400c09 and x9, x0, #0xf
7f2f4: 927cec03 and x3, x0, #0xfffffffffffffff0
7f2f8: a940342c ldp x12, x13, [x1]
7f2fc: cb090021 sub x1, x1, x9
7f300: 8b090042 add x2, x2, x9
7f304: a9411c26 ldp x6, x7, [x1,#16]
7f308: a900340c stp x12, x13, [x0]
7f30c: a9422428 ldp x8, x9, [x1,#32]
7f310: a9432c2a ldp x10, x11, [x1,#48]
7f314: a9c4342c ldp x12, x13, [x1,#64]!
7f318: f1024042 subs x2, x2, #0x90
7f31c: 54000169 b.ls 7f348 <__memcpy+0x138>
7f320: a9011c66 stp x6, x7, [x3,#16]
7f324: a9411c26 ldp x6, x7, [x1,#16]
7f328: a9022468 stp x8, x9, [x3,#32]
7f32c: a9422428 ldp x8, x9, [x1,#32]
7f330: a9032c6a stp x10, x11, [x3,#48]
7f334: a9432c2a ldp x10, x11, [x1,#48]
7f338: a984346c stp x12, x13, [x3,#64]!
7f33c: a9c4342c ldp x12, x13, [x1,#64]!
7f340: f1010042 subs x2, x2, #0x40
7f344: 54fffee8 b.hi 7f320 <__memcpy+0x110>
7f348: a97c0881 ldp x1, x2, [x4,#-64]
7f34c: a9011c66 stp x6, x7, [x3,#16]
7f350: a97d1c86 ldp x6, x7, [x4,#-48]
7f354: a9022468 stp x8, x9, [x3,#32]
7f358: a97e2488 ldp x8, x9, [x4,#-32]
7f35c: a9032c6a stp x10, x11, [x3,#48]
7f360: a97f2c8a ldp x10, x11, [x4,#-16]
7f364: a904346c stp x12, x13, [x3,#64]
7f368: a93c08a1 stp x1, x2, [x5,#-64]
7f36c: a93d1ca6 stp x6, x7, [x5,#-48]
7f370: a93e24a8 stp x8, x9, [x5,#-32]
7f374: a93f2caa stp x10, x11, [x5,#-16]
7f378: d65f03c0 ret
7f37c: 00000000 .inst 0x00000000 ; undefined

3. Work out what the crashing instruction 7f308: a900340c stp x12, x13, [x0] means

04-16 23:58:41.861 18741 18741 F DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x7740a1d000
04-16 23:58:41.861 18741 18741 F DEBUG : x0 0000007740a1d000 x1 0000007791aa9d40 x2 0000000000f30000 x3 0000007740a1d000

In these two trace lines, x0 0000007740a1d000 and fault addr 0x7740a1d000 are the same value, so the access that crashed is the [x0] operand of the instruction above.
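The same comparison can be done mechanically, which helps when a dump has many registers to scan. A minimal sketch, assuming the two log lines have had their logcat prefix stripped; `parse_fault_addr` and `parse_registers` are hypothetical helpers:

```python
import re

SIGNAL_LINE = "signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x7740a1d000"
REG_LINE = "x0 0000007740a1d000 x1 0000007791aa9d40 x2 0000000000f30000 x3 0000007740a1d000"

def parse_fault_addr(line):
    """Extract the faulting address from the signal line."""
    m = re.search(r"fault addr 0x([0-9a-f]+)", line)
    return int(m.group(1), 16)

def parse_registers(line):
    """Map register names to integer values for one register line of the dump."""
    pairs = re.findall(r"\b(x\d+|sp|lr|pc)\s+([0-9a-f]{16})\b", line)
    return {name: int(value, 16) for name, value in pairs}

fault = parse_fault_addr(SIGNAL_LINE)
regs = parse_registers(REG_LINE)
matching = [name for name, value in regs.items() if value == fault]
print(matching)
```

On this trace both x0 and x3 hold the faulting address. The crashing instruction uses [x0] as its store address, so x0 is the one that matters here.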

The next step is to understand what stp x12, x13, [x0] does.

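stp: store pair, a variant of `str` that stores two registers at once. It is not a stack-only instruction, although it often appears in function prologues. For example:
stp x29, x30, [sp, #0x10]
stores the values of x29 and x30 at the address sp + 16 bytes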

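I looked up the stp instruction online. From the description above, stp x12, x13, [x0] writes the values of x12 and x13 to the 16 bytes starting at the address held in x0, but what the code is trying to achieve at this point is still unclear.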
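To make the effect of the store concrete, here is a toy model of `stp` with 64-bit registers, assuming little-endian data as on Android arm64. The `stp` function and the `memory` bytearray are illustrative stand-ins, not a real emulator; the register values are x12 and x13 from the crash dump.

```python
import struct

def stp(memory, base, first, second):
    """Model `stp Xt1, Xt2, [Xn]`: write two 64-bit values at base and base+8."""
    struct.pack_into("<QQ", memory, base, first, second)

memory = bytearray(32)        # toy memory, not a real address space
x12 = 0x0909090909090909      # register values taken from the crash dump
x13 = 0x0807080708070809
stp(memory, 0, x12, x13)      # offset 0 plays the role of [x0]

print(memory[:16].hex())
```

In the real process the 16 bytes at [x0] were not mapped, so the store raised SIGSEGV instead of completing as it does in this toy buffer.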

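4. Analyze it together with the memcpy assembly source

The assembly source for memcpy can be found in the Android source tree.

Below are the beginning of the disassembled code and the beginning of the assembly source from the Android tree: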

7f210: f9800020 prfm pldl1keep, [x1]
7f214: 8b020024 add x4, x1, x2
7f218: 8b020005 add x5, x0, x2
7f21c: f100405f cmp x2, #0x10
7f220: 54000209 b.ls 7f260 <__memcpy+0x50>
7f224: f101805f cmp x2, #0x60
7f228: 54000648 b.hi 7f2f0 <__memcpy+0xe0>

Android assembly source:

prfm PLDL1KEEP, [src]
add srcend, src, count
add dstend, dstin, count
cmp count, 16
b.ls L(copy16)
cmp count, 96
b.hi L(copy_long)

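Comparing the objdump output with the beginning of the Android assembly source, the two match line for line. Both start by checking count, the third argument of memcpy. The copy length at the SDK's memcpy call is far greater than 96, so execution branches to copy_long, which is 7f2f0 <__memcpy+0xe0>. Shortly after that it reaches the crashing instruction 7f308: a900340c stp x12, x13, [x0], which corresponds to the line stp D_l, D_h, [dstin] in the assembly source: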

7f2f0: 92400c09 and x9, x0, #0xf
7f2f4: 927cec03 and x3, x0, #0xfffffffffffffff0
7f2f8: a940342c ldp x12, x13, [x1]
7f2fc: cb090021 sub x1, x1, x9
7f300: 8b090042 add x2, x2, x9
7f304: a9411c26 ldp x6, x7, [x1,#16]
7f308: a900340c stp x12, x13, [x0]
7f30c: a9422428 ldp x8, x9, [x1,#32]
7f310: a9432c2a ldp x10, x11, [x1,#48]
7f314: a9c4342c ldp x12, x13, [x1,#64]!
7f318: f1024042 subs x2, x2, #0x90
7f31c: 54000169 b.ls 7f348 <__memcpy+0x138>

Android assembly source:

L(copy_long):
and tmp1, dstin, 15
bic dst, dstin, 15
ldp D_l, D_h, [src]
sub src, src, tmp1
add count, count, tmp1 /* Count is now 16 too large. */
ldp A_l, A_h, [src, 16]
stp D_l, D_h, [dstin]
ldp B_l, B_h, [src, 32]
ldp C_l, C_h, [src, 48]
ldp D_l, D_h, [src, 64]!
subs count, count, 128 + 16 /* Test and readjust count. */
b.ls 2f
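The size checks at the top of __memcpy can be modeled in a few lines. This sketch mirrors only the two comparisons and branches quoted above; `memcpy_path` is a hypothetical name, and the returned strings reuse the labels and comments from the Android source.

```python
def memcpy_path(count):
    """Which branch __memcpy takes for a given byte count."""
    if count <= 16:       # cmp count, 16 ; b.ls L(copy16)
        return "copy16"
    if count > 96:        # cmp count, 96 ; b.hi L(copy_long)
        return "copy_long"
    return "medium"       # fall-through: 17..96 bytes

# x2 in the crash dump. x0 is 16-byte aligned here, so the earlier
# `add count, count, tmp1` added 0 and x2 still equals the original count.
count = 0x0F30000
print(count, memcpy_path(count))
```

With the count from the trace (about 15.9 MB) the function reports `copy_long`, which is the path containing the crashing `stp`.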

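So the crash is caused by the line stp D_l, D_h, [dstin]. The macro definitions at the top of the assembly source include #define dstin x0, so dstin is x0, and the trace shows that x0 holds exactly the fault addr 0x7740a1d000. This stp is also the first store on the copy_long path, so the copy failed on its very first write to the destination.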

The following three clues show that dstin, the address that crashed, is the destination address memcpy writes to.

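  1. The logic of the Android memcpy assembly source and the name dstin itself indicate that dstin is the destination that src is copied into.
  2. dstin is x0, which under the AArch64 calling convention carries the first argument, and the first argument of memcpy is the destination: void *memcpy(void *dest, const void *src, size_t n)
  3. An stp instruction writes values to the target address, while the ldp instructions a few lines earlier read values from src.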
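The register dump can be cross-checked against the macro names in the assembly source. The sketch below takes x0..x5 from the trace and tests the relations that copy_long should have established by the time it reaches the crashing store; the `checks` dictionary is purely illustrative.

```python
# Register values from the crash dump, labeled with the macro names
# from the Android assembly source.
regs = {
    "x0": 0x0000007740a1d000,  # dstin
    "x1": 0x0000007791aa9d40,  # src (already reduced by tmp1)
    "x2": 0x0000000000f30000,  # count (already increased by tmp1)
    "x3": 0x0000007740a1d000,  # dst
    "x4": 0x00000077929d9d40,  # srcend
    "x5": 0x000000774194d000,  # dstend
}
fault_addr = 0x7740a1d000

tmp1 = regs["x0"] & 15  # and tmp1, dstin, 15

checks = {
    "dstin is the faulting address": regs["x0"] == fault_addr,
    "dst == dstin with low 4 bits cleared": regs["x3"] == regs["x0"] & ~15,
    "srcend == src + count": regs["x4"] == regs["x1"] + regs["x2"],
    "dstend == dstin + original count": regs["x5"] == regs["x0"] + regs["x2"] - tmp1,
    "dstin is 4 KiB aligned": regs["x0"] & 0xFFF == 0,
}
for name, ok in checks.items():
    print(f"{'ok  ' if ok else 'FAIL'} {name}")
```

Every relation holds, so the registers fit the reading that x0 is the destination and that the fault hit the first byte of the destination buffer.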

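5. The next step is to check what destination address the SDK passes to memcpy. On inspection, the destination buffer is not allocated or managed inside the SDK; it is passed in by the customer.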

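Conclusion

So the cause of the crash is that the externally allocated buffer went bad during monkey testing, and the SDK crashed in memcpy while copying its output to the customer. SEGV_MAPERR in the trace means the destination address was not mapped at all at the time of the write.

Follow-up

After we sent this analysis to the customer, they verified it: they commented out the code in com.anc.node.nightshot.so that calls the SDK interface where the crash occurred and called memcpy directly instead, and the problem still reproduced. This confirmed that the crash is unrelated to the SDK and that the destination address memcpy writes to was indeed at fault. The customer eventually fixed the issue.

Reference

The complete __memcpy assembly source from the Android source tree is as follows: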

#include <private/bionic_asm.h>
#define dstin x0
#define src x1
#define count x2
#define dst x3
#define srcend x4
#define dstend x5
#define A_l x6
#define A_lw w6
#define A_h x7
#define A_hw w7
#define B_l x8
#define B_lw w8
#define B_h x9
#define C_l x10
#define C_h x11
#define D_l x12
#define D_h x13
#define E_l src
#define E_h count
#define F_l srcend
#define F_h dst
#define tmp1 x9
#define L(l) .L ## l
/* Copies are split into 3 main cases: small copies of up to 16 bytes,
medium copies of 17..96 bytes which are fully unrolled. Large copies
of more than 96 bytes align the destination and use an unrolled loop
processing 64 bytes per iteration.
Small and medium copies read all data before writing, allowing any
kind of overlap, and memmove tailcalls memcpy for these cases as
well as non-overlapping copies.
*/
prfm PLDL1KEEP, [src]
add srcend, src, count
add dstend, dstin, count
cmp count, 16
b.ls L(copy16)
cmp count, 96
b.hi L(copy_long)
/* Medium copies: 17..96 bytes. */
sub tmp1, count, 1
ldp A_l, A_h, [src]
tbnz tmp1, 6, L(copy96)
ldp D_l, D_h, [srcend, -16]
tbz tmp1, 5, 1f
ldp B_l, B_h, [src, 16]
ldp C_l, C_h, [srcend, -32]
stp B_l, B_h, [dstin, 16]
stp C_l, C_h, [dstend, -32]
1:
stp A_l, A_h, [dstin]
stp D_l, D_h, [dstend, -16]
ret
.p2align 4
/* Small copies: 0..16 bytes. */
L(copy16):
cmp count, 8
b.lo 1f
ldr A_l, [src]
ldr A_h, [srcend, -8]
str A_l, [dstin]
str A_h, [dstend, -8]
ret
.p2align 4
1:
tbz count, 2, 1f
ldr A_lw, [src]
ldr A_hw, [srcend, -4]
str A_lw, [dstin]
str A_hw, [dstend, -4]
ret
/* Copy 0..3 bytes. Use a branchless sequence that copies the same
byte 3 times if count==1, or the 2nd byte twice if count==2. */
1:
cbz count, 2f
lsr tmp1, count, 1
ldrb A_lw, [src]
ldrb A_hw, [srcend, -1]
ldrb B_lw, [src, tmp1]
strb A_lw, [dstin]
strb B_lw, [dstin, tmp1]
strb A_hw, [dstend, -1]
2: ret
.p2align 4
/* Copy 64..96 bytes. Copy 64 bytes from the start and
32 bytes from the end. */
L(copy96):
ldp B_l, B_h, [src, 16]
ldp C_l, C_h, [src, 32]
ldp D_l, D_h, [src, 48]
ldp E_l, E_h, [srcend, -32]
ldp F_l, F_h, [srcend, -16]
stp A_l, A_h, [dstin]
stp B_l, B_h, [dstin, 16]
stp C_l, C_h, [dstin, 32]
stp D_l, D_h, [dstin, 48]
stp E_l, E_h, [dstend, -32]
stp F_l, F_h, [dstend, -16]
ret
/* Align DST to 16 byte alignment so that we don't cross cache line
boundaries on both loads and stores. There are at least 96 bytes
to copy, so copy 16 bytes unaligned and then align. The loop
copies 64 bytes per iteration and prefetches one iteration ahead. */
.p2align 4
L(copy_long):
and tmp1, dstin, 15
bic dst, dstin, 15
ldp D_l, D_h, [src]
sub src, src, tmp1
add count, count, tmp1 /* Count is now 16 too large. */
ldp A_l, A_h, [src, 16]
stp D_l, D_h, [dstin]
ldp B_l, B_h, [src, 32]
ldp C_l, C_h, [src, 48]
ldp D_l, D_h, [src, 64]!
subs count, count, 128 + 16 /* Test and readjust count. */
b.ls 2f
1:
stp A_l, A_h, [dst, 16]
ldp A_l, A_h, [src, 16]
stp B_l, B_h, [dst, 32]
ldp B_l, B_h, [src, 32]
stp C_l, C_h, [dst, 48]
ldp C_l, C_h, [src, 48]
stp D_l, D_h, [dst, 64]!
ldp D_l, D_h, [src, 64]!
subs count, count, 64
b.hi 1b
/* Write the last full set of 64 bytes. The remainder is at most 64
bytes, so it is safe to always copy 64 bytes from the end even if
there is just 1 byte left. */
2:
ldp E_l, E_h, [srcend, -64]
stp A_l, A_h, [dst, 16]
ldp A_l, A_h, [srcend, -48]
stp B_l, B_h, [dst, 32]
ldp B_l, B_h, [srcend, -32]
stp C_l, C_h, [dst, 48]
ldp C_l, C_h, [srcend, -16]
stp D_l, D_h, [dst, 64]
stp E_l, E_h, [dstend, -64]
stp A_l, A_h, [dstend, -48]
stp B_l, B_h, [dstend, -32]
stp C_l, C_h, [dstend, -16]
ret