Conclusion first

When you allocate memory in Java with ByteBuffer.allocateDirect and then obtain that memory in JNI with GetDirectBufferAddress to operate on it, do not use ByteBuffer.array() to get the data back as a byte[] in Java. Use ByteBuffer.get() instead.

// step 1: allocate the memory in Java
ByteBuffer output = ByteBuffer.allocateDirect(OUT_PUT_SIZE);
// step 2: get the memory address in JNI and write data into that memory
auto outPixels = (unsigned char *) env->GetDirectBufferAddress(output);
// wrong step 3 (Java): array() returns the whole backing array, padding included
saveYuv(output.array()); // error!
// step 3: save the data in Java
// method 1: not verified, but the analysis below suggests it works
byte[] totalData = output.array();
int offset = output.arrayOffset();
saveYuv(Arrays.copyOfRange(totalData, offset, offset + OUT_PUT_SIZE)); // needs java.util.Arrays
// method 2: verified to work
byte[] realData = new byte[OUT_PUT_SIZE];
output.get(realData);
saveYuv(realData);
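
As a minimal sketch, method 2 can be wrapped in a small helper. The class name DirectBufferCopy and the method toByteArray are hypothetical, and the sketch assumes the native side filled the buffer from its first byte up to capacity(). Reading through duplicate() leaves the caller's position and limit untouched.

```java
import java.nio.ByteBuffer;

final class DirectBufferCopy {
    /** Copies bytes [0, capacity) of the buffer without touching its position or limit. */
    static byte[] toByteArray(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate(); // independent position/limit, shared content
        view.clear();                         // position = 0, limit = capacity
        byte[] data = new byte[view.remaining()];
        view.get(data);
        return data;
    }

    public static void main(String[] args) {
        ByteBuffer output = ByteBuffer.allocateDirect(8);
        for (int i = 0; i < 8; i++) {
            output.put(i, (byte) (i + 1)); // stand-in for the data the JNI side would write
        }
        byte[] data = toByteArray(output);
        System.out.println(data.length + " bytes, first = " + data[0] + ", last = " + data[7]);
    }
}
```

Because get() goes through the buffer's own addressing, the helper never sees the alignment padding, whatever the platform put in front of the data.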

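The bug we hit

On one project, while tuning image quality, we found that the JPG the customer sent back and the output image dumped by our SDK showed the same algorithm result, but the customer's image was slightly shifted, so the two images could not be overlaid exactly. Inspecting the pixels in YUVView showed that the rightmost four columns of the SDK image had moved to the left edge of the customer's image.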

00 00 00 00 9F 9D 9D 9B

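As shown above, the raw YUV data saved by the customer has four extra empty bytes at the very beginning.

As a quick check, we deleted the four extra bytes and appended four bytes to the end of the YUV data. Viewing the YUV again, the image was now correct.

So the problem comes from these four extra bytes.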
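
To see why four extra leading bytes show up as the rightmost four columns moving to the left edge, here is a small sketch for a single row-major 8-bit plane. The 16-pixel width, the class name ShiftDemo and the method prependPadding are made up for illustration; the real image is far larger and has chroma data after the luma plane.

```java
final class ShiftDemo {
    /** Returns the plane as a viewer reads it after `pad` zero bytes are prepended and the tail is cut off. */
    static byte[] prependPadding(byte[] plane, int pad) {
        byte[] shifted = new byte[plane.length];
        System.arraycopy(plane, 0, shifted, pad, plane.length - pad);
        return shifted;
    }

    public static void main(String[] args) {
        int width = 16, height = 3, pad = 4;
        byte[] plane = new byte[width * height];
        for (int i = 0; i < plane.length; i++) {
            plane[i] = (byte) (i % width); // each pixel stores its own column index
        }
        byte[] shifted = prependPadding(plane, pad);
        // Row 1 of the shifted plane: columns 12..15 of row 0 come first, then columns 0..11 of row 1.
        StringBuilder row = new StringBuilder();
        for (int c = 0; c < width; c++) {
            row.append(shifted[width + c]).append(' ');
        }
        System.out.println(row.toString().trim());
    }
}
```

Every byte lands four positions later, so the last four pixels of each row spill into the first four columns of the next row.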

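Allocating native memory from Java with ByteBuffer

First, some numbers:

A 100-megapixel output image is 11672*8756, and its YUV data needs 153 MB of memory.

A 250-megapixel output image is 18432*13824, and its YUV data takes as much as 382 MB.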
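
The figures above are consistent with 1.5 bytes per pixel, which is what 8-bit YUV 4:2:0 layouts use. The text does not name the exact format, so treat that as an assumption; YuvSize and yuv420Bytes are hypothetical names. A quick sketch of the arithmetic:

```java
final class YuvSize {
    /** Bytes needed for an 8-bit YUV 4:2:0 frame: a full-size Y plane plus quarter-size U and V. */
    static long yuv420Bytes(long width, long height) {
        return width * height * 3 / 2;
    }

    public static void main(String[] args) {
        System.out.println(yuv420Bytes(11672, 8756));  // 153300048 bytes, about 153 MB
        System.out.println(yuv420Bytes(18432, 13824)); // 382205952 bytes, about 382 MB
    }
}
```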

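So we need to use native memory in the app. Here is how memory allocated with ByteBuffer.allocateDirect gets reclaimed:

How direct memory is reclaimed: direct memory is under GC control. For example, ByteBuffer bb = ByteBuffer.allocateDirect(1024) takes up 1 KB outside the heap, while inside the Java heap it only costs the size of the reference to the bb object. That 1 KB outside the heap is released only when the bb object itself is collected.

That is a good fit for our need to allocate large amounts of memory in the app.

Usage is simple. In Java we allocate the output image's memory with ByteBuffer.allocateDirect and pass the ByteBuffer object to JNI, where env->GetDirectBufferAddress returns the buffer's address. That buffer is then handed to the SDK, which writes the output image's YUV data into it. After the algorithm finishes, the customer calls output.array() in Java to get the YUV data.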

// step1: java code
ByteBuffer output = ByteBuffer.allocateDirect(OUT_PUT_SIZE);
// step2: jni code
auto outPixels = (unsigned char *) env->GetDirectBufferAddress(output);
// step3: java code
saveYuv(output.array());

Everything looks fine.

Where do the empty bytes come from?

Everything looks fine except for one question: where did the extra empty bytes come from?

First, look at the implementation of ByteBuffer.allocateDirect on Android Q:

public static ByteBuffer allocateDirect(int capacity) {
    // Android-changed: Android's DirectByteBuffers carry a MemoryRef.
    // return new DirectByteBuffer(capacity);
    DirectByteBuffer.MemoryRef memoryRef = new DirectByteBuffer.MemoryRef(capacity);
    return new DirectByteBuffer(capacity, memoryRef);
}

This method returns a DirectByteBuffer, a subclass of ByteBuffer, and the comment shows that Google changed it on Android to use a MemoryRef.

Next, look at the MemoryRef constructor and the DirectByteBuffer constructor called above:

MemoryRef(int capacity) {
    VMRuntime runtime = VMRuntime.getRuntime();
    buffer = (byte[]) runtime.newNonMovableArray(byte.class, capacity + 7);
    allocatedAddress = runtime.addressOf(buffer);
    // Offset is set to handle the alignment: http://b/16449607
    offset = (int) (((allocatedAddress + 7) & ~(long) 7) - allocatedAddress);
    isAccessible = true;
    isFreed = false;
    originalBufferObject = null;
}
DirectByteBuffer(int capacity, MemoryRef memoryRef) {
    super(-1, 0, capacity, capacity, memoryRef.buffer, memoryRef.offset);
    // Only have references to java objects, no need for a cleaner since the GC will do all the work.
    this.memoryRef = memoryRef;
    this.address = memoryRef.allocatedAddress + memoryRef.offset;
    cleaner = null;
    this.isReadOnly = false;
}

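Three variables matter here:

allocatedAddress: as the name says, this is the start address of the allocated memory. Note that the allocation is capacity + 7 bytes, not capacity bytes. The reason for the 7 extra bytes follows below.
offset: the comment says the code aligns the start address. The line (int) (((allocatedAddress + 7) & ~(long) 7) - allocatedAddress); rounds the address up to a multiple of 8 and stores in offset the distance from the allocated start address to that aligned address. Adding 7 and then clearing the low three bits with & ~7 yields the smallest multiple of 8 that is not below allocatedAddress. You can plug in a few addresses and work through the bit operations by hand, or check it with Java code; a verification result is given at the end. Since the alignment is 8 bytes, the offset can be anywhere from 0 to 7, so allocating 7 extra bytes covers every case. (Most of the time some of those bytes are wasted.)
address: the start address that is actually used. The code computes it as the allocated start address plus offset, so this address is 8-byte aligned.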

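Having seen these three variables, we can already form a tentative guess: the byte[] returned by array() starts at allocatedAddress, while the pointer that JNI gets from GetDirectBufferAddress and then operates on starts at address. So the extra data in the customer's saved YUV is exactly those offset bytes.

Let's keep reading the source to verify the guess.

ByteBuffer's array() function returns hb: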

public final byte[] array() {
    if (hb == null)
        throw new UnsupportedOperationException();
    if (isReadOnly)
        throw new ReadOnlyBufferException();
    return hb;
}

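Where is hb assigned? In the ByteBuffer constructor. And when is that constructor called? The DirectByteBuffer constructor shown earlier calls it through super(-1, 0, capacity, capacity, memoryRef.buffer, memoryRef.offset);. This confirms that hb is the array starting at allocatedAddress: it has offset unused bytes at the start and 7 - offset unused bytes at the end, and its total length is the requested size + 7. It also shows that method 1 from the conclusion at the top of the article is feasible.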

ByteBuffer(int mark, int pos, int lim, int cap, byte[] hb, int offset)
{
    super(mark, pos, lim, cap, 0 /* elementSizeShift */);
    this.hb = hb;
    this.offset = offset;
}
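
The relationship between hb, offset and the buffer's logical contents can be reproduced on a desktop JVM with a sliced heap buffer, where array() also returns the whole backing array and arrayOffset() reports where the buffer's own data starts. This is only an analogy for the Android DirectByteBuffer layout, not its implementation; ArrayOffsetDemo, bufferWithOffset and logicalBytes are hypothetical names, and the offset of 4 is picked to match the bug described earlier.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class ArrayOffsetDemo {
    /** Builds a buffer whose logical byte 0 sits `offset` bytes into its backing array. */
    static ByteBuffer bufferWithOffset(int capacity, int offset) {
        byte[] backing = new byte[capacity + 7]; // same over-allocation as MemoryRef
        return ByteBuffer.wrap(backing, offset, capacity).slice();
    }

    /** Extracts exactly the buffer's logical contents from its backing array (method 1). */
    static byte[] logicalBytes(ByteBuffer buffer) {
        int start = buffer.arrayOffset();
        return Arrays.copyOfRange(buffer.array(), start, start + buffer.capacity());
    }

    public static void main(String[] args) {
        ByteBuffer output = bufferWithOffset(4, 4);
        output.put(0, (byte) 0x9F).put(1, (byte) 0x9D).put(2, (byte) 0x9D).put(3, (byte) 0x9B);
        System.out.println(Arrays.toString(output.array()));       // whole backing array, padding included
        System.out.println(Arrays.toString(logicalBytes(output))); // only the 4 logical bytes
    }
}
```

The first line printed starts with four zero bytes followed by the payload, which is the same pattern seen in the customer's file.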

Why is the address JNI gets the aligned address?

1. First, I found the following function in the DirectByteBuffer code; it returns address, the aligned address.

public final long address() {
    return address;
}

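2. allocatedAddress is a member of MemoryRef, an inner class of DirectByteBuffer, unlike address, which is a member of ByteBuffer. I also did not see any interface that exposes allocatedAddress.

Why align the address to 8 bytes?

1. Why align the address to 8 bytes?
Alignment improves the performance of the memory system; see Section 3.9.3, Data Alignment, in Computer Systems: A Programmer's Perspective (3rd edition).

2. A test program that verifies the 8-byte alignment done by MemoryRef, the inner class of DirectByteBuffer, and its output: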

public class Main {
    // Same formula as MemoryRef: distance from address up to the next multiple of 8.
    public static int calOffset(long address) {
        return (int) (((address + 7) & ~(long) 7) - address);
    }

    public static void main(String[] args) {
        for (long address = 0x00; address <= 0x10; address++) {
            // Print a separator at each 8-byte boundary to make the pattern visible.
            if (address != 0 && address % 8 == 0) {
                System.out.println("------------");
            }
            System.out.println(String.format("offset of 0x%08x: %d", address, calOffset(address)));
        }
    }
}

// output:
offset of 0x00000000: 0
offset of 0x00000001: 7
offset of 0x00000002: 6
offset of 0x00000003: 5
offset of 0x00000004: 4
offset of 0x00000005: 3
offset of 0x00000006: 2
offset of 0x00000007: 1
------------
offset of 0x00000008: 0
offset of 0x00000009: 7
offset of 0x0000000a: 6
offset of 0x0000000b: 5
offset of 0x0000000c: 4
offset of 0x0000000d: 3
offset of 0x0000000e: 2
offset of 0x0000000f: 1
------------
offset of 0x00000010: 0
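
Beyond the 17 sample addresses, the same property can be checked over a whole range. The sketch below tests that the offset always lies in 0..7, that address + offset is a multiple of 8, and that the formula agrees with the shorter form -address & 7. AlignCheck and holds are hypothetical names, and the range of 65536 addresses is arbitrary.

```java
final class AlignCheck {
    /** Same computation as in MemoryRef. */
    static int calOffset(long address) {
        return (int) (((address + 7) & ~7L) - address);
    }

    /** True if every address in [from, to) gets an offset in 0..7 that lands on a multiple of 8. */
    static boolean holds(long from, long to) {
        for (long a = from; a < to; a++) {
            int offset = calOffset(a);
            boolean inRange = offset >= 0 && offset <= 7;
            boolean aligned = (a + offset) % 8 == 0;
            boolean matchesShortForm = offset == (int) (-a & 7);
            if (!inRange || !aligned || !matchesShortForm) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(holds(0, 1 << 16));
    }
}
```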