Android graphics (15): fence
First, fences owe their existence largely to the GPU. Here is Wikipedia's introduction to the GPU:
A graphics processing unit (GPU), also occasionally called visual processing unit (VPU), is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display. GPUs are used in embedded systems, mobile phones, personal computers, workstations, and game consoles. Modern GPUs are very efficient at manipulating computer graphics and image processing, and their highly parallel structure makes them more effective than general-purpose CPUs for algorithms where the processing of large blocks of visual data is done in parallel.
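GPUs were created to speed up getting graphics onto the display. They are now widely used in embedded devices, mobile phones, PCs, workstations, and game consoles. A GPU is very efficient at graphics processing, and its parallel architecture lets it far outperform a CPU on large data-parallel workloads. I worked with CUDA as a student, and what stuck with me was that it sped up parallel data processing by at least 10x.
The CPU and the GPU run asynchronously. With OpenGL, the CPU first issues a series of gl commands, and those commands are then sent to the GPU, which does the actual drawing. The CPU has no idea when the drawing finishes. It could block until the GPU is done before moving on, but that would be far too inefficient.
The following example illustrates vividly how a fence coordinates the GPU and the CPU: it lets the two run in parallel, which speeds up getting an image onto the display.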
For example, an application may queue up work to be carried out in the GPU. The GPU then starts drawing that image. Although the image hasn’t been drawn into memory yet, the buffer pointer can still be passed to the window compositor along with a fence that indicates when the GPU work will be finished. The window compositor may then start processing ahead of time and hand off the work to the display controller. In this manner, the CPU work can be done ahead of time. Once the GPU finishes, the display controller can immediately display the image.
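The hand-off in the quote can be modeled without any graphics API. What follows is only a minimal sketch under stated assumptions: GpuTimeline, SimFence, Frame, queueDraw, gpuFinishOne and tryDisplay are all hypothetical names invented for illustration, and the "GPU" is a plain counter rather than real hardware. It shows a buffer being passed along together with a fence before its contents exist.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a GPU work queue: two counters that only move forward.
struct GpuTimeline {
    unsigned completed = 0;   // number of jobs the simulated GPU has finished
    unsigned submitted = 0;   // number of jobs the CPU has queued
};

// Hypothetical fence: signals once the timeline reaches `point`.
struct SimFence {
    const GpuTimeline* timeline;
    unsigned point;
    bool signaled() const { return timeline->completed >= point; }
};

// A buffer travels together with the fence that guards its contents.
struct Frame {
    std::vector<int> pixels;
    SimFence ready;
};

// CPU side: queue the draw and return immediately, without waiting for the GPU.
Frame queueDraw(GpuTimeline& gpu) {
    gpu.submitted += 1;
    return Frame{std::vector<int>(4, 0), SimFence{&gpu, gpu.submitted}};
}

// Simulated GPU: finishes one pending job and fills in the pixels.
void gpuFinishOne(GpuTimeline& gpu, Frame& frame, int color) {
    assert(gpu.completed < gpu.submitted);   // there must be pending work
    frame.pixels.assign(frame.pixels.size(), color);
    gpu.completed += 1;
}

// Display side: the frame may be shown only after its fence has signaled.
bool tryDisplay(const Frame& frame) {
    return frame.ready.signaled();
}
```

In this model, a Frame returned by queueDraw can be handed to later stages right away; tryDisplay stays false until gpuFinishOne has run, which corresponds to the display controller waiting on the fence.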
How fences are used
A fence is typically used as follows:
// First, create an EGLSyncKHR sync object
EGLSyncKHR sync = eglCreateSyncKHR(dpy,
        EGL_SYNC_NATIVE_FENCE_ANDROID, NULL);
if (sync == EGL_NO_SYNC_KHR) {
    ST_LOGE("syncForReleaseLocked: error creating EGL fence: %#x",
            eglGetError());
    return UNKNOWN_ERROR;
}
// Flush every command queued in the OpenGL command buffer so that it starts
// executing now, rather than waiting for the command buffer to fill up
glFlush();
// Convert the sync object into a fence fd
int fenceFd = eglDupNativeFenceFDANDROID(dpy, sync);
eglDestroySyncKHR(dpy, sync);
if (fenceFd == EGL_NO_NATIVE_FENCE_FD_ANDROID) {
    ST_LOGE("syncForReleaseLocked: error dup'ing native fence "
            "fd: %#x", eglGetError());
    return UNKNOWN_ERROR;
}
// Wrap the fence fd in a new Fence object
sp<Fence> fence(new Fence(fenceFd));
// Merge the newly created fence with the old one
status_t err = addReleaseFenceLocked(mCurrentTexture,
        mCurrentTextureBuf, fence);
where addReleaseFenceLocked is:
status_t ConsumerBase::addReleaseFenceLocked(int slot,
        const sp<GraphicBuffer> graphicBuffer, const sp<Fence>& fence) {
    CB_LOGV("addReleaseFenceLocked: slot=%d", slot);
    // If consumer no longer tracks this graphicBuffer, we can safely
    // drop this fence, as it will never be received by the producer.
    if (!stillTracking(slot, graphicBuffer)) {
        return OK;
    }
    // The old fence is null, so assign directly
    if (!mSlots[slot].mFence.get()) {
        mSlots[slot].mFence = fence;
    } else {
        // Otherwise, merge the two fences
        sp<Fence> mergedFence = Fence::merge(
                String8::format("%.28s:%d", mName.string(), slot),
                mSlots[slot].mFence, fence);
        if (!mergedFence.get()) {
            CB_LOGE("failed to merge release fences");
            // synchronization is broken, the best we can do is hope fences
            // signal in order so the new fence will act like a union
            mSlots[slot].mFence = fence;
            return BAD_VALUE;
        }
        mSlots[slot].mFence = mergedFence;
    }
    return OK;
}
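To make the merge step concrete, here is a minimal sketch of what merging means semantically: the merged fence signals only after both inputs have signaled. It is a toy model, not the Android implementation. Timeline, SyncPoint, ModelFence, mergeFences and addReleaseFence are hypothetical names, and a fence is assumed to be nothing more than a list of "counter has reached N" conditions.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical timeline: a monotonically increasing counter.
struct Timeline {
    unsigned value = 0;
};

// One wait condition: "the timeline has reached `point`".
struct SyncPoint {
    const Timeline* timeline;
    unsigned point;
    bool signaled() const {
        assert(timeline != nullptr);
        return timeline->value >= point;
    }
};

// Hypothetical fence: a set of sync points; an empty set counts as signaled.
struct ModelFence {
    std::vector<SyncPoint> points;

    bool signaled() const {
        return std::all_of(points.begin(), points.end(),
                           [](const SyncPoint& p) { return p.signaled(); });
    }
};

// Merge: the result waits for everything that either input waits for.
ModelFence mergeFences(const ModelFence& a, const ModelFence& b) {
    ModelFence merged = a;
    merged.points.insert(merged.points.end(), b.points.begin(), b.points.end());
    return merged;
}

// Same shape as the slot update above: assign if empty, otherwise merge.
void addReleaseFence(ModelFence& slotFence, const ModelFence& incoming) {
    if (slotFence.points.empty()) {
        slotFence = incoming;
    } else {
        slotFence = mergeFences(slotFence, incoming);
    }
}
```

With one fence tied to a GPU timeline and another to a display timeline, the slot's fence in this model reports signaled only after both counters have reached their points, which is why a buffer with several readers needs a merged release fence rather than just the latest one.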
A Fence object is a valid fence only when its mFenceFd is not -1; only then can it act as a barrier that synchronizes the CPU and the GPU.
// NO_FENCE has an mFenceFd of -1
const sp<Fence> Fence::NO_FENCE = sp<Fence>(new Fence);
Fence::Fence() :
    mFenceFd(-1) {
}
Fence::Fence(int fenceFd) :
    mFenceFd(fenceFd) {
}
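The effect of the -1 sentinel can be sketched as follows. This is a simplified stand-in, not the Android Fence class: FdFence and its pollFd callback are hypothetical, the callback merely takes the place of the real kernel wait on a sync fd, and the sketch does not own or close the descriptor.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a fence wrapper around a file descriptor.
class FdFence {
public:
    FdFence() : mFd(-1) {}                 // mirrors NO_FENCE: nothing to wait for
    explicit FdFence(int fd) : mFd(fd) {
        assert(fd >= -1);                  // -1 is the only negative value with meaning here
    }

    bool isValid() const { return mFd != -1; }

    // Returns true if the caller may proceed. `pollFd` is a hypothetical
    // callback that blocks on the descriptor and reports success.
    template <typename PollFn>
    bool wait(PollFn pollFd) const {
        if (!isValid()) {
            return true;                   // no fence: proceed immediately
        }
        return pollFd(mFd);
    }

private:
    int mFd;
};
```

A default-constructed FdFence never calls pollFd, so waiting on it costs nothing; only a fence built from a real descriptor can actually hold the caller back.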
Because the Fence class implements the Flattenable protocol, it can be passed over binder.
Most recent Android devices support the “sync framework”. This allows the system to do some nifty thing when combined with hardware components that can manipulate graphics data asynchronously. For example, a producer can submit a series of OpenGL ES drawing commands and then enqueue the output buffer before rendering completes. The buffer is accompanied by a fence that signals when the contents are ready. A second fence accompanies the buffer when it is returned to the free list, so that the consumer can release the buffer while the contents are still in use. This approach improves latency and throughput as the buffers move through the system.
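The passage above is easier to follow in terms of BufferQueue's producer/consumer model; it describes how fences improve graphics display performance. The producer draws with OpenGL and queues the buffer right away, without waiting for the drawing to finish. Along with the buffer it hands BufferQueue a fence, and a consumer that acquires the buffer receives that fence as well. The fence signals once the GPU has finished drawing. This is the "acquireFence": the producer uses it to tell the consumer that production is complete.
Once the consumer has done what it needs with the acquired buffer (for example, handing it to SurfaceFlinger for composition), it releases the buffer back to BufferQueue's free list. Because SurfaceFlinger may still be using the buffer's contents, the release also carries a fence, which indicates whether the contents are still in use. If the producer later dequeues this same buffer, it must wait for that fence to signal before using the buffer. This is the "releaseFence": the consumer uses it to tell the producer that consumption is complete.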
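The two fences can be put side by side in a small model of one buffer slot. This is only a sketch under stated assumptions: CounterFence, Slot, queueBuffer, consumerCanRead, releaseBuffer and producerCanWrite are hypothetical names rather than BufferQueue's API, and GPU and display progress are simulated with plain counters.

```cpp
#include <cassert>

// Hypothetical counter-based fence: signaled once *counter >= point.
struct CounterFence {
    const unsigned* counter;   // nullptr models "no fence"
    unsigned point;
    bool signaled() const { return counter == nullptr || *counter >= point; }
};

// Hypothetical buffer slot carrying both fences described above.
struct Slot {
    CounterFence acquireFence{nullptr, 0};   // producer -> consumer
    CounterFence releaseFence{nullptr, 0};   // consumer -> producer
};

// Producer: queue right after submitting GPU work; rendering may still be running.
void queueBuffer(Slot& slot, const unsigned* gpuDone, unsigned job) {
    // The producer must have waited for the previous reader before drawing.
    assert(slot.releaseFence.signaled());
    slot.acquireFence = CounterFence{gpuDone, job};
}

// Consumer: may read the buffer only once the acquire fence has signaled.
bool consumerCanRead(const Slot& slot) {
    return slot.acquireFence.signaled();
}

// Consumer: hand the buffer back while the display may still be reading it.
void releaseBuffer(Slot& slot, const unsigned* displayDone, unsigned frame) {
    slot.releaseFence = CounterFence{displayDone, frame};
}

// Producer: after dequeue, may overwrite only once the release fence has signaled.
bool producerCanWrite(const Slot& slot) {
    return slot.releaseFence.signaled();
}
```

In this model neither side blocks when it hands the buffer over; each side checks the other's fence only at the moment it actually needs to touch the buffer contents, which is where the latency gain comes from.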
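In general, a fence object (new Fence) is passed over binder between the producer and consumer of a single BufferQueue and does not travel between different BufferQueues. There is one exception: for a layer composed via overlay, its acquire fence is passed on to HWComposer. An overlay is composed directly by the HAL-level hwcomposer, and the graphic buffer it uses is the one that the upper-level surface rendered into. If that surface draws with OpenGL, hwcomposer must make sure rendering has finished before it composes the overlay, that is, it waits inside hwcomposer for this fence to signal. The fence therefore has to reach the HAL layer first. It does not get there through BufferQueue's binder path but through dedicated functions, which are analyzed later.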
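Because OpenGL has both software and hardware implementations, the sections below go through the code for each.
Software OpenGL implementation
The software implementation of OpenGL is agl. Although it was dropped in 4.4, one project I worked on had neither a GPU nor overlay support, so agl was the only way to compose layers. agl's eglCreateSyncKHR is shown below. Its comment makes the point clearly: agl is synchronous because no GPU is involved. Consequently, every fence created through agl has an mFenceFd of -1.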
EGLSyncKHR eglCreateSyncKHR(EGLDisplay dpy, EGLenum type,
        const EGLint *attrib_list)
{
    if (egl_display_t::is_valid(dpy) == EGL_FALSE) {
        return setError(EGL_BAD_DISPLAY, EGL_NO_SYNC_KHR);
    }
    if (type != EGL_SYNC_FENCE_KHR ||
            (attrib_list != NULL && attrib_list[0] != EGL_NONE)) {
        return setError(EGL_BAD_ATTRIBUTE, EGL_NO_SYNC_KHR);
    }
    if (eglGetCurrentContext() == EGL_NO_CONTEXT) {
        return setError(EGL_BAD_MATCH, EGL_NO_SYNC_KHR);
    }
    // AGL is synchronous; nothing to do h