Monday, November 3, 2008

Sharing Blocks of Memory between Kernel Mode Process and the User Mode Process in Windows Embedded CE 6.0

In some cases we need to share the same memory between the kernel process and a user process instead of copying data from kernel memory to user memory (or vice versa), either to improve performance or to process the data in real time.
For example, consider a camera-based application. The camera delivers 30 frames per second, so consecutive frames arrive roughly 33 ms apart. The camera driver stores each captured frame (640 x 480 pixels at 8 bits per pixel) within 5 ms, which leaves the remaining 28 ms for the application to process the data and produce the required output. The driver has to keep all the required frames as blocks of memory: if the application needs 60 frames, the driver must store those 60 frames contiguously in memory.
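The figures in this scenario can be checked with a little arithmetic. The following is a minimal sketch in portable C, not part of the driver; the constant and function names are made up for illustration, and the numbers are the ones quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Figures from the camera scenario; the names are illustrative only. */
enum {
    FRAME_WIDTH       = 640,
    FRAME_HEIGHT      = 480,
    BYTES_PER_PIXEL   = 1,   /* 8 bits per pixel */
    FRAMES_PER_SECOND = 30,
    CAPTURE_TIME_MS   = 5    /* time the driver needs to store one frame */
};

/* Size of one captured frame in bytes. */
uint32_t frame_size_bytes(void)
{
    return (uint32_t)FRAME_WIDTH * FRAME_HEIGHT * BYTES_PER_PIXEL;
}

/* Size of a contiguous buffer that holds frame_count frames. */
uint32_t buffer_size_bytes(uint32_t frame_count)
{
    /* Guard against overflowing the 32-bit result. */
    assert(frame_count <= UINT32_MAX / frame_size_bytes());
    return frame_count * frame_size_bytes();
}

/* Whole milliseconds between two frames (integer division truncates). */
uint32_t frame_interval_ms(void)
{
    return 1000u / FRAMES_PER_SECOND;
}

/* Time left for the application after the driver has stored a frame. */
uint32_t processing_budget_ms(void)
{
    return frame_interval_ms() - CAPTURE_TIME_MS;
}
```

One frame is 307,200 bytes, so 60 frames need 18,432,000 bytes (about 17.6 MB), which fits comfortably inside the 32 MB block that the sample driver maps later in this post.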
In this case, maintaining two large buffers in an embedded product wastes scarce memory and is not an efficient design. Unlike Windows CE 5.0, Windows CE 6.0 restricts user processes from directly accessing kernel-mode memory in order to avoid security vulnerabilities. However, Windows Embedded CE 6.0 introduces a new set of APIs that allow memory blocks to be shared between the kernel-mode process and a user-mode process in a secure manner.
VirtualAllocEx, VirtualCopyEx and VirtualFreeEx are the APIs newly introduced in Windows CE 6.0 that allow you to share memory between the kernel and user-mode processes. I will explain the implementation through a sample stream driver and a user application. VirtualAllocEx() allocates virtual address space in the process whose ID is passed as its first argument. In our example, the address space is allocated in the user (application) process. You can print the returned address to check its range: it should be below 2 GB (0x80000000), which is the user address space.
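The range check mentioned above can also be done in code rather than by reading the debug output. This is a small sketch in portable C; the helper name and the USER_SPACE_LIMIT macro are assumptions made for illustration and are not part of the Windows CE headers.

```c
#include <assert.h>
#include <stdint.h>

/* 2 GB boundary between user and kernel address space in CE 6.0
   (macro name is illustrative). */
#define USER_SPACE_LIMIT 0x80000000u

/* Returns 1 if [address, address + size) lies entirely below the
   2 GB boundary, otherwise 0. */
int is_user_mode_range(uint32_t address, uint32_t size)
{
    assert(size > 0);
    if (address == 0 || address >= USER_SPACE_LIMIT) {
        return 0;
    }
    /* Written as a subtraction so the check cannot overflow. */
    return size <= USER_SPACE_LIMIT - address;
}
```

An application could pass the address returned by the driver together with the block size to such a helper before dereferencing the pointer.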
Allocating, sharing and freeing the memory are implemented through IOCTL calls. The following source code shows the implementation.
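static LPVOID gUserAddr = NULL;      // address of the block in the caller's process
LPVOID GetVirtualAddress(void);

DWORD SMD_IOControl (DWORD dwOpen, DWORD dwCode, PDWORD pIn, DWORD dwIn,
                     PDWORD pOut, DWORD dwOut, DWORD *pdwBytesWritten)
{
    RETAILMSG(1, (TEXT("SMD_IOControl++ dwCode: 0x%x\r\n"), dwCode));
    switch (dwCode)
    {
    case IOCTL_SMD_ALLOC_ADDRESS:
        {
            // The caller must supply a buffer large enough to hold the address.
            if (pOut == NULL || dwOut < sizeof(DWORD) || pdwBytesWritten == NULL)
                return FALSE;
            *pOut = (DWORD)GetVirtualAddress();
            *pdwBytesWritten = sizeof(DWORD);
            RETAILMSG(1, (TEXT("SMD_IOControl: Address=0x%x\r\n"), *pOut));
        }
        break;
    case IOCTL_SMD_FREE_ADDRESS:
        {
            if (gUserAddr != NULL)
            {
                HANDLE hProcess = (HANDLE)GetCallerVMProcessId();
                // Release from the page-aligned base of the reservation.
                VirtualFreeEx(hProcess, (PVOID)((ULONG)gUserAddr & ~(ULONG)(PAGE_SIZE - 1)), 0, MEM_RELEASE);
                gUserAddr = NULL;
            }
        }
        break;
    case IOCTL_SMD_FILL_ADDRESS:
        {
            char* Temp;
            if (gUserAddr == NULL)   // nothing has been mapped yet
                return FALSE;
            Temp = (char*)gUserAddr + 0x1F00000;
            NKDbgPrintfW(L"Address=0x%x\r\n", Temp);
            strcpy(Temp, " Hello World");
            // Temp is a narrow string, hence %S in the wide format string.
            NKDbgPrintfW(L"Written value =%S\r\n", Temp);
        }
        break;

    default:
        {
            DEBUGMSG(1, (TEXT("SMD_IOControl: unknown code %x\r\n"), dwCode));
        }
        return FALSE;
    }

    RETAILMSG(1, (TEXT("SMD_IOControl Finished--\r\n")));
    return TRUE;
}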

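LPVOID GetVirtualAddress()
{
    DWORD sDevPhysAddr = 0x81F00000;   // start of the block to share
    DWORD dwSize = 0x02000000;         // 32 MB
    LPVOID lpUserAddr;
    ULONG SourceSize;
    ULONG SourcePhys;
    HANDLE hProcess = (HANDLE)GetCallerVMProcessId();

    // Round the source down to a page boundary and grow the size by the in-page offset.
    SourcePhys = sDevPhysAddr & ~(PAGE_SIZE - 1);
    SourceSize = dwSize + (sDevPhysAddr & (PAGE_SIZE - 1));

    // Reserve address space in the caller's (user) process.
    lpUserAddr = VirtualAllocEx(hProcess, 0, SourceSize, MEM_RESERVE, PAGE_NOACCESS);
    if (lpUserAddr == NULL) {
        return NULL;
    }
    // Map the source memory into the reserved range.
    if (!VirtualCopyEx(hProcess, lpUserAddr, GetCurrentProcess(), (PVOID)SourcePhys,
                       SourceSize, PAGE_READWRITE | PAGE_NOCACHE)) {
        // Do not leak the reservation if the mapping fails.
        VirtualFreeEx(hProcess, lpUserAddr, 0, MEM_RELEASE);
        return NULL;
    }
    NKDbgPrintfW(L"Before round up lpUserAddr=0x%x\r\n", lpUserAddr);
    // Add the in-page offset back so the pointer refers to the original start address.
    lpUserAddr = (LPVOID)((ULONG)lpUserAddr + (sDevPhysAddr & (PAGE_SIZE - 1)));
    gUserAddr = lpUserAddr;
    NKDbgPrintfW(L"After round up gUserAddr=0x%x\r\n", gUserAddr);
    return lpUserAddr;
}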

IOCTL_SMD_ALLOC_ADDRESS: This IOCTL allocates the block of memory and returns the starting address of the allocated area to the user-mode process (the application).
IOCTL_SMD_FREE_ADDRESS: This IOCTL frees the allocated shared memory area using VirtualFreeEx().
IOCTL_SMD_FILL_ADDRESS: This IOCTL simply writes some data near the end of the large block of memory. The application can print this data as a proof of concept for the shared memory implementation.
GetVirtualAddress(): This function maps a large 32 MB block that is shared by the kernel-mode process and the user-mode process. It calls GetCallerVMProcessId() to obtain the caller's process ID, which is passed to VirtualAllocEx() and VirtualCopyEx(). In our case the caller is the application (user-mode) process. VirtualAllocEx() reserves the address range in the caller process; VirtualCopyEx() takes both the caller process ID and the current process ID and maps the source memory into that range. Before mapping, the source address is rounded down to a page boundary and the size is increased by the in-page offset, which is then added back to the returned pointer. VirtualAllocEx and VirtualCopyEx are used in much the same way as VirtualAlloc and VirtualCopy.
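The page-alignment arithmetic in GetVirtualAddress() can be tried out in isolation. Below is a minimal sketch in portable C; the MappingPlan struct and plan_mapping() function are hypothetical names introduced here, and the page size is passed in as a parameter instead of using the platform's PAGE_SIZE.

```c
#include <assert.h>
#include <stdint.h>

/* Result of the alignment step (hypothetical helper type). */
typedef struct {
    uint32_t aligned_base;  /* source address rounded down to a page boundary */
    uint32_t mapped_size;   /* requested size plus the in-page offset */
    uint32_t page_offset;   /* offset to add back to the mapped pointer */
} MappingPlan;

/* Mirrors the SourcePhys / SourceSize computation of the driver. */
MappingPlan plan_mapping(uint32_t source, uint32_t size, uint32_t page_size)
{
    MappingPlan plan;
    /* The masks below only work for a power-of-two page size. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    plan.page_offset  = source & (page_size - 1);
    plan.aligned_base = source & ~(page_size - 1);
    plan.mapped_size  = size + plan.page_offset;
    return plan;
}
```

With the values used in the driver (0x81F00000, 0x02000000 and a 4 KB page), the source is already page aligned, so the offset is zero and the "before" and "after" addresses printed by the driver are identical. An unaligned source such as 0x81F00123 would give an offset of 0x123.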
The following sample application accesses the memory allocated by the sample memory driver.
HANDLE SMDDrv;
DWORD Address = 0;        // shared address returned by the driver
DWORD BytesWritten = 0;
// Open the sample memory driver
SMDDrv = CreateFile(L"SMD1:",                     // file name to be opened
                    GENERIC_WRITE | GENERIC_READ, // open the file for reading and writing
                    0,                            // no share mode
                    NULL,                         // default security
                    OPEN_EXISTING,                // open an existing file
                    0,                            // non-overlapped mode
                    NULL                          // no template file
                    );
if (SMDDrv == INVALID_HANDLE_VALUE)
{
    printf(":SMD open failed\r\n");
}
else
{
    printf(":SMD open success\r\n");
    // Get the allocated shared pointer
    if (DeviceIoControl(SMDDrv, IOCTL_SMD_ALLOC_ADDRESS, NULL, 0, &Address, sizeof(DWORD), &BytesWritten, NULL)
        && Address != 0)
    {
        char* Data = (char*)Address;
        printf("Address:0x%x\r\n", Address);
        // Request the driver to fill the data
        DeviceIoControl(SMDDrv, IOCTL_SMD_FILL_ADDRESS, NULL, 0, NULL, 0, NULL, NULL);
        // Move the pointer to the filled data area
        Data += 0x1F00000;
        // Just print the data for proof of concept
        printf("Data:%s\r\n", Data);
        // Free the memory
        DeviceIoControl(SMDDrv, IOCTL_SMD_FREE_ADDRESS, NULL, 0, NULL, 0, NULL, NULL);
    }
    // Close the driver handle
    CloseHandle(SMDDrv);
}
This simple application opens the driver, gets the pointer to the large memory block, moves to the data area filled by the driver, prints the data, frees the memory and closes the driver.
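Both the driver and the application use the fixed offset 0x1F00000 inside the 32 MB block, so it is worth confirming that whatever is written there stays inside the block. This is a sketch in portable C; fits_in_block() and the two macros are hypothetical names introduced for illustration, with values taken from the sample code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHARED_BLOCK_SIZE 0x02000000u  /* 32 MB, as mapped by the driver */
#define FILL_OFFSET       0x01F00000u  /* offset used by the fill IOCTL */

/* Returns 1 if text, including its terminating NUL, fits between
   offset and the end of a block of block_size bytes, otherwise 0. */
int fits_in_block(uint32_t offset, const char *text, uint32_t block_size)
{
    size_t needed;
    assert(text != NULL);
    if (offset >= block_size) {
        return 0;
    }
    needed = strlen(text) + 1;
    return needed <= (size_t)(block_size - offset);
}
```

The offset 0x1F00000 leaves 1 MB before the end of the block, so the short test string fits easily; a similar check becomes more important once real frame data is written at computed offsets.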
I hope this post is useful for architects and developers who need to share large blocks of memory between user-mode and kernel-mode processes.

About Me:
I am currently working with e-con Systems India, which offers product design services, BSP development and device driver development on Windows CE. We specialize in camera drivers.

9 comments:

Jovan said...

Do you have the complete source code for these examples?

Martin said...

Thanks a lot, it's very useful for me.

Anonymous said...

Is there something like this in the Windows DDK or SDK?

aparna

Vinoth said...

I have published the source code for this driver on CodePlex. Here is the link:
https://sampmemdrv.codeplex.com/

EvD said...

Dear Mr Vinoth,

The code on CodePlex is not up to date.

The VirtualFree call has not been updated to VirtualFreeEx.

Tested under WEC2013 and works well!

Thanks,

EvD



Vinoth said...

Thank you for pointing it out, EvD. I have uploaded the corrected source code again at the same link.

Anonymous said...

Thank you Vinoth for sharing the blog.

I have a question though, even when we are using the VirtualAllocEx function we are still reserving 32 MB of the process's virtual memory, which would eventually get mapped to a block of the system's physical RAM when we start using it. Then how does this approach help in alleviating the issue of reserving/blocking physical memory for a large chunk of data?

By using VirtualAllocEx, does this issue get solved by the On_Demand Memory mapping feature of Windows Ce? So by VirtualAlloc we have just reserved the virtual memory but only when we access it, it gets mapped and committed in physical memory? And till that time the physical memory is available for other allocations?

Even then we would end up using the physical memory to store 64*480*60 bytes when we start storing the data from the camera driver.

Please throw some more light and correct gaps in my understanding...

PS: I had posted a query to Windows Compact MSDN forum as well and you had directed me to this post, I used this info to create a High level design of my system, but now that I am implementing I have this doubt

Vinoth said...

Any virtual memory that you request from the OS comes out of the per-process virtual address space. VirtualAllocEx and VirtualCopyEx have a particular advantage over other memory-sharing mechanisms such as memory-mapped files. With memory-mapped files you can request only memory that is available to the kernel, whereas with VirtualAllocEx and VirtualCopyEx we can map virtual memory onto contiguous physical memory that is not mapped by (not known to) the kernel at all. For example, say my design requires more than 512 MB of RAM. In CE 6.0 the maximum memory that can be mapped through OEMAddressTable is 512 MB, and that includes other peripherals that need a mapping. Perhaps 300 or 400 MB of RAM is mapped through OEMAddressTable; to map the remaining RAM we can use VirtualAllocEx and VirtualCopyEx at a known physical address, and share it between two processes as well. As explained in the post above, this helps when designing a custom product that requires a large amount of physically contiguous memory shared between two processes.

Sagar Bussa said...

Hi Vinoth,

I am trying to share some memory between an application and a miniport driver. I have gone through your post and tried to create a handle for my miniport driver's IOCTL interface, but opening the handle with the CreateFile API fails with the following error:
"you are requesting IOCTL_HAL_GET_DEVICE_INFO::SPI_GETPLATFORMTYPE, which has been deprecated. Use IOCTL_HAL_GET_DEVICE_INFO::SPI_GETPLATFORMNAME instead"

Can you please suggest how to create the handle for the driver?

Thanks
Sagar