Previously, we covered Basic Static Analysis and Basic Dynamic Analysis in Chapters 1 and 3 of Practical Malware Analysis. That marks the end of the first part of the book.
The fourth chapter kicks off the second part of the book and takes a slight detour to cover one of the most important prerequisites for malware analysis: assembly language. However, that crash course doesn’t have any exercises, which is why we’ll proceed straight to the exercises of the fifth chapter, i.e., IDA Pro. Let’s get to it!
Exercise 1
Hash | Name |
---|---|
1A9FD80174AAFECD9A52FD908CB82637 | Lab05-01.dll |
Spoiler; I’m going to use IDA Pro Free or IDA Free to complete these exercises.
Question Number 01: What is the address of DllMain?
Let’s quickly import the PE (DLL) file into IDA Pro. Since it conforms to the x86 architecture, we’ll let the disassembler process it using its default configuration.
Switch to text view and let’s look for the main function of the DLL. Once IDA’s initial analysis is complete, you’ll first be taken to the DllMain function, if it was identified. Otherwise, you’re likely going to land on DllEntryPoint.
Mind you, there’s a difference between the two. DllEntryPoint is not where the author’s code starts. In fact, the compiler adds several constructs and initialization functions, e.g., the C Runtime (CRT) initialization, to ensure the program is loaded correctly. One of the jobs of the CRT init code is to call the main function, so we have to skip past it. I’ve labelled these calls here and, after skipping the CRT calls, I’ve found the address of DllMain.
Another way to identify the Main function is to look for typical parameters being loaded onto the stack or the implementation of DllMain from MSDN.
Address: 0x1000D02E
Question Number 02: Use the Imports window to browse to ‘gethostbyname’. Where is the import located?
Simply head over to the Imports window and look for gethostbyname. Double click and we’ll be redirected to the function declaration in the disassembly.
Address: 0x100163CC
Fact: We can also see the section where the import is located, i.e., idata (which is responsible for referencing imports)
Question Number 03: How many functions call ‘gethostbyname’?
This sounds like an easy task for ‘cross-references’. Simply hover over or click on the import and press Ctrl+X to list the cross-references to it. The window lists 18 cross-references to the import, of two types: ‘r’ and ‘p’. However, we can see that each address/call statement appears twice (i.e., once for ‘r’ and once for ‘p’).
I couldn’t wrap my head around this initially, but this is what I’ve learned: a ‘p’ type cross-reference indicates an intra-segment (near) call, whereas an ‘r’ type cross-reference indicates read access. So, we see 18 references because 9 of them are the actual function calls, whereas the other 9 are marked as ‘read’ references because the CPU first has to read the import’s address and then call it.
Answer: 9 calls, made from 5 different functions
Fact: You can see that these 9 calls are spread across 5 sub-routines in the disassembly
Question Number 04: Focusing on the call to ‘gethostbyname’ located at 0x10001757, can you figure out which DNS request will be made?
Let’s head to the call at the address, 0x10001757, by pressing G and entering it. Judging by the name of the import, the function should take a ‘name’ (domain) as input and return its IP information. Since the parameters need to be pushed to the stack, let’s go back and look at the push statements (preceded by MOV statements into EAX).
We can see off_10019040 being moved into EAX (data, offset value) before being pushed onto the stack. Following the offset, we reach the string ‘aThisIsRdoPicsP’, which holds the name the hostname query will be made for, i.e., ‘pics.practicalmalwareanalysis.com’. However, the string is preceded by ‘[This Is Rdo]’. If we review the statements before the function call, we can see 0xD (13) being added to EAX, which moves the pointer past that prefix to the start of the actual hostname, ‘pics.practicalmalwareanalysis.com’.
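To make the pointer arithmetic concrete, here is a minimal Python sketch of what adding 0xD to the string pointer achieves. The byte string is copied from the text above; the helper name `skip_prefix` is made up for illustration and is not something found in the sample.

```python
# The string as quoted in the walkthrough: a bracketed tag, then the hostname.
raw = b"[This Is Rdo]pics.practicalmalwareanalysis.com"

PREFIX_LEN = 0xD  # the value added to EAX before the push (13 in decimal)


def skip_prefix(buf, offset=PREFIX_LEN):
    """Return the bytes that start `offset` bytes into the buffer,
    which is what the advanced pointer refers to."""
    return buf[offset:]


hostname = skip_prefix(raw).decode("ascii")
print(hostname)
```

The bracketed tag is exactly 13 characters long, so the advanced pointer lands on the first character of the hostname.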
Question Number 05: How many local variables has IDA Pro recognized for the subroutine at 0x10001656?
Let’s head over to the address by pressing G and entering it. Here, the function header doesn’t mark the function as using an EBP-based stack frame, and in the code we can see ESP being used to reference local variables. Let’s get back to the question though. In IDA’s stack listing, local variables sit at negative offsets from the base pointer. Counting them, the total comes to 23.
Answer: 23
Question Number 06: How many parameters has IDA Pro recognized for the subroutine at 0x10001656?
Similarly, parameters or arguments are stored at a positive offset from the Base Pointer. Here, only one parameter is recognized.
Answer: 1
Question Number 07: Use the Strings window to locate the string \cmd.exe /c in the disassembly. Where is it located?
Using Shift+F12, let’s open up the Strings sub-view. Searching for the keyword in there, we can see the address, `xdoors_d:10095B34` (in the form section:address).
Answer: 10095B34
Bonus: If we’re talking about the reference, the string is referenced at 100101D0
Question Number 08: What is happening in the area of code that references \cmd.exe /c?
Let’s head over to 100101D0. Firstly, judging by the red arrow on top of the block containing the `cmd.exe` string, it seems as if it never runs. On the right, we have an offset being pushed to the stack, i.e., `command.exe /c`, which is an incorrect command. Since there’s a GetSystemDirectoryA call just before, we know it was used to construct the path to `cmd.exe`, i.e., `C:\Windows\System32\cmd.exe /c`.
In general, if we scroll down and take a look at the code, we can see several calls to `closesocket` along with other networking calls such as receive and send, and data offsets being pushed to the stack which appear to be commands that would be run on the system. It’s likely this entire section is attempting to create a shell session for a remote attacker and run the desired commands.
Question Number 09: In the same area, at 0x100101C8, it looks like dword_1008E5C4 is a global variable that helps decide which path to take. How does the malware set dword_1008E5C4? (Hint: Use dword_1008E5C4’s cross-references.)
Reading the cross-references to `dword_1008E5C4` (the global used at 0x100101C8), we can see just one `w` (write) cross-reference in the table. Heading to that address, 0x10001678, we can see the variable holds the return value of a call to `sub_10003695` (located at that address, i.e., 0x10003695). The function calls `GetVersionExA`, which fills in an `OSVERSIONINFOA` data structure with operating system information about the host workstation. EAX itself is reset to 0, and then a comparison instruction checks whether the `dwPlatformId` field equals 2, i.e., VER_PLATFORM_WIN32_NT (a match suggests the OS is Windows 7, Windows Server 2008, Windows Vista, Windows Server 2003, Windows XP, or Windows 2000). If it is indeed 2, the AL register is set to 1 (since ZF is also 1). Otherwise, it stays 0. Newer NT-based releases report the same platform ID, so I’d expect the flag to be set on my Windows 10 VM as well.
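The check described above can be modelled in a few lines of Python. This is only a sketch of the logic as described in the text (zero EAX, compare the field with 2, set AL from the zero flag); the function name `platform_flag` is hypothetical, and the exact instruction sequence in the sample is assumed rather than quoted.

```python
VER_PLATFORM_WIN32_NT = 2  # documented dwPlatformId value for NT-based Windows


def platform_flag(dw_platform_id):
    """Model the result that ends up in the global variable."""
    eax = 0                                              # EAX is reset to 0 first
    zero_flag = dw_platform_id == VER_PLATFORM_WIN32_NT  # the comparison
    if zero_flag:
        eax |= 1                                         # AL set to 1 when ZF is 1
    return eax


print(platform_flag(2))  # NT-based platform
print(platform_flag(1))  # any other platform ID
```

The returned value is what gets written to `dword_1008E5C4`, so later code can branch on it without querying the OS version again.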
Question Number 10: A few hundred lines into the subroutine at 0x1000FF58, a series of comparisons use memcmp to compare strings. What happens if the string comparison to robotwork is successful (when memcmp returns 0)?
Firstly, I have no idea how `memcmp` works. Let’s head over to the C++ reference. So, the function checks whether two blocks of memory are the same or not. If they are, it returns 0. Otherwise, it returns a non-zero value indicating which one is greater.
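As a rough illustration of those semantics, here is a small Python model of `memcmp`. It is a sketch, not the C implementation: the real function only guarantees the sign of a non-zero result, and reading past the end of a buffer is undefined in C, whereas this version simply stops at the shorter input.

```python
def memcmp(a, b, n):
    """Compare the first n bytes of a and b.

    Returns 0 if they match, a negative number if the first differing
    byte in a is smaller, and a positive number if it is larger.
    """
    for x, y in zip(a[:n], b[:n]):
        if x != y:
            return x - y
    return 0


command = b"robotwork"
print(memcmp(b"robotwork 1234", command, len(command)) == 0)
```

Because only the first `n` bytes are compared, a received buffer that merely starts with `robotwork` still counts as a match.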
Now, if we find the `robotwork` reference and the call to memcmp, we can see that if it returns 0, the program heads to the following piece of code:
The function, `sub_100052A2`, first opens the Registry key, `SOFTWARE\Microsoft\Windows\CurrentVersion`, then attempts to query the values `WorkTime` and `WorkTimes` under it. If a value is non-zero, it is printed to the screen and later sent to the C2 server via the send function. However, if it is zero, the key is closed and the function ends.
Question Number 11: What does the export PSLIST do?
`PSLIST` is one of the many exports available in the DLL under investigation. Exploring it, we can see that it:
- Identifies the version of the operating system via a call to the function `sub_100036C3`
- Decides which function to call based on the output returned. Both functions in turn perform the same operations (more or less)
- Creates a snapshot of the running processes using `CreateToolhelp32Snapshot`
- Identifies the first process in the snapshot using `Process32First`
- Enumerates a process' modules and their respective file names
- Opens the file `xinstall.dll` in append mode and appends the data there using a specified format
- Sends the process listing to the C2 server using a `send` call
- Continues to enumerate processes using `Process32Next` and a loop
Question Number 12: Use the graph mode to graph the cross-references from sub_10004E79. Which API functions could be called by entering this function? Based on the API functions alone, what could you rename this function?
We can use the ‘cross-references from’ feature to view all cross-references from our desired function.
There’s only one API call in here: `GetSystemDefaultLangID`, which returns the language identifier of the system in question. We can rename the function to `getSystemLanguageIdentifier` by pressing ‘N’ and entering the new name.
Question Number 13: How many Windows API functions does DllMain call directly? How many at a depth of 2?
Using the function calls feature, we can check out the calls made from the DllMain function. We can also give the chart custom addresses to specify the range it should restrict itself to. DllMain runs from `1000D02E` to `1000D10A`. Setting that range, we have:
For depth 2, we have a large graph to view. I won’t be linking it here.
Question Number 14: At 0x10001358, there is a call to Sleep (an API function that takes one parameter containing the number of milliseconds to sleep). Looking backward through the code, how long will the program sleep if this code executes?
Let’s head over to the address, 0x10001358, by pressing ‘G’ and entering the address. The `Sleep` call takes a single parameter, which we see is pushed to the stack right before the call (`push eax`). Now, let’s find the value of EAX by backtracking through the function. We can see the first operation is where an offset (an address) is moved into EAX. The offset points to the string (say, a char array in C) `[This Is CTI]30`. Next, `0D` (hex) is added to EAX, which moves the pointer from the start of the array to `30`.
This string is then passed to `atoi`, which converts it to an integer. The integer is then multiplied by `3E8` (hex) and the resulting value (stored in EAX) is our answer. Here, 3E8 is 1000 in decimal. The multiplication yields 30,000 milliseconds, which is equivalent to 30 seconds.
Answer: 30 seconds.
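The same computation can be replayed in Python as a quick sanity check. The string is the one quoted above; `sleep_milliseconds` is a hypothetical helper name, and Python’s `int()` stands in for `atoi` (which, unlike `int()`, would stop at the first non-digit instead of raising an error).

```python
raw = b"[This Is CTI]30"  # the string the offset points to


def sleep_milliseconds(buf):
    """Skip the 0xD-byte tag, parse the digits, and scale by 0x3E8."""
    digits = buf[0xD:].decode("ascii")  # pointer + 0Dh -> "30"
    return int(digits) * 0x3E8          # atoi result * 1000


ms = sleep_milliseconds(raw)
print(ms, "ms =", ms // 1000, "seconds")
```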
Question Number 15: At 0x10001701 is a call to socket. What are the three parameters?
Socket takes in three parameters. They are:
- Domain
- Type
- Transfer Protocol
Here, we see the following values pushed to the stack:
- 6 (Protocol: set to TCP, which is officially assigned protocol number 6)
- 1 (Type: socket type, set to SOCK_STREAM, i.e., a reliable, connection-oriented stream, which is what TCP provides)
- 2 (AF: address family, 2 means AF_INET, which lets the socket use IPv4 addresses)
PS: I’ve referred to MSDN documentation for the Socket function to identify the numeric values being passed here instead of their labels.
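Python’s standard `socket` module exposes constants with the same names, so the mapping can be cross-checked outside of IDA. This is just a convenience sketch; the values come from the host platform’s headers and match the ones above on common platforms such as Windows and x86 Linux, which is assumed here.

```python
import socket

# Values pushed before the call, in the order listed above.
pushed = {"protocol": 6, "type": 1, "af": 2}

# The named constants we believe they correspond to.
named = {
    "protocol": socket.IPPROTO_TCP,
    "type": socket.SOCK_STREAM,
    "af": socket.AF_INET,
}

# Compare each raw number against its symbolic constant.
matches = {key: int(named[key]) == value for key, value in pushed.items()}
print(matches)
```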
Question Number 16: Using the MSDN page for socket and the named symbolic constants functionality in IDA Pro, can you make the parameters more meaningful? What are the parameters after you apply changes?
Symbolic constants, well as the name suggests, are simply constants which are given a symbolic label and their values don’t change throughout the program. Luckily, the values for the Socket call are also pre-defined symbolic values but IDA didn’t automatically label them. We can do so.
Select a value, e.g., 2 for the address family, and press M to open the Symbolic Constants dialog. You can either select one of the pre-identified symbols for the constant, define a new one, or search IDA’s own database of such symbols.
AF_INET and SOCK_STREAM were easier to find in IDA’s own library. However, IPPROTO_TCP (i.e., 6 for the Protocol parameter) was not available. I found an excellent blog on creating custom symbolic constants. The steps are highlighted as follows:
- Open the ‘Enumerations’ sub-view
- Create a default enumeration (Enumerations are simply custom data types and used for the same purpose; assign symbolic names to constant values)
- Select the newly created enumeration
- Press ‘N’ to add a new member to the enumeration
- Now define the new symbolic constant by giving it a name and value (preceded by 0x for a hex value)
- Assign the named symbol to the numeric value we previously couldn’t assign a symbol to
Here’s how the stack push instructions look now:
Question Number 17: Search for usage of the ‘in’ instruction (opcode 0xED). This instruction is used with a magic string VMXh to perform VMware detection. Is that in use in this malware? Using the cross-references to the function that executes the in instruction, is there further evidence of VMware detection?
We can search for the opcode under ‘Search for sequence of bytes’ and look for the instruction. Unluckily, I was unable to identify the address by myself. Peeking at the solutions, I see the author found the interesting instruction at the address 0x100061C7, where the hex value `564D5868h` is moved into EAX. Converting it to ASCII, we can see it corresponds to the string `VMXh`.
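The hex-to-ASCII conversion is easy to verify in Python. This sketch simply reads the four bytes of the constant from the most significant byte down, the same order in which the hex digits are written.

```python
magic = 0x564D5868  # the immediate moved into EAX

# Interpret the 32-bit value as four ASCII characters, high byte first.
text = magic.to_bytes(4, byteorder="big").decode("ascii")
print(text)
```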
We can then look at the cross-references to the function, `sub_10006196` (which I later renamed to `VMWareCheck`). This function simply checks for a VMware installation. If it detects one, it raises an alert, as we can see around the function call.
Question Number 18: Jump your cursor to 0x1001D988. What do you find?
As we’ve seen before in this chapter, these are likely raw bytes which could either be encoded/encrypted data, shellcode, or unreadable strings.
Question Number 19: If you have the IDA Python plug-in installed (included with the commercial version of IDA Pro), run Lab05-01.py, an IDA Pro Python script provided with the malware for this book. (Make sure the cursor is at 0x1001D988.) What happens after you run the script?
I believe IDAPython is not available for IDAFree which I’m currently running. I’ll be skipping this one.
Question Number 20: With the cursor in the same location, how do you turn this data into a single ASCII string?
Since I haven’t revealed the data by XOR’ing it with 0x55, it isn’t a readable string for me. However, I do know we’re able to convert data into ASCII by pressing the ‘A’ key.
Question Number 21: Open the script with a text editor. How does it work?
Here’s the script:
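sea = ScreenEA()                      # address currently under the cursor
for i in range(0x00, 0x50):           # walk 0x50 (80) bytes from there
    b = Byte(sea + i)                 # read one byte from the database
    decoded_byte = b ^ 0x55           # XOR it with the single-byte key
    PatchByte(sea + i, decoded_byte)  # write the decoded byte back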
Here’s an explanation:
- First, read the effective address (EA) of the cursor using the `ScreenEA` call
- Loop over 0x50 (80) bytes starting at that address
- Read the byte at the current offset from the EA
- Decode the byte by XORing it with 0x55
- Patch the byte at the same address with the decoded value in IDA’s database
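Since I can’t run the script in IDA Free, here is a standalone Python sketch of the same decoding loop that works on a plain byte string instead of the IDA database. The function name `xor_decode` and the sample input are made up for illustration; only the key (0x55) and the length (0x50) come from the script.

```python
def xor_decode(data, key=0x55, length=0x50):
    """XOR the first `length` bytes of `data` with `key`,
    leaving any remaining bytes untouched."""
    decoded = bytes(b ^ key for b in data[:length])
    return decoded + data[length:]


# Illustrative round trip: encode a sample string, then decode it again.
sample = b"example string"
encoded = xor_decode(sample)
print(encoded)
print(xor_decode(encoded))
```

XOR with a fixed key is its own inverse, which is why the same function both encodes and decodes. To use it on the real data, the 0x50 bytes at 0x1001D988 would first need to be exported from the file.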
That’s a wrap to this chapter! Let’s move on!