Chapter Six focused on code constructs and how analysts can easily identify them when walking through the disassembly in IDA. Let’s take a look at the exercises now.
Exercise 1
Hash | Name |
---|---|
6abde2f83015f066385d27cff6143c44 | Lab06-01.exe |
536e6f91d4515e30af7afd37f22c213fee152126 | Lab06-01.exe |
fe30f280b1d0a5e9cef3324c2e8677f55a6202599d489170ece125f3cd843a03 | Lab06-01.exe |
Question Number 1: What is the major code construct found in the only subroutine called by main?
Let’s get to work. The start function is located at the address 00401090. It’s best not to get lost in the disassembly IDA generates here. With some quick tracing, we can identify the main function, which is located at 00401040.
This function in turn calls a function at 00401000, which contains an if conditional construct that branches on the return value of the API call to InternetGetConnectedState.
Based on the connection’s status, a string offset is pushed to the stack and the function sub_40105F is called.
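As a rough illustration, the if construct in the function at 00401000 can be modeled as below. This is a Python sketch of the logic, not the original source: `get_connected_state` is a hypothetical stand-in for InternetGetConnectedState, the two message strings are placeholders rather than the strings in the binary, and the 1/0 return value is an assumption.

```python
def check_connection(get_connected_state):
    """Model of the function at 00401000: branch on the API result."""
    # get_connected_state is a hypothetical stand-in for the
    # InternetGetConnectedState call; truthy means "connected".
    if get_connected_state():
        message = "connected\n"        # placeholder string
        result = 1                     # assumed return value
    else:
        message = "not connected\n"    # placeholder string
        result = 0                     # assumed return value
    # In the binary, the chosen string offset is pushed to the stack and
    # sub_40105F is called; here we just print the string.
    print(message, end="")
    return result
```

Both branches end in the same call with a different string argument, which is the shape to look for in the graph view.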
Question Number 2: What is the subroutine located at 0x40105F?
Sadly, I was unable to answer this question correctly. After hours of tracking the arguments through the stack and going through function calls trying to understand it, I gave in. My initial suspicion was indeed that the function would somehow use the string offset pushed before the call (chosen based on the output of InternetGetConnectedState) and print it or write it to a file. I saw a few WriteFile calls as well, but the sheer size of the disassembly made it quite difficult to pin down the purpose of the function.
Well, the function was printf. The authors explain that a string offset being pushed to the stack right before a function call is a pretty good indicator that the function could be printing the string. Sadly, IDA didn’t recognize the function on its own and hours were lost trying to reverse a simple library function.
Question Number 3: What is the purpose of this program?
Using basic static analysis, we can identify that the program:
- Identifies the active internet connection status of the computer (and prints it as we’ve seen in the function disassembly above)
It could potentially be used by other malware to check the connection status of a computer. I’ll avoid digging into this binary any further (primarily due to my lack of interest after the blow from question number 2, haha).
Exercise 2
Hash | Name |
---|---|
c0b54534e188e1392f28d17faff3d454 | Lab06-02.exe |
bb6f01b1fef74a9cfc83ec2303d14f492a671f3c | Lab06-02.exe |
b71777edbf21167c96d20ff803cbcb25d24b94b3652db2f286dcd6efd3d8416a | Lab06-02.exe |
Question Number 1: What operation does the first subroutine called by main perform?
The disassembly points to the start function at 004011B0. We can find the main function at 00401130. However, IDA doesn’t label the function properly. We can rename it to main and the arguments should be adjusted automatically.
The first function called appears to be sub_401000, which is the same function as in Lab 6-1. It checks whether the system has an active internet connection.
Question Number 2: What is the subroutine located at 0x40117F?
Heh, the authors sure do love to challenge us. Yes, this time I can successfully tell that this function is indeed printf. Why?
- Arguments before the function call are string offsets (with line endings suggesting they might be printed to console or a file)
- Format characters like %d or %c
- Code constructs similar to what we saw in the last exercise
Question Number 3: What does the second subroutine called by main do?
The second subroutine called by main is at the address 00401040.
It appears to establish a connection to practicalmalwareanalysis.com to access the file cc.htm, and reads up to 0x200 (512) bytes of its content.
The parsing works by comparing the first four characters of the array (content read from the top of the webpage and currently stored in Buffer) against the opening of an HTML comment ('<!--'). If a comment is successfully parsed, the following character is stored in a register and treated as the command to be executed.
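The comparison chain can be sketched as follows. This is a Python model of the logic rather than the program’s actual code; the function name `parse_command` and the sample page content are made up for illustration, and the length check is an addition that the sketch needs to be safe in Python.

```python
def parse_command(buffer: bytes):
    """Return the character that follows '<!--', or None if absent."""
    # Added for safety in this sketch: we need at least five bytes.
    if len(buffer) < 5:
        return None
    # One comparison per byte, mirroring the chain of ifs in the
    # disassembly: Buffer[0], Buffer[1], Buffer[2], Buffer[3].
    if buffer[0] != ord("<"):
        return None
    if buffer[1] != ord("!"):
        return None
    if buffer[2] != ord("-"):
        return None
    if buffer[3] != ord("-"):
        return None
    # The fifth byte is the command.
    return chr(buffer[4])


page = b"<!--a-->\n<html></html>"    # hypothetical page content
print(parse_command(page[:0x200]))    # prints "a"
```

Each failed comparison bails out early, which is why the graph view shows a ladder of small blocks all jumping to the same failure path.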
Question Number 4: What type of code construct is used in this subroutine?
If it’s the second subroutine in question, it’s a series of if-else-if conditionals. Once the call to InternetReadFile is made and the HTML file is downloaded, the first four elements of the buffer (where the file’s content is temporarily placed) go through a series of comparisons to identify a comment; the fifth character is the actual command the program uses to continue execution.
PS: I learned how to fix stack variables and change their types using this exercise’s solutions. By default, the comparisons don’t show that the variables being compared are simply consecutive one-byte offsets into the Buffer character array. We can fix this by setting the type of Buffer to an array of size 200h, which is 512 in decimal (the number of bytes to read is pushed to the stack before the call, so we know Buffer is 512 bytes in size). Once done, you’ll see how IDA does its magic and indexes into the buffer array directly instead of inventing separate variables for each offset.
Question Number 5: Are there any network-based indicators for this program?
The command acquisition function called by main makes several network calls, from which we can acquire the following indicators:
- http://www.practicalmalwareanalysis.com/cc.htm (URL/Webpage)
- Internet Explorer 7.5/pma (User-agent String)
Question Number 6: What is the purpose of this malware?
The malware checks the connection status of the compromised system, acquires a command from the C2 server (based on the provided URL and the file therein), and displays it on the console.
Exercise 3
Hash | Name |
---|---|
3f8e2b945deba235fa4888682bd0d640 | Lab06-03.exe |
d4e234ec4baf7d12dd59c3a9238326819a509a31 | Lab06-03.exe |
75eb05679a0a988dddf8badfc6d5996cc7e372c73e1023dde59efbaab6ece655 | Lab06-03.exe |
Question Number 1: Compare the calls in main to Lab 6-2’s main method. What is the new function called from main?
There’s just one additional function call in main here. It’s located at the address 00401130.
Question Number 2: What parameters does this new function take?
It takes two parameters:
- lpExistingFilename (which, if we backtrack, is argv[0], the first element of the argv array, i.e. the program’s own name)
- Character (the command character returned by the HTML parsing function we analyzed in the last exercise; it selects the switch case)
Question Number 3: What major code construct does this function contain?
The major code construct in this function is a switch (along with jump tables).
Question Number 4: What can this function do?
Depending on the character input provided to the function, the switch can:
- Create a directory
- Copy a file
- Delete a file
- Set a Registry key value
- Sleep for 100 seconds
- Default Case: Print an error
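To make the switch-with-jump-table idea concrete, here is a small Python model. It is only a sketch: the handlers just return labels instead of touching the system, and both the base character 'a' and the order of the cases are assumptions made for illustration, not details taken from the write-up above.

```python
# Handlers only return a label; the real cases call Windows APIs.
def create_directory():
    return "create directory"

def copy_file():
    return "copy file"

def delete_file():
    return "delete file"

def set_registry_value():
    return "set registry value"

def sleep_case():
    return "sleep"

# The "jump table": one entry per case, indexed from zero.
JUMP_TABLE = [create_directory, copy_file, delete_file,
              set_registry_value, sleep_case]

def dispatch(command: str) -> str:
    # Compilers typically subtract the lowest case value first so the
    # cases become table indexes 0..N-1. The base 'a' is an assumption.
    index = ord(command) - ord("a")
    # A single bounds check sends everything else to the default case.
    if index < 0 or index >= len(JUMP_TABLE):
        return "error: unknown command"
    return JUMP_TABLE[index]()


print(dispatch("b"))   # prints "copy file"
print(dispatch("z"))   # prints "error: unknown command"
```

In the disassembly this shows up as a subtraction, a compare against the highest index, and an indirect jump through a table of addresses.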
Question Number 5: Are there any host-based indicators for this malware?
The function with the switch has several host-based indicators which we can use to drive detections. They’re listed below:
- Directory: C:\Temp\
- Filename: CC.exe
- Registry Key: Software\Microsoft\Windows\CurrentVersion\Run
- Registry Value Name: Malware
Question Number 6: What is the purpose of this malware?
The purpose of this malware is to check for an active network connection, download an HTML file, and parse a comment from it. Then, based on the command from the server, it will either create a directory, copy the malware, delete it, or set a value under the Run registry key to persist (which is how malware re-executes when the system is rebooted and the user logs in).
Exercise 4
Hash | Name |
---|---|
21be74dfafdacaaab1c8d836e2186a69 | Lab06-04.exe |
5b0afb3069346a8e00b3786af0908783a5f304b4 | Lab06-04.exe |
cce96e5cb884c565c75960c41f53a7b56cef1a3ff5b9893cd81c390fd0c35ef3 | Lab06-04.exe |
Question Number 1: What is the difference between the calls made from the main method in Labs 6-3 and 6-4?
First things first: let’s apply the renames and type fixes we learned in the previous labs so we don’t repeat our analysis.
Structural changes can be noticed in the main function. Here’s a list of addresses of functions called from inside of main:
- 0x00401000 (Checks internet connection status)
- 0x00401040 (Connect to internet, acquire HTML, parse the command from comment)
- 0x004012B5 (Printf)
- 0x00401150 (Switch + Jump-tables)
Question Number 2: What new code construct has been added to main?
It’s a for loop added to the main function. It lets main repeat the network call procedure and acquire a new command for the switch to execute on each iteration.
Question Number 3: What is the difference between this lab’s parse HTML function and those of the previous labs?
It appears that the user-agent string this time contains a format specifier, %d (used for integers), to perhaps “do” something to the string. After another hour of analysis, it finally clicked that the format specifier might be pointing to something here. Here’s my thought process:
- %d suggests a format string is involved; could the function be printing the value? printf is already labeled, so what function could it be?
Turns out, sprintf is yet another C function that takes a format string but behaves a tad differently than printf. Rather than printing to the console or standard output, it builds the formatted string and stores it in a buffer. In our case, I’ve renamed the function and labeled its parameters as they should be.
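The effect of that sprintf call can be modeled in a couple of lines. This is a Python sketch of the behavior, not the program’s code; the function name `build_user_agent` is made up, and Python’s `%` formatting stands in for C’s sprintf here.

```python
# Format string used for the user-agent in this lab.
USER_AGENT_FORMAT = "Internet Explorer 7.50/pma%d"

def build_user_agent(counter: int) -> str:
    # sprintf writes the formatted result into a buffer; in Python the
    # % operator simply returns the formatted string.
    return USER_AGENT_FORMAT % counter


print(build_user_agent(0))    # prints "Internet Explorer 7.50/pma0"
print(build_user_agent(59))   # prints "Internet Explorer 7.50/pma59"
```

Nothing is written to the console by the formatting step itself, which is the key difference from printf when you see a format string being pushed.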
Question Number 4: How long will this program run? (Assume that it is connected to the Internet.)
To answer this one, let’s go back to the main function. We see a loop whose counter is initialized to 0 and runs until it reaches 1440. Each iteration ends with a sleep call set to 60 seconds, so every iteration takes about a minute. So, the total run time is 1440 minutes, which is equivalent to 24 hours.
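The arithmetic is easy to check with a small model of the loop. This is a Python sketch under simplifying assumptions: the constant and function names are made up, the sleep is injected as a no-op so the example finishes instantly, and time spent on network calls or an early exit on failure is ignored.

```python
ITERATIONS = 1440      # loop bound seen in main
SLEEP_SECONDS = 60     # sleep at the end of each iteration

def total_runtime_seconds(sleep=lambda seconds: None):
    """Walk the loop and add up the time spent sleeping."""
    elapsed = 0
    for minute in range(ITERATIONS):
        # Each iteration would fetch a command and run the switch here.
        sleep(SLEEP_SECONDS)        # no-op by default in this sketch
        elapsed += SLEEP_SECONDS
    return elapsed


seconds = total_runtime_seconds()
print(seconds // 60, "minutes =", seconds // 3600, "hours")
# prints "1440 minutes = 24 hours"
```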
Question Number 5: Are there any new network-based indicators for this malware?
The C2 URL and filename indicators are the same. The only difference here is the new user-agent string generated at run time.
User-agent String: Internet Explorer 7.50/pma%d (%d is the loop counter from main, i.e. the number of minutes the program has been running)
Question Number 6: What is the purpose of this malware?
Now that we’re at the final question of this entire chapter, let’s summarize.
The malware first checks for an active network connection. If it finds one, it prints the status to the console and then attempts to connect to the specified address, pull the HTML file, parse a comment, and use the character that follows the comment marker to execute one of several commands through a switch. What’s unique here is that the user-agent string used for the connection is generated dynamically from the number of minutes the program has been running rather than being a static string. It runs for a total of 24 hours and exits soon after. It’s likely going to continue the infection using the malware it might’ve copied to disk.