Reverse Engineering Malware

I recently got the opportunity to sit in on (fellow SANS instructor) Lenny Zeltser's "Reverse Engineering Malware" class. It's a terrific course, and I highly recommend it.


During the material on memory analysis, we were comparing the output of "volatility pslist" and "volatility psscan2". It's relatively straightforward for rootkits to hide themselves from pslist, but psscan2 does a much more thorough job of finding the hidden processes. So the differences in the output are always very interesting to the analyst. Here's an example of what I mean:


$ volatility pslist -f memory.img
Name                 Pid    PPid   Thds   Hnds   Time
System               4      0      55     260    Thu Jan 01 00:00:00 1970
smss.exe             540    4      3      21     Thu Jan 28 16:11:40 2010
csrss.exe            604    540    12     363    Thu Jan 28 16:11:46 2010
lsass.exe            684    628    18     341    Thu Jan 28 16:11:47 2010
vmacthlp.exe         836    672    1      24     Thu Jan 28 16:11:47 2010
svchost.exe          848    672    18     201    Thu Jan 28 16:11:47 2010
svchost.exe          1024   672    51     1178   Thu Jan 28 16:11:47 2010
svchost.exe          1072   672    4      75     Thu Jan 28 16:11:47 2010
svchost.exe          1132   672    15     212    Thu Jan 28 16:11:48 2010
spoolsv.exe          1476   672    10     115    Thu Jan 28 16:11:49 2010
explorer.exe         1592   1572   12     4021   Thu Jan 28 16:11:50 2010
VMwareUser.exe       1656   1592   8      416    Thu Jan 28 16:11:50 2010
VMwareService.e      1996   672    3      1026   Thu Jan 28 16:11:58 2010
wscntfy.exe          1396   1024   1      27     Thu Jan 28 16:12:03 2010
taskmgr.exe          1624   628    3      20201  Tue Feb 02 02:45:05 2010
mike022.exe          1956   672    2      30     Tue Feb 02 03:25:29 2010
wordpad.exe          1992   1260   4      102    Tue Feb 02 22:17:03 2010
calc.exe             828    1592   1      26     Thu Feb 04 00:01:00 2010
cmd.exe              968    1592   1      32     Thu Feb 04 00:01:13 2010
wordpad.exe          2008   1256   5      101    Thu Feb 04 00:02:56 2010
$ volatility psscan2 -f memory.img
PID    PPID   Time created             Time exited              Offset     PDB        Remarks
------ ------ ------------------------ ------------------------ ---------- ---------- ----------------

   932    672 Thu Jan 28 16:11:47 2010                          0x01ea3558 0x082c0100 svchost.exe     
  1744    848 Thu Feb 04 00:02:53 2010 Thu Feb 04 00:04:23 2010 0x01eaea88 0x082c0380 wmiprvse.exe    
  1132    672 Thu Jan 28 16:11:48 2010                          0x01eb4970 0x082c0160 svchost.exe     
  1956    672 Tue Feb 02 03:25:29 2010                          0x020155d8 0x082c02c0 mike022.exe     
  1072    672 Thu Jan 28 16:11:47 2010                          0x02016978 0x082c0140 svchost.exe     
  1172   1592 Tue Feb 02 02:40:48 2010                          0x0204c850 0x082c01c0 cmd.exe         
  1476    672 Thu Jan 28 16:11:49 2010                          0x0209db38 0x082c01a0 spoolsv.exe     
  1996    672 Thu Jan 28 16:11:58 2010                          0x021f0da0 0x082c0180 VMwareService.e 
  1664   1592 Thu Jan 28 16:11:50 2010                          0x021feb88 0x082c0240 msmsgs.exe      
  1024    672 Thu Jan 28 16:11:47 2010                          0x02202880 0x082c0120 svchost.exe     
   604    540 Thu Jan 28 16:11:46 2010                          0x0221f020 0x082c0040 csrss.exe       
  1624    628 Tue Feb 02 02:45:05 2010                          0x02256da0 0x082c02e0 taskmgr.exe     
   272   1820 Thu Feb 04 00:00:55 2010                          0x02293b08 0x082c0300 wordpad.exe     
  1012    672 Thu Jan 28 16:12:02 2010                          0x023a78b0 0x082c0260 alg.exe         
  1656   1592 Thu Jan 28 16:11:50 2010                          0x023a9c28 0x082c0220 VMwareUser.exe  
  1648   1592 Thu Jan 28 16:11:50 2010                          0x023ae980 0x082c0200 VMwareTray.exe  
   848    672 Thu Jan 28 16:11:47 2010                          0x023b3020 0x082c00e0 svchost.exe     
  1748   1592 Thu Feb 04 00:02:10 2010 Thu Feb 04 00:06:19 2010 0x0240b9a0 0x082c03a0 cmd.exe         
   836    672 Thu Jan 28 16:11:47 2010                          0x02412b58 0x082c00c0 vmacthlp.exe    
   672    628 Thu Jan 28 16:11:47 2010                          0x02448cf8 0x082c0080 services.exe    
   968   1592 Thu Feb 04 00:01:13 2010                          0x024707e8 0x082c0340 cmd.exe         
   684    628 Thu Jan 28 16:11:47 2010                          0x02483da0 0x082c00a0 lsass.exe       
  1992   1260 Tue Feb 02 22:17:03 2010                          0x02491130 0x082c0360 wordpad.exe     
  1396   1024 Thu Jan 28 16:12:03 2010                          0x02492d78 0x082c0280 wscntfy.exe     
  2008   1256 Thu Feb 04 00:02:56 2010                          0x02494988 0x082c03e0 wordpad.exe     
   828   1592 Thu Feb 04 00:01:00 2010                          0x024c86b8 0x082c02a0 calc.exe        
  1592   1572 Thu Jan 28 16:11:50 2010                          0x024ddda0 0x082c01e0 explorer.exe    
   540      4 Thu Jan 28 16:11:40 2010                          0x024f8368 0x082c0020 smss.exe        
   628    540 Thu Jan 28 16:11:46 2010                          0x025314e8 0x082c0060 winlogon.exe    
     4      0                                                   0x025c8830 0x00319000 System
 
Visually you can see that the psscan2 output lists several more 
processes than pslist, but just using your eyeballs it can be difficult 
to figure out exactly what the differences are.  Seems like a job for 
command-line kung fu!

My first thought was to simply extract the 
list of .EXEs from each command and diff them.  In order to do the diff 
properly, I'll need to sort them into canonical order, but that's no 
problem.  Here's how we manage the output from pslist:

$ volatility pslist -f memory.img | tail -n +2 | awk '{print $1}' | sort
calc.exe
cmd.exe
csrss.exe
...

I use tail to chop off the header line, then awk to extract the name of
the .EXE from the first column, and finally pipe the whole thing into
sort.

Dealing with the psscan2 output is very similar:

$ volatility psscan2 -f memory.img | tail -n +4 | awk '{print $NF}' | sort
alg.exe
calc.exe
cmd.exe
...

In this case, there are three header lines we need to skip.  Also the .EXE
name is in the last column of output-- "print $NF" is a useful awk
idiom for printing the value in the last column.

So now we need to diff the output of these two commands.  We could do this
by creating temporary files, but why bother when we have the magic bash
"<(...)" syntax that lets us substitute command output in a place where a
command would normally be looking for a file name:

$ diff <(volatility psscan2 -f memory.img | tail -n +4 | awk '{print $NF}' | sort) \
       <(volatility pslist -f memory.img | tail -n +2 | awk '{print $1}' | sort)
1d0
< alg.exe
4,5d2
< cmd.exe
< cmd.exe
10,11d6
< msmsgs.exe
< services.exe
18d12
< svchost.exe
23d16
< VMwareTray.exe
25,27d17
< winlogon.exe
< wmiprvse.exe
< wordpad.exe
 
Wicked!  There are 10 processes that appear in the psscan2 output that 
don't show up in the pslist output.  Since we don't see any lines 
starting with ">" there are no processes in the pslist output that 
don't show up in psscan2-- this is what we'd expect, but it's always 
nice to get confirmation.
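
If all you want is the one-sided list of names, comm is a close cousin of diff that may be worth a look.  What follows is a minimal sketch on a couple of made-up name lists, not real volatility output.  The temporary file names are placeholders, and it uses temp files rather than "<(...)" only so that it runs in any POSIX shell:

```shell
#!/bin/sh
# Sketch: list names present in a "psscan2" name list but missing from a
# "pslist" name list. Both lists are small illustrative samples.
export LC_ALL=C                     # same collation for sort and comm

scan=$(mktemp) && list=$(mktemp) || exit 1
trap 'rm -f "$scan" "$list"' EXIT

printf '%s\n' alg.exe cmd.exe cmd.exe cmd.exe csrss.exe services.exe | sort > "$scan"
printf '%s\n' cmd.exe csrss.exe | sort > "$list"

# -2 drops lines found only in the second file, -3 drops lines common to
# both, so only lines unique to the first file remain. Duplicates are paired
# off one for one, so two of the three cmd.exe entries survive.
hidden=$(comm -23 "$scan" "$list")
printf '%s\n' "$hidden"
```

Like diff, this only tells you which names are extra, not which PIDs they belong to.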

The only problem here is that as we got
further into the in-class exercises, I realized I really wanted all of
the extra detail about each of the hidden processes from the psscan2
output.  For example, the hex offset values end up being very useful,
and I'd like to know exactly which two of the three cmd.exe
processes are the hidden ones.  Let me show you the command line I came
up with and then explain it to you:

$ join -v 1 -1 1 -2 2 \
    <(volatility psscan2 -f memory.img | tail -n +4 | sort -n -k 1,1) \
    <(volatility pslist -f memory.img | tail -n +2 | sort -n -k2,2)
272 1820 Thu Feb 04 00:00:55 2010 0x02293b08 0x082c0300 wordpad.exe
628 540 Thu Jan 28 16:11:46 2010 0x025314e8 0x082c0060 winlogon.exe
672 628 Thu Jan 28 16:11:47 2010 0x02448cf8 0x082c0080 services.exe
932 672 Thu Jan 28 16:11:47 2010 0x01ea3558 0x082c0100 svchost.exe
join: file 1 is not in sorted order
join: file 2 is not in sorted order
1012 672 Thu Jan 28 16:12:02 2010 0x023a78b0 0x082c0260 alg.exe
1172 1592 Tue Feb 02 02:40:48 2010 0x0204c850 0x082c01c0 cmd.exe
1648 1592 Thu Jan 28 16:11:50 2010 0x023ae980 0x082c0200 VMwareTray.exe
1664 1592 Thu Jan 28 16:11:50 2010 0x021feb88 0x082c0240 msmsgs.exe
1744 848 Thu Feb 04 00:02:53 2010 Thu Feb 04 00:04:23 2010 0x01eaea88 0x082c0380 wmiprvse.exe
1748 1592 Thu Feb 04 00:02:10 2010 Thu Feb 04 00:06:19 2010 0x0240b9a0 0x082c03a0 cmd.exe

In this case I'm using join rather than diff because the output of the two
commands is so differently formatted.  Essentially I'm doing a join on
the PID columns of the psscan2 ("-1 1") and pslist ("-2 2") output and
telling join to output the non-matching lines from psscan2 ("-v 1").
The tricky bit is that each command output needs to be sorted by
its PID column for join to work.  So if you look in the "<(...)"
clauses, you'll see that the final element of the pipeline in each case
is a numeric sort on the PID column.  Easy, right?

The only fly in the ointment is the "not in sorted order" error messages
from join.  The problem is that join only understands alphabetic sorting.
So when we go from 9xx PIDs to 1xxx PIDs, join thinks the file has gone all
unsorted.  There's no "-n" option to join like there is for sort, but in
some versions of join we can use the "--nocheck-order" option to
suppress the error messages:

$ join -v 1 -1 1 -2 2 --nocheck-order \
    <(volatility psscan2 -f memory.img | tail -n +4 | sort -n -k 1,1) \
    <(volatility pslist -f memory.img | tail -n +2 | sort -n -k2,2)
272 1820 Thu Feb 04 00:00:55 2010 0x02293b08 0x082c0300 wordpad.exe
628 540 Thu Jan 28 16:11:46 2010 0x025314e8 0x082c0060 winlogon.exe
672 628 Thu Jan 28 16:11:47 2010 0x02448cf8 0x082c0080 services.exe
932 672 Thu Jan 28 16:11:47 2010 0x01ea3558 0x082c0100 svchost.exe
1012 672 Thu Jan 28 16:12:02 2010 0x023a78b0 0x082c0260 alg.exe
1172 1592 Tue Feb 02 02:40:48 2010 0x0204c850 0x082c01c0 cmd.exe
1648 1592 Thu Jan 28 16:11:50 2010 0x023ae980 0x082c0200 VMwareTray.exe
1664 1592 Thu Jan 28 16:11:50 2010 0x021feb88 0x082c0240 msmsgs.exe
1744 848 Thu Feb 04 00:02:53 2010 Thu Feb 04 00:04:23 2010 0x01eaea88 0x082c0380 wmiprvse.exe
1748 1592 Thu Feb 04 00:02:10 2010 Thu Feb 04 00:06:19 2010 0x0240b9a0 0x082c03a0 cmd.exe

The other alternative is obviously to sort the PID columns alphabetically,
but that offends my sensibilities somehow.
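
If the sort-order fuss gets tiresome, one way around it is to skip join entirely and let awk remember the PIDs in a lookup table.  To be clear, this is a different technique from the join above, sketched on a handful of rows copied from the sample output.  The temp files stand in for the two volatility commands, and it assumes process names contain no spaces (otherwise the PID would not be field 2 of the pslist rows):

```shell
#!/bin/sh
# Sketch: print psscan2-style rows whose PID (field 1) never appears as a
# PID (field 2) in the pslist-style rows. No sorting is required.
list=$(mktemp) && scan=$(mktemp) || exit 1
trap 'rm -f "$list" "$scan"' EXIT

cat > "$list" <<'EOF'
smss.exe             540    4      3      21     Thu Jan 28 16:11:40 2010
cmd.exe              968    1592   1      32     Thu Feb 04 00:01:13 2010
svchost.exe          1024   672    51     1178   Thu Jan 28 16:11:47 2010
EOF

cat > "$scan" <<'EOF'
   540      4 Thu Jan 28 16:11:40 2010                          0x024f8368 0x082c0020 smss.exe
   968   1592 Thu Feb 04 00:01:13 2010                          0x024707e8 0x082c0340 cmd.exe
  1012    672 Thu Jan 28 16:12:02 2010                          0x023a78b0 0x082c0260 alg.exe
  1024    672 Thu Jan 28 16:11:47 2010                          0x02202880 0x082c0120 svchost.exe
  1172   1592 Tue Feb 02 02:40:48 2010                          0x0204c850 0x082c01c0 cmd.exe
EOF

# While reading the first file (NR == FNR), record each PID. For the second
# file, print only the rows whose PID was never recorded.
hidden=$(awk 'NR == FNR { seen[$2] = 1; next } !($1 in seen)' "$list" "$scan")
printf '%s\n' "$hidden"
```

Unlike join, this prints the surviving psscan2 rows with their original column spacing intact.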
 
Mmmm, hmmm!  That was some tasty fu!  Hey Tim, volatility runs on 
Windows-- what can you do with the output?  I double-dog-dare you to try
 it in CMD.EXE first...
Tim skipped school:

Do cmd.exe, dang Hal. Happy Freaking New Year to me, huh?

Here is what I came up with based on the assumption that pslist returns a subset of psscan2.

C:\> python.exe volatility pslist -f memory.img > pslist.txt
C:\> cmd /v:on /c "for /F "skip=2 tokens=1,5,10,15" %a in ('python.exe volatility psscan2 -f memory.img') do
  @(if not "%d"=="" (set name=%d) else (if not "%c"=="" (set name=%c) else (set name=%b))) &
  set pid=%a & (type pslist.txt | findstr /B /R /C:"!name! *!pid! " > NUL || echo !name! !pid!)"

svchost.exe 932
wmiprvse.exe 1744
cmd.exe 1172
msmsgs.exe 1664
wordpad.exe 272
alg.exe 1012
VMwareTray.exe 1648
cmd.exe 1748
services.exe 672
winlogon.exe 628

I split this command into two for the sake of readability; however, it
could easily be combined into a one-liner. But I'll leave that simple
experiment to you. The first line takes the output of pslist and dumps
the contents into a file. This file will be read numerous times, so it is
significantly faster to just read the file in the second "half" of our
command. Now, regarding that second half...

We start off by invoking our shell with /v:on to enable delayed variable
expansion and /c to cause our spawned shell to exit upon completion. Inside
the shell we use our trusty For loop. The header lines are skipped. The For
loop then splits the line based on white space. We are
trying to get the name of the process, and due to spacing, it may be in
the 5th, 10th, or 15th token. Yes, it is that confusing. Here is a
little diagram of what I mean:

PID    PPID   Time created             Time exited              Offset     PDB        Remarks
------ ------ ------------------------ ------------------------ ---------- ---------- ----------------

Token1      2   3   4  5        6    7                                   8          9     10
   932    672 Thu Jan 28 16:11:47 2010                          0x01ea3558 0x082c0100 svchost.exe

Token1      2   3   4  5        6    7   8   9 10       11   12         13         14     15
  1744    848 Thu Feb 04 00:02:53 2010 Thu Feb 04 00:04:23 2010 0x01eaea88 0x082c0380 wmiprvse.exe

Token1      2                                                            3          4      5
     4      0                                                   0x025c8830 0x00319000 System

Our for loop will give us 4 variables a, b, c, and d which represent the
1st, 5th, 10th, and 15th token. We have to use a little trick to figure
out which of the three variables contains the process name by checking
each variable from right to left. If %d is not empty, then it contains
the process name so we set Name equal to %d. If %d is empty we try %c,
and if %c is empty we use %b. For the sake of nice variable names we set
!pid! equal to %a. We then have the variable !pid!, which contains the
process id, and !name!, which contains the process name.
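
To see the shifting token positions for yourself, here is a small sketch that counts the whitespace-separated tokens in the three rows from the diagram.  It is written for a POSIX shell with awk rather than cmd.exe, purely as an illustration of the 5/10/15 pattern:

```shell
#!/bin/sh
# Sketch: show how many whitespace-separated tokens each psscan2 row has
# and which token is last. The rows are copied from the sample output.
rows=$(cat <<'EOF'
   932    672 Thu Jan 28 16:11:47 2010                          0x01ea3558 0x082c0100 svchost.exe
  1744    848 Thu Feb 04 00:02:53 2010 Thu Feb 04 00:04:23 2010 0x01eaea88 0x082c0380 wmiprvse.exe
     4      0                                                   0x025c8830 0x00319000 System
EOF
)

# NF is the token count for the row; $NF is the last token (the name).
summary=$(printf '%s\n' "$rows" | awk '{ print NF, $NF }')
printf '%s\n' "$summary"
```

The name is always the last token, but "last" means the 10th, 15th, or 5th token depending on which timestamps are present.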

We then 
search the pslist.txt file to see if the current process, represented by
 !name! and !pid!, is in the file. We output the file, using the Type 
command, and use FindStr to search for the matching name and process id.
 The /B switch says our search string must be at the beginning of the 
line, the /R enables regular expression searches.  The default FindStr 
setting is to treat a space in our search string as a logical OR, but 
the /C switch "uses [the] specified string as a literal search string," 
meaning it doesn't treat a space as a logical OR.  In short, it looks 
for the process name at the beginning of the line, followed by some 
number of spaces, then the process id, and then another space.

We then use the logical OR (||) in conjunction with the FindStr command to
determine whether FindStr found something or not. This trick has been
used repeatedly, but most recently in episode 122.
If FindStr doesn't find anything we then output the process name and
PID. This effectively gives us a list of processes that are found with
psscan2 but not pslist.
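
For readers more at home in a Unix shell, the same "search quietly, echo on failure" pattern can be sketched with grep.  This is an analogy on sample rows, not a translation of the exact command above.  The temp file stands in for pslist.txt, and the name/PID pairs are fed in by hand:

```shell
#!/bin/sh
# Sketch: report name/PID pairs that have no matching line in a
# pslist-style file, using "grep -q ... || echo".
list=$(mktemp) || exit 1
trap 'rm -f "$list"' EXIT

cat > "$list" <<'EOF'
svchost.exe          848    672    18     201    Thu Jan 28 16:11:47 2010
cmd.exe              968    1592   1      32     Thu Feb 04 00:01:13 2010
EOF

# The pattern is: name at start of line, one or more spaces, the PID, then
# a space. grep -q prints nothing and succeeds on a match, so the echo after
# || runs only for pairs that are absent. (The "." in a name is a regex
# wildcard here, which is loose but harmless for this data.)
missing=$(
  printf '%s\n' 'svchost.exe 848' 'svchost.exe 932' 'cmd.exe 968' 'cmd.exe 1172' |
  while read -r name pid; do
    grep -q "^$name  *$pid " "$list" || echo "$name $pid"
  done
)
printf '%s\n' "$missing"
```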

Now for a more robust solution using...
PowerShell

I'm going to deviate into script land here, only because this mini-script
may be very useful for manipulating the output of these commands. It
will take the output and objectify it.

Objectifying pslist:

PS C:\> $null, $pslist = python volatility pslist -f memory.img
PS C:\> [regex]$regex = '(?<Name>\S+)\s+(?<PID>[0-9]+)\s+(?<PPID>[0-9]+)\s+(?<Threads>[0-9]+)\s+(?<Handles>[0-9]+)\s+(?<Time>.+)'
PS C:\> $pslistobjects = foreach ($p in $pslist) {
...        $psobj = "" | Select-Object Name, PID, PPID, Threads, Handles, Time
...        $p -match $regex | Out-Null
...        $psobj.Name = $matches.Name
...        $psobj.PID = $matches.PID
...        $psobj.PPID = $matches.PPID
...        $psobj.Threads = $matches.Threads
...        $psobj.Handles = $matches.Handles
...        $psobj.Time = [datetime]::ParseExact($matches.Time.Trim(), "ddd MMM dd HH:mm:ss yyyy", $null)
...        $psobj
...     }

PS C:\> $pslistobjects | Format-Table
Name            PID  PPID Threads Handles Time
----            ---  ---- ------- ------- ----
System          4    0    55      260     1/1/1970 12:00:00 AM
smss.exe        540  4    3       21      1/28/2010 4:11:40 PM
csrss.exe       604  540  12      363     1/28/2010 4:11:46 PM
...

This takes the output from pslist and converts it to PowerShell objects. Let's look at each line, one at a time.

PS C:\> $null, $pslist = python volatility pslist -f memory.img

Here we get the output from pslist, send the first line to null, and the
remainder is put into the variable pslist. This effectively skips the
first line (header).

PS C:\> [regex]$regex = '(?<Name>\S+)\s+(?<PID>[0-9]+)\s+(?<PPID>[0-9]+)\s+(?<Threads>[0-9]+)\s+(?<Handles>[0-9]+)\s+(?<Time>.+)'

The next chunk sets up our Regular Expression with named groupings.

PS C:\> $pslistobjects = foreach ($p in $pslist) {
...        $psobj = "" | Select-Object Name, PID, PPID, Threads, Handles, Time
...        $p -match $regex | Out-Null
...        $psobj.Name = $matches.Name
...        $psobj.PID = $matches.PID
...        $psobj.PPID = $matches.PPID
...        $psobj.Threads = $matches.Threads
...        $psobj.Handles = $matches.Handles
...        $psobj.Time = [datetime]::ParseExact($matches.Time.Trim(), "ddd MMM dd HH:mm:ss yyyy", $null)
...        $psobj
...     }

Inside the foreach loop is where the heavy lifting is done. First, an
empty object is created. Then the Match operator is used to match the
string using the regular expression and automatically populate the
$matches variable. We then set each property of our object. The Time
property is a bit special since the time format used by pslist isn't one
of the formats that PowerShell/Windows natively understands, so we parse
it with an explicit format string. The
variable $pslistobjects then contains PowerShell'ed objects from
volatility's pslist. We can then sort, filter, or perform all sorts
of tricks once it has been PowerShellized.

A similar mini-script will objectify the output from psscan2:

PS C:\> $null, $null, $null, $psscan2 = \python25\python.exe volatility psscan2 -f memory.img
PS C:\> [regex]$regex = '\s*?(?<PID>[0-9]+)\s+(?<PPID>[0-9]+)\s(?<Created>.{24})\s(?<Exited>.{24})\s(?<Offset>[0-9a-fx]{10})\s(?<PDB>[0-9a-fx]{10})\s(?<Name>.+)'
PS C:\> $psscan2objects = foreach ($p in $psscan2) {
...        $psobj = "" | Select-Object Name, PID, PPID, Created, Exited, Offset, PDB
...        $p -match $regex | Out-Null
...        $psobj.Name = $matches.Name
...        $psobj.PID = $matches.PID
...        $psobj.PPID = $matches.PPID
...        $psobj.Offset = $matches.Offset
...        $psobj.PDB = $matches.PDB
...        if ($matches.Created.Trim()) {
...            $psobj.Created = [datetime]::ParseExact($matches.Created, "ddd MMM dd HH:mm:ss yyyy", $null)
...        }
...        if ($matches.Exited.Trim()) {
...            $psobj.Exited = [datetime]::ParseExact($matches.Exited, "ddd MMM dd HH:mm:ss yyyy", $null)
...        }
...        $psobj
...     }

PS C:\> $psscan2objects | ft

Name             PID  PPID Created              Exited               Offset     PDB
----             ---  ---- -------              ------               ------     ---
svchost.exe      932  672  1/28/2010 4:11:47 PM                      0x01ea3558 0x082c0100
wmiprvse.exe     1744 848  2/4/2010 12:02:53 AM 2/4/2010 12:04:23 AM 0x01eaea88 0x082c0380
svchost.exe      1132 672  1/28/2010 4:11:48 PM                      0x01eb4970 0x082c0160
mike022.exe      1956 672  2/2/2010 3:25:29 AM                       0x020155d8 0x082c02c0
...
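
The psscan2 regular expression leans on the fact that the output is fixed-width: two 6-character numbers, two 24-character timestamps, two 10-character hex values, then the name.  As a rough illustration of that same slicing idea outside PowerShell, here is a sketch for a POSIX shell with awk.  The rows are rebuilt with printf using those widths (an assumption read off the sample output above), and the pipe-delimited output format is just a convenient choice for this example:

```shell
#!/bin/sh
# Sketch: slice psscan2-style rows by fixed column widths
# (6, 6, 24, 24, 10, 10, rest), each separated by a single space.
# printf reuses its format for each group of seven arguments.
rows=$(
  printf '%6s %6s %-24s %-24s %s %s %s\n' \
    932 672 'Thu Jan 28 16:11:47 2010' '' 0x01ea3558 0x082c0100 svchost.exe \
    1744 848 'Thu Feb 04 00:02:53 2010' 'Thu Feb 04 00:04:23 2010' 0x01eaea88 0x082c0380 wmiprvse.exe
)

# substr() offsets are 1-based: PID 1-6, PPID 8-13, Created 15-38,
# Exited 40-63, Offset 65-74, PDB 76-85, Name from 87 onward.
parsed=$(printf '%s\n' "$rows" | awk '
  function trim(s) { gsub(/^ +| +$/, "", s); return s }
  {
    printf "%s|%s|%s|%s|%s|%s|%s\n",
      trim(substr($0, 1, 6)), trim(substr($0, 8, 6)),
      trim(substr($0, 15, 24)), trim(substr($0, 40, 24)),
      substr($0, 65, 10), substr($0, 76, 10), trim(substr($0, 87))
  }')
printf '%s\n' "$parsed"
```

An empty "Time exited" column simply comes out as an empty field, which is the same case the Trim() checks handle in the PowerShell version.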

If you are going to use these commands often I would highly suggest making
these into script files. You could even pass the file name to these
scripts and have them wrap the volatility commands.

Ok, so now we have two variables, each containing the objectified output of
the respective volatility command.

PS C:\> $pslistobjects | ft

Name            PID  PPID Threads Handles Time
----            ---  ---- ------- ------- ----
System          4    0    55      260     1/1/1970 12:00:00 AM
smss.exe        540  4    3       21      1/28/2010 4:11:40 PM
csrss.exe       604  540  12      363     1/28/2010 4:11:46 PM
lsass.exe       684  628  18      341     1/28/2010 4:11:47 PM
... 
 
 
PS C:\> $psscan2objects | ft

Name             PID  PPID Created              Exited               Offset     PDB
----             ---  ---- -------              ------               ------     ---
svchost.exe      932  672  1/28/2010 4:11:47 PM                      0x01ea3558 0x082c0100
wmiprvse.exe     1744 848  2/4/2010 12:02:53 AM 2/4/2010 12:04:23 AM 0x01eaea88 0x082c0380
svchost.exe      1132 672  1/28/2010 4:11:48 PM                      0x01eb4970 0x082c0160
mike022.exe      1956 672  2/2/2010 3:25:29 AM                       0x020155d8 0x082c02c0
...

Finally, we can use the Compare-Object cmdlet to compare the two sets of
processes.

PS C:\> Compare-Object $pslistobjects $psscan2objects -Property name,pid

name           pid  SideIndicator
----           ---  -------------
svchost.exe    932  =>
wmiprvse.exe   1744 =>
cmd.exe        1172 =>
msmsgs.exe     1664 =>
wordpad.exe    272  =>
alg.exe        1012 =>
VMwareTray.exe 1648 =>
cmd.exe        1748 =>
services.exe   672  =>
winlogon.exe   628  =>

The Property parameter is used to specify the properties to use for
comparison. We can either use a single property or a comma separated
list of property names.

From this output it is quickly apparent that there are 10 processes found by psscan2 that were not found by pslist.

Whew, that was a lot of work this week. I hope it gets me on Santa's Nice list...next year.

 
 

