Sunday, 15 August 2010

WARNING: Piratebay drive-by infection

Yesterday I visited piratebay.org and performed a search. I was surprised to see a Java splash screen and was immediately suspicious. Within seconds AVG popped up proclaiming that a threat could not be removed and asking whether I would like to force it, so I clicked yes! AVG's Resident Shield detection history showed a list of the executables it had detected.



Since there was a relatively large list of executables, I believed that AVG may actually have done its job and stopped the infection in its tracks. However, it quickly became apparent that AVG had been about as much use as a chocolate fireguard, as I was being redirected to a vast array of advertisement websites. Surprisingly this was in Opera, which I naively thought was more resistant to malicious programs. Opening Internet Explorer also showed that the Tango toolbar had been installed.

I already had Malwarebytes and Spyware Terminator installed, so I updated both and ran full scans. Some more executables were found and removed after a reboot. Full scans from AVG, Malwarebytes and Spyware Terminator then showed as clean and I thought I was in the clear. It didn't take long to realise this was not the case, as once again I was being redirected to advertisement websites, interestingly enough only in Opera.

A bit of research led me to ComboFix.exe. I downloaded it to my Desktop as advised and disabled AVG's Resident Shield so that it would not interfere with ComboFix while it was running. Right-clicking the downloaded exe and clicking Run as Administrator lets it run with the highest privileges and maximises the chance of malware removal. ComboFix said it had detected rootkit activity and needed to reboot, which it duly did, and it started again upon logon. Once again rootkit activity was detected and ComboFix said a further reboot was required. Some more executables were cleared and RDPCDD.sys was removed. Aha, this was it, I thought, and another full scan from AVG also cleared RDPCDD.sys from the C:\WINDOWS\winsxs directory.

Later that night I started being redirected to advertisement websites again and the situation was now getting very frustrating! I decided to run Combofix.exe again and after a reboot combofix.txt had the following lines amongst others:

Infected copy of c:\windows\explorer.exe was found and disinfected
Restored copy from - c:\windows\winsxs\.....\explorer.exe
Infected copy of c:\windows\system32\wininit.exe was found and disinfected
Restored copy from - c:\windows\winsxs\...\wininit.exe

However, after this apparent fix I was still getting the occasional redirect. I decided to upload my copies of explorer.exe and wininit.exe to VirusTotal and sure enough both came back as infected. I then booted from a Windows 7 recovery CD, which allowed me to run a command prompt from a known clean environment. My first step was to attempt to replace the infected files. Unfortunately the recovery CD does not have C:\WINDOWS\explorer.exe, but it does have C:\WINDOWS\System32\wininit.exe. The timestamps on the recovery CD's copy of wininit.exe and my own matched, which implied the files were identical (although they couldn't be, since mine was infected). I had nothing to lose so I did the following:

c:
cd \windows\system32
move wininit.exe wininit.old
copy x:\windows\system32\wininit.exe .
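Matching timestamps are a weak signal, since malware can patch a file and leave its dates alone. A content hash is a more reliable comparison. The following is only a minimal sketch, assuming Python is available on a second, clean machine (it is not part of the recovery CD) and that both copies have been taken there; the function names and the file names in the usage comment are illustrative.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True only if both files have byte-identical content."""
    return sha256_of(path_a) == sha256_of(path_b)


# Illustrative usage (hypothetical file names):
#   same_content("wininit.old", "wininit_from_cd.exe")
```

If the two digests differ, the files differ, whatever their timestamps say.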

As for explorer.exe, I wasn't sure what to do, so I decided to hunt my C:\ drive for any available copy using these trusty commands:

c:
cd \
dir /s explorer.exe

I was intrigued by a copy of explorer.exe that was found in the C:\WINDOWS\ERDNT\cache directory and since I had nothing to lose I replaced my original explorer.exe with this one:

c:
cd \windows
move explorer.exe explorer.old
copy ERDNT\cache\explorer.exe .

I then rebooted into Windows 7 successfully. This was nothing new, however, since I had been able to do so previously. My first step was to upload my replacement executables to VirusTotal and they both tested clean, which was encouraging because I couldn't be sure they were not being reinfected by some other hidden process. I have now been running for a full day without any noticeable problems. Hopefully this will be the end of this saga.

I was intrigued as to what created the cache folder in the ERDNT directory. I think ComboFix does this, going over and above the ERUNT application, which is used to back up the registry only. Whatever created it, I am thankful, and if anyone knows, feel free to leave a comment.

Thursday, 5 August 2010

Where's my SYSVOL gone!

I had recently installed a second domain controller for one of our clients and made it the PDC, and the process had gone smoothly, or at least that's what I thought. I received a request from this client to make a configuration change, which could be done via group policy. After opening the Group Policy Management Console I was greeted with the error message "The network path was not found". Strange. I tried browsing to \\domain\SYSVOL and I could successfully browse the NETLOGON and SYSVOL shares.

After a bit of research it transpired that the Group Policy Management Console tries to connect to the PDC when working with group policies. I tried browsing to the SYSVOL share on the new domain controller and lo and behold it was not there! As you can imagine, this concerned me greatly. The first place to check was the File Replication Service log in the Event Viewer of both servers. The log on the older of the two domain controllers was full of errors with Event ID 13568:
The File Replication Service has detected that the replica set "DOMAIN SYSTEM VOLUME (SYSVOL SHARE)" is in JRNL_WRAP_ERROR....


This error had been present for more than two years. The second server's log was full of Event ID 13508 warnings:
The File Replication Service is having trouble enabling replication from [...] to [...] for [...] using the DNS name [...]. FRS will keep retrying.
Following are some of the reasons you would see this warning......


This seemed like more of a generic error, so I suspected the fault lay with the first server. I tried restarting the netlogon and ntfrs services as a first resort but the problem remained. A bit of Googling later I came across a Microsoft article which sounded like it could be of help.

After reading the article I stopped the ntfrs service on both servers and navigated to the HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup registry key. I only had one copy of the SYSVOL directory and so had to be careful to get the next step the right way round, otherwise I would be restoring from backup. On the first server (the one holding the good copy) I set the BurFlags DWORD value to hexadecimal D4, which means do an authoritative restore, and on the second server I set BurFlags to hexadecimal D2, which means do a non-authoritative restore.
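The order of operations matters here, so it can help to write it down before touching anything. The sketch below only prints the commands for review; it does not run them. It assumes the standard `net stop`, `net start` and `reg add` commands, passes the DWORD in decimal (0xD4 = 212, 0xD2 = 210), and uses hypothetical server names DC1 and DC2 purely as labels.

```python
# Registry key named in the post; BurFlags lives under it as a REG_DWORD.
NTFRS_KEY = (r"HKLM\System\CurrentControlSet\Services\NtFrs"
             r"\Parameters\Backup/Restore\Process at Startup")

# D4 = authoritative restore, D2 = non-authoritative restore (hex values).
BURFLAGS = {"authoritative": 0xD4, "non-authoritative": 0xD2}


def burflags_command(mode):
    """Build the reg add command that sets BurFlags for the given mode."""
    value = BURFLAGS[mode]
    return f'reg add "{NTFRS_KEY}" /v BurFlags /t REG_DWORD /d {value} /f'


def restore_plan(good_dc, other_dc):
    """Return (server, command) pairs in the order used in the post."""
    return [
        (good_dc, "net stop ntfrs"),
        (other_dc, "net stop ntfrs"),
        (good_dc, burflags_command("authoritative")),
        (other_dc, burflags_command("non-authoritative")),
        (good_dc, "net start ntfrs"),   # the good copy comes up first
        (other_dc, "net start ntfrs"),
    ]


if __name__ == "__main__":
    # DC1 and DC2 are hypothetical names for the old and new servers.
    for server, command in restore_plan("DC1", "DC2"):
        print(f"[{server}] {command}")
```

Getting the two values the wrong way round would make the empty SYSVOL authoritative, which is why the plan labels each command with the server it belongs to.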

I started the ntfrs service on the first server and then on the second server. Voilà, the SYSVOL directory was now replicated and the netlogon service was automatically notified, which in turn shared the SYSVOL directory out. I opened the Group Policy Management Console and the network errors were no longer present. A five-minute group policy change had turned into a nerve-racking couple of hours of research and fault fixing! However, I am now a wiser man and I hope somebody else will be able to make use of this blog one day.

Good night

Wednesday, 9 June 2010

Managing RAID with VMware ESXi 4 and Fujitsu Servers

Managing RAID is an integral part of any server installation. On a Fujitsu server with a typical Windows or Linux installation there is no virtualisation layer between the operating system and the hardware, so the RAID subsystem can be managed with ServerView RAID Manager, which is provided by Fujitsu.

If, however, you install the server with VMware ESXi, the guest operating systems do not have direct access to the hardware, so ServerView RAID Manager will not correctly display the RAID subsystem after a default install. This was seen as a big negative for using VMware ESXi 4, as ServerView RAID Manager can be critical when troubleshooting, planning or fixing anything to do with RAID.

While trying to find a way around this situation I stumbled across a brief mention of a way to connect to the CIM API that is provided by VMware. This allows developers to create software that can talk to the hardware of the server via a CIM broker. ServerView RAID can take advantage of this, and a VMware server can be added via the amCLI command, as shown below (Updated: this works on Windows and Linux):

amCLI -e 21/0 add_server name=1.1.1.1 port=5989 username=root password=*****


Change the server name to the IP address or DNS name of your server, and the username and password to those of your VMware installation.

Confirm the addition by running:
amCLI -e 21/0 show_server_list


To delete the server again, run the following, changing the name as appropriate:
amCLI -e 21/0 delete_server name=1.1.1.1


Log into the ServerView RAID web interface as normal (https://IP Address:3173) using the superuser name and password for the OS. You should now see the RAID adapter from the VMware ESXi 4 server. This has been tested on a TX150 S6 and a TX200 S5 with an LSI1078 RAID card.

If the adapter does not appear, make sure there is a hosts file entry for the IP address of the guest OS that is running ServerView RAID.
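Another thing that may be worth ruling out when the adapter does not appear is plain network reachability of the CIM port used in the add_server command above (5989). This is only a rough sketch: it checks that a TCP connection can be opened from the guest running ServerView RAID, nothing more, and the function name is my own.

```python
import socket


def cim_port_open(host, port=5989, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # 1.1.1.1 is the placeholder address used in the amCLI example above.
    print(cim_port_open("1.1.1.1"))
```

A False result points at firewalling, routing or name resolution rather than at ServerView RAID itself.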

Monday, 10 May 2010

Windows Server 2003 R2 Terminal Services and TWAIN Drivers.

I had an interesting problem the other day which is definitely worth a post. We have a customer who runs entirely on thin clients but needed a document scanner to improve their business. Unfortunately, due to budget constraints (as usual), buying a PC was not an option, so I was left with the task of configuring the Fujitsu 5120C scanner so the customer could run it at the server console.

I downloaded the latest Fujitsu TWAIN driver, which lists Server 2003 as a supported operating system. The installation went smoothly and I could successfully use the scanner via the Scanners and Cameras option in the Control Panel. However, we needed more functionality than this, so I installed the ScandAll21 software that comes with the scanner. After running ScandAll21 I was not able to select the scanner as the source. I was running as an Administrator so I didn't suspect permissions.

Some Googling later I came across a reference which then led me to Microsoft article KB186499.

As explained in the Microsoft article, I created a new registry key with the name of the ScandAll21 executable (FIMAGE), then created a new DWORD value called Flags and gave it a hexadecimal value of 40c. Once I had completed these steps I was able to scan successfully, which made both me and the customer very happy.
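The value 40c is a bit mask, so it breaks down into 0x400 + 0x008 + 0x004. The sketch below decodes it. The arithmetic is certain; the names attached to each bit are my reading of the compatibility flags listed in KB186499 and should be checked against the article before relying on them.

```python
# Bit names are an assumption based on my reading of KB186499.
ASSUMED_FLAG_NAMES = {
    0x004: "Windows 16-bit application",
    0x008: "Windows 32-bit application",
    0x400: "Return system Windows directory instead of user Windows directory",
}


def decode_flags(value):
    """Split a Flags DWORD into (bit, name) pairs plus any unnamed remainder."""
    named = []
    remainder = value
    for bit, name in sorted(ASSUMED_FLAG_NAMES.items()):
        if value & bit:
            named.append((bit, name))
            remainder &= ~bit
    return named, remainder


if __name__ == "__main__":
    named, remainder = decode_flags(0x40C)
    for bit, name in named:
        print(f"0x{bit:03X}  {name}")
    print(f"unnamed bits: 0x{remainder:X}")
```

If that reading is right, the 0x400 bit is the interesting one for TWAIN, since it changes which Windows directory the application is told about.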

Saturday, 17 April 2010

Windows 2003 R2 crashing every two days - Event ID 2019 The server was unable to allocate from the system NonPaged pool because the pool was empty.

This was an interesting problem I had recently and is well worthy of a blog post! I was dealing with a server that would stop functioning on the network roughly every two days. There was nothing extraordinary about this server and we have quite a few with very similar configurations. The customer would reboot the server to get it functioning again and we would log on remotely to try to determine the cause. After a few crashes we noticed it was always preceded by Event ID 2019: The server was unable to allocate from the system NonPaged pool because the pool was empty.

I started watching the server's NonPaged pool usage with Task Manager and Poolmon but was not able to determine what was causing the problem. At this stage I still wasn't sure whether it was a hardware or software issue, so I decided to restore the server onto one of ours in the office and let it run for two days. This was over the bank holiday weekend and lo and behold the restored server experienced the same issue. This was great news because now I had the opportunity to do further analysis. I ran Process Explorer, Task Manager and Poolmon but still could not determine the cause (I'm not sure I was using Poolmon correctly). I have had experience with analysing minidumps and so thought it would be a good idea to get a full memory dump, but needed a way to create a BSOD. In the back of my head I was thinking Sysinternals, and I found a reference to NotMyFault.exe, which has a /crash switch. I was able to use this to create a BSOD and get a much needed memory dump. You can also hold the right Ctrl key and press Scroll Lock twice, but this must first be enabled in the registry.

Opening this memory dump (C:\WINDOWS\MEMORY.DMP) in Debugging Tools for Windows allowed me to do some further analysis. Running the !vm command gave me the following information:

1: kd> !vm

*** Virtual Memory Usage ***
Physical Memory: 524002 ( 2096008 Kb)
Page File: \??\C:\pagefile.sys
Current: 2095104 Kb Free Space: 1766344 Kb
Minimum: 2095104 Kb Maximum: 4190208 Kb
Available Pages: 178832 ( 715328 Kb)
ResAvail Pages: 439715 ( 1758860 Kb)
Locked IO Pages: 3528 ( 14112 Kb)
Free System PTEs: 234209 ( 936836 Kb)
Free NP PTEs: 319 ( 1276 Kb)
Free Special NP: 0 ( 0 Kb)
Modified Pages: 229 ( 916 Kb)
Modified PF Pages: 229 ( 916 Kb)
NonPagedPool Usage: 64932 ( 259728 Kb)
NonPagedPool Max: 65536 ( 262144 Kb)

This shows that my NonPagedPool Usage is very close to the NonPagedPool Max value. I then ran !poolused 2, which gave me the following:

kd> !poolused 2
Sorting by NonPaged Pool Consumed

Pool Used:
            NonPaged              Paged
Tag     Allocs      Used     Allocs   Used
AvgU    401672  86761152          0      0    UNKNOWN pooltag 'AvgU', please update pooltag.txt

Although AvgU is an unknown pooltag, it was logical to guess that it was related to the antivirus product AVG 9, and a reference I found online cemented this finding. Uninstalling AVG from our test server led to the problem disappearing.

The customer had purchased and installed AVG 9 themselves, so we told them to log a support call with AVG to get a resolution.

Getting to the root cause of the problem in this case was very rewarding and highlighted the importance of being able to restore the machine elsewhere, both to rule out hardware and to allow further diagnosis.
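To put numbers on the two debugger outputs above, here is a small sketch of the arithmetic. It assumes 4 Kb pages (which the !vm output implies, since 64932 pages are shown as 259728 Kb) and that the Used column of !poolused is in bytes. The function name is my own.

```python
PAGE_KB = 4  # implied by the !vm output: 64932 pages = 259728 Kb


def pool_summary(usage_pages, max_pages, tag_bytes, tag_allocs):
    """Summarise NonPaged pool pressure and one tag's share of it."""
    usage_bytes = usage_pages * PAGE_KB * 1024
    return {
        "pool_used_pct": 100 * usage_pages / max_pages,
        "tag_share_pct": 100 * tag_bytes / usage_bytes,
        "tag_mib": tag_bytes / 2**20,
        "avg_alloc_bytes": tag_bytes / tag_allocs,
    }


if __name__ == "__main__":
    # Figures copied from the !vm and !poolused 2 output above.
    summary = pool_summary(64932, 65536, 86761152, 401672)
    for key, value in summary.items():
        print(f"{key}: {value:.1f}")
```

With the figures from the dump, the pool is roughly 99% full and the AvgU tag alone accounts for about a third of it, spread over some 400,000 allocations of 216 bytes each on average. A very large number of small, same-sized allocations under one tag is the typical shape of a leak.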

Wednesday, 7 April 2010

Format an RDX cartridge from the command line / scheduled task

Due to the bugginess of Acronis 10.0.11345 I had a situation where I needed to format a removable storage device before a backup plan was scheduled to run. After a quick play with the format command I couldn't get it to run without user interaction. After a bit of research I came across the diskpart command, which can be scripted using the /s switch. Create a file containing the commands you would like to run, for example the two lines below, and save it as format.txt:

Select Volume E:
format FS=NTFS QUICK NOERR OVERRIDE


It goes without saying to change the volume letter to one that matches your configuration. All you have to do is run diskpart /s format.txt and the specified volume letter will be formatted.
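If the volume letter varies between machines, the script file can be generated rather than edited by hand. The following is a minimal sketch under that assumption: it writes the same two lines shown above for a given letter and returns the diskpart command to schedule, without running it. The function names and the refusal to touch C are my own additions.

```python
from pathlib import Path


def diskpart_format_script(volume_letter, filesystem="NTFS"):
    """Return the diskpart script text for quick-formatting one volume."""
    letter = volume_letter.strip().rstrip(":").upper()
    if len(letter) != 1 or not letter.isalpha():
        raise ValueError(f"not a drive letter: {volume_letter!r}")
    if letter == "C":
        # Safety guard added for this sketch, not part of the original post.
        raise ValueError("refusing to generate a format script for C:")
    return (f"Select Volume {letter}:\n"
            f"format FS={filesystem} QUICK NOERR OVERRIDE\n")


def write_script(volume_letter, script_path="format.txt"):
    """Write the script file and return the command line to schedule."""
    Path(script_path).write_text(diskpart_format_script(volume_letter))
    return ["diskpart", "/s", str(script_path)]


if __name__ == "__main__":
    print(diskpart_format_script("E"), end="")
```

The returned list can be handed to a scheduled task or to subprocess.run on the machine itself, once you are sure the letter is right.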

This is not limited to RDX devices and so may come in handy for formatting other devices.

Tuesday, 16 March 2010

Is being helpful more trouble than it's worth?

A particularly heavy day at work got me thinking about workloads and the time it was taking me to do some tasks in comparison to my colleagues. I work in a team of 7 where there are varying degrees of knowledge. A working day is pretty flexible and there are no specific tasks, apart from one day a week where an individual aids the support department with operating system and hardware calls. However, different individuals treat some of these problems with different attitudes, and whilst I may spend up to 30 minutes (or a lot more) trying to find the cause of a problem, others may just reboot the server and close the call. Obviously, this can lead to a problem (for me), especially if the problem reoccurs on a day when I'm assisting support. I've lost count of the number of memory leaks I've tracked down or configuration changes I've made after spending the time to understand and diagnose a problem that otherwise seems to have been bouncing around support for days or maybe months.

It then got me thinking about how often my phone rings during the day. Because I actually spend time diagnosing problems, I have a greater understanding of how things work and am therefore better placed to answer specific configuration or scalability questions. I also have a good memory, which means that throughout the day I am asked what the IP address of this is, or how that is set up, and so on. Answering these questions cements the answers further into memory and the circle continues!

I have lots of ideas of how to improve things but a lot of these need time to be researched and implemented properly. However, I feel as though I spend the majority of my time helping others with their problems or answering questions to things that have been said and documented hundreds of times before!

I love working in IT but some days I feel like I haven't done anything because I have spent more time helping others than doing any tangible work myself. It does sometimes make you think: what is the incentive to be helpful and do a good job? The people at the top are blind to this because it's not quantifiable, i.e. spending 60 minutes now to understand something can save you a lot of time in the future.

It's always easier to ask the guy with the good memory than to spend some time finding something out for yourself. I'm sure this is true for almost every profession.