Due to the powers that be, I was tasked with ensuring that no users had any wallpaper and that every user's desktop background was set to a specific colour. Easy peasy, I thought: a few group policy settings and away we go. Unfortunately there is no group policy setting for background colour, so I had to create a custom ADM file containing the following:
CLASS USER
CATEGORY !!categoryname
KEYNAME "Control Panel\Colors"
POLICY !!policyname
EXPLAIN !!explaintext
PART !!labeltext DROPDOWNLIST REQUIRED
VALUENAME "Background"
ITEMLIST
NAME "Black" VALUE "0 0 0"
NAME "Blue" VALUE "58 110 165"
END ITEMLIST
END PART
END POLICY
END CATEGORY
[strings]
categoryname="Background Color Options"
policyname="Change the background color of the client computer"
explaintext="This policy sets the background color of the client computer"
labeltext="Choose a color"
Save the file in C:\Windows\inf\ as something like BackColour.adm.
Using the Group Policy Management console, either create and link a new GPO or edit an existing one. Right-click Administrative Templates, select Add/Remove Templates, browse to the adm file you just saved and click Add. You should now see Background Color Options under Administrative Templates; however, the setting itself is unlikely to be visible at first. Click View, Filtering and untick Only show policy settings that can be fully managed, and the setting should appear. Modifying the setting should now have the desired effect.
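If more colours are needed in the drop-down list, each one is just another NAME/VALUE line inside ITEMLIST, with the value written as three space-separated numbers. The following is a minimal Python sketch for generating such lines. It assumes the three numbers are red, green and blue channels from 0 to 255; the function name and the "White" entry are made up for illustration and are not part of the original ADM file.

```python
def adm_colour_item(name, red, green, blue):
    """Return one ITEMLIST entry in the same form as the ADM file above."""
    for channel in (red, green, blue):
        # Assumption: each channel is a decimal value in the range 0-255.
        if not 0 <= channel <= 255:
            raise ValueError("each colour channel must be between 0 and 255")
    return 'NAME "{0}" VALUE "{1} {2} {3}"'.format(name, red, green, blue)


if __name__ == "__main__":
    # "Black" and "Blue" come from the ADM file; "White" is a hypothetical extra.
    colours = [("Black", 0, 0, 0), ("Blue", 58, 110, 165), ("White", 255, 255, 255)]
    for colour in colours:
        print(adm_colour_item(*colour))
```

The printed lines can be pasted between ITEMLIST and END ITEMLIST before the template is added to the GPO.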
The background colour setting is not much use if a user has a desktop wallpaper set. There is already a setting in Group Policy under Desktop\Active Desktop\Active Desktop Wallpaper, which can be set to a single space, i.e. no wallpaper. On my Windows 7 machine this cleared my existing wallpaper, but under Windows XP the wallpaper previously set by the user was still visible. After some investigating it transpired that this setting modifies the registry at HKCU\Software\Microsoft\Windows\CurrentVersion\Policies\System\Wallpaper, but another registry entry at HKCU\Control Panel\Desktop\Wallpaper was taking precedence. This is not the case on Vista or Windows 7 machines; presumably Microsoft changed the behaviour for situations like this.
To get around this I created another custom adm using the following text:
CLASS USER
CATEGORY !!categoryname
KEYNAME "Control Panel\Desktop"
POLICY !!policyname
EXPLAIN !!explaintext
PART "Path: " EDITTEXT REQUIRED
VALUENAME "Wallpaper"
END PART
END POLICY
END CATEGORY
[strings]
categoryname="Default Background"
policyname="Change the background image of the client computer"
explaintext="This policy sets the background image of the client computer"
labeltext="Choose an image"
Add this into the Group Policy Object using the method described earlier. You can now enter a space for the Path value and this will remove the entry in HKCU\Control Panel\Desktop\Wallpaper which for Windows XP users equates to no background! A little longer than expected but finally the desired effect.
Monday, 15 August 2011
Monday, 25 October 2010
VACUUM FULL crashes PostgreSQL server on Windows.
We use PostgreSQL server for our bespoke software application at work and we run it on either Windows or Linux, depending on our clients' requirements or current configuration. We do, however, prefer to run it on Linux. I recently installed it on a Windows 2003 Server for one of our customers and all was working well. As part of our install we run some overnight batch files that perform a full backup (pg_dump), a VACUUM FULL and a REINDEX. We have done this numerous times before and never had a problem. The day after all the customer's data had been imported, I received an urgent email saying they could no longer access the system. I remotely accessed the server and could see the Postgres service was no longer running, so I started it. I looked at the log file and could see it had failed on the VACUUM FULL with a rather worrying access denied message:
127.0.0.12010-10-17 19:42:39 BST ERROR: could not truncate relation 1663/16410/16684 to 18544 blocks: Permission denied
127.0.0.12010-10-17 19:42:39 BST STATEMENT: VACUUM FULL;
127.0.0.12010-10-17 19:42:39 BST PANIC: cannot abort transaction 227697, it was already committed
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
2010-10-17 19:42:39 BST LOG: server process (PID 6764) exited with exit code 3
2010-10-17 19:42:39 BST LOG: terminating any other active server processes
127.0.0.12010-10-17 19:42:39 BST WARNING: terminating connection because of crash of another server process
127.0.0.12010-10-17 19:42:39 BST DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
I was alarmed that a VACUUM FULL command could cause this sort of message, so I set out to find the cause. After some research it transpired that this may be caused by antivirus software installed on the server. We were running AVG 9.0, so I added the PostgreSQL directory to the Resident shield exclude list. The following evening I ran a VACUUM FULL command, but this time with Process Monitor running. Process Monitor is an excellent tool from Sysinternals/Microsoft. Again the PostgreSQL server crashed with a similar access denied message, so now I had to scour the Process Monitor output to try and find the cause. After a search for the file which had shown in the logs I came across an entry marked USER MAPPED FILE.
I wasn't sure what a USER MAPPED FILE meant and a quick Google didn't reveal a great deal of information, but there was a post that mentioned AVG. However, it also mentioned that placing the directory in the Resident shield exclude list had solved their particular problem, which was not the case here. Further perusing of the Process Monitor log didn't reveal any more information until I realised there was a Filter, Enable Advanced Output option. After ticking this and searching for the file name 16684 again, the output pointed to avgchsvx.exe.
avgchsvx.exe is AVG's caching server, which apparently increases performance dramatically. I decided to disable this feature in AVG's Tools, Advanced settings to see if the problem would go away. I have since been able to run VACUUM FULL on my database server, so it looks like this was the cause of the problem. This is a good example of using Process Monitor to diagnose potential file access issues.
Hopefully this may help someone suffering the same problem.
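For anyone wondering where the file name 16684 comes from, it is the last part of the relation identifier 1663/16410/16684 in the error message. The Python sketch below pulls those numbers out of a log line. It assumes the three numbers are the tablespace OID, the database OID and the relation's file node, in that order, and that the relation lives in the default tablespace so the file sits under base/ in the data directory; the function names are invented for this example.

```python
import re

# Matches the relation identifier in the truncate error shown in the log above.
RELATION_RE = re.compile(r"relation (\d+)/(\d+)/(\d+)")


def relation_parts(log_line):
    """Return (tablespace_oid, database_oid, relfilenode), or None if absent."""
    match = RELATION_RE.search(log_line)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def default_tablespace_path(database_oid, relfilenode):
    """Path relative to the data directory (assumes the default tablespace)."""
    return "base/{0}/{1}".format(database_oid, relfilenode)


if __name__ == "__main__":
    line = ("ERROR: could not truncate relation 1663/16410/16684 "
            "to 18544 blocks: Permission denied")
    tablespace_oid, database_oid, relfilenode = relation_parts(line)
    print(default_tablespace_path(database_oid, relfilenode))
```

The resulting path is the one to search for in the Process Monitor output.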
Sunday, 15 August 2010
WARNING Piratebay drive-by infection
Yesterday I visited piratebay.org and performed a search. I was surprised to see a Java splash screen and was immediately suspicious. Within seconds AVG popped up proclaiming that a threat could not be removed and asking whether I would like to force it, so I clicked yes! AVG's Resident shield detection history showed a list of the executables it had detected.
Since there was a relatively large list of executables I believed that AVG may have actually done its job and stopped the infection in its tracks. However, it quickly became apparent that AVG had been about as much use as a chocolate fireguard, as I was being redirected to a vast array of advertisement websites. Surprisingly this was in Opera, which I naively thought was more resistant to malicious programs. Opening Internet Explorer also showed the Tango toolbar had been installed.
I already had Malwarebytes and Spyware Terminator installed, so I updated both and ran full scans. Some more executables were found and removed after a reboot. Full scans from AVG, Malwarebytes and Spyware Terminator were showing as clean and I thought I was in the clear. It didn't take long for me to realise this was not the case, as once again I was being redirected to advertisement websites, interestingly enough only in Opera. A bit of research led me to ComboFix.exe. I downloaded this to my Desktop as advised and disabled AVG's Resident shield so that it did not interfere with ComboFix while it was running. Right-clicking the downloaded exe and clicking Run as Administrator allows it to run with the highest privileges and maximises the chance of malware removal. ComboFix said it had detected rootkit activity and needed to reboot, which was duly done, and upon logon it started again. Once again rootkit activity was detected and ComboFix said a further reboot was required. Some more executables were cleared and RDPCDD.sys was removed. Aha, this was it, I thought, and another full scan from AVG also cleared RDPCCD.sys from the C:\WINDOWS\winsxs directory.
Later that night I started being redirected to advertisement websites again and the situation was now getting very frustrating! I decided to run ComboFix.exe again, and after a reboot combofix.txt had the following lines amongst others:
Infected copy of c:\windows\explorer.exe was found and disinfected
Restored copy from - c:\windows\winsxs\.....\explorer.exe
Infected copy of c:\windows\system32\wininit.exe was found and disinfected
Restored copy from - c:\windows\winsxs\...\wininit.exe
However, after this apparent fix I was still getting the occasional redirect. I decided to upload my copies of explorer.exe and wininit.exe to VirusTotal and sure enough both came back as infected. I then decided to boot from a Windows 7 recovery CD, which allowed me to run a command prompt from a known clean environment. My first step was to attempt to replace these infected files. Unfortunately the recovery CD does not include a copy of explorer.exe, but it does have wininit.exe in its own \WINDOWS\System32 directory (drive X: in the recovery environment). The timestamps of the two wininit.exe files implied they were identical (although they couldn't be). I had nothing to lose so I did the following:
c:
cd \windows\system32
move wininit.exe wininit.old
copy x:\windows\system32\wininit.exe .
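Matching timestamps do not prove that two files have the same contents, which is why the infected and clean copies could look identical. A more reliable check is to compare hashes. The following is a minimal sketch, assuming Python is available on a machine that can read both copies (it is not something the recovery CD provides); the function names are invented for illustration.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True only if both files have byte-for-byte identical contents."""
    return sha256_of(path_a) == sha256_of(path_b)
```

For example, same_content("wininit.old", "wininit.exe") would be expected to return False if the old copy really had been modified.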
With regard to explorer.exe I wasn't sure what to do, so I decided to hunt my C:\ drive for any available copy using the trusty command:
c:
cd \
dir /s explorer.exe
I was intrigued by a copy of explorer.exe that was found in the C:\WINDOWS\ERDNT\cache directory and since I had nothing to lose I replaced my original explorer.exe with this one:
c:
cd \windows
move explorer.exe explorer.old
copy ERDNT\cache\explorer.exe .
I then rebooted into Windows 7 successfully; however, this was nothing new since I had been able to do this previously. My first step was to upload my replacement executables to VirusTotal and they both tested clean, which was encouraging, as I had not been sure whether some other hidden process was reinfecting them. I have now been running for a full day without any noticeable problems. Hopefully this will be the end of this saga.
I was intrigued as to what created the cache folder in the ERDNT directory. I think ComboFix does this, going over and above the ERUNT application, which is used to back up the registry only. Whatever created it, I am thankful, and if anyone knows feel free to leave a comment.
Thursday, 5 August 2010
Where's my SYSVOL gone!
I had recently installed a second domain controller and made it the PDC for one of our clients and the process had gone smoothly, at least that's what I thought. I received a request from this client to make a configuration change, which could be done via group policy. After opening the Group Policy Management Console I was greeted with the error message, The Network Path was not found. Strange, I tried browsing to \\domain\SYSVOL and I could successfully browse the NETLOGON and SYSVOL shares.
After a bit of research it transpired that the Group Policy Management Console tries to connect to the PDC when working with group policies. I tried browsing to the SYSVOL share on the new domain controller and lo and behold it was not there! As you can imagine this concerned me greatly. The first place to check was the File Replication Service log in the Event Viewer of both servers. The log on the older of the two domain controllers was full of errors with Event ID 13568:
The File Replication Service has detected that the replica set "DOMAIN SYSTEM VOLUME (SYSVOL SHARE)" is in JRNL_WRAP_ERROR....
This error had been present for more than two years. The second server's log was full of Event ID 13508 warnings:
The File Replication Service is having trouble enabling replication from ... to ... for ... using the DNS name ... . FRS will keep retrying.
Following are some of the reasons you would see this warning......
This seemed like more of a generic error, so I suspected the fault lay with the first server. I tried restarting the netlogon and ntfrs services as a first resort but the problem remained. A bit of Googling later I came across a Microsoft article which sounded like it could be of help.
After reading the article I stopped the ntfrs service on both servers and navigated to the HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup registry key. I only had one copy of the SYSVOL directory and so had to be careful to get the next step the right way round, otherwise I would be restoring from backup. On the first server I set the BurFlags DWORD value to D4, which means do an authoritative restore, and on the second server I set BurFlags to D2, which means do a non-authoritative restore.
I started the ntfrs service on the first server and then on the second server. Voilà, the SYSVOL directory was now replicated and the netlogon service was automatically notified, which in turn shared the SYSVOL directory out. I opened the Group Policy Management Console and the network errors were no longer present. A five-minute group policy change had turned into a nerve-racking couple of hours of research and fault fixing! However, I am now a wiser man and I hope somebody else will be able to make use of this blog one day.
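Because getting D4 and D2 the wrong way round would be costly, it can help to write the two registry changes down before touching either server. The Python sketch below only builds the corresponding reg add command lines as text; it does not change anything. It assumes the standard reg.exe syntax (/v, /t, /d, /f), passes the value in decimal, and uses constant and function names invented for this example.

```python
# Registry key from the post, in the short HKLM form accepted by reg.exe.
BURFLAGS_KEY = (r"HKLM\System\CurrentControlSet\Services\NtFrs"
                r"\Parameters\Backup/Restore\Process at Startup")

AUTHORITATIVE = 0xD4      # server holding the good copy of SYSVOL
NON_AUTHORITATIVE = 0xD2  # server that should pull SYSVOL from its partner


def burflags_command(value):
    """Return a reg add command line that sets BurFlags to the given value."""
    if value not in (AUTHORITATIVE, NON_AUTHORITATIVE):
        raise ValueError("BurFlags must be 0xD4 or 0xD2 for this procedure")
    # The DWORD data is written in decimal (0xD4 = 212, 0xD2 = 210).
    return 'reg add "{0}" /v BurFlags /t REG_DWORD /d {1} /f'.format(
        BURFLAGS_KEY, value)


if __name__ == "__main__":
    print("First server :", burflags_command(AUTHORITATIVE))
    print("Second server:", burflags_command(NON_AUTHORITATIVE))
```

As in the steps above, the ntfrs service should be stopped on both servers before the values are changed, and started on the authoritative server first.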
Good night
Wednesday, 9 June 2010
Managing RAID with VMware ESXi 4 and Fujitsu Servers
Managing RAID is an integral part of any server installation. With a typical Windows or Linux installation on a Fujitsu server, the operating system talks to the hardware directly, so the RAID subsystem can be managed with ServerView RAID manager, which is provided by Fujitsu.
If, however, you install the server with VMware ESXi, the guest operating systems do not have direct access to the hardware, so ServerView RAID manager will not correctly display the RAID subsystem after a default install. This was seen as a big negative for using VMware ESXi 4, as ServerView RAID manager can be critical when troubleshooting, planning or fixing anything to do with RAID.
After trying to find a way around this situation I stumbled across a brief mention of a way to connect to the CIM API that is provided by VMware. This allows developers to create software that can talk to the hardware of the server via a CIM broker. ServerView RAID can take advantage of this, and a VMware server can be added via the amCLI command, as shown below (Updated: this works on Windows and Linux):
amCLI -e 21/0 add_server name=1.1.1.1 port=5989 username=root password=*****
Change the name to the IP address or DNS name of your server, and the username and password to those of your VMware installation.
Confirm the addition by running:
amCLI -e 21/0 show_server_list
To delete the server, run the following, changing the name as appropriate:
amCLI -e 21/0 delete_server name=1.1.1.1
Log into the ServerView RAID web interface as normal (https://IP Address:3173) using the superuser name and password for the OS. You should now see the RAID adapter from the VMware ESXi 4 server. This has been tested on a TX150 S6 and a TX200 S5 with an LSI1078 RAID card.
If the adapter does not appear, make sure there is a hosts file entry for the IP address of the guest OS that is running ServerView RAID.
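Another thing worth ruling out when the adapter does not appear is basic connectivity from the guest running ServerView RAID to the port used in the add_server command (5989). The following is a minimal Python sketch of such a check. It only tests that a TCP connection can be opened and says nothing about whether the CIM login succeeds; the function name is invented for this example.

```python
import socket


def port_open(host, port=5989, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the name and connects within the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # 1.1.1.1 is the placeholder address used in the amCLI example above.
    print(port_open("1.1.1.1"))
```

If this returns False for the ESXi host, a firewall or routing problem is a more likely culprit than the ServerView RAID configuration.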
Monday, 10 May 2010
Windows Server 2003 R2 Terminal Services and TWAIN Drivers.
I had an interesting problem the other day which is definitely worth a post. We have a customer who runs entirely on thin clients but needed a document scanner to improve their business. Unfortunately, due to budget constraints (as usual), buying a PC was not an option, so I was left with the task of configuring the Fujitsu 5120C scanner so the customer could run it at the server console.
I downloaded the latest Fujitsu TWAIN driver, which lists Server 2003 as a supported operating system. The installation went smoothly and I could successfully use the scanner via the Scanners and Cameras option in the Control Panel. However, we needed more functionality than this, so I installed the ScandAll21 software that comes with the scanner. After running ScandAll21 I was not able to select the scanner as the source. I was running as an Administrator so I didn't suspect permissions.
Some Googling later I came across a reference which led me to Microsoft article KB186499.
As explained in the Microsoft article, I created a new registry key with the name of the ScandAll21 executable (FIMAGE), then created a new DWORD value called Flags and gave it a hexadecimal value of 40c. Once I had completed these steps I was able to scan successfully, which made both me and the customer very happy.
Saturday, 17 April 2010
Windows 2003 R2 crashing every two days - Event ID 2019 The server was unable to allocate from the system NonPaged pool because the pool was empty.
This was an interesting problem I had recently and is well worthy of a blog post! I was dealing with a server that would stop functioning on the network roughly every two days. There was nothing extraordinary about this server and we have quite a few with very similar configurations. The customer would reboot the server to get it functioning again and we would log on remotely to try and determine the cause. After a few crashes we noticed it was always preceded by Event ID 2019: The server was unable to allocate from the system NonPaged pool because the pool was empty.
I started watching the server's NonPaged pool usage with Task Manager and Poolmon but was not able to determine what was causing the problem. At this stage I still wasn't sure whether it was a hardware or software issue, so I decided to restore the server onto one of ours in the office and let it run for two days. This was over the bank holiday weekend and lo and behold the restored server experienced the same issue. This was great news because now I had the opportunity to do further analysis. I ran Process Explorer, Task Manager and Poolmon but still could not determine the cause (I am not sure I was using Poolmon correctly). I have had experience with analysing minidumps and so thought it would be a good idea to get a full memory dump, but I needed a way to create a BSOD. In the back of my head I was thinking Sysinternals, and I found a reference to NotMyFault.exe, which has a /crash switch. I was able to use this to create a BSOD and get a much needed memory dump. You can also hold Ctrl and press Scroll Lock twice, but this must first be enabled in the registry.
Opening this memory dump (C:\WINDOWS\MEMORY.DMP) in Debugging Tools for Windows allowed me to do some further analysis. Running the !vm command gave me the following information:
1: kd> !vm
*** Virtual Memory Usage ***
Physical Memory: 524002 ( 2096008 Kb)
Page File: \??\C:\pagefile.sys
Current: 2095104 Kb Free Space: 1766344 Kb
Minimum: 2095104 Kb Maximum: 4190208 Kb
Available Pages: 178832 ( 715328 Kb)
ResAvail Pages: 439715 ( 1758860 Kb)
Locked IO Pages: 3528 ( 14112 Kb)
Free System PTEs: 234209 ( 936836 Kb)
Free NP PTEs: 319 ( 1276 Kb)
Free Special NP: 0 ( 0 Kb)
Modified Pages: 229 ( 916 Kb)
Modified PF Pages: 229 ( 916 Kb)
NonPagedPool Usage: 64932 ( 259728 Kb)
NonPagedPool Max: 65536 ( 262144 Kb)
This shows that NonPagedPool Usage is very close to the NonPagedPool Max value. I then ran !poolused 2, which gave me the following:
kd> !poolused 2
Sorting by NonPaged Pool Consumed
Pool Used:
NonPaged Paged
Tag Allocs Used Allocs Used
AvgU 401672 86761152 0 0 UNKNOWN pooltag 'AvgU', please update pooltag.txt
Although AvgU is an unknown pooltag, it was logical to guess that this was related to the antivirus product AVG 9, and this reference cements these findings. Uninstalling AVG from our test server led to the problem disappearing.
The customer purchased and installed AVG 9 by themselves, so we told them to log a support call with AVG to get a resolution.
Getting to the root cause of the problem in this case was very rewarding, and it highlighted the importance of being able to restore the machine elsewhere, both to rule out hardware and to allow further diagnosis.
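To put the debugger figures in proportion, the numbers from the !vm and !poolused output can be turned into percentages. The following is a small Python sketch of that arithmetic. It assumes a 4 Kb page size, which matches the !vm output (64932 pages shown as 259728 Kb), and that the Used column of !poolused is in bytes; the variable names are invented for this example.

```python
PAGE_KB = 4  # page size implied by the !vm output above (assumption)


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Figures copied from the !vm and !poolused 2 output above.
nonpaged_used_pages = 64932
nonpaged_max_pages = 65536
avgu_bytes = 86761152  # assumed to be bytes

usage_pct = percent(nonpaged_used_pages, nonpaged_max_pages)
avgu_pct = percent(avgu_bytes, nonpaged_used_pages * PAGE_KB * 1024)

print("NonPaged pool in use: {0:.1f}%".format(usage_pct))
print("Share of used pool held by tag AvgU: {0:.1f}%".format(avgu_pct))
```

Under those assumptions the pool is almost completely full, and the AvgU tag alone accounts for a large slice of what is in use.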