Monday 25 October 2010

VACUUM FULL crashes PostgreSQL server on Windows.

We use PostgreSQL for our bespoke software application at work, and we run it on either Windows or Linux depending on our clients' requirements or current configuration, although we prefer Linux. I recently installed it on a Windows Server 2003 machine for one of our customers and all was working well. As part of our install we run some overnight batch files that perform a full backup (pg_dump), a VACUUM FULL and a REINDEX. We have done this numerous times before and never had a problem. The day after all the customer's data had been imported, I received an urgent email saying they could no longer access the system. I remotely accessed the server, saw that the Postgres service was no longer running and started it. The log file showed that it had failed during the VACUUM FULL with a rather worrying access-denied message:

127.0.0.12010-10-17 19:42:39 BST ERROR: could not truncate relation 1663/16410/16684 to 18544 blocks: Permission denied
127.0.0.12010-10-17 19:42:39 BST STATEMENT: VACUUM FULL;
127.0.0.12010-10-17 19:42:39 BST PANIC: cannot abort transaction 227697, it was already committed


This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

2010-10-17 19:42:39 BST LOG: server process (PID 6764) exited with exit code 3
2010-10-17 19:42:39 BST LOG: terminating any other active server processes
127.0.0.12010-10-17 19:42:39 BST WARNING: terminating connection because of crash of another server process
127.0.0.12010-10-17 19:42:39 BST DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
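The relation in the error above is written as tablespace OID / database OID / relfilenode. For the default tablespace (OID 1663, pg_default) the data file lives at base/<database OID>/<relfilenode> under the data directory, which is why the bare number 16684 is the thing to search for in a file-access trace. A minimal Python sketch of that mapping follows; the data directory path is a hypothetical example, and only the default tablespace is handled.

```python
import re
from pathlib import PureWindowsPath

# Matches the "<tablespace>/<database>/<relfilenode>" triple PostgreSQL
# prints in errors such as "could not truncate relation 1663/16410/16684".
RELATION_RE = re.compile(r"relation (\d+)/(\d+)/(\d+)")

PG_DEFAULT_TABLESPACE = 1663  # OID of the built-in pg_default tablespace


def relation_file(log_line, data_dir):
    """Return the on-disk path of the relation named in a log line, or None.

    Only the default tablespace is handled; relations in other tablespaces
    live under pg_tblspc instead. Tables larger than 1 GB also have extra
    segment files (e.g. 16684.1), which this does not list.
    """
    match = RELATION_RE.search(log_line)
    if match is None:
        return None
    tablespace, database, relfilenode = match.groups()
    if int(tablespace) != PG_DEFAULT_TABLESPACE:
        raise ValueError(f"tablespace {tablespace} is not pg_default")
    return PureWindowsPath(data_dir, "base", database, relfilenode)


if __name__ == "__main__":
    line = ("127.0.0.12010-10-17 19:42:39 BST ERROR: could not truncate "
            "relation 1663/16410/16684 to 18544 blocks: Permission denied")
    # Hypothetical data directory; substitute your own installation's path.
    print(relation_file(line, r"C:\Program Files\PostgreSQL\8.4\data"))
```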


I was alarmed that a VACUUM FULL command could produce messages like these, so I set out to find the cause. After some research it transpired that this may be caused by antivirus software installed on the server. We were running AVG 9.0, so I added the PostgreSQL directory to the Resident Shield exclusion list. The following evening I ran a VACUUM FULL command again, but this time with Process Monitor running (an excellent tool from Sysinternals, now part of Microsoft). Again the PostgreSQL server crashed with a similar access-denied message, so now I had to scour the Process Monitor output to find the cause. After searching for the file named in the logs I came across the following:



I wasn't sure what USER MAPPED FILE meant and a quick Google didn't reveal a great deal, but there was a post that mentioned AVG. However, it also said that adding the directory to the Resident Shield exclusion list had solved that poster's problem, which was not the case here. Further perusal of the Process Monitor log didn't reveal any more information until I noticed the Filter > Enable Advanced Output option. After ticking this and searching for the file name 16684 again I found the following:



avgchsvx.exe is AVG's caching server, which apparently increases performance dramatically. I decided to disable this feature under Tools > Advanced settings in AVG to see if the problem would go away. I have since been able to run VACUUM FULL on this database server without a crash, so it looks like this was the cause of the problem. This is a good example of using Process Monitor to diagnose file access issues.

Hopefully this may help someone suffering the same problem.

Sunday 15 August 2010

WARNING Piratebay drive-by infection

Yesterday I visited piratebay.org and performed a search. I was surprised to see a Java splash screen and was immediately suspicious. Within seconds AVG popped up, proclaiming that a threat could not be removed and asking whether I wanted to force its removal, and I clicked Yes! AVG's Resident Shield detection history showed the following:



Since there was a relatively large list of executables, I believed AVG may actually have done its job and stopped the infection in its tracks. However, it quickly became apparent that AVG had been about as much use as a chocolate fireguard, as I was being redirected to a vast array of advertising websites. Surprisingly this was in Opera, which I had naively thought was more resistant to malicious programs. Opening Internet Explorer also showed that the Tango toolbar had been installed.

I already had Malwarebytes and Spyware Terminator installed, so I updated both and ran full scans. Some more executables were found and removed after a reboot. Full scans from AVG, Malwarebytes and Spyware Terminator then showed the system as clean and I thought I was in the clear. It didn't take long to realise this was not the case, as once again I was being redirected to advertising websites, interestingly only in Opera. A bit of research led me to ComboFix.exe. I downloaded it to my Desktop as advised and disabled AVG's Resident Shield so it would not interfere with ComboFix while it was running. Right-clicking the downloaded exe and choosing Run as Administrator runs it with the highest privileges and maximises the chance of removing the malware. ComboFix reported that it had detected rootkit activity and needed to reboot, which it duly did, starting again on logon. Once again rootkit activity was detected and ComboFix said a further reboot was required. Some more executables were cleared and RDPCDD.sys was removed. Aha, this was it, I thought, and a further full scan from AVG also cleared RDPCDD.sys from the C:\WINDOWS\winsxs directory.

Later that night I started being redirected to advertising websites again and the situation was getting very frustrating! I decided to run ComboFix.exe again, and after a reboot combofix.txt contained the following lines, amongst others:

Infected copy of c:\windows\explorer.exe was found and disinfected
Restored copy from - c:\windows\winsxs\.....\explorer.exe
Infected copy of c:\windows\system32\wininit.exe was found and disinfected
Restored copy from - c:\windows\winsxs\...\wininit.exe

However, even after this apparent fix I was still getting the occasional redirect. I uploaded my copies of explorer.exe and wininit.exe to VirusTotal and, sure enough, both came back as infected. I then booted from a Windows 7 recovery CD, which allowed me to run a command prompt from a known clean environment. My first step was to try to replace these infected files. Unfortunately the recovery CD does not have C:\WINDOWS\explorer.exe, but it does have C:\WINDOWS\System32\wininit.exe. Comparing the timestamps of the two copies of wininit.exe implied they were identical (although they couldn't be). I had nothing to lose, so I did the following:

c:
cd \windows\system32
move wininit.exe wininit.old
copy x:\windows\system32\wininit.exe .

I wasn't sure what to do about explorer.exe, so I decided to hunt my C:\ drive for any available copy using the trusty command:

c:
cd \
dir /s explorer.exe
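The dir /s search only tells you where copies live. As a swapped-in technique (not what I used at the time), a minimal Python sketch can walk a directory tree the same way and print a SHA-256 hash for every copy it finds, so copies can be compared by content rather than by timestamp, and each hash can be looked up on VirusTotal. It assumes a Python interpreter is available, which the recovery CD's command prompt does not include.

```python
import hashlib
import os
import sys


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_copies(root, filename):
    """Yield (path, sha256) for every file called `filename` under `root`."""
    target = filename.lower()  # Windows file names are case-insensitive
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower() == target:
                path = os.path.join(dirpath, name)
                try:
                    yield path, sha256_of(path)
                except OSError:
                    continue  # locked or unreadable file: skip it


if __name__ == "__main__":
    # e.g. python find_copies.py C:\ explorer.exe
    root, name = sys.argv[1], sys.argv[2]
    for path, digest in find_copies(root, name):
        print(digest, path)
```

Two copies with the same hash have identical content, whatever their timestamps say.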

I was intrigued by a copy of explorer.exe in the C:\WINDOWS\ERDNT\cache directory and, since I had nothing to lose, I replaced my original explorer.exe with it:

c:
cd \windows
move explorer.exe explorer.old
copy ERDNT\cache\explorer.exe .

I then rebooted into Windows 7 successfully; however, this was nothing new since I had been able to do this before. My first step was to upload my replacement executables to VirusTotal, and both tested clean. This was encouraging because I couldn't be sure they were not being reinfected by some other hidden process. I have now been running for a full day without any noticeable problems. Hopefully this will be the end of the saga.

I was curious about what created the cache folder in the ERDNT directory. I think ComboFix does this, going over and above the ERUNT application, which only backs up the registry. Whatever created it, I am thankful, and if anyone knows, feel free to leave a comment.

Thursday 5 August 2010

Where's my SYSVOL gone!

I had recently installed a second domain controller and made it the PDC for one of our clients, and the process had gone smoothly, or so I thought. I then received a request from this client to make a configuration change that could be done via Group Policy. On opening the Group Policy Management Console I was greeted with the error message "The Network Path was not found". Strange: I tried browsing to \\domain\SYSVOL and could successfully browse the NETLOGON and SYSVOL shares.

After a bit of research it transpired that the Group Policy Management Console tries to connect to the PDC when working with group policies. I tried browsing to the SYSVOL share on the new domain controller and, lo and behold, it was not there! As you can imagine, this concerned me greatly. The first place to check was the File Replication Service log in the Event Viewer on both servers. The log on the older of the two domain controllers was full of errors with Event ID 13568:
The File Replication Service has detected that the replica set "DOMAIN SYSTEM VOLUME (SYSVOL SHARE)" is in JRNL_WRAP_ERROR....


This error had been present for more than two years. The second server's log was full of warnings with Event ID 13508:
The File Replication Service is having trouble enabling replication from to for using the DNS name . FRS will keep retrying.
Following are some of the reasons you would see this warning......


This seemed more like a generic error, so I suspected the fault lay with the first server. As a first resort I tried restarting the Netlogon and NtFrs services, but the problem remained. A bit of Googling later I came across this Microsoft article, which sounded like it could help.

After reading the article I stopped the NtFrs service on both servers and navigated to the HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process at Startup registry key. I only had one copy of the SYSVOL directory, so I had to be careful to get the next step the right way round; otherwise I would have been restoring from backup. On the first server I set the BurFlags DWORD value to D4 (hex), which means perform an authoritative restore, and on the second server I set BurFlags to D2, which means perform a non-authoritative restore.

I started the NtFrs service on the first server and then on the second. Voilà: the SYSVOL directory was replicated, and the Netlogon service was automatically notified and in turn shared the SYSVOL directory. I opened the Group Policy Management Console and the network errors were gone. A five-minute Group Policy change had turned into a nerve-racking couple of hours of research and fault fixing! However, I am now a wiser man and I hope somebody else will be able to make use of this post one day.
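The order of the steps above is what matters: stop NtFrs everywhere, mark the domain controller holding the good SYSVOL as authoritative (D4) and the others as non-authoritative (D2), then start the authoritative one first. As a review aid, a hypothetical Python helper can print that plan as (server, command) pairs to be run locally on each DC by hand; it executes nothing itself. reg add, net stop and net start are standard Windows commands, and 0xD4 and 0xD2 are passed as their decimal equivalents (212 and 210).

```python
import subprocess

# Registry key from the post; BurFlags is a REG_DWORD under it.
BURFLAGS_KEY = (r"HKLM\SYSTEM\CurrentControlSet\Services\NtFrs"
                r"\Parameters\Backup/Restore\Process at Startup")
AUTHORITATIVE = 0xD4      # D4: this DC's SYSVOL becomes the master copy
NON_AUTHORITATIVE = 0xD2  # D2: this DC re-syncs SYSVOL from a partner


def burflags_command(authoritative):
    value = AUTHORITATIVE if authoritative else NON_AUTHORITATIVE
    # reg add reads a plain /d number as decimal, so str(0xD4) == "212".
    return ["reg", "add", BURFLAGS_KEY, "/v", "BurFlags",
            "/t", "REG_DWORD", "/d", str(value), "/f"]


def sysvol_restore_plan(good_dc, other_dcs):
    """Return (server, command) steps, in order, to run locally on each DC."""
    all_dcs = [good_dc, *other_dcs]
    steps = [(dc, ["net", "stop", "ntfrs"]) for dc in all_dcs]
    steps.append((good_dc, burflags_command(authoritative=True)))
    steps += [(dc, burflags_command(authoritative=False)) for dc in other_dcs]
    # Start the authoritative DC first, then the non-authoritative ones.
    steps += [(dc, ["net", "start", "ntfrs"]) for dc in all_dcs]
    return steps


if __name__ == "__main__":
    # Hypothetical server names; list2cmdline quotes the key's spaces.
    for server, cmd in sysvol_restore_plan("DC1", ["DC2"]):
        print(f"{server}: {subprocess.list2cmdline(cmd)}")
```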

Good night

Wednesday 9 June 2010

Managing RAID with VMWare ESXi 4 and Fujitsu Servers

Managing RAID is an integral part of any server installation. When a Fujitsu server runs a typical Windows or Linux installation directly on the hardware, there is no virtualisation layer in between, so the RAID subsystem can be managed with ServerView RAID Manager, which is provided by Fujitsu.

If, however, you install VMware ESXi on the server, the guest operating systems do not have direct access to the hardware, so ServerView RAID Manager running in a guest after a default install will not correctly display the RAID subsystem. This was seen as a big negative for using VMware ESXi 4, as ServerView RAID Manager can be critical when troubleshooting, planning or fixing anything to do with RAID.

After trying to find a way around this I stumbled across a brief mention of the CIM API provided by VMware. This allows developers to create software that talks to the server's hardware via a CIM broker. ServerView RAID can take advantage of this, and a VMware server can be added with the amCLI command, as shown below (update: this works on both Windows and Linux):

amCLI -e 21/0 add_server name=1.1.1.1 port=5989 username=root password=*****


Change the server name to the IP address or DNS name of your server, and the username and password to those of your VMware installation.

Confirm the addition by running:
amCLI -e 21/0 show_server_list


To delete the server, run the following, changing the name as appropriate:
amCLI -e 21/0 delete_server name=1.1.1.1


Log in to the ServerView RAID web interface as normal (https://IP Address:3173) using the superuser name and password for the OS. You should now see the RAID adapter from the VMware ESXi 4 server. This has been tested on a TX150 S6 and a TX200 S5 with an LSI1078 RAID card.

If the adapter does not appear, make sure there is a hosts file entry for the IP address of the guest OS that is running ServerView RAID.
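Another quick check when the adapter does not appear is whether the ESXi host's CIM port (5989, the port given to amCLI add_server above) is reachable at all from the guest running ServerView RAID. A minimal Python sketch, assuming only a plain TCP connection test is wanted; it does not speak CIM, and the host address is the same placeholder used in the amCLI example.

```python
import socket


def port_open(host, port=5989, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


if __name__ == "__main__":
    # Placeholder ESXi address; use the same name you gave to amCLI.
    print(port_open("1.1.1.1"))
```

If this returns False, a firewall or the CIM service on the host is a more likely culprit than ServerView RAID itself.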

Monday 10 May 2010

Windows Server 2003 R2 Terminal Services and TWAIN Drivers.

I had an interesting problem the other day which is definitely worth a post. We have a customer who runs entirely on thin clients but needed a document scanner to improve their business. Unfortunately, due to budget constraints (as usual), buying a PC was not an option, so I was left with the task of configuring the Fujitsu 5120C scanner so the customer could use it at the console.

I downloaded the latest Fujitsu TWAIN driver, which lists Server 2003 as a supported operating system. The installation went smoothly and I could successfully use the scanner via Scanners and Cameras in the Control Panel. However, we needed more functionality than this, so I installed the ScandAll21 software that comes with the scanner. In ScandAll21 I was not able to select the scanner as the source. I was running as an Administrator, so I didn't suspect permissions.

After some Googling I came across a reference that led me to Microsoft article KB186499.

As explained in the Microsoft article, I created a new registry key named after the ScandAll21 executable (FIMAGE), then created a new DWORD value called Flags with a hexadecimal value of 40c. Once I had completed these steps I was able to scan successfully, which made both me and the customer very happy.

Saturday 17 April 2010

Windows 2003 R2 crashing every two days - Event ID 2019 The server was unable to allocate from the system NonPaged pool because the pool was empty.

This was an interesting problem I had recently and is well worthy of a blog post! I was dealing with a server that would stop functioning on the network roughly every two days. There was nothing extraordinary about this server, and we have quite a few with very similar configurations. The customer would reboot the server to get it working again, and we would log on remotely to try to determine the cause. After a few crashes we noticed they were always preceded by Event ID 2019: The server was unable to allocate from the system NonPaged pool because the pool was empty.

I started watching the server's NonPaged pool usage with Task Manager and Poolmon but was not able to determine what was causing the problem. At this stage I still wasn't sure whether it was a hardware or software issue, so I decided to restore the server onto one of ours in the office and let it run for two days. This was over the bank holiday weekend and, lo and behold, the restored server experienced the same issue. This was great news because it ruled out the original hardware and gave me the opportunity to do further analysis. I ran Process Explorer, Task Manager and Poolmon but still could not determine the cause (I'm not sure I was using Poolmon correctly). I have experience analysing minidumps, so I thought it would be a good idea to get a full memory dump, but I needed a way to create a BSOD. In the back of my head I was thinking Sysinternals, and I found a reference to NotMyFault.exe, which has a /crash switch. I used this to create a BSOD and get the much-needed memory dump. You can also use Right Ctrl+Scroll Lock+Scroll Lock, but this must first be enabled in the registry.

Opening this memory dump (C:\WINDOWS\MEMORY.DMP) in Debugging Tools for Windows (WinDbg) allowed me to do some further analysis. Running the !vm command gave me the following information:

1: kd> !vm

*** Virtual Memory Usage ***
Physical Memory: 524002 ( 2096008 Kb)
Page File: \??\C:\pagefile.sys
Current: 2095104 Kb Free Space: 1766344 Kb
Minimum: 2095104 Kb Maximum: 4190208 Kb
Available Pages: 178832 ( 715328 Kb)
ResAvail Pages: 439715 ( 1758860 Kb)
Locked IO Pages: 3528 ( 14112 Kb)
Free System PTEs: 234209 ( 936836 Kb)
Free NP PTEs: 319 ( 1276 Kb)
Free Special NP: 0 ( 0 Kb)
Modified Pages: 229 ( 916 Kb)
Modified PF Pages: 229 ( 916 Kb)
NonPagedPool Usage: 64932 ( 259728 Kb)
NonPagedPool Max: 65536 ( 262144 Kb)
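To put a number on how full the pool was, a minimal Python sketch can pull the two NonPagedPool lines out of the !vm text above. It assumes the "label: pages ( KB Kb)" layout shown in this output; other debugger versions may format it differently.

```python
import re


def _kb_value(vm_output, label):
    # Lines look like "NonPagedPool Usage: 64932 ( 259728 Kb)": pages, then KB.
    pattern = re.escape(label) + r":\s*\d+\s*\(\s*(\d+)\s*Kb\)"
    match = re.search(pattern, vm_output)
    if match is None:
        raise ValueError(f"{label!r} not found in !vm output")
    return int(match.group(1))


def nonpaged_pool_usage(vm_output):
    """Return (used_kb, max_kb, percent_used) from WinDbg !vm output."""
    used = _kb_value(vm_output, "NonPagedPool Usage")
    limit = _kb_value(vm_output, "NonPagedPool Max")
    return used, limit, 100.0 * used / limit


if __name__ == "__main__":
    sample = ("NonPagedPool Usage: 64932 ( 259728 Kb)\n"
              "NonPagedPool Max: 65536 ( 262144 Kb)\n")
    used, limit, pct = nonpaged_pool_usage(sample)
    print(f"{used} of {limit} KB used ({pct:.1f}%)")
```

With the figures above this works out at 259728 of 262144 KB, roughly 99.1% of the pool.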

This shows that NonPagedPool Usage is very close to the NonPagedPool Max value. I then ran !poolused 2, which gave me the following:

kd> !poolused 2
Sorting by NonPaged Pool Consumed

Pool Used:
NonPaged Paged
Tag Allocs Used Allocs Used
AvgU 401672 86761152 0 0 UNKNOWN pooltag 'AvgU', please update pooltag.txt
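On a busy server the !poolused list can be long, so it helps to rank tags by NonPaged bytes. A minimal Python sketch that parses `!poolused 2` text, assuming the column order shown above (tag, NonPaged allocs, NonPaged used, Paged allocs, Paged used, then any description):

```python
def top_nonpaged_tags(poolused_output, n=5):
    """Return up to n (tag, nonpaged_bytes) pairs, largest first."""
    rows = []
    for line in poolused_output.splitlines():
        fields = line.split()
        # Data rows: Tag, NonPaged Allocs, NonPaged Used, Paged Allocs,
        # Paged Used. Header and banner lines fail the numeric check.
        # Assumes the tag itself contains no internal spaces.
        if len(fields) >= 5 and all(f.isdigit() for f in fields[1:5]):
            rows.append((fields[0], int(fields[2])))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]


if __name__ == "__main__":
    sample = """kd> !poolused 2
Sorting by NonPaged Pool Consumed
Tag Allocs Used Allocs Used
AvgU 401672 86761152 0 0 UNKNOWN pooltag 'AvgU', please update pooltag.txt"""
    for tag, used in top_nonpaged_tags(sample):
        print(tag, used)
```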

Although AvgU is an unknown pool tag, it was logical to guess that it was related to the antivirus product AVG 9, and this reference cements these findings. Uninstalling AVG from our test server made the problem disappear.

The customer had purchased and installed AVG 9 themselves, so we told them to log a support call with AVG to get a resolution.

Getting to the root cause of this problem was very rewarding, and it highlighted the importance of being able to restore the machine onto other hardware, both to rule out a hardware fault and to allow further diagnosis.

Wednesday 7 April 2010

Format an RDX cartridge from the command line / scheduled task

Due to the bugginess of Acronis 10.0.11345 I had a situation where I needed to format a removable storage device before a backup plan was scheduled to run. After a quick play with the format command I couldn't get it to run without user interaction. After a bit of research I came across the diskpart command, which can be scripted using the /s switch. Create a file with the commands you would like to run, for example the following, and save it as format.txt:

Select Volume E:
format FS=NTFS QUICK NOERR OVERRIDE


Needless to say, change the volume letter to match your configuration. Then just run diskpart /s format.txt and the specified volume will be formatted.
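For a scheduled task it can be handy to generate format.txt on the fly and refuse obviously wrong drive letters. A minimal Python sketch that writes the same two commands as format.txt above to a temporary file and runs diskpart /s on it; the refusal to touch C: is an added safety assumption (that C: is the system volume), not part of the original procedure.

```python
import os
import subprocess
import tempfile


def diskpart_format_script(volume_letter):
    """Build the two-line diskpart script from the post for one volume."""
    letter = volume_letter.rstrip(":").upper()
    if len(letter) != 1 or not letter.isalpha():
        raise ValueError("expected a single drive letter such as 'E'")
    if letter == "C":
        # Assumption: C: is the system volume; refuse to format it by mistake.
        raise ValueError("refusing to format C:")
    return f"select volume {letter}:\nformat FS=NTFS QUICK NOERR OVERRIDE\n"


def run_diskpart(script_text):
    """Write the script to a temporary file and run diskpart /s on it."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write(script_text)
        path = f.name
    try:
        result = subprocess.run(["diskpart", "/s", path],
                                capture_output=True, text=True)
        return result.returncode, result.stdout
    finally:
        os.remove(path)


if __name__ == "__main__":
    code, output = run_diskpart(diskpart_format_script("E"))
    print(output)
    raise SystemExit(code)
```

diskpart needs administrative rights, so a scheduled task running this should have "Run with highest privileges" ticked.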

This is not limited to RDX devices, so it may come in handy for formatting other devices.

Tuesday 16 March 2010

Is being helpful more trouble than its worth?

After a particularly heavy day at work I got thinking about workloads and the time some tasks were taking me in comparison to my colleagues. I work in a team of 7 with varying degrees of knowledge. The working day is pretty flexible and there are no specific tasks, apart from one day a week when an individual helps the support department with operating system and hardware calls. However, different individuals approach these problems with different attitudes: whilst I may spend up to 30 minutes (or a lot more) trying to find the cause of a problem, others may just reboot the server and close the call. Obviously this can lead to a problem (for me), especially if the problem recurs on a day when I'm assisting support. I've lost count of the memory leaks I've found and configuration changes I've made after spending the time to understand and diagnose a problem that seems to have been bouncing around support for days or even months.

It then got me thinking about how often my phone rings during the day. Because I actually spend time diagnosing problems, I gain a greater understanding of how things work and am therefore better placed to answer specific configuration or scalability questions. I also have a good memory, which means that throughout the day I am asked what the IP address of this is or how that is set up. Answering these questions further cements them in my memory, and the circle continues!

I have lots of ideas of how to improve things but a lot of these need time to be researched and implemented properly. However, I feel as though I spend the majority of my time helping others with their problems or answering questions to things that have been said and documented hundreds of times before!

I love working in IT, but some days I feel like I haven't done anything because I have spent more time helping others than doing tangible work myself. It does sometimes make you think: what is the incentive to be helpful and do a good job? The people at the top are blind to this because it's not quantifiable, i.e. spending 60 minutes now to understand something can save a lot of time in the future.

It's always easier to ask the guy with the good memory, than to spend some time finding out something for yourself. I'm sure this is true for almost every profession.

Monday 22 February 2010

Complete server restore to different hardware using BackupExec 11d SP5 and Windows 2003 R2 SP2

I haven't mentioned it before, but I work for a relatively small IT company, so a lot of things are done on a finite budget with a finite amount of time. This means that things are not always documented and servers are not always specced as well as they should be. It also means things like disaster recovery are, well, overlooked. The powers that be don't see £s from disaster recovery planning, so it never really happens.

When a server acting as our customer's Terminal Server, Mail Server, Database Server and Print Server failed to boot recently, it was up to me to try and pick up the pieces. It was pretty clear from the off that things were not in a good state. Although the server used RAID 5, there was no MBR, and even after booting into the Windows Recovery Console and running fixmbr there was still no joy. I knew the previous night's backup was good, so I decided to take the plunge and perform a complete server restore.

The server was a Fujitsu Econel 200 and we had a spare Fujitsu TX150 S5, clearly not the same hardware. The restore was pretty tedious so I decided to blog about it here.

Here is what I did. Please note, I do not take responsibility for anything that goes wrong after following these steps.

  • Partition the HDD the same way and install the same version of Windows (R2 etc.), updated to the same service pack. Also give the server the same name, because when you restore with BackupExec later it looks at the server name. If possible, write-protect the backup tape/cartridge, just in case!
  • Install BackupExec (11d in this case) but make sure you CHANGE the install path. Choose something like C:\Program Files\SymantecTemp
  • I updated BackupExec to the latest version (SP5 at time of writing), just in case there were any restore bugs that may have bitten me.
  • The backup device was a Tandberg RDX and in BackupExec you have to create a Backup to Disk device. I recreated this and pointed it at my B2D folder.
  • Once you have added the B2D folder, perform an inventory so BackupExec queries the backups.
  • By default BackupExec splits the "Media" into 1GB files and so after the inventory I was left with lots of "Media" in my B2D media set.
  • This part was very tedious. I couldn't find a way to associate the individual "Media" with a specific backup, so I had to select them all, right-click and select Catalog Media. I had to wait quite a while for BackupExec to go through each one and work its magic. Don't be too alarmed if a lot of them fail; they did for me.
  • I then selected New restore job using wizard, clicked Next and looked through each Media Label until I found the backup I wanted to restore. Be aware that different drive letters and the System State may appear in Media with different labels.
  • Click Next and if you are not sure of the logon credentials for restoring the data you can test them on this page. If you're confident click Next, give the restore job a name, select the relevant device, select Overwrite the file on disk and click finish to run the job now.
  • Allow the restore job to run (restore time varies greatly depending on the amount of data and type of backup device). When the job is complete BackupExec will prompt to restart the machine.
  • If you're lucky the server will boot. Mine didn't. I was greeted with:
Windows could not start because the following file is missing or corrupt:
<Windows root>\system32\ntoskrnl.exe
Please reinstall a copy of the above file
  • My first thought was "bugger, the hardware must be too different". Then I thought no, too big a hardware difference would be more likely to manifest as a BSOD. A little research led me to believe the boot.ini file must differ between the servers. To get around this you will need your Windows 2003 installation media. Boot from the CD and, after the drivers have loaded, press R to enter the Recovery Console.
  • If the server is relatively recent, it is likely that Windows does not have the correct drivers for your RAID/SATA controller. If this is the case, download and install nLite. This app is superb and, amongst other things, allows you to slipstream service packs and drivers into a Windows install. Copy your Windows installation CD to a directory on your machine, open nLite and point it at your Windows installation. Follow the wizard, point nLite at the drivers for your RAID controller etc., and then either create a new ISO or burn the modified OS directly. It is so straightforward that I'm not going to bother describing the process here.
  • Log on to your Windows installation using the admin password from the original server. Then run bootcfg /rebuild. This command took a few minutes to finish, but when complete it should find your Windows installation (probably C:\WINDOWS) and ask whether you want to add it to the boot list. Press Y and Enter. For the load identifier enter something like "Windows 2003 Standard Edition R2", and for OS Load Options enter the default "/fastdetect".
  • Type exit; the server should reboot and hopefully load into Windows. How Windows behaves now depends heavily on how different the hardware is from the original installation. I was lucky: the server booted, albeit very slowly. I disabled some hardware-specific services, installed a chipset driver, checked the Event Viewer and everything seemed OK. After another reboot the server was as good as the original!
  • If you want to free up some hard disk space you are free to delete C:\Program Files\SymantecTemp that we created earlier. The restored OS knows nothing about this install because we have restored the System State.
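For reference, boot.ini is a small text file whose entries locate the Windows directory with ARC paths; if the disk or partition numbers are wrong for the new hardware, the loader looks in the wrong place, which fits the missing ntoskrnl.exe error above. A minimal Python sketch that renders a boot.ini in the usual layout, assuming Windows is in \WINDOWS on the first partition of the first disk; it is an illustration of the format, not necessarily what bootcfg /rebuild writes on your system.

```python
def arc_path(controller=0, disk=0, rdisk=0, partition=1, systemroot="WINDOWS"):
    # rdisk numbers start at 0, partition numbers start at 1.
    return (f"multi({controller})disk({disk})rdisk({rdisk})"
            f"partition({partition})\\{systemroot}")


def boot_ini(description, options="/fastdetect", timeout=30, **arc):
    """Render a single-entry boot.ini; **arc is passed through to arc_path."""
    path = arc_path(**arc)
    return (
        "[boot loader]\n"
        f"timeout={timeout}\n"
        f"default={path}\n"
        "[operating systems]\n"
        f'{path}="{description}" {options}\n'
    )


if __name__ == "__main__":
    print(boot_ini("Windows 2003 Standard Edition R2"))
```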
Hopefully, if you are reading this I have helped you restore a stricken server. If not maybe you have learnt something new.

I am aware this is not an exhaustive step-by-step guide to restoring a server, and I'm sure this procedure could fall down in lots of other places. If and when I experience these different scenarios it is likely that I will update this post. Some time in the future this may become a very useful resource.

Regards

Monday 15 February 2010

Enable local relay on a Microsoft Exchange 2007 Server

We have an application that sends email by relaying through an SMTP server; unfortunately it's quite basic, so you cannot specify any logon credentials. I therefore needed to allow the application to relay through a locally running Microsoft Exchange server. This was the first time I had used Microsoft Exchange 2007, but I thought it would be easy as I knew how to do it on Microsoft Exchange 2003. How wrong I was! This is when working in IT becomes really frustrating: when things appear to be changed just for the sake of it with no apparent improvement in functionality. An hour or so later I had the solution, which was a lot more long-winded than I was expecting.

First open the Exchange Management Console, expand Server Configuration and click Hub Transport. On the right-hand side click New Receive Connector and the New SMTP Receive Connector wizard will open. Give the connector a name and leave "Select the intended use for this Receive Connector" set to Custom. If the server is multi-homed, set the next page so the connector only listens on the LAN adapter. The next part, specifying which remote IP addresses may use the connector, is important because you want to restrict relaying as much as possible. In this case it is a single IP address, so the start and end IP addresses will be the same. 127.0.0.1 didn't appear to work for me, so I used the LAN IP address of the server. Click Next and then New to create the connector.

Next we need to configure the authentication parameters for this connector. Highlight the newly created connector and click Properties. Leave the Authentication tab at its defaults (Transport Layer Security ticked), then click the Permission Groups tab and ensure only Anonymous users is ticked.

Anonymous users are not granted the relay permission by default. Run the following command in the Exchange Shell but replace *NAME* with the name of the Receive Connector created earlier.

Get-ReceiveConnector "*NAME*" | Add-ADPermission -User "NT AUTHORITY\ANONYMOUS LOGON" -ExtendedRights "ms-Exch-SMTP-Accept-Any-Recipient"

That's it: you should now be able to relay locally, which you can test using telnet. When server applications are supposed to be moving forward, I find it absolutely incomprehensible that an admin needs to go through this process to configure relaying.
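As an alternative to telnet, a minimal Python sketch using the standard-library smtplib can send one test message without logging in, mirroring the anonymous relay. It must be run from the machine whose IP address was allowed on the connector; the server address and email addresses are hypothetical placeholders. If the server refuses to relay to every recipient, smtplib raises SMTPRecipientsRefused.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient):
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Exchange relay test"
    msg.set_content("Test message sent through the anonymous receive connector.")
    return msg


def send_via_relay(host, sender, recipient, port=25, smtp_factory=smtplib.SMTP):
    """Send one message without authenticating; return any refused recipients.

    smtp_factory exists only so the function can be tested without a server.
    """
    msg = build_test_message(sender, recipient)
    with smtp_factory(host, port, timeout=30) as smtp:
        return smtp.send_message(msg)  # deliberately no smtp.login() call


if __name__ == "__main__":
    # Hypothetical values: the Exchange server's LAN IP and an external mailbox.
    print(send_via_relay("192.168.0.10", "app@example.com", "someone@example.org"))
```

An empty dictionary means every recipient was accepted, so relaying to an external address works.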

Regards

Tuesday 9 February 2010

After install of Acronis Backup and Recovery 10 System Event Log is full of Distributed COM errors from user Acronis Agent User.

I had a problem after installing Acronis Backup and Recovery 10 whereby the System Event Log was filling with Distributed COM errors.

This is caused by the Acronis Agent User not having Local Activation permission for the relevant component service. To resolve this issue click on Start, Run, dcomcnfg and press enter.

Expand Component Services, Computers, My Computer, DCOM Config and scroll down until you find the entry {9730B9A2-1CDF-11D2-950E-0000E817385C}

Right-click this entry, click Properties and then the Security tab. In the Launch and Activation Permissions area select Customize and click Edit. Click Add, select the Acronis Agent User and click OK.

Tick Allow for Local Launch and Local Activation. Click OK to close the dialogs, and the error should stop being logged in the System event log.

This was on a domain controller, so I'm not sure whether running dcpromo after installing Acronis caused this.

Sunday 31 January 2010

HP 1505n and Windows Server 2003 R2

I had a problem getting this printer to work a few months ago, and after recently explaining it to someone else with the same issue I've decided to post about it here. If you download the HP LaserJet Hostbased Plug and Play Basic Driver or the HP LaserJet Full Feature Software and Driver from the HP website and use it to install the printer, everything appears to work fine, but after a few prints it stops working. In typical HP driver fashion it also sometimes takes the print spooler with it!

I've only seen this happen on Windows 2003 Terminal Servers, so I'm not sure whether the problem is triggered when a standard user tries to print. After looking at it for a while I decided to try the HP LaserJet PCL5 Basic Driver, which appeared to cure the problem. Hopefully this post may save anyone else who has the same problem. This driver shows with the following name:


Regards

Monday 18 January 2010

Windows 2003 Terminal Server doesn't decrement Per User license count

This had me going for a while, so I thought I'd post about it here. I was asked to add some more licenses to one of our Terminal Servers and decided to use Terminal Services Licensing Manager (licmgr) to double-check the current license situation. There were 5 Terminal Services Per User licenses, but I noticed that the number issued was 0.

At the time there were 5 users connected, so I made another connection and was allowed to exceed the number of licenses! I didn't think this was possible, and it wasn't anything I'd really paid attention to. However, on Microsoft's support site I found this:

Currently, Windows Server 2003 does not manage User CALs. This means that even though there is a User CAL in the license server database, the User CAL will not be decremented when it is used. This does not remove administrators from End User License Agreement (EULA) requirements to have a valid terminal server (TS) CAL for each user. Failure to have a User CAL for each user, if Device CALs are not being used, is a violation of the EULA.

It appears this is by design, which is something to be aware of if you're expecting Windows to alert you about any license shortfalls.

Wednesday 13 January 2010

Emails arriving delayed or not at all and a Netgear Router

I had a strange problem at work yesterday that took me a good few hours to solve, and I'm posting it here in the hope that it helps others. We receive A LOT of spam and have a dedicated ADSL line to cope with the volume. Every now and again we receive an email whose sent time is an hour or two before it arrives in a user's inbox. I'd never thought anything of it and just assumed it had got caught up in the myriad of spam.

However, more recently the delays have grown longer and some emails have not arrived at all, which alarmed some of our customers when they received a bounce-back from us! I decided to get stuck into the problem and find the cause. After scouring the mail server (CentOS 5 + Kerio Mailserver) and checking the bandwidth usage, nothing seemed to be near its limit. I could sometimes reproduce the problem simply by telnetting to the server on port 25: I would get an initial response, but then the connection would just hang, and even Control + C would not release it. Running tcpdump on the mail server while doing this showed that my connection never actually reached the mail server, so the router had to be the culprit!
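The hang can also be reproduced from a script rather than an interactive telnet session, which sidesteps the stuck Control + C problem. What follows is a minimal sketch, not the method used in this post: the function name smtp_probe and its arguments are hypothetical, and it assumes bash (for its /dev/tcp redirection) and the coreutils timeout command are installed on the machine you test from. It reads the greeting banner, sends QUIT and waits for the reply, so a hang shows up as exit status 124 from timeout instead of a frozen terminal.

```sh
# Hypothetical helper: probe an SMTP server without an interactive telnet session.
# Assumes bash (for /dev/tcp) and the coreutils `timeout` command are available.
# Exit status: 0 = banner and QUIT reply received, 2 = connection failed,
# 3 = no banner, 4 = no reply to QUIT, 124 = timed out (e.g. the session hung).
smtp_probe() {
  host=$1
  port=${2:-25}
  secs=${3:-10}
  timeout "$secs" bash -c '
    # Open a bidirectional TCP connection on file descriptor 3.
    exec 3<>"/dev/tcp/$0/$1" || exit 2
    # Read the greeting line, e.g. "220 mail.example.com ESMTP".
    IFS= read -r banner <&3 || exit 3
    printf "QUIT\r\n" >&3
    # Expect a "221" reply before the timeout expires.
    IFS= read -r reply <&3 || exit 4
    printf "%s\n%s\n" "$banner" "$reply"
  ' "$host" "$port"
}
```

For example, running `smtp_probe mail.example.com 25 15; echo $?` (with your own server name in place of the placeholder) and seeing 124 would point to the kind of hang described above, while 0 means the server answered both the connection and the QUIT.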

After a 5 km jog I had a flash of inspiration: the router is a Netgear DG834G, which runs a cut-down version of Linux. A quick Google search revealed that you can enable telnet by browsing to the router with the following URL: http://RouterIP/setup.cgi?todo=debug. After logging in you should see a web page with Enable Debug on it.

I then telnetted to the router (no username or password required) and checked /proc/sys/net/ipv4/netfilter/ip_conntrack_max, as I knew this can be a limiting factor. It was set to 2048. I then looked at the live connection table in /proc/net/ip_conntrack and could see it was full of UDP connections to OpenDNS, created by all the DNS lookups to SpamCop and similar blocklists. Once the table is full, new connections are dropped, which would explain the hanging SMTP sessions. Now full of hope, I decided to lower ip_conntrack_udp_timeout from 60 to 10 seconds and raise ip_conntrack_max to 4096:
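To see at a glance what is filling the table, the entries can be tallied by protocol and compared with the limit. This is a hypothetical sketch: the function name conntrack_summary is made up for illustration, and it assumes the router's BusyBox build includes awk (which may not be true of every firmware) and the older ip_conntrack line format, where the protocol name is the first field and ports appear as sport= and dport=.

```sh
# Hypothetical helper: tally connection-tracking entries by protocol.
# Assumes awk is available and the /proc/net/ip_conntrack line format, e.g.
# "udp 17 28 src=192.168.0.2 dst=208.67.222.222 sport=1034 dport=53 ..."
conntrack_summary() {
  table=${1:-/proc/net/ip_conntrack}
  max_file=${2:-/proc/sys/net/ipv4/netfilter/ip_conntrack_max}
  awk '
    { total++; proto[$1]++ }
    # Count entries with destination port 53 (DNS lookups).
    / dport=53 / { dns++ }
    END {
      for (p in proto) printf "%s %d\n", p, proto[p]
      printf "dns %d\n", dns
      printf "total %d\n", total
    }' "$table"
  # Print the configured limit for comparison, if it can be read.
  if [ -r "$max_file" ]; then
    printf 'max %s\n' "$(cat "$max_file")"
  fi
}
```

A total sitting at or near max, with most entries counted under dns, would match the situation described above. The per-protocol lines come out in no particular order, since awk does not guarantee the iteration order of an array.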

echo 4096 > /proc/sys/net/ipv4/netfilter/ip_conntrack_max

echo 10 > /proc/sys/net/ipv4/netfilter/ip_conntrack_udp_timeout
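The two echo commands above can be wrapped so that each value is read back after writing, which confirms the change actually took. This is a sketch rather than what was run on the router: the functions set_nf and apply_conntrack_tuning and the NF_DIR override are hypothetical, added so the logic can be tried against a scratch directory; by default NF_DIR points at the real /proc path.

```sh
# Sketch: apply and read back the conntrack settings used above.
# NF_DIR is a hypothetical override for testing; it defaults to the real path.
NF_DIR=${NF_DIR:-/proc/sys/net/ipv4/netfilter}

set_nf() {
  name=$1
  value=$2
  file="$NF_DIR/$name"
  if [ ! -w "$file" ]; then
    echo "cannot write $file" >&2
    return 1
  fi
  echo "$value" > "$file"
  # Read the value back to confirm it was accepted.
  actual=$(cat "$file")
  if [ "$actual" != "$value" ]; then
    echo "$name is $actual, expected $value" >&2
    return 1
  fi
  echo "$name=$actual"
}

apply_conntrack_tuning() {
  set_nf ip_conntrack_max 4096 &&
    set_nf ip_conntrack_udp_timeout 10
}
```

On the router you would paste the functions into the telnet session and run apply_conntrack_tuning. Like any /proc/sys setting, these values are held only in the running kernel, so they revert when the router reboots and would need reapplying after a power cut.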


I checked that the Kerio Mailserver was still resolving names properly and decided to leave it at that until the following day. A full day has now passed, all mail seems to be arriving as normal and we have received no more complaints. Hopefully this has solved the problem; if anything else arises from it I will post an update.

Over a month has passed and we have been problem-free. This was definitely one of the more rewarding fixes!
Regards

Monday 11 January 2010

Symbol / Motorola LS2208 keyboard wedge barcode scanner issues

I had the pleasure of installing five Symbol / Motorola LS2208 keyboard wedge scanners. I was expecting the job to take under an hour; the first three were installed in about 20 minutes and I was feeling confident it would all be over soon!

The final two scanners were being installed in a different department, and I strolled confidently into the room. I connected a scanner, configured it using the supplied configuration sheet and rebooted the PC. After the reboot neither the mouse nor the keyboard was working, which left me slightly perplexed considering how quickly I had installed the previous three. Never mind, I'll try the other one. Exactly the same. After trying numerous configurations and checking BIOS settings and cables, I had wasted about 40 minutes and was still confused.

I noticed that the keyboards on both PCs were the same model, so I tried a spare one I had, rebooted the PC and it worked! From what I can gather, some keyboards either don't get enough power through a keyboard wedge or, for some other reason, don't like being connected through one. Either way, it's another bit of information committed to the memory banks.

Regards

Sunday 10 January 2010

Iomega REV Drives format from the command line in Windows

Iomega REV drives are supposed to be a replacement for tape, and we have installed them in many of the servers we have deployed. In my experience they are really unreliable: if we are not swapping drives out, we are implementing workarounds for sporadic backup failures.

The backup success rate appears to improve dramatically if the REV drive is formatted before a scheduled backup. This can be done either with a scheduled task or in the pre-backup command of whichever backup software is used.

First of all you need the REV System Software, which can be downloaded from here.

Secondly, run the following from a command prompt or a batch file:

"C:\Program Files\Iomega\REV System Software\ImDrvCLI.exe" /Drive=E: /Format

Obviously replace the drive letter with one that corresponds to your particular setup.

Regards

Server 2008 TS Licensing Diagnosis has not discovered any license servers

I came across an interesting problem a few months ago with a Terminal Server I was configuring. After installing the necessary roles via Server Manager and activating my Terminal Server licenses I noticed that TS Licensing Diagnosis said it could not discover any license servers.

Naturally I googled the problem and couldn't find anything useful, so for the first time I decided to phone Microsoft. Microsoft's support were exceptional, but after approximately 8 hours on the phone we were still no closer to finding a solution! Microsoft did say something interesting regarding the server manufacturer's installation media, which got me thinking that it might be a manufacturer-specific problem. The server was a Fujitsu TX200 S4, so I decided to do a fresh install with an original Windows Server 2008 installation DVD.

After going through the same procedure, voilà, TS Licensing Diagnosis was correctly discovering my license server! I now knew it was something specific to Fujitsu's install DVD. I opened Add/Remove Programs and uninstalled Serverview_Agents, went back to TS Licensing Diagnosis, and the server could now discover the license server!

Apparently this is a Microsoft bug and KB977686 has been assigned but we are still awaiting a patch. More information can be found in the following threads, both of which I started:

Technet

Fujitsu

UPDATE:

Microsoft have finally released a hotfix for this problem, which I have installed and tested successfully.

http://support.microsoft.com/kb/977686