Monday, 25 October 2010

VACUUM FULL crashes PostgreSQL server on Windows.

We use PostgreSQL server for our bespoke software application at work and we run it on both Windows or Linux depending on our clients requirements or current configuration. We do, however, prefer to run it on Linux. I recently installed it on a Windows 2003 Server for one of our customers and all was working well. As part of our install we run some overnight batch files that perform a full backup (pg_dump), a VACUUM FULL and a REINDEX. We have done this numerous times before and never had a problem. After all the customers data had been imported, the following day I received an urgent email saying they could no longer access the system. I remotely accessed the server and could see the Postgres service was no longer running so I started it. I looked at the log file and could see it had failed on the VACUUM FULL with a rather worrying access denied messages:-

127.0.0.12010-10-17 19:42:39 BST ERROR: could not truncate relation 1663/16410/16684 to 18544 blocks: Permission denied
127.0.0.12010-10-17 19:42:39 BST STATEMENT: VACUUM FULL;
127.0.0.12010-10-17 19:42:39 BST PANIC: cannot abort transaction 227697, it was already committed


This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

2010-10-17 19:42:39 BST LOG: server process (PID 6764) exited with exit code 3
2010-10-17 19:42:39 BST LOG: terminating any other active server processes
127.0.0.12010-10-17 19:42:39 BST WARNING: terminating connection because of crash of another server process
127.0.0.12010-10-17 19:42:39 BST DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.


I was alarmed that a VACUUM FULL command could cause this sort of message so I set out to find the cause. After some research it transpired that this may be caused by Antivirus installed on the server. We were running AVG 9.0 so I added the PostgreSQL directory to the Resident shield exclude list. The following evening I ran a VACUUM FULL command but this time with Process Monitor running. Process Monitor is an excellent tool from Sysinterals\Microsoft. Again the PostgreSQL server crashed with a similar access denied message so now I had to scour the Process Monitor output to try and find the cause. After a search for the file which had shown in the logs I came accross the following:-



I wasn't sure what a USER MAPPED FILE meant and a quick Google didn't reveal a great deal of information, but there was a post that mentioned AVG. However, it also mentioned that placing the directory in the resident shield exclude list had solved their particular problem, which was not the case here. Further perusing of the Process Monitor log didn't reveal any more information until I realised there was a Filter, Enable Advanced Output option. After ticking this and searching for the file name 16684 again I came up with the following:-



avgchsvx.exe is AVG's Caching server which apparently dramatically increases performance. I decided to disable this feature in AVG's Tools, Advanced settings to see if the problem would go away. I have since been able to run VACUUM FULL on my database server so it looks like this was the cause of the problem. This is a good example of using Process Monitor to diagnose any potential file access issues.

Hopefully this may help someone suffering the same problem.