Wednesday, 13 January 2010

Emails arriving delayed or not at all and a Netgear Router

I had a strange problem at work yesterday that took me a good few hours to solve, and I'm posting it here in the hope that it helps others. We receive a lot of spam and have a dedicated ADSL line to cope with the volume. Every now and again we receive an email whose sent time is an hour or two before it arrives in a user's inbox. I'd never thought anything of it and just assumed it got caught up in the flood of spam.

However, more recently the delays have grown and some emails have not arrived at all, causing some of our customers to get slightly alarmed after receiving a bounce back from us! I decided to dig into the problem and find the cause. After scouring the mail server (CentOS 5 + Kerio Mailserver) and checking the bandwidth usage, nothing seemed to be anywhere near its limit. I could sometimes reproduce the problem simply by telnetting to the server on port 25: I would get an initial response but then the connection would just hang. Even Ctrl+C would not release the connection! Running tcpdump on the mail server while doing this showed that my connection was never actually reaching the mail server, so it had to be the router!

After a 5km jog I had a flash of inspiration: the router is a Netgear DG834G, which runs a cut-down version of Linux. A quick Google revealed that you can enable telnet by browsing to the router at the following URL: http://RouterIP/setup.cgi?todo=debug. After logging in you should see a web page showing Enable Debug.

I then telnetted to the router (no user or password required) and checked /proc/sys/net/ipv4/netfilter/ip_conntrack_max, as I knew this can be a limiting factor. It was set to 2048. I then looked at the live connection table in /proc/net/ip_conntrack and could see it was full of UDP connections to OpenDNS, created by all the DNS lookups to Spamcop etc. I was now full of hope, so I decided to lower ip_conntrack_udp_timeout from 60 to 10 and raise ip_conntrack_max to 4096:

echo 4096 > /proc/sys/net/ipv4/netfilter/ip_conntrack_max

echo 10 > /proc/sys/net/ipv4/netfilter/ip_conntrack_udp_timeout
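For anyone wanting to repeat the checks that led to these values, here is a rough sketch of how the connection table could be summarised. The function names are my own invention, and it assumes awk is available on the box you run it on (I have not confirmed that the router's own shell includes it) and that each line of the table starts with the protocol name, as it does in the legacy ip_conntrack format.

```shell
# Hypothetical helpers, not part of any router firmware.

# Count tracked connections per protocol (first column of the table).
conntrack_summary() {
    table="${1:-/proc/net/ip_conntrack}"
    awk '{ count[$1]++ } END { for (p in count) print p, count[p] }' "$table"
}

# Print how full the table is as "used/max".
conntrack_usage() {
    table="${1:-/proc/net/ip_conntrack}"
    max_file="${2:-/proc/sys/net/ipv4/netfilter/ip_conntrack_max}"
    used=$(awk 'END { print NR }' "$table")
    max=$(cat "$max_file")
    echo "$used/$max"
}
```

Calling conntrack_usage with no arguments reads the live files under /proc. If the first number sits close to the second and conntrack_summary shows mostly udp entries, you are probably looking at the same problem.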


I checked the Kerio Mailserver, which was still resolving properly, and so decided to leave it at that until the following day. A full day has now passed, all mail seems to be arriving as normal, and we have received no more complaints. Hopefully this has solved the problem; if anything else arises from this I will post an update.
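As a rough sanity check on why the change makes such a difference, you can estimate how many DNS lookups per second the table can absorb before it fills. This is a back-of-envelope sketch only: it assumes every lookup holds exactly one entry for the full UDP timeout and ignores all other traffic, and the function name is made up for illustration.

```shell
# Approximate lookups per second the table can absorb before filling,
# assuming each lookup holds one entry for the whole UDP timeout.
max_lookup_rate() {
    max_entries="$1"
    udp_timeout="$2"
    echo $(( max_entries / udp_timeout ))
}

echo "before: $(max_lookup_rate 2048 60) lookups/s"   # 2048 entries, 60s timeout
echo "after:  $(max_lookup_rate 4096 10) lookups/s"   # 4096 entries, 10s timeout
```

With the old settings that works out at roughly 34 lookups per second and with the new ones roughly 409, so the headroom is about twelve times larger. Most of that gain comes from the shorter timeout rather than the bigger table.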

Over a month has passed and we have been problem free. This was definitely one of the more rewarding fixes!
Regards
