Friday, December 17, 2010

Faulty Physical RAM

Sometimes it is very difficult to pin down the exact cause of a problem, especially when it is hardware-related. I faced such a scenario with a client. I was given a new box on which to set up a MySQL server. After I set up MySQL along with the other applications, MySQL frequently went down without leaving any message in the MySQL error log. I spent a few days checking the OS, the logs, and MySQL itself before I finally found the culprit using the memtester tool. Thanks to memtester. Here is its output on the faulty box:

[root@voice ~]# memtester 5 1
memtester version 4.1.2 (32-bit)
Copyright (C) 2009 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffff000
want 5MB (5242880 bytes)
got  5MB (5242880 bytes), trying mlock ...locked.
Loop 1/1:
Stuck Address       : ok      
Random Value        : ok
Compare XOR         : ok
Compare SUB         : ok
FAILURE: 0x7888cfc4 != 0xe088cfc4 at offset 0x00039fba.
Compare MUL         : FAILURE: 0x00000001 != 0x00000002 at offset 0x00039fba.
Compare DIV         : FAILURE: 0x7fff9d53 != 0x7fff9d52 at offset 0x00039fba.
Compare OR          : FAILURE: 0x77fb0403 != 0x77fb0402 at offset 0x00039fba.
Compare AND         :   Sequential Increment: ok
Solid Bits          : ok      
Block Sequential    : ok      
Checkerboard        : ok      
Bit Spread          : ok      
Bit Flip            : ok      
Walking Ones        : ok      
Walking Zeroes      : ok      

Done.
[root@voice ~]#
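
In the run above, every FAILURE line reports the same offset (0x00039fba). When memtester output is saved to a file, a quick way to spot such lines is to count them. The following is only a minimal sketch: the function name and the log file name are made up for illustration and are not part of memtester.

```shell
# Hypothetical helper: count the FAILURE lines in saved memtester output.
# Assumes the output was captured first, e.g.:  memtester 5 1 > memtester.log
count_memtester_failures() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are no matches, so "|| true" keeps the function usable under set -e.
    grep -c 'FAILURE:' "$1" || true
}

# Example use (memtester.log is an assumed file name):
#   count_memtester_failures memtester.log
```

Any count above zero means at least one test pattern read back a different value than it wrote.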

After replacing the faulty RAM with a new module, everything looks good and works fine.

[root@voice ~]# memtester 5 1
memtester version 4.1.2 (32-bit)
Copyright (C) 2009 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffff000
want 5MB (5242880 bytes)
got  5MB (5242880 bytes), trying mlock ...locked.
Loop 1/1:
Stuck Address       : ok      
Random Value        : ok
Compare XOR         : ok
Compare SUB         : ok
Compare MUL         : ok
Compare DIV         : ok
Compare OR          : ok
Compare AND         : ok
Sequential Increment: ok
Solid Bits          : ok      
Block Sequential    : ok      
Checkerboard        : ok      
Bit Spread          : ok      
Bit Flip            : ok      
Walking Ones        : ok      
Walking Zeroes      : ok      

Done.
[root@voice ~]#
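
For unattended checks, the exit status can be inspected instead of reading the output by eye. The sketch below assumes the exit codes described in the memtester man page as I understand it (0 for success, otherwise a bitwise OR of 1, 2 and 4); please verify this against the man page of your installed version. The function name is my own.

```shell
# Hypothetical helper: translate a memtester exit status into text.
# Assumed meaning of the bits (check your memtester man page):
#   1 = could not allocate/lock memory, or invocation error
#   2 = stuck address test failed
#   4 = one of the other tests failed
describe_memtester_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "all tests passed"
        return 0
    fi
    if [ $((status & 1)) -ne 0 ]; then
        echo "could not allocate or lock memory, or bad invocation"
    fi
    if [ $((status & 2)) -ne 0 ]; then
        echo "stuck address test failed"
    fi
    if [ $((status & 4)) -ne 0 ]; then
        echo "one of the other tests failed"
    fi
    return 0
}

# Example use:
#   memtester 5 1; describe_memtester_status $?
```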

3 comments:

  1. Ever consider ECC memory for your important applications?

  2. I think that production servers should use ECC memory. Otherwise, bad RAM will sooner or later corrupt important data.

    Unfortunately, even ECC RAM can't protect against faulty cache SRAM or other faulty components corrupting data.

    Sooner or later, in production, your data will get corrupted due to bad hardware. At least, that is my experience.

    As data becomes very large, only a minor hardware fault is required to break it (a bit).

  3. Good tip. I have web forms built with PHP, and sometimes I need this function to store IPs.
