"disk writing error" on TeraStation Pro

Started by sreiner, September 10, 2009, 08:50:45 AM

sreiner

   

My equipment:  TeraStation Pro II (model TS-HTGL/R5 F/W 1.26)

The unit is configured as a 3TB RAID-5 set, with four physical 1TB disks.

 

Yesterday, while we were writing some large backup files to our TeraStation Pro, it began generating a large number of email notifications about disk errors, approximately 40 over the course of the day.  We've had this unit for over a year and have never seen this happen before.  Most of the messages looked like the one below, though different messages named different disks:

 

DISK Error Notification

HDD error occured

Disk(s) the error occurred:Disk 4

(sdd) READ sector:592712 count:11

Disk writing error

RAID drive error will be repaired automatically.

Continuous Back-up is recommended.

 

[TeraStation PROInformation]

TeraStation PROName: CM2BUFFALO (TS-HTGL/R5)

Time: 2009/09/09 16:17:38

Setting Screen: http://10.40.44.199/

 

 

The daily activity report for this TeraStation looked like this:

 

Activity Report

 

[TeraStation PROInformation]

TeraStation PROName: CM2BUFFALO

Time: 2009/09/10 00:00:02

IP Address: 10.40.44.199

Setting Screen: http://10.40.44.199/

Running Time : 371 days, 07:24:47

 

[HDD Usage Status]

RAID Array 1 Usage Rate : 1709330840 kbytes / 2926654528 kbytes (Usage Rate 58%)

[DISK error status]

DISK1 15

DISK2 12

DISK3 0

DISK4 13

 

As you can see, errors occurred on three of the four disks.

 

The unit's web page shows the following:

  System performance is decreasing now.

 

 

Looking at the unit's system log, here are the messages from yesterday:

 

Sep 9 07:58:20 CM2BUFFALO kernelmon: cmd=ioerr sda READ 845477 1

Sep 9 07:58:25 CM2BUFFALO kernelmon: cmd=ioerr sda READ 845488 2

Sep 9 08:01:03 CM2BUFFALO kernelmon: cmd=ioerr sda READ 845477 3

Sep 9 08:01:08 CM2BUFFALO kernelmon: cmd=ioerr sda READ 845488 4

Sep 9 08:02:16 CM2BUFFALO kernelmon: cmd=ioerr sda READ 845493 5

Sep 9 08:05:07 CM2BUFFALO kernelmon: cmd=ioerr sda READ 845493 6

Sep 9 08:07:26 CM2BUFFALO kernelmon: cmd=ioerr sda READ 845488 7

Sep 9 08:09:40 CM2BUFFALO kernelmon: cmd=ioerr sda READ 845488 8

Sep 9 08:14:33 CM2BUFFALO kernelmon: cmd=ioerr sda READ 845480 9

Sep 9 08:14:38 CM2BUFFALO kernelmon: cmd=ioerr sda READ 845488 10

Sep 9 08:15:04 CM2BUFFALO TeraStation: WARNING I/O count 10 sda

Sep 9 08:16:14 CM2BUFFALO kernelmon: cmd=ioerr sda READ 845496 11

Sep 9 08:17:54 CM2BUFFALO kernelmon: cmd=ioerr sda READ 845496 12

Sep 9 08:19:00 CM2BUFFALO kernelmon: cmd=ioerr sda READ 845496 13

Sep 9 08:19:56 CM2BUFFALO kernelmon: cmd=ioerr sda READ 845496 14

Sep 9 08:45:35 CM2BUFFALO kernelmon: cmd=ioerr sdb READ 845477 1

Sep 9 08:47:46 CM2BUFFALO kernelmon: cmd=ioerr sdb READ 845477 2

Sep 9 08:50:38 CM2BUFFALO kernelmon: cmd=ioerr sdb READ 845477 3

Sep 9 08:53:49 CM2BUFFALO kernelmon: cmd=ioerr sdb READ 845477 4

Sep 9 09:18:46 CM2BUFFALO kernelmon: cmd=ioerr sdb READ 845477 5

 

Yesterday afternoon, we decided to run a "RAID scan".

That produced the following: 

 

Sep 9 16:13:38 CM2BUFFALO TeraStation PRO[16745]: [Web] RAID fail shutdown status was changed

Sep 9 16:13:38 CM2BUFFALO TeraStation PRO[16745]: [Web] Change value : info.raidfail_shutdown=off

Sep 9 16:13:39 CM2BUFFALO mdscan: mdscan start

Sep 9 16:13:40 CM2BUFFALO kernelmon: cmd=raidscan 0 1 1

Sep 9 16:14:52 CM2BUFFALO kernelmon: cmd=raidscan 1 1 2

Sep 9 16:14:59 CM2BUFFALO kernelmon: cmd=raidscan 2 1 3

Sep 9 16:15:45 CM2BUFFALO kernelmon: cmd=ioerr sdd READ 751264 1

Sep 9 16:16:06 CM2BUFFALO kernelmon: cmd=ioerr sdd READ 751269 2

Sep 9 16:16:33 CM2BUFFALO kernelmon: cmd=ioerr sdd READ 751264 3

Sep 9 16:16:38 CM2BUFFALO kernelmon: cmd=ioerr sdd READ 751264 4

Sep 9 16:16:42 CM2BUFFALO kernelmon: cmd=ioerr sdd READ 751269 5

Sep 9 16:16:47 CM2BUFFALO kernelmon: cmd=ioerr sdd READ 751269 6

Sep 9 16:16:51 CM2BUFFALO kernelmon: cmd=ioerr sdb READ 841680 6

Sep 9 16:16:54 CM2BUFFALO kernelmon: cmd=ioerr sdb READ 841685 7

Sep 9 16:16:57 CM2BUFFALO kernelmon: cmd=ioerr sdb READ 845488 8

Sep 9 16:17:01 CM2BUFFALO kernelmon: cmd=ioerr sdb READ 841685 9

Sep 9 16:17:04 CM2BUFFALO kernelmon: cmd=ioerr sdb READ 841685 10

Sep 9 16:17:07 CM2BUFFALO TeraStation: WARNING I/O count 10 sdb

Sep 9 16:17:17 CM2BUFFALO kernelmon: cmd=ioerr sdd READ 592712 7

Sep 9 16:17:23 CM2BUFFALO kernelmon: cmd=ioerr sdd READ 592719 8

Sep 9 16:17:28 CM2BUFFALO kernelmon: cmd=ioerr sdd READ 592728 9

Sep 9 16:17:31 CM2BUFFALO kernelmon: cmd=ioerr sdd READ 592712 10

Sep 9 16:17:34 CM2BUFFALO TeraStation: WARNING I/O count 10 sdd

Sep 9 16:17:38 CM2BUFFALO kernelmon: cmd=ioerr sdd READ 592712 11

Sep 9 16:17:41 CM2BUFFALO kernelmon: cmd=raidscan 0 0 2

Sep 9 16:17:45 CM2BUFFALO kernelmon: cmd=ioerr sdd READ 1588493 18
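
To summarize log excerpts like the ones above, the ioerr lines can be tallied per device and checked for sector numbers that show up on more than one disk. This is a minimal Python sketch, assuming the line layout is `cmd=ioerr <device> <operation> <sector> <running count>` (inferred from the excerpts, not taken from Buffalo documentation). The function name `tally_ioerr` and the `SAMPLE` list are illustrative only.

```python
import re
from collections import Counter, defaultdict

# Assumed layout: cmd=ioerr <device> <operation> <sector> <running count>
IOERR = re.compile(r"cmd=ioerr (\w+) (\w+) (\d+) (\d+)")

def tally_ioerr(lines):
    """Count ioerr lines per device and find sectors reported on several devices."""
    per_disk = Counter()
    devices_by_sector = defaultdict(set)
    for line in lines:
        match = IOERR.search(line)
        if match is None:
            continue  # skip WARNING, raidscan and other non-ioerr lines
        device, _operation, sector, _count = match.groups()
        per_disk[device] += 1
        devices_by_sector[int(sector)].add(device)
    shared = {
        sector: sorted(devices)
        for sector, devices in devices_by_sector.items()
        if len(devices) > 1
    }
    return dict(per_disk), shared

# A few lines copied from the system log above.
SAMPLE = [
    "Sep 9 07:58:20 CM2BUFFALO kernelmon: cmd=ioerr sda READ 845477 1",
    "Sep 9 07:58:25 CM2BUFFALO kernelmon: cmd=ioerr sda READ 845488 2",
    "Sep 9 08:15:04 CM2BUFFALO TeraStation: WARNING I/O count 10 sda",
    "Sep 9 08:45:35 CM2BUFFALO kernelmon: cmd=ioerr sdb READ 845477 1",
    "Sep 9 16:15:45 CM2BUFFALO kernelmon: cmd=ioerr sdd READ 751264 1",
]

per_disk, shared = tally_ioerr(SAMPLE)
print(per_disk)  # {'sda': 2, 'sdb': 1, 'sdd': 1}
print(shared)    # {845477: ['sda', 'sdb']}
```

In the full excerpts above, sectors 845477 and 845488 are each reported for both sda and sdb, which is the kind of overlap this check surfaces.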

 

What are we to make of this?

Do we have several of the disks going bad at once?

Or, does it point to a RAID controller problem?  Or something else?

Any ideas are welcomed.

Thanks very much.

  -- Steve

 

Steve Reiner

Logan Aluminum, Inc.

Russellville, KY

 

 

 


Dustrega

In the Web UI, does the RAID integrity show as okay? Either way, I would perform a RAID check under the Disk Management->RAID Setup section. I will check into other possibilities, but the kernelmon lines lead me to believe that there might be a problem with the firmware partition on the HDDs. I'll see what I can find for ya.

sreiner

   

Hello -

 

Thanks for your post.

 

RAID status presently shows "Normal". As far as I know, it has always shown "Normal", even during and after the episodes of errors.

Yesterday, we did initiate a RAID scan... not sure what it does, but we started it anyway.

Got several emails about the scan(s) starting (on md0, md1, and md2), and then a couple about scan completions.

I couldn't find an email saying that the scan on md1 completed, but maybe I missed it.

 

Here is the content of the system log for the period when the RAID scan was started & running.

We didn't explicitly change the "RAID fail shutdown status" -- but maybe upon completion of the scan, it sets the auto shutdown to "on"?

 

Sep 9 16:13:38 CM2BUFFALO TeraStation PRO[16745]: [Web] RAID fail shutdown status was changed

Sep 9 16:13:38 CM2BUFFALO TeraStation PRO[16745]: [Web] Change value : info.raidfail_shutdown=off

Sep 9 16:13:39 CM2BUFFALO mdscan: mdscan start

Sep 9 16:13:40 CM2BUFFALO kernelmon: cmd=raidscan 0 1 1

Sep 9 16:14:52 CM2BUFFALO kernelmon: cmd=raidscan 1 1 2

Sep 9 16:14:59 CM2BUFFALO kernelmon: cmd=raidscan 2 1 3

Sep 9 16:15:45 CM2BUFFALO kernelmon: cmd=ioerr sdd READ 751264 1

Sep 9 16:16:06 CM2BUFFALO kernelmon: cmd=ioerr sdd READ 751269 2

Sep 9 16:16:33 CM2BUFFALO kernelmon: cmd=ioerr sdd READ 751264 3

Sep 9 16:16:38 CM2BUFFALO kernelmon: cmd=ioerr sdd READ 751264 4

Sep 9 16:16:42 CM2BUFFALO kernelmon: cmd=ioerr sdd READ 751269 5

Sep 9 16:16:47 CM2BUFFALO kernelmon: cmd=ioerr sdd READ 751269 6

Sep 9 16:16:51 CM2BUFFALO kernelmon: cmd=ioerr sdb READ 841680 6

Sep 9 16:16:54 CM2BUFFALO kernelmon: cmd=ioerr sdb READ 841685 7

Sep 9 16:16:57 CM2BUFFALO kernelmon: cmd=ioerr sdb READ 845488 8

Sep 9 16:17:01 CM2BUFFALO kernelmon: cmd=ioerr sdb READ 841685 9

Sep 9 16:17:04 CM2BUFFALO kernelmon: cmd=ioerr sdb READ 841685 10

Sep 9 16:17:07 CM2BUFFALO TeraStation: WARNING I/O count 10 sdb

Sep 9 16:17:17 CM2BUFFALO kernelmon: cmd=ioerr sdd READ 592712 7

Sep 9 16:17:23 CM2BUFFALO kernelmon: cmd=ioerr sdd READ 592719 8

Sep 9 16:17:28 CM2BUFFALO kernelmon: cmd=ioerr sdd READ 592728 9

Sep 9 16:17:31 CM2BUFFALO kernelmon: cmd=ioerr sdd READ 592712 10

Sep 9 16:17:34 CM2BUFFALO TeraStation: WARNING I/O count 10 sdd

Sep 9 16:17:38 CM2BUFFALO kernelmon: cmd=ioerr sdd READ 592712 11

Sep 9 16:17:41 CM2BUFFALO kernelmon: cmd=raidscan 0 0 2

Sep 9 16:17:45 CM2BUFFALO kernelmon: cmd=ioerr sdd READ 1588493 18

Sep 10 19:30:44 CM2BUFFALO kernelmon: cmd=raidscan 2 0 0

Sep 10 21:54:02 CM2BUFFALO TeraStation PRO[17723]: [Web] RAID fail shutdown status was changed

Sep 10 21:54:02 CM2BUFFALO TeraStation PRO[17723]: [Web] Change value : info.raidfail_shutdown=on
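
One way to check which scans never reported completion is to pair up the raidscan lines in the log. This is a rough Python sketch that reads each line as `cmd=raidscan <md index> <1 = started, 0 = finished> <scans still running>`. That reading is a guess based on the log and the emails, not on any Buffalo documentation, and `unfinished_scans` is a made-up helper name.

```python
import re

# Assumed layout: cmd=raidscan <md index> <1=started / 0=finished> <scans running>
RAIDSCAN = re.compile(r"cmd=raidscan (\d+) (\d+) (\d+)")

def unfinished_scans(lines):
    """Return md indexes that logged a start but no matching finish."""
    running = set()
    for line in lines:
        match = RAIDSCAN.search(line)
        if match is None:
            continue  # not a raidscan line
        md_index = int(match.group(1))
        started = match.group(2) == "1"
        if started:
            running.add(md_index)
        else:
            running.discard(md_index)
    return sorted(running)

# The raidscan lines from the log above.
SCAN_LINES = [
    "Sep 9 16:13:40 CM2BUFFALO kernelmon: cmd=raidscan 0 1 1",
    "Sep 9 16:14:52 CM2BUFFALO kernelmon: cmd=raidscan 1 1 2",
    "Sep 9 16:14:59 CM2BUFFALO kernelmon: cmd=raidscan 2 1 3",
    "Sep 9 16:17:41 CM2BUFFALO kernelmon: cmd=raidscan 0 0 2",
    "Sep 10 19:30:44 CM2BUFFALO kernelmon: cmd=raidscan 2 0 0",
]

print(unfinished_scans(SCAN_LINES))  # [1]
```

Under that reading, the log has no finish line for md1, which matches the missing completion email.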

 

Thanks for your help.

Regards,

Steve

 

Dustrega

If the scan on md1 didn't complete, I think that might be an early indication of a bad second HDD. Please give me a little bit of time to confirm this. Thank you.

Colin137

It looks like you have quite a few bad sectors on multiple drives. I recommend pulling as much data off the unit as you can, then replacing the drives. If you're still under warranty, you can call support to get an RMA.
