
TS1400R not mounting

Started by andyinv, March 09, 2020, 05:36:39 AM


andyinv

Morning(?) all

Got a TS1400R which has decided not to mount array1 any more. No sign of why in the GUI. Full log is here: https://pastebin.com/GSQrtY66

Subset:
Mar  9 10:03:06 AA-NAS start_data_array.sh:  -- Mount local disks --
Mar  9 10:03:06 AA-NAS start_data_array.sh: array1 is raid5 : try to mounting...
Mar  9 10:03:06 AA-NAS start_data_array.sh: array1 is not encrypted
Mar  9 10:03:07 AA-NAS start_data_array.sh: mounting /dev/md10 to /mnt/array1
Mar  9 10:03:07 AA-NAS MountXFS: mount /dev/md10 fail.
Mar  9 10:03:07 AA-NAS MountXFS: mount /dev/md10 fail.
Mar  9 10:03:08 AA-NAS hdd_check_normal: /usr/local/bin/hdd_check_normal.sh : array1
Mar  9 10:03:09 AA-NAS MountXFS: mount /dev/md10 fail.
Mar  9 10:03:09 AA-NAS MountXFS: mount /dev/md10 fail.
Mar  9 10:03:09 AA-NAS start_data_array.sh: Failed to mount.
Mar  9 10:03:09 AA-NAS start_data_array.sh:  -- checkRaidStatus 1 /dev/md10 --
Mar  9 10:03:10 AA-NAS start_data_array.sh: array2 is off : skipping to mount ...
Mar  9 10:03:10 AA-NAS start_data_array.sh: disk1 is array1 : skipping to mount ...
Mar  9 10:03:10 AA-NAS start_data_array.sh: disk2 is normal : try to mounting...
Mar  9 10:03:10 AA-NAS start_data_array.sh: disk2 is not encrypted
Mar  9 10:03:10 AA-NAS start_data_array.sh: mounting /dev/md101 to /mnt/disk2
Mar  9 10:03:11 AA-NAS start_data_array.sh: Success to mount.
Mar  9 10:03:11 AA-NAS start_data_array.sh: disk3 is array1 : skipping to mount ...
Mar  9 10:03:11 AA-NAS start_data_array.sh: disk4 is array1 : skipping to mount ...
Mar  9 10:03:12 AA-NAS start_data_array.sh: recover_shareinfo_sub : checking disk1
Mar  9 10:03:12 AA-NAS start_data_array.sh: recover_shareinfo_sub : checking disk2
Mar  9 10:03:12 AA-NAS start_data_array.sh: Checking /mnt/disk2/spool
Mar  9 10:03:12 AA-NAS start_data_array.sh: recover_shareinfo_sub : checking disk3
Mar  9 10:03:13 AA-NAS start_data_array.sh: recover_shareinfo_sub : checking disk4
Mar  9 10:03:13 AA-NAS start_data_array.sh: recover_shareinfo_sub : checking array1
Mar  9 10:03:13 AA-NAS start_data_array.sh: Checking /mnt/array1/spool
Mar  9 10:03:13 AA-NAS start_data_array.sh: recover_shareinfo_sub : checking array2
Mar  9 10:03:15 AA-NAS liblvm: LibLvmShowDev_list>/dev/md101<>1920182080<><><>;
Mar  9 10:03:16 AA-NAS handle-mdadm-events: NewArray, /dev/md2,
Mar  9 10:03:17 AA-NAS handle-mdadm-events: NewArray, /dev/md1,
Mar  9 10:03:18 AA-NAS handle-mdadm-events: NewArray, /dev/md0,
Mar  9 10:03:19 AA-NAS handle-mdadm-events: NewArray, /dev/md101,
Mar  9 10:03:20 AA-NAS handle-mdadm-events: DegradedArray, /dev/md101,


What are my options for recovering this situation? I notice there's no telnet or ssh available on the box, so it's not like I can cajole it along from the command line.
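
Edit: a bit of digging suggests the usual trick on these Buffalo boxes is to open telnet with the well-known acp_commander tool, something along these lines (the IP is a placeholder, and I haven't verified the exact flags against the TS1400R firmware):

java -jar acp_commander.jar -t 192.168.1.100 -o    # -o is reported to open telnet with a blank root password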

Thanks

andyinv

#1
Managed to get onto it and found this. sdd is the hot spare.

[root@AA-NAS etc]# mdadm --examine /dev/sd[abcd]6
/dev/sda6:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : aa2644ba:5eff5ef0:e661a761:9b97d1a8
           Name : TS1400R0DF:10
  Creation Time : Tue Jan 20 10:16:21 2015
     Raid Level : raid5
   Raid Devices : 4

Avail Dev Size : 3840364544 (1831.23 GiB 1966.27 GB)
     Array Size : 5760546240 (5493.69 GiB 5898.80 GB)
  Used Dev Size : 3840364160 (1831.23 GiB 1966.27 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 215991c4:940e86a3:292c4216:10d1b693

Internal Bitmap : 8 sectors from superblock
    Update Time : Mon Nov 18 13:22:49 2019
       Checksum : 82381d73 - correct
         Events : 72318

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 0
   Array State : A.A. ('A' == active, '.' == missing)
mdadm: No md superblock detected on /dev/sdb6.
/dev/sdc6:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : aa2644ba:5eff5ef0:e661a761:9b97d1a8
           Name : TS1400R0DF:10
  Creation Time : Tue Jan 20 10:16:21 2015
     Raid Level : raid5
   Raid Devices : 4

Avail Dev Size : 3840364544 (1831.23 GiB 1966.27 GB)
     Array Size : 5760546240 (5493.69 GiB 5898.80 GB)
  Used Dev Size : 3840364160 (1831.23 GiB 1966.27 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 8e6857bc:48298be2:f8bf5174:1829c59b

Internal Bitmap : 8 sectors from superblock
    Update Time : Mon Nov 18 13:22:49 2019
       Checksum : 1f21336e - correct
         Events : 72318

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : Active device 2
   Array State : A.A. ('A' == active, '.' == missing)
/dev/sdd6:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : aa2644ba:5eff5ef0:e661a761:9b97d1a8
           Name : TS1400R0DF:10
  Creation Time : Tue Jan 20 10:16:21 2015
     Raid Level : raid5
   Raid Devices : 4

Avail Dev Size : 3840364544 (1831.23 GiB 1966.27 GB)
     Array Size : 5760546240 (5493.69 GiB 5898.80 GB)
  Used Dev Size : 3840364160 (1831.23 GiB 1966.27 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 7d767213:92a50d73:a258a420:2a8c08ab

Internal Bitmap : 8 sectors from superblock
    Update Time : Mon Nov 18 13:22:49 2019
       Checksum : c254b963 - correct
         Events : 72318

         Layout : left-symmetric
     Chunk Size : 64K

   Device Role : spare
   Array State : A.A. ('A' == active, '.' == missing)
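
For quick comparison, the interesting fields can be pulled out in one go (sdb6 skipped, since it has no superblock to read):

mdadm --examine /dev/sd[acd]6 | egrep 'Events|Device Role|Array State'

All three superblocks agree: same Events count (72318), same Update Time, and Array State A.A. - i.e. the array last saw slots 0 and 2 active, slots 1 and 3 missing, with sdd6 never promoted beyond spare.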

andyinv

More:

[root@AA-NAS etc]# mdadm -D /dev/md10
/dev/md10:
        Version : 1.2
  Creation Time : Tue Jan 20 10:16:21 2015
     Raid Level : raid5
  Used Dev Size : 1920182080 (1831.23 GiB 1966.27 GB)
   Raid Devices : 4
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Mon Nov 18 13:22:49 2019
          State : active, FAILED, Not Started
Active Devices : 2
Working Devices : 3
Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 64K

           Name : TS1400R0DF:10
           UUID : aa2644ba:5eff5ef0:e661a761:9b97d1a8
         Events : 72318

    Number   Major   Minor   RaidDevice State
       0       8        6        0      active sync   /dev/sda6
       1       0        0        1      removed
       2       8       38        2      active sync   /dev/sdc6
       3       0        0        3      removed

       4       8       54        -      spare   /dev/sdd6

[root@AA-NAS etc]# cat mdadm.conf
ARRAY /dev/md10  metadata=1.2 UUID=aa2644ba:5eff5ef0:e661a761:9b97d1a8 name=TS1400R0DF:10
   spares=1
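
What mdadm can actually find on disk can be cross-checked against that config with:

mdadm --examine --scan    # prints an ARRAY line for every array whose superblocks it detects

which here should list the system arrays (md0/md1/md2), md101 and md10 - anything missing from that output is a superblock problem rather than a config problem.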

andyinv

Also:

[root@AA-NAS ~]# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md10 : inactive sda6[0] sdd6[4](S) sdc6[2]
      5760546816 blocks super 1.2

md0 : active raid1 sda1[0] sdd1[3] sdc1[2]
      5000128 blocks [4/3] [U_UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md1 : active raid1 sda2[0] sdd2[4] sdc2[2]
      15991680 blocks super 1.2 [4/3] [U_UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md2 : active raid1 sda5[0] sdd5[4] sdc5[2]
      3998656 blocks super 1.2 [4/3] [U_UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

unused devices: <none>
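
For anyone reading along: the [4/3] [U_UU] notation means 4 slots, 3 in service, slot 1 down - so the three small system arrays are degraded but running. md10, by contrast, has no personality or [n/m] status line at all: it's assembled but inactive. The same can be confirmed per array from sysfs:

cat /sys/block/md10/md/array_state    # reads 'inactive' here, as shown below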

1000001101000

Was this a RAID5 with only 3 disks and now 1 has failed?

If so, this sounds pretty close to your issue:
https://serverfault.com/questions/676638/mdadm-drive-replacement-shows-up-as-spare-and-refuses-to-sync

andyinv

#5
Hi, yes it was a RAID5 with one hot spare (from what I was told - all this is being done remotely for a client, and it wasn't well documented; I haven't even seen the kit). Anecdotally, I heard it had a faulty disk some time back that was replaced.

As above, mdadm --detail on the array still shows only the two active members plus the spare:
[root@AA-NAS md]# mdadm --misc --detail /dev/md10
/dev/md10:
        Version : 1.2
  Creation Time : Tue Jan 20 10:16:21 2015
     Raid Level : raid5
  Used Dev Size : 1920182080 (1831.23 GiB 1966.27 GB)
   Raid Devices : 4
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Mon Nov 18 13:22:49 2019
          State : active, FAILED, Not Started
Active Devices : 2
Working Devices : 3
Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 64K

           Name : TS1400R0DF:10
           UUID : aa2644ba:5eff5ef0:e661a761:9b97d1a8
         Events : 72318

    Number   Major   Minor   RaidDevice State
       0       8        6        0      active sync   /dev/sda6
       1       0        0        1      removed
       2       8       38        2      active sync   /dev/sdc6
       3       0        0        3      removed

       4       8       54        -      spare   /dev/sdd6


But all scans and probes of the spare (sdd) show it is there, and its array UUID shows it belongs to md10.

I checked out that link, and there's no sync_action file, but there is an array_state file which corresponds to the mdstat output:


[root@AA-NAS md]# pwd
/sys/block/md10/md
[root@AA-NAS md]# cat array_state
inactive
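
An inactive array can sometimes be kicked back into life with mdadm's run command, so that's the obvious first thing to try from a shell:

mdadm --run /dev/md10    # attempt to start the assembled-but-inactive array

and failing that, stopping the array and re-assembling with --force (tried below).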


This, however, appears to be the issue - it looks like the array rebuild has been stuck since November(!). I'm guessing that's when the disk was replaced:

[root@AA-NAS md]# cat /var/log/raidsync|egrep -i "rebuild|State :|Update"
    Update Time : Wed Nov  6 16:54:19 2019
          State : active, degraded, recovering
Rebuild Status : 0% complete
       4       8       53        3      spare rebuilding   /dev/sdd5
    Update Time : Wed Nov  6 16:54:46 2019
          State : active, degraded, recovering
Rebuild Status : 9% complete
       4       8       53        3      spare rebuilding   /dev/sdd5
    Update Time : Wed Nov  6 16:54:57 2019
          State : active, degraded, recovering
Rebuild Status : 13% complete
       4       8       53        3      spare rebuilding   /dev/sdd5
    Update Time : Wed Nov  6 16:57:03 2019
          State : active
    Update Time : Wed Nov  6 16:57:46 2019
          State : active
    Update Time : Wed Nov  6 17:01:56 2019
          State : active
    Update Time : Mon Nov 18 08:55:44 2019
          State : active, degraded, resyncing (DELAYED)
       5       8       21        1      spare rebuilding   /dev/sdb5
    Update Time : Mon Nov 18 08:56:09 2019
          State : active, degraded, recovering
Rebuild Status : 0% complete
       5       8       21        1      spare rebuilding   /dev/sdb5
    Update Time : Mon Nov 18 08:56:30 2019
          State : active, degraded, recovering
Rebuild Status : 1% complete
       5       8       21        1      spare rebuilding   /dev/sdb5
    Update Time : Mon Nov 18 08:59:26 2019
          State : active
    Update Time : Mon Nov 18 09:00:05 2019
          State : active
    Update Time : Mon Nov 18 09:04:12 2019
          State : active
[root@AA-NAS md]# mdadm --monitor md10
Mar 12 10:14:52: DeviceDisappeared on md10 unknown device


parted won't even speak to sdb:

[root@AA-NAS ~]# parted /dev/sda
GNU Parted 3.1
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Model: ATA ST2000DM001-1ER1 (scsi)
Disk /dev/sda: 2000GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name     Flags
1      1049kB  5121MB  5120MB  ext3         primary
2      5121MB  21.5GB  16.4GB               primary
3      21.5GB  21.5GB  1049kB               primary  bios_grub
4      21.5GB  21.5GB  1049kB               primary
5      21.5GB  25.6GB  4097MB               primary
6      25.6GB  1992GB  1966GB               primary

(parted) q
[root@AA-NAS ~]# parted /dev/sdb
Error: Error opening /dev/sdb: No such device or address
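
So the kernel has dropped the device node entirely. A few non-destructive checks for a vanished disk (smartctl assumes smartmontools is installed, which it may well not be on this firmware):

dmesg | tail -n 50                    # look for SATA link resets / device offline messages
ls /sys/block                         # is sdb still enumerated at all?
cat /sys/block/sdb/device/state       # 'running' vs 'offline', if the node exists
smartctl -a /dev/sdb                  # SMART health, if available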

andyinv

#6
[root@AA-NAS ~]# mdadm --stop /dev/md10
mdadm: stopped /dev/md10
[root@AA-NAS ~]# mdadm --assemble --force /dev/md10 /dev/sda6 /dev/sdd6 /dev/sdc6
mdadm: /dev/md10 assembled from 2 drives and 1 spare - not enough to start the array
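
That message is the crux of it: a 4-disk RAID5 can start with at most one member missing (3 of 4), and a spare that never finished rebuilding holds no usable data - so 2 drives + 1 spare can never reach the 3 members needed.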


Oh crap... (rubber-ducking in real time happening here, folks!)

On the plus side, I ejected and re-inserted sdb, and it's holding steady. WTH is going on?

[root@AA-NAS md]# parted /dev/sdb
GNU Parted 3.1
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Model: ATA ST2000DM008-2FR1 (scsi)
Disk /dev/sdb: 2000GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name     Flags
1      1049kB  5121MB  5120MB               primary
2      5121MB  21.5GB  16.4GB               primary
3      21.5GB  21.5GB  1049kB               primary  bios_grub
4      21.5GB  21.5GB  1049kB               primary
5      21.5GB  25.6GB  4097MB               primary
6      25.6GB  1992GB  1966GB               primary


But...

[root@AA-NAS md]# mdadm --assemble --force /dev/md10 /dev/sda6 /dev/sdd6 /dev/sdc6 /dev/sdb6
mdadm: /dev/sda6 is busy - skipping
mdadm: /dev/sdd6 is busy - skipping
mdadm: /dev/sdc6 is busy - skipping
mdadm: no recogniseable superblock on /dev/sdb6
mdadm: /dev/sdb6 has no superblock - assembly aborted

[root@AA-NAS md]# mdadm -E /dev/sdb6
mdadm: No md superblock detected on /dev/sdb6.
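
One last read-only sanity check before anything destructive: the v1.2 superblock sits 8 sectors (4096 bytes) into the member partition (per the Super Offset field above) and begins with the magic a92b4efc, stored little-endian so the raw bytes read fc 4e 2b a9. Whether anything survives there can be inspected with:

dd if=/dev/sdb6 bs=512 skip=8 count=2 2>/dev/null | hexdump -C | head

If the magic really is gone, the data beyond the metadata may still be intact - so if this array matters, the sane next step is to image the disks (e.g. ddrescue onto spare storage) and work on copies, not to experiment with mdadm --create on the originals.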


