Checking Whether a Hard Drive Supports TLER/ERC/CCTL

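First of all, this post is also a note for my own reference. I have been researching hard drives for the past few days, and TLER/ERC/CCTL has been a real headache. It matters a great deal for hardware RAID, because it governs how a drive is judged to have failed: once a drive exceeds the time limit configured on it through S.M.A.R.T., the hardware RAID controller marks it as failed and may even drop it from the array automatically. A drive that does not support TLER/ERC/CCTL is effectively allowed only 1 to 2 seconds, so the system marks it offline almost immediately.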

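Moreover, the drives in a RAID array are usually bought at the same time, so their components wear out over roughly the same period. If, around a drive replacement or during a rebuild, more drives are marked as failed at once than the array can tolerate, all the data on the RAID is lost.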

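As for Linux software RAID (mdadm), the timeout of its Read Command Timer and Write Command Timer defaults to 30 seconds, far longer than the 7 to 13 seconds that drives with TLER/ERC/CCTL typically default to. A drive that exceeds that time is still judged to have failed and is taken offline. A drive whose S.M.A.R.T. does not support TLER/ERC/CCTL is likewise marked as failed as soon as a command timeout occurs.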
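The relationship between the two timers can be sketched as a small check. This is only an illustration and not something I measured: the function name is made up, the 30-second default is the figure quoted in the paragraph above, and `None` stands for a drive whose ERC is disabled (no upper bound on recovery time).

```python
from typing import Optional

# Assumed default from the text: the host waits 30 s per command.
HOST_TIMEOUT_S = 30.0


def drive_answers_in_time(erc_limit_s: Optional[float],
                          host_timeout_s: float = HOST_TIMEOUT_S) -> bool:
    """Return True if the drive is guaranteed to report an error
    before the host gives up on the command.

    erc_limit_s is the drive's error-recovery limit in seconds;
    None means ERC is disabled, i.e. recovery time is unbounded.
    """
    if erc_limit_s is None:
        return False  # unbounded recovery may outlast the host timer
    return erc_limit_s < host_timeout_s


if __name__ == "__main__":
    for label, limit in [("7 s ERC", 7.0), ("ERC disabled", None)]:
        print(label, "->", drive_answers_in_time(limit))
```

A 7-second limit fits comfortably inside a 30-second host timer, while a disabled limit gives no such guarantee.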

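In NAS vendors' drive compatibility lists, however, any drive whose S.M.A.R.T. lacks TLER/ERC/CCTL support is invariably listed as not recommended.
Drives whose S.M.A.R.T. does support TLER/ERC/CCTL but ship with it set to Disable or 0 sec have no upper limit on the recovery timer, so failure is judged by the 30-second default or by whatever timeout the vendor sets. (Ordinary HGST and Toshiba drives all default to Disable or 0 sec, so their recovery timer is unlimited.)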

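So TLER/ERC/CCTL is a very important feature when choosing drives for RAID. Shock and resonance, by contrast, can be dealt with in hardware (a vibration-damping case plus damping pads).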

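WD and Seagate have removed S.M.A.R.T. support for TLER/ERC/CCTL from their desktop-class drives, and it cannot be enabled or configured with S.M.A.R.T. software at all. So even if you buy nine wildly expensive WD Black drives, building a RAID from them is far riskier than using WD Red. On the enterprise drives from these two brands, Seagate's ERC time is 10 sec and WD's is 7 sec.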

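Toshiba and HGST, on the other hand, still support ERC/CCTL even on desktop-class drives. The default is Disable or 0 sec, and S.M.A.R.T. software can enable it and set the number of seconds, which is why NAS vendors list both companies' desktop-class drives as recommended. Toshiba's and HGST's enterprise-class drives also support ERC/CCTL, again with a default of Disable or 0 sec. (HGST's reply: this way the recovery time has no upper limit; generic drives should all default to Disable, but certain special part numbers ship with a value set.)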

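I had a rough idea that Toshiba and HGST had not removed CCTL from their desktop-class drives, but is it actually active once the drive is connected to a computer? Is it really supported? That had long been a big question mark for me. Today I finally worked out both an answer and a method! I also asked HGST, through the Taiwanese distributor Weikeng (威健), about the default value of the ERC timer.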

Here is a summary of what I learned, so that everyone can run the same checks themselves.

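First, a few prerequisites:
1. The motherboard BIOS supports the latest AHCI or includes the relevant fixes (my GA-990FXA-UD5 had to be updated to F12 to fix its AHCI problem), and S.M.A.R.T. is enabled in the BIOS.
2. Your SATA or IDE driver supports S.M.A.R.T. (On my GA-990FXA-UD5, S.M.A.R.T. only became available with version 1.2.1.331 of the AMD SATA Controller driver. The Marvell 88SE9172 has to be updated to 1.2.0.1020 to work properly.)
3. smartmontools is installed on the computer. Download it from http://sourceforge.net/projects/smartmontools/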

When running the smartmontools installer on 64-bit Windows, select the 64-bit version.
image

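There is also a tool called HDAT2, but it has to be run from pure DOS, and if you are not familiar with it you could accidentally erase the data on a drive, so I will not cover it here.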

4. You need a drive connected to a SATA or IDE port that supports S.M.A.R.T.!

5. TLER/ERC/CCTL: each vendor has its own name for the feature.
TLER: WD
ERC: Seagate
CCTL: Hitachi, HGST, Toshiba, Samsung


Now let's start checking.

  1. Run smartctl from smartmontools (in an administrator command prompt).

image

image

  2. Type smartctl --scan (two hyphens before scan) to find the device names of the computer's drives.
    C:\Program Files\smartmontools\bin>smartctl --scan

This computer has four drives:
/dev/sda (Hitachi HDS721010CLA332)
/dev/sdb (ST3640323AS)
/dev/sdc (Toshiba 3.5" HDD DT01ACA)
/dev/sdd (WDC WD10EARS-00Y5B1)
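If you would rather collect the device names programmatically, a minimal sketch is shown here. The scan output itself was only in a screenshot, so the line shape used (`/dev/sda -d ata # /dev/sda, ATA device`) is an assumption about what `smartctl --scan` prints, and `parse_scan` is a hypothetical helper name.

```python
import re
from typing import List

# Assumed line shape of `smartctl --scan`, e.g.
#   /dev/sda -d ata # /dev/sda, ATA device
SCAN_LINE = re.compile(r"^(/dev/\S+)\s+-d\s+(\S+)")


def parse_scan(output: str) -> List[str]:
    """Return the device names found in `smartctl --scan` output."""
    devices = []
    for line in output.splitlines():
        match = SCAN_LINE.match(line.strip())
        if match:
            devices.append(match.group(1))
    return devices


if __name__ == "__main__":
    sample = (
        "/dev/sda -d ata # /dev/sda, ATA device\n"
        "/dev/sdb -d ata # /dev/sdb, ATA device\n"
    )
    print(parse_scan(sample))
```

Lines that do not match the assumed shape are skipped rather than treated as errors.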

Type the text shown in the screenshot.
image

  3. First, check all the S.M.A.R.T. information for /dev/sdc (Toshiba 3.5" HDD DT01ACA) by typing smartctl -a /dev/sdc

If no S.M.A.R.T. information appears, or the output says S.M.A.R.T. is not supported, check that the RAID/IDE/AHCI controller driver is up to date and that S.M.A.R.T. is enabled in the BIOS.

C:\Program Files\smartmontools\bin>smartctl -a /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-vista-sp2] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Toshiba 3.5" HDD DT01ACA...
Device Model: TOSHIBA DT01ACA200
Serial Number: 34VU2UVKS
LU WWN Device Id: 5 000039 ff3f5ae6d
Firmware Version: MX4OABB0
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Jun 22 22:08:42 2014
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (14344) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 239) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported. (this line means TLER/ERC/CCTL is supported)
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   054    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   136   136   024    Pre-fail  Always       -       268 (Average 287)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       18
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   020    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       3
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       17
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       18
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       18
194 Temperature_Celsius     0x0002   176   176   000    Old_age   Always       -       34 (Min/Max 25/37)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
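
Rather than reading through the whole dump by eye, the capability line can be searched for. A minimal sketch, assuming the `smartctl -a` text has already been captured into a string; `supports_scterc` is a hypothetical helper name, and the search string is the line shown in the output above.

```python
def supports_scterc(smartctl_a_output: str) -> bool:
    """True if the `smartctl -a` text lists SCT Error Recovery Control."""
    needle = "SCT Error Recovery Control supported"
    return any(needle in line for line in smartctl_a_output.splitlines())


if __name__ == "__main__":
    sample = (
        "SCT capabilities: (0x003d) SCT Status supported.\n"
        "SCT Error Recovery Control supported.\n"
        "SCT Feature Control supported.\n"
        "SCT Data Table supported.\n"
    )
    print(supports_scterc(sample))
```

This only tells you that the drive advertises the feature; the current timer values still have to be read separately.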

  4. Next, check the current CCTL state of /dev/sdc (Toshiba 3.5" HDD DT01ACA) by typing smartctl -l scterc /dev/sdc
C:\Program Files\smartmontools\bin>smartctl -l scterc /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-vista-sp2] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org

SCT Error Recovery Control:
Read: Disabled
Write: Disabled

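The result shows that SCT Error Recovery Control is Disabled.
(Key point: Disabled does not mean the drive's ERC feature is switched off. It means the drive's ERC time limit is unlimited, so the timeout of the RAID controller or of software RAID decides whether the drive is taken offline. More on this below.)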
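
The read and write timers can be pulled out of that output with a few lines of code. A minimal sketch, assuming the text printed by `smartctl -l scterc` looks like the samples in this post; `parse_scterc` is a hypothetical name, and Disabled is mapped to `None` to reflect "no time limit".

```python
import re
from typing import Dict, Optional

TIMER_LINE = re.compile(r"^\s*(Read|Write):\s+(Disabled|\d+)")


def parse_scterc(output: str) -> Dict[str, Optional[float]]:
    """Map 'Read'/'Write' to a limit in seconds, or None when Disabled."""
    timers: Dict[str, Optional[float]] = {}
    for line in output.splitlines():
        match = TIMER_LINE.match(line)
        if not match:
            continue
        name, value = match.groups()
        # The raw number is in tenths of a second: 70 means 7.0 s.
        timers[name] = None if value == "Disabled" else int(value) / 10.0
    return timers


if __name__ == "__main__":
    print(parse_scterc("Read: Disabled\nWrite: Disabled\n"))
```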

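  5. Now set the upper time limit for TLER on /dev/sdc (Toshiba 3.5" HDD DT01ACA) by typing smartctl -l scterc,70,70 /dev/sdc
    (Note: strictly speaking this does not switch ERC/TLER on; it only sets the ERC upper time limit. On HGST and Toshiba drives with ERC, changing this setting is largely pointless: it reverts to the default after a reboot, and the timeout of the RAID controller or software RAID still sets the upper limit for deciding whether the drive goes offline.)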
C:\Program Files\smartmontools\bin>smartctl -l scterc,70,70 /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-vista-sp2] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org

SCT Error Recovery Control set to:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)

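The limit is now 7 seconds; the value 70 is in units of 0.1 second. (On WD/Seagate enterprise drives the TLER/ERC/CCTL response time is mostly set to around 7 to 10 seconds. Toshiba/HGST default to Disable.)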
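
Because the command takes tenths of a second rather than seconds, it is easy to type 7 when you mean 70. A small sketch of building the argument list from seconds; `scterc_command` is a hypothetical helper, and it only constructs the command without running it.

```python
from typing import List


def scterc_command(device: str, read_s: float, write_s: float) -> List[str]:
    """Build the smartctl argument list for the given limits in seconds."""
    # smartctl expects tenths of a second, so 7.0 s becomes 70.
    read_ds = round(read_s * 10)
    write_ds = round(write_s * 10)
    if read_ds < 0 or write_ds < 0:
        raise ValueError("limits must not be negative")
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]


if __name__ == "__main__":
    print(" ".join(scterc_command("/dev/sdc", 7.0, 7.0)))
```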

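  6. What does the S.M.A.R.T. information look like for a drive that does not support it?
    As it happens, /dev/sdd in my computer, a WDC WD10EARS-00Y5B1 (WD Green 1TB), has no support at all.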
C:\Program Files\smartmontools\bin>smartctl -a /dev/sdd
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-vista-sp2] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (AF)
Device Model: WDC WD10EARS-00Y5B1
Serial Number: WD-WCAV5A820423
LU WWN Device Id: 5 0014ee 2aeff8abf
Firmware Version: 80.00A80
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 2.6, 3.0 Gb/s
Local Time is: Sun Jun 22 22:13:02 2014
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x84) Offline data collection activity
was suspended by an interrupting command
from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (20460) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 236) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x3035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
(There is no "SCT Error Recovery Control supported" line.)

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   148   128   021    Pre-fail  Always       -       5600
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       45
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   066   066   000    Old_age   Always       -       25090
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       40
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       22
193 Load_Cycle_Count        0x0032   003   003   000    Old_age   Always       -       591010
194 Temperature_Celsius     0x0022   114   098   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error        00%            25029  -
# 2  Extended offline    Completed without error        00%            24963  -
# 3  Conveyance offline  Completed without error        00%                0  -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

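  7. Running smartctl -l scterc /dev/sdd to check does not help either: the hardware simply does not support the feature, and it cannot be enabled through firmware or software. Never use this kind of drive in a RAID or NAS system.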
C:\Program Files\smartmontools\bin>smartctl -l scterc /dev/sdd
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-vista-sp2] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org

SCT Error Recovery Control command not supported
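
The three outcomes seen so far (not supported, supported but Disabled, and supported with a limit set) can be told apart from the `smartctl -l scterc` text alone. A minimal sketch; `classify_scterc` and its return labels are my own naming, and the matched strings are the ones that appear in the outputs in this post.

```python
def classify_scterc(output: str) -> str:
    """Return 'unsupported', 'disabled', 'enabled', or 'unknown'."""
    if "SCT Error Recovery Control command not supported" in output:
        return "unsupported"
    # Keep only the "Read:" / "Write:" timer lines.
    timers = [line.strip() for line in output.splitlines()
              if line.strip().startswith(("Read:", "Write:"))]
    if not timers:
        return "unknown"  # output did not look like scterc output at all
    if all(line.endswith("Disabled") for line in timers):
        return "disabled"
    return "enabled"


if __name__ == "__main__":
    print(classify_scterc("SCT Error Recovery Control command not supported"))
```

Only drives in the "unsupported" group are the ones to keep out of an array; "disabled" drives still accept a limit.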
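  8. Even my ancient Hitachi HDS721010CLA332, though, supports ERC/CCTL. The default is Disable, and it can be changed by typing smartctl -l scterc,100,100 /dev/sda (the manufacturer suggests an upper limit of 10 sec, but its final recommendation is Disable).
C:\Program Files\smartmontools\bin>smartctl -a /dev/sda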
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-vista-sp2] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Hitachi Deskstar 7K1000.C
Device Model: Hitachi HDS721010CLA332

 

SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

C:\Program Files\smartmontools\bin>smartctl -l scterc /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-vista-sp2] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org

SCT Error Recovery Control:
Read: Disabled
Write: Disabled
C:\Program Files\smartmontools\bin>smartctl -l scterc,100,100 /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-vista-sp2] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org

SCT Error Recovery Control set to:
Read: 100 (10.0 seconds)
Write: 100 (10.0 seconds)

  9. /dev/sdb (ST3640323AS) also supports ERC and also defaults to Disable, and it can likewise be set by typing smartctl -l scterc,100,100 /dev/sdb

C:\Program Files\smartmontools\bin>smartctl -a /dev/sdb
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-vista-sp2] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.11
Device Model: ST3640323AS
Serial Number: 9VK05ZCK
LU WWN Device Id: 5 000c50 00d8a191c
Firmware Version: SD1B
User Capacity: 640,133,946,880 bytes [640 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm

 

SCT capabilities: (0x103b) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

C:\Program Files\smartmontools\bin>smartctl -l scterc /dev/sdb
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-vista-sp2] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org

SCT Error Recovery Control:
Read: Disabled
Write: Disabled

 

C:\Program Files\smartmontools\bin>smartctl -l scterc,100,100 /dev/sdb
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-vista-sp2] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org

SCT Error Recovery Control set to:
Read: 100 (10.0 seconds)
Write: 100 (10.0 seconds)

But even after you set it, the setting is gone as soon as you shut down and power the machine back on...
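
Since the value does not survive a power cycle, one workaround is to issue the command again on every boot. The sketch below is a hypothetical helper, not something I have deployed: it loops over a list of device names and calls `smartctl -l scterc,N,N` for each. The `run` parameter exists only so the function can be tried with `print` before letting it execute anything, and hooking it into a startup task is left to you.

```python
import subprocess
from typing import Callable, Iterable, List


def reapply_scterc(devices: Iterable[str],
                   deciseconds: int = 70,
                   run: Callable[[List[str]], object] = subprocess.run
                   ) -> List[List[str]]:
    """Issue `smartctl -l scterc,N,N` for each device; return the commands."""
    issued = []
    for device in devices:
        command = ["smartctl", "-l",
                   f"scterc,{deciseconds},{deciseconds}", device]
        run(command)  # with subprocess.run this needs administrator rights
        issued.append(command)
    return issued


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    reapply_scterc(["/dev/sda", "/dev/sdb", "/dev/sdc"], run=print)
```

Drives that do not support the command will simply report so, as /dev/sdd did above.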

  10. What about the HGST enterprise-class Ultrastar 7K4000? The model I own is the HGST HUS724020ALA640.
C:\Program Files\smartmontools\bin>smartctl -a /dev/sde
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-vista-sp2] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: HGST HUS724020ALA640
Serial Number: PN2131P6GMEKTP
LU WWN Device Id: 5 000cca 22dc8d601
Firmware Version: MF6OAA70
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Jun 23 23:21:35 2014
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x84) Offline data collection activity
was suspended by an interrupting command
from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 28) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 322) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   136   136   054    Pre-fail  Offline      -       80
  3 Spin_Up_Time            0x0007   128   128   024    Pre-fail  Always       -       486 (Average 488)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       18
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   145   145   020    Pre-fail  Offline      -       24
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       763
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       16
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       22
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       22
194 Temperature_Celsius     0x0002   153   153   000    Old_age   Always       -       39 (Min/Max 25/48)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

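But it turned out that SCT Error Recovery Control on the HGST HUS724020ALA640 is Disabled! At the time this was a bolt from the blue for me, because the Taiwanese media kept stressing how important ERC time is.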

C:\Program Files\smartmontools\bin>smartctl -l scterc /dev/sde
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-vista-sp2] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org

SCT Error Recovery Control:
Read: Disabled
Write: Disabled

Checking with HDAT2 5.0 gave the same result: both the Read Command Timer and the Write Command Timer are 0...


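So should HGST enterprise-class Ultrastar 7K4000 drives really not be used for hardware RAID storage? (The RAID built into Intel/AMD chipsets counts as hardware RAID too, to say nothing of dedicated enterprise LSI RAID controllers.)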

So I asked Mr. Chen at the Taiwanese distributor Weikeng to put the relevant questions to HGST.

HGST's answers to the questions relayed through Weikeng

image

1. These are all enterprise drives, so why is ERC 10 sec on the Seagate Constellation ES.3 and 7 sec on WD, while HGST ships with Disable? Is this a unique house tradition?

=> If not default disable, what value should be set? Disable means no recovery time limit.

Of course, we have some unique p/n who have unique default value. But generic drive should be default disable.

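2. According to the description on page 86, the specified upper limit for the ERC command is 10 sec (the same as the Seagate Constellation ES.3 default value). Please confirm whether a user can set this value themselves.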

=> No, only unique p/n support.

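3. "These command timers are volatile. The default value is 0 (i.e. disable command time-out)." This sentence makes it clear that even if I set the command timers myself with S.M.A.R.T. software, a cold boot returns them to the default value. Would HGST be willing to provide firmware that lets users enable this feature?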

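(Otherwise why would I buy enterprise drives for RAID? Without enabling Error Recovery Control and setting the timer, hardware RAID and high-end enterprise NAS units will declare a disk failed and take it offline the moment they detect a disk error, even a false positive. However good the reliability is, it does not help.)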

=> Setting are volatile, so return to default (disable) by power cycle. Disable means no recovery time limit.

So, according to HGST:

The default for generic drives should be Disable,

so that recovery time has no upper limit.

All I can do now is hope that QNAP's engineers wrote their software to set the Read Command Timer and Write Command Timer to 7 seconds or longer on every drive that supports CCTL.

The command timeout of the SCSI disk layer that Linux software RAID relies on defaults to 30 seconds.
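
On a Linux machine that per-device timeout can be read from sysfs. A minimal sketch, assuming the usual `/sys/block/<device>/device/timeout` location; `read_command_timeout` is a hypothetical helper, and the `sys_block` parameter exists only so the function can be pointed at a different directory.

```python
from pathlib import Path


def read_command_timeout(device: str, sys_block: str = "/sys/block") -> int:
    """Return the command timeout in seconds for a device such as 'sda'."""
    path = Path(sys_block) / device / "device" / "timeout"
    # The file holds a single integer number of seconds.
    return int(path.read_text().strip())
```

For example, `read_command_timeout("sda")` would be expected to return 30 on a system that still uses the default mentioned above.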


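TOSHIBA MG03ACA200: this drive is made in the Philippines. It looks almost identical to the Japan-only edition (MD03ACA200), but the underside is clearly different.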

C:\Program Files\smartmontools\bin>smartctl -a /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-win8] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model:     TOSHIBA MG03ACA200
Serial Number:    53OHKA0MF
LU WWN Device Id: 5 000039 4cbf02379
Firmware Version: FL1A
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Jul 03 22:17:59 2014
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 337) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       6206
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       3
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       0
 10 Spin_Retry_Count        0x0033   100   100   030    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       3
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       4
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       32 (Min/Max 23/32)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   253   000    Old_age   Always       -       0
220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       0
222 Loaded_Hours            0x0032   100   100   000    Old_age   Always       -       0
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       103
240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
1        0        0  Not_testing
2        0        0  Not_testing
3        0        0  Not_testing
4        0        0  Not_testing
5        0        0  Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

 

C:\Program Files\smartmontools\bin>smartctl -l scterc /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-win8] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org
SCT Error Recovery Control:
Read: Disabled
Write: Disabled

14 responses to "Checking Whether a Hard Drive Supports TLER/ERC/CCTL"

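  1. Hello, I would like to ask whether the H3IK40003272SA is suitable for use in a NAS.
    Does it support ERC/CCTL?
    Disabled means there is no time limit; does that mean it is supported?
    I have never used a NAS before. I recently bought a Thecus N5550 and am buying drives gradually; so far I have one H3IK40003272SA.
    I recently found out that WD Red has TLER and thought I might be doomed, having bought the wrong NAS and the wrong drive as well.
    Your post is very well written and gave me a glimmer of hope again, so I am asking you here.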

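    1. Toshiba and HGST still support ERC/CCTL even on desktop-class drives (SCT Error Recovery Control supported.), with a default of Disable or 0 sec.
      However, the H3IK40003272SA is a desktop-class drive: it has no anti-resonance protection (so it is just right for a NAS with five bays or fewer), no guaranteed MTBF, and either no guaranteed Load/Unload Cycles limit or a shorter rated life.
      For low-volume home use it is perfectly adequate as long as you pay attention to cooling. No need to worry too much.

      That said, within a model family the highest-capacity drive is the one most prone to trouble (it has the most platters, and if one of the four or five fails the whole drive has to be replaced, so larger-capacity drives carry more risk).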

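      1. Thank you for your patient and expert reply.
        I had not noticed the MD03ACA400V you suggested before. I googled it last night and have a few questions:
        It is not 1 TB per platter.
        It is meant for video surveillance and NAS, but with "video surveillance" added, I wonder how its error handling behaves?
        Still, seeing that it is derived from an enterprise model is reassuring!

        If I build my NAS from drives of different brands, will performance be poor?
        (A previous company of mine once built a RAID 6 from twelve 1 TB drives of brand X. After a little over a month, three failed within two weeks; the last two failed together after hours, less than 24 hours apart, which left the network admin dumbfounded.)
        That is why I am thinking of mixing drives in the NAS, although I could also build the NAS from one brand and add a backup set from another brand,
        but that costs a lot.

        So the drive you recommend looks like very good value and could save quite a bit of money.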


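      2. 1. A 12-drive RAID 6 is far too risky; RAID 6 can only survive two failed drives. Personally I think RAID 6 should go no larger than 10 bays. Beyond that, people usually switch to RAID 10 or RAID 60, or add remote backup. You didn't buy WD Black or Green by any chance? Drives whose S.M.A.R.T. properly supports ERC/CCTL normally raise a warning so you can prepare a replacement.
        2. RAID performance is determined by the slowest drive and by the RAID system itself. The drive brand is not the bottleneck.
        3. I realized early on that buying HGST and Toshiba saves a lot. I also own plenty of WD and Seagate drives, but there was simply too much material to keep track of, so I put my notes on the blog.
        4. What really matters in an enterprise drive is Load/Unload Cycles, Nonrecoverable Read Errors per Bits Read (Max), Mean Time Between Failures (MTBF), and resistance to resonance. A warranty is best left unused! Typical figures are 600,000 cycles, 10E15, and 1.2 million hours or more.
        5. Ordinary NAS-class drives are gradually gaining RVS (always present) and DSA (high-end models), but their Load/Unload Cycles, Nonrecoverable Read Errors per Bits Read (Max), and MTBF are much worse than enterprise class. Typical figures are 300,000 cycles, 10E14, and 800,000 hours or more.

        6. I doubt Toshiba would spend that much to build dedicated ICs for video-surveillance enhancement, and it only offers a 3-year warranty. The Japanese regard warranty and after-sales service as major costs.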
