Checking Whether a Hard Drive Supports TLER/ERC/CCTL

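First of all, this post is partly a note for my own reference. I have spent the past few days studying hard drives, and the TLER/ERC/CCTL feature turned out to be a real headache. It matters a great deal for hardware RAID, because it is central to how the controller decides that a drive has failed: as soon as a drive exceeds the time limit configured on it through the S.M.A.R.T. command interface, the hardware RAID controller marks the drive as failed and may even drop it from the array automatically. A drive without TLER/ERC/CCTL is effectively allowed only 1 to 2 seconds, so the system marks it offline almost immediately.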

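The drives in a RAID are usually bought at the same time, so their components tend to wear out in the same period. If the number of drives marked as failed at the same time, around a drive replacement or during a rebuild, exceeds what the array can tolerate, you face the loss of all data on the RAID.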

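With Linux software RAID (mdadm), the timeout for read and write commands defaults to 30 seconds, well above the 7 to 13 seconds that drives with TLER/ERC/CCTL typically use by default. A drive that exceeds the timeout is still judged to have failed and is taken offline. A drive that does not support TLER/ERC/CCTL is likewise marked as failed as soon as a command times out.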

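In NAS vendors' drive compatibility lists, any drive that lacks TLER/ERC/CCTL support is invariably listed as not recommended.
Drives that do support TLER/ERC/CCTL but ship with it set to Disabled (0 seconds) have no upper limit on their recovery timer, so whether they are judged to have failed depends on the 30-second default or on whatever timeout the vendor has configured. (Ordinary HGST and Toshiba drives all default to Disabled or 0 seconds, so their recovery timer is unlimited.)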
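To make the interplay between the two timers concrete, here is a minimal Python sketch. The function name is made up for illustration; the 30-second figure is the Linux default quoted above, and the unit of 0.1 second is the one smartctl uses for ERC values.

```python
from typing import Optional

# 30 s is the Linux default command timeout quoted in the text above.
HOST_TIMEOUT_S = 30.0


def drive_reports_first(erc_deciseconds: Optional[int],
                        host_timeout_s: float = HOST_TIMEOUT_S) -> bool:
    """Return True if the drive's own ERC limit expires before the host timeout.

    erc_deciseconds uses smartctl's unit of 0.1 s (70 means 7.0 s).
    None or 0 stands for "Disabled": the drive may keep retrying without
    limit, so the host timeout is the only deadline left.
    """
    if not erc_deciseconds:
        return False
    return erc_deciseconds / 10.0 < host_timeout_s


for setting in (70, 100, None):
    print(setting, drive_reports_first(setting))
```

With a 7- or 10-second ERC limit the drive gives up and reports the error well inside the 30-second window; with ERC disabled, only the host-side timeout applies.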

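So when choosing drives for RAID, TLER/ERC/CCTL support is very important. Shock and resonance, by contrast, can be handled in hardware (a vibration-damping case plus anti-vibration pads).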

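WD and Seagate have removed TLER/ERC/CCTL support from their desktop-class drives; it cannot be enabled or configured through S.M.A.R.T. software at all. So even if you buy nine outrageously expensive WD Black drives, building a RAID from them is far riskier than using WD Red. On the enterprise drives from these two brands, the ERC time is 10 seconds for Seagate and 7 seconds for WD.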

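Toshiba and HGST, on the other hand, still support ERC/CCTL even on desktop-class drives. The default is Disabled or 0 seconds, and S.M.A.R.T. software can be used to turn it on and set the number of seconds. That is why NAS vendors list the desktop-class drives from these two manufacturers as recommended. Their enterprise-class drives support ERC/CCTL too, again with a default of Disabled or 0 seconds. (HGST's reply: this way there is no upper limit on recovery time; generic drives should all be Disabled, though certain special part numbers ship with a preset value.)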

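I roughly knew that Toshiba and HGST had not removed CCTL from their desktop-class drives, but is it actually active once the drive is connected to a computer? Is it really supported? That had long been a big question mark for me. Today I finally worked out an answer and a way to check! I also asked about HGST's default ERC timer value through Weikeng (威健), the Taiwan distributor.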

Here is a summary of what I learned, so that everyone can check their own drives too.

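First, a few prerequisites:
1. The motherboard BIOS must support the latest AHCI or include the relevant fixes (my GA-990FXA-UD5 needed an update to F12 to fix its AHCI problem), and S.M.A.R.T. must be enabled.
2. Your SATA or IDE driver must support S.M.A.R.T. (On my GA-990FXA-UD5, S.M.A.R.T. only became available with AMD SATA Controller driver version 1.2.1.331; the Marvell 88SE9172 needed an update to 1.2.0.1020 to work properly.)
3. smartmontools must be installed on the computer. Download it from http://sourceforge.net/projects/smartmontools/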

When running the smartmontools installer on 64-bit Windows, select the 64-bit version.

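There is another tool, HDAT2, but it has to run in pure DOS, and if you are not familiar with it you could accidentally erase the data on your drive, so I will not cover it here.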

4. You need a drive connected to a SATA or IDE port that supports S.M.A.R.T.

5. Each manufacturer has its own name for TLER/ERC/CCTL:
TLER: WD
ERC: Seagate
CCTL: Hitachi, HGST, Toshiba, Samsung


Now let's start checking.

  1. Run smartctl from smartmontools in an administrator command prompt (Admin CMD).

  2. Type smartctl --scan to find the device names of the drives in the computer. (Note that the option starts with two hyphens.)
    C:\Program Files\smartmontools\bin>smartctl --scan

As you can see, this computer has four drives:
/dev/sda (Hitachi HDS721010CLA332)
/dev/sdb (ST3640323AS)
/dev/sdc (Toshiba 3.5″ HDD DT01ACA)
/dev/sdd (WDC WD10EARS-00Y5B1)
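If you want to script this step, the scan result can be split into device names. The sketch below is only an illustration: the sample lines are my assumption of what `smartctl --scan` typically prints (the scan result in this post was captured as a screenshot), and `parse_scan` is a made-up helper name.

```python
from typing import List, Tuple

# Assumed sample of `smartctl --scan` output; the device lines on your
# machine may differ.
SAMPLE_SCAN = """\
/dev/sda -d ata # /dev/sda, ATA device
/dev/sdb -d ata # /dev/sdb, ATA device
/dev/sdc -d ata # /dev/sdc, ATA device
/dev/sdd -d ata # /dev/sdd, ATA device
"""


def parse_scan(text: str) -> List[Tuple[str, str]]:
    """Return (device, type) pairs from `smartctl --scan` style output."""
    drives = []
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop the trailing comment
        if len(fields) >= 3 and fields[1] == "-d":
            drives.append((fields[0], fields[2]))
    return drives


for device, dev_type in parse_scan(SAMPLE_SCAN):
    print(device, dev_type)
```

Each device name found this way can then be passed to the `smartctl -a` and `smartctl -l scterc` commands used in the following steps.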

Enter the commands as follows.

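  3. First, check all the S.M.A.R.T. information for /dev/sdc (Toshiba 3.5″ HDD DT01ACA) by typing smartctl -a /dev/sdc

If no S.M.A.R.T. information appears, or the output says S.M.A.R.T. is not supported, check that your RAID/IDE/AHCI controller driver is up to date and that S.M.A.R.T. is enabled in the BIOS.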

C:\Program Files\smartmontools\bin>smartctl -a /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-vista-sp2] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Toshiba 3.5″ HDD DT01ACA…
Device Model: TOSHIBA DT01ACA200
Serial Number: 34VU2UVKS
LU WWN Device Id: 5 000039 ff3f5ae6d
Firmware Version: MX4OABB0
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Jun 22 22:08:42 2014
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (14344) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 239) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported. (this line means TLER/ERC/CCTL is supported)
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   054    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   136   136   024    Pre-fail  Always       -       268 (Average 287)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       18
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   020    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       3
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       17
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       18
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       18
194 Temperature_Celsius     0x0002   176   176   000    Old_age   Always       -       34 (Min/Max 25/37)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

  4. Next, check the current CCTL state of /dev/sdc (Toshiba 3.5″ HDD DT01ACA) by typing smartctl -l scterc /dev/sdc
C:\Program Files\smartmontools\bin>smartctl -l scterc /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-vista-sp2] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org

SCT Error Recovery Control:
Read: Disabled
Write: Disabled

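The result: SCT Error Recovery Control is Disabled.
(Key point: "Disabled" does not mean that the drive's ERC feature is switched off. It means that the drive's own ERC time limit is unlimited, so the timeout setting of the RAID controller or software RAID decides whether to take the drive offline. More on this below.)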
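A small sketch of reading that report programmatically follows. The function name is hypothetical; the two sample texts are the Read/Write lines that smartctl printed for this drive before and after a limit was set. `None` stands for Disabled, meaning no limit on the drive side.

```python
import re
from typing import Dict, Optional


def parse_scterc(text: str) -> Dict[str, Optional[int]]:
    """Map 'Read'/'Write' to the limit in units of 0.1 s, or None if Disabled."""
    result: Dict[str, Optional[int]] = {}
    pattern = r"^\s*(Read|Write):\s+(Disabled|\d+)"
    for match in re.finditer(pattern, text, re.MULTILINE):
        value = match.group(2)
        result[match.group(1)] = None if value == "Disabled" else int(value)
    return result


DISABLED = """\
SCT Error Recovery Control:
Read: Disabled
Write: Disabled
"""

SEVEN_SECONDS = """\
SCT Error Recovery Control set to:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)
"""

print(parse_scterc(DISABLED))
print(parse_scterc(SEVEN_SECONDS))
```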

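  5. Now set the TLER time limit on /dev/sdc (Toshiba 3.5″ HDD DT01ACA) by typing smartctl -l scterc,70,70 /dev/sdc
    (Additional note: strictly speaking this does not switch ERC/TLER on; it only sets the ERC time limit. On HGST and Toshiba drives with ERC, changing this setting is of little practical use: the value reverts to the default after a reboot, and the timeout of the RAID controller or software RAID still determines how long a drive may take before it is taken offline.)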
C:\Program Files\smartmontools\bin>smartctl -l scterc,70,70 /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-vista-sp2] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org

SCT Error Recovery Control set to:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)

It is now set to 7 seconds. (On WD/Seagate enterprise drives the TLER/ERC/CCTL response time is mostly set to around 7 to 10 seconds; Toshiba/HGST default to Disabled.)
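Note the unit: the numbers after scterc are tenths of a second, so 70 means 7.0 seconds. A minimal sketch of building the command from a value in seconds (the helper name is made up; it only assembles the argument list and does not run anything):

```python
from typing import List


def scterc_command(device: str, read_s: float, write_s: float) -> List[str]:
    """Build the smartctl argument list for the given limits in seconds."""
    read_ds = round(read_s * 10)    # smartctl expects units of 0.1 s
    write_ds = round(write_s * 10)
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]


print(" ".join(scterc_command("/dev/sdc", 7.0, 7.0)))
```

The printed line matches the command typed in step 5; passing 10.0 instead gives the scterc,100,100 form used for the Hitachi and Seagate drives later on.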

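  6. What does the S.M.A.R.T. information look like on a drive without support?
    My /dev/sdd, a WDC WD10EARS-00Y5B1 (a 1 TB WD Caviar Green, per the Model Family line in the output), happens to have no support at all.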
C:\Program Files\smartmontools\bin>smartctl -a /dev/sdd
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-vista-sp2] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (AF)
Device Model: WDC WD10EARS-00Y5B1
Serial Number: WD-WCAV5A820423
LU WWN Device Id: 5 0014ee 2aeff8abf
Firmware Version: 80.00A80
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 2.6, 3.0 Gb/s
Local Time is: Sun Jun 22 22:13:02 2014
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command
                                        from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (20460) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 236) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
(There is no "SCT Error Recovery Control supported." line here.)

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   148   128   021    Pre-fail  Always       -       5600
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       45
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   066   066   000    Old_age   Always       -       25090
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       40
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       22
193 Load_Cycle_Count        0x0032   003   003   000    Old_age   Always       -       591010
194 Temperature_Celsius     0x0022   114   098   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     25029         -
# 2  Extended offline    Completed without error       00%     24963         -
# 3  Conveyance offline  Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

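  7. Running smartctl -l scterc /dev/sdd to check does not help either: the drive simply does not support the feature, and it cannot be enabled through firmware or software. Never use this kind of drive in a RAID or NAS system.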
C:\Program Files\smartmontools\bin>smartctl -l scterc /dev/sdd
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-vista-sp2] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org

SCT Error Recovery Control command not supported
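Putting the cases seen so far side by side, a drive ends up in one of three states. The sketch below (hypothetical function name and state labels; the sample strings are taken from the smartctl replies in this post) sorts a reply into "unsupported", "disabled", or "limited":

```python
import re


def erc_state(scterc_output: str) -> str:
    """Classify `smartctl -l scterc` output into one of three states."""
    if "command not supported" in scterc_output:
        return "unsupported"          # e.g. the WD10EARS
    values = re.findall(r"^\s*(?:Read|Write):\s+(\S+)",
                        scterc_output, re.MULTILINE)
    if not values:
        return "unknown"              # output did not look like a scterc report
    if all(value == "Disabled" for value in values):
        return "disabled"             # supported, but no limit on the drive side
    return "limited"                  # a time limit is currently set


print(erc_state("SCT Error Recovery Control command not supported"))
print(erc_state("SCT Error Recovery Control:\nRead: Disabled\nWrite: Disabled\n"))
print(erc_state("SCT Error Recovery Control set to:\n"
                "Read: 70 (7.0 seconds)\nWrite: 70 (7.0 seconds)\n"))
```

Only the first state rules a drive out by the criteria of this post; "disabled" leaves the decision to the controller or operating system timeout.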
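  8. Even my ancient Hitachi HDS721010CLA332, however, supports ERC/CCTL. The default is Disabled, and it can be changed by typing smartctl -l scterc,100,100 /dev/sda. (The manufacturer's suggested upper limit is 10 seconds, but its final recommendation is Disabled.)
C:\Program Files\smartmontools\bin>smartctl -a /dev/sda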
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-vista-sp2] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Hitachi Deskstar 7K1000.C
Device Model: Hitachi HDS721010CLA332

 

SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

C:\Program Files\smartmontools\bin>smartctl -l scterc /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-vista-sp2] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org

SCT Error Recovery Control:
Read: Disabled
Write: Disabled
C:\Program Files\smartmontools\bin>smartctl -l scterc,100,100 /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-vista-sp2] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org

SCT Error Recovery Control set to:
Read: 100 (10.0 seconds)
Write: 100 (10.0 seconds)

  9. /dev/sdb, the ST3640323AS, also supports ERC. Its default is Disabled as well, and it can be set by typing smartctl -l scterc,100,100 /dev/sdb

C:\Program Files\smartmontools\bin>smartctl -a /dev/sdb
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-vista-sp2] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.11
Device Model: ST3640323AS
Serial Number: 9VK05ZCK
LU WWN Device Id: 5 000c50 00d8a191c
Firmware Version: SD1B
User Capacity: 640,133,946,880 bytes [640 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm

 

SCT capabilities: (0x103b) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

C:\Program Files\smartmontools\bin>smartctl -l scterc /dev/sdb
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-vista-sp2] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org

SCT Error Recovery Control:
Read: Disabled
Write: Disabled

 

C:\Program Files\smartmontools\bin>smartctl -l scterc,100,100 /dev/sdb
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-vista-sp2] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org

SCT Error Recovery Control set to:
Read: 100 (10.0 seconds)
Write: 100 (10.0 seconds)

But even after setting it, the value is gone again once the machine is shut down and restarted.
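Because the setting is lost on every power cycle, anyone who wants a limit in place has to issue the command again after each boot. The following is only a sketch of that idea: the function name and the `run` parameter are made up for illustration, and how you schedule it at startup (a startup task, an init script, and so on) is left open.

```python
import subprocess
from typing import Callable, Iterable, List


def reapply_erc(devices: Iterable[str], deciseconds: int = 70,
                run: Callable[[List[str]], object] = subprocess.run
                ) -> List[List[str]]:
    """Issue `smartctl -l scterc,N,N` for every device and return the commands.

    `run` defaults to subprocess.run; pass another callable (for example
    print) for a dry run that does not touch the drives.
    """
    issued = []
    for device in devices:
        command = ["smartctl", "-l",
                   f"scterc,{deciseconds},{deciseconds}", device]
        run(command)
        issued.append(command)
    return issued


# Dry run: print the commands instead of executing them.
reapply_erc(["/dev/sda", "/dev/sdb", "/dev/sdc"], deciseconds=70, run=print)
```

Drives that answer "command not supported", such as the WD10EARS above, would simply reject the command, so there is no point in including them in the list.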

  10. What about the enterprise-class HGST Ultrastar 7K4000? The model I own is the HGST HUS724020ALA640.
C:\Program Files\smartmontools\bin>smartctl -a /dev/sde
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-vista-sp2] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: HGST HUS724020ALA640
Serial Number: PN2131P6GMEKTP
LU WWN Device Id: 5 000cca 22dc8d601
Firmware Version: MF6OAA70
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Jun 23 23:21:35 2014
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command
                                        from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (   28) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 322) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   136   136   054    Pre-fail  Offline      -       80
  3 Spin_Up_Time            0x0007   128   128   024    Pre-fail  Always       -       486 (Average 488)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       18
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   145   145   020    Pre-fail  Offline      -       24
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       763
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       16
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       22
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       22
194 Temperature_Celsius     0x0002   153   153   000    Old_age   Always       -       39 (Min/Max 25/48)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

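But it turned out that SCT Error Recovery Control on the HGST HUS724020ALA640 is Disabled! That was a bolt from the blue for me at the time, because the Taiwanese media kept stressing how important the ERC time is.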

C:\Program Files\smartmontools\bin>smartctl -l scterc /dev/sde
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-vista-sp2] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org

SCT Error Recovery Control:
Read: Disabled
Write: Disabled

Checking with HDAT2 5.0 gave the same result: Read Command Timer and Write Command Timer are both 0.

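So does that mean the enterprise-class HGST Ultrastar 7K4000 should not be used for hardware RAID storage? (The RAID built into Intel/AMD chipsets is hardware RAID, to say nothing of dedicated enterprise LSI RAID controllers.)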

So I asked Mr. Chen at Weikeng (威健), the Taiwan distributor, to put my questions to HGST.

HGST's answers to the questions relayed by Weikeng

1. Both are enterprise drives, so why is ERC 10 seconds on the Seagate Constellation ES.3 and 7 seconds on WD, while HGST ships with it Disabled? Is this some proud tradition unique to HGST?

=> If not default disable, what value should be set? Disable means no recovery time limit.

Of course, we have some unique p/n who have unique default value. But generic drive should be default disable.

2. According to page 86, the specified upper limit for the ERC command is 10 seconds (the same as the Seagate Constellation ES.3 default value). Please confirm whether the user can set this value themselves.

=> No, only unique p/n support.

3. "These command timers are volatile. The default value is 0 (i.e. disable command time-out)." This sentence makes it clear that even if I set the command timers myself with S.M.A.R.T. software, they return to the default value after a cold boot. Would HGST be willing to provide firmware that lets users enable this feature?

(Otherwise why would I buy enterprise drives for RAID? Without enabling Error Recovery Control and setting up the timer, a hardware RAID or high-end enterprise NAS that detects a disk error (even a false positive) will declare the disk failed and take it offline the very next second; good reliability would not help then.)

=> Setting are volatile, so return to default (disable) by power cycle. Disable means no recovery time limit.

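So, according to HGST:

The default for generic drives should be Disabled,

so that there is no upper limit on recovery time.

All I can do now is hope that QNAP's engineers wrote their software so that every CCTL-capable drive gets its Read Command Timer and Write Command Timer set to 7 seconds or longer.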

On Linux software RAID, the command timeout of the SCSI disk layer defaults to 30 seconds.
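On a Linux machine this value can be read from sysfs, where each disk exposes it as /sys/block/<disk>/device/timeout (in seconds). A minimal sketch follows; the function name is mine, and the demonstration runs against a temporary directory that mimics the sysfs layout so that it works on any operating system.

```python
import tempfile
from pathlib import Path


def command_timeout_seconds(disk: str, sys_block: str = "/sys/block") -> int:
    """Read the kernel's command timeout (in seconds) for a disk such as 'sda'."""
    return int(Path(sys_block, disk, "device", "timeout").read_text().strip())


# Demonstration against a throwaway directory laid out like sysfs.
# On a real Linux host, call command_timeout_seconds("sda") instead.
with tempfile.TemporaryDirectory() as fake_sys:
    node = Path(fake_sys, "sda", "device")
    node.mkdir(parents=True)
    (node / "timeout").write_text("30\n")
    demo_value = command_timeout_seconds("sda", sys_block=fake_sys)

print(demo_value)
```

Comparing this number with the drive's ERC limit shows which of the two timers expires first for a given disk.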


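TOSHIBA MG03ACA200: this drive is made in the Philippines. It looks almost identical to the Japan-only edition (MD03ACA200), but the underside is clearly different.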

C:\Program Files\smartmontools\bin>smartctl -a /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-win8] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model:     TOSHIBA MG03ACA200
Serial Number:    53OHKA0MF
LU WWN Device Id: 5 000039 4cbf02379
Firmware Version: FL1A
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Jul 03 22:17:59 2014
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 337) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       6206
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       3
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       0
 10 Spin_Retry_Count        0x0033   100   100   030    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       3
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       4
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       32 (Min/Max 23/32)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   253   000    Old_age   Always       -       0
220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       0
222 Loaded_Hours            0x0032   100   100   000    Old_age   Always       -       0
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       103
240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
1        0        0  Not_testing
2        0        0  Not_testing
3        0        0  Not_testing
4        0        0  Not_testing
5        0        0  Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

 

C:\Program Files\smartmontools\bin>smartctl -l scterc /dev/sdc
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-win8] (sf-6.2-1)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, http://www.smartmontools.org
SCT Error Recovery Control:
Read: Disabled
Write: Disabled

14 responses to "Checking Whether a Hard Drive Supports TLER/ERC/CCTL"

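  1. hugo

    Hello, may I ask whether the H3IK40003272SA is suitable for use in a NAS?
    Does it support ERC/CCTL?
    Does "Disabled" meaning "no time limit" imply that it is supported?
    I have never used a NAS before. I recently bought a Thecus N5550 and am buying drives gradually; so far I have one H3IK40003272SA.
    I recently found out that WD Red has TLER and thought I might be doomed, having bought both the wrong NAS and the wrong drive.
    Your post is very well written and gave me a glimmer of hope, so I am asking here.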

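    1. im5481

      Toshiba and HGST still support ERC/CCTL even on desktop-class drives (SCT Error Recovery Control supported.), with a default of Disabled or 0 seconds.
      However, the H3IK40003272SA is a desktop-class drive: it has no anti-resonance protection (so it is a fair fit for NAS units with five bays or fewer), no guaranteed MTBF, and either no guaranteed Load/Unload Cycles limit or a shorter rated life.
      For low-volume home use it is actually sufficient as long as you keep an eye on cooling. No need to worry too much.

      That said, within a given model family the highest-capacity drive is the most failure-prone (it has the most platters, and if any one of four or five platters goes bad the whole drive must be replaced, so larger drives carry relatively more risk).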

      1. im5481

        Also, a reminder: if you build a NAS RAID from five H3IK40003272SA drives, it will be very noisy!

      2. im5481

        If it were me, I would buy the Toshiba 4TB 3.5-inch 64 MB cache SATA3 NAS drive (MD03ACA400V), at the VIP price of NT$4,720.

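      3. hugo

        Thank you for your patient and professional reply.
        I had not noticed the MD03ACA400V you suggested before. I googled it last night and have a few questions:
        The platters are not 1 TB each.
        It is marketed for video surveillance and NAS use; given the extra "video surveillance" label, I wonder how its error handling behaves?
        But seeing that it is derived from an enterprise model puts my mind at ease.

        If I build my NAS from drives of different brands, will performance suffer badly?
        (At a previous company we once had 12 brand-X 1 TB drives in RAID 6. After a bit over a month in use, three drives failed within two weeks; the last two died within 24 hours of each other, after office hours, which left the network admin stunned.)
        That is why I have been thinking of mixing drives in the NAS. Alternatively I could build the NAS from one brand and keep a backup on another brand,
        but that costs a lot.

        So the drive you recommend looks like really good value and could save me quite a bit.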

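      4. im5481

        1. RAID 6 across 12 drives is really too risky, since RAID 6 can only tolerate two failed drives. Personally I think RAID 6 should go no larger than 10 bays. People usually switch to RAID 10 or RAID 60, or add remote backup. You did not happen to buy WD Black or Green, did you? Drives that properly support ERC/CCTL through S.M.A.R.T. raise a warning telling you to prepare a replacement.
        2. RAID performance is determined by the slowest drive and by the RAID system itself, not by the drive brand.
        3. I found early on that buying HGST and Toshiba saves a lot of money. I also have plenty of WD and Seagate drives, but there was so much to keep track of that I decided to put my notes on the blog.
        4. What really matters for an enterprise drive is Load/Unload Cycles, Nonrecoverable Read Errors per Bits Read (Max), Mean Time Between Failures (MTBF), and resistance to resonance. The warranty is best never needed! Typical figures are 600,000 cycles, 10E15, and 1.2 million hours or more.
        5. Ordinary NAS-class drives increasingly have RVS (always) and DSA (on high-end models) as well, but their Load/Unload Cycles, Nonrecoverable Read Errors per Bits Read (Max), and MTBF are considerably worse than those of enterprise drives: typically 300,000 cycles, 10E14, and 800,000 hours or more.

        6. I do not think Toshiba has the budget to build a dedicated IC for video-surveillance enhancements, and it only offers a 3-year warranty. The Japanese regard warranty and after-sales service as a major cost.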

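      5. im5481

        A home NAS really does not need enterprise drives! Ordinary drives with ERC/CCTL support are enough, because the read/write load is not that heavy. They are also cheap, so replacing or upgrading them does not hurt!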

    2. im5481

      Sorry, a correction: the H3IK40003272SA is actually not noisy. I found that the noise was all coming from my Toshiba MD03ACA200 2TB drives; once I swapped them out, the NAS went quiet!

  2. 杠头

    My Toshiba drive shows the feature on Linux but not on Windows 10. Windows 10 is using its built-in AHCI driver; the motherboard chipset is Z170.

    1. im5481

      The drivers built into Windows mostly cannot enable it; you need to install the driver provided by the vendor.

      1. 杠头

        Intel's RST driver has performance problems on 64-bit Windows 10 and is slower than the built-in driver. Also, is there currently no way to enable CCTL permanently?

      2. im5481

        Toshiba's drives support CCTL but set no failure time limit; the limit is decided by the software, the RAID, or the system.

      3. 杠头

        I have a Linux-based home NAS. Is there any way to see how the RAID or the system has configured the failure timeout for CCTL?

      4. im5481

        Just ask the manufacturer's technical support.
