Rootop: Server Operations and Web Architecture

Scanning a Disk for Bad Blocks on Ubuntu

Author: "Zhwt", a guy who really loves to tinker.

This post is a reader submission received today.

Use the badblocks command to scan a disk for bad blocks.

# badblocks -b 4096 -o badblocks.txt -nsv /dev/sda
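  • -b 4096: sets the block size, in bytes, used for the scan. To find the block size of an existing filesystem, run tune2fs -l partition | grep 'Block size'
  • -o badblocks.txt: writes the list of bad blocks to badblocks.txt instead of stdout
  • -n: uses the non-destructive read-write test. Before testing, badblocks backs up the original contents of each block, writes random data and reads it back, then restores the original contents from the backup. This suits disks that already hold data, because the existing data is preserved
  • -s: shows scan progress
  • -v: verbose mode; reports the number of read errors, write errors, and data corruptions on the terminal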
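
For scripted use, the options above can be assembled programmatically. The following is only a minimal sketch: the helper name build_badblocks_cmd and its defaults are made up for illustration, and it only builds the argument list without running anything.

```python
import shlex


def build_badblocks_cmd(device, block_size=4096, output_file="badblocks.txt"):
    """Return the argv list for a non-destructive read-write badblocks scan.

    Hypothetical helper for illustration; it mirrors the command shown above.
    """
    if block_size <= 0:
        raise ValueError("block size must be positive")
    return [
        "badblocks",
        "-b", str(block_size),  # block size in bytes
        "-o", output_file,      # write the bad block list to this file
        "-n",                   # non-destructive read-write test
        "-s",                   # show progress
        "-v",                   # verbose
        device,
    ]


cmd = build_badblocks_cmd("/dev/sda")
print(shlex.join(cmd))
```

To actually start the scan, the list could be passed to subprocess.run(cmd, check=True) as root, on a disk that is not mounted.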

Note that the disk being checked must not be mounted while badblocks runs. If you see:

/dev/sda is mounted; it's not safe to run badblocks!

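unmount the disk with umount first, then run the badblocks scan again. A convenient approach is to boot a live environment from Clonezilla or an Ubuntu installation disk and scan from there.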
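
Before starting a scan, it can help to check which partitions of the disk are still mounted. The sketch below parses text in the format of /proc/mounts; the function names and the sample mount lines are assumptions made for illustration, and swap areas are not covered.

```python
def belongs_to_disk(source, disk):
    """True if `source` is the disk itself or one of its partitions."""
    if source == disk:
        return True
    if not source.startswith(disk):
        return False
    suffix = source[len(disk):]
    # Partition names look like sda1 or nvme0n1p1: an optional "p", then digits.
    if suffix.startswith("p"):
        suffix = suffix[1:]
    return suffix.isdigit()


def mounted_partitions(mounts_text, disk):
    """Return (device, mount point) pairs of `disk` found in /proc/mounts-style text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and belongs_to_disk(fields[0], disk):
            found.append((fields[0], fields[1]))
    return found


# Illustrative sample; on a real system read the text from /proc/mounts.
SAMPLE_MOUNTS = """\
/dev/sda2 / ext4 rw,relatime 0 0
/dev/sda1 /boot/efi vfat rw,relatime 0 0
tmpfs /run tmpfs rw,nosuid 0 0
/dev/sdb1 /mnt/data ext4 rw 0 0
"""

for device, mount_point in mounted_partitions(SAMPLE_MOUNTS, "/dev/sda"):
    print(f"{device} is mounted on {mount_point}; unmount it before scanning")
```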

Sample output:

# badblocks -b 4096 -o badblocks.txt -nsv /dev/sda
Checking for bad blocks in non-destructive read-write mode
From block 0 to 3909653
Checking for bad blocks (non-destructive read-write test)
Testing with random pattern:  11.45% done, 1:41 elapsed. (1/2/3 errors)

The trailing (1/2/3 errors) means 1 read error, 2 write errors, and 3 corruption errors (the data read back did not match what was written).
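
The sample output can also be read programmatically. The sketch below, whose function names and regular expressions are assumptions based only on the output format shown above, converts the block range into a byte count and splits the progress line into its three error counters. With -b 4096, blocks 0 to 3909653 cover 3909654 × 4096 = 16,013,942,784 bytes.

```python
import re

RANGE_RE = re.compile(r"From block (\d+) to (\d+)")
PROGRESS_RE = re.compile(
    r"(?P<percent>\d+(?:\.\d+)?)% done, (?P<elapsed>[\d:]+) elapsed\. "
    r"\((?P<read>\d+)/(?P<write>\d+)/(?P<corrupt>\d+) errors\)"
)


def scanned_bytes(range_line, block_size):
    """Number of bytes covered by a 'From block A to B' line (range is inclusive)."""
    match = RANGE_RE.search(range_line)
    if match is None:
        raise ValueError("not a block range line")
    first, last = int(match.group(1)), int(match.group(2))
    return (last - first + 1) * block_size


def parse_progress(line):
    """Split a badblocks progress line into percent, elapsed time and error counts."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    return {
        "percent": float(match.group("percent")),
        "elapsed": match.group("elapsed"),
        "read_errors": int(match.group("read")),
        "write_errors": int(match.group("write")),
        "corruption_errors": int(match.group("corrupt")),
    }


print(scanned_bytes("From block 0 to 3909653", 4096))
print(parse_progress(
    "Testing with random pattern:  11.45% done, 1:41 elapsed. (1/2/3 errors)"
))
```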

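Recreate the filesystem using the known bad block list. Note that mke2fs expects the list to have been generated with the same block size the new filesystem will use, and the block numbers must be relative to the device being formatted (here the partition /dev/sda2, not the whole disk /dev/sda):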

# mke2fs -t filesystem-type -l badblocks.txt /dev/sda2
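
Before feeding the list to mke2fs, it can be worth looking at what it contains. The file written by -o holds one block number per line; the sketch below groups those numbers into contiguous ranges. The function name and the sample block numbers are made up for illustration.

```python
def bad_block_ranges(list_text):
    """Group the block numbers of a badblocks output file into inclusive ranges."""
    blocks = sorted({int(token) for token in list_text.split()})
    ranges = []
    for block in blocks:
        if ranges and block == ranges[-1][1] + 1:
            ranges[-1][1] = block  # extend the current run
        else:
            ranges.append([block, block])  # start a new run
    return [tuple(r) for r in ranges]


# Illustrative content; on a real system read it from badblocks.txt.
sample_list = "1024\n1025\n1026\n20480\n"
for first, last in bad_block_ranges(sample_list):
    print(f"blocks {first}-{last} ({last - first + 1} bad)")
```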

Viewing SMART Information

List the block devices and their mount points:

# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda      8:0    0 14.9G  0 disk
├─sda1   8:1    0  711M  0 part /boot/efi
└─sda2   8:2    0 14.2G  0 part /

View the disk's SMART information:

# smartctl -a /dev/sda

If this step reports that the command cannot be found, install the smartmontools package:

# apt install smartmontools

Sample SMART information:

# smartctl -a /dev/sda
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.8.0-39-generic] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     FORESEE 16GB SSD
Serial Number:    I48883J003721
LU WWN Device Id: 5 02b2a2 01d1c1b1a
Add. Product Id:  mavlsata
Firmware Version: V3.24
User Capacity:    16,013,942,784 bytes [16.0 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
TRIM Command:     Available
Device is:        Not in smartctl database 7.3/5528
ATA Version is:   ACS-3 T13/2161-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Jul 25 07:01:14 2024 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x35) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Abort Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x00) Error logging NOT supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (   1) minutes.
Conveyance self-test routine
recommended polling time:        (   1) minutes.

SMART Attributes Data Structure revision number: 5
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0033   100   100   010    Pre-fail  Always       -       4559
 12 Power_Cycle_Count       0x0033   100   100   010    Pre-fail  Always       -       1107
161 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       0
164 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       42474
165 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       51
166 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       17
167 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       40
169 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       0
176 Erase_Fail_Count_Chip   0x0033   100   100   010    Pre-fail  Always       -       0
177 Wear_Leveling_Count     0x0033   100   100   010    Pre-fail  Always       -       0
178 Used_Rsvd_Blk_Cnt_Chip  0x0033   100   100   010    Pre-fail  Always       -       0
192 Power-Off_Retract_Count 0x0033   100   100   010    Pre-fail  Always       -       14
194 Temperature_Celsius     0x0033   100   100   010    Pre-fail  Always       -       48
195 Hardware_ECC_Recovered  0x0033   100   100   010    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x0033   100   100   010    Pre-fail  Always       -       0
241 Total_LBAs_Written      0x0033   100   100   010    Pre-fail  Always       -       237
242 Total_LBAs_Read         0x0033   100   100   010    Pre-fail  Always       -       659
243 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       0
244 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       0
245 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       0
246 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       0
250 Read_Error_Retry_Rate   0x0033   100   100   010    Pre-fail  Always       -       0
251 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       0
252 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       0
253 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       0
254 Unknown_SSD_Attribute   0x0033   100   100   010    Pre-fail  Always       -       0

SMART Error Log not supported

SMART Self-test Log not supported

Selective Self-tests/Logging not supported

The above only provides legacy SMART information - try 'smartctl -x' for more

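If the output looks like the following, the disk does not support SMART. This is the case, for example, for a virtual disk inside a VMware virtual machine: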

# smartctl -a /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-113-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               VMware,
Product:              VMware Virtual S
Revision:             1.0
User Capacity:        107,374,182,400 bytes [107 GB]
Logical block size:   512 bytes
Rotation Rate:        Solid State Device
Device type:          disk
Local Time is:        Thu Jul 25 15:04:32 2024 CST
SMART support is:     Unavailable - device lacks SMART capability.

=== START OF READ SMART DATA SECTION ===
Current Drive Temperature:     0 C
Drive Trip Temperature:        0 C

Error Counter logging not supported

Device does not support Self Test logging
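
The two sample outputs differ in their "SMART support is:" lines, which a script can use to tell the cases apart. This is a rough sketch that relies only on the wording shown in the samples; the function name and return values are made up for illustration.

```python
def smart_support(smartctl_text):
    """Classify SMART support from the 'SMART support is:' lines of smartctl -a."""
    statuses = [
        line.split(":", 1)[1].strip()
        for line in smartctl_text.splitlines()
        if line.startswith("SMART support is:")
    ]
    # Check the most decisive wording first.
    for keyword in ("Unavailable", "Enabled", "Disabled", "Available"):
        if any(status.startswith(keyword) for status in statuses):
            return keyword.lower()
    return "unknown"


physical = (
    "SMART support is: Available - device has SMART capability.\n"
    "SMART support is: Enabled\n"
)
virtual = "SMART support is:     Unavailable - device lacks SMART capability.\n"

print(smart_support(physical))
print(smart_support(virtual))
```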

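Pay attention to three SMART attributes: Reallocated_Sector_Ct, Reallocated_Event_Count, and Current_Pending_Sector. If their values start to rise, the disk has probably developed bad blocks and their number is growing.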
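
To track these attributes over time, their raw values can be pulled out of the attribute table and compared between two runs. The sketch below assumes the ten-column table layout of the sample output above and plain integer raw values; the function names and the second snapshot are made up for illustration.

```python
WATCHED = (
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
)


def watched_raw_values(smartctl_text, names=WATCHED):
    """Return {attribute name: raw value} for the watched rows of smartctl -a output."""
    values = {}
    for line in smartctl_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in names:
            if fields[9].isdigit():
                values[fields[1]] = int(fields[9])
    return values


def increased(previous, current):
    """Return {name: (old, new)} for attributes whose raw value went up."""
    return {
        name: (previous[name], value)
        for name, value in current.items()
        if name in previous and value > previous[name]
    }


sample_rows = (
    "  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0\n"
    "  9 Power_On_Hours          0x0033   100   100   010    Pre-fail  Always       -       4559\n"
)
earlier = watched_raw_values(sample_rows)
later = {"Reallocated_Sector_Ct": 8}  # hypothetical later reading
print(earlier)
print(increased(earlier, later))
```

Not every drive reports all three attributes; the sample SSD above, for instance, lists only Reallocated_Sector_Ct, so missing names are simply absent from the result.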

References:

  1. https://wiki.archlinux.org/title/Badblocks
  2. https://www.baeldung.com/linux/disk-check-repair-bad-sectors
  3. https://forum.openmediavault.org/index.php?thread/21047-clip-out-bad-sectors/

Original article; please credit the source when reposting. Permalink: https://www.rootop.org/pages/5480.html

Author: Venus
