Last active
January 2, 2016 07:18
This SystemTap script works around a slog device removal that fails because vs_alloc for the device is non-zero; as a safety precaution, ZFS refuses to remove such a device. It implements a Linux version of the workaround described here: http://zpool.org/2012/01/10/how-to-fix-a-stuck-zfs-log-device.
diff --git a/module/zfs/spa.c b/module/zfs/spa.c
index 4c8f8bf..6b4ca40 100644
--- a/module/zfs/spa.c
+++ b/module/zfs/spa.c
@@ -5250,6 +5250,13 @@ spa_vdev_remove_from_namespace(spa_t *spa, vdev_t *vd)
     ASSERT(spa_config_held(spa, SCL_ALL, RW_WRITER) == SCL_ALL);
     ASSERT(vd == vd->vdev_top);
+    // slog stuck hack - barnes333@gmail.com - https://github.com/zfsonlinux/zfs/issues/1422
+    if (vd->vdev_islog == 1 && vd->vdev_removing == 1
+        && vd->vdev_state == VDEV_STATE_OFFLINE && vd->vdev_stat.vs_alloc > 0) {
+        vd->vdev_stat.vs_alloc = 0;
+    }
+
     /*
      * Only remove any devices which are empty.
      */
/* Script: zfs_slog_zero_vsalloc_inline.stp
 *
 * DISCLAIMER: THIS IS DANGEROUS. ENSURE YOU HAVE BACKUPS OF YOUR POOL BEFORE ATTEMPTING!
 *
 * Author: Andrew Barnes <barnes333@gmail.com>
 * Date: 2014/01/06
 *
 * Description:
 *   This SystemTap script works around a slog device removal that fails because
 *   vs_alloc for the device is non-zero. As a safety precaution, ZFS refuses to
 *   remove such a device. The script implements a Linux version of the workaround
 *   described here: http://zpool.org/2012/01/10/how-to-fix-a-stuck-zfs-log-device
 *
 * Symptoms:
 *   - The log device removal command: zpool remove <pool> <slog_device>
 *     ran with no errors, but the device still shows in zpool status -v and
 *     zpool iostat -v with status: ONLINE
 *   - Examining the output of: zdb -C <poolname>
 *     shows the slog device with the property: removing: 1
 *   - Although the slog device is ONLINE, no writes are sent to it, so all sync I/O
 *     goes to other log devices, if present, or to the main pool vdevs.
 *
 * Cause:
 *   A log device can apparently reach a state where the removal process cannot
 *   completely empty it. In my case the pool is over 3 years old and vs_alloc of
 *   the slog was at 640K. This pool has gone through just about all zfsonlinux
 *   versions, so I am unsure when the data was leaked or whether this condition
 *   can still be triggered with the latest code.
 *
 * Workaround:
 *   This SystemTap script sets the slog's vdev_stat->vs_alloc field to 0 (this is
 *   the field shown in the USED column of zpool iostat -v). This effectively
 *   bypasses the safety check that the removal gets stuck on because of the leak.
 *
 *   If all of your main pool vdevs are mirrors, another solution is to use
 *   zpool split. If you want more details on this, feel free to email me your
 *   pool's config and I can provide them.
 *
 * References:
 *   - http://zpool.org/2012/01/10/how-to-fix-a-stuck-zfs-log-device
 *   - https://github.com/zfsonlinux/zfs/issues/1422
 *
 * Requirements:
 *   - SystemTap installed (tested with version 2.4)
 *   - Debug symbols installed for the zfs module (distro specific)
 *
 * Usage:
 *   - If the log device is mirrored, first detach one of the sides with zpool detach
 *     so that it becomes a non-mirrored slog.
 *   - Offline the slog (the script checks this for added safety), e.g.: zpool offline <pool> <slog_device>
 *   - Try to remove the device by normal means, then confirm you see the symptoms above before proceeding.
 *   - Replace <pool> and <slog_device> in the command below before running it:
 *
 *   sudo stap -g zfs_slog_zero_vsalloc_inline.stp -c "zpool remove <pool> <slog_device>"
 */
// It would be better to use a function probe, but the function gets inlined,
// so we use a statement probe on the caller instead.
// probe module("zfs").function("spa_vdev_remove_from_namespace")
probe module("zfs").statement("spa_vdev_remove@*:*")
{
    printf("%s\n", pp());

    // Ensure the vdev we are about to hack:
    //   a) Is a log device
    //   b) Has been marked for removal
    //   c) Is offline ($vd->vdev_state == 2)
    //   d) Has vs_alloc > 0
    if (@defined($vd)
        && $vd->vdev_islog == 1
        && $vd->vdev_removing == 1
        && $vd->vdev_state == 2
        && $vd->vdev_stat->vs_alloc > 0)
    {
        printf("Current value of $vd->vdev_stat->vs_alloc: %d\n", $vd->vdev_stat->vs_alloc);
        printf("Setting $vd->vdev_stat->vs_alloc to 0;\n");
        $vd->vdev_stat->vs_alloc = 0;
        printf("New value of $vd->vdev_stat->vs_alloc: %d\n", $vd->vdev_stat->vs_alloc);
    }
}
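The "Symptoms" check in the header comment (a log vdev that still shows removing: 1 in zdb -C output) can be scripted before reaching for the SystemTap hack. The following is only a rough sketch, not part of the original script: the function name slog_stuck is hypothetical, and it assumes zdb -C prints each vdev as a "children[N]:" header followed by "key: value" lines that include "is_log: 1" and "removing: 1". Check this against the output of your own zdb version before relying on it.

```shell
#!/bin/sh
# Hypothetical helper (not from the original gist).
# Reads `zdb -C <pool>` output on stdin and exits 0 if some vdev block
# contains both "is_log: 1" and "removing: 1", otherwise exits 1.
# Assumption: vdev properties are printed as "key: value" lines under a
# "children[N]:" header; verify this against your zdb version.
slog_stuck() {
    awk '
        # A new vdev block starts: record the result for the previous
        # block, then reset the per-block flags.
        /children\[[0-9]+\]:/ {
            if (is_log && removing) found = 1
            is_log = 0
            removing = 0
        }
        /^[[:space:]]*is_log:[[:space:]]*1[[:space:]]*$/   { is_log = 1 }
        /^[[:space:]]*removing:[[:space:]]*1[[:space:]]*$/ { removing = 1 }
        END {
            # Account for the last block in the input.
            if (is_log && removing) found = 1
            exit(found ? 0 : 1)
        }
    '
}
```

One possible use is to pipe the configuration into the function, for example: zdb -C <pool> | slog_stuck && echo "log vdev is stuck in removal". The helper only reads text, so it is safe to run; it does not replace the manual checks listed under Usage.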