ONTAP 9 system node migrate-root fails

September 22, 2018 in Storage, Technology

The system node migrate-root command, available only in ONTAP 9 or later, migrates the root aggregate of a node to a different set of disks non-disruptively. The command starts a job that backs up the node configuration, creates a new aggregate, sets it as the new root aggregate, restores the node configuration, and restores the names of the original aggregate and volume. The job can take as long as a few hours, depending on how long it takes to zero the disks, reboot the node, and restore the node configuration.
Reference the following NetApp KB article on how to use the command: NetApp KB article on system node migrate-root command
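
For reference, a typical invocation looks something like the following, run from advanced privilege. The disk list below is only a placeholder; substitute your own spare disks.

::> set advanced
::*> system node migrate-root -node nac01-04-lax -disklist 1.0.22,1.0.23,1.0.24 -raid-type raid_dp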

Occasionally the system node migrate-root command fails with the following error. When you get this error, you must manually perform steps 1, 2, 3, 5, 6 (if you want to rename), and 7 from the KB article I referenced earlier.

nac01-lax::*> job show -id 26515 -instance

Job ID: 26515
Owning Vserver: nac01-lax
Name: Migrate root aggregate
Description: Root aggregate migration job for node nac01-04-lax
Priority: High
Node: nac01-03-lax
Affinity: Cluster
Schedule: @now
Queue Time: 09/21 06:01:59
Start Time: 09/21 06:01:59
End Time: 09/21 06:16:16
Drop-dead Time: -
Restarted?: false
State: Failure
Status Code: 1
Completion String: Internal error. Failed to destroy the volume "vol0". Reason: .
Job Type: NDO Migrate Root
Job Category: Non Disruptive Operation
UUID: 869003f4-bd9e-11e8-907b-00a098a0f358
Execution Progress: Complete: Internal error. Failed to destroy the volume "vol0". Reason: . [1]
User Name: admin
Restart Is Delayed by Module: -

This error indicates that the system node migrate-root command failed to perform the cleanup steps to remove the old root aggregate. You can try restarting the job to see if it completes successfully on a second attempt by running the following commands:

::> set advanced
::*> system node migrate-root -node nac01-04-lax -resume true
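
You can keep an eye on the resumed job with job show, for example by filtering on the job name from the output above (the resumed job will get a new job ID):

nac01-lax::*> job show -name "Migrate root aggregate"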

If it aborts again, we will have to perform the following cleanup steps manually.
Verify that the old root volume is offline before you delete it.

nac01-lax::*> set -privilege diag
nac01-lax::*> run -node nac01-04-lax
Type 'exit' or 'Ctrl-D' to return to the CLI
nac01-04-lax> vol status
Volume        State      Status             Options
new_root      online     raid_dp, flex      root, space_slo=none
                         64-bit
vol0          offline    raid_dp, flex      space_slo=none(disabled)
                         64-bit
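
If the old root volume were still showing online here, it would need to be taken offline before it can be destroyed; from the node shell that would look something like this:

nac01-04-lax> vol offline vol0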


Once you have verified the old root volume is offline, go ahead and delete it.

nac01-04-lax> vol destroy vol0
Are you sure you want to destroy volume 'vol0'? y
Volume 'vol0' destroyed.

You should now be left with the new root volume.

nac01-04-lax> vol status
Volume        State      Status             Options
new_root      online     raid_dp, flex      root, space_slo=none
                         64-bit

Now we need to delete the old root aggregate, which is the one that does not show root under the options.

nac01-04-lax> aggr status
Aggr          State      Status             Options
aggr0_n4      online     raid_dp, aggr      nosnap=on, raidsize=3
                         64-bit
new_root      online     raid_dp, aggr      root, nosnap=on, raidsize=3
                         64-bit

Delete the old root aggregate. Remember to type exit first to get out of the node-level prompt.

nac01-lax::*> aggr delete -aggregate aggr0_n4

Warning: Are you sure you want to destroy aggregate "aggr0_n4"? {y|n}: y
[Job 26519] Job succeeded: DONE

When moving the root volume, some of the volume and aggregate changes are made outside the knowledge of the Volume Location Database (VLDB). As a result, it is important to make sure that the VLDB is updated to reflect the aggregate and volume changes made during this maintenance. To make the VLDB aware of the changes, run the following diag-level commands:

nac01-lax::*> volume remove-other-volume -volume vol0 -vserver nac01-04-lax

nac01-lax::*> volume add-other-volumes -node nac01-04-lax

Verify the correctness of the VLDB with the following diag level command:

nac01-lax::*> debug vreport show
aggregate Differences:

Name Reason Attributes
-------- ------- ---------------------------------------------------
new_root(a70fbb44-03b1-4414-af84-80587c317376)
Present in WAFL Only
Node Name: nac01-04-lax
Aggregate UUID: a70fbb44-03b1-4414-af84-80587c317376
Aggregate State: online
Aggregate Raid Status: raid_dp
Aggregate HA Policy: cfo
Is Aggregate Root: true
Is Composite Aggregate: false

Run the following command to fix the differences, using the object name exactly as it appears in the vreport output:

nac01-lax::*> debug vreport fix -node nac01-04-lax -type aggregate -object new_root(a70fbb44-03b1-4414-af84-80587c317376)

Run the vreport show command again.

nac01-lax::*> debug vreport show
aggregate Differences:

Name Reason Attributes
-------- ------- ---------------------------------------------------
new_root Present both in VLDB and WAFL with differences
Node Name: nac01-04-lax
Aggregate UUID: a70fbb44-03b1-4414-af84-80587c317376
Aggregate State: online
Aggregate Raid Status: raid_dp
Aggregate HA Policy: cfo
Is Aggregate Root: true
Is Composite Aggregate: false
Differing Attribute: Volume Count (Use commands 'volume add-other-volume' and 'volume remove-other-volume' to fix 7-Mode volumes on this aggregate)
WAFL Value: 1
VLDB Value: 0

Run the volume add-other-volumes command:

nac01-lax::*> volume add-other-volumes -node nac01-04-lax

Now run the vreport one more time and it should run clean.

nac01-lax::*> debug vreport show
This table is currently empty.

Info: WAFL and VLDB volume/aggregate records are consistent.

Now it’s time to rename the new root volume to whatever naming convention you use. Return to the node shell prompt to verify the current name.

nac01-lax::*> run -node nac01-04-lax
Type 'exit' or 'Ctrl-D' to return to the CLI
nac01-04-lax> vol status
Volume        State      Status             Options
new_root      online     raid_dp, flex      root, space_slo=none
                         64-bit

Run the following command from the cluster shell prompt to rename the new root volume.

nac01-lax::*> vol rename -vserver nac01-04-lax -volume new_root -newname vol0
[Job 26521] Job succeeded: Successful

Rename the new root aggregate to whatever naming convention you use.

nac01-lax::> aggr rename -aggregate new_root -newname aggr0_n4
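
You can confirm the rename from the cluster shell, for example:

nac01-lax::> storage aggregate show -aggregate aggr0_n4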

Now add the NVFAIL option to the new root volume. As you can see in the following output, it is currently set to off for nac01-04-lax.

nac01-lax::*> vol show -volume vol0 -fields nvfail
vserver volume nvfail
------------ ------ ------
nac01-01-lax vol0 on
nac01-02-lax vol0 on
nac01-03-lax vol0 on
nac01-04-lax vol0 off
nac01-05-lax vol0 on
nac01-06-lax vol0 on
6 entries were displayed.

To turn it on, run the following command:

nac01-lax::*> node run -node nac01-04-lax vol options vol0 nvfail on
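
Re-run the earlier check to confirm that nvfail now shows on for nac01-04-lax:

nac01-lax::*> vol show -volume vol0 -fields nvfail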

Finally, revert all LIFs back to their home ports:

nac01-lax::> network interface revert -vserver * -lif *
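
To confirm everything is back where it belongs, you can list any LIFs that are still off their home ports; the output should come back empty:

nac01-lax::> network interface show -is-home false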
