Repairing missing DAOS objects

With the tell DAOSmgr Repair command, Domino administrators can scan the DAOS catalog for objects that were marked as missing during DAOS resynchronization and attempt to repair them from the server's cluster mates.

DAOS maintains an index of all DAOS objects that are referenced by the DAOS-enabled databases on a server. On a healthy server, every DAOS .nlo (Notes large object) file has a corresponding entry in the daoscat.nsf catalog, and every entry in the catalog has exactly one corresponding .nlo file. While unusual, there are scenarios where daoscat.nsf and the .nlo repository get out of sync, for example when older copies of databases are restored without the .nlo files they require, or when disk issues cause data loss. DAOS recognizes these scenarios and requires a catalog resync. When you run tell DAOSmgr Resync and a missing object is detected, a message such as the following is printed to the Domino console:
The database C:\domino\Data\test\example.nsf has a reference to C:\domino\Data\DAOS\0001\960D465FF5C94E9BE1607C800FEFC5C8E64C549C0001B842.nlo, which does not exist. An entry has been added to the DAOS catalog indicating the NLO is missing:

This message indicates that DAOS is aware of an object that is referenced by a database on the server but is not present in the DAOS repository. To see which objects DAOS knows are missing, use the tell DAOSmgr Repair List Missing command:

> tell daosmgr repair list missing

DAOS Repair: Missing Object C:\domino\Fender\Data\DAOS\0001\960D465FF5C94E9BE1607C800FEFC5C8E64C549C0001B842.nlo
DAOS Repair: Missing Object C:\domino\Fender\Data\DAOS\0001\3CAEC046CAD7A07BA132C70D7DC9ACEAB127581E00018F2D.nlo
DAOS Repair: Missing Object C:\domino\Fender\Data\DAOS\0001\4F6ECF991DD6387442A5ED246FE31560A36FCF9D00006943.nlo
DAOS Repair: Missing Object C:\domino\Fender\Data\DAOS\0001\AA5FD47BFE81C8119F78CD33A8FC608D1212AF7A00016ED0.nlo
DAOS Repair: Missing Object C:\domino\Fender\Data\DAOS\0001\0E93F7FABCBDD7BBA1EBF421AB1039C078A478060000671B.nlo
DAOS Repair: Missing Object C:\domino\Fender\Data\DAOS\0001\B02F6276EF3F30CCD1BE58538BCED5FCB4DC09E300019481.nlo
DAOS Repair: Missing Object C:\domino\Fender\Data\DAOS\0001\A6C7F8C49BE39557851EE8505965BDC9147F8E4C00018EB8.nlo
DAOS Repair: Missing Object C:\domino\Fender\Data\DAOS\0001\2FB8545DDAF0D3EA591C12882727C665114F7A32000186F0.nlo
DAOS Repair: Missing Object C:\domino\Fender\Data\DAOS\0001\952E816259300ABD6DE972FA81BFF3B41C2309BB000186E9.nlo
DAOS Repair: Missing Object C:\domino\Fender\Data\DAOS\0001\490F1230DFF5DB385515B5FA1706BD651D493EB90001AA7C.nlo
DAOS Repair: Missing Object C:\domino\Fender\Data\DAOS\0001\D9C8F2194FC02B0725C09DF8498EF227C25610A700006545.nlo
DAOS Repair: Missing Object C:\domino\Fender\Data\DAOS\0001\D34A8CCFE589D2901031F8627993FF7C5CD88F1100006151.nlo
DAOS Repair: 12 Missing Objects Found

Note that Repair List Missing lists only the .nlo files that were found to be missing at the time of resync.
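If you want to work with the list of missing objects outside the console, for example to compare it against a backup inventory, the output can be parsed with a few lines of script. The following is a minimal sketch, assuming the console output has been saved as text and that each line follows the "DAOS Repair: Missing Object <path>" pattern shown above. The function name parse_missing_objects is hypothetical and is not part of Domino.

```python
import re
from pathlib import PureWindowsPath

# Pattern taken from the sample console output above; other Domino
# versions or locales may format this line differently (assumption).
MISSING_RE = re.compile(r"^DAOS Repair: Missing Object (?P<path>.+\.nlo)$", re.IGNORECASE)


def parse_missing_objects(console_text):
    """Return the .nlo file names found on 'Missing Object' lines."""
    names = []
    for line in console_text.splitlines():
        match = MISSING_RE.match(line.strip())
        if match:
            # PureWindowsPath splits on both backslashes and forward
            # slashes, so Windows and Linux style paths are both handled.
            names.append(PureWindowsPath(match.group("path")).name)
    return names
```

Lines that do not match the pattern, such as the final "12 Missing Objects Found" count, are ignored.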

In older versions of Domino, the Domino administrator had to restore missing .nlo files from backup. Beginning in Domino 14, the DAOSmgr program can find and copy (repair) known missing objects from a server’s cluster mates. When the catalog is in sync, you can run DAOS object repair using the tell DAOSmgr Repair Objects command:
> tell daosmgr repair objects

DAOSMGR: REPAIR started
DAOS REPAIR: 12 objects examined.
DAOS REPAIR: 12 objects repaired.
DAOS REPAIR: 0 objects encountered an error while attempting a repair.
Summary of Repair Objects command in C:\Domino\Fender\Data\daosrepair.txt.
DAOSMGR: REPAIR complete: No error
In this example, all 12 missing .nlo files were repaired from this server’s available cluster mates. Note that a summary of the Repair Objects command was written to a file. If you open that file, you see the following:
DAOS REPAIR: Repaired Objects 10/20/2023 08:19:11 AM.
DAOS REPAIR: Repair Limit:    100.
DAOS REPAIR: Repair Logging:  Failed only.
DAOS REPAIR: 12 objects examined.
DAOS REPAIR: 12 objects repaired.
DAOS REPAIR: 0 objects encountered an error while attempting a repair.
In this case, the log does not provide much more information than the console output for the command, but it does contain two useful pieces of information:
  • DAOS REPAIR: Repair Limit: 100. indicates that there is a limit to the number of objects that are repaired when the command is run. You can change this limit. For details, refer to the later section in this article, Choosing the repair limit.
  • DAOS REPAIR: Repair Logging: Failed only. indicates that the DAOS repair command logs information only about objects that could not be repaired, along with the reason they could not be repaired. A logged failure might indicate that a possible donor (cluster mate) was not available, in which case the repair can be tried again later. It might also indicate that all donors were tried and none of them has the object. In that case, the object should be restored from backup.
    That said, if you are curious about where the server got a specific .nlo file, you can enable more detailed logging using the DAOS_REPAIR_LOG_ALL=1 notes.ini setting. In this case, the summary file would contain the following:
    DAOS REPAIR: Repaired Objects 10/20/2023 08:16:08 AM.
    DAOS REPAIR: Repair Limit:    100.
    DAOS REPAIR: Repair Logging:  All.
    DAOS REPAIR: 12 objects examined.
    DAOS REPAIR: 12 objects repaired.
    DAOS REPAIR: 0 objects encountered an error while attempting a repair.
    DAOS REPAIR: Repaired 960D465FF5C94E9BE1607C800FEFC5C8E64C549C0001B842 from donor CN=MusicMan/O=Guitars.
    DAOS REPAIR: Repaired 3CAEC046CAD7A07BA132C70D7DC9ACEAB127581E00018F2D from donor CN=MusicMan/O=Guitars.
    DAOS REPAIR: Repaired 4F6ECF991DD6387442A5ED246FE31560A36FCF9D00006943 from donor CN=Gibson/O=Guitars.
    DAOS REPAIR: Repaired AA5FD47BFE81C8119F78CD33A8FC608D1212AF7A00016ED0 from donor CN=Gibson/O=Guitars.
    DAOS REPAIR: Repaired 0E93F7FABCBDD7BBA1EBF421AB1039C078A478060000671B from donor CN=MusicMan/O=Guitars.
    DAOS REPAIR: Repaired B02F6276EF3F30CCD1BE58538BCED5FCB4DC09E300019481 from donor CN=Gibson/O=Guitars.
    DAOS REPAIR: Repaired A6C7F8C49BE39557851EE8505965BDC9147F8E4C00018EB8 from donor CN=Gibson/O=Guitars.
    DAOS REPAIR: Repaired 2FB8545DDAF0D3EA591C12882727C665114F7A32000186F0 from donor CN=Gibson/O=Guitars.
    DAOS REPAIR: Repaired 952E816259300ABD6DE972FA81BFF3B41C2309BB000186E9 from donor CN=Gibson/O=Guitars.
    DAOS REPAIR: Repaired 490F1230DFF5DB385515B5FA1706BD651D493EB90001AA7C from donor CN=Gibson/O=Guitars.
    DAOS REPAIR: Repaired D9C8F2194FC02B0725C09DF8498EF227C25610A700006545 from donor CN=MusicMan/O=Guitars.
    DAOS REPAIR: Repaired D34A8CCFE589D2901031F8627993FF7C5CD88F1100006151 from donor CN=MusicMan/O=Guitars.
    This detailed output shows exactly which cluster mate provided each specific .nlo file. This is interesting information, but the logging is most useful when objects cannot be repaired, as in the following summary:
    DAOS REPAIR: Repaired Objects 10/20/2023 09:23:34 AM.
    DAOS REPAIR: Repair Limit:    100.
    DAOS REPAIR: Repair Logging:  Failed only.
    DAOS REPAIR: 12 objects examined.
    DAOS REPAIR: 8 objects repaired.
    DAOS REPAIR: 4 objects encountered an error while attempting a repair.
    DAOS REPAIR: Attempt to repair 5AC70BE47DB282929653ADBBC433C1AAFCEDDBD5000188E5 failed : File does not exist.
    			Donor CN=MusicMan/O=Guitars returned 'File does not exist'
    			Donor CN=Gibson/O=Guitars returned 'File does not exist'
    DAOS REPAIR: Attempt to repair AA5FD47BFE81C8119F78CD33A8FC608D1212AF7A00016ED0 failed : File does not exist.
    			Donor CN=MusicMan/O=Guitars returned 'File does not exist'
    			Donor CN=Gibson/O=Guitars returned 'File does not exist'
    DAOS REPAIR: Attempt to repair A6C7F8C49BE39557851EE8505965BDC9147F8E4C00018EB8 failed : File does not exist.
    			Donor CN=MusicMan/O=Guitars returned 'File does not exist'
    			Donor CN=Gibson/O=Guitars returned 'File does not exist'
    DAOS REPAIR: Attempt to repair 952E816259300ABD6DE972FA81BFF3B41C2309BB000186E9 failed : File does not exist.
    			Donor CN=MusicMan/O=Guitars returned 'File does not exist'
    			Donor CN=Gibson/O=Guitars returned 'File does not exist'

    In this case, both donors were tried and neither has these objects, so the files must be restored from backup. If instead an individual donor server was unreachable or not running, run tell DAOSmgr Repair Objects again at a time when all of the donors are available.
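When many objects are involved, it can help to sort the summary file into objects that were repaired, objects worth retrying, and objects that need a restore from backup. The following is a minimal sketch, assuming the summary file uses exactly the line formats shown in the samples above. The function names are hypothetical, and the only donor response text known from this article is 'File does not exist'. Any other response is treated here as "retry later", which is an assumption rather than documented behavior.

```python
import re

# Line formats copied from the sample summary files above (assumption:
# the real file always follows these formats).
REPAIRED_RE = re.compile(r"DAOS REPAIR: Repaired (\S+) from donor (.+?)\.?$")
FAILED_RE = re.compile(r"DAOS REPAIR: Attempt to repair (\S+) failed")
DONOR_RE = re.compile(r"Donor (.+?) returned '(.*)'")


def parse_repair_summary(text):
    """Return (repaired, failed) parsed from a repair summary.

    repaired maps object ID -> donor name.
    failed maps object ID -> list of (donor name, donor response).
    """
    repaired, failed = {}, {}
    current = None  # object ID of the failure block being read
    for raw in text.splitlines():
        line = raw.strip()
        match = REPAIRED_RE.match(line)
        if match:
            repaired[match.group(1)] = match.group(2)
            current = None
            continue
        match = FAILED_RE.match(line)
        if match:
            current = match.group(1)
            failed[current] = []
            continue
        match = DONOR_RE.match(line)
        if match and current is not None:
            failed[current].append((match.group(1), match.group(2)))
    return repaired, failed


def needs_backup_restore(donor_responses):
    """Heuristic: True when every listed donor reported the file as absent."""
    return bool(donor_responses) and all(
        response == "File does not exist" for _, response in donor_responses
    )
```

Applied to the failed summary above, each of the four object IDs would map to two donor responses, and needs_backup_restore would flag all four for restore from backup.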

How do I know who my donors are and if they are available?

In Domino 14.0, donor servers are our cluster mates. A donor server is another Domino server that has an object our server needs and that trusts our server to make a copy of the object.

You can use the show cluster console command to get this information:
> sh cl
Cluster information:
Cluster name: ElectricGuitar, Server name: Fender/Guitars
Server cluster probe timeout: 1 minute(s)
Server cluster probe count: 2556
Server cluster default port: *
Server cluster auxiliary ports:
Server availability threshold: 0
Server availability index: 100 (state: AVAILABLE)
Server availability default minimum transaction time: 3000
Cluster members (3):
      Server: Gibson/Guitars, availability index: 100
      Server: MusicMan/Guitars, availability index: 100
      Server: Fender/Guitars, availability: UNREACHABLE
In this output, Fender/Guitars is the server doing the repair, so its UNREACHABLE entry can be ignored.
Note that while tell DAOSmgr Repair is new in Domino 14, any Domino server at version 10.0.1 or higher can act as a donor if the cluster symmetry feature is enabled. In older releases, the feature is enabled with the notes.ini setting D10_ENABLE_REPAIR=1; in Domino 14, this setting is no longer required.
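To check donor availability from a script, the cluster member lines of the show cluster output can be extracted. The following is a minimal sketch, assuming the output has been captured as text and that member lines look like the "Server: name, availability index: n" and "Server: name, availability: STATE" lines in the sample above. The function name list_donor_candidates is hypothetical.

```python
import re

# Matches the cluster member lines from the sample output, for example
# "Server: Gibson/Guitars, availability index: 100" (format assumed
# from this article only).
MEMBER_RE = re.compile(
    r"^\s*Server:\s*(?P<name>[^,]+),\s*availability(?: index)?:\s*(?P<value>\S+)"
)


def list_donor_candidates(show_cluster_text, local_server):
    """Return {server name: availability} for every cluster mate.

    The local server is skipped because it is the one being repaired.
    Availability is kept as text: an index such as "100" or a state
    such as "UNREACHABLE".
    """
    mates = {}
    for line in show_cluster_text.splitlines():
        match = MEMBER_RE.match(line)
        if not match:
            continue
        name = match.group("name").strip()
        if name.lower() != local_server.lower():
            mates[name] = match.group("value")
    return mates
```

For the sample output, calling this with "Fender/Guitars" as the local server would leave Gibson/Guitars and MusicMan/Guitars as the donor candidates.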

Repairing a single .nlo file

The DAOSmgr Repair Objects command repairs only the DAOS objects that the catalog already knows are missing (through resync). However, another variation of the command checks whether a specific .nlo file is missing and repairs it if necessary:
> tell daosmgr repair 2FB8545DDAF0D3EA591C12882727C665114F7A32000186F0.nlo

This command can be useful for scenarios like restoring a database from a backup. For example, say you want to restore an older copy of a database such as example.nsf. You would check whether any .nlo files are unaccounted for, and repair them, by doing the following:
    1. On the Domino console, run tell daosmgr listnlo missing example.nsf to see if there are any .nlo files that are unaccounted for.
    2. In a command or terminal window, get the names of the missing .nlo files, for example by using type or notepad (Windows) or cat or vi (Linux).
    3. Repair the missing .nlo files, one command per file:
      > tell daosmgr repair B02F6276EF3F30CCD1BE58538BCED5FCB4DC09E300019481.nlo
      
      DAOS REPAIR: 1 objects examined.
      DAOS REPAIR: 1 objects repaired.
      DAOS REPAIR: 0 objects encountered an error while attempting a repair.
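When more than a handful of files are missing, typing one repair command per file becomes tedious. The following is a minimal sketch that turns any text containing .nlo file names into a list of console commands. The layout of the listnlo output is not shown in this article, so the sketch assumes only that each missing file appears somewhere in the text as a hexadecimal name ending in .nlo. The function name build_repair_commands is hypothetical.

```python
import re

# An .nlo name is assumed to be a run of hexadecimal digits followed by
# ".nlo", as in the examples in this article.
NLO_NAME_RE = re.compile(r"\b[0-9A-Fa-f]+\.nlo\b", re.IGNORECASE)


def build_repair_commands(text):
    """Return one 'tell daosmgr repair <name>' command per unique .nlo name."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    unique_names = list(dict.fromkeys(NLO_NAME_RE.findall(text)))
    return [f"tell daosmgr repair {name}" for name in unique_names]
```

The resulting commands can then be pasted into the Domino console one at a time.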

When to run DAOS repair

You can run DAOS repair at any time when the DAOS catalog is in sync. To check whether the catalog is in the SYNCHRONIZED state, run tell DAOSmgr Status. A logical time to run the repair is immediately after a DAOS catalog resync finishes. You can configure DAOS to do this automatically using the automatic repair option on the DAOS tab of the server’s server document.

Alternatively, you can schedule a DAOS repair at a time of your choosing using a Program Document or run it manually as needed.

Choosing the repair limit

By default, DAOS repair stops after 100 attempted repairs. Under normal operating conditions, DAOS should never have missing objects, but you might occasionally need to temporarily restore databases or bring back an older copy of a database, and in those cases DAOSmgr Repair Objects is very useful. However, the repair feature is not a substitute for effective backup procedures. Significant storage events, such as a lost DAOS storage volume, should be managed using your restore procedure. DAOSmgr repair is single-threaded and would not perform well in this use case. Additionally, a large-scale repair task could add significant network and disk load to cluster mates.

Another important consideration is that objects are always repaired to Tier 1 storage (local DAOS disk). If your server uses a two-tier storage model with an S3-compatible storage device to expand your storage space, a large-scale repair could cause the server under repair to unexpectedly use a large amount of local disk space.

That said, you can change the limit at your discretion using the DAOS_REPAIR_LIMIT=n notes.ini setting.
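To confirm which repair settings a server will use, you can inspect its notes.ini file. The following is a minimal sketch that reads the two settings mentioned in this article, DAOS_REPAIR_LIMIT and DAOS_REPAIR_LOG_ALL, from notes.ini text and falls back to the documented default limit of 100. The function name read_repair_settings is hypothetical, and treating setting names as case-insensitive is an assumption.

```python
DEFAULT_REPAIR_LIMIT = 100  # default stated in this article


def read_repair_settings(notes_ini_text):
    """Return the effective repair limit and logging mode from notes.ini text."""
    settings = {"limit": DEFAULT_REPAIR_LIMIT, "log_all": False}
    for line in notes_ini_text.splitlines():
        key, separator, value = line.partition("=")
        if not separator:
            continue  # not a key=value line
        key = key.strip().upper()  # assumption: names are case-insensitive
        value = value.strip()
        if key == "DAOS_REPAIR_LIMIT" and value.isdigit():
            settings["limit"] = int(value)
        elif key == "DAOS_REPAIR_LOG_ALL":
            settings["log_all"] = value == "1"
    return settings
```

A notes.ini file containing neither setting yields the defaults: a limit of 100 and logging of failed repairs only.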

Encryption considerations

We highly recommend that all DAOS objects be encrypted on disk. By default, .nlo files are encrypted with the server's .id file and are therefore only usable by the server that created them. If you are using server-specific encryption, you cannot copy .nlo files between servers. Fortunately, DAOS repair will securely copy the object from the donor server and then encrypt it using the local encryption settings, so it is not necessary for the donor server(s) to have the same encryption type or strength as the server requesting the repair.

Although DAOS repair handles differing encryption settings automatically, note that Domino 12 supports the use of shared encryption keys, which allow several servers to use the same .nlo files (this is the backbone of the Tier 2 shared object feature). Also, there is a new DAOS encryption manager (daosencmgr) that can examine the DAOS repository to verify that all of your DAOS objects are encrypted with the same key and can re-encrypt objects to bring all of your DAOS objects to the desired encryption level.

For more information, see Using a shared key to encrypt DAOS objects across servers and Making DAOS object encryption keys consistent.

Is DAOS repair connected to the server’s repair command?

Sort of. The server’s repair command works in conjunction with the cluster symmetry feature to monitor the health and availability of selected databases (.nsf files). If a monitored database is missing or corrupt, then the database is "repaired" from a donor and the repaired database's DAOS objects are scanned. Any missing DAOS objects will then be repaired as part of the database repair process.

The DAOSmgr Repair command looks at the entire DAOS repository for missing objects, regardless of which databases they are referenced by.

Note:
  • Repair will only run if the DAOS catalog is in SYNCHRONIZED state. This is because it relies on the catalog to identify objects that have been marked as missing.
  • .nlo files are repaired from Domino cluster mates. This feature does not apply to non-clustered Domino servers.
  • Repair requests are distributed to donor servers randomly. Every available donor will be tried. A request will only be marked as failed after every available donor has been tried.
  • DAOSmgr Repair Objects will automatically convert repaired objects to the correct encryption type for the server being repaired.
  • This feature is "opportunistic". It attempts to restore any missing objects from the available donors and if the donor is not available (either offline or busy) then the object will be skipped. It has only limited retry capability. If the repair command reports that some objects could not be repaired because of unavailable donors, you can rerun it at a later time when the donors are known to be available.
  • This feature is not a substitute for effective backup procedures. Significant storage events, such as a lost DAOS storage volume, should be managed using your restore procedure. DAOSmgr Repair is single-threaded and would not perform well in this use case. Additionally, a large-scale repair task could add significant network and disk load to cluster mates.