Documentum to Portal Consistency Checker – Proof of Concept

TSG has several clients using Documentum as a repository and a custom front-end application for consumption of records or renditions of records.  In most cases there is a publishing mechanism in place, such as SCS (Site Caching Services) or TSG's OpenMigrate PUMA (see the CIS Case Study for more details).  While a typical Documentum application (e.g., Webtop) provides a "one stop shop" for authors and approvers, the interface can be challenging when "consumers" are just looking for quick search and retrieval.  This cached approach provides improved performance, business continuity, and the ability to add documents from other systems.  One potential risk of using a cache of documents and metadata for search and retrieval is the integrity of the data.  Publishing techniques are designed to accurately cache records; however, there are uncontrollable circumstances that may result in a mismatch.

Some possibilities that could cause data inconsistencies between Documentum and the cache include:

  • Server outages
  • Database corruption
  • File system corruption
  • Poor or inaccurate database updates / administration
  • Documentum Errors
  • Neglected Development Environments

While these circumstances are unlikely, the bottom line is that there is no way to know that every document that should exist in the portal cache is present and accurate, and that every document that does not belong in the cache has been removed.

It was this uncertainty that inspired the development of a proof-of-concept consistency checker program.  We wanted the program to be configurable, run on a schedule or as a Documentum job, evaluate some or all records (depending on cache size), optionally notify administrators of job results, and optionally fix any discrepancies it finds on the spot.  In environments where there are simply too many records to process at once, we devised a scheme in which the job processes only a portion of the documents on each run, while ensuring that no document is re-inspected until all have been evaluated.
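One way to guarantee that no document is re-inspected before all have been evaluated is to persist a checkpoint between runs and walk the id space in order, wrapping back to the start once the full set has been covered.  The following is a minimal sketch of that idea; the checkpoint file name, batch size, and function names are illustrative assumptions, not part of the actual implementation.

```python
import json
import os

CHECKPOINT_FILE = "checker_checkpoint.json"  # assumed checkpoint location


def load_checkpoint():
    """Return the last processed document id, or '' on a fresh start."""
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f).get("last_id", "")
    return ""


def save_checkpoint(last_id):
    """Persist the highest id processed so the next run resumes after it."""
    with open(CHECKPOINT_FILE, "w") as f:
        json.dump({"last_id": last_id}, f)


def next_batch(all_ids, batch_size=500):
    """Return the next batch of ids after the checkpoint; once every id has
    been evaluated, wrap around and start over from the beginning."""
    last_id = load_checkpoint()
    remaining = sorted(i for i in all_ids if i > last_id)
    batch = remaining[:batch_size]
    if not batch:  # all documents have been evaluated -- start a new cycle
        batch = sorted(all_ids)[:batch_size]
    if batch:
        save_checkpoint(batch[-1])
    return batch
```

In a real deployment the id list would come from a paged DQL query rather than an in-memory list, but the resume-then-wrap logic is the same.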

The code architecture is quite simple, as the following pseudocode details:

  1. Query Source (Documentum) for all records or record subset – store to Map
  2. Query Target (Cache Database) for all records or record subset – store to Map
  3. Loop through Source record map and check for mismatches or missing Target records
  4. Loop through Target document map and check for missing Source records
  5. Send notification to administrator of all discrepancies (If enabled)
  6. Fix discrepancies (If enabled)
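The comparison at the heart of steps 3 and 4 can be sketched as two passes over the maps.  This is an illustrative reconstruction, not the actual implementation: the map contents (here an id-to-metadata dictionary) and function name are assumptions, and the query, notification, and fix steps are left as placeholders.

```python
def find_discrepancies(source, target):
    """Compare the Source (Documentum) map against the Target (cache) map.

    Both arguments map document id -> metadata (e.g., checksum, title).
    Returns three lists of ids: documents whose metadata does not match,
    documents missing from the target, and orphans present only in the target.
    """
    # Step 3: walk the Source map looking for mismatched or missing records.
    mismatched = [doc_id for doc_id, meta in source.items()
                  if doc_id in target and target[doc_id] != meta]
    missing = [doc_id for doc_id in source if doc_id not in target]

    # Step 4: walk the Target map looking for records with no Source record.
    orphaned = [doc_id for doc_id in target if doc_id not in source]

    return mismatched, missing, orphaned
```

Steps 5 and 6 would then consume these three lists: format them into the administrator notification if enabled, and republish, update, or delete the affected cache records if the fix option is enabled.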

We tested the code in a development environment that we knew had many issues.  The code processed 2,500 documents in 12 seconds and was able to identify and email a report of 67 inconsistencies.  We ran the code again with a parameter injected to actually fix the issues it located, and that run completed in 45 seconds.