Documentum Migration – “Two-Step” Bulk Load versus a “One-Step” Migration Approach

Recently, we have seen an uptick in the number of requests for OpenMigrate support for migrating from Documentum to either another Documentum repository or a new repository like Alfresco or M-Files.  Clients often opt to use a vendor-supplied bulk upload tool as part of the migration and use OpenMigrate simply to dump the files from Documentum to a file system.  This “two-step” approach doesn’t leverage the full ability of OpenMigrate to complete the migration in one step.  In this post, we will discuss the benefits of the one-step approach as well as issues with typical bulk load utilities.

Overview of the Documentum Migration – Migration Infrastructure versus “One and Done”

Most clients will look at Documentum migration as a “one and done” effort.  It is a means to an end, either to introduce a new Documentum repository or to move off Documentum permanently.  We recommend clients look instead for a migration infrastructure that supports more than just one-time migration needs.  A migration infrastructure should include:

  • Ability to repeat the process for ongoing migration needs
  • Ability to apply business logic throughout the migration process
  • Ability to configure many different migrations quickly, on an ongoing basis
  • Ability to address documents/data that failed to migrate consistently across migrations
  • Ability to repeat the process for different data sources

Typical Documentum migrations involve extracting large volumes of documents and data accumulated over years of Documentum use.  Some common migration effort considerations, specifically related to a bulk upload approach, include:

  • Limitations of file systems and migration consultants
  • Issue resolution responsibility during two-step migration
  • Issues with the “dump and load” approach versus delta migrations
  • The impact of database versus API transactions on migration speed and accuracy

The remainder of this post will focus on understanding how a one-step approach differs from a two-step approach for the above considerations.

Limitations of File Systems and Migration Consultants

With a two-step approach, we will often see a home-grown “bulk load” tool available from the target repository vendor.  Keep in mind, this tool is typically geared to pulling in files from a file system where all metadata about the document is stored in the file system or the file name.  As such, it will typically have limited ability to pull in external data from other systems and instead relies on the documents being in the exact right format in the file system.  For clients moving from a robust system like Documentum, storing every attribute into the correct place in the file system or naming the document correctly can be very difficult.  Typically clients will hire consultants to conduct the migration efforts.

Vendors, hoping to get the new repository launched as quickly as possible, will often provide their own consultants who are familiar with the bulk import tool and will customize or configure the bulk import to meet the client’s needs, including pulling in metadata from other sources.

The effort to make the Documentum object model fit into a file system is problematic, particularly for systems that have been in production for 10+ years.  Versions, renditions, lifecycles, and custom attributes all need a place to be stored that allows the bulk import tool to repopulate them in the new target repository.  Most of these requests for OpenMigrate have us dump the files into the file system but place all the attributes and other metadata in an Excel file, laid out in a specific configuration for the bulk load system to leverage.  Typically this file can get very complex.  Vendor consultants, familiar with the new tool but with no real experience with Documentum, often don’t understand the data complexities of experienced Documentum clients.
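
To give a feel for why that metadata file grows complex, here is a minimal sketch of flattening versions, renditions, lifecycle state, and custom attributes into one row per content file.  It is not the OpenMigrate output format or any vendor's bulk load format.  The class names, column names, and the "attr_" prefix are all hypothetical, and CSV stands in for the Excel file to keep the sketch dependency-free.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Rendition:
    format: str        # hypothetical format name, e.g. "pdf"
    file_path: str     # where the dump step wrote the content file


@dataclass
class DocVersion:
    chronicle_id: str      # ties all versions of one document together
    version_label: str     # e.g. "1.0", "1.1"
    lifecycle_state: str
    attributes: dict       # custom attributes, name -> value
    renditions: list = field(default_factory=list)


def manifest_rows(versions):
    """Flatten versions and renditions into one row per content file."""
    rows = []
    for v in versions:
        for r in v.renditions:
            row = {
                "chronicle_id": v.chronicle_id,
                "version_label": v.version_label,
                "lifecycle_state": v.lifecycle_state,
                "format": r.format,
                "file_path": r.file_path,
            }
            # Custom attributes become extra columns, prefixed so they
            # cannot collide with the fixed columns above.
            for name, value in v.attributes.items():
                row["attr_" + name] = value
            rows.append(row)
    return rows


def write_manifest(rows, stream):
    """Write rows as CSV; the header is the union of all columns seen."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    # Rows that lack a column get an empty cell.
    writer = csv.DictWriter(stream, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return columns
```

Even in this toy form, every document type with its own custom attributes widens the header, and the loader must reassemble version chains from the chronicle_id and version_label columns.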

An additional issue is often simply procuring the necessary temporary space for the dump itself.  As highlighted in an earlier post about a Canon ImageWare migration, the process required a dump and load approach since Canon did not supply the system API.  Due to the extreme size and volume of the files, procuring the necessary space to store the temporary files was difficult.

With an OpenMigrate one-step approach, the metadata, versions, renditions, and lifecycle values are mapped directly from the old Documentum repository to the new repository.  Migration consultants with Documentum experience can often help identify and correct object model issues in the new system based on their understanding of the old system.

Issue Resolution Responsibility during Two-Step Migrations

One major issue encountered with a two-step approach is problem solving during a migration run.  While we recommend test migration runs for any migration to find and fix issues early, large migrations will often encounter unexpected document and metadata issues and anomalies.  In a two-step approach, the documents will likely be exported successfully from Documentum but fail during the bulk load or, worse, not fail at all, leaving the issue unnoticed until a user encounters it in the new repository.

Responsibility for correcting the issue, particularly if the bulk load job itself fails, can be problematic, as it might require a code change in either the dump process or the bulk load, delaying the migration.

OpenMigrate, when run in a one-step mode, provides an error log where all failed documents are logged. OpenMigrate supports re-running the migration for only those documents in the error log.  In this manner, the issue can be quickly addressed by taking any or all of the following actions:

  1. Making a change to the document/metadata in the source Documentum system.
  2. Modifying the OpenMigrate mapping to correct the issue for the failed documents.
  3. Re-running a small job to migrate only the failed documents again, allowing the bulk of the other documents to continue migrating.
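
The error-log-and-rerun pattern described above can be sketched as follows.  This is an illustration of the idea only, not OpenMigrate's implementation or API.  The function names are hypothetical, and `migrate_one` stands for whatever callable moves a single document and raises an exception on failure.

```python
def run_migration(doc_ids, migrate_one):
    """Try every document; return (migrated_ids, error_log).

    error_log maps document id -> error message for each failure, so one
    bad document does not stop the rest of the batch.
    """
    migrated, error_log = [], {}
    for doc_id in doc_ids:
        try:
            migrate_one(doc_id)
            migrated.append(doc_id)
        except Exception as exc:
            # Record the failure and keep going with the next document.
            error_log[doc_id] = str(exc)
    return migrated, error_log


def rerun_failed(error_log, migrate_one):
    """Re-run only the documents listed in a previous error log."""
    return run_migration(list(error_log), migrate_one)
```

After the source document or the mapping has been corrected, calling `rerun_failed` with the earlier error log retries just the failed documents and produces a new, ideally empty, error log.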

Issues with the “Dump and Load” Approach versus Delta Migrations

One item we highlighted during EMC World in reviewing Documentum’s migration utility, DEMA, was the benefit of a delta migration approach.  Too often clients think of a Documentum migration as an activity that starts and ends in a single “blackout” weekend while the system is unavailable.  In the two-step process, the first step would be to export all the Documentum documents and the second would be to upload those documents to the target system.  Documentum or the new system would be unavailable during this time.

With the delta migration approach, batches of documents are migrated before the blackout weekend to gradually move the documents into the new repository while Documentum is still actively being used.  The blackout weekend activities can be very limited and involve primarily moving any updated or new documents.  This approach has several key benefits over the “dump and load” two-step approach:

  1. Shorter blackout period – typically reduced from days to hours.
  2. Ability to verify and check the documents and performance with the new system hardware before moving any users.
  3. Ability to see traceable progress and address any issues in the migration gradually rather than in a big-bang over one weekend.
  4. Decreased full text search index creation time – for many migrations, creating the full text search index makes up a large percentage of the migration time. In a big bang approach, regardless of how long the migration takes, the complete index will need to be created prior to users using the system. The delta approach allows for the index to be created over time.
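
As a rough sketch of the delta idea, the work left for the blackout weekend can be computed by comparing what is in the source with what has already been migrated.  The function and the dictionary layout below are assumptions made for illustration, not how OpenMigrate tracks state.

```python
from datetime import datetime


def plan_delta(source_docs, target_docs):
    """Classify source documents for a delta run.

    source_docs maps document id -> last-modified datetime in the source.
    target_docs maps document id -> the source modify date that was
    recorded when the document was last migrated.
    """
    plan = {"create": [], "update": [], "skip": []}
    for doc_id, modified in source_docs.items():
        migrated_as_of = target_docs.get(doc_id)
        if migrated_as_of is None:
            plan["create"].append(doc_id)      # never migrated
        elif modified > migrated_as_of:
            plan["update"].append(doc_id)      # changed since migration
        else:
            plan["skip"].append(doc_id)        # already up to date
    return plan
```

Only the "create" and "update" lists need to be processed during the blackout window, which is why that window can shrink as more batches are migrated ahead of time.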

OpenMigrate provides several capabilities particularly designed for the delta approach including:

  1. Ability to specify small batches or nodes of documents leveraging standard Documentum DQL queries.  For example, only migrate major versions of SOP documents which have been made Effective since July 1.
  2. Ability to fail documents that already exist in the target system.
  3. Ability to update documents in the new repository that may have been updated after initial migrations.
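
A batch selection like the SOP example above might be expressed with a query along these lines.  This is a sketch under stated assumptions and has not been run against a repository.  The type name `sop_document` and the attribute `effective_date` are hypothetical, and storing the state name "Effective" in `a_status` is a common convention rather than a given.  Matching major versions with a `'%.0'` label pattern ignores branch versions.

```python
from datetime import date


def sop_batch_dql(effective_since, type_name="sop_document",
                  effective_attr="effective_date"):
    """Build a DQL query selecting one migration batch.

    type_name and effective_attr are hypothetical; substitute the names
    used by the actual object model.
    """
    # Names are interpolated into the query, so accept identifiers only.
    if not type_name.isidentifier() or not effective_attr.isidentifier():
        raise ValueError("type and attribute names must be plain identifiers")
    since = effective_since.strftime("%m/%d/%Y")
    return (
        f"SELECT r_object_id FROM {type_name} (ALL) "  # (ALL): every version
        "WHERE a_status = 'Effective' "                # assumed state storage
        "AND ANY r_version_label LIKE '%.0' "          # major versions only
        f"AND {effective_attr} >= DATE('{since}', 'mm/dd/yyyy')"
    )
```

Parameterizing the date makes it easy to slide the window forward for each successive batch.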

Migration Speed – Database versus API Transactions

We often see bulk load or other migration tools (including DEMA) emphasize speed as the most important issue in a migration. In one particular quote, DEMA was described as “go[ing] faster by going to the underlying database layer and bypassing the API”.

While we built OpenMigrate to be as fast as possible (utilizing multi-threading functionality and command line enterprise software), we also understand the benefits of leveraging delta migrations to improve speed while processing safely through the API rather than risking issues with the database.  By processing through the API, OpenMigrate allows:

  • Delta migrations with API errors indicating failed documents.
  • Ongoing migrations – migration batches that can be executed while the system is in production.
  • Consistency with other import capabilities within the new system.
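
One way to combine multi-threading with per-document API errors is sketched below.  It assumes a hypothetical `import_via_api` callable that imports one document through the target system's API and raises an exception on failure.  It is not OpenMigrate's actual threading model.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def migrate_batch(doc_ids, import_via_api, workers=4):
    """Import documents in parallel; return (migrated_ids, errors)."""
    migrated, errors = [], {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Remember which document each future belongs to.
        futures = {pool.submit(import_via_api, d): d for d in doc_ids}
        for future in as_completed(futures):
            doc_id = futures[future]
            try:
                future.result()   # re-raises any error from the API call
                migrated.append(doc_id)
            except Exception as exc:
                errors[doc_id] = str(exc)
    # Completion order is nondeterministic, so sort for stable output.
    return sorted(migrated), errors
```

Because each document goes through the API individually, a failure is reported against a specific document instead of surfacing later as inconsistent rows in the database.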

Summary

Based on the 100+ clients that have leveraged OpenMigrate for their migration needs, we feel strongly that a one-step approach with a single tool is significantly superior to a two-step approach with different programs and resources, for both one-time and ongoing migrations.

In talking to clients about migrations, we typically try to leverage a “moving” analogy in that the migration tool (e.g., OpenMigrate) is the moving truck and the consultants (e.g., TSG) are the movers.  With the moving analogy, the speed of the truck is not as important as the responsibility, accuracy and care of the movers in minimizing the interruption to the clients’ lives.