Supersized Documentum Migrations and Upgrades: Two Billion Documents and Counting

Two weeks ago we completed several of the largest Documentum migrations and upgrades we’ve ever seen. With short outage windows, we helped plan and support our client’s migration of their Documentum systems from a data center in the southeast US to the Rockies while simultaneously upgrading the repositories from Documentum 6.5 to 6.7 SPx. Altogether the repositories contained over 2 billion documents and several terabytes of content spread across file servers and multiple Centera devices, as well as over 425,000 ActiveWizard forms!

Starting work in February, we conducted preparation runs during March and April, then scheduled several weekends in May and June to move the repositories. The first migrations were the most challenging; although those repositories were smaller, they were more complex, and we ran into a few issues.

Issue #1: The Upgrade Script

One of the first issues occurred during preparation for the upgrade. (Whenever possible, we recommend clients run full dress rehearsals of migrations and upgrades. No one wants a surprise on cutover weekend!) During the rehearsal on one of the largest repositories, which contained several hundred million documents, the 6.7 upgrade script wouldn’t creep past the 3% mark on the progress bar and pegged the database.

We had foreseen the likelihood of an issue and had turned on Content Server SQL tracing so we could monitor what was happening at a low level. The trace showed that generating a specific database index was taking an inordinate amount of time. We also discovered that many old and unnecessary queue items were sitting in the queue table and being updated as part of the upgrade. After some work with EMC, the script was modified to make the index generation a secondary task, and the client’s team cleaned out the stale queue items. After these changes, the 6.7 upgrade script ran in under an hour on even the largest repository.
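As a point of reference, a quick way to gauge how much of this debris is sitting in a repository is a simple DQL count against dmi_queue_item. The sketch below runs such a query through the DFC; the repository name, credentials, and cutoff date are placeholders, and the criteria the client’s team actually used to clean out their queue items were specific to their environment.

```java
import com.documentum.com.DfClientX;
import com.documentum.fc.client.IDfClient;
import com.documentum.fc.client.IDfCollection;
import com.documentum.fc.client.IDfQuery;
import com.documentum.fc.client.IDfSession;
import com.documentum.fc.client.IDfSessionManager;
import com.documentum.fc.common.DfLoginInfo;

/* Rough sketch: count queue items older than a cutoff date.
   Repository name, credentials, and cutoff are illustrative placeholders. */
public class StaleQueueItemCheck {
    public static void main(String[] args) throws Exception {
        DfClientX clientX = new DfClientX();
        IDfClient client = clientX.getLocalClient();
        IDfSessionManager sessionManager = client.newSessionManager();
        sessionManager.setIdentity("my_repository", new DfLoginInfo("dmadmin", "password"));

        IDfSession session = sessionManager.getSession("my_repository");
        try {
            IDfQuery query = clientX.getQuery();
            // Old notifications and workflow queue items accumulate in dmi_queue_item
            query.setDQL("SELECT COUNT(*) AS stale_items FROM dmi_queue_item "
                       + "WHERE date_sent < DATE('01/01/2010','mm/dd/yyyy')");
            IDfCollection results = query.execute(session, IDfQuery.DF_READ_QUERY);
            while (results.next()) {
                System.out.println("Stale queue items: " + results.getString("stale_items"));
            }
            results.close();
        } finally {
            sessionManager.release(session);
        }
    }
}
```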

Issue #2: Centera

Document files were copied to a new Centera device in the new data center prior to the migration weekend. Early testing showed everything was in place and working fine. Unfortunately, on cutover weekend a simple typo in the Centera configuration led to hours of troubleshooting.

The step was easy enough: when the upgrade script prompted for the CA store information, the team entered the new IP addresses for the Centera device and the new PEA file for Centera pool authorization, and the upgrade script proceeded.

Post-upgrade checkouts went smoothly: we were able to create new documents and perform the expected functions. But when we searched for older documents, we saw something was wrong. The older documents returned errors when we tried to retrieve them from Centera.

Resolving the issue involved two parts. First, a new PEA file was generated to ensure Documentum had access to documents created in multiple Centera pools. Second, a CA trace on the Content Server surfaced a nondescript error message about a missing parameter. We’d seen a similar error at another client, and that led us to inspect the Centera configuration in the repository again. On closer inspection, we found that the PEA file was on a different line than the IP addresses for the Centera device. After moving the PEA file onto the same line and adding a ‘?’ to mark it as a parameter, we were able to reinitialize the repositories and access documents from all of the Centera pools. Without that PEA file, Centera had simply been writing content to its default pool without telling us.
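For anyone who runs into something similar: the value the script expects is a single Centera connection string in which the PEA file path follows the access-node IP addresses, separated by a ‘?’. A purely illustrative example, with hypothetical addresses and path:

10.10.1.21,10.10.1.22?c:\centera\prod_pools.pea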

Issue #3: Webtop

For Webtop users, we issued instructions to clear the browser cache and UCF folders before accessing Webtop 6.7. That preemptive step prevented several errors we’d seen when users switched from the 6.5 to the 6.7 version of Webtop. Many users also had to update their Java version to work with Webtop 6.7. However, one of the odder and more inconsistent problems proved to be BOF errors: in several cases, Webtop would throw an error the first time a user tried to perform a function, saying the BOF code couldn’t be downloaded. Refreshing the browser, or logging out and back in and trying again, usually succeeded. Sometimes, the third time was the charm.
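For reference, the UCF client cache typically lives under the user’s profile directory (on Windows, usually C:\Users\<username>\Documentum\ucf), though the exact location can vary with the UCF version and client configuration; that folder, along with the browser cache, is what our instructions had users clear.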

Issue #4: Trust but Verify

Our client had several custom applications. On cutover weekend we discovered that one application had not really been tested in the new environment. The testing team had apparently only signed into the app and performed some very basic actions; it didn’t appear that the server logs had been checked or that any actions that searched for or modified data had been verified.

On cutover weekend, the application was migrated to the new servers, which included an upgrade of WebSphere. At first glance the migration appeared successful, but as we performed our own post-migration checkout tests, we found errors in the log files, and parts of the application clearly didn’t work. After some investigation and questions about the earlier testing, we discovered that the same errors also existed in the test environments. Once we knew the issues were common to all environments, resolving them became much less complicated.

First, we removed some JAR files that WebSphere had issues with; next, we migrated the parts of the application that lived on the Content Server and had not been moved before. After updating the configurations for the new servers, the applications started up successfully and checked out.

Issue #5: Moving Content

Migrating terabytes of content requires some special planning. Our client had already been replicating content between the Centera devices in the old and new data centers. The files were small, but Centera storage space was still a concern, as was the CLIP capacity of the devices, so constant monitoring by EMC and the client team was essential. Even with all the planning, there was still a backlog of documents that needed to be copied from the old Centera device to the new one on cutover weekend. The team took steps to improve throughput and shortened the number of days it would take to get those documents to the new Centera device.

For the documents not on Centera, the initial plan had been to copy them, on cutover weekend or a few days prior, to a NAS device that would replicate them to another NAS device in the new data center. A few test copies showed that for the largest repositories the time needed was simply too long, so another approach was needed. After some thought, it was decided that the production systems would be updated to write new documents directly to the NAS device. The older documents could then be migrated to Centera or copied from the file system to the NAS device and moved over. This approach proved successful, and on cutover weekend there were no issues copying files for the largest repositories.
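One simple way to watch this kind of staged move is to count documents per storage area and track how the numbers shift as new content lands on the NAS-backed filestore. The sketch below runs such a breakdown through the DFC; the repository name and credentials are placeholders, and the filestore names in any real repository will of course differ.

```java
import com.documentum.com.DfClientX;
import com.documentum.fc.client.IDfClient;
import com.documentum.fc.client.IDfCollection;
import com.documentum.fc.client.IDfQuery;
import com.documentum.fc.client.IDfSession;
import com.documentum.fc.client.IDfSessionManager;
import com.documentum.fc.common.DfLoginInfo;

/* Rough sketch: break down document counts by storage area so the shift
   toward the new NAS-backed filestore can be tracked over time.
   Repository name and credentials are illustrative placeholders. */
public class StorageAreaBreakdown {
    public static void main(String[] args) throws Exception {
        DfClientX clientX = new DfClientX();
        IDfClient client = clientX.getLocalClient();
        IDfSessionManager sessionManager = client.newSessionManager();
        sessionManager.setIdentity("my_repository", new DfLoginInfo("dmadmin", "password"));

        IDfSession session = sessionManager.getSession("my_repository");
        try {
            IDfQuery query = clientX.getQuery();
            // Count sysobjects per storage area (a_storage_type names the filestore)
            query.setDQL("SELECT a_storage_type, COUNT(*) AS doc_count "
                       + "FROM dm_sysobject GROUP BY a_storage_type");
            IDfCollection results = query.execute(session, IDfQuery.DF_READ_QUERY);
            while (results.next()) {
                System.out.println(results.getString("a_storage_type")
                        + ": " + results.getString("doc_count"));
            }
            results.close();
        } finally {
            sessionManager.release(session);
        }
    }
}
```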

Summary

This migration and upgrade had a tight timeline: it had to be completed before the end of the company’s fiscal year. It required dedication and focus from the client’s team and a solid migration and upgrade plan, and the effort was worth it. The team and project survived several resource changes and overcame what at first appeared to be daunting issues. One of the biggest success factors for this project was that the team did not hesitate to act once a plan was in place; there was no analysis paralysis. If a plan proved unworkable or failed, it was revised and the team tried again. Congratulations again to the client team on this great accomplishment!