The agenda for Momentum, the Documentum portion of EMC World, has recently been posted (http://www.emcworld.com/images/Momentum_2011_Agenda-master4webpage.pdf). For this post, we thought we would highlight things we (TSG) or our typical clients attend. Overall, EMC has divided the sessions into “tracks” focused on Architecture/Best Practices, ECM, Software Developer, Information Governance, Case Management, User Group, Capture and Labs.
Oftentimes, Documentum users frustrated with Webtop Search or Advanced Search will ask, “Can we just have a Google Search?” This post will provide input to Documentum developers on how best to address this ongoing request. While this post is specifically focused on Documentum developers, the lessons learned about interface design apply to our Alfresco and SharePoint readers as well.
We recently installed xPlore 1.0 SP1 in our environment in order to support a client installation, and we thought we’d share our thoughts on the Documentum xPlore Administrator application.
Way back at Momentum 2001 in Chicago, I remember having an in-depth conversation with a Documentum architect about integrating Autonomy into the Documentum platform. TSG was implementing Autonomy at the time and Documentum was looking to build a pluggable architecture into Documentum in which any search engine could be integrated. The 5.3 platform helped usher in that pluggable architecture with the replacement of Verity (now owned by Autonomy) with FAST. Nine years later at EMC World 2010, Documentum is getting closer to releasing Documentum Search Services, which is essentially an integration between Lucene and xDB.
Ed Buche and Aamir Farooq both presented at EMC World, providing a good technical overview of DSS and lessons learned from how FAST currently interacts with the Content Server. I’ve always looked forward to Ed Buche’s presentations, and I am glad he has been very involved in the architecture of DSS. A couple of items to highlight:
Using an XML database like xDB in conjunction with Lucene makes a lot of sense with regard to performance and scalability. All metadata for content is being converted to an XML file and stored within xDB. This is very similar to how FAST ingests metadata today. However, with DSS, an XML representation of the ACL will also be created and stored in xDB, allowing security to be evaluated by the search engine, not at the Documentum level. Replication of ACLs from the Content Server to DSS will be asynchronous, not necessarily transaction based.
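To make the idea of storing the ACL next to the metadata more concrete, here is a minimal sketch in Python. It is not the DSS or xDB format: the element names, the sample object ID, the ACL and group names, and the use of permit level 3 for read access are all assumptions made for illustration. It only shows how an index-side document that carries its own ACL entries lets a read check run without a call back to the repository.

```python
import xml.etree.ElementTree as ET


def build_index_document(object_id, attributes, acl_name, acl_entries):
    """Serialize metadata and ACL entries into one XML document (hypothetical layout)."""
    doc = ET.Element("document", {"id": object_id})
    meta = ET.SubElement(doc, "metadata")
    for name, value in sorted(attributes.items()):
        ET.SubElement(meta, "attr", {"name": name}).text = str(value)
    acl = ET.SubElement(doc, "acl", {"name": acl_name})
    for accessor, permit in acl_entries:
        ET.SubElement(acl, "entry", {"accessor": accessor, "permit": str(permit)})
    return doc


def can_read(doc, user_groups, min_permit=3):
    """Decide access from the stored XML alone; min_permit=3 is an assumed read level."""
    for entry in doc.findall("./acl/entry"):
        if entry.get("accessor") in user_groups and int(entry.get("permit")) >= min_permit:
            return True
    return False


# Illustrative values only; none of these names come from a real repository.
doc = build_index_document(
    "0900000180001234",
    {"object_name": "Plant Safety SOP", "r_creation_date": "2010-05-10"},
    "acl_plant_docs",
    [("plant_readers", 3), ("plant_authors", 6)],
)
xml_text = ET.tostring(doc, encoding="unicode")
```

Because the replication of ACLs is described as asynchronous, a check like `can_read` would reflect the last replicated copy of the ACL rather than the live one.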
A new full text admin interface will also be available, providing much more detailed reports on indexing status, errors, graphs, etc.
Performance and Scalability
Queries that may have taken minutes in FAST will take seconds in DSS. Documentum has taken a number of lessons learned from the FAST integration and has addressed a number of performance issues that have caused angst in the past. Querying inside folders with a large number of subfolders has been optimized. Additionally, underprivileged users who have access to only a small subset of content but search across a wide range of content should see a significant increase in performance. This is a specific issue we’ve run into with our clients, and we are looking forward to comparing the performance difference.
Facets provide the ability to display your search results and drill down further by a set of pre-defined categories. If you have a large results set, you can further drill down by date, format, etc. to refine your search. CenterStage will support this out of the box. I will be curious how or if this will be integrated into Webtop Search Results or how custom search applications will be able to make use of the capability.
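For readers who have not worked with facets, the following Python sketch shows the basic mechanics of counting results per category and then drilling down. It is only an illustration of the concept, not how DSS or CenterStage computes facets; the field names and sample results are made up.

```python
from collections import Counter


def compute_facets(results, facet_fields):
    """Count how many results fall under each value of each facet field."""
    facets = {}
    for field in facet_fields:
        facets[field] = Counter(r[field] for r in results if field in r)
    return facets


def drill_down(results, field, value):
    """Narrow the result set to the entries matching one selected facet value."""
    return [r for r in results if r.get(field) == value]


# Hypothetical search results with two facet-able fields.
results = [
    {"name": "spec-a", "format": "pdf", "year": 2009},
    {"name": "spec-b", "format": "msw8", "year": 2010},
    {"name": "memo-c", "format": "pdf", "year": 2010},
]
facets = compute_facets(results, ["format", "year"])
pdf_only = drill_down(results, "format", "pdf")
```

A search interface would show the counts in `facets` beside the results and call something like `drill_down` when the user clicks a category.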
Cost / Upgrading to DSS
DSS will remain part of the Content Server and will not be licensed separately.
Microsoft/FAST and Documentum have agreed on extended support for customers until the end of 2011. Therefore, customers making use of full text indexing must upgrade to at least 6.5 SP2 and migrate to DSS by then. DSS will become standard starting with the D6.7 Release. This may be a key driver for customers to start planning their upgrades based on the 2011 date.
Customers who are currently deployed on 6.5 SP2 or later will be able to upgrade to DSS. To evaluate and test DSS compared to FAST, a new docbase may be created using DSS. Both FAST and DSS can therefore be running at the same time and provide a seamless transition from one search platform to another.
Over the past few months, TSG has been able to sit in on the DSS Beta program as a read-only participant. The DSS beta completed last week, and it’s been valuable to hear how the beta has progressed. From what we’ve heard, the DSS software was successful for the customers participating in the beta. One item to note, however, is that DSS will not be generally available in the June/July 2010 time frame as originally thought. A targeted DSS Controlled Release planned for June through September 2010 will precede the generally available release. EMC will be selecting participants for the controlled release program starting this month.
DSS uses Apache Lucene as a full text search engine. For Documentum customers that are looking to leverage Lucene for full text search now, see the following articles:
Back in 2007, we started offering a presentation for clients, “What’s Next for Documentum?” The presentation focused on what mature Documentum clients are doing “next”. Typically, mature customers:
- have an existing production install of Documentum
- have implemented tools “out of the box” with some or extensive customizations
- have had some success
- ….and are looking for “What do I do Next?”
This post will share new, innovative or just different items we see our clients doing or considering for Documentum as well as links to other posts that depict their choices.
Documentum Upgrade – When?
When to upgrade is a difficult decision for many clients. Many existing Documentum clients are staying on paid support (version 5.3 or before) this year given reduced budgets, the cost and effort of the upgrade, and the difficulty of upgrading an application that either is unsupported or would require a rewrite to leverage the new Documentum interface.
For an overall upgrade understanding – try our Documentum Upgrade Planning Guide.
Look throughout blog.tsgrp.com for many posts including upgrade alternatives, extends versus modifies in 6.5, understanding the impact of WDK development, migration, clone or in-place upgrade, high volume server, common upgrade questions, and upgrading your application now to make upgrading Documentum easier later.
External Users – Extending capabilities beyond the firewall
We have seen a push to add additional third parties or external users to Documentum. Typically this would be a related party responsible for creating documents. Interfaces are somewhat different in that, while these users can create/approve documents, the interface should limit the third party to seeing only certain documents or completing a limited set of functions.
SharePoint – 2010 increases pressure on Documentum
SharePoint continues to be a major influence at the bulk of our clients. SharePoint users can be very vocal in their desire to “dump Documentum”. Clients struggle with wanting to satisfy SharePoint users but understand that a typical SharePoint environment focuses on collaboration with collaboration being very different from the current Documentum installation. Providing the scale, image and records management and maintaining indexing consistency are all requirements of most ECM systems and something that SharePoint 2007 can’t always accomplish. As Microsoft releases SharePoint 2010 with more robust ECM features, we would anticipate that this pressure would increase.
Look throughout blog.tsgrp.com for many posts including SharePoint Myths, SharePoint – Adding ECM Structure and Active Wizard/SharePoint Integration. Also, look for OpenMigrate to have continued support for a SharePoint source as well as the upcoming release to support a SharePoint target.
Documentum Software Audit – Managing Licenses
While this is not new in 2010, we have steadily seen an increase in EMC Software Audits since 2005, with a big increase in the 4th quarter of 2009. Clients are getting better at documenting system usage and working with their sales reps to actively manage their license and maintenance costs.
Consumer Interface – Continued Focus
Multiple clients are continuing to look to push content out of Documentum for consumers. Whether tied to fault tolerance, performance, upgrade prep or licensing, many have developed strategies to push released content, either with SCS or OpenMigrate, to an external database and file store.
Non Webtop Interfaces – Less Training – better Performance
Similar to the consumer interface, we are seeing more “non-Webtop” interfaces begin to proliferate at clients. Initially this was focused on image, workflow or improving Documentum search within Webtop. Recently, typical Webtop installations are looking to move certain users, like the consumers mentioned above, off of Documentum Webtop. This could involve just approvers (both internal and external) or offering a simpler author interface.
Lucene – FAST/Verity Replacement and DSS
Many clients are tired of “waiting for Documentum Search Services (DSS)” and have begun to deploy Lucene internally for either a cached consumer repository or Documentum itself.
Look for a post here on one client’s comparison of FAST to Lucene given their content and search scenarios.
CenterStage, Cloud, CMIS, Fatwire – what do these mean?
We have seen clients considering CenterStage, utilizing the cloud (ex: Amazon EC2), the upcoming CMIS specification as well as concern about the recent Fatwire alliance. Most of our clients are taking a “wait and see” approach given the economy and some concern about being early adopters rather than followers. We’ve seen this same approach with the upgrade decision as well.
Other “what’s next” items include the use of Adobe Flex to create a better user interface, form and workflow enhancements, browser based annotation services and other items that will be posted here in the coming months. If you are looking for our 2007 “What’s Next” presentation, a Screencam with sound is still available on our site.
Since the new Documentum Search Services beta program just started last week, we thought we would share some of TSG’s thoughts on full-text search and our plans to add Lucene capabilities to our open source offerings.
Documentum Search Services (DSS) was tentatively called Enterprise Search Services (ESS) early in the product development. DSS promises to be “the next generation of search in EMC and will be built upon xDB with Apache Lucene as the underlying indices”. Specific highlights from EMC World included:
- Relevance Sorting
- Advanced Query Processing
- Parallel, Native Facet computation, XQuery for structured and unstructured search
- Lower Hardware and Storage Costs
- Native VMware, NAS, SAN support and Advanced Data Placement
At the present time, DSS is targeted for heavy testing through the end of 2009 with a release in 2010.
TSG Thoughts on DSS
At the present time, we are very encouraged with the progress and the direction of DSS. We have been using Lucene for a couple of clients and can safely say that the tool will address many of the shortcomings of FAST including index rebuild, overall performance and server requirements. That being said, the scope of DSS needs to encompass all of the Documentum API level functionality that FAST or Verity have addressed in the past. As the beta progresses, truly the “devil is in the details” of how DSS evolves, so we will withhold our final thoughts until the beta is complete.
Other Tools (Autonomy, Google Appliance, SearchBlox, Vivisimo….)
As an integrator, we do get asked to integrate different search tools. We began working with Autonomy for EMC on an internal Documentum project (pre-Documentum purchase) back in the late 90’s. Overall, most search tools meet full-text needs but are typically built as “crawlers” focused on the web. As a crawler, the tool needs to scan a directory/website for changes and then update the full text index. We have found this approach difficult when Documentum clients want to do true “Documentum Searches” that combine attributes, security and full text. For example, one client wanted to search secure documents for a certain plant (attribute) and create date (attribute) that contain a given part-number (full-text).
Also, a couple of clients have had concerns about the latency between when a document is stored in Documentum and when it is indexed (after the crawler runs) in the full-text search engine. One client complained that with FAST, sometimes the latency was 2 minutes and other times it was 2 hours.
Our last concern with the crawler approach is how to get the index data and security added to the index to avoid having to run the query against Documentum (plant, create date, security), against full-text (part-number) and then only displaying the results that are on both lists.
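The two-query pattern described above can be sketched in a few lines of Python. This is an illustration of the problem, not anyone's actual implementation: the document IDs are invented, and the two input lists stand in for the results of a Documentum query (plant, create date, security) and a full-text query (part-number).

```python
def intersect_results(repository_ids, fulltext_hits):
    """Keep the full-text hits, in their ranked order, that the repository query also returned."""
    allowed = set(repository_ids)
    return [doc_id for doc_id in fulltext_hits if doc_id in allowed]


# Stand-in for the attribute + security query run against Documentum.
repository_ids = ["doc-1", "doc-3", "doc-4"]
# Stand-in for the part-number search run against the full-text index, best match first.
fulltext_hits = ["doc-3", "doc-2", "doc-1", "doc-5"]

combined = intersect_results(repository_ids, fulltext_hits)
```

The merge itself is cheap. The cost is that both queries have to return their complete result lists before anything can be shown to the user, which is why we would rather have the attributes and security in the index itself.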
Native Lucene with Documentum?
One scenario we are building out for clients is a Documentum 5.3 or 6.5 application that indexes documents into Lucene from either Documentum or a cached copy (whitepaper here). To differentiate from DSS, our approach won’t provide support for inline DQL but rather a pure web services approach.
In the diagram below, both OpenMigrate and HPI use OpenContent web services to communicate with Lucene. OpenMigrate is used to keep the Lucene index up to date, and HPI is used to query the index for full text searches and optionally metadata searches as well:
A couple of key factors:
- 5.3 Support – we are focused on supporting 5.3, 6.0, 6.5 and future releases. Many of our clients have chosen to delay their upgrades for a variety of reasons. By implementing Lucene now, clients can remove FAST from their current environment and from an eventual D6.5 upgrade.
- Attributes – we are focused on storing the content, attributes and security in Lucene to avoid having to search both the Documentum attributes and the Lucene full-text index.
- Indexing – we are leveraging OpenMigrate to index/delete content and metadata to Lucene on a real-time, multi-threaded push basis to avoid a crawler approach. We think the push approach can better control updates to the index, reduce server load on the full-text index and improve audit control to ensure everything is indexed.
- Security – One issue we addressed was how to balance security concerns against high-performance search. Verifying that the user has access to browse each document retrieved from the search (Documentum lookup) is expensive and would hurt performance as identified in the crawler discussion above. One approach was to cache document ACL information with each document in Lucene and update it as ACLs are updated. Since Documentum ACLs don’t change often, we would leverage one lookup to retrieve the user’s ACL access and add that information to the Lucene query.
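As a rough illustration of the security approach in the last bullet, the Python sketch below builds a single query string in Lucene's classic query syntax that combines the full-text term, the attribute filters and the list of ACLs the user can read. The field names (`content`, `plant`, `acl_name`), the ACL names and the helper functions are hypothetical and are not taken from OpenContent or HPI; how the quoted values match would also depend on how each field is analyzed in the index.

```python
def quote(value):
    """Wrap a value as a quoted Lucene phrase, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_secure_query(full_text, attributes, readable_acls):
    """Combine full text, attribute filters and the user's readable ACLs into one query."""
    if not readable_acls:
        raise ValueError("user has no readable ACLs; nothing to search")
    clauses = ["content:" + quote(full_text)]
    for field, value in sorted(attributes.items()):
        clauses.append(field + ":" + quote(value))
    # The ACL list comes from one up-front lookup of what this user can read.
    acl_clause = " OR ".join(quote(name) for name in sorted(readable_acls))
    clauses.append("acl_name:(" + acl_clause + ")")
    return " AND ".join(clauses)


query = build_secure_query(
    "PN-12345",
    {"plant": "Chicago"},
    {"acl_plant_docs", "acl_public"},
)
```

With a query shaped like this, one search against the index returns only documents the user is allowed to see, so no per-document check against Documentum is needed afterwards.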
So far our results have been favorable. Please contact us if you are interested in this type of solution as we are looking for additional case studies.