Documentum Search – Why the Google appliance just doesn’t cut it

Many experienced Documentum customers have attempted to leverage the Google appliance as an alternative to Documentum Search or as part of an Enterprise Search effort.  One of our clients presented their experience at our user group meeting.  This post will discuss their findings.

What’s Wrong with Google – Recap and Client Experience

One of our most popular posts regarding Documentum Search has been “What to do when users just want a Google Search”.  As we pointed out in that post, clients don’t so much want a Google search as hate Webtop Advanced Search.  A client that added a Google search reported the following issues:

You get what you asked for – It is a Google search:

  • Indexing is very time consuming (4-5 hrs) and limited to a nightly run.
  • Display output limited to messy Google-type output.
  • No sorting of search results other than the relevance sort that Google provides.
  • No ability to integrate custom metadata into the search.
  • It’s not possible to audit the index content.  The client found numerous missing documents in the index.
  • Google will only index files under 20 MB.

In addition to dealing with search result issues, the client had the following Site Caching Services (SCS) issues:

  • SCS is folder driven (rather than document driven) and relies on linked folders, which caused problems if a file to be published was not in the folder to be published.
  • Site Caching jobs are very cache intensive.  DBA tuning was attempted, but the jobs could still only be run once per day.
  • SCS did not run in real time: due to the cache issues, jobs could not be published more than once a day (and Google could only index once a day as well).

Lucene/Solr, OpenMigrate and HPI

In replacing Search with a combination of HPI and OpenMigrate, our client found that:

HPI Search and Output

  • Search filters can use many data fields in combination with the search engine.
  • User driven – each user’s display output settings are held in a cookie.
  • Sorting is allowed on any data field.
  • Searching is just as fast as Google.
  • HPI provides the ability to apply a 20,000 search result limit.
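
To make the filter, sort and result-limit behaviour above concrete, here is a minimal sketch in Python.  It is not HPI code: the `search` function, the `MAX_RESULTS` constant, and the sample field names and documents are all hypothetical, and a real deployment would hand this work to the search engine rather than filter a list in memory.

```python
MAX_RESULTS = 20_000  # the result limit quoted in the post; the constant name is an assumption


def search(docs, filters, sort_by, descending=False, limit=MAX_RESULTS):
    """Return the docs that match every filter, sorted by one field and capped at `limit`."""
    # Combine several data fields: a document must match all of the filters.
    matches = [d for d in docs if all(d.get(k) == v for k, v in filters.items())]
    # Any field can serve as the sort key, not just relevance.
    matches.sort(key=lambda d: d[sort_by], reverse=descending)
    return matches[:limit]


# Hypothetical sample documents for illustration only.
docs = [
    {"name": "Pump manual", "object_type": "sop", "facility": "Plant A", "modified": "2013-04-02"},
    {"name": "Valve spec", "object_type": "sop", "facility": "Plant B", "modified": "2013-05-11"},
    {"name": "Safety memo", "object_type": "memo", "facility": "Plant A", "modified": "2013-01-20"},
    {"name": "Boiler SOP", "object_type": "sop", "facility": "Plant A", "modified": "2013-06-30"},
]

results = search(
    docs,
    {"object_type": "sop", "facility": "Plant A"},
    sort_by="modified",
    descending=True,
)
for doc in results:
    print(doc["modified"], doc["name"])
```

Sorting by `name` instead of `modified` only requires changing the `sort_by` argument, which is the kind of flexibility the client was missing with a relevance-only sort.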

OpenMigrate Flexibility

  • OpenMigrate can be configured to fit the client’s scope.  This particular client uses the Object Type, Facility, ACL and Modified Date attributes to know when to push a document to the search index.
  • Indexing is near real time.  Changes are captured and indexed in the search engine hourly (likely every 20 minutes once other server loads are removed).
  • The index can be audited.  An audit runs nightly to identify misfires, typically due to user processing (Rendition Import – EMC bug).
  • No file size limitation.  (Google limited to 20 MB, WT 6.7 limited to 40 MB)
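
The push decision and the nightly audit described above can be sketched roughly as follows.  This is an illustration only, not OpenMigrate configuration or code: the function names, the scope sets and the sample IDs are assumptions, and the real rules depend on each client’s object model.

```python
from datetime import datetime

# Hypothetical scope rules: which values qualify a document for the search index.
INDEXED_TYPES = {"sop", "memo"}
INDEXED_FACILITIES = {"Plant A", "Plant B"}
INDEXED_ACLS = {"acl_public", "acl_engineering"}


def needs_push(doc, last_run):
    """True if the document is in scope and was modified since the last indexing run."""
    return (
        doc["object_type"] in INDEXED_TYPES
        and doc["facility"] in INDEXED_FACILITIES
        and doc["acl"] in INDEXED_ACLS
        and doc["modified"] > last_run
    )


def audit(repo_ids, index_ids):
    """Compare repository IDs with index IDs and report the differences."""
    missing = sorted(set(repo_ids) - set(index_ids))   # in the repository, not in the index
    orphaned = sorted(set(index_ids) - set(repo_ids))  # in the index, no longer in the repository
    return missing, orphaned


last_run = datetime(2013, 7, 1, 12, 0)
changed = {
    "object_type": "sop",
    "facility": "Plant A",
    "acl": "acl_public",
    "modified": datetime(2013, 7, 1, 12, 30),
}
out_of_scope = dict(changed, object_type="draft")

print(needs_push(changed, last_run))       # in scope and modified after the last run
print(needs_push(out_of_scope, last_run))  # object type is outside the hypothetical scope

missing, orphaned = audit(["doc1", "doc2", "doc3"], ["doc1", "doc3", "doc9"])
print(missing, orphaned)
```

A nightly job built along these lines would re-push anything reported as missing, which is how misfires get caught rather than silently accumulating.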

Summary

The desire of users to get data quicker and easier is something we have promoted for years.  We often quote that 70% of users just want to search and print.  Clients can be frustrated by complex Documentum searches (e.g. Webtop, D2) that require the “build a search” model.  While clients might request a “Google” search, as presented in this post, the Google appliance along with Site Caching Services can bring along their own set of problems.

For a screencam comparison of searching with Webtop, D2, xCP and HPI, please see the learning zone.

2 thoughts on “Documentum Search – Why the Google appliance just doesn’t cut it”

  1. Thanks for sharing!
    The comments on SCS are recognizable; and I can even add more limitations.

    On Google search, the described use case either does not really use the Google Search Appliance capabilities or is based on an old version, going by my understanding of GSA and feedback from colleagues who know more about this product.

    -Index very time consuming (4-5 hrs) and limited to a nightly run.
    These are implementation choices and not an enforced standard;

    – Display output limited to messy Google-type output.
    Three scenarios are possible: standard Google style, XSLT processing, or a custom UI based on XML results. So this offers more options!

    – No sorting of search results other than the relevance sort that Google provides.
    This is indeed true, though date sorting is also possible. The 7.2 release (end of this year) will support custom metadata for searching, so soon this will no longer be an issue.

    – No ability to integrate custom metadata into the search.
    Google is able to integrate Documentum metadata, so this seems to be an implementation choice.

    – It’s not possible to audit the index content. The client found numerous missing documents in the index.
    The administrative interface provides all the relevant details to analyze.

    -Google will only index files under 20 MB.
    This applied to older versions.
