Recently TSG worked with an Alfresco client that had unique requirements around document and folder security. In addition to traditional ACL security, the client needed the ability to tag documents with metadata that would deny users access to content unless they were members of a particular group. This post will focus on our approach for addressing the security requirements in Alfresco.
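The combined check described above can be sketched in a few lines. This is only an illustration and not the Alfresco implementation: the `Document` class, the `required_groups` field, and the rule that a user must belong to every tagged group are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a repository node."""
    name: str
    acl_readers: set                                    # groups granted read by the traditional ACL
    required_groups: set = field(default_factory=set)   # assumed metadata tag: groups the user must belong to


def can_read(user_groups: set, doc: Document) -> bool:
    """Allow access only if the ACL grants it AND the metadata tags do not deny it."""
    acl_ok = bool(user_groups & doc.acl_readers)
    # Assumption: every group named in the metadata tag is mandatory.
    tags_ok = doc.required_groups <= user_groups
    return acl_ok and tags_ok
```

In this sketch an untagged document falls back to plain ACL behavior, since an empty set of required groups is satisfied by any user.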
The first step in supporting all of the TSG products on Hadoop is building our OpenContent REST Web Services layer to access Hadoop in the same manner we access Documentum, Alfresco and other content management systems. This post will present our plans and timelines for OpenContent along with associated TSG solutions.
To continue our series of posts highlighting common design patterns that we see in many Alfresco implementations (see previous posts on Auto-Filing and Data List-Driven Value Assistance), this post will focus on another Alfresco module package (AMP) that TSG has developed. It provides a configurable way to automatically assign content a unique number from a numbering sequence upon creation.
Although content modeling in Alfresco is very flexible and configurable, one of the issues that we run into when setting up repositories for clients is the lack of the ability for business users to manage value assistance lists. Value assistance is a term that we use to describe the lists of options that show up in dropdown and multi-select controls in the user interface on the edit properties and search screens.
This post will describe a module that we’ve developed to utilize the data list functionality available in the Alfresco Share interface to allow business users to manage value assistance lists without any coding, XML configuration, or server restarts.
TSG has several clients using Documentum as a repository and a custom front end application for consumption of the records or renditions of records. In most cases there is a publishing mechanism in place such as SCS (Site Caching Services) or TSG’s OpenMigrate PUMA (see CIS Case Study for more details). While a typical Documentum application (ex: Webtop) provides a “one stop shop” for authors and approvers, the interface can be challenging when “consumers” are just looking for quick search and retrieval. This approach provides improved performance, business continuity, and the ability to add documents from other systems. One potential risk of using a cache of documents and metadata for search and retrieval is the integrity of the data. Publishing techniques are designed to accurately cache records; however, there are uncontrollable circumstances that may result in a mismatch.
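One way to detect such a mismatch is a periodic reconciliation pass that compares the repository and the cache by object ID and content checksum. The sketch below is a hypothetical illustration: the `find_mismatches` function and the plain dictionaries standing in for the repository and the cache are assumptions, not part of SCS or OpenMigrate.

```python
import hashlib


def checksum(content: bytes) -> str:
    """Return a SHA-256 digest used to compare two copies of a record."""
    return hashlib.sha256(content).hexdigest()


def find_mismatches(source: dict, cache: dict) -> dict:
    """Compare {object_id: content} maps for the repository and the cache."""
    missing = sorted(source.keys() - cache.keys())    # published records absent from the cache
    orphaned = sorted(cache.keys() - source.keys())   # cached records no longer in the repository
    stale = sorted(
        key for key in source.keys() & cache.keys()
        if checksum(source[key]) != checksum(cache[key])
    )
    return {"missing": missing, "orphaned": orphaned, "stale": stale}
```

A real check would read IDs and checksums from each system rather than hold content in memory, but the three categories (missing, orphaned, stale) stay the same.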
Since the very first Momentum (1996 in a very windy Miami), the Documentum user community has pushed for a more reliable means to convert documents, mostly Microsoft Office files, into PDF. Back then, during a wrap-up luncheon, the feedback on AutoRender (a previous incarnation of DTS) was anything but positive. As with some complaints today, the main complaints included:
- Having to monitor/reboot the AutoRender Server throughout the day
- Unreliable PDF transformation, including:
  - Unsupported document types
  - Font replacement
  - Broken links
At the time, Documentum threw some engineering effort into AutoRender to address some of the shortcomings. One of the changes was to have AutoRender reboot itself (not really a fix, but it did address some of the shortcomings). As with other Documentum products, TSG is occasionally asked for alternatives. This post will address some of the tools we use in non-Documentum environments that could easily be adapted to the PDF rendition needs for Documentum.
For a couple of our non-Documentum customers, we have leveraged the Adobe LiveCycle component PDF Generator. We have been very impressed with its reliability and functionality. Considering Adobe created the best known implementation of Portable Document Format, it makes sense to rely on Adobe technology to convert your native content.
The last post discussed the results of an HPI Lucene Search test compared to a Webtop FAST Search as part of a proof of concept for a client looking to provide a consumer interface. As we have often mentioned on this forum, we continually see clients looking for a better search interface than Webtop, as well as some content cached outside of Documentum for business continuity, performance, and licensing.
One accurate comment raised by the post was that our comparison of HPI/Lucene against a Webtop/FAST search wasn’t really comparing apples to apples as the Webtop search was running against Documentum with security, while the Lucene search was not. While the client’s goals were to show the benefits of the cached repository and Lucene against Documentum, many Documentum users would like to know how Lucene would perform directly against a Documentum repository (as with upcoming DSS).
For this post, we will discuss TSG’s strategy and initial proof of concept results in leveraging Lucene for a Documentum full text search engine.
As mentioned in a previous article, many clients are moving away from FAST in preparation for the eventual release of Documentum Search Services (DSS), slated for June, which leverages the open source product Apache Lucene. This post will share the results from one client that executed a proof of concept test to compare the two search engines.
Proof of Concept Approach – As we have mentioned before, many clients have decided to implement an external cache outside of Documentum to address business continuity, performance and licensing issues. For a large pharmaceutical client, TSG was tasked with performing a proof of concept on 156,000 documents in an external data source indexed by Lucene. The proof of concept would compare the search results of FAST within Documentum (Webtop) with those of Lucene (HPI) outside of Documentum. The proof of concept additionally evaluated leveraging Lucene for metadata storage rather than storing metadata in another database such as Oracle.
POC Findings – Lucene/HPI with the external repository was found to be considerably quicker than the existing FAST/Webtop implementation on most queries.
| Result set | FAST/Webtop | Lucene/HPI |
| --- | --- | --- |
| 1200 Results | 90 seconds | 3 seconds |
| 8 Results | 5 seconds | 3 seconds |
| 10 Results | 8 seconds | 4 seconds |
| 76 Results | 10 seconds | 5 seconds |
| 5100 Results | 72 seconds | 5 seconds |
| 65 Results | 6 seconds | 3 seconds |
A simple configuration of the Lucene index returned a more complete search result set than the standard FAST/Webtop configuration. Examples included additional documents that were logical derivatives of the initial search word. For example, a search for “exception report” could return “exceptions report” or “exception reports”. The proof of concept data set also included German documents, and Lucene demonstrated multilingual stemming capability.
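The stemming behavior behind the “exception report” example can be illustrated with a toy sketch. This is not Lucene’s stemmer: `naive_stem` is a deliberately simplistic, hypothetical plural-stripping rule, and `matches` assumes every query word must be present. It only shows why stemmed terms in the query and in the document end up matching.

```python
def naive_stem(word: str) -> str:
    """Toy stemmer: lowercase and strip a trailing plural 's' (assumption, not Lucene's algorithm)."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def matches(query: str, text: str) -> bool:
    """Return True when every stemmed query word appears among the stemmed words of the text."""
    text_stems = {naive_stem(word) for word in text.split()}
    return all(naive_stem(word) in text_stems for word in query.split())


print(matches("exception report", "exceptions report"))   # True
print(matches("exception report", "exception reports"))   # True
print(matches("exception report", "exception summary"))   # False
```

Because both sides are reduced to the same stem before comparison, singular and plural variants are treated as the same term.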
Key Stats – Lucene
- 156,000 Documents – 31.6 Gigabytes
- Total Index Space – 521 MB
- Total Index Build Time – 10 hours – The client was very interested in the time it took to index the content and metadata in Lucene because they had experienced lengthy indexing times with FAST in their 5.3 upgrade. This was tracked as part of the proof of concept; however, the corresponding FAST data from the 5.3 upgrade is no longer available.
FAST and Lucene – Full Text Syntax Differences
- FAST
  - “One Two” – will return documents with the exact phrase “One Two” in the document
  - One Two – will return documents with the words One OR Two in the document
  - One+Two – will return documents with the words One OR Two in the document
  - One and Two – will return documents with the words One AND Two in the document
- Lucene – Based on the Proof of Concept’s configuration
  - “One Two” – will return documents with the exact phrase “One Two” in the document
  - One Two – will return documents with the words One AND Two in the document
  - One OR Two – will return documents with the words One OR Two in the document
  - One and Two – will return documents with the words One AND Two in the document
  - One+Two – will return documents with the exact phrase “One Two” in the document
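The syntax differences listed above can be summarized as a small lookup function. This is a hypothetical sketch that only encodes the rules from the list; `interpret` is not part of FAST, Lucene, or HPI, and it handles straight double quotes only. The list does not cover how FAST treats an explicit OR, so the sketch leaves that case unhandled.

```python
def interpret(query: str, engine: str):
    """Return (mode, terms) for a two-style query, where mode is PHRASE, AND, or OR."""
    if engine not in ("fast", "lucene"):
        raise ValueError(f"unknown engine: {engine}")
    q = query.strip()
    # Quoted text is an exact phrase in both engines.
    if len(q) >= 2 and q[0] == '"' and q[-1] == '"':
        return "PHRASE", q[1:-1].split()
    # '+' means OR in FAST but an exact phrase in the POC's Lucene configuration.
    if "+" in q:
        terms = [t.strip() for t in q.split("+") if t.strip()]
        return ("OR" if engine == "fast" else "PHRASE"), terms
    words = q.split()
    # An explicit 'and' means AND in both engines.
    if any(w.lower() == "and" for w in words):
        return "AND", [w for w in words if w.lower() != "and"]
    # An explicit OR is listed for Lucene only.
    if engine == "lucene" and "OR" in words:
        return "OR", [w for w in words if w != "OR"]
    # Bare words default to OR in FAST and AND in the POC's Lucene configuration.
    return ("OR" if engine == "fast" else "AND"), words
```

The practical consequence is that the same bare query, One Two, returns a broader result set in FAST than in the proof of concept's Lucene configuration, which users moving between the two interfaces should expect.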
Overall, the client was very satisfied with the findings and is moving forward with the solution. The flexibility of Lucene to index both the metadata and full-text values allowed the client to avoid adding an additional Oracle database to their external cache for attribute storage. The client also liked the simpler, more intuitive search interface of HPI compared to the Webtop interface.
In addition to leveraging Lucene for searching an external cache, we are also working to leverage Lucene for internal Documentum/Webtop search.
If you have any questions or would like more detailed information, please contact us or comment below.
Another advantage of GWT is that it allows debugging in a hosted mode browser, so most changes in the client side code can be viewed by simply refreshing the browser. Several plug-ins are available which allow GWT development in different development environments including Eclipse.
Here is a screenshot of our GWT Annotation tool interface. Be sure to check back often as we will be releasing our annotation demo shortly.
With the upgrade to D6.5, many of our clients are reconsidering their annotation choices. This blog post will address some of the annotation product choices based on our experience, as well as our internal development efforts on our Free Viewer Tool that is based on a thin client with Adobe Flex and support for viewing and basic annotation capabilities.
Definition – This entry uses “annotation” to mean a mark-up “layer” on top of the document. Redline changes (like Word track changes) are embedded in the Word file and are not the focus of this entry.
Thick Client or Thin Client
One of the first decision points when choosing an annotation tool is between a thick or thin client. Early annotation tools required a client-side component for client/server capabilities. With browser-based annotation tools, annotations might rely on either a client-side plug-in or an applet. For Documentum, client components are required for Brava (applet), Annodocs and Documentum Annotation Services (Adobe Acrobat). Snowbound offers versions that either require no client component or take an applet-based approach. Our Free Viewer only requires Adobe Flash to be installed on the client. With a thin client approach, the image (not the entire file) is sent to the client. This could be a substantial performance improvement when viewing large files. The thin client approach also provides additional security since the file is never passed to the client.
TSG Thoughts – We usually recommend the thin client approach to improve performance and security while reducing IT support costs, particularly when extending the application to outside third parties.
Native Document Annotations or PDF-only
One approach would be to allow the mark-up layer to be displayed on top of any type of file format. Snowbound and Brava both support this type of annotation. Another approach would be to turn everything into PDF and only allow mark-ups on top of the PDF. This approach is required by Adobe and Annodocs, and is supported by Brava and Snowbound as well.
TSG Thoughts – Many of our clients have had difficulty with the native document approach not due to fault of the vendor but due to the constantly evolving and backward compatible native file formats. For our free viewer, we are only supporting PDF or TIFF.
With all annotation tools, the number of graphic options (circle, arrow, highlight, underscore, ...) can confuse the user and blur the line between annotations and redlines. Also, one major user complaint is that annotations can be buried on subsequent pages, forcing users to flip through the document to find them. Annotation tools should highlight or bookmark annotations when the document is viewed so that users do not have to check every page.
TSG Thoughts – We lean toward simple annotations for basic markup to reduce training costs and markup/review time.
It is important to understand that every annotation tool typically stores its annotations in a proprietary format, making it difficult to change annotation tools. When changing annotation tools, the existing annotations must be deleted or reformatted.
TSG Thoughts – For our Free Viewer, we have targeted Adobe’s new XFDF for mark-up to be compatible with Adobe as well as Documentum Annotation Services.