Extending Alfresco Security Beyond ACLs

Recently TSG worked with an Alfresco client that had unique requirements around document and folder security.  In addition to traditional ACL security, the client needed the ability to tag documents with metadata that would deny users access to content unless they were members of a particular group.  This post will focus on our approach for addressing the security requirements in Alfresco.

Hadoop – OpenContent/HPI Product Plans

The first step in supporting all of the TSG products on Hadoop is building our OpenContent REST Web Services layer to access Hadoop in the same manner we access Documentum, Alfresco and other content management systems.  This post will present our plans and timelines for OpenContent along with associated TSG solutions.

Auto-Numbering Content in Alfresco

To continue our series of posts highlighting common design patterns that we see in many Alfresco implementations (see previous posts on Auto-Filing and Data List-Driven Value Assistance), this post will focus on another Alfresco module package (AMP) that TSG has developed.  The module provides a configurable way to automatically number content with a unique numbering sequence upon creation.

Alfresco Data List-Driven Value Assistance

Although content modeling in Alfresco is very flexible and configurable, one of the issues that we run into when setting up repositories for clients is the lack of the ability for business users to manage value assistance lists.  Value assistance is a term that we use to describe the lists of options that show up in dropdown and multi-select controls in the user interface on the edit properties and search screens.

This post will describe a module that we’ve developed to utilize the data list functionality available in the Alfresco Share interface to allow business users to manage value assistance lists without any coding, XML configuration, or server restarts.

Documentum to Portal Consistency Checker – Proof of Concept

TSG has several clients using Documentum as a repository with a custom front end application for consumption of the records or renditions of records.  In most cases a publishing mechanism is in place, such as SCS (Site Caching Services) or TSG’s OpenMigrate PUMA (see CIS Case Study for more details).  While a typical Documentum application (ex: Webtop) provides a “one stop shop” for authors and approvers, the interface can be challenging when “consumers” are just looking for quick search and retrieval.  A separate consumer front end provides improved performance, business continuity, and the ability to add documents from other systems.  One potential risk of using a cache of documents and metadata for search and retrieval is the integrity of the data.  Publishing techniques are designed to accurately cache records; however, there are uncontrollable circumstances that may result in a mismatch.

Documentum Transformation Services (DTS) – Alternative Approaches with Adobe LiveCycle and OpenOffice

Since the very first Momentum (1996 in a very windy Miami), the Documentum user community has pushed for a more reliable means to convert documents, mostly Microsoft Office documents, into PDF.  Back then, during a wrap-up luncheon, the feedback on AutoRender (a previous incarnation of DTS) was anything but positive.  Similar to some complaints today, the main complaints included:

  • Having to monitor/reboot the AutoRender Server throughout the day
  • Unreliable PDF transformation, including:
    • Unsupported Document Types
    • Font Replacement
    • Broken links

At the time, Documentum threw some engineering effort into AutoRender to address some of the shortcomings.  One of the changes was to have AutoRender reboot itself (not really a fix, but it did reduce the manual monitoring).  As with other Documentum products, TSG is occasionally asked for alternatives.  This post will address some of the tools we use in non-Documentum environments that could easily be adapted to the PDF rendition needs for Documentum.

Adobe LiveCycle

For a couple of our non-Documentum customers, we have leveraged the Adobe LiveCycle component PDF Generator. We have been very impressed with its reliability and functionality. Considering Adobe created the best known implementation of Portable Document Format, it makes sense to rely on Adobe technology to convert your native content.

Documentum Full Text Search with Lucene – Honoring ACL Security

The last post discussed the results of an HPI Lucene Search test compared to a Webtop FAST Search as part of a proof of concept for a client looking to provide a consumer interface.  As we have often mentioned on this forum, we continually see clients looking for a better search interface than Webtop, as well as some content cached outside of Documentum for business continuity, performance, and licensing.

One accurate comment raised in response to that post was that our comparison of HPI/Lucene against a Webtop/FAST search wasn’t really comparing apples to apples, as the Webtop search was running against Documentum with security, while the Lucene search was not.  While the client’s goal was to show the benefits of the cached repository and Lucene against Documentum, many Documentum users would like to know how Lucene would perform directly against a Documentum repository (as with the upcoming DSS).

For this post, we will discuss TSG’s strategy and initial proof of concept results in leveraging Lucene for a Documentum full text search engine.

Documentum Search – Lucene versus FAST

As mentioned in a previous article, many clients are moving away from FAST in preparation for Documentum Search Services (DSS), which is slated for release in June and leverages the open source product Apache Lucene.  This post will share the results from one client that executed a proof of concept test to compare the two search engines.

Proof of Concept Approach – As we have mentioned before, many clients have decided to implement an external cache outside of Documentum to address business continuity, performance and licensing issues.  For a large pharmaceutical client, TSG was tasked with performing a proof of concept on 156,000 documents in an external data source indexed by Lucene.  The proof of concept would compare the search results and response times of FAST within Documentum (Webtop) against those of Lucene (HPI) outside of Documentum.  The proof of concept additionally evaluated leveraging Lucene for metadata storage rather than storing metadata in another database such as Oracle.

POC Findings – Lucene/HPI with the external repository was found to be considerably quicker than the existing FAST/Webtop implementation on most queries.

Specific results:

Query            FAST/Webtop    Lucene/HPI
1200 Results     90 seconds     3 seconds
8 Results        5 seconds      3 seconds
10 Results       8 seconds      4 seconds
76 Results       10 seconds     5 seconds
5100 Results     72 seconds     5 seconds
65 Results       6 seconds      3 seconds
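
To make the gap easier to read, the timings above can be turned into speedup ratios. The following is a minimal sketch; the numbers are copied from the results above, while the names TIMINGS and speedups are ours and purely illustrative.

```python
# (result count, FAST/Webtop seconds, Lucene/HPI seconds), copied from the results above.
TIMINGS = [
    (1200, 90, 3),
    (8, 5, 3),
    (10, 8, 4),
    (76, 10, 5),
    (5100, 72, 5),
    (65, 6, 3),
]


def speedups(timings):
    """Return (result_count, fast_seconds / lucene_seconds) for each query."""
    return [(count, fast / lucene) for count, fast, lucene in timings]


if __name__ == "__main__":
    for count, ratio in speedups(TIMINGS):
        print(f"{count:>5} results: Lucene/HPI was {ratio:.1f}x faster")
```

The two largest result sets (1200 and 5100 results) show the biggest ratios, while the small result sets are closer to a 2x difference.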

A simple configuration of the Lucene index returned a more complete search result set than the standard FAST/Webtop configuration.  Examples included additional documents containing logical derivatives of the search words.  For example, a search for “exception report” could return “exceptions report” or “exception reports”.  The proof of concept data set also included German documents, and Lucene demonstrated multilingual stemming capability.
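
The idea behind stemming can be illustrated with a toy example. This sketch is not Lucene's algorithm and not the proof of concept's configuration; it only strips a trailing plural "s" so that the query and the document reduce to the same word forms. The function names are hypothetical.

```python
def naive_stem(word):
    """Toy stemmer: lowercase and strip a trailing plural 's' (NOT Lucene's algorithm)."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def stems(text):
    """Split text on whitespace and stem each word."""
    return [naive_stem(w) for w in text.split()]


def phrase_matches(query, title):
    """True if the stemmed query words appear as a consecutive run in the stemmed title."""
    q, t = stems(query), stems(title)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))


if __name__ == "__main__":
    for title in ("Monthly exceptions report", "Exception reports 2009"):
        print(title, "->", phrase_matches("exception report", title))
```

A real analyzer handles far more than plurals, and each language needs its own rules, which is why the German documents in the data set were a useful test.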

Key Stats – Lucene

  • 156,000 Documents – 31.6 Gigabytes
  • Total Index Space – 521 MB
  • Total Index Build Time – 10 hours – The client was very interested in the time it took to index the content and metadata in Lucene because they had experienced lengthy indexing times with FAST in their 5.3 upgrade. This was tracked as part of the proof of concept; however, the corresponding FAST data from the 5.3 upgrade is no longer available.
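
A quick back-of-the-envelope calculation puts these stats in perspective: roughly 15,600 documents indexed per hour, and an index of about 1.6% of the source content size. The sketch below assumes 1 Gigabyte = 1024 MB (the post does not say which convention was used) and uses variable names of our own choosing.

```python
# Figures copied from the key stats above.
DOCUMENTS = 156_000
CONTENT_GB = 31.6
INDEX_MB = 521
BUILD_HOURS = 10

# Assumption: 1 GB = 1024 MB. Using 1000 instead changes the ratio only slightly.
MB_PER_GB = 1024

docs_per_hour = DOCUMENTS / BUILD_HOURS
index_ratio = INDEX_MB / (CONTENT_GB * MB_PER_GB)
avg_doc_kb = CONTENT_GB * MB_PER_GB * 1024 / DOCUMENTS

if __name__ == "__main__":
    print(f"Indexing throughput: {docs_per_hour:,.0f} documents/hour")
    print(f"Index size relative to content: {index_ratio:.1%}")
    print(f"Average document size: {avg_doc_kb:.0f} KB")
```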

FAST and Lucene – Full Text Syntax Differences

  • FAST
    • “One Two” – will return documents with the exact phrase “One Two” in the document
    • One Two – will return documents with the words One OR Two in the document
    • One+Two – will return documents with the words One OR Two in the document
    • One and Two – will return documents with the words One AND Two in the document
  • Lucene – Based on the Proof of Concept’s configuration
    • “One Two” – will return documents with the exact phrase “One Two” in the document
    • One Two – will return documents with the words One AND Two in the document
    • One OR Two – will return documents with the words One OR Two in the document
    • One and Two – will return documents with the words One AND Two in the document
    • One+Two  – will return documents with the exact phrase “One Two” in the document
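
The differences above can be summarized as a small lookup of how each engine interprets the same query string. This is a sketch of the behavior listed in the bullets only; it does not call FAST or Lucene, it handles just the query forms listed above, and the function names interpret and matches are hypothetical.

```python
def interpret(query, engine):
    """Map a query to (mode, terms) following the bullets above.

    engine is "fast" or "lucene"; mode is "PHRASE", "AND", or "OR".
    Only the query forms listed in the bullets are handled.
    """
    q = query.strip()
    # Quoted text is an exact phrase in both engines (straight or curly quotes).
    if len(q) >= 2 and q[0] in '"“' and q[-1] in '"”':
        return "PHRASE", q[1:-1].split()
    # One+Two: OR in FAST, exact phrase in the proof of concept's Lucene configuration.
    if "+" in q:
        terms = [t for t in q.split("+") if t]
        return ("OR" if engine == "fast" else "PHRASE"), terms
    words = q.split()
    # "One and Two" means AND in both engines.
    if any(w.lower() == "and" for w in words):
        return "AND", [w for w in words if w.lower() != "and"]
    # "One OR Two" is only listed for Lucene.
    if engine == "lucene" and "OR" in words:
        return "OR", [w for w in words if w != "OR"]
    # Bare terms: FAST defaults to OR, Lucene (as configured) defaults to AND.
    return ("OR" if engine == "fast" else "AND"), words


def matches(query, engine, text):
    """Apply the interpreted query to a whitespace-tokenized, lowercased document."""
    mode, terms = interpret(query, engine)
    doc = text.lower().split()
    terms = [t.lower() for t in terms]
    if mode == "PHRASE":
        n = len(terms)
        return any(doc[i:i + n] == terms for i in range(len(doc) - n + 1))
    hits = [t in doc for t in terms]
    return all(hits) if mode == "AND" else any(hits)


if __name__ == "__main__":
    for engine in ("fast", "lucene"):
        print(engine, interpret("One Two", engine), interpret("One+Two", engine))
```

The practical consequence is that the same bare two-word query returns a broader set in FAST (either word) than in the Lucene configuration (both words), so users moving between the two interfaces may need a short note on syntax.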

Overall Thoughts

Overall, the client was very satisfied with the findings and is moving forward with the solution.  The flexibility of Lucene to index both the metadata and full-text values allowed the client to avoid adding an additional Oracle database to their external cache for attribute storage.  The client also liked the simpler, more intuitive search interface of HPI compared to the Webtop interface.

In addition to leveraging Lucene for searching an external cache, we are also working to leverage Lucene for internal Documentum/Webtop search.

If you have any questions or would like more detailed information, please contact us or comment below.

Thin Client Annotation tool uses Google Web Toolkit

As part of the design and development of our Thin-Client annotation tool, we decided to combine the Google Web Toolkit (GWT) and Flex as the main implementation technologies. GWT is a relatively new set of open source tools that allows you to create and maintain front-end JavaScript applications in Java. The front end code is written in Java and compiled by GWT into optimized JavaScript that works across all major browsers. Eliminating handwritten JavaScript can greatly simplify front-end coding. However, if the application needs to do something that GWT cannot, JavaScript can be inserted straight into the Java program. A number of widget libraries are available from Google and third parties, but if none suits a specific need (as was the case with our sticky notes), it is quite simple to create a custom widget.

Another advantage of GWT is that it allows debugging in a hosted mode browser, so most changes in the client side code can be viewed by simply refreshing the browser. Several plug-ins are available which allow GWT development in different development environments including Eclipse.

GWT’s layout concept can be the main learning curve for developers unfamiliar with GWT.  Most of the page layout is based on the placement of horizontal and vertical panels. All widgets in a horizontal panel will appear on the page lined up horizontally. If the developer wants one of the widgets to be below the rest, that widget will need to be placed in a vertical panel together with the previously mentioned horizontal panel. Once the layout concept is grasped, it quickly becomes quite intuitive, but it was frustrating at first. The only other issues we ran into came from the fact that GWT does so much internally (JavaScript compiling, RPC calls, etc.).  This can make debugging more difficult, since it can be hard to identify the actual root of a problem.
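
The nesting idea can be shown with a toy model. The classes below are plain Python stand-ins that borrow the names of GWT's HorizontalPanel and VerticalPanel; they are not GWT code, and the widget names and the layout helper are hypothetical. Each widget occupies one grid cell, and the sketch simply reports where each widget ends up.

```python
class Widget:
    """A leaf widget that occupies a single grid cell."""

    def __init__(self, name):
        self.name = name

    def size(self):
        return (1, 1)  # (width, height) in cells

    def place(self, x, y, out):
        out[self.name] = (x, y)


class HorizontalPanel:
    """Toy stand-in: lays children out left to right."""

    def __init__(self, *children):
        self.children = children

    def size(self):
        sizes = [c.size() for c in self.children]
        return (sum(w for w, _ in sizes), max((h for _, h in sizes), default=0))

    def place(self, x, y, out):
        for child in self.children:
            child.place(x, y, out)
            x += child.size()[0]


class VerticalPanel:
    """Toy stand-in: lays children out top to bottom."""

    def __init__(self, *children):
        self.children = children

    def size(self):
        sizes = [c.size() for c in self.children]
        return (max((w for w, _ in sizes), default=0), sum(h for _, h in sizes))

    def place(self, x, y, out):
        for child in self.children:
            child.place(x, y, out)
            y += child.size()[1]


def layout(root):
    """Return {widget name: (column, row)} for every widget under root."""
    out = {}
    root.place(0, 0, out)
    return out


if __name__ == "__main__":
    toolbar = HorizontalPanel(Widget("save"), Widget("cancel"), Widget("help"))
    # To put "notes" below the toolbar, wrap both in a vertical panel.
    page = VerticalPanel(toolbar, Widget("notes"))
    print(layout(page))
```

Here the three toolbar widgets share row 0, and the "notes" widget lands in row 1 only because it was wrapped in a vertical panel together with the toolbar.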

Overall, GWT was a great tool for front-end Java development. It eliminated a lot of time that would otherwise have been devoted to JavaScript development. There are some really helpful tutorials on the Google Code website (http://code.google.com/webtoolkit/tutorials/1.6/index.html) for anyone interested in learning more about development in GWT.

Here is a screenshot of our GWT Annotation tool interface. Be sure to check back often as we will be releasing our annotation demo shortly.

[Screenshot: Google Web Toolkit Annotation Interface]

Documentum Annotations – TSG View of the different solutions

With the upgrade to D6.5, many of our clients are reconsidering their annotation choices.  This blog post will address some of the annotation product choices based on our experience, as well as our internal development efforts on our Free Viewer Tool that is based on a thin client with Adobe Flex and support for viewing and basic annotation capabilities.

Definition – In this entry, “annotation” refers to a mark-up “layer” on top of the document.  Redline changes (like Word track changes) are embedded in the Word file and are not the focus of this entry.

Thick Client or Thin Client

One of the first decision points when choosing an annotation tool is between a thick or thin client.  Early annotation tools required a client-side component for client/server capabilities.  With browser-based annotation tools, annotations might rely on either a client-side plug-in or an applet.  For Documentum, client components are required for Brava (applet), Annodocs and Documentum Annotation Services (Adobe Acrobat).  Snowbound offers both a version that doesn’t require a client component and an applet-based version.  Our Free Viewer only requires Adobe Flash to be installed on the client.  With a thin client approach, an image of the page (not the entire file) is sent to the client.  This can be a substantial performance improvement when viewing large files.  The thin client approach also provides additional security, since the file itself is never passed to the client.

TSG Thoughts – We usually recommend the thin client to improve performance and security while reducing IT support costs, particularly when extending the application to outside third parties.

Native Document Annotations or PDF-only

One approach is to display the mark-up layer on top of any type of file format.  Snowbound and Brava both support this type of annotation.  Another approach is to turn everything into PDF and only allow mark-ups on top of the PDF.  This approach is required by Adobe and Annodocs, and is supported by Brava and Snowbound as well.

TSG Thoughts – Many of our clients have had difficulty with the native document approach, not through any fault of the vendor but due to the constantly evolving and backward compatible native file formats.  For our Free Viewer, we are only supporting PDF or TIFF.

Annotation Capabilities

With all annotation tools, the number of graphic options (circle, arrow, highlight, underscore, etc.) can confuse the user and blur the line between annotations and redlines.  Also, one major user complaint is that annotations can be buried on subsequent pages, so users have to flip through the document to find them.  Annotation tools should highlight or bookmark annotations when viewing the document so the user does not have to check every page.

TSG Thoughts – We lean toward simple annotations for basic markup to reduce training costs and markup/review time.

Upgrade/Changing Considerations

It is important to understand that annotation tools typically store their annotations in a proprietary format, which makes it difficult to change annotation tools.  When changing annotation tools, the existing annotations must be deleted or reformatted.

TSG Thoughts – For our Free Viewer, we have targeted Adobe’s new XFDF for mark-up to be compatible with Adobe as well as Documentum Annotation Services.
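
To give a feel for what an XFDF mark-up file looks like, here is a minimal sketch that builds one sticky-note style annotation with Python's standard library. It is not the Free Viewer's implementation. The element and attribute names (xfdf, annots, text, contents, page, rect, title) reflect our reading of the XFDF format and should be checked against the specification; the function build_xfdf and the keys of the note dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET

# XFDF namespace as we understand it; verify against the XFDF specification.
XFDF_NS = "http://ns.adobe.com/xfdf/"


def build_xfdf(notes):
    """Serialize a list of note dicts (hypothetical shape) into an XFDF string.

    Each note needs: page (int), rect (4 numbers), author (str), comment (str).
    """
    ET.register_namespace("", XFDF_NS)  # emit XFDF as the default namespace
    root = ET.Element(f"{{{XFDF_NS}}}xfdf")
    annots = ET.SubElement(root, f"{{{XFDF_NS}}}annots")
    for note in notes:
        text = ET.SubElement(
            annots,
            f"{{{XFDF_NS}}}text",
            {
                "page": str(note["page"]),
                "rect": ",".join(str(v) for v in note["rect"]),
                "title": note["author"],
            },
        )
        contents = ET.SubElement(text, f"{{{XFDF_NS}}}contents")
        contents.text = note["comment"]
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_xfdf([
        {"page": 0, "rect": (100, 700, 120, 720),
         "author": "reviewer", "comment": "Check this figure"},
    ]))
```

Because the annotations live in a separate XML file rather than inside the document, the underlying PDF is left untouched, which fits the mark-up "layer" definition used in this entry.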