Enterprise Content Management – State of the Industry – 2015

TSG is holding our annual client briefing on June 1st and June 2nd. To kick off the meeting, I will be presenting a “State of the ECM Industry – 2015” overview. This post presents the highlights of that presentation.

Documentum, FileNet, Hyland, OpenText and other Legacy ECM Vendors fighting off disruptors for 20+ years

Documentum just celebrated its 25th birthday. FileNet, now part of IBM, is even older. Hyland, OpenText and others go back at least 20 years. Many of these vendors helped shape different components of the ECM community by capitalizing on disruptive trends in the industry.

  • FileNet/IBM 1980s – Image processing – Capitalizing on digital scanning, viewing and networked PCs.
  • OpenText and Documentum/EMC early 1990s – Enterprise Document Management – Capitalizing on image and office documents.
  • Interwoven late 1990s – Web Content Management – Capitalizing on the evolution and maintenance of websites.

Over those 20+ years, many disruptors have emerged around the edges of ECM. Some obvious disruptors and the trends they capitalized on include:

  • Alfresco and Nuxeo mid-2000s – disrupting with open source and a subscription (rather than purchase) model
  • SharePoint early-to-mid 2000s – disrupting with Microsoft’s install base
  • Dropbox and Box.net 2010+ – disrupting with cloud and mobile

As we prep our clients, we like to point out key trends that we think will shape the industry as they look to make technology decisions.  The remainder of this post will discuss our thoughts on trends that will shape ECM over the next 5+ years.

Disrupting the ECM Suite – Disruptors with an eye toward SaaS

While these and other legacy ECM vendors fought it out for greenfield opportunities back in the 1990s and early 2000s, much of their recent growth has come from selling additional products and components to existing customers. Gartner has always rewarded market share and a suite approach in its annual “Magic Quadrant”. See our thoughts on Gartner from 2014.

OpenText has probably been the most prolific in buying additional products to increase sales to customers, most recently with its purchase of Brava. EMC/Documentum has also purchased multiple companies to complete its suite, including Captiva (document capture), X-Hive (XML database, 2007) and Document Sciences (publishing), along with the 2013 purchase of the consulting firm Sitrof.

We see fewer and fewer of our customers content with a proprietary suite of products, and more of them looking at disruptive alternatives with better functionality, cloud options and significantly disruptive pricing models. To illustrate, let’s look at the extended suite from Documentum and the alternatives/disruptors that exist.

  • Document Capture – Documentum Captiva – Captiva (acquired by Documentum) and Kofax were strong players back in the 2000s. Significant competitors like Ephesoft have emerged with better technology and a better pricing model.
  • Document Publishing – Documentum bought Document Sciences to address high-end publishing requirements. Fewer and fewer clients look for these capabilities to be available from the repository, choosing web-based software-as-a-service models instead.
  • Document Viewing and Navigation – Documentum bought D2 in 2011 to replace an aging Webtop, but customers have often chosen alternatives, including Cara as well as HPI.
  • Document Search – Documentum’s full-text search has been based over the years on Verity, then FAST, and now xPlore, which combines Lucene with xDB from the X-Hive purchase. Most of our clients are choosing an open source approach with Solr instead.
  • Document Sync and Share – Documentum bought Syncplicity, but most clients look to Box and Dropbox as cloud-based alternatives that already have traction within their companies.
  • Document Annotation – Documentum previously supported PDF annotation with PDF Annotation Services but, given changes to Adobe Acrobat, has dropped support for PDF Annotation Services. Clients have typically looked to Brava, recently acquired by OpenText, but we see continued strong demand for our open source, browser-based alternative, OpenAnnotate.

We anticipate that clients will continue to look for easier, best-of-breed product alternatives to suite components. During our briefing we will be illustrating the following alternatives to typical ECM approaches.

  • Ephesoft – for document and data capture.
  • DocuSign – for signatures of documents by external parties, rather than extending the existing repository.
  • Adlib – for document renditions. We have talked with Adlib about a cloud-based service that would move rendition processing out of the repository, allowing for easier scaling and less maintenance.
  • HotDocs – for dynamic document creation. Available on premises or as SaaS.

Disrupting the ECM Repository – Commodity Pricing and Components, Search Appliance and Big Data Disruptors

We see three major trends that could disrupt the core ECM repository over the next five years:

  1. Document Storage and Memory Prices – All of the legacy ECM products were designed when memory and storage were a significant cost. With prices shrinking, new approaches don’t need to use memory as sparingly or rely on high-priced document storage.
  2. CPU and parallel processing – Google has proved that a farm of commodity-priced CPUs with parallel processing capabilities can easily outperform a supercomputer for document retrieval.
  3. Search Appliance – In the 1990s and 2000s, most of our search tuning was done by adding indexes to the underlying database component of the ECM tool. With a search appliance approach such as Lucene/Solr, most searching (and tuning) now takes place in a separate search index, without any database tuning.
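Search appliances such as Lucene/Solr answer queries from an inverted index, a map from each term to the documents that contain it, rather than from indexes on the ECM database. The toy Python sketch below illustrates only that core idea; `tokenize`, `build_index` and `search` are hypothetical names, not Lucene or Solr APIs, and real engines add analyzers, relevance scoring and on-disk index segments that this omits.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a deliberately simple analyzer)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    "doc1": "Quarterly invoice for ACME Corp",
    "doc2": "ACME Corp contract renewal",
    "doc3": "Invoice template",
}
index = build_index(docs)
print(sorted(search(index, "acme invoice")))  # ['doc1']
```

No database index is touched at query time; "tuning" search in this model means changing how text is tokenized and indexed.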

Back in the early 1990s, FileNet built out a very proprietary approach with proprietary hardware and servers. Over time, the approach evolved to drop the proprietary hardware as the industry supported more UNIX and Windows NT servers, along with Windows on the desktop. The one component that hasn’t changed is Oracle as the database behind the repository.

We are predicting that Big Data approaches will disrupt the repository behind ECM just as they have disrupted the repository behind Business Intelligence. As we pointed out in an earlier post this year, we see Hadoop as the most likely candidate to disrupt ECM, including the relational database behind ECM systems.

Disrupting ECM – Why Hadoop?

Hadoop is an open-source software framework for distributed storage and distributed processing on clusters of commodity hardware.  Two core components provide some interesting capabilities for content management:

  • Hadoop Distributed File System (HDFS) – a distributed file-system that stores data on commodity hardware, providing very high aggregate bandwidth across the cluster.
  • Hadoop MapReduce – a programming model for large-scale data processing.
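To make the MapReduce programming model concrete, here is a minimal single-process Python sketch of the classic word-count job: a map step emits (word, 1) pairs, a shuffle groups them by key, and a reduce step sums each group. It illustrates the model only and is not Hadoop’s Java API; in Hadoop the framework runs many mappers and reducers in parallel across the cluster, which this sketch does not attempt.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Mapper: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)


def run_job(documents):
    """Run map, shuffle and reduce sequentially over a dict of documents."""
    mapped = chain.from_iterable(map_phase(d, t) for d, t in documents.items())
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


docs = {"a.txt": "contract invoice contract", "b.txt": "invoice memo"}
print(run_job(docs))  # {'contract': 2, 'invoice': 2, 'memo': 1}
```

Because each map call depends only on its own document and each reduce call only on its own key, both phases can be spread across machines, which is what lets Hadoop scale on commodity hardware.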

All of the modules in Hadoop are designed with a “clustered” assumption: failures of individual machines (or groups of machines) are automatically handled by the framework. Apache Hadoop’s MapReduce and HDFS components were originally derived from Google’s MapReduce and Google File System (GFS) papers, respectively.
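One way HDFS tolerates machine failures is by splitting files into blocks and storing each block on several nodes (the default replication factor is 3). The Python sketch below is a simplified, hypothetical model of that idea: it places replicas round-robin rather than using HDFS’s rack-aware placement policy, and `place_blocks` and `readable_blocks` are illustrative names, not Hadoop APIs.

```python
REPLICATION = 3  # HDFS's default replication factor


def place_blocks(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes (simple round-robin, not rack-aware)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block in range(num_blocks):
        start = block % len(nodes)
        placement[block] = [nodes[(start + i) % len(nodes)] for i in range(replication)]
    return placement


def readable_blocks(placement, failed):
    """Return the blocks that still have at least one replica on a live node."""
    return {b for b, replicas in placement.items() if any(n not in failed for n in replicas)}


nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks(num_blocks=5, nodes=nodes)
print(placement[0])  # ['node1', 'node2', 'node3']
print(len(readable_blocks(placement, failed={"node2"})))  # 5
print(len(readable_blocks(placement, failed={"node1", "node2", "node3"})))  # 3
```

With three replicas on distinct nodes, any two nodes can fail and every block remains readable; in a real cluster, the NameNode also re-replicates under-replicated blocks after a failure.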

Hadoop has benefited from heavy investment by both service providers and corporations. According to the Apache Hadoop Wiki, Facebook and Yahoo have used Hadoop extensively for petabyte installations. The Wiki also states that more than half of the Fortune 50 use Hadoop.

Hadoop, and approaches like it, offer three major advantages over the traditional ECM repository.

  • Parallel Processing – Hadoop is built to take advantage of commodity components and run parallel processing. One of the most difficult parts of an ECM system is setting up the clustered database (along with load balancing) behind a robust solution. Hadoop provides clustering and load balancing out of the box, with minimal cost and effort.
  • Not Only SQL (NoSQL) and Schema on Read – Users have always asked, “why does it take so much time to add one field to the interface?” Traditional SQL databases require “Schema on Write”: the database system has to know the structure of the data, and therefore how it will be used, before it is written. NoSQL approaches, in supporting Big Data, allow data to be written to the repository easily, with the understanding that the accessing application will interpret its structure when it reads it (Schema on Read).
  • Storage of Data and Documents – Hadoop leverages a cluster/redundancy approach to duplicate files across servers. This capability could replace another component of the ECM architecture (and one near and dear to EMC), the Storage Area Network or SAN. It also significantly simplifies the two-step (database and SAN) backup approach for ECM.
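The difference between Schema on Write and Schema on Read can be sketched in a few lines of Python. The contract fields, the in-memory lists standing in for repositories, and the function names are all hypothetical; the point is only that the strict store rejects a record with a new field until its schema is changed, while the raw JSON store accepts it and leaves interpretation to the reader.

```python
import json

# Schema on write: the structure is fixed before any data is stored.
CONTRACT_SCHEMA = {"title": str, "vendor": str}


def write_with_schema(store, record):
    """Reject any record whose fields or types don't match the predefined schema."""
    if set(record) != set(CONTRACT_SCHEMA):
        raise ValueError(f"fields {sorted(record)} do not match schema")
    for field, ftype in CONTRACT_SCHEMA.items():
        if not isinstance(record[field], ftype):
            raise TypeError(f"{field} must be {ftype.__name__}")
    store.append(dict(record))


# Schema on read: store raw JSON as-is; each reader decides which fields it needs.
def write_raw(store, record):
    store.append(json.dumps(record))


def read_field(store, field, default=None):
    """Interpret stored documents at read time, tolerating missing fields."""
    return [json.loads(doc).get(field, default) for doc in store]


strict_store, raw_store = [], []
write_with_schema(strict_store, {"title": "MSA", "vendor": "ACME"})
try:
    # Adding a new "region" field requires changing the schema first.
    write_with_schema(strict_store, {"title": "SOW", "vendor": "ACME", "region": "EU"})
except ValueError as err:
    print("rejected:", err)

write_raw(raw_store, {"title": "MSA", "vendor": "ACME"})
write_raw(raw_store, {"title": "SOW", "vendor": "ACME", "region": "EU"})
print(read_field(raw_store, "region"))  # [None, 'EU']
```

The trade-off is that the strict store guarantees every record is well formed, while the raw store pushes that responsibility onto every application that reads the data.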

We have already seen Hadoop considered as a lightweight replacement for our traditional database approach to publishing content from an ECM repository.

ECM Vendors – Addressing the Disruptors

When it comes to addressing the suite and Big Data disruptors, each vendor will have its own challenges:

  • Alfresco – Probably the best positioned of any vendor, since Alfresco has focused on the repository and has not pursued a suite approach. With a consistent engineering group led by John Newton and a robust partner ecosystem developing components of the suite, we would expect Alfresco to address the Big Data disruptors within its existing database architecture sooner than the other legacy solutions.
  • Documentum – While the suite disruptors might covet Documentum’s install base, they would be put off by having to both partner with and compete against Documentum’s internal products. We would expect the suite disruptors to offer their own integrations in a light partnership model. Documentum may struggle to embrace Big Data components like Hadoop, as it has focused on Project Horizon, which leverages xDB from its 2007 X-Hive purchase. Most of the industry has moved beyond XML-based databases since then.
  • FileNet/IBM – IBM has always had multiple repository offerings. We anticipate that IBM will extend a Big Data offering to ECM. Again, we would expect component vendors to be interested in partnering with IBM but concerned about competing with it.
  • Microsoft – SharePoint has always had an “only Microsoft” back-end approach. We would not expect any traction with open source Big Data components. Microsoft does, however, have a vibrant partner community offering add-ons for the suite.
  • OpenText, Hyland, Oracle and others – We anticipate that most legacy vendors will struggle to replace the suite and database components of their offerings.

Summary

We anticipate that ECM component vendors, as well as components from the Big Data movement, will impact the ECM industry over the next few years. Some key developments include:

  • New component vendors (ex: Ephesoft) disrupting traditional document capture vendors (ex: Captiva) that are part of an overall suite of ECM products from one vendor.
  • A Hadoop-like file system combined with a search appliance (ex: Lucene/Solr) disrupting the relational database component of the typical ECM solution stack.

Let us know your thoughts below.