January 6, 2013
The organizational approach in enterprise content and records management systems has traditionally relied on document attribute data as the primary information source for content storage and retrieval. Attributes such as business unit, document type, expiration date, and functional area are common.
In the past few years, content and records management applications have often used a “faceted” taxonomy design to define how the content and/or records are classified.
Is this sufficient? It might be acceptable in some cases for content management, but falls short of expectations for knowledge management.
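To make the faceted idea concrete, here is a minimal sketch of facet-based retrieval over attribute data. The document records and facet names below are hypothetical examples, not any particular product's schema:

```python
# Sketch: faceted retrieval over document attribute data.
# All documents and facet names here are illustrative.

documents = [
    {"id": 1, "business_unit": "Legal", "doc_type": "Contract", "functional_area": "Sales"},
    {"id": 2, "business_unit": "HR", "doc_type": "Policy", "functional_area": "Benefits"},
    {"id": 3, "business_unit": "Legal", "doc_type": "Memo", "functional_area": "Compliance"},
]

def facet_search(docs, **facets):
    """Return documents matching every requested facet value."""
    return [d for d in docs if all(d.get(k) == v for k, v in facets.items())]

# Narrowing by two facets at once, the way a faceted UI would.
results = facet_search(documents, business_unit="Legal", doc_type="Contract")
```

The point of the sketch is that facets only intersect attribute values; they cannot surface anything the attributes don't already capture, which is exactly where this falls short for knowledge management.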
Large volumes of knowledge content are often well suited to auto-categorization. The tools and methods most commonly used for auto-categorization are text analytics and image analysis. These can be expensive to implement, which makes it challenging for small law offices to benefit from this technology, even though they too have large volumes of data to contend with.
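As a taste of what auto-categorization means at its simplest, here is a toy keyword-overlap categorizer. The categories and keyword lists are made up for illustration; commercial text-analytics tools use statistical models rather than hand-built keyword sets, which is a big part of their cost:

```python
# Toy auto-categorization by keyword overlap.
# Categories and keywords are illustrative assumptions, not a real taxonomy.

CATEGORY_KEYWORDS = {
    "litigation": {"plaintiff", "defendant", "motion", "court"},
    "real_estate": {"lease", "tenant", "property", "deed"},
}

def categorize(text):
    """Assign the category whose keyword set overlaps the text most."""
    tokens = set(text.lower().split())
    scores = {cat: len(tokens & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorized"
```

Even this crude approach shows the appeal: the category is derived from the content itself rather than from manually entered attributes.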
In coming blogs, I will talk about how to deliver inexpensive solutions to the problem.
December 31, 2012
Anywhere we turn, we read about the shortage of Data Scientists to help us make sense of Big Data. How do we resolve this bottleneck?
As an analogy, look at Content Management Systems. In the late 90s everybody wanted a website, and IT expertise was a bottleneck: every new piece of content had to be coded by an IT elite. We resolved the issue by abstracting the basic needs and making them easy for non-techies.
We need to do this again for Big Data. Industry is crying out for a solution.
December 9, 2012
Check out http://www.kiji.org/features/
With Kiji, we can use HBase as a real-time data persistence and serving layer for applications.
December 9, 2012
With the growing use of in-memory data grids in applications, the backing store where the data ultimately lives no longer needs to be fast. It does, however, need to be fault-tolerant and scalable. HDFS nicely fills this requirement.
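The pattern is easy to see in a toy write-through cache: a fast in-memory tier absorbs reads, while a slower, durable tier (the role HDFS plays in practice) guarantees the data survives. The class names here are my own illustration, not any grid product's API:

```python
# Illustration of the pattern: fast in-memory tier over a slow, durable store.

class DurableStore:
    """Stand-in for a slow but fault-tolerant store such as HDFS."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

class InMemoryGrid:
    """Fast cache that writes through to the durable backing store."""
    def __init__(self, backing):
        self._cache = {}
        self._backing = backing
    def put(self, key, value):
        self._cache[key] = value
        self._backing.put(key, value)   # durability comes from the slow tier
    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        value = self._backing.get(key)  # cache miss: fall back, then warm the cache
        if value is not None:
            self._cache[key] = value
        return value

store = DurableStore()
grid = InMemoryGrid(store)
grid.put("user:42", {"name": "Ada"})
```

Because every write reaches the durable tier, a fresh grid instance can rebuild its cache from the backing store after a failure, which is the whole point of the design.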
A good explanation of the details is at http://tinyurl.com/ag82suv
November 9, 2012
Batch processing jobs still not meeting user expectations after putting Hadoop in the mix? Here is a great article with a good analogy comparing scheduling bottlenecks to waiting in line while grocery shopping.
Corona divides the job tracker’s responsibilities in two. First, a new manager manages cluster resources and keeps an eye on what’s available in that cluster. At the same time, Corona creates a dedicated job tracker for each job, which means the job tracker no longer has to be tied to the cluster. With Corona, smaller jobs can be processed right on the requester’s own machine.
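The split described above can be sketched in a few lines. To be clear, this is not Corona's actual API; the class names, the task-count threshold, and the slot accounting are all assumptions made for illustration:

```python
# Sketch of the Corona-style split: one cluster-wide resource manager,
# with small jobs routed to the requester's own machine.
# Names and the "small job" cutoff are illustrative assumptions.

SMALL_JOB_TASKS = 10  # assumed cutoff below which a job runs locally

class ClusterManager:
    """Tracks available cluster resources (the new manager's role)."""
    def __init__(self, slots):
        self.slots = slots
    def allocate(self, n):
        granted = min(n, self.slots)
        self.slots -= granted
        return granted

def submit(job_tasks, cluster):
    """Route small jobs locally; large jobs get cluster slots via the manager."""
    if job_tasks <= SMALL_JOB_TASKS:
        return "local"
    granted = cluster.allocate(job_tasks)
    return f"cluster:{granted}"
```

The design choice to notice: because the resource manager only hands out slots and each job gets its own tracker, small jobs never wait behind a single overloaded job tracker.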
Will this help improve overall throughput? Looking forward to giving this a dry run.
October 2, 2012
Here are some of the competitors for managing Hadoop clusters