May 5, 2014
The Ingestion box in the reference architecture is drawn as the smallest box, yet it is the component that integrates with all of the available data sources. This tends to be among the most complex and time-consuming tasks, but it is often relegated to a lower priority, which is a big mistake.
One needs to prioritize the data sources that generate the most value and ensure that their data can be ingested into the Big Data platform for the subsequent “cool” analytics.
In my experience, it is also extremely important to have a robust user interface for the ingestion section. Otherwise, a series of manual steps can lead to errors and to the ingestion of “bad” data, which will diminish the impact of any subsequent analytics.
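The validation idea above can be sketched as a simple gate at the ingestion boundary. This is a minimal illustration only; the field names and rules are assumptions for the sketch, not taken from any particular platform.

```python
# Minimal sketch of a validation gate at the ingestion boundary.
# Required field names here are illustrative assumptions.

def validate_record(record, required_fields=("timestamp", "source", "payload")):
    """Return (ok, reason); reject records that would pollute analytics."""
    for field in required_fields:
        if field not in record or record[field] in (None, ""):
            return False, f"missing or empty field: {field}"
    return True, "ok"

def ingest(records):
    """Split a batch into accepted rows and a quarantine list for review."""
    accepted, quarantined = [], []
    for rec in records:
        ok, reason = validate_record(rec)
        if ok:
            accepted.append(rec)
        else:
            quarantined.append((rec, reason))
    return accepted, quarantined
```

Quarantining bad records, rather than silently dropping them, gives operators a place to look when downstream numbers seem off.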
May 5, 2013
This article does an excellent job of detailing the business use cases where Hadoop is useful and where traditional database management systems might be more appropriate.
April 29, 2013
While users have access to many tools that assist in performing large-scale data analysis tasks, understanding the performance characteristics of their parallel computations, such as MapReduce jobs, remains difficult. Step #1 is to create a test suite that you can reliably run after every change.
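To make Step #1 concrete, here is a tiny, hypothetical word-count pipeline with a repeatable test. A real MapReduce job would run on a cluster; this local simulation just locks in the expected output so that logic regressions are caught before any performance tuning.

```python
# Local simulation of a MapReduce word count, plus a test to run
# after every change. All names here are illustrative.

from collections import defaultdict

def mapper(line):
    """Emit (word, 1) for each word, as a MapReduce mapper would."""
    for word in line.split():
        yield word.lower(), 1

def reducer(pairs):
    """Sum the counts per key, as a MapReduce reducer would."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

def word_count(lines):
    return reducer(pair for line in lines for pair in mapper(line))

def test_word_count():
    # A fixed input with a known answer: rerun after every change.
    assert word_count(["to be or", "not to be"]) == {
        "to": 2, "be": 2, "or": 1, "not": 1}
```

With the correctness baseline fixed, any change in runtime can be attributed to performance rather than altered semantics.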
April 12, 2013
Excellent blog by Guident – http://tinyurl.com/bne6t8k – comparing Hadoop with traditional High Performance Computing. A specific use case, reading large log files, is compared, and Hadoop is the winner in terms of performance.
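The log-reading use case typically boils down to a trivially parallel scan. A Hadoop Streaming-style mapper for it might look like the sketch below; the log format (level in the second field) is an assumption for illustration, not taken from the cited post.

```python
#!/usr/bin/env python
# Sketch of a Hadoop Streaming mapper for the log-scanning use case:
# emit a count of 1 for each log level seen. The assumed format is
# "<timestamp> <LEVEL> <message...>", which is hypothetical.

import sys

def map_log_levels(lines):
    """Yield (level, 1) pairs for recognized log levels."""
    for line in lines:
        parts = line.split()
        if len(parts) >= 2 and parts[1] in ("INFO", "WARN", "ERROR"):
            yield parts[1], 1

if __name__ == "__main__":
    # In Hadoop Streaming, key and value go to stdout, tab-separated.
    for level, one in map_log_levels(sys.stdin):
        print(f"{level}\t{one}")
```

Because each line is processed independently, Hadoop can split the file across many mappers, which is exactly why it does well on this workload.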
What happens if we have access to traditional HPC hardware? Should we run Hadoop on HPC? Check out an excellent article by S. Krishnan on this – http://tinyurl.com/cwbzvof . The results are not conclusive, but it is an interesting read.
Bottom line: it appears to depend on the use case. Has anyone done a more detailed comparison?