After some initial playing around with my home-grown Hadoop solution, I decided to try out Microsoft’s free-to-download HDInsight solution – for starters in a virtual machine. But before using it, it first has to be installed.
Database installations always have something ‘heavy’ about them, SQL Server being no exception: advanced install ‘wizards’, multiple reboots, multiple runs (for different instances), configuration tools and system compatibility checks. My home-grown Hadoop solution was even worse: on my Ubuntu-based installation I even had to compile Hadoop. That was quite a pain (ever tried to install the right version of Oracle Java? There isn’t even a repository available. At least not for my version of Ubuntu. And of course there is one, just not at the moment I try to install it). How surprised I was by the ease of installing the default single-node installation of HDInsight! Microsoft’s solution is actually easy: after setting up Windows Server 2008 R2 and performing every possible update, just go to this website, from where the Web Platform Installer takes care of everything else.
Or, I should say, seems to take care of everything else :-). Everything looks fine up to this point, but clicking on ‘Continue’ opens the next screen, which should have shown the HDInsight Dashboard. I guess the screen being served up to me was not the HDInsight Dashboard:
The error explanation reads:
‘The application has failed to start because its side-by-side configuration is incorrect. Please see the application event log or use the command-line sxstrace.exe tool for more detail. (Exception from HRESULT: 0x800736B1)’
A Google search for HDInsight errors with this error message doesn’t yield any results. However, the Windows Application Event Log contains the following information about this event:
‘Activation context generation failed for "c:\hadoop\websites\HadoopDashboard\bin\sqlceme40.dll". Dependent Assembly Microsoft.VC90.CRT,processorArchitecture="amd64",publicKeyToken="1fc8b3b9a1e18e3b",type="win32",version="9.0.30729.4148" could not be found. Please use sxstrace.exe for detailed diagnosis.’
That makes sense. HDInsight needs version 9.0.30729.4148 of the Visual C++ Redistributable, which is the “Microsoft Visual C++ 2008 Service Pack 1 Redistributable Package ATL Security Update”. Further investigation shows that this redistributable is indeed not installed on my virtual machine.
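If you run into a similar side-by-side error with a different DLL, the event log message itself tells you which assembly and version are missing. As a rough sketch (the function name and regular expressions below are my own, not part of any Microsoft tooling), the message can be picked apart like this:

```python
import re

# Matches the "Dependent Assembly <name>,<attributes> could not be found" clause
# of a side-by-side (SxS) activation error as it appears in the event log.
ASSEMBLY_RE = re.compile(
    r"Dependent Assembly\s+(?P<name>[^,]+),(?P<attrs>.*?)\s+could not be found"
)
# Matches individual key="value" pairs inside the attribute list.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_missing_assembly(message: str) -> dict:
    """Return the identity of the missing assembly, or {} if none is found.

    Hypothetical helper for illustration only.
    """
    # Copy-pasted messages often carry typographic quotes; normalise them first.
    for fancy_quote in "\u201c\u201d\u2033":
        message = message.replace(fancy_quote, '"')

    match = ASSEMBLY_RE.search(message)
    if match is None:
        return {}

    info = {"name": match.group("name").strip()}
    info.update(dict(ATTR_RE.findall(match.group("attrs"))))
    return info


if __name__ == "__main__":
    event_text = (
        r'Activation context generation failed for '
        r'"c:\hadoop\websites\HadoopDashboard\bin\sqlceme40.dll". '
        r'Dependent Assembly Microsoft.VC90.CRT,processorArchitecture="amd64",'
        r'publicKeyToken="1fc8b3b9a1e18e3b",type="win32",'
        r'version="9.0.30729.4148" could not be found. '
        r'Please use sxstrace.exe for detailed diagnosis.'
    )
    info = parse_missing_assembly(event_text)
    print(info["name"], info["version"])  # Microsoft.VC90.CRT 9.0.30729.4148
```

The name and version are exactly what you need to look up the right redistributable package.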
Installing the redist does the trick: browsing to http://localhost:8085/ now brings up the desired configuration screen, which is rather nice.
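For those who prefer a quick check from a script over opening a browser, here is a minimal sketch that tests whether something answers on the dashboard URL. It only uses the Python standard library; the function name is my own invention, and port 8085 is simply the one my installation used.

```python
import urllib.error
import urllib.request


def dashboard_is_up(url: str = "http://localhost:8085/", timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a success or redirect status.

    Hypothetical helper; it says nothing about whether the page is healthy,
    only that a web server responds without an HTTP error.
    """
    # Bypass any system-wide proxy settings: the dashboard runs locally.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status such as 500.
        return False


if __name__ == "__main__":
    print("Dashboard reachable:", dashboard_is_up())
```

Note that a failing dashboard like the one I ran into may still answer with an error page, so a False here only tells you that something is wrong, not what.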
After that, it’s easy to dive straight into HDInsight using the HDInsight Jumpstart Guide, which walks you through bulk importing data, running MapReduce jobs, creating Hive tables, using Sqoop and Flume, and combining the different Microsoft BI tools with Hadoop.
Bottom line: installing HDInsight is easy. Getting started with HDInsight is easy too. So, if you’re into Data Warehousing, Business Intelligence, databases in general or just curious about Hadoop, give it a try! (Did I already mention it’s free?)