When it comes to Hadoop, everyone these days has a few stories to tell, and that of course includes me. Let us start with some history. Two years ago, Me: This is the right time to jump on the Hadoop bandwagon and create some offerings. Stakeholders: What is this Hadoop? Me: Usual 15-minute spiel on the evolution of Hadoop. Stakeholders: Nah. Too cutting edge. Only for the Yahoos and Googles of the world. Cannot be used by traditional IT shops and businesses. Case closed, move on. A year ago, Client: We have this monstrous Teradata warehouse where we are parking all the...
One of the questions I am asked most often by customers and business associates is "How can I apply Big Data?" Folks ask this not because they do not know where it can be applied, but because they are looking for a way out of the analysis-paralysis cycle: a shoulder to lean on, someone who will lend a willing ear and help them chart a meaningful course through the plethora of domains where Big Data can potentially be applied successfully. Having worked with customers for over two years on Big Data initiatives, I think I have...
Big Data is undoubtedly the latest trend in IT. From enterprises to startups, everyone is keenly following this space. The interesting aspect of Big Data is its ability to create new opportunities in the ecosystem. In the last few years, many startups have sprung up with Big Data-focused products, and many of them are successful. The traditional players and biggies like IBM, Oracle, Microsoft and EMC have opted to acquire a stack from some of these startups rather than build their Big Data offerings in-house. That’s an endorsement for these startups whose focus has been primarily...
Let me start by setting the context: who is a Mumbai Dabbawala? “Dabba” literally means a box, and a Dabbawala is a person who carries the box. Every day, thousands of Mumbaikars (a colloquial term for the residents of Mumbai, the financial capital of India) rely on the Dabbawalas to deliver lunch boxes of homemade food to their workplaces. Given the increased cost of living in India and the reluctance to eat junk food for everyday meals, many households depend on the network of Dabbawalas. Mumbai is one of the largest cities in the world and an average working...
People always look for convenience! In the early 20th century, the retail industry was still in its infancy, taking baby steps across Europe and North America. But the latter half of the 20th century saw the emergence of the hypermarket and the supermarket, which truly simplified the one-stop shopping experience. The retail industry today is big business and will remain so for the foreseeable future. Recent estimates put worldwide retail sales at USD 7.5 trillion. Wal-Mart has been the leader on the global stage since its inception. The world’s top 5 retailers are Wal-Mart (USA),...
Having introduced various components of the Hadoop ecosystem in part 1 and part 2, the last part of this series covers Hive, HBase, Mahout, Sqoop and ZooKeeper. Hive Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query and analysis. Using Hadoop was not easy for end users, especially those who were not familiar with the MapReduce framework. End users had to write map/reduce programs for simple tasks like getting raw counts or averages. Hive was created to make it possible for analysts with strong SQL skills (but meager Java programming skills)...
In part 1 of this series, we introduced the Hadoop ecosystem. We will cover HDFS and MapReduce in this part. HDFS When a dataset outgrows the storage capacity of a single physical machine, it becomes necessary to partition it across a number of separate machines. Filesystems that manage storage across a network of machines are called distributed filesystems. HDFS is designed for storing very large files with write-once, read-many-times access patterns, running on clusters of commodity hardware. HDFS is not a good fit for low-latency data access, for large numbers of small files, or for modifications...
We live in the data age! The Web has been growing rapidly in size and scale over the last 10 years and shows no signs of slowing down. Statistics show that with every passing year, more data is generated than in all the previous years combined. Moore’s law holds true not only for hardware but also for the data being generated. Without wasting time coining a new phrase for such vast amounts of data, the computing industry decided to just call it, plain and simple, Big Data. More than structured information stored neatly in rows and columns, Big Data actually comes in complex, unstructured formats,...
PALO ALTO, CA–(Marketwire – Feb 29, 2012) – VMware, Inc. (NYSE: VMW), the global leader in virtualization and cloud infrastructure, today announced the availability of Spring Hadoop, the latest addition to the Spring Data family of projects. Spring Hadoop integrates the Spring Framework and the Apache Hadoop platform to make it easy for enterprise developers to build distributed processing solutions with Apache Hadoop. “VMware is committed to helping developers build, deploy, manage and scale the new wave of data-driven applications,” said Adrian Colyer, CTO Cloud...





