Big Data as a Technology

According to Pat Gelsinger, then CEO of VMware, “Data is the new science. Big Data holds the answers.” The statement is a clear indication of how central data has become to our world. For critical business decisions in areas such as marketing and retail, we used to rely on professionals, and in particular on their experience of the many problems they had faced and solved. Over time they had subconsciously trained their minds to make certain decisions.

Times have changed, and data-driven decision making is now used to reach more precise decisions that minimize human error and maximize efficiency in these industries. This approach requires knowing how much data there is to work with. By some widely quoted estimates, a single cross-country flight can generate hundreds of terabytes of data. Surprising as that figure is, the airline industry is not alone: many other industries generate similarly huge amounts of data.

Big Data is an evolving term that describes any large volume of structured, semi-structured, or unstructured data that can be mined for information. Extracting useful information from such volumes requires specific techniques that are robust, scalable, simple, and easily accessible. Hadoop is one such framework. It is built on HDFS (Hadoop Distributed File System) and combines parallel programming with a distributed file system architecture to manage large amounts of data on commodity servers. Techniques like these are essential for mining crucial information.
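Hadoop's parallel programming model is MapReduce. As a rough illustration of the idea, the sketch below counts words in plain Python: each chunk of text is mapped to (word, 1) pairs, the pairs are grouped by word, and the groups are reduced to totals. The function names and sample chunks are made up for this example and are not part of Hadoop's API; on a real cluster the map calls would run on different machines rather than one after another.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the emitted values by key, the step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(chunks):
    # Here the chunks are processed sequentially; a framework such as
    # Hadoop would hand each chunk to a separate worker.
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(pairs))


# Hypothetical input: two chunks standing in for pieces of a large file.
chunks = ["big data has the answers", "data is the new science"]
counts = word_count(chunks)
```

Because each chunk is mapped independently, adding more machines lets more chunks be processed at the same time, which is what makes the approach scale.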

HDFS stores files in blocks. The blocks are large, which keeps seek time small relative to transfer time, and they are spread across the servers in the cluster. Copies of each block are also stored on different servers, which protects the information against hardware failure and makes the system more robust. The metadata for these blocks is stored on the primary node, called the NameNode, while the actual data (in the form of blocks) is stored on DataNodes distributed across the cluster. The NameNode acts as master to all DataNodes, an arrangement known as a master-slave architecture.

“If you torture the data long enough, it will confess.” This quote, usually attributed to the economist Ronald Coase, sums up the points made in the previous paragraphs. It is no wonder that IT professionals treat Big Data as a hot technology, and it is likely to remain one for many years to come.
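The block storage scheme described above can be mimicked with a toy model. This is only a sketch under simplifying assumptions: the class ToyCluster, its methods, and the node names are invented for illustration, blocks are 128 bytes rather than HDFS's default 128 MB, and replicas are placed round-robin, whereas real HDFS uses a rack-aware placement policy.

```python
BLOCK_SIZE = 128   # bytes here for illustration; HDFS defaults to 128 MB
REPLICATION = 3    # matches HDFS's default replication factor


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class ToyCluster:
    """Hypothetical stand-in for a NameNode plus its DataNodes."""

    def __init__(self, node_names, replication=REPLICATION):
        if replication > len(node_names):
            raise ValueError("need at least as many DataNodes as replicas")
        self.replication = replication
        self.node_names = list(node_names)
        # DataNode side: each node holds the raw bytes of its blocks.
        self.data_nodes = {name: {} for name in self.node_names}
        # NameNode side: metadata only, filename -> [(block_id, [nodes])].
        self.metadata = {}

    def write(self, filename, data):
        entries = []
        for index, block in enumerate(split_into_blocks(data)):
            block_id = f"{filename}#{index}"
            # Simplified round-robin placement on distinct nodes.
            targets = [
                self.node_names[(index + r) % len(self.node_names)]
                for r in range(self.replication)
            ]
            for node in targets:
                self.data_nodes[node][block_id] = block
            entries.append((block_id, targets))
        self.metadata[filename] = entries

    def read(self, filename, failed=()):
        """Reassemble a file, skipping any nodes listed as failed."""
        parts = []
        for block_id, nodes in self.metadata[filename]:
            live = [node for node in nodes if node not in failed]
            if not live:
                raise IOError(f"all replicas of {block_id} are unavailable")
            parts.append(self.data_nodes[live[0]][block_id])
        return b"".join(parts)


cluster = ToyCluster(["dn1", "dn2", "dn3", "dn4"])
payload = bytes(range(256)) * 2            # 512 bytes, so four blocks
cluster.write("flight.log", payload)
recovered = cluster.read("flight.log", failed={"dn1"})
```

Even with dn1 marked as failed, the file is rebuilt from the surviving replicas, which is the robustness that block duplication is meant to provide.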