When people talk about Big Data, it is often unclear what the term actually covers. This article deals with the sources that Big Data (BD) systems might draw on for their operations.

The first question is where the data come from. This is difficult to answer in general, since each BD system is designed to fit the specific needs of a company and therefore uses different inputs. Consequently, it is better to speak about all potential sources, which can broadly be divided into two groups.

On one side, BD systems can integrate sources that are not openly available. Most commonly these are traditional data sources: CRM and ERP databases, various ledgers, point-of-sale data and everything else produced by a company, its business partners and its consumers. In addition, there is always the option of purchasing information that a third party has collected on a certain issue. It needs to be clear, however, that this is a borderline case with regard to privacy. In some parts of the world, particularly Europe, buying consumer data can cause a negative public reaction if revealed by the media. This issue is discussed in greater detail in later articles.

On the other side, BD systems might use information that is available via the internet. This includes a whole range of data: posts, articles, academic works, videos and pictures; sensor data such as weather readings; individual or aggregated search queries; and website visiting times and clickstream data from social media and website publishers, governmental agencies and other companies, to name only a few. Everybody who uses connected technologies produces data, from individuals to large-scale corporations and governments, which is why such an abundance of data is available. However, privacy concerns also apply to data in this category.

Several criteria can be used to categorize the data mentioned above; the most important one in the context of BD is structure. Although this was touched upon in the previous article, it needs to be recapitulated here, as it is of central importance for the value of BD. Traditionally, companies relied mainly on structured data in the form of ledgers and CRM and ERP databases. However, as the amount of information available online has grown, along with its potential usefulness, online data has received increasing attention from businesses around the world. The issue is that data from sources such as websites, social media or academic publications is normally semi-structured or completely unstructured, which makes it burdensome for standard systems to decipher the content. Consequently, new systems, and BD systems in particular, need to be capable of working with all three types of data: structured, semi-structured and unstructured. In the BD case, algorithms are specifically designed to make sense of the content and to evaluate its relevance for the question at hand. At this point the three dimensions of Volume, Velocity and Variety come into play in finding and making sense of the relevant data.
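To make the difference between the three types of data more concrete, here is a minimal Python sketch. All records, field names and the `mentions` helper are invented for illustration and do not come from any real system; the keyword match merely stands in for the far more elaborate algorithms a BD system would use.

```python
import csv
import io
import json
import re

# Hypothetical sample records, one per type of data.
structured = "customer_id,product,amount\n1001,headphones,59.90\n"
semi_structured = '{"user": "anna", "text": "Love my new headphones!", "tags": ["audio"]}'
unstructured = "Bought the headphones last week. Sound is great, but delivery was slow."

# Structured: a fixed schema, so every field can be addressed by column name.
rows = list(csv.DictReader(io.StringIO(structured)))
amount = float(rows[0]["amount"])

# Semi-structured: self-describing keys, but fields may be missing or nested.
post = json.loads(semi_structured)
post_text = post["text"]
post_tags = post.get("tags", [])

# Unstructured: no schema at all; meaning has to be extracted from raw text.
def mentions(text, keyword):
    """Return True if the keyword occurs as a whole word (case-insensitive)."""
    return re.search(r"\b" + re.escape(keyword) + r"\b", text, re.IGNORECASE) is not None

mentions_product = mentions(unstructured, "headphones")
```

The structured record can be queried directly, the semi-structured one needs defensive access to optional fields, and the unstructured text has to be interpreted before it yields anything usable.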

The pool of information on the internet grows at an exponential rate and provided 24 000 000 TB of new data per day in 2013. BD systems are currently the best option on the market for making sense of this overload by filtering the relevant and useful information out of the surrounding noise. Twitter, which averaged about 5700 tweets/second, and Facebook, with 3472 photo uploads/second, by themselves produced an enormous quantity of data (2013). However, BD systems can be designed to go further: they are capable not only of processing these data streams but also of analyzing their content in context in real time, thereby turning masses of data into understandable Smart Data that can readily be integrated into business operations.
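To put the 2013 figures into perspective, the per-second rates can be converted into daily totals. The following is only a back-of-the-envelope sketch: it assumes the quoted averages hold constant over a full day and uses decimal units (1 EB = 1 000 000 TB). The variable and function names are chosen for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

def per_day(rate_per_second):
    """Scale an average per-second rate to a full day."""
    return rate_per_second * SECONDS_PER_DAY

tweets_per_day = per_day(5700)   # Twitter, 2013 average
photos_per_day = per_day(3472)   # Facebook photo uploads, 2013 average

# 24 000 000 TB of new data per day, expressed in exabytes and per second.
new_data_eb_per_day = 24_000_000 / 1_000_000
new_data_tb_per_second = 24_000_000 / SECONDS_PER_DAY

print(f"Tweets per day:        {tweets_per_day:,}")
print(f"Photo uploads per day: {photos_per_day:,}")
print(f"New data per day:      {new_data_eb_per_day:.0f} EB")
print(f"New data per second:   {new_data_tb_per_second:.1f} TB")
```

Under these assumptions the figures correspond to roughly 490 million tweets and 300 million photo uploads per day, and to about 24 EB of new data per day overall.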

The next topic revolves first and foremost around the different ways in which BD provides utility, in order to give a clearer picture of its application and of how the data discussed in this article is put to use.


MIT Sloan Management Review, 2012 – How ‘Big Data’ is different
Forrester Research, Inc. 2013 – Evaluating Big Data Predictive Analytics Solutions
IDC, 2013 – Analysieren, Visualisieren, Vorrausschauen
DTAG – Factsheet Big Data


hifipanda.com edited by Andreas Hoth