Data Lake – Why should we use one?

Over the last few years, we have observed more growth in data than ever before. Many organizations see an opportunity in this big data and develop different strategies to monetize it. But the major challenge is: "Where should all this data be stored?"

We have data warehouses that store data according to the prescribed standards of the organization. That means that as data arrives, it may be staged, various cleaning and smoothing operations may be performed, and only then is it stored in the data warehouse. This raises a concern: what should be done with data that will not be required frequently, yet still consumes processing resources?

This is where the “Data Lake” can be introduced.

A Data Lake is a gigantic data repository where data is stored in its native form. It acts as a centralized repository where data coming from different sources is kept raw, without any cleaning or transformation, thereby preserving the data in its true form.

So why should one opt for Data Lake?

Over the past few years, massive amounts of data have been generated, and there is a need to address this explosion of data. Data Warehouses and Data Lakes are often compared, but a data warehouse consists of different components and stores data according to standards enforced during the data transformation processes. The data lake can be thought of as a system that sits before the data warehouse.

The term "Data Lake" was first coined by James Dixon, CTO of Pentaho, in 2010 to contrast it with a "Data Mart" or "Data Warehouse", which is a smaller repository of refined data extracted from the raw data.

He explained: “If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake and various users of the lake can come to examine, dive in, or take samples.”

Indeed, the Data Lake is not a replacement for the Data Warehouse; if designed right, it can complement your existing Data Warehouse and the two can work effectively together. The best part of this integration is that all formats of data (structured, semi-structured, and unstructured) can be stored in one place.
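As a minimal sketch of the idea, landing data in a lake can be as simple as writing each record exactly as it arrives, partitioned by source and ingestion date, with no cleaning or schema enforcement applied (the paths, source names, and payload here are hypothetical, not a specific product's API):

```python
import json
import os
from datetime import date

def land_raw_record(lake_root, source, record):
    """Write an incoming record to the data lake untouched,
    partitioned by source system and ingestion date."""
    partition = os.path.join(lake_root, source, date.today().isoformat())
    os.makedirs(partition, exist_ok=True)
    path = os.path.join(partition, f"{record['id']}.json")
    with open(path, "w") as f:
        json.dump(record, f)  # raw form: no cleaning, no schema enforcement
    return path

# Structured, semi-structured, or unstructured payloads all land the same way.
p = land_raw_record("/tmp/lake", "crm", {"id": "42", "name": "Alice", "notes": "free text"})
```

Any later cleaning or transformation for the warehouse can then read from these raw files, leaving the original data intact.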


Data Science – Understanding the Concept and Why It Is Important

Over the last decade, there has been massive growth in both the data generated and the data retained. This data is retained by companies as well as by individuals like you and me. We often call this "Big Data".

Nowadays, the term "Data Science" is gaining wide recognition. But what does a data scientist do? Data scientists are the people who make sense of all the big data and determine what can be done with it to increase productivity.

Let’s understand with an example:

Consider visiting a candy shop. Generally, a person picks only the candies he likes; data scientists, in contrast, will get every flavor of candy and analyze them all, because they really need to know what each one tastes like. In short, the title "Data Scientist" encompasses different flavors of work. In my view, that is the major difference between a "Data Scientist" and a "Statistician", "Analyst", or "Engineer": a data scientist does a little of each of the tasks done by a statistician, an analyst, and an engineer.

To be more specific, a data scientist is one who does the following primary tasks:

  1. Data Cleaning
  2. Data Analysis
  3. Statistics
  4. Engineering

Let’s have a look at each of the tasks in brief:

  • Data Cleaning:

The data coming from different sources may contain a lot of noise, may be unformatted, and may not be directly useful for generating valuable insights. This task ensures that all the data is well formatted and conforms to a set of rules and standards.
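A tiny sketch of what this looks like in practice, using pandas on an invented table (the column names and rules are illustrative assumptions, not a fixed recipe): text is normalized, invalid values are coerced, and rows that fail the rules are dropped.

```python
import pandas as pd

# Toy incoming data (hypothetical): mixed formatting and missing values.
raw = pd.DataFrame({
    "name": ["  Alice ", "BOB", None],
    "age": ["29", "thirty", "41"],
})

clean = raw.copy()
clean["name"] = clean["name"].str.strip().str.title()        # normalize text
clean["age"] = pd.to_numeric(clean["age"], errors="coerce")  # invalid -> NaN
clean = clean.dropna()                                       # drop rows failing the rules
```

Real pipelines add many more rules, but the shape is the same: every column is forced to conform before the data moves on.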

  • Data Analysis:

In this task, many plots of the data are made in order to understand its patterns. Through this process, theories about the data's behavior are crafted in a way that is easy to communicate and easy to act on.
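Even before plotting, a grouped summary often surfaces the pattern. A minimal sketch on invented sales observations (the weekdays and numbers are made up for illustration):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily observations: (weekday, units sold).
observations = [
    ("Mon", 12), ("Mon", 14), ("Tue", 9),
    ("Tue", 11), ("Sat", 30), ("Sat", 34),
]

# Group by weekday and summarize: a basic exploratory step before plotting.
by_day = defaultdict(list)
for day, units in observations:
    by_day[day].append(units)

summary = {day: mean(units) for day, units in by_day.items()}
# The pattern (a weekend spike) is now easy to communicate and act on.
```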

  • Statistics:

A data scientist develops models by understanding data patterns through analysis, and builds strategies on top of the resulting statistics. But the most challenging aspect of this task is that no model or statistic is a permanent solution to the defined problem. Therefore, a lot of time is dedicated to this task: a data scientist may need to evaluate and revise existing models, as well as go back to the data and bring out new features that help build better models.
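As a toy illustration of this fit-and-evaluate loop (the data is synthetic, and a straight-line fit stands in for whatever model the problem actually needs), the evaluation step is kept explicit so it is cheap to re-run whenever the data changes:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 3.0 * x + 2.0 + rng.normal(0, 0.5, size=x.size)  # hypothetical signal + noise

# Fit a first model, then evaluate it; the model is never a permanent
# solution, so the evaluation metric is what we keep watching over time.
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)
rmse = float(np.sqrt(np.mean(residuals ** 2)))
```

If the RMSE drifts upward as new data arrives, that is the signal to go back, revisit the features, and refit.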

  • Engineering:

The tasks discussed above are just the tip of the iceberg. Even state-of-the-art data models for different applications do no one much good if their insights are not delivered to customers or users, and delivered consistently. This means building a sort of data product that can be used by people who are not data scientists. It can take many forms: chart visualizations, metrics on a dashboard, or an application.
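A minimal sketch of that last step, under the assumption that the "product" is a JSON payload a dashboard could render (the field names and the 0.9 alert threshold are invented for illustration):

```python
import json

def dashboard_payload(model_scores):
    """Turn raw model output into a consumer-friendly metrics payload,
    the kind a dashboard or application renders for non-data-scientists."""
    avg = sum(model_scores) / len(model_scores)
    return json.dumps({
        "average_score": round(avg, 2),
        "alerts": sum(1 for s in model_scores if s > 0.9),  # hypothetical threshold
        "n_scored": len(model_scores),
    })

payload = dashboard_payload([0.2, 0.95, 0.5, 0.99])
```

The point is the translation: model scores mean nothing to most users, while "2 alerts today" is something they can act on.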

Understanding all the above tasks of a data scientist, it also becomes clear that the long-term life cycle of a data science project may involve going back and re-analyzing the models whenever a new source of data arrives and needs to be incorporated.

Analyzing these traits and tasks of a data scientist, it can be concluded beyond doubt that data science and data scientists are of great importance to the growth of any organization, in an era of intense competition and a constant need to improve the organization's services.


Research Paper Published on IEEE Xplore Digital Library

Greetings of the day…!

The research paper titled "Sensor Data Computing as a Service in Internet of Things", which I presented at the International Conference on Colossal Data Analysis and Networking held in Indore, India in March 2016, was made available on the IEEE Xplore Digital Library on 19 September 2016. It indeed feels like the months of perseverance have paid off.

Link to the digital library.

Please review the document; your views and thoughts are welcome.


The details of the publication are as below:

Title: Sensor Data Computing as a Service in Internet of Things

Author: Karan N Tongay

DOI: 10.1109/CDAN.2016.7570963

Electronic ISBN: 978-1-5090-0669-4

The abstract can also be downloaded from the following link:

Download Abstract

Cognitive Computing – Understanding the Concept

In today's age of digital transformation, we use computers for high-end data processing and calculations. Most computers can capture, move, and store data.

But as of now, they cannot understand what the data means.

For example, computers are well ahead of us at processing applications, but they are not able to accomplish some of the most basic tasks, such as recognizing whether a glass contains water or milk, or telling a potato from a tomato in a basket of vegetables.

Cognitive Computing deals with bringing exactly this kind of intelligence to computers.

The term "Cognitive Computing" was brought into existence by IBM for machines that can interact and think like humans.

Consider banking systems, where fraud detection and security are the primary concerns. Cognitive computing can help banks identify risks and frauds before critical transactional decisions are taken.
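As a toy stand-in for the risk models such systems use (real fraud detection is far richer), a simple statistical check can already flag a transaction that deviates strongly from an account's usual behavior; the transaction amounts and threshold below are invented:

```python
from statistics import mean, stdev

def flag_suspicious(amounts, threshold=3.0):
    """Flag amounts whose z-score against the account's history exceeds
    the threshold: a minimal sketch of anomaly-based fraud screening."""
    mu, sigma = mean(amounts), stdev(amounts)
    return [a for a in amounts if sigma and abs(a - mu) / sigma > threshold]

history = [25, 30, 22, 27, 31, 26, 24, 29, 2500]  # hypothetical card activity
suspicious = flag_suspicious(history, threshold=2.0)
```

A cognitive system would go further, learning each customer's patterns and adapting its thresholds, but the underlying idea is the same: act on the anomaly before the transaction completes.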

Cognitive computing may include the following components:

1. NLP (Natural Language Processing): The ability to understand human speech and language as it is spoken.

2. Machine Learning: The ability of the computers to learn without being explicitly programmed.

3. Deep Learning: A branch of machine learning based on a set of algorithms that attempt to model high-level abstractions for recognizing patterns.

4. Emotional Intelligence: The ability to identify and manage the situations by understanding the emotions.

5. Big data processing and Analytics: The process of examining large data sets to uncover hidden patterns, unknown correlations, preferences and other useful information.
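To make the machine-learning component concrete, here is a toy nearest-neighbour classifier: its behavior comes from labeled examples rather than explicitly programmed rules for each case (the readings and labels are invented for illustration):

```python
from collections import Counter

def knn_predict(examples, point, k=3):
    """Classify a point by the majority label of its k nearest labeled
    examples: the decision rule is learned from data, not hand-written."""
    nearest = sorted(examples, key=lambda e: abs(e[0] - point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical labeled sensor readings: low values are "normal", high are "anomalous".
training = [(1.0, "normal"), (1.2, "normal"), (0.9, "normal"),
            (8.5, "anomalous"), (9.1, "anomalous"), (8.8, "anomalous")]

label = knn_predict(training, 8.9)
```

Adding more labeled examples changes the classifier's behavior without changing a line of its code, which is the essence of "learning without being explicitly programmed".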

Trending examples of cognitive computing are personal assistants such as Siri, Cortana, and Google Now, which are quite easy to use and quickly adapt to our spoken language.

Google DeepMind and IBM Watson are the leaders in the cognitive computing area.

Cognitive computing is not a standalone outcome; the role of big data and analytics is of primary importance. Big data provides the ability to store enormous amounts of data, analytics provides the ability to predict what is going to happen, and cognitive computing adds the ability to learn from further interactions and recommend the best actions.


Imagineering…! Engineering Imagination

Since the very beginning, imagination has been part of the progress of our lives. It is the power of imagination that has always led to the creation of remarkable structures and theories, whether in computer science, aviation technology, healthcare, or understanding people. It is imagination that has created the many possibilities for achieving milestones which once seemed difficult to realize.

Most of the time, unreachable imagination seems alluring, but such thoughts can be molded into practical existence with the proper application of knowledge and, yes, Engineering!

What if Engineering can be brought about with Imagination?

Setting aside the trivialities and coming to the discussion: I believe Engineering with Imagination should be the key to achieving higher accomplishments.

In short, Engineering + Imagination leads to “Imagineering”.

Very creative term, isn’t it?

Learning Engineering with only a minimal understanding of how to apply it in practice is not worth much. In fact, understanding comes with positive imagination. Thereby, not just Engineering but "Imagineering" can help achieve more than Engineering alone!

The term Imagineering is most respected and widely practiced at The Walt Disney Company, a corporation we are, of course, all closely familiar with!

