What is the difference between big data, large data set, data stream and streaming data?
Author
Carlos Barge
Big data is large in volume, high in velocity, and varied in type. Traditional applications are not adequate for processing such data sets. Data streaming is the continuous, steady transfer of data at high speed. In a big data context, data streaming refers to processing real-time, often unstructured data as it arrives. Here is my understanding of each term.
Big Data: “Big data” is a business buzzword used to refer to applications and contexts that produce or consume large data sets.
Large Data Set: A good working definition of a “large data set” is this: if you try to process a small data set naively, it will still work. If you try to process a large data set naively, it will take orders of magnitude longer than acceptable (and possibly exhaust your computing resources as well). For instance, one of the basic concepts in “big data” is MapReduce, a model of parallel programming where you split your data set into smaller chunks, have separate jobs (“workers”) process those chunks, and then piece the results back together. If you have to do this to make your application run in acceptable time, you have a large data set. If one monolithic job can get through your data set in reasonable time, it is not that large. What counts as “reasonable” or “acceptable” time therefore depends on your application’s requirements.
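To make the split, process, and piece-together idea concrete, here is a minimal sketch of a word count written in the MapReduce style. It is only an illustration: the function names (`split_into_chunks`, `map_chunk`, `merge_counts`, `word_count`) are made up for this example, and a thread pool on one machine stands in for the separate workers that a real framework would spread across many machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(records, chunk_size):
    """Split the data set into smaller, independent chunks."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Map step: one worker counts the words in its own chunk only."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right


def word_count(records, chunk_size=2, workers=4):
    """Hypothetical driver: split, map the chunks concurrently, then reduce."""
    chunks = split_into_chunks(records, chunk_size)
    # Threads stand in for distributed workers; this is an assumption
    # made to keep the sketch runnable on a single machine.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    lines = ["big data is big", "data streams keep coming", "big data"]
    print(word_count(lines).most_common(2))
```

With a data set this small, the chunking is pure overhead, which is exactly the point of the definition above: the extra machinery only pays off once a single pass over everything is too slow.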
Streaming Data: “Streaming data” is data that keeps arriving even as you process it. This is the opposite of the “easier” approach, where you first wait until you have the whole data set (say, stored in a file or a database) and only then process it. A “data stream” is an abstraction for such a continuously flowing piece of data. For example, take an analytics application that listens to phone calls for a particular keyword. It is one thing to run this application over a bunch of prerecorded calls, and quite another to hook it up to the “data stream” of all calls that are going on right now. The latter approach can notify you “in real time” as someone says the magic word (say, so you can start listening in on the conversation), but it is much more prone to problems such as: What if the word comes through split across two chunks? What if the data is coming in faster than I can process it?
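The first of those problems, a keyword split across two chunks, can be sketched in a few lines. This is a simplified illustration that assumes the stream delivers text chunks (real call audio would first need speech recognition). The function name `detect_keyword` is hypothetical. The idea is to carry over the last few characters of everything seen so far, so that a keyword straddling a chunk boundary is still found.

```python
def detect_keyword(chunks, keyword):
    """Yield the index of every chunk in which an occurrence of keyword completes.

    `chunks` can be any iterable, including a generator that never ends,
    because only a short tail of the previous data is kept in memory.
    """
    if not keyword:
        raise ValueError("keyword must be non-empty")
    overlap = len(keyword) - 1  # most characters a split keyword can leave behind
    tail = ""
    for index, chunk in enumerate(chunks):
        window = tail + chunk
        # The tail is shorter than the keyword, so any match found here
        # must use at least one character from the current chunk.
        if keyword in window:
            yield index
        tail = window[-overlap:] if overlap > 0 else ""


if __name__ == "__main__":
    stream = ["hello ma", "gic world", "nothing", "magic"]
    print(list(detect_keyword(stream, "magic")))
```

A batch version could simply join all the chunks and search once. The streaming version has to answer while the data is still arriving, which is why it needs the carried-over tail.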