Key concepts of Big Data

What is Big Data?

  • The field of Big Data provides means and methods to analyze, systematically extract information from, and otherwise deal with data sets that are too large or complex to be handled by traditional data-processing application software.
  • Big data analytics aims to organize, extract, and analyze facts from volumes of data that are far too large to be checked manually by people using pen and paper.
  • US computer scientist and entrepreneur John R. Mashey is credited with popularizing the term Big Data in the 1990s.

Key Concepts

  • Big data was initially linked with three key concepts: volume, variety, and velocity.
  • The following 10 Vs are now the ones most commonly associated with Big Data.
  1. Velocity: The speed at which data is created and transferred to its destination.
  2. Volume: The quantity of data collected and stored.
  3. Variety: The different forms the data takes, both structured and unstructured.
  4. Variability: The dynamic, evolving behavior of the data source.
  5. Value: The business value derived from the data.
  6. Veracity: The quality or trustworthiness of the data.
  7. Validity: The accuracy or correctness of the data used to extract results in the form of information.
  8. Virality: The rate at which data is spread by one user and received by other users.
  9. Volatility: How long the data remains useful.
  10. Visualization: The representation of data to trigger a decision.
  • Eric Schmidt, the American software engineer, businessman, and Google CEO (2001-2011), described data volume in the era of data centers as follows: “There were 5 exabytes of information created from the start of modernization till 2003. Now in the era of data centers, big data & digital technologies – 5 exabytes of information are created every 2 days.”

Big Data Analytics Broad Description

  • Big data analytics covers capturing data, data storage, data analysis, search, visualization, querying and updating data, and using AI software for automated analysis.
  • Big data sets are analyzed to find correlations, historical trends, and unusual anomalies, and this information is then used to take corrective action.
  • Big data is now collected using Industry 4.0 technologies, including IoT sensors, smartphones, aerial drones (remote sensing), cameras, radio-frequency identification (RFID) sensors, and wireless sensor networks.
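
As a minimal sketch of the anomaly-finding step described above, the Python snippet below flags sensor readings that deviate strongly from the average. The function name `find_anomalies`, the sample temperature values, and the threshold of 2 standard deviations are all illustrative assumptions, not part of any specific analytics product.

```python
from statistics import mean, stdev

def find_anomalies(readings, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds the threshold."""
    mu = mean(readings)
    sigma = stdev(readings)
    if sigma == 0:
        return []  # all readings are identical, so nothing stands out
    return [(i, x) for i, x in enumerate(readings)
            if abs(x - mu) / sigma > threshold]

# Hypothetical temperature readings (degrees C) from a single IoT sensor
temperatures = [21.0, 21.2, 20.9, 21.1, 21.0, 35.5, 21.3, 20.8, 21.1, 21.0]
print(find_anomalies(temperatures))  # the 35.5 reading at index 5 is flagged
```

A real big data pipeline would run a check like this over streams from many sensors and would likely use a more robust method, but the idea of comparing each value against the typical behavior of the data is the same.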

Main Difference from Industry 3.0 Vis-à-Vis Real-Time Data Gathering

  • In Industry 3.0, data was gathered over cables. In Industry 4.0, data gathering has changed completely in the following ways:
  • Data is gathered by wired sensors as well as Wi-Fi sensors
  • Data can be delivered in real time to any data center in the world, at close to the speed of light
  • This data can be checked by big data analytics algorithms in the data center
  • AI software runs real-time diagnostics to self-correct or to suggest 2-3 options to the engineers for solving the problem


  • Big data has increased the demand for data management specialists so much that Software AG, Oracle Corporation, IBM, Microsoft, SAP, EMC, HP, and Dell have spent more than $15 billion on software firms specializing in data management and analytics.
  • This industry was growing at approximately 10 percent a year, about twice as fast as the software business as a whole. Developed economies increasingly use data-intensive technologies. There are 4.6 billion mobile phone subscriptions worldwide, and between 1 billion and 2 billion people access the internet.
  • One billion people worldwide entered the middle class between 1990 and 2005, which means more people became literate, which in turn led to information growth. The world’s capacity to exchange information through telecommunication networks was 281 petabytes in 1986, 471 petabytes in 1993, 2.2 exabytes in 2000, and 65 exabytes in 2007, and predictions put worldwide internet traffic at 667 exabytes annually by 2014.
  • According to one estimate, one-third of the information stored worldwide is in the form of alphanumeric text and still image data, which is the format most useful for most big data applications. This also shows the potential of as yet unused data (i.e., in the form of video and audio content).
  • While many vendors offer off-the-shelf solutions for big data, experts recommend developing in-house solutions custom-tailored to the company’s problem at hand if the company has sufficient technical capabilities.
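
To make the telecommunication figures above easier to compare, the following Python sketch computes the compound annual growth rate between each pair of quoted years. It assumes decimal units (1 exabyte = 1,000 petabytes); the function name `annual_growth_rate` is illustrative. The 2014 figure is left out because it is a traffic prediction rather than a capacity measurement.

```python
# Figures quoted above: year -> capacity in petabytes
# (assumption: 1 exabyte = 1,000 petabytes)
capacity_pb = {1986: 281, 1993: 471, 2000: 2_200, 2007: 65_000}

def annual_growth_rate(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    return (end_value / start_value) ** (1 / years) - 1

quoted_years = sorted(capacity_pb)
for start, end in zip(quoted_years, quoted_years[1:]):
    rate = annual_growth_rate(capacity_pb[start], capacity_pb[end], end - start)
    print(f"{start}-{end}: {rate:.1%} per year")
```

Under these assumptions the growth rate works out to roughly 8% per year for 1986-1993, about 25% for 1993-2000, and about 62% for 2000-2007, which shows how sharply the growth accelerated.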

How did Data storage start?

  • Data storage began with vinyl records storing songs in 1880. A vinyl record could hold 200 MB of data, but the data could not be rewritten, so it was for one-time use only.
  • The first magnetic tape data storage came in 1947. Data could be rewritten multiple times on the same magnetic tape, and the tape could store 60 MB of data.
  • The first hard disk drive, the IBM 305 RAMAC, arrived in 1956. It could store about 4 MB of data and weighed about 900 kg. The hard drive could write and rewrite data in real time.
  • The first solid-state disk drive arrived in 1991 with a 20 MB data storage capacity. This drive had no moving parts and used only electronic circuits to write and rewrite data many times.
  • The largest hard disk drive available today has a capacity of 16,000 GB, and the largest solid-state disk drive available has a capacity of 8,000 GB. Such a drive measures about 3.5 inches × 2 inches and weighs only about 500 grams (half a kilogram).
  • The 20 MB capacity of 1991 is only about 0.000125% of the 16,000 GB capacity of 2020, which amounts to an 800,000-fold increase in data storage capacity.
  • Because such huge data storage capacities are available today, high-speed internet, YouTube videos, Industry 4.0, AI, and machine learning technologies have become possible.
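
The comparison between the 1991 and 2020 drive capacities can be checked with a few lines of Python. This is a small worked calculation, assuming decimal units (1 GB = 1,000 MB), as drive manufacturers typically use; the variable names are illustrative.

```python
MB_PER_GB = 1_000  # assumption: decimal units, 1 GB = 1,000 MB

ssd_1991_mb = 20                     # first solid-state drive, 1991
hdd_2020_mb = 16_000 * MB_PER_GB     # largest hard disk drive, 2020

ratio = ssd_1991_mb / hdd_2020_mb    # 1991 capacity as a fraction of 2020 capacity
growth = hdd_2020_mb / ssd_1991_mb   # how many times larger the 2020 drive is

print(f"1991 capacity as a share of 2020 capacity: {ratio:.6%}")
print(f"Growth factor: {growth:,.0f}x")
```

With these inputs the 1991 drive holds 0.000125% of the 2020 drive's capacity, a growth factor of 800,000.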

Google Data Centers

Google has 15 data centers globally, with the following features:

  • Each data center uses 200 MW of electrical power.
  • Every data center covers 500 acres of buildings.
  • Very large air-conditioning installations provide cooling, as servers produce a lot of heat.
  • Chillers for air conditioning, cooling towers, heat exchangers, water pumps, and RO plants are installed, and all of this equipment is connected to Google’s own machine learning systems to optimize the use of these utilities.
  • UPS systems of 20-50 MW provide backup electrical power.

Facebook & Big Data Technology

  • Facebook already uses the most advanced technologies, namely machine learning, AI, advanced software algorithms, and Industry 4.0 technologies.
  • Facebook has 12 data centers globally, which are as big as Google’s data centers.
  • Facebook’s machine learning software operates at very high speed (terabits per second) and “learns” the background of every user in the world, including their age, profession, likes on Facebook, city of residence, types of friends, and types of pages they follow.
  • Facebook even “knows” which city each user is accessing Facebook from and, using big data analytics, knows which person lives in which area and which person travels to foreign countries.
  • Within 3 months, Facebook’s AI software has learned the basic habits of every person, including likes and dislikes, types and backgrounds of friends, pages each person follows, and the chats and chat subjects of each person.
  • Based on this massive data, Facebook’s big data analytics and machine learning software gives tailor-made suggestions designed for each individual separately, including types of friends, new pages, and products.
  • Such technology is impossible to deploy using manual sheets of paper, telephone lines, WhatsApp, SMS, or Excel worksheets.
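
To give a feel for how individualized suggestions can be derived from what users follow, here is a toy Python sketch. It is not Facebook’s actual method, which is not described in this article; it is a simple similarity-based recommender swapped in for illustration. The user names, page names, and the functions `jaccard` and `suggest_pages` are all hypothetical.

```python
def jaccard(a, b):
    """Overlap between two sets: size of intersection / size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def suggest_pages(user, follows, top_n=2):
    """Suggest pages followed by similar users that `user` does not follow yet."""
    own = follows[user]
    scores = {}
    for other, pages in follows.items():
        if other == user:
            continue
        similarity = jaccard(own, pages)
        if similarity == 0:
            continue  # users with nothing in common contribute no suggestions
        for page in pages - own:
            scores[page] = scores.get(page, 0.0) + similarity
    # Highest score first; ties broken alphabetically for a stable result
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return ranked[:top_n]

# Hypothetical data: user -> set of followed pages
follows = {
    "asha": {"cricket", "cooking", "travel"},
    "ben": {"cricket", "cooking", "photography"},
    "chen": {"gardening", "travel"},
    "dina": {"chess"},
}
print(suggest_pages("asha", follows))  # ['photography', 'gardening']
```

Here "photography" ranks first because the user who follows it shares two pages with "asha", while "gardening" comes from a user who shares only one. A production system would work with billions of records and far richer signals, which is why it depends on data-center-scale storage and computing.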