Path to Big Data Developer!
- tejdeep24

- Jul 8, 2020
Updated: Jul 8, 2020
In this blog I'm going to explain how to become a Big Data developer. Many people face a dilemma at the start of their big data learning path. Hopefully, after reading this blog, your doubts will be cleared up.

Pre-requisites - Java, Collections in Java, Java Coding Exercises
Stages of execution in Big Data
Data Storage - Store Structured, Semi-Structured and Unstructured data.
Data Integration - Integrate data from multiple sources.
Data Analysis - Analyze the data to gain insight into trends that inform business decisions.
Data Visualization - Present the information extracted during analysis graphically for easier understanding.
Data Product - Develop services and apps based on the data captured and stored and on the findings from analysis.
Note: In MapReduce (the processing framework of Hadoop), programs are natively written in Java, which is a drawback for data analysts. To overcome this drawback, Hive (at Facebook) and Pig (at Yahoo) were developed.
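To give a feel for the kind of Java that MapReduce asks for, here is a minimal sketch of a word count split into map, shuffle and reduce steps. It uses only the Java standard library rather than the Hadoop API, so the class and method names (`WordCountSketch`, `map`, `shuffle`, `reduce`, `wordCount`) are illustrative assumptions and not Hadoop classes.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the map -> shuffle -> reduce idea.
// Not the Hadoop API: all names here are hypothetical.
class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Run the three phases over a list of input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        shuffle(pairs).forEach((word, ones) -> counts.put(word, reduce(ones)));
        return counts;
    }

    public static void main(String[] args) {
        // prints {big=2, data=1, ideas=1}
        System.out.println(wordCount(List.of("big data", "big ideas")));
    }
}
```

Even for a task this small, the analyst has to think in terms of key/value pairs and phases, which is the friction that Hive and Pig aim to remove.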
Hive
A data warehouse solution built on top of Hadoop. It projects a table-like structure onto data already stored in HDFS and provides a SQL dialect (a particular variant of the language), HiveQL, to query that data.
It is suitable for data warehouse applications where static data is analyzed.
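The idea of projecting a table onto data that is already stored can be sketched in plain Java. This is only an analogy and not Hive itself: the column names, the sample records and the class `TableProjectionSketch` are assumptions made for illustration. The raw lines stay as they are, and the column layout is applied only when they are read.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of projecting a table onto raw text lines.
// Not Hive itself: the columns and sample data are hypothetical.
class TableProjectionSketch {

    // Column names applied when the data is read, not when it is written.
    static final String[] COLUMNS = {"id", "name", "city"};

    // Turn one comma-delimited line into a row keyed by column name.
    // Missing trailing fields become null.
    static Map<String, String> toRow(String line) {
        String[] fields = line.split(",", -1);
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < COLUMNS.length; i++) {
            row.put(COLUMNS[i], i < fields.length ? fields[i].trim() : null);
        }
        return row;
    }

    // Roughly what a "count rows per city" query does;
    // rows with no city value are skipped in this sketch.
    static Map<String, Long> countByCity(List<String> rawLines) {
        Map<String, Long> counts = new TreeMap<>();
        for (String line : rawLines) {
            String city = toRow(line).get("city");
            if (city != null) {
                counts.merge(city, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> rawLines = List.of("1,Asha,Pune", "2,Ravi,Delhi", "3,Meera,Pune");
        // prints {Delhi=1, Pune=2}
        System.out.println(countByCity(rawLines));
    }
}
```

In Hive the same grouping would be written as a short SQL-style query instead of a Java method, which is why it suits analysts.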
Pig
Pig is a platform for analyzing huge datasets. It consists of a data flow language called Pig Latin and a runtime engine that executes the data flows in parallel. It also has built-in functions for data transformations.
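A data flow is a chain of steps in which each step feeds the next: load, filter, group, aggregate. As a rough analogy only, the same shape can be sketched with Java streams. This is not Pig Latin and runs on a single machine; the record layout (`user,bytes`), the threshold and the class `DataFlowSketch` are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java analogy for a data flow: each step feeds the next.
// Not Pig Latin: the record layout and threshold are hypothetical.
class DataFlowSketch {

    // Input records look like "user,bytes". Returns total bytes per user,
    // counting only records of at least minBytes.
    static Map<String, Integer> totalBytesPerUser(List<String> records, int minBytes) {
        return records.stream()
                // load: split each record into fields
                .map(line -> line.split(","))
                // drop malformed records
                .filter(f -> f.length == 2)
                // filter: keep records at or above the threshold
                .filter(f -> Integer.parseInt(f[1].trim()) >= minBytes)
                // group by user and sum the bytes in each group
                .collect(Collectors.groupingBy(
                        f -> f[0].trim(),
                        TreeMap::new,
                        Collectors.summingInt(f -> Integer.parseInt(f[1].trim()))));
    }

    public static void main(String[] args) {
        List<String> records = List.of("ann,120", "bob,40", "ann,300", "bob,500");
        // prints {ann=420, bob=500}
        System.out.println(totalBytesPerUser(records, 100));
    }
}
```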
Sqoop
It is an integration tool used to migrate data from external or internal SQL systems to HDFS and vice versa.
HBase
A NoSQL database.
Open-source distributed software. It is modeled on Bigtable, Google's distributed database.
Not suitable for storing large files, such as files of more than 2 GB.
Oozie
A workflow scheduler for Hadoop jobs.
An automation framework that integrates the tools of the Hadoop ecosystem.
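Scheduling jobs across several tools mostly comes down to running each job only after the jobs it depends on have finished. The following is a minimal plain-Java sketch of that ordering step, not the Oozie API: the job names, the dependency map and the class `WorkflowSketch` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java illustration of ordering dependent jobs in a workflow.
// Not the Oozie API: job names and dependencies are hypothetical.
class WorkflowSketch {

    // Returns an order in which every job runs after the jobs it depends on.
    static List<String> executionOrder(Map<String, List<String>> dependsOn) {
        Set<String> done = new LinkedHashSet<>();
        for (String job : dependsOn.keySet()) {
            visit(job, dependsOn, done, new LinkedHashSet<>());
        }
        return new ArrayList<>(done);
    }

    // Depth-first walk: schedule a job's dependencies before the job itself.
    private static void visit(String job, Map<String, List<String>> dependsOn,
                              Set<String> done, Set<String> inProgress) {
        if (done.contains(job)) {
            return;
        }
        if (!inProgress.add(job)) {
            throw new IllegalStateException("Cyclic dependency at " + job);
        }
        for (String dep : dependsOn.getOrDefault(job, List.of())) {
            visit(dep, dependsOn, done, inProgress);
        }
        inProgress.remove(job);
        done.add(job);
    }

    public static void main(String[] args) {
        Map<String, List<String>> dependsOn = new LinkedHashMap<>();
        dependsOn.put("hive-report", List.of("pig-clean"));
        dependsOn.put("pig-clean", List.of("sqoop-import"));
        dependsOn.put("sqoop-import", List.of());
        // prints [sqoop-import, pig-clean, hive-report]
        System.out.println(executionOrder(dependsOn));
    }
}
```

Here the import job runs first, then the clean-up step, then the report, regardless of the order in which the jobs were declared.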