Path to Big Data Developer!

tejdeep24
Jul 8, 2020

Updated: Jul 8, 2020

In this blog I'm going to explain how to become a Big Data Developer. Many people are in a dilemma at the initial stage of their big data learning path. Hopefully, after reading this blog, your doubts will be cleared up.

Prerequisites - Java, Collections in Java, Java coding exercises

Stages of execution in Big Data

Data Storage - Store structured, semi-structured and unstructured data.
Data Integration - Integrate data from multiple sources.
Data Analysis - Analyze the data to gain insight into the trends that drive business decisions.
Data Visualization - Present the information extracted during analysis graphically so it is easier to understand.
Data Product - Develop services and apps based on the captured and stored data and on the findings from analysis.

Note: In MapReduce (the processing framework of Hadoop), programs are typically written in Java, which is a drawback for data analysts who are not Java programmers. To overcome this drawback, Hive (from Facebook) and Pig (from Yahoo) were developed.
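
To see why hand-written MapReduce jobs are tedious, here is a minimal sketch of the classic word-count job simulated in plain Java using only Collections. It assumes no Hadoop dependency: the class and method names are hypothetical, and the map, shuffle and reduce phases run in memory in a single JVM, whereas a real Hadoop job subclasses Hadoop's Mapper and Reducer classes and runs across a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-process simulation of the MapReduce word-count pattern.
// No Hadoop classes are used; names are hypothetical.
class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key. A TreeMap keeps the keys
    // sorted, mimicking how Hadoop sorts map output keys before the reduce phase.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs map over every line, then shuffle, then reduce for every key.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("big data is big", "data is everywhere");
        System.out.println(wordCount(lines)); // prints {big=2, data=2, everywhere=1, is=2}
    }
}
```

Even this toy version needs three separate phases of code; Hive and Pig let an analyst express the same kind of aggregation in a few lines of SQL-like or data flow code.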

Hive

A data warehouse solution built on top of Hadoop. It projects a table-like structure onto data already stored in HDFS and provides a SQL dialect (a variant of SQL) called HiveQL to query that data.
It is suitable for data warehouse applications where static data is analyzed.
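
A rough illustration of what "projecting a table onto data already in HDFS" means: the sketch below leaves the raw comma-separated lines untouched and applies a schema only when the query runs. The table name sales, its columns, the /data/sales path and the sample rows are all hypothetical, plain Java stands in for Hive's query engine, and the input is assumed to be well formed. The roughly equivalent HiveQL is shown in the comments.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrates the "schema on read" idea behind Hive in plain Java.
// Roughly equivalent HiveQL (hypothetical table and path):
//   CREATE EXTERNAL TABLE sales (region STRING, amount INT)
//   ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
//   LOCATION '/data/sales';
//   SELECT region, SUM(amount) FROM sales GROUP BY region;
class SchemaOnReadSketch {

    // The raw lines stay as plain text; the schema is applied only here, at query time.
    static Map<String, Integer> sumByRegion(List<String> rawLines) {
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : rawLines) {
            String[] fields = line.split(",");
            String region = fields[0].trim();                 // column 0: region STRING
            int amount = Integer.parseInt(fields[1].trim());  // column 1: amount INT
            totals.merge(region, amount, Integer::sum);       // GROUP BY region, SUM(amount)
        }
        return totals;
    }

    public static void main(String[] args) {
        // Stand-in for the contents of a text file stored in HDFS.
        List<String> hdfsFile = List.of("east,100", "west,40", "east,25");
        System.out.println(sumByRegion(hdfsFile)); // prints {east=125, west=40}
    }
}
```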

Pig

Pig is a platform for analyzing huge datasets. It consists of a data flow language called Pig Latin and a runtime engine that executes the data flows in parallel. It also has built-in functions for data transformations.
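
To get a feel for what a data flow looks like, the sketch below shows a small Pig Latin script in the comments and mirrors each step with a Java Streams pipeline. The input file, field names and data are hypothetical. Pig would compile such a script into jobs that run in parallel on the cluster, while this Java version runs in a single JVM and is only an analogy.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Pig Latin data flow (hypothetical file and field names):
//   logs   = LOAD 'access_log' USING PigStorage(',') AS (user:chararray, bytes:int);
//   big    = FILTER logs BY bytes > 100;
//   byUser = GROUP big BY user;
//   totals = FOREACH byUser GENERATE group, SUM(big.bytes);
// The Java Streams pipeline below mirrors those steps in a single JVM.
class DataFlowSketch {

    static Map<String, Integer> bytesPerHeavyUser(List<String> lines) {
        return lines.stream()
                // LOAD: split each comma-separated line into [user, bytes]
                .map(line -> line.split(","))
                // FILTER: keep only records with more than 100 bytes
                .filter(f -> Integer.parseInt(f[1].trim()) > 100)
                // GROUP BY user, then GENERATE the SUM of bytes for each group
                .collect(Collectors.groupingBy(
                        (String[] f) -> f[0].trim(),
                        TreeMap::new,
                        Collectors.summingInt((String[] f) -> Integer.parseInt(f[1].trim()))));
    }

    public static void main(String[] args) {
        List<String> log = List.of("alice,500", "bob,50", "alice,200", "carol,150");
        System.out.println(bytesPerHeavyUser(log)); // prints {alice=700, carol=150}
    }
}
```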

Sqoop

It is an integration tool used to transfer data from relational (SQL) databases, whether external or internal, into HDFS and vice versa.
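
Sqoop parallelizes an import by splitting the value range of a column (the primary key by default, or the column named with --split-by) across map tasks, whose number is set with -m or --num-mappers. The sketch below shows a simplified version of that range-splitting idea in plain Java; the table orders, its id column and the exact boundary arithmetic are assumptions for illustration, not Sqoop's actual source code.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of Sqoop-style parallel import planning: split [min, max]
// of a split column into roughly equal half-open ranges [lo, hi), one per map task.
class SplitSketch {

    static List<long[]> splitRanges(long min, long max, int numMappers) {
        if (numMappers < 1) {
            throw new IllegalArgumentException("numMappers must be at least 1");
        }
        long size = Math.max(1, (max - min) / numMappers);
        List<long[]> ranges = new ArrayList<>();
        long lo = min;
        while (lo <= max) {
            long hi = lo + size;
            // The last range is widened to max + 1 so that max itself is included.
            if (hi >= max || ranges.size() == numMappers - 1) {
                ranges.add(new long[] {lo, max + 1});
                break;
            }
            ranges.add(new long[] {lo, hi});
            lo = hi;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Pretend "SELECT MIN(id), MAX(id) FROM orders" returned 0 and 1000 (hypothetical).
        for (long[] r : splitRanges(0, 1000, 4)) {
            // Each map task would run one of these queries and write its rows to HDFS.
            System.out.println("SELECT * FROM orders WHERE id >= " + r[0] + " AND id < " + r[1]);
        }
    }
}
```

With four mappers the ranges come out as [0, 250), [250, 500), [500, 750) and [750, 1001), so every id from 0 to 1000 is read by exactly one task.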

HBase

NoSQL database.
Open-source, distributed software modeled on Google's Bigtable, a distributed storage system.
Not suitable for storing files larger than 2 GB.
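
HBase's data model is often described as a sparse, sorted map: row key, then column (family:qualifier), then timestamp, then value. The in-memory sketch below models that with nested TreeMaps; it does not use the HBase client API, and the row keys and columns are hypothetical. Cells that were never written simply do not exist, versions of a cell are kept newest-first, and because row keys are sorted, a range scan is just a sub-map.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory model of the HBase data model as nested sorted maps:
//   row key -> column ("family:qualifier") -> timestamp -> value.
// Illustration only; it does not use the HBase client API.
class HBaseModelSketch {

    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    // Writes one cell version; versions of a cell are ordered newest-first.
    void put(String rowKey, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
                .computeIfAbsent(column, k -> new TreeMap<Long, String>(Comparator.reverseOrder()))
                .put(timestamp, value);
    }

    // Returns the newest version of a cell, or null if the cell was never written.
    String get(String rowKey, String column) {
        NavigableMap<String, NavigableMap<Long, String>> columns = rows.get(rowKey);
        if (columns == null || !columns.containsKey(column)) {
            return null;
        }
        return columns.get(column).firstEntry().getValue();
    }

    // Row keys are sorted, so a scan from startRow (inclusive) to stopRow
    // (exclusive) is just a sub-map view.
    NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> scan(String startRow, String stopRow) {
        return rows.subMap(startRow, true, stopRow, false);
    }

    public static void main(String[] args) {
        HBaseModelSketch table = new HBaseModelSketch();
        table.put("user#001", "info:name", 1L, "Alice");
        table.put("user#001", "info:name", 2L, "Alicia"); // newer version of the same cell
        table.put("user#002", "info:name", 1L, "Bob");
        System.out.println(table.get("user#001", "info:name"));           // prints Alicia
        System.out.println(table.scan("user#001", "user#002").keySet()); // prints [user#001]
    }
}
```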

Oozie

A workflow scheduler used to integrate the Hadoop ecosystem tools and to schedule jobs.
An automation framework that chains together jobs from the different Hadoop ecosystem components.
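
Oozie workflows are defined in XML as a DAG of action nodes, where each action names the node to go to on success (ok) and on failure (error), ending at an end or kill node. The toy interpreter below illustrates only that control flow in plain Java; the node names are hypothetical, the actions are stubbed out as true/false suppliers instead of real Hadoop jobs, and Oozie features such as fork/join and decision nodes are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Toy, in-memory interpreter for an Oozie-style workflow: each action node names
// the next node on success ("ok") and on failure ("error"); "end" and "kill" are
// terminal. Real Oozie workflows are written in XML and run on the Oozie server.
class WorkflowSketch {

    static final class Action {
        final BooleanSupplier task; // stand-in for a job; returns true on success
        final String ok;            // next node if the job succeeds
        final String error;         // next node if the job fails

        Action(BooleanSupplier task, String ok, String error) {
            this.task = task;
            this.ok = ok;
            this.error = error;
        }
    }

    // Runs from the start node and returns the names of the nodes visited, in order.
    // Assumes the graph has no cycles, as Oozie workflows are DAGs.
    static List<String> run(String start, Map<String, Action> actions) {
        List<String> visited = new ArrayList<>();
        String current = start;
        while (!current.equals("end") && !current.equals("kill")) {
            visited.add(current);
            Action action = actions.get(current);
            if (action == null) {
                throw new IllegalArgumentException("Unknown node: " + current);
            }
            current = action.task.getAsBoolean() ? action.ok : action.error;
        }
        visited.add(current);
        return visited;
    }

    public static void main(String[] args) {
        Map<String, Action> actions = Map.of(
                "sqoop-import", new Action(() -> true, "hive-query", "kill"),
                "hive-query", new Action(() -> true, "pig-cleanup", "kill"),
                "pig-cleanup", new Action(() -> false, "end", "kill"));
        System.out.println(run("sqoop-import", actions));
        // prints [sqoop-import, hive-query, pig-cleanup, kill]
    }
}
```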
