Path to Big Data Developer!

  • Writer: tejdeep24
  • Jul 8, 2020

Updated: Jul 8, 2020

In this blog I'm going to explain how to become a Big Data Developer. Many people face a dilemma at the initial stage of their big data learning path. Hopefully, after reading this blog, your doubts will be cleared up.

Pre-requisites - Java, Collections in Java, Java Coding Exercises


Stages of execution in Big Data

  1. Data Storage - Store Structured, Semi-Structured and Unstructured data.

  2. Data Integration - Integrate data from multiple sources.

  3. Data Analysis - Analyze the data to gain greater insight into trends for business decisions.

  4. Data Visualization - Graphically present the information extracted from analysis for better understanding.

  5. Data Product - Develop services and apps based on the data captured and stored, and on the findings from analysis.

Note: In MapReduce (the processing framework of Hadoop), programs are natively written in Java, which is a drawback for data analysts. To overcome this drawback, Hive (at Facebook) and Pig (at Yahoo) were developed.
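To make the MapReduce idea concrete, here is a minimal sketch of the classic word count in plain Java. It only imitates the map, shuffle and reduce phases with collections; it does not use any Hadoop classes, and the class and method names (WordCountSketch, map, shuffle, reduce) are made up for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java imitation of the MapReduce phases; no Hadoop classes are used.
class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key, with keys kept sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs the three phases in sequence on a small in-memory input.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            counts.put(e.getKey(), reduce(e.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {big=2, data=2, developer=1}
        System.out.println(wordCount(List.of("big data big", "data developer")));
    }
}
```

Even this toy version needs a fair amount of code for a simple count, which hints at why query and scripting layers such as Hive and Pig are attractive to analysts.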


Hive

  • Data warehouse solution built on top of Hadoop. It projects a table-like structure onto the data already stored in HDFS and provides a SQL dialect (a variant of the SQL language, called HiveQL) to query HDFS data.

  • It is suitable for data warehouse applications where static data is analyzed.
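The idea of projecting a table onto files that already exist can be sketched in plain Java. This is not Hive code: the Sale class, the comma-delimited layout and the totalByRegion method are all hypothetical, chosen only to show a schema being applied at read time and a query-like aggregation running over it.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: applies a schema to raw text lines when they are read.
class SchemaOnReadSketch {

    // Hypothetical table schema: (region, product, amount).
    static final class Sale {
        final String region;
        final String product;
        final double amount;

        Sale(String region, String product, double amount) {
            this.region = region;
            this.product = product;
            this.amount = amount;
        }
    }

    // Interpret one comma-delimited line as a row of the table.
    static Sale parse(String line) {
        String[] fields = line.split(",");
        return new Sale(fields[0].trim(), fields[1].trim(), Double.parseDouble(fields[2].trim()));
    }

    // Roughly what a query like
    //   SELECT region, SUM(amount) FROM sales WHERE amount >= ? GROUP BY region
    // would compute over the same data.
    static Map<String, Double> totalByRegion(List<String> rawLines, double minAmount) {
        Map<String, Double> totals = new TreeMap<>();
        for (String line : rawLines) {
            Sale sale = parse(line);
            if (sale.amount >= minAmount) {
                totals.merge(sale.region, sale.amount, Double::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> rawLines = List.of(
            "north,laptop,1200.0",
            "south,phone,300.0",
            "north,phone,450.0",
            "south,cable,20.0");
        System.out.println(totalByRegion(rawLines, 100.0));
    }
}
```

The raw lines are never rewritten; the structure exists only in how they are read, which is the sense in which a table is "projected" onto stored data.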

Pig

  • Pig is a platform for analyzing huge datasets. It consists of a data flow language called Pig Latin and a runtime engine for executing the data flows in parallel. It also has built-in functions for data transformations.
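A data flow is a chain of steps where each step consumes the output of the previous one. As a rough analogy in Java (not Pig Latin itself), the stream pipeline below loads log lines, filters them, groups them and counts each group. The log format and the errorsPerHost name are assumptions made up for this sketch; the comments point to the Pig Latin operators that play a similar role.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Analogy only: a step-by-step data flow written with Java streams.
class DataFlowSketch {

    // Assumed log format: "<host> <level> <message>".
    static Map<String, Long> errorsPerHost(List<String> logLines) {
        return logLines.stream()
            // Similar in spirit to LOAD ... AS (host, level, message)
            .map(line -> line.split(" ", 3))
            // Similar in spirit to FILTER ... BY level == 'ERROR'
            .filter(fields -> fields.length == 3 && fields[1].equals("ERROR"))
            // Similar in spirit to GROUP ... BY host, then FOREACH ... GENERATE COUNT
            .collect(Collectors.groupingBy(fields -> fields[0], TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> logLines = List.of(
            "web1 ERROR disk full",
            "web2 INFO started",
            "web1 ERROR timeout",
            "web2 ERROR disk full");
        // prints {web1=2, web2=1}
        System.out.println(errorsPerHost(logLines));
    }
}
```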

Sqoop

  • It is an integration tool used to transfer data from external or internal SQL systems to HDFS and vice versa.

HBase

  • NoSQL database.

  • Open-source distributed software. It is modeled on Bigtable, the distributed database developed at Google.

  • Not suitable for storing files larger than 2 GB.
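One way to picture a Bigtable-style store is as a sorted map of row keys, where each row holds its own map of columns. The sketch below is a deliberately simplified in-memory model in plain Java; it is not the HBase client API, it leaves out cell versions and distribution, and the WideColumnSketch class and its methods are hypothetical.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Simplified in-memory model of a sorted, sparse row/column store.
class WideColumnSketch {

    // row key -> ("family:qualifier" -> value); both levels are kept sorted.
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    // Store one cell; rows and columns are created on demand, so rows can be sparse.
    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Look up one cell, or null if the row or column is absent.
    String get(String rowKey, String column) {
        Map<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(column);
    }

    // Range scan over row keys in [startRow, stopRow); cheap because keys are sorted.
    NavigableMap<String, NavigableMap<String, String>> scan(String startRow, String stopRow) {
        return rows.subMap(startRow, true, stopRow, false);
    }

    public static void main(String[] args) {
        WideColumnSketch table = new WideColumnSketch();
        table.put("user1", "info:name", "Asha");
        table.put("user2", "info:name", "Ravi");
        table.put("user2", "info:city", "Pune");
        table.put("user3", "info:name", "Meera");
        System.out.println(table.get("user2", "info:city"));
        System.out.println(table.scan("user1", "user3").keySet());
    }
}
```

Because rows are addressed by key and kept in key order, this kind of store favors many small cells over a few very large values.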

Oozie

  • Used to integrate the tools of the Hadoop ecosystem and to schedule jobs.

  • Automation framework that ties the Hadoop ecosystem components together into workflows.
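Scheduling jobs that depend on each other comes down to running each job only after the jobs it depends on have finished. The sketch below shows that ordering step with a standard topological sort in plain Java. It is not Oozie's implementation or configuration format, and the job names and the executionOrder method are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative dependency-ordered scheduling using Kahn's topological sort.
class WorkflowSketch {

    // dependsOn maps each job to the jobs that must finish before it starts.
    static List<String> executionOrder(Map<String, List<String>> dependsOn) {
        Map<String, Integer> unmet = new TreeMap<>();            // job -> unfinished prerequisites
        Map<String, List<String>> dependents = new TreeMap<>();  // job -> jobs waiting on it
        for (Map.Entry<String, List<String>> entry : dependsOn.entrySet()) {
            unmet.putIfAbsent(entry.getKey(), 0);
            for (String prerequisite : entry.getValue()) {
                unmet.putIfAbsent(prerequisite, 0);
                unmet.merge(entry.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(prerequisite, k -> new ArrayList<>()).add(entry.getKey());
            }
        }

        // Jobs with no prerequisites can start immediately.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : unmet.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String job = ready.poll();
            order.add(job);
            // Finishing a job may unblock the jobs that were waiting on it.
            for (String next : dependents.getOrDefault(job, List.of())) {
                if (unmet.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != unmet.size()) {
            throw new IllegalStateException("workflow contains a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dependsOn = new TreeMap<>();
        dependsOn.put("hive-query", List.of("sqoop-import"));
        dependsOn.put("pig-report", List.of("sqoop-import"));
        dependsOn.put("export", List.of("hive-query", "pig-report"));
        System.out.println(executionOrder(dependsOn));
    }
}
```

In this made-up workflow, the import runs first, the Hive and Pig steps run after it, and the export waits for both.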




© 2020 By Tejdeep Pasupulati
