HBase

Requirements for databases:


1. structured manner

2. random access

3. low latency

4. ACID properties

Atomicity

Consistency

Isolation

Durability


HBase--runs on top of Hadoop.


--distributed

--scalable

--fault tolerant


----------------------

HBase:

--------------

--structure--loose (the schema is flexible).

--low latency--using the row key.

--random access--using the row key.

--somewhat ACID.



--searching (using row keys)

--processing


The main purpose is searching.


In an RDBMS--if a column has no data, it is stored as NULL--which still takes space.


HBase--if there is no data, the column is simply not stored.

HBase is column-oriented.


--you can perform CRUD operations:


create

read

update

delete


in HBase.


ACID--at the single-row level, HBase is compliant,

but across multiple rows it is not ACID compliant.


epoch--time

(Unix timestamp)

number of seconds elapsed since 1 January 1970 (UTC)
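A small Python sketch of the idea (the sample date is just an illustration):

```python
from datetime import datetime, timezone

# The Unix epoch is 1970-01-01 00:00:00 UTC; a timestamp is the
# number of seconds elapsed since that instant.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Example: one day after the epoch is 86400 seconds.
one_day_later = datetime(1970, 1, 2, tzinfo=timezone.utc)
seconds = int((one_day_later - epoch).total_seconds())
print(seconds)  # 86400
```

HBase uses such timestamps (in milliseconds) as cell versions by default.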


row keys

-----------------------

--unique per row.

--all stored as byte arrays.


A binary search algorithm is used to look them up.


Row keys are kept in sorted ascending order.
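A toy Python sketch of a binary-search lookup over sorted byte-array row keys (the keys are invented, not HBase internals):

```python
from bisect import bisect_left

# Row keys are byte arrays kept in ascending order, so a lookup
# can use binary search instead of a full scan.
row_keys = sorted([b"row1", b"row2", b"row3", b"row5"])

def find(key: bytes) -> bool:
    i = bisect_left(row_keys, key)          # binary search position
    return i < len(row_keys) and row_keys[i] == key

print(find(b"row3"))  # True
print(find(b"row4"))  # False
```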


column family

--------------------

Each column family's data is stored in a separate file.

New columns can be added on the fly.


columnfamily:columnname, e.g. work:department


timestamp:

--------------------

value used for versioning.
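A toy Python model of versioned cells (timestamps and values invented): each cell keeps multiple versions keyed by timestamp, and a plain read returns the latest one.

```python
# One cell of a column, as {timestamp: value}; higher ts = newer.
cell = {100: "sales", 150: "hr", 200: "marketing"}

# A plain read returns the value with the highest timestamp.
latest_ts = max(cell)
print(cell[latest_ts])  # marketing
```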



CAP Theorem

-------------------

--applies to distributed systems.

C--Consistency

A--Availability

P--Partition tolerance


Out of the 3, we can get only 2.


Consistency--every node will return the latest value.


Availability--the system always gives a response,

but there is no guarantee that the value is the latest.


Partition tolerance--the system continues to operate even when there is a network break.


CA--Consistency and Availability (RDBMS)


AP--Availability and Partition tolerance (Cassandra, DynamoDB)


CP--Consistency and Partition tolerance (HBase, MongoDB)


For a distributed system, partition tolerance is a must.


Consistency is preferred when stale results are unacceptable--if the result is not the latest, return an error or time out.


Examples: chat applications, banking.


Availability is preferred when we need immediate results.


Example: a travel portal--a price is always shown, even if it is not the latest price.


Hbase Architecture:

===========================


4 nodes--4 region servers.


The data is divided into regions,

and each region server holds multiple regions.


Each region holds its data sorted by row key.


Column families are stored in separate files.


Each portion of the table is called a region.


Memstore

----------------

Every insert is first appended in memory inside the memstore. When it grows to a threshold size, the contents of the memstore are flushed to disk. This file is called an HFile.


HFiles are stored in HDFS.


For each region, there is one memstore per column family.
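A toy Python sketch of the flush behaviour (the threshold and data are made up, not HBase's actual sizes):

```python
# Toy memstore: buffer writes in memory and "flush" them to an
# HFile-like sorted list once a threshold is reached.
class Memstore:
    def __init__(self, threshold=3):
        self.buffer = {}          # in-memory writes, row key -> value
        self.threshold = threshold
        self.hfiles = []          # each flush produces one sorted "HFile"

    def put(self, row_key, value):
        self.buffer[row_key] = value
        if len(self.buffer) >= self.threshold:
            self.flush()

    def flush(self):
        # Flushed files are sorted by row key, like real HFiles.
        self.hfiles.append(sorted(self.buffer.items()))
        self.buffer = {}

m = Memstore(threshold=2)
m.put(b"r2", "a")
m.put(b"r1", "b")   # threshold hit -> flush to "disk"
print(m.hfiles)     # [[(b'r1', 'b'), (b'r2', 'a')]]
print(m.buffer)     # {}
```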


wal--Write ahead log/HLOG

-----------------------

When you insert into the memstore, the write first goes to the WAL (write-ahead log), and only then is it written to the memstore.


So if the server crashes, the data can be recovered by replaying the WAL.


The WAL is stored on disk (HDFS).
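A toy Python sketch of the write path and crash recovery (lists and dicts stand in for the real on-disk log):

```python
# Toy write path: append to the WAL first, then to the memstore.
# After a crash, the memstore is rebuilt by replaying the WAL.
wal = []        # durable, append-only log (stands in for HDFS)
memstore = {}   # volatile in-memory store

def put(row_key, value):
    wal.append((row_key, value))  # durable log entry first
    memstore[row_key] = value     # then the in-memory write

put("r1", "alice")
put("r2", "bob")

memstore = {}  # simulate a region server crash losing memory

for row_key, value in wal:  # recovery: replay the log
    memstore[row_key] = value

print(memstore)  # {'r1': 'alice', 'r2': 'bob'}
```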


Block cache:

------------------------

When we read data, it is stored in the cache, so that next time we can read it from the cache.


There is one WAL and one block cache per region server.


Zookeeper:

----------------

It is a coordination service for distributed systems.


--Region servers send heartbeats to ZooKeeper.

ZooKeeper knows where the META table is.


META table--holds the mapping of row keys, regions, and region servers.


This META table is present on one of the region servers.


HBase has a master-slave architecture.


HMaster is the master and the region servers are the slaves.


HMaster:

------------------

--assigns regions to region servers.

--does load balancing.


An HBase cluster may have one or more master nodes, but at a time only one HMaster is active.


HMaster also performs DDL operations

and recovery.


HFiles:

--------

--sorted key-value pairs.

--immutable.

--data is stored as a set of blocks,

--so that the block index tells us which block holds the data.


Binary search is applied within a block to search for the data.

Default block size--64 KB.
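A toy Python sketch of a block-index lookup (two-entry "blocks" and invented keys, not the real HFile format):

```python
from bisect import bisect_right

# Toy block index: the first row key of each "block". To find a key,
# locate its candidate block via the index, then search that block.
blocks = [[(b"r1", 1), (b"r2", 2)], [(b"r5", 5), (b"r7", 7)]]
index = [blk[0][0] for blk in blocks]  # [b'r1', b'r5']

def get(key):
    b = bisect_right(index, key) - 1   # which block could hold key
    if b < 0:
        return None
    for k, v in blocks[b]:             # search within that block only
        if k == key:
            return v
    return None

print(get(b"r5"))  # 5
print(get(b"r3"))  # None
```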


The client talks to ZooKeeper for the META table location.

The META table can be cached on the client side.

The client queries the META table to get the region server


and the region information.


hbase read:

------------

The region server checks the block cache first.

If not there, it checks the memstore.

If not there, it checks the corresponding HFiles--

the block index tells which block to look in, and binary search within that block finds the data.
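The read order can be sketched in Python (toy dictionaries stand in for the cache, memstore, and HFiles):

```python
# Toy read path: block cache -> memstore -> HFiles, in that order.
block_cache = {"r1": "cached"}
memstore = {"r2": "in-memory"}
hfiles = [{"r3": "on-disk"}]

def read(row_key):
    if row_key in block_cache:
        return block_cache[row_key]
    if row_key in memstore:
        return memstore[row_key]
    for hfile in hfiles:
        if row_key in hfile:
            block_cache[row_key] = hfile[row_key]  # cache for next read
            return hfile[row_key]
    return None

print(read("r1"))  # cached
print(read("r3"))  # on-disk (and now also in the block cache)
```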


compactions:

----------------------

Combining small HFiles into larger HFiles.

minor--combines a few HFiles.

major--combines all HFiles into one large HFile (resource intensive).
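A toy Python sketch of merging small sorted files into one larger sorted file (real compactions also reconcile versions and tombstones):

```python
from heapq import merge

# Two small sorted "HFiles" of (row key, value) pairs.
hfile_a = [(b"r1", "a"), (b"r4", "d")]
hfile_b = [(b"r2", "b"), (b"r3", "c")]

# A compaction merge-sorts them into one larger sorted file.
compacted = list(merge(hfile_a, hfile_b))
print(compacted)  # [(b'r1', 'a'), (b'r2', 'b'), (b'r3', 'c'), (b'r4', 'd')]
```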


Delete operation:

---------------------

For a delete, the data is marked with a tombstone marker, which is inserted like a normal write.

When you read it, it is returned as null.


During a major compaction, these markers and the data they cover are physically removed.
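A toy Python sketch of tombstone behaviour (the rows are invented):

```python
# Toy tombstone handling: a delete writes a marker; reads treat the
# row as absent, and a major compaction drops marked rows for real.
TOMBSTONE = object()
store = {"r1": "alice", "r2": "bob"}

store["r1"] = TOMBSTONE  # delete = insert a tombstone marker

def read(row_key):
    value = store.get(row_key)
    return None if value is TOMBSTONE else value

print(read("r1"))  # None

# Major compaction physically removes tombstoned rows.
store = {k: v for k, v in store.items() if v is not TOMBSTONE}
print(store)  # {'r2': 'bob'}
```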



HBase practical:

-------------------------


hbase shell

to connect to the HBase shell.


list

---------------

to list the tables.


exit

to come out.


create 'table name','column family name','column family name'


at least one column family is required.


put 'table name','row key','column family:column name','value'


To get the contents of a table:


scan 'table name'


To get a specific row:

get 'table name','row key'


Commands and names are case sensitive.


To check whether a table exists:


exists 'students'


drop 'table name'


To drop a table, first we need to disable it.

When we disable it, the contents of the memstore are flushed to disk--then the table can be dropped.


service --status-all

to check the status of all services.

root has access.


We need to check the HBase Master and HBase Region Server services.


============================


filters:


Value filter:

------------------

takes 2 input parameters:

an operator and a value.


Qualifier filter:

------------------

filters on the column name.


Family filter:

----------------

filters on the column family.


To count the records (row keys):


count 'table name'
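A hedged sketch of how these filters look from the HBase shell (the table 'students', column family 'work', column 'department', and value 'CS' are hypothetical examples, not from a real cluster):

```shell
scan 'students', {FILTER => "ValueFilter(=, 'binary:CS')"}
scan 'students', {FILTER => "QualifierFilter(=, 'binary:department')"}
scan 'students', {FILTER => "FamilyFilter(=, 'binary:work')"}
```

Each filter takes a comparison operator and a comparator string such as 'binary:...'.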

