HBase
requirements for databases:
1. structured manner (data is stored in a structured way)
2. random access
3. low latency
4. ACID properties
atomicity
consistency
isolation
durability
hbase--runs on top of hadoop.
--distributed
--scalable
--fault tolerant
----------------------
hbase:
--------------
--structure--loose.
--low latency--using row key.
--random access--using row key.
--somewhat ACID.
--searching (using row keys)
--processing.
main purpose is searching.
in rdbms--if there is no data, the column is stored as null and can still take space.
hbase--if there is no data, the column is simply not stored.
column-family based (column oriented).
--you can perform CRUD operations in hbase:
create
read
update
delete
ACID--ok at the single-row level.
but across multiple rows hbase is not ACID compliant.
epoch--time
(unix timestamp)
number of seconds since 1 Jan 1970 (UTC). hbase cell timestamps count milliseconds since the same epoch.
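a small python sketch of the epoch idea. the variable names are only for illustration, nothing here talks to hbase.

```python
import time
from datetime import datetime, timezone

# Unix timestamp: seconds elapsed since 1970-01-01 00:00:00 UTC.
epoch_seconds = int(time.time())

# HBase cell timestamps use the same epoch but count milliseconds.
epoch_millis = int(time.time() * 1000)

# Second 0 of the epoch converts back to the start of 1970 (UTC).
epoch_start = datetime.fromtimestamp(0, tz=timezone.utc)
print(epoch_start.isoformat())
```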
row keys
-----------------------
--unique per row.
--all stored as byte arrays.
--looked up using binary search.
--row keys are kept in ascending sorted order (byte order).
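a minimal sketch of sorted byte keys plus binary search, using python's bisect module. the keys and the helper name find_row are made up for illustration. note that byte order is not numeric order, so row10 sorts before row2.

```python
import bisect

# Row keys are byte arrays, kept sorted in ascending byte order.
row_keys = sorted([b"row10", b"row2", b"row1", b"row3"])

def find_row(keys, target):
    """Binary search over the sorted keys; returns the index or -1."""
    i = bisect.bisect_left(keys, target)
    return i if i < len(keys) and keys[i] == target else -1

print(row_keys)
print(find_row(row_keys, b"row2"))
```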
column family
--------------------
each column family's data is stored in separate files.
new columns can be added on the fly inside an existing column family.
columnfamily:columnname, e.g. work:department
timestamp:
--------------------
the timestamp acts as the version of a value; a cell can keep more than one version.
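one way to picture the data model is a nested map: row key -> family:column -> timestamp -> value. the sketch below is a toy in plain python (strings instead of bytes, helper names put and get are made up), not the hbase api. it shows that a missing column is simply absent and that a read returns the newest version.

```python
from collections import defaultdict

# table[row_key]["family:column"][timestamp] = value
table = defaultdict(lambda: defaultdict(dict))

def put(row, column, value, ts):
    table[row][column][ts] = value

def get(row, column):
    """Return the value with the highest timestamp, or None if the cell is absent."""
    versions = table.get(row, {}).get(column)
    if not versions:
        return None
    return versions[max(versions)]

put("r1", "work:department", "sales", ts=100)
put("r1", "work:department", "hr", ts=200)   # newer version of the same cell
print(get("r1", "work:department"))
print(get("r1", "work:salary"))              # column never written, so nothing is stored
```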
CAP Theorem
-------------------
--applies to distributed systems.
C-Consistency
A-Availability
P-Partition tolerance
out of the 3 we can get only 2.
consistency--every node returns the latest value.
availability--the system always gives a response,
but there is no guarantee that the value is the latest.
partition tolerance--the system continues to operate even when there is a network break.
CA--consistency and availability (RDBMS)
AP--availability and partition tolerance (Cassandra, DynamoDB)
CP--consistency and partition tolerance (HBase, MongoDB)
for a distributed system--partition tolerance is a must.
consistency is preferred--if results are not the latest, give an error or time out.
e.g. chatting application, banking
availability is preferred--when we need immediate results.
e.g. travel portal--a price is always shown, even if it is not the latest price.
Hbase Architecture:
===========================
4 nodes--4 region servers
data is divided into regions--
each portion of a table is called a region.
each region server holds multiple regions.
each region holds its data sorted by row key.
column families are stored in separate files.
Memstore
----------------
every insert is appended in memory inside the memstore. when its size grows to a threshold, the contents of the memstore are flushed to disk. the flushed file is called an HFile.
HFiles are stored in HDFS.
each region has one memstore per column family.
wal--write ahead log/HLog
-----------------------
when you insert--the edit is first written to the wal (write ahead log), then it is written to the memstore.
so if the server crashes--edits that were not yet flushed are recovered from the wal.
wal is stored on disk (hdfs)
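the write path above (wal first, then memstore, flush to an hfile at a threshold) can be sketched as a toy class. everything here is an assumption made for illustration: the class name ToyRegionStore, a threshold counted in entries instead of bytes, and python lists standing in for files on hdfs.

```python
class ToyRegionStore:
    """Toy write path: log to the WAL, update the memstore, flush to a sorted 'HFile'."""

    def __init__(self, flush_threshold=3):
        self.wal = []        # stands in for the write-ahead log on HDFS
        self.memstore = {}   # in-memory buffer
        self.hfiles = []     # each flush produces one sorted, immutable file
        self.flush_threshold = flush_threshold

    def put(self, row_key, value):
        self.wal.append((row_key, value))  # 1. log first, so a crash can be replayed
        self.memstore[row_key] = value     # 2. then update memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memstore out sorted by row key, then start fresh.
        self.hfiles.append(tuple(sorted(self.memstore.items())))
        self.memstore = {}
        self.wal.clear()  # flushed edits are on disk and no longer need the log

    def crash_and_recover(self):
        self.memstore = {}               # memory is lost in a crash
        for row_key, value in self.wal:  # replay edits that were not flushed
            self.memstore[row_key] = value

store = ToyRegionStore(flush_threshold=3)
store.put("b", 2)
store.put("a", 1)
store.crash_and_recover()   # memstore is rebuilt from the WAL
store.put("c", 3)           # third entry reaches the threshold and triggers a flush
print(store.hfiles)
```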
Block cache:
------------------------
when we read data--it is stored in the block cache--so the next read of the same data is served from the cache.
wal and block cache--one per region server.
Zookeeper:
----------------
it is a coordination service for various distributed systems.
--servers send heartbeats to zookeeper.
--holds the location of the meta table.
meta table--holds the mapping of row key ranges, regions and region servers.
the meta table itself is present on one of the region servers.
hbase--master slave architecture.
hmaster--master, region servers--slaves.
hmaster:
------------------
assigns regions to region servers.
does load balancing.
an hbase cluster may have one or more master nodes--but only one hmaster is active at a time.
hmaster--handles DDL operations.
handles recovery when a region server fails.
HFiles:
--------
--sorted key value pairs.
--immutable.
--data is stored as a set of blocks.
--a block index tells us which block holds the data.
binary search is applied within the block to search for the data.
default block size--64 KB.
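a toy version of the lookup described above: the block index picks the block, then binary search runs inside it. the data, the block size of 2 entries and the name hfile_get are made up for illustration; real blocks are sized in bytes and the on-disk format is far more involved.

```python
import bisect

# A toy HFile: sorted key/value pairs split into fixed-size blocks.
pairs = [(b"a", 1), (b"c", 2), (b"e", 3), (b"g", 4), (b"i", 5), (b"k", 6)]
BLOCK_SIZE = 2  # entries per block in this sketch
blocks = [pairs[i:i + BLOCK_SIZE] for i in range(0, len(pairs), BLOCK_SIZE)]
block_index = [block[0][0] for block in blocks]  # first key of each block

def hfile_get(key):
    # 1. Block index: find the last block whose first key is <= key.
    b = bisect.bisect_right(block_index, key) - 1
    if b < 0:
        return None
    # 2. Binary search inside that block.
    keys = [k for k, _ in blocks[b]]
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return blocks[b][i][1]
    return None

print(hfile_get(b"g"))
print(hfile_get(b"b"))
```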
client--talks to zookeeper--for the meta table location
the meta table information can be cached on the client--
query the meta table to get the region server
and the region information.
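a rough sketch of what the meta table lookup does: regions cover sorted ranges of row keys, so finding the region for a row key is a search over the region start keys. the start keys, host names and the function locate are hypothetical.

```python
import bisect

# Toy meta table: sorted region start keys and the server hosting each region.
region_starts = [b"", b"g", b"n", b"t"]   # the first region starts at the empty key
region_servers = ["rs1.example", "rs2.example", "rs3.example", "rs4.example"]

def locate(row_key):
    """Return the server whose region range contains row_key."""
    idx = bisect.bisect_right(region_starts, row_key) - 1
    return region_servers[idx]

print(locate(b"apple"))
print(locate(b"zebra"))
```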
hbase read:
------------
region server checks the block cache.
if not there, then it checks the memstore.
if not there, then it checks the corresponding hfiles--
the block index tells which block to read--within the block, binary search gets the data.
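the lookup order above, written as a toy function over plain dicts. this is a simplification that ignores versions and timestamps; the function name read and the dict-based stores are assumptions for illustration only.

```python
def read(row_key, block_cache, memstore, hfiles):
    """Lookup order from the notes: block cache, then memstore, then HFiles."""
    if row_key in block_cache:
        return block_cache[row_key]
    if row_key in memstore:
        return memstore[row_key]
    for hfile in reversed(hfiles):  # the newest file is last in the list
        if row_key in hfile:
            block_cache[row_key] = hfile[row_key]  # cache it for the next read
            return hfile[row_key]
    return None

cache = {}
memstore = {"b": 2}
hfiles = [{"a": 1}, {"a": 10}]  # the second file is newer
print(read("a", cache, memstore, hfiles))
print(cache)
```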
compactions:
----------------------
combining small HFiles into larger HFiles.
minor--merges a few hfiles.
major--combines all hfiles into one large HFile (resource intensive).
Delete operation:
---------------------
for delete--the data is not removed right away; a tombstone marker is inserted.
when you read, the deleted data is returned as null.
during major compaction the marker and the deleted data are removed.
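tombstones and compaction can be sketched together. in this toy, a delete writes a marker into a newer file, reads treat the marker as "no value", and a major compaction merges all files and drops the marked rows. TOMBSTONE, get and major_compact are made-up names, and dicts stand in for hfiles.

```python
TOMBSTONE = object()  # stands in for a delete marker

def get(hfiles, key):
    """Newest file wins; a tombstone hides any older value."""
    for hfile in reversed(hfiles):
        if key in hfile:
            value = hfile[key]
            return None if value is TOMBSTONE else value
    return None

def major_compact(hfiles):
    """Merge all files (oldest first) into one sorted file, dropping deleted rows."""
    merged = {}
    for hfile in hfiles:  # later files override earlier ones
        merged.update(hfile)
    live = [(k, v) for k, v in merged.items() if v is not TOMBSTONE]
    return dict(sorted(live, key=lambda kv: kv[0]))

hfiles = [{"a": 1, "b": 2}, {"b": TOMBSTONE, "c": 3}]
print(get(hfiles, "b"))
print(major_compact(hfiles))
```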
hBase practical:
-------------------------
to connect to the hbase shell:
hbase shell
to list tables:
list
to come out of the shell:
exit
to create a table (at least one column family is required):
create 'table name','column family name','column family name'
to insert a value:
put 'table name','rowkey','column family:column name','value'
to get the contents of a table:
scan 'table name'
to get a specific row:
get 'table name','rowkey'
names are case sensitive.
to check if a table exists or not:
exists 'students'
to drop a table--first we need to disable it:
disable 'table name'
drop 'table name'
when we disable--the contents of the memstore are flushed to disk--then the table can be dropped.
to check the status of all services (the root user has access):
service --status-all
we need to check hbase master and hbase region server.
============================
filters:
value filter
------------------
takes 2 input parameters:
a comparison operator and a value
Qualifier filter:
------------------
filters on column name
Family filter:
----------------
filters on column family
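what the three filters select can be shown on a small in-memory table. this is plain python mimicking the idea, not the hbase filter api; the sample rows and all function names are made up.

```python
import operator

rows = {
    "r1": {"work:department": "sales", "personal:name": "asha"},
    "r2": {"work:department": "hr"},
}

def filter_cells(rows, keep):
    """Keep only the cells for which keep(column, value) is true; drop empty rows."""
    out = {}
    for row_key, cells in rows.items():
        kept = {col: val for col, val in cells.items() if keep(col, val)}
        if kept:
            out[row_key] = kept
    return out

def value_filter(rows, op, value):
    # Two inputs: a comparison operator and a value to compare against.
    return filter_cells(rows, lambda col, val: op(val, value))

def qualifier_filter(rows, qualifier):
    # Matches on the column name, the part after "family:".
    return filter_cells(rows, lambda col, val: col.split(":", 1)[1] == qualifier)

def family_filter(rows, family):
    # Matches on the column family, the part before ":".
    return filter_cells(rows, lambda col, val: col.split(":", 1)[0] == family)

print(value_filter(rows, operator.eq, "hr"))
print(family_filter(rows, "personal"))
```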
to count records (row keys):
count 'table name'