hive optimization

3 ways to optimize:

table design (at creation time)

(2 options: partitioning and bucketing)

both divide the data into smaller parts.

query structure (efficient queries)

(joins take a lot of time -- so this is optimization at the query level)

(join optimizations)

simplified queries (simple queries)

(e.g. windowing functions)
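
As an illustration of the "simplified queries" point, a windowing function can do in one query what would otherwise need a self-join or a subquery. This is only a sketch: the columns `customer_id` and `order_date` on `orders` are assumed, they are not given in these notes.

```sql
-- Assumed columns: orders(id, customer_id, order_date); hypothetical, not from the notes.
-- Latest order per customer, without joining orders to itself.
SELECT id, customer_id, order_date
FROM (
  SELECT id,
         customer_id,
         order_date,
         ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rn
  FROM orders
) ranked
WHERE rn = 1;
```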

partitioning:

dividing the data based on the values of a column

in the warehouse dir, each value gets its own folder:

user/hive/warehouse/treandytech.db/customers/state=CA

user/hive/warehouse/treandytech.db/customers/state=NY

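for a filter like state = 'CA', only that dir is scanned

less data scanned -- performance gain

the optimization applies only if the query filters on the partition column

partitioning should be done on the columns used as filters in the most common queries.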
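
A minimal sketch of a table partitioned by state, matching the folders shown above. The columns `id` and `name` are assumed for illustration; only `state` comes from the notes.

```sql
-- Sketch only: id and name are assumed columns.
CREATE TABLE customers (
  id   INT,
  name STRING
)
PARTITIONED BY (state STRING);   -- state becomes a folder level, e.g. .../customers/state=CA

-- Filter on the partition column: only the state=CA folder needs to be read.
SELECT * FROM customers WHERE state = 'CA';

-- Filter on a non-partition column: every partition folder is scanned.
SELECT * FROM customers WHERE name = 'John';
```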

issues:

if the column has lots of distinct values (very high cardinality) -- then we don't partition on it

lots of folders would be created.

two types of partitioning:

static:

(we should have an idea of the data and load each partition manually)

dynamic:

(partitions are created automatically from the data.)

(used when we don't know the data in advance)

static is faster than dynamic
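
The two loading styles could look roughly like this, assuming a `customers` table partitioned by `state`. The file path and the `customers_staging` table are hypothetical.

```sql
-- Static: we name the partition ourselves for each load.
-- The path is hypothetical; the file must already match the table's storage format.
LOAD DATA LOCAL INPATH '/tmp/customers_ca.txt'
INTO TABLE customers PARTITION (state = 'CA');

-- Dynamic: Hive derives the partition from the last column of the SELECT.
-- customers_staging is an assumed unpartitioned table with columns (id, name, state).
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT INTO TABLE customers PARTITION (state)
SELECT id, name, state
FROM customers_staging;
```

The static load is a plain file move into one known folder, while the dynamic insert runs a job that has to work out the target partition for every row, which fits the note that static is faster.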

partitioning -- works well with low cardinality

if there are many distinct values -- then go for bucketing

no of partitions = no of distinct values of the partition column

bucketing:

-------------------

we have to define a fixed no of buckets -- during table creation.

according to the data -- we do trials, then decide the no of buckets.

each partition= folder

each bucket= file
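
A sketch of a bucketed table. The bucket count 3 is taken from the example in these notes; the columns other than `id` and the `orders_staging` table are assumed.

```sql
-- Sketch: orders bucketed on id into 3 buckets (3 files).
-- customer_id and order_date are assumed columns.
CREATE TABLE orders (
  id          INT,
  customer_id INT,
  order_date  STRING
)
CLUSTERED BY (id) INTO 3 BUCKETS;

-- On Hive versions before 2.0 only, bucketing had to be enforced first:
--   SET hive.enforce.bucketing = true;

-- Populate through an INSERT ... SELECT so Hive distributes rows into the buckets.
-- orders_staging is an assumed unbucketed table with the same columns.
INSERT INTO TABLE orders
SELECT id, customer_id, order_date
FROM orders_staging;
```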

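while querying

select * from orders where id = 4

the bucket is found with mod: (hash of the bucketing column) % (no of buckets)

if no of buckets = 3

then 4 % 3 = 1..

so it looks only in bucket no 1 (buckets are numbered from 0, so this is the second file)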
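
The mod rule can be tried directly in a query. This mirrors the simplified rule from the notes; Hive really applies the modulo to a hash of the bucketing column, and the hash function depends on the Hive version and the column type, so treat it as an illustration of the idea and not of the exact file chosen.

```sql
-- Simplified rule from the notes: bucket number = id mod number of buckets.
-- pmod() is Hive's positive-modulo function; 3 is the bucket count from the example.
SELECT id, pmod(id, 3) AS bucket_no   -- for id = 4 this is 1, matching 4 % 3 = 1
FROM orders
WHERE id = 4;
```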

high cardinality -- go for bucketing

by default the max no of dynamic partitions (hive.exec.max.dynamic.partitions) is set to 1000

if it is exceeded then hive gives an error.. you can change this no -- but you should not
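
The limit here is most likely the cap on dynamic partitions. That mapping to the two properties below is an assumption, since the notes do not name the setting. Running `SET` with only the property name prints the current value without changing it.

```sql
-- Print the current limits; nothing is modified.
SET hive.exec.max.dynamic.partitions;          -- cap per statement
SET hive.exec.max.dynamic.partitions.pernode;  -- cap per mapper/reducer node
```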

partitions -- vary a lot in size..

but buckets -- almost the same size

so a bucket is a good sample -- if someone asks for a sample of the data

-- then give them a bucket.
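
Handing over one bucket as a sample can be written with TABLESAMPLE. A sketch, assuming `orders` was created with CLUSTERED BY (id) INTO 3 BUCKETS:

```sql
-- Reads roughly one third of the data: bucket 1 out of 3, hashed on id.
-- In TABLESAMPLE the bucket numbering starts at 1.
SELECT *
FROM orders TABLESAMPLE (BUCKET 1 OUT OF 3 ON id);
```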

-------------------------

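we can combine both partitioning and bucketing in a hive table

partitioning, then bucketing inside each partition -- possible

bucketing, then partitioning inside each bucket -- we can't have (a bucket is a file, it can't contain folders)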
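
Combining the two in one table means partition folders on the outside with bucket files inside each folder. A sketch; the table name, the columns `id` and `name`, and the bucket count 4 are hypothetical.

```sql
CREATE TABLE customers_part_bucket (
  id   INT,
  name STRING
)
PARTITIONED BY (state STRING)        -- one folder per state
CLUSTERED BY (id) INTO 4 BUCKETS;    -- 4 files inside each state folder
```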

benefits of bucketing

--faster query response

--join optimization
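
One way the join benefit shows up is the bucket map join. The sketch below assumes two hypothetical tables, `orders_b` bucketed on `customer_id` and `customers_b` bucketed on `id`, with bucket counts that are multiples of each other. Whether Hive actually picks this join strategy also depends on the Hive version and other settings.

```sql
-- Assumed tables (not from the notes):
--   orders_b    CLUSTERED BY (customer_id) INTO 4 BUCKETS
--   customers_b CLUSTERED BY (id)          INTO 4 BUCKETS
SET hive.optimize.bucketmapjoin = true;

-- Matching keys land in matching buckets, so each bucket of orders_b
-- only needs the corresponding bucket of customers_b.
SELECT o.id, c.name
FROM orders_b o
JOIN customers_b c
  ON o.customer_id = c.id;
```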
