Hive optimization

3 ways to optimize:


table design (at creation time)

(2 options: partitioning and bucketing)

both divide the data into smaller parts.


query structure (efficient queries)

(joins take a lot of time, so this is tuning at the query level)

(join optimizations)


simplified queries (simple queries)

(windowing functions)
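
As a rough sketch of the "simplified queries" point: a windowing function can replace a self-join when we want, say, the largest order per customer. The columns customer_id and amount on orders are hypothetical, they are not from these notes.

```sql
-- Hypothetical columns: orders(id, customer_id, amount).
-- Largest order per customer, without joining orders to itself.
SELECT id, customer_id, amount
FROM (
  SELECT id,
         customer_id,
         amount,
         ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY amount DESC) AS rn
  FROM orders
) ranked
WHERE rn = 1;
```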


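partitioning:

dividing data into directories based on the values of a column


in dir (one sub-directory per value):


user/hive/warehouse/treandytech.db/customers/state=CA

user/hive/warehouse/treandytech.db/customers/state=NY


a query that filters on state = 'CA' scans only the state=CA dir


less data scanned -- performance gain


the optimization only applies when the query filters on the partition column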
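
A minimal sketch of what this looks like in HiveQL, assuming the customers table is partitioned on state as in the directory listing above. The id and name columns are made up for illustration.

```sql
-- Columns other than state are hypothetical.
CREATE TABLE customers (
  id   INT,
  name STRING
)
PARTITIONED BY (state STRING);  -- state becomes a directory level, not a stored column

-- Filter on the partition column: only .../customers/state=CA is read.
SELECT id, name FROM customers WHERE state = 'CA';

-- No filter on state: every partition directory is scanned.
SELECT id, name FROM customers WHERE name = 'Alice';
```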


partitioning should be done on the column most commonly used in query filters.


issues:


if the column has lots of distinct values (very high cardinality), we don't partition on it:

lots of folders would be created.


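two types of partitioning:

static:

(we need to know the data up front and load each partition manually)

dynamic:

(partitions are created automatically from the column values)

(used when we don't know the data in advance)


static is faster than dynamic, since Hive does not have to inspect the rows to decide which partition each one belongs to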
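
The two loading styles might look like this. It is a sketch that assumes customers is partitioned on state; the file path and the customers_staging table are hypothetical.

```sql
-- Static: we name the partition ourselves; the file must hold only CA rows.
-- '/tmp/customers_ca.csv' is a hypothetical path.
LOAD DATA LOCAL INPATH '/tmp/customers_ca.csv'
INTO TABLE customers PARTITION (state = 'CA');

-- Dynamic: Hive derives the partition from the last column of the SELECT.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT INTO TABLE customers PARTITION (state)
SELECT id, name, state
FROM customers_staging;  -- hypothetical unpartitioned staging table
```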


partitioning works well with low cardinality


if there are many distinct values, go for bucketing


no. of partitions = no. of distinct values of the partition column


bucketing:

-------------------

we have to define a fixed no. of buckets during table creation.


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


based on the data, we run trials and then decide the no. of buckets.


each partition= folder

each bucket= file
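
A bucketed table could be declared roughly like this. The columns, the orders_staging table and the bucket count of 3 are assumptions for illustration only.

```sql
-- Hypothetical columns; 3 buckets is an arbitrary choice.
CREATE TABLE orders (
  id          INT,
  customer_id INT,
  amount      DOUBLE
)
CLUSTERED BY (id) INTO 3 BUCKETS;

-- Hive 1.x only: makes inserts honour the declared bucket count.
-- Hive 2.x and later always enforce it and no longer have this setting.
-- SET hive.enforce.bucketing = true;

-- Populate through INSERT ... SELECT so rows are hashed into the 3 files.
INSERT INTO TABLE orders
SELECT id, customer_id, amount
FROM orders_staging;  -- hypothetical source table
```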


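while querying:


select * from orders where id = 4


the bucket is found with a mod on the bucketing column


if no. of buckets = 3

then 4 % 3 = 1

so Hive only needs to look in bucket 1 (buckets are numbered from 0, so this is the second file)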
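
To see the same arithmetic from inside Hive, the built-in hash() and pmod() functions can be used. This is a sketch that assumes id is an INT column; newer Hive versions may use a different hash for the actual bucket placement, so treat it as an illustration of the idea rather than of the exact file a row lands in.

```sql
-- Which bucket number does each id map to with 3 buckets?
-- hash() of an INT is the value itself; pmod keeps the result non-negative.
SELECT id, pmod(hash(id), 3) AS bucket_no
FROM orders;
-- For id = 4 this gives pmod(4, 3) = 1.
```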

high cardinality -- go for bucketing


by default the no. of dynamic partitions one statement can create is capped at 1000 (hive.exec.max.dynamic.partitions), and at 100 per node (hive.exec.max.dynamic.partitions.pernode)

if the cap is exceeded Hive gives an error. you can change this no., but you usually should not
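
The limits on dynamic partitions can be inspected and changed per session, roughly as below. The raised values are arbitrary examples, not recommendations.

```sql
-- Print the current limits (SET with no value shows the setting).
SET hive.exec.max.dynamic.partitions;
SET hive.exec.max.dynamic.partitions.pernode;

-- Raising them is possible, though hitting the cap usually signals that
-- the partition column has too high a cardinality. Values are arbitrary.
SET hive.exec.max.dynamic.partitions = 2000;
SET hive.exec.max.dynamic.partitions.pernode = 200;
```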


partitions vary a lot in size

but buckets are almost the same size



a bucket is a good sample of the data -- if someone asks for a sample

-- then give them a bucket.


rows are spread across buckets by hash, so each bucket is a roughly even slice of the table
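
Hive has a TABLESAMPLE clause for exactly this. A sketch, assuming orders was created with CLUSTERED BY (id) INTO 3 BUCKETS:

```sql
-- Returns the rows that hash to the first of 3 buckets on id,
-- roughly one third of the table. TABLESAMPLE counts buckets from 1.
SELECT *
FROM orders TABLESAMPLE (BUCKET 1 OUT OF 3 ON id);
```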


-------------------------


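we can combine both partitioning and bucketing in a hive table


partitioning, then bucketing inside each partition -- we can have

bucketing, then partitioning inside each bucket -- we can't have (a bucket is a file, and a file can't hold folders)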
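
A sketch of a table that uses both, with buckets nested inside partitions. The table name, columns and bucket count are hypothetical.

```sql
-- One folder per state, 4 bucket files inside each folder.
CREATE TABLE customers_bucketed (
  id   INT,
  name STRING
)
PARTITIONED BY (state STRING)
CLUSTERED BY (id) INTO 4 BUCKETS;

-- Resulting layout (sketch):
--   .../customers_bucketed/state=CA/000000_0 ... 000003_0
--   .../customers_bucketed/state=NY/000000_0 ... 000003_0
```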


benefits of bucketing


--faster query response

--join optimization
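
One way the join benefit can show up is a bucket map join. The sketch below assumes orders is bucketed on a hypothetical customer_id column and customers on id, with bucket counts that are multiples of each other. Whether Hive actually picks this plan depends on the version and on other settings.

```sql
-- Let Hive match bucket to bucket instead of scanning the whole smaller table.
SET hive.optimize.bucketmapjoin = true;

SELECT o.id, c.name
FROM orders o
JOIN customers c
  ON o.customer_id = c.id;
```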

