Error details

This problem showed up when JOINing a Hive table that has a STRUCT-typed field.

The HQL was syntactically fine, yet beeline spat out the following error message. (The Tez engine was used.)

INFO  : Map 1: 26(+2)/28        Map 5: 337(+33)/370     Map 6: 1(+1)/2  Reducer 3: 0(+0,-415)/377       Reducer 4: 0/208
ERROR : Status: Failed
ERROR : Vertex re-running, vertexName=Map 5, vertexId=vertex_1475821062280_208694_1_01
ERROR : Vertex re-running, vertexName=Map 6, vertexId=vertex_1475821062280_208694_1_00
ERROR : Vertex re-running, vertexName=Map 1, vertexId=vertex_1475821062280_208694_1_02
ERROR : Vertex re-running, vertexName=Map 5, vertexId=vertex_1475821062280_208694_1_01
ERROR : Vertex re-running, vertexName=Map 6, vertexId=vertex_1475821062280_208694_1_00
ERROR : Vertex re-running, vertexName=Map 1, vertexId=vertex_1475821062280_208694_1_02
ERROR : Vertex re-running, vertexName=Map 5, vertexId=vertex_1475821062280_208694_1_01
ERROR : Vertex re-running, vertexName=Map 6, vertexId=vertex_1475821062280_208694_1_00
ERROR : Vertex failed, vertexName=Reducer 3, vertexId=vertex_1475821062280_208694_1_03, diagnostics=[Task failed, taskId=task_1475821062280_208694_1_03_000165, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task: attempt_1475821062280_208694_1_03_000165_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"_col0":"값","_col1":"306960"},"value":{"_col0":6}}
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:187)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:351)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:71)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:59)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:59)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:36)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"_col0":"값","_col1":"306960"},"value":{"_col0":6}}
        at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:270)
        at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:169)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:164)
        ... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"_col0":"값","_col1":"306960"},"value":{"_col0":6}}
        at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:338)
        at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:259)
        ... 16 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unexpected exception: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct cannot be cast to [Ljava.lang.Object;
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:318)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1047)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:858)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:718)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:786)
        at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:329)

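Solution

I suspect this is a bug. The error message does not exactly match the one in the issue linked below, but it looks like a similar case.

After a lot of trial and error, the query succeeded once I disabled map join.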

https://issues.apache.org/jira/browse/HIVE-11051

 

Before running the query, I set the following option to false, and the query then ran successfully.

set hive.auto.convert.join=false;

In case you are wondering what this option is: it controls whether Hive automatically converts a join into a map-side join.

 

By default, a join has to group rows by key, so a Reduce phase is required. (Grouping by key means a shuffle has to happen.)
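To make the reduce-side idea concrete, here is a minimal sketch in plain Python. It is only a conceptual illustration and not how Hive implements it: the function name `reduce_side_join` and the sample rows are made up for this example.

```python
from collections import defaultdict

def reduce_side_join(left, right):
    """Conceptual reduce-side join over (key, value) rows."""
    # "Shuffle": bring rows from both inputs together by join key.
    groups = defaultdict(lambda: ([], []))
    for key, value in left:
        groups[key][0].append(value)   # rows from the left input
    for key, value in right:
        groups[key][1].append(value)   # rows from the right input

    # "Reduce": join the rows within each key group (inner join).
    joined = []
    for key in sorted(groups):
        left_vals, right_vals = groups[key]
        for lv in left_vals:
            for rv in right_vals:
                joined.append((key, lv, rv))
    return joined

# Hypothetical sample data.
orders = [(1, "apple"), (2, "pear"), (1, "plum")]
users = [(1, "kim"), (3, "lee")]
print(reduce_side_join(orders, users))
```

Every row has to be moved to its key group before anything can be joined, and that movement is the shuffle cost.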

However, if one of the tables can be loaded entirely into memory, the join can be done without a Reduce phase. This is called a map-side join.
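A map-side join can be sketched roughly as follows in plain Python. Again, this is a conceptual illustration under the assumption that the small table fits in memory, not Hive's actual implementation: `map_side_join` and the sample rows are hypothetical.

```python
def map_side_join(big_rows, small_rows):
    """Conceptual map-side (hash) join over (key, value) rows."""
    # Build an in-memory lookup from the small table
    # (assumed small enough to fit in memory).
    lookup = {}
    for key, value in small_rows:
        lookup.setdefault(key, []).append(value)

    # Stream the big table; each row is joined on the spot,
    # so no grouping by key (shuffle) is needed.
    for key, value in big_rows:
        for match in lookup.get(key, []):
            yield (key, value, match)

# Hypothetical sample data.
orders = [(1, "apple"), (2, "pear"), (1, "plum")]
users = [(1, "kim"), (3, "lee")]
print(list(map_side_join(orders, users)))
```

The trade-off is memory: the lookup table has to be held in full by every task that streams the big table.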

 

Anyway, it looks like there is some bug in how map-side joins handle STRUCT-typed fields.

As a stopgap, turning this option off lets the query run, so keep it in mind.

 

If you want to know more about Hive joins, see the link below.

https://wikidocs.net/80768

 

 
