Hadoop 2.2 and Flume 1.4 Protobuf Problem and Solution

I have to say the big THANK to the author of  “Hadoop in Practice” : Alex Holmes

Source : http://grepalex.com/2014/02/09/flume-and-hadoop-2.2/

The problem you may encounter while  trying to integrate Hadoop 2.2 and Flume 1.4 is the incompatibility between protobuf versions :

2014-04-15 13:56:23,251 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR – org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:422)] process failed

java.lang.VerifyError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$RecoverLeaseRequestProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
at java.lang.Class.privateGetPublicMethods(Class.java:2651)
at java.lang.Class.privateGetPublicMethods(Class.java:2661)
at java.lang.Class.getMethods(Class.java:1467)
at sun.misc.ProxyGenerator.generateClassFile(ProxyGenerator.java:426)
at sun.misc.ProxyGenerator.generateProxyClass(ProxyGenerator.java:323)
at java.lang.reflect.Proxy.getProxyClass0(Proxy.java:636)
at java.lang.reflect.Proxy.newProxyInstance(Proxy.java:722)
at org.apache.hadoop.ipc.ProtobufRpcEngine.getProxy(ProtobufRpcEngine.java:92)
at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:537)
at org.apache.hadoop.hdfs.NameNodeProxies.createNNProxyWithClientProtocol(NameNodeProxies.java:328)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:235)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:139)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:510)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:453)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2433)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:287)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:226)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:220)
at org.apache.flume.sink.hdfs.BucketWriter$8$1.run(BucketWriter.java:536)
at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:160)
at org.apache.flume.sink.hdfs.BucketWriter.access$1000(BucketWriter.java:56)
at org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:533)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Exception in thread “SinkRunner-PollingRunner-DefaultSinkProcessor” java.lang.VerifyError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$RecoverLeaseRequestProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
at java.lang.Class.privateGetPublicMethods(Class.java:2651)
at java.lang.Class.privateGetPublicMethods(Class.java:2661)
at java.lang.Class.getMethods(Class.java:1467)
at sun.misc.ProxyGenerator.generateClassFile(ProxyGenerator.java:426)
at sun.misc.ProxyGenerator.generateProxyClass(ProxyGenerator.java:323)
at java.lang.reflect.Proxy.getProxyClass0(Proxy.java:636)
at java.lang.reflect.Proxy.newProxyInstance(Proxy.java:722)
at org.apache.hadoop.ipc.ProtobufRpcEngine.getProxy(ProtobufRpcEngine.java:92)
at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:537)
at org.apache.hadoop.hdfs.NameNodeProxies.createNNProxyWithClientProtocol(NameNodeProxies.java:328)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:235)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:139)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:510)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:453)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2433)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:287)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:226)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:220)
at org.apache.flume.sink.hdfs.BucketWriter$8$1.run(BucketWriter.java:536)
at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:160)
at org.apache.flume.sink.hdfs.BucketWriter.access$1000(BucketWriter.java:56)
at org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:533)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

This is his  post :

Google really screwed the pooch with their protobuf 2.5 release. Code generated with protobuf 2.5 is binary incompatible with older protobuf libraries (I guess Google missed the semantic versioning boat on this release). Unfortunately the current stable release of Flume 1.4 packages protobuf 2.4.1 and if you try and use HDFS on Hadoop 2.2 as a sink you’ll be smacked with the following exception:

java.lang.VerifyError: class org.apache.hadoop.security.proto.SecurityProtos$GetDelegationTokenRequestProto
overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
    ...
    at org.apache.hadoop.ipc.ProtobufRpcEngine.getProxy(ProtobufRpcEngine.java:92)
    at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:537)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNNProxyWithClientProtocol(NameNodeProxies.java:328)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:235)

Hadoop 2.2 uses protobuf 2.5 for its RPC, and Flume loads its older packaged version of protobuf ahead of Hadoop’s, which causes this error. To fix this you’ll need to move both protobuf and guava out of Flume’s lib directory. The following command moves them into your home directory.

$ mv ${flume_bin}/lib/{protobuf-java-2.4.1.jar,guava-10.0.1.jar} ~/

Now if you restart your Flume agent you’ll be able to target HDFS as a sink with Hadoop 2.2. Great success!

Flume’s next release will move to protobuf 2.5 so this problem should magically disappear in due course.