
Problems hit with distcp on hadoop 2.4.0

Source: collected by IT165  Published: 2015-01-06 20:10:42

Recently, while helping a business team migrate data from hadoop 0.20.203 to hadoop 2.4.0, distcp reported several errors. They are recorded here:

1. Permission denied error

 

15/01/06 10:48:37 ERROR tools.DistCp: Unable to cleanup meta folder: /DistCp

org.apache.hadoop.security.AccessControlException: Permission denied: user=weibo_bigdata_uquality, access=WRITE, inode="/":hadoop:supergroup:drwxr-xr-x

at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:274)

at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:260)

at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:241)


Fix: change the permissions on the /DistCp directory

 

hadoop fs -chmod 777 /DistCp

One open question remains: when running distcp I passed the -log option and pointed the log directory at a user directory I do have write access to, yet the error above still appeared. Presumably the /DistCp "meta folder" is distcp's own staging area, which is separate from the -log directory, so -log has no effect on it.

 

2. HTTP 401 error

 

15/01/06 10:48:37 ERROR tools.DistCp: Exception encountered

java.io.IOException: Server returned HTTP response code: 401 for URL:

at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1459)

at org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.fetchList(HftpFileSystem.java:462)

at org.apache.hadoop.hdfs.web.HftpFileSystem$LsParser.getFileStatus(HftpFileSystem.java:474)

at org.apache.hadoop.hdfs.web.HftpFileSystem.getFileStatus(HftpFileSystem.java:503)

at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)

at org.apache.hadoop.fs.Globber.glob(Globber.java:248)

at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1623)

at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)

at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:80)

at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:327)

at org.apache.hadoop.tools.DistCp.execute(DistCp.java:151)

at org.apache.hadoop.tools.DistCp.run(DistCp.java:118)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

at org.apache.hadoop.tools.DistCp.main(DistCp.java:373)

This error is related to our Hadoop cluster's access-control setup; it went away after configuring the ugi and extrahosts settings on the hadoop 1.0 (source) cluster.
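For reference: on a stock hadoop 1.x source, the server side of an hftp read runs as the user named by dfs.web.ugi in hdfs-site.xml (default webuser,webgroup), while a 401 comes from whatever HTTP auth filter the cluster runs; "extrahosts" sounds like a site-specific hosts allow-list rather than a stock Hadoop property. A sketch of the kind of change meant here, with placeholder values:

```xml
<!-- hdfs-site.xml on the hadoop 1.0 (source) cluster.
     The user,group value below is a placeholder for whatever identity
     your HTTP filter accepts; adjust to your site's configuration. -->
<property>
  <name>dfs.web.ugi</name>
  <value>hadoop,supergroup</value>
</property>
```

After editing, the 1.x namenode and datanodes need a restart for the new value to take effect.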

 

3. Checksum mismatch error

With the two errors above fixed, distcp ran, but during the copy it reported a checksum mismatch:

 

2015-01-06 11:32:37,604 ERROR [main] org.apache.hadoop.tools.util.RetriableCommand: Failure in Retriable command: Copying ... to ....

java.io.IOException: Check-sum mismatch between ..... and .....

at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.compareCheckSums(RetriableFileCopyCommand.java:190)

at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doCopy(RetriableFileCopyCommand.java:125)

at org.apache.hadoop.tools.mapred.RetriableFileCopyCommand.doExecute(RetriableFileCopyCommand.java:95)

at org.apache.hadoop.tools.util.RetriableCommand.execute(RetriableCommand.java:87)

at org.apache.hadoop.tools.mapred.CopyMapper.copyFileWithRetry(CopyMapper.java:258)

at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:229)

at org.apache.hadoop.tools.mapred.CopyMapper.map(CopyMapper.java:45)

at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)

at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)

at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)

at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:396)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1550)

at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

A web search turned up the following explanation and fix:

Running distcp on a CDH4 YARN cluster with a CDH3 hftp source will fail if the CRC checksum type being used is the CDH4 default (CRC32C). This is because the default checksum type was changed in CDH4 from the CDH3 default of CRC32.

You can work around this issue by changing the CRC checksum type on the CDH4 cluster to the CDH3 default, CRC32. To do this set dfs.checksum.type to CRC32 in hdfs-site.xml.

In other words, hadoop 1.0 checksums blocks with CRC32, but hadoop 2.0 changed the default checksum type to CRC32C, so the two sides can never match. The workaround: make the hadoop 2.0 side use CRC32 as well.
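The two variants really are different polynomials, so the same bytes produce different values; HDFS builds its file checksum from the per-chunk CRCs, so once the chunk CRC types differ the file checksums are incomparable. A minimal pure-Python sketch of the difference (the bitwise crc32c below is for illustration only, not Hadoop's implementation):

```python
import zlib

def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli): reflected polynomial 0x82F63B78,
    initial value and final XOR both 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Condition is evaluated on the pre-shift value of crc.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

payload = b"123456789"
print(hex(zlib.crc32(payload)))  # CRC-32  check value: 0xcbf43926
print(hex(crc32c(payload)))      # CRC-32C check value: 0xe3069283
```

Both printed values are the standard published check values for the two polynomials, which makes the mismatch easy to reproduce outside Hadoop.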

Reference: http://blog.csdn.net/map_lixiupeng/article/details/27542625

Set the parameter dfs.checksum.type to CRC32.
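The same setting can be made cluster-wide in hdfs-site.xml on the hadoop 2.x destination; the per-job -D form in the command below avoids touching cluster config, which is usually preferable for a one-off migration:

```xml
<!-- hdfs-site.xml on the hadoop 2.x (destination) cluster -->
<property>
  <name>dfs.checksum.type</name>
  <value>CRC32</value>
</property>
```

Alternatively, distcp's -skipcrccheck flag (honored together with -update) disables the comparison entirely, at the cost of not verifying the copied data.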

The final submitted command:

hadoop distcp -Ddfs.checksum.type=CRC32 -log /user/${user_name}/DistCp hftp://example1:50070/${path} hdfs://example2:8020/${path}

 
