Error message:
java.io.IOException: java.sql.BatchUpdateException: Incorrect string value: '\xD6\xD0\xB9\xFA\xB9\xA4...' for column 'content' at row 1
    at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:340)
    at org.apache.gora.sql.store.SqlStore.close(SqlStore.java:185)
    at org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:55)
    at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:579)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:650)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
Caused by: java.sql.BatchUpdateException: Incorrect string value: '\xD6\xD0\xB9\xFA\xB9\xA4...' for column 'content' at row 1
    at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1666)
    at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1082)
    at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:328)
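The bytes quoted in the error are Chinese text, not random garbage. Decoded as GBK (the common legacy encoding for Simplified Chinese pages), the first six bytes are the characters 中国工; the trailing "..." means MySQL cut off the rest. The small Java program below is a diagnostic sketch only, not part of Nutch or Gora, and the class and method names are made up for illustration. It decodes those six bytes as GBK and prints the UTF-8 bytes of the same characters, which is the form a UTF-8 column expects to receive.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GbkBytesDemo {
    // The six bytes quoted in the MySQL error (the remainder was truncated as "...").
    static final byte[] ERROR_BYTES = {
        (byte) 0xD6, (byte) 0xD0, (byte) 0xB9, (byte) 0xFA, (byte) 0xB9, (byte) 0xA4
    };

    // Decode raw bytes as GBK, assuming the crawled page was GBK-encoded.
    static String decodeGbk(byte[] bytes) {
        return new String(bytes, Charset.forName("GBK"));
    }

    // Format bytes as space-separated uppercase hex, e.g. "E4 B8 AD".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = decodeGbk(ERROR_BYTES);
        // Print code points so the output does not depend on the console encoding.
        text.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
        // The same characters re-encoded as UTF-8.
        System.out.println(toHex(text.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Running it prints U+4E2D, U+56FD and U+5DE5 (中, 国, 工), followed by `E4 B8 AD E5 9B BD E5 B7 A5`.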
Solution:
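In Nutch 2.1, set the following property (site-specific overrides normally go in conf/nutch-site.xml):
<property>
  <name>encodingdetector.charset.min.confidence</name>
  <value>1</value>
  <description>An integer between 0-100 indicating minimum confidence value for charset auto-detection. Any negative value disables auto-detection.</description>
</property>
Also make sure the MySQL database (including the table that holds the 'content' column) uses UTF-8 encoding.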
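The property description above implies a simple rule: a negative value turns charset auto-detection off, and otherwise a detected charset is accepted only when its confidence reaches the minimum. The Java sketch below models only that rule as stated in the description; it is not Nutch's actual EncodingDetector, which also weighs HTTP headers and HTML meta tags. The class and method names, the "GB18030" detection result, the confidence of 40 and the "windows-1252" fallback are all hypothetical values chosen for illustration.

```java
import java.util.Optional;

public class MinConfidenceDemo {
    // Hypothetical model of encodingdetector.charset.min.confidence as described:
    // negative = auto-detection disabled; otherwise accept the detected charset
    // only if its confidence (0-100) is at least the minimum.
    static Optional<String> acceptDetected(String detectedCharset, int confidence, int minConfidence) {
        if (minConfidence < 0) {
            return Optional.empty(); // auto-detection disabled
        }
        return confidence >= minConfidence ? Optional.of(detectedCharset) : Optional.empty();
    }

    // Use the detected charset when accepted, otherwise a hypothetical fallback.
    static String chooseCharset(String detected, int confidence, int minConfidence, String fallback) {
        return acceptDetected(detected, confidence, minConfidence).orElse(fallback);
    }

    public static void main(String[] args) {
        // Detection disabled: the fallback wins even though a charset was detected.
        System.out.println(chooseCharset("GB18030", 40, -1, "windows-1252"));
        // Threshold of 1: almost any detection result is accepted.
        System.out.println(chooseCharset("GB18030", 40, 1, "windows-1252"));
    }
}
```

With a value of 1, practically any auto-detection result is trusted, so Chinese pages are more likely to be decoded with a Chinese charset instead of a Western default before being written to MySQL.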