When crawling web pages and saving their resource files, the download fails with java.io.IOException: Server returned HTTP response code: 403 for URL.
Solution:
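URLConnection openConnection = new URL(href).openConnection();
openConnection.addRequestProperty("User-Agent", Config.DEFAULT_USER_AGENT); // must be set before connect()
openConnection.connect();
// InputStream in = url.openStream(); // sends Java's default User-Agent, which some servers reject
// try-with-resources closes both streams even if the copy fails part-way
try (InputStream in = openConnection.getInputStream();
     OutputStream out = new BufferedOutputStream(new FileOutputStream(savePath))) {
    byte[] buffer = new byte[8192]; // copy in chunks instead of one byte per read() call
    for (int len; (len = in.read(buffer)) != -1;) {
        out.write(buffer, 0, len);
    }
}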
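The key is the second line, the addRequestProperty call that sets the User-Agent header. Without it, the request goes out with Java's default User-Agent, and servers that are configured to block crawlers reject it, which surfaces as the 403 exception.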
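For reference, the same idea can be wrapped in a small reusable helper. This is only a minimal sketch under a few assumptions: the class name ResourceDownloader, the download method, the timeout values, and the USER_AGENT string are all hypothetical (the value of Config.DEFAULT_USER_AGENT is not shown above), and some sites may require other headers or forbid crawling altogether.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

final class ResourceDownloader {
    // Assumed placeholder; substitute whatever Config.DEFAULT_USER_AGENT holds.
    static final String USER_AGENT = "Mozilla/5.0 (compatible; ExampleCrawler/1.0)";

    /** Downloads href to savePath and returns the number of bytes written. */
    static long download(String href, Path savePath) throws IOException {
        URLConnection conn = new URL(href).openConnection();
        // Request headers have to be set before the connection is opened.
        conn.setRequestProperty("User-Agent", USER_AGENT);
        conn.setConnectTimeout(10_000); // milliseconds, arbitrary choice
        conn.setReadTimeout(10_000);

        // For HTTP(S) URLs, check the status up front for a clearer error message.
        if (conn instanceof HttpURLConnection) {
            int status = ((HttpURLConnection) conn).getResponseCode();
            if (status >= 400) {
                throw new IOException("HTTP " + status + " for " + href);
            }
        }

        // Files.copy streams the body to disk and overwrites an existing file.
        try (InputStream in = conn.getInputStream()) {
            return Files.copy(in, savePath, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java ResourceDownloader <url> <file>");
            return;
        }
        long bytes = download(args[0], Paths.get(args[1]));
        System.out.println("Saved " + bytes + " bytes to " + args[1]);
    }
}
```

If a server still answers 403 with a browser-like User-Agent, it may be checking other things (Referer, cookies, rate limits), so this header alone is not guaranteed to be enough.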