Writing Your Own Scraper

Source: Internet  Date: 2017-05-11


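There are plenty of ready-made scrapers online, but sometimes you come across a good site and want a tool of your own to collect some of its information, which means writing the program yourself. Such a scraper is actually not hard to write; most of the work lies in analyzing the page structure of the source site.
First, get hold of an XMLHTTP class file such as this one: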
<%
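Class xhttp
    ' cset: charset used to decode responses; sUrl: target URL; sError: last error number
    Private cset, sUrl, sError

    Private Sub Class_Initialize()
        'cset = "UTF-8"
        cset = "GB2312"
        sError = ""
    End Sub

    Private Sub Class_Terminate()
    End Sub

    ' URL to fetch
    Public Property Let URL(theurl)
        sUrl = theurl
    End Property

    ' Everything before the last "/" of the URL
    Public Property Get BasePath()
        BasePath = Mid(sUrl, 1, InStrRev(sUrl, "/") - 1)
    End Property

    ' Everything after the last "/" of the URL
    Public Property Get FileName()
        FileName = Mid(sUrl, InStrRev(sUrl, "/") + 1)
    End Property

    ' Source of the page at the URL, decoded with cset
    Public Property Get Html()
        Html = BytesToBstr(getBody(sUrl))
    End Property

    ' Error number from the last request, or "" if there was none
    Public Property Get xhttpError()
        xhttpError = sError
    End Property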

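    ' Decode a byte array into a string using the charset in cset (GB2312 or UTF-8)
    Private Function BytesToBstr(body)
        On Error Resume Next
        Dim objstream
        Set objstream = Server.CreateObject("adodb.stream")
        With objstream
            .Type = 1          ' adTypeBinary: accept raw bytes
            .Mode = 3          ' adModeReadWrite
            .Open
            .Write body        ' load the response bytes
            .Position = 0      ' rewind before switching the type
            .Type = 2          ' adTypeText: read the data back as text
            .Charset = cset    ' decode with the configured charset
            BytesToBstr = .ReadText
            .Close
        End With
        Set objstream = Nothing
    End Function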

    ' Fetch surl and return the raw response bytes ("" if the request did not complete)
    Private Function getBody(surl)
        On Error Resume Next
        Dim xmlHttp
        'Set xmlHttp = Server.CreateObject("Msxml2.XMLHTTP.4.0")
        'Set xmlHttp = Server.CreateObject("Microsoft.XMLHTTP")
        Set xmlHttp = Server.CreateObject("MSXML2.ServerXMLHTTP")
        ' Timeouts in milliseconds: resolve, connect, send, receive
        xmlHttp.setTimeouts 10000, 10000, 10000, 30000
        xmlHttp.open "GET", surl, False
        xmlHttp.send
        If xmlHttp.readyState = 4 Then
            'If xmlHttp.status = 200 Then
            getBody = xmlHttp.responseBody
            'End If
        Else
            getBody = ""
        End If

        ' Remember the error number, if any, so callers can read it via xhttpError
        If Err.Number <> 0 Then
            sError = Err.Number
            Err.Clear
        Else
            sError = ""
        End If
        Set xmlHttp = Nothing
    End Function

    ' Download the current URL and save it to tofile (a virtual path).
    ' An existing file is left untouched unless isoverwrite is True.
    Public Function saveimage(tofile, isoverwrite)
        On Error Resume Next
        Dim objStream, objFSO, imgs

        If Not isoverwrite Then
            Set objFSO = Server.CreateObject("Scripting.FileSystemObject")
            If objFSO.FileExists(Server.MapPath(tofile)) Then
                Set objFSO = Nothing
                Exit Function
            End If
            Set objFSO = Nothing
        End If

        imgs = getBody(sUrl)
        Set objStream = Server.CreateObject("ADODB.Stream")
        With objStream
            .Type = 1      ' adTypeBinary
            .Open
            .Write imgs
            .SaveToFile Server.MapPath(tofile), 2   ' adSaveCreateOverWrite
            .Close
        End With
        Set objStream = Nothing
    End Function

End Class

%>
With this class file in place, the work becomes much easier.
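As a rough sketch of how the class might be called from an ASP page: the file name xhttp.asp, both URLs, and the target path are placeholders chosen for illustration, not names from the original article. Note that the class decodes pages as GB2312 unless cset is changed.

```vbscript
<!--#include file="xhttp.asp"-->
<%
' xhttp.asp is an assumed file name for the class above;
' the URLs and the target path below are placeholders.
Dim fetcher, source

Set fetcher = New xhttp

' Fetch a page and check whether the request raised an error
fetcher.URL = "http://www.example.com/list/index.htm"
source = fetcher.Html
If CStr(fetcher.xhttpError) <> "" Then
    Response.Write "Request failed, error number " & fetcher.xhttpError & "<br>"
Else
    Response.Write "Fetched " & Len(source) & " characters from " & fetcher.FileName & "<br>"
End If

' Download a binary file; False keeps an existing file instead of overwriting it
fetcher.URL = "http://www.example.com/images/logo.gif"
fetcher.saveimage "images/logo.gif", False

Set fetcher = Nothing
%>
```

Because every method in the class runs under On Error Resume Next, a failed request does not stop the page; checking xhttpError after each fetch is the only way to notice it.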
Next, analyze the page structure of the site you want to scrape and write the scraper itself.
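Much of that analysis comes down to finding fixed pieces of markup that sit immediately before and after the data you want. The example that follows calls two helpers, GetContent and stripHTML, whose definitions do not appear in this article. The sketch below is only a guess at what they might look like: the parameter names, the meaning of the fourth argument (0 drops the markers, anything else keeps them), and the handling of empty markers are all assumptions.

```vbscript
' Hypothetical helper: return the text between startMark and endMark.
' includeMarks = 0 drops the markers, any other value keeps them (assumed meaning).
' An empty startMark means "from the beginning"; an empty endMark means "to the end".
Function GetContent(str, startMark, endMark, includeMarks)
    Dim s, e
    GetContent = ""
    If startMark = "" Then
        s = 1
    Else
        s = InStr(1, str, startMark, vbTextCompare)
        If s = 0 Then Exit Function          ' start marker not found
    End If
    If endMark = "" Then
        e = Len(str) + 1
    Else
        e = InStr(s + Len(startMark), str, endMark, vbTextCompare)
        If e = 0 Then Exit Function          ' end marker not found
    End If
    If includeMarks = 0 Then
        GetContent = Mid(str, s + Len(startMark), e - s - Len(startMark))
    Else
        GetContent = Mid(str, s, e - s + Len(endMark))
    End If
End Function

' Hypothetical helper: remove anything that looks like an HTML tag, then trim.
Function stripHTML(html)
    Dim re
    Set re = New RegExp
    re.Pattern = "<[^>]*>"
    re.Global = True
    stripHTML = Trim(re.Replace(html, ""))
    Set re = Nothing
End Function

' Sample fragment made up for illustration
Dim sample, title
sample = "<div>BT资源名称：<strong><b>Example Title</b></strong></div>"
title = stripHTML(GetContent(sample, "BT资源名称：<strong>", "</strong>", 0))
' title now holds the tag-free text between the two markers
```

Matching on literal markers is brittle: if the source site changes its markup, GetContent returns an empty string, so the markers need to be rechecked whenever results stop coming back.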
Here is an example:



<%
server.ScriptTimeout = 1000
%>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
<title>BT Scraper</title>
</head>
<body>
<form name="form1" method="post" action="get81bt.asp">
Category ID:
  <input type="text" name="cid" value="<%=request("cid")%>"><br>
Start ID:
  <input type="text" name="startid" value="<%=request("startid")%>">
  <br>
  End ID:
  <input type="text" name="overid" value="<%=request("overid")%>">
  <br>
  Category name: <input type="text" name="classname" value="<%=request("classname")%>"> (leave blank to detect automatically)
  <br>
  <input name="action" type="hidden" id="action" value="getdata">
  <input type="submit" name="Submit" value="Scrape">
</form>
Current ID: <%=request("id")%> <br>
<%
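dim action

action = Request("action")
if action = "getdata" then
    cid = Request("cid")
    startid = Request("startid")
    overid = Request("overid")
    id = Request("id")
    ' On the first request no id is passed yet, so begin at the start ID
    if id = "" then id = startid

    set objxhttp = new xhttp

    ' Fetch the list page for this category and ID
    objxhttp.URL = "http://www.81dd.com/Class/"&cid&"_"&id&".htm"
    content = objxhttp.Html

    ' "网站维护中" ("site under maintenance") means the page is unavailable: move on to the next ID
    if InStr(content,"网站维护中") then
        call NextID
        response.End()
    end if

    ' Extract the part of the page that contains the links
    list = GetContent(content,"","",0)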

    Dim regEx, Match, Matches,patrn
    Set regEx = New RegExp
    ' Links to detail pages look like <a href="../BtHtml/...">
    patrn = "<a href=""../BtHtml/(.+?)"">"
    regEx.Pattern = patrn
    regEx.IgnoreCase = True
    regEx.Global = True
    Set Matches = regEx.Execute(list)
    on error resume next
    For Each Match in Matches

        'response.write Match.Value & "<br>"
        ' Rebuild the absolute URL from the captured relative path
        weburl = "http://www.81dd.com/BtHtml/" & Match.SubMatches(0)
        response.write weburl & "<br>"
        response.Flush()

        ' Fetch the detail page
        objxhttp.URL = weburl
        cpage = objxhttp.Html
        cpage = GetContent(cpage,"","",0)

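        ' The title sits between "BT资源名称：<strong>" ("BT resource name:") and "</strong>"
        title = GetContent(cpage,"BT资源名称：<strong>","</strong>",0)
        title = stripHTML(title)

        ' Use the category name from the form if one was given;
        ' otherwise guess it from genre keywords in the title
        IF Request("classname") <> "" then
            classname = Request("classname")
        Else
            if InStr(title,"喜剧") then         ' comedy
                classname = "喜剧"
            Elseif InStr(title,"动作") then     ' action
                classname = "动作"
            Elseif InStr(title,"惊悚") then     ' thriller
                classname = "惊悚"
            Elseif InStr(title,"犯罪") then     ' crime
            ' ... (listing truncated in the source)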
