Python re: problem storing scraped data

Author: Anonymous | Source: Internet | Date: 2017-06-07

Question: Python re, scraping a site and storing the data
Description:

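I like reading Zhihu Daily, so I want to scrape the stories and save them to read later.
But I ran into a page like this one: http://daily.zhihu.com/story/4692091
When I save the scraped data to the database, only the first entry gets stored.
I need the titles and the contents, and I'm using Scrapy with re.compile.
How can I pair each title with its content and store all of them in the database?
I'm still practicing Python...
Scraping code: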

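        ......
        item = ShenhuifuItem()
        sites = response.body
        i = sites
        items = []
        item['bid']=re.compile('(\d+)').findall(response.url)[0]
        # findall() returns a list with every match on the page
        item['title']=re.compile(r'<h2>(.*?)</h2>').findall(i)
        item['content']=re.compile(r'<div>(.*?)</div>',re.DOTALL).findall(i)
        item['author']=re.compile(ur'<span>(.*?)</span>').findall(i)
        # Each loop below reassigns the same field of the same item,
        # so only a single value per field is left when it finishes
        for title in item['title']:
            item['title'] = title
        for content in item['content']:
            item['content'] = content
        for author in item['author']:
            # If the name contains a full-width comma, drop the last
            # character (assumed to be that trailing comma)
            if "，" in author:
                item['author'] = author[:-1]
            else:
                item['author'] = author
        items.append(item)
        # One item is yielded per page, however many titles matched
        yield item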

Solution 1:

Thanks, everyone. I solved it by first matching the part of the page I need, then looping over it with for...in....
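Solution 1 gives no code. A minimal sketch of the idea, assuming each entry on the page is an `<h2>` title followed by `<div>` content, as the question's own patterns suggest (the real Zhihu Daily markup may differ): first match each title together with everything up to the next `<h2>`, then loop over those blocks with for...in and build a separate record per block. `parse_story`, `BLOCK_RE` and the sample HTML are hypothetical, and plain dicts stand in for `ShenhuifuItem`.

```python
import re

# Hypothetical patterns based on the question's <h2>/<div> regexes.
# Each block is one <h2> title plus everything up to the next <h2>
# (or the end of the page), so a title always stays with its own content.
BLOCK_RE = re.compile(r'<h2>(.*?)</h2>(.*?)(?=<h2>|\Z)', re.DOTALL)
CONTENT_RE = re.compile(r'<div>(.*?)</div>', re.DOTALL)
ID_RE = re.compile(r'(\d+)')


def parse_story(url, html):
    """Return one dict per title block found in html (hypothetical helper)."""
    bid = ID_RE.findall(url)[0]
    records = []
    for title, block in BLOCK_RE.findall(html):
        # Join every <div> inside this block into one content string.
        content = '\n'.join(part.strip() for part in CONTENT_RE.findall(block))
        # A new dict on every iteration, instead of overwriting one shared item.
        records.append({'bid': bid, 'title': title.strip(), 'content': content})
    return records


sample = ('<h2>Question A</h2><div>Answer A</div>'
          '<h2>Question B</h2><div>\nAnswer B\n</div>')
for record in parse_story('http://daily.zhihu.com/story/4692091', sample):
    print(record)
```

Inside the Scrapy callback, the same loop would create a new `ShenhuifuItem()` per block and `yield` it, so every title/content pair reaches the pipeline, and the database, as its own row. Regex stays brittle for HTML (a nested `<div>`, for example, ends the non-greedy match early), so Scrapy's selector-based `response.css()` / `response.xpath()` are the sturdier choice.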

Solution 2:

re.compile(r'<h2>(.*?)</h2>', re.M)

This turns on multiline matching mode.
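For reference when weighing Solution 2: in Python's `re` module, `re.M` (`re.MULTILINE`) only changes how `^` and `$` match, making them match at every line boundary. Letting `.` match across line breaks is the job of `re.DOTALL` (`re.S`), which the question's content pattern already passes. A quick check on a made-up HTML string:

```python
import re

html = '<h2>First\ntitle</h2>\n<h2>Second title</h2>'

# re.M only changes ^ and $ so they match at every line boundary;
# it does not let '.' match a newline.
multiline = re.compile(r'<h2>(.*?)</h2>', re.M).findall(html)

# re.DOTALL (alias re.S) is the flag that lets '.' match '\n'.
dotall = re.compile(r'<h2>(.*?)</h2>', re.DOTALL).findall(html)

# ^ is where re.M makes a difference.
starts = re.findall(r'^<h2>', html)
starts_m = re.findall(r'^<h2>', html, re.M)

print(multiline)                   # ['Second title']
print(dotall)                      # ['First\ntitle', 'Second title']
print(len(starts), len(starts_m))  # 1 2
```

So for a title that spans lines, `re.DOTALL` is the flag that changes the result; `re.M` leaves this pattern's matches unchanged.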
