![](/uploads/image/0008.jpg)
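Technical Implementation of Extracting Hot Words from Self-Media News
Author: Ye Yuxiang (叶宇翔) Source: Computer Knowledge and Technology (《电脑知识与技术》), 2018, No. 17
Abstract: A Python-based web crawler was used to capture the headlines of hot news pushed by Toutiao (今日头条) and Yidian Zixun (一点资讯), and a Python-based Chinese word segmentation tool was used to segment the headline data and compile word statistics. To obtain data efficiently, a different crawling technique was used for each website. Over a period of one month, nearly ten thousand hot-news headlines were crawled from Toutiao and other self-media news sites. After word segmentation, frequency statistics, and keyword extraction, the hot words in that month's news were successfully obtained.
Keywords: web crawler; Chinese word segmentation; self-media; news communication; keywords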
CLC number: TP311 Document code: A Article ID: 1009-3044(2018)17-0014-03
Abstract: A Python-based web crawler is used to capture hot-news headline data from “Toutiao” and “www.yidianzixun”, and a Python-based Chinese word segmentation tool is used to segment the headlines and count word frequencies. To obtain data efficiently, different spider technologies are used for different websites. Nearly 10,000 hot-news headlines were crawled from “Toutiao” and other self-media news networks over a period of one month. After word segmentation statistics and keyword extraction were applied to the data, the hot words in the news of that month were successfully obtained.
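The segmentation and counting step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the author's code: the abstract does not name the segmentation tool, so the use of `jieba` (a common third-party Python segmenter) is an assumption, and the helper names `bigram_tokenize`, `get_tokenizer`, and `hot_words` are hypothetical. When `jieba` is not installed, the sketch falls back to overlapping two-character grams, which is a much cruder stand-in for real word segmentation.

```python
import re
from collections import Counter


def bigram_tokenize(text):
    """Fallback tokenizer: overlapping two-character grams of each CJK run."""
    tokens = []
    for run in re.findall(r"[\u4e00-\u9fff]+", text):
        tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


def get_tokenizer():
    """Return jieba.lcut if jieba is available (an assumed choice), else the fallback."""
    try:
        import jieba  # third-party; not named in the paper's abstract
        return jieba.lcut
    except ImportError:
        return bigram_tokenize


def hot_words(titles, tokenize, top_n=10, min_len=2):
    """Count tokens across all headlines and return the top_n most frequent.

    Tokens shorter than min_len (single characters, punctuation) are skipped.
    """
    counts = Counter()
    for title in titles:
        for token in tokenize(title):
            token = token.strip()
            if len(token) >= min_len:
                counts[token] += 1
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Made-up headlines for illustration only; not data from the paper.
    sample_titles = ["世界杯开幕", "世界杯决赛"]
    for word, freq in hot_words(sample_titles, get_tokenizer(), top_n=5):
        print(word, freq)
```

Passing the tokenizer in as a parameter keeps the counting logic independent of the segmentation tool, so a different segmenter or a stop-word filter could be swapped in without changing `hot_words`.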