
csv - How do I write scraped web page data into multiple columns with Python?

Views: 113 | Date: 2022-08-30 10:07:16

Problem description

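I want to save the data my scraper collects into separate columns of a TSV file, but after trying many approaches I still cannot get it written as two columns. I want the numbered entries that were scraped in the first column, and the type shown beneath each entry in the second column.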

    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    import re
    import csv

    html = urlopen('http://www.app12345.com/?area=tw&store=Apple%20Store')
    bs0bj = BeautifulSoup(html)

    def GPname():
        GPnameList = bs0bj.find_all('dd', {'class': re.compile('ddappname')})
        str = ''
        for name in GPnameList:
            str += name.get_text()
            str += '\n'
            print(name.get_text())
        return str

    def GPcompany():
        GPcompanyname = bs0bj.find_all('dd', {'style': re.compile('color')})
        str = ''
        for cpa in GPcompanyname:
            str += cpa.get_text()
            str += '\n'
            print(cpa.get_text())
        return str

    with open('0217.tsv', 'w', newline='', encoding='utf-8') as f:
        f.write(GPname())
        f.write(GPcompany())
    f.close()

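I may not be familiar enough with zip: when I save the result, every single character ends up in its own cell. I also found this reference, but no matter what I try I cannot get it to work: https://segmentfault.com/q/10...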

Answers

Answer 1:

Writing a CSV file is fairly simple. Your data needs to be structured as a list of rows, like this: [['1. 東森新聞雲','新聞'],['2. 創世黎明(Dawn of world)','遊戲']]
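As a minimal sketch of how two parallel lists can be paired into that row structure, `zip` is one option. The variable names `names` and `categories` and their contents are illustrative stand-ins, not data fetched from the site.

```python
# Hypothetical sample data standing in for the two scraped lists.
names = ['1. 東森新聞雲', '2. 創世黎明(Dawn of world)']
categories = ['新聞', '遊戲']

# zip pairs the i-th name with the i-th category;
# list(pair) turns each pair into one row of two fields.
rows = [list(pair) for pair in zip(names, categories)]
print(rows)
```

Note that `zip` stops at the shorter list, so if the two selectors match a different number of elements, the extra items are silently dropped.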

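    # Python 3: urlopen lives in urllib.request (the original used the Python 2 import).
    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    import re
    import csv

    html = urlopen('http://www.app12345.com/?area=tw&store=Apple%20Store')
    bs0bj = BeautifulSoup(html)

    # Collect the text of each matching element into two parallel lists.
    GPnameList = [name.get_text() for name in bs0bj.find_all('dd', {'class': re.compile('ddappname')})]
    GPcompanyname = [cpa.get_text() for cpa in bs0bj.find_all('dd', {'style': re.compile('color')})]

    # Pair the lists row by row, join the fields with commas and the rows with newlines.
    data = '\n'.join([','.join(d) for d in zip(GPnameList, GPcompanyname)])

    # Write the encoded text in binary mode.
    with open('C:/Users/sa/Desktop/0217.csv', 'wb') as f:
        f.write(data.encode('utf-8'))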
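The code in this answer joins the fields by hand with commas. Since the question asks for a TSV file, one possible alternative is the standard `csv` module with a tab delimiter, sketched below. The helper name `write_tsv` and the sample lists are assumptions made for illustration, and the sketch writes to an in-memory buffer rather than fetching the page. It also shows a likely cause of the "one character per cell" symptom: `writerow` expects a sequence of fields, so a bare string is split into its characters.

```python
import csv
import io

def write_tsv(rows, fileobj):
    """Write each row (a sequence of fields) as one tab-separated line."""
    writer = csv.writer(fileobj, delimiter='\t', lineterminator='\n')
    writer.writerows(rows)

# Hypothetical sample data standing in for the two scraped lists.
names = ['1. 東森新聞雲', '2. 創世黎明(Dawn of world)']
categories = ['新聞', '遊戲']

# Correct usage: each row is a (name, category) pair produced by zip.
buffer = io.StringIO()
write_tsv(zip(names, categories), buffer)
print(buffer.getvalue())

# Pitfall: passing a bare string to writerow makes every character a field.
pitfall = io.StringIO()
csv.writer(pitfall, delimiter='\t', lineterminator='\n').writerow('abc')
print(repr(pitfall.getvalue()))
```

To write to disk instead, the same helper can be given a file opened as in the question, for example `open('0217.tsv', 'w', newline='', encoding='utf-8')`. Unlike a manual join, `csv.writer` also quotes any field that itself contains the delimiter.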

