<PYTHON>[BeautifulSoup]
Flower in my dev/Python 2015. 4. 27. 18:32
Installation
pip install BeautifulSoup4
Parsing
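The lookups in this section assume a `soup` object already exists. Here is a minimal sketch of building one. Two things are assumptions, not stated in the original: the markup is an abbreviated copy of the sample document the output comments appear to come from, and the parser is the standard-library `html.parser`.

```python
from bs4 import BeautifulSoup

# Assumption: an abbreviated version of the sample document that the
# output comments in this section seem to be based on.
html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
</p>
</body></html>
"""

# "html.parser" ships with Python, so no extra parser package is needed here.
soup = BeautifulSoup(html_doc, "html.parser")

print(soup.title.string)        # text inside <title>
print(soup.p["class"])          # class is multi-valued, so this is a list
print(len(soup.find_all("a")))  # number of <a> tags
```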
soup.title
# <title>The Dormouse's story</title>
soup.title.name
# u'title'
soup.title.string
# u'The Dormouse's story'
soup.title.parent.name
# u'head'
soup.p
# <p class="title"><b>The Dormouse's story</b></p>
soup.p['class']
# [u'title']
soup.a
# <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
soup.find_all('a')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
# <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
soup.find(id="link3")
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
URL extraction
for link in soup.find_all('a'):
    print(link.get('href'))
Sample
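As a first sample, the URL-extraction loop can be run end to end. This is a sketch under assumptions not in the original: the HTML is a hypothetical inline snippet, `base_url` is a made-up page address, and `urljoin` from the standard library resolves relative links against it.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Hypothetical page address and markup, for illustration only.
base_url = "http://example.com/stories/index.html"
html_doc = """
<a href="http://example.com/elsie">Elsie</a>
<a href="/lacie">Lacie</a>
<a href="tillie">Tillie</a>
<a name="no-href">anchor without href</a>
"""

soup = BeautifulSoup(html_doc, "html.parser")

links = []
for link in soup.find_all("a"):
    href = link.get("href")  # get() returns None when the attribute is missing
    if href is None:
        continue
    # Turn relative links into absolute ones.
    links.append(urljoin(base_url, href))

for url in links:
    print(url)
```

Skipping tags where `get('href')` returns `None` keeps anchors without a link target out of the result.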
Parsing a web page
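The sample that follows is written with placeholders (`URL`, `keyword`, `'key'`, `"value"`). To make the pattern concrete, here is a sketch with everything filled in by assumption: the inline HTML, the `img` tag, the `src` attribute, and the `green`/`yellow`/`red` path segments are all hypothetical. A dictionary lookup stands in for the if/elif chain.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real sample fetches a page over HTTP instead.
html_doc = '<div><img id="status" src="/static/img/yellow/lamp.png"></div>'

# Hypothetical mapping from a path segment to a status number.
STATUS_BY_SEGMENT = {"green": "1", "yellow": "2", "red": "3"}


def status_from_html(markup):
    """Return the status number for the first <img>, or None if unknown."""
    soup = BeautifulSoup(markup, "html.parser")
    img = soup.find("img")
    if img is None:
        return None
    src = img.get("src", "")
    # "/static/img/yellow/lamp.png" -> ["", "static", "img", "yellow", "lamp.png"]
    parts = src.split("/")
    if len(parts) <= 3:
        return None
    return STATUS_BY_SEGMENT.get(parts[3])


print(status_from_html(html_doc))
```

Returning `None` for a missing tag or a short path avoids the `AttributeError` or `IndexError` that unguarded lookups would raise on unexpected pages.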
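# -*- coding: utf-8 -*-
# Python 3; on Python 2, use urllib.urlopen instead.
from urllib.request import urlopen
from bs4 import BeautifulSoup

# URL, keyword, 'key', and "value" are placeholders; substitute real values.
html = urlopen(URL)
fSoup = BeautifulSoup(html, "lxml")

statusImg = fSoup.find(keyword)   # first tag matching the keyword
s = statusImg.get('key')          # attribute value, a '/'-separated string
ss = s.split('/')

# Map the fourth '/'-separated segment to a status number.
if ss[3] == "value":
    statusNum = "1"
elif ss[3] == "value":
    statusNum = "2"
elif ss[3] == "value":
    statusNum = "3"
else:
    pass
Parsing an XML file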
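from bs4 import BeautifulSoup

# The 'xml' parser requires lxml to be installed.
with open('B.xml') as f:
    fSoup = BeautifulSoup(f.read(), 'xml')

# 'keyword' and 'value' are placeholders for a tag name and an attribute name.
for meas in fSoup.find_all('keyword'):
    print(meas.get('value'))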
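To try the XML sample without a `B.xml` file, the same loop can be run on an inline string. The `report` and `meas` tags and the `value` attribute below are hypothetical stand-ins for the placeholders. The fallback to `html.parser` when lxml is missing is an addition for portability, not part of the original.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical XML content standing in for B.xml.
xml_doc = """<report>
  <meas value="10"/>
  <meas value="20"/>
  <meas value="30"/>
</report>
"""

try:
    # The "xml" feature is provided by lxml.
    soup = BeautifulSoup(xml_doc, "xml")
except FeatureNotFound:
    # Assumption: for this simple lowercase markup, the built-in HTML
    # parser gives the same result when lxml is not installed.
    soup = BeautifulSoup(xml_doc, "html.parser")

# Collect the attribute from every matching tag.
values = [meas.get("value") for meas in soup.find_all("meas")]
print(values)
```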