  • <PYTHON>[BeautifulSoup]
    Flower in my dev/Python 2015. 4. 27. 18:32

    Installation

     

    pip install BeautifulSoup4

     

    Parsing methods

     

    soup.title
    # <title>The Dormouse's story</title>
    soup.title.name
    # 'title'
    soup.title.string
    # "The Dormouse's story"
    soup.title.parent.name
    # 'head'
    soup.p
    # <p class="title"><b>The Dormouse's story</b></p>
    soup.p['class']
    # ['title']  (class is a multi-valued attribute, so Beautiful Soup 4 returns a list)
    soup.a
    # <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
    soup.find_all('a')
    # [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
    # <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
    # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
    soup.find(id="link3")
    # <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
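The snippets above assume that a soup object already exists. A minimal sketch of how it might be built, assuming an abbreviated copy of the "three sisters" sample document from the Beautiful Soup documentation (which these outputs appear to come from) and the built-in "html.parser" so that no extra parser package is needed:

```python
from bs4 import BeautifulSoup

# Abbreviated sample document (an assumption; the post does not show its input).
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
</body></html>
"""

# "html.parser" ships with Python; "lxml" could be used instead if installed.
soup = BeautifulSoup(html_doc, "html.parser")

title_text = soup.title.string       # text inside <title>
first_p_classes = soup.p["class"]    # multi-valued attribute, returned as a list
link_ids = [a["id"] for a in soup.find_all("a")]

print(title_text)
print(first_p_classes)
print(link_ids)
```

Attribute access such as soup.title or soup.p returns only the first matching tag; find_all returns every match.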

     

     

    URL extraction

     

    for link in soup.find_all('a'):
        print(link.get('href'))
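Note that link.get('href') returns None for an anchor that has no href attribute. A minimal sketch, assuming a small made-up HTML fragment, that collects only the anchors that actually carry an href:

```python
from bs4 import BeautifulSoup

# Made-up fragment for illustration; the second anchor has no href.
html = """
<a href="http://example.com/elsie" id="link1">Elsie</a>
<a id="no-href">No link here</a>
<a href="http://example.com/lacie" id="link2">Lacie</a>
"""

soup = BeautifulSoup(html, "html.parser")

# href=True matches only <a> tags that have an href attribute.
urls = [link.get("href") for link in soup.find_all("a", href=True)]

for url in urls:
    print(url)
```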

     

    Samples

     

    Parsing a web page

     

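    # -*- coding: utf-8 -*-
    from urllib.request import urlopen  # Python 3 location of urlopen

    from bs4 import BeautifulSoup

    # URL, keyword, 'key', and the "value" strings are placeholders to fill in.
    html = urlopen(URL)
    fSoup = BeautifulSoup(html, "lxml")  # the "lxml" parser requires the lxml package
    statusImg = fSoup.find(keyword)

    # Read the attribute and split it on '/'.
    s = statusImg.get('key')
    ss = s.split('/')

    # Map the fourth '/'-separated segment to a status number.
    if ss[3] == "value":
        statusNum = "1"
    elif ss[3] == "value":
        statusNum = "2"
    elif ss[3] == "value":
        statusNum = "3"
    else:
        pass  # statusNum stays unset when nothing matches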
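Because the sample above depends on a live URL and on placeholder names, here is an offline sketch of the same idea: find a tag, read an attribute, split it on '/', and map one segment to a status number. The HTML fragment, the img tag, the src paths, and the STATUS_BY_SEGMENT mapping are all hypothetical and chosen only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment; the tag, attribute, and path are made up.
html = '<html><body><img id="status" src="/static/status/ok.png"></body></html>'

# Hypothetical mapping from the fourth path segment to a status number.
STATUS_BY_SEGMENT = {"ok.png": "1", "warn.png": "2", "error.png": "3"}


def status_from_html(markup):
    """Return the status number for the first <img>, or None if unknown."""
    soup = BeautifulSoup(markup, "html.parser")
    img = soup.find("img")
    if img is None or img.get("src") is None:
        return None
    # "/static/status/ok.png".split("/") -> ["", "static", "status", "ok.png"]
    parts = img["src"].split("/")
    if len(parts) <= 3:
        return None
    return STATUS_BY_SEGMENT.get(parts[3])


status_num = status_from_html(html)
print(status_num)
```

A dictionary lookup replaces the if/elif chain here, and the explicit None checks cover pages where the tag or attribute is missing.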

     

    Parsing an XML file

     

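    from bs4 import BeautifulSoup

    # 'B.xml', 'keyword', and 'value' are placeholders for the file, tag, and attribute.
    with open('B.xml') as f:
        xml = f.read()
    fSoup = BeautifulSoup(xml, 'xml')  # the 'xml' parser requires the lxml package
    for meas in fSoup.find_all('keyword'):
        print(meas.get('value'))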
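The same pattern can be tried without a file on disk. This is a minimal sketch, assuming a made-up XML string in which the tag name meas and the attribute value stand in for the placeholders above. The fallback to "html.parser" is only a convenience for environments without lxml and is not a substitute for a real XML parser.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Made-up XML; "meas" and "value" are placeholder names for illustration.
xml = """<report>
  <meas value="10"></meas>
  <meas value="20"></meas>
  <other value="99"></other>
</report>"""

try:
    soup = BeautifulSoup(xml, "xml")  # needs the lxml package
except FeatureNotFound:
    # Adequate for this simple, all-lowercase document only.
    soup = BeautifulSoup(xml, "html.parser")

# Collect the "value" attribute of every <meas> element.
values = [meas.get("value") for meas in soup.find_all("meas")]
print(values)
```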

     

     

     

     
