[Scrapy] How to Use XPath
Chapters covered
Section 4: XPath Syntax
11. Using XPath with Scrapy
12. Tools to Easily Get XPath
Goal
Learn how to use XPath.
Setup
Copy the following:
html_doc = '''
<html>
<head>
<title>Title of the page</title>
</head>
<body>
<h1>H1 Tag</h1>
<h2>H2 Tag with <a href="#">link</a></h2>
<p>First Paragraph</p>
<p>Second Paragraph</p>
</body>
</html>
'''
In [1]: from scrapy.selector import Selector
In [2]: %paste
html_doc = '''
<html>
<head>
<title>Title of the page</title>
</head>
<body>
<h1>H1 Tag</h1>
<h2>H2 Tag with <a href="#">link</a></h2>
<p>First Paragraph</p>
<p>Second Paragraph</p>
</body>
</html>
'''
## -- End pasted text --
In [3]: sel = Selector(text=html_doc)
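① Get the title. The absolute path /html/head/title returns the whole title element, tags included, rather than only its text.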
In [4]: sel.xpath('/html/head/title').extract()
Out[4]: ['<title>Title of the page</title>']
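② Get only the first paragraph. Starting the expression with // matches p elements anywhere in the document, so the full path from the root can be omitted.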
In [13]: sel.xpath('//p[1]').extract()
Out[13]: ['<p>First Paragraph</p>']
③ You can also use Chrome to help write XPath.
- Open the web page in Google Chrome.
- Select the text portion you want to extract.
- Right-click, and select "Inspect".
- Select the HTML code you need, and select "Copy" and then "Copy XPath".
- Paste the XPath into your code, test it, and edit it if necessary.
- Note that this method copies the "id", but you can change it to the "class" of the same element if that works better.