Using XPath with Scrapy Section 4
How to use XPath
Getting started
$ scrapy shell
In [2]: from scrapy.selector import Selector
Load the following HTML document. The session below expects it as a string in a variable named html_doc.
<html>
<head>
<title>Title of the page</title>
</head>
<body>
<h1>H1 Tag</h1>
<h2>H2 Tag with <a href="#">link</a></h2>
<p>First Paragraph</p>
<p>Second Paragraph</p>
</body>
</html>
Try running it:
In [10]: sel = Selector(text=html_doc)
In [11]: sel.xpath('/html/head/title').extract()
Out[11]: ['<title>Title of the page</title>']
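Other content can be retrieved the same way. For example, //p[1] selects every p element that is the first p child of its parent: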
In [13]: sel.xpath('//p[1]').extract()
Out[13]: ['<p>First Paragraph</p>']
Many kinds of content can be extracted this way, including other elements, text nodes, and attribute values.
Using Google Chrome
- Open the web page in Google Chrome.
- Select the text portion you want to extract.
- Right-click, and select "Inspect".
- Right-click the HTML node you need, and select "Copy" and then "Copy XPath".
- Paste the XPath into your code, test it, and edit it if necessary.
- Note that the copied XPath is often anchored on an "id" attribute; you can rewrite it to use the "class" of the same element if that works better.