Simple web-scraping with Mechanize and Nokogiri. Presented at the Ruby Drink-up of Sophia Antipolis on the 8th of November 2011 by Muriel Salvan (@MurielSalvan).
Mechanize at the Ruby Drink-up of Sophia, November 2011
1. Simple web-scraping with Mechanize and Nokogiri. Nov 8th 2011. Muriel Salvan, Open Source Lead developer and architect, X-Aeon Solutions, http://x-aeon.com
11. Common requests
    page = agent.get('http://rivierarb.fr')
    page2 = page.links_with(:text => 'Green King').first.click
    page3 = agent.back
    agent.user_agent = 'My user agent'
12. Common parsing: Selectors
    page.root.css('body div.myclass').each { |element| … }
    page.root.xpath('//h3/a[@class="l"]').each { |element| … }
13. Common parsing: Elements
    <div><a href="http://www.google.com">Click here<img src="http://www.google.com/favicon.ico" /></a></div>
    element['href'] => "http://www.google.com"
    element.content => "Click here"
    element.children[1].name => "img"
    element.parent.name => "div"
    element
14. Filling and submitting forms: Basic example (Google search)
    form = agent.get('http://www.google.com').forms.first
    form.q = 'Rivierarb'
    results_page = form.submit
15. Filling and submitting forms: Fields
When your HTML form has
    <input … name="myfield">...</input>
you can write
    form.myfield = 'The field value'
    form.field_with(:name => 'myfield').value = 'The field value'
    form.checkboxfield = '1'
    form.selectfield = '5'
16. Filling and submitting forms: Buttons
Warning: Mechanize does not add the value of the button being clicked! If the web server cares about button values in POST data, add them manually.
    <input type="submit" name="btn1" value="Clicked">...</input>
    form.add_field!('btn1', 'Clicked')
    button = form.button_with(:name => 'btn1')
    page = form.click_button(button)