SlideShare a Scribd company logo
1 of 14
Simple web-scraping with Mechanize and Nokogiri Nov 8 th  2011 Muriel Salvan Open Source Lead developer and architect X-Aeon Solutions http://x-aeon.com
The need ,[object Object],[object Object]
Parse  HTML pages (DOM) =>  Nokogiri
Installation ,[object Object],[object Object]
! Version 1.0.0 can be more stable than 2.x.x for some complex queries (" gem install mechanize -v 1.0.0 " to enforce it).
Basic example require   'mechanize' agent = Mechanize. new page = agent. get ( 'http://rivierarb.fr' ) element = page. root . css ( 'h1.logo' ) . first element . content =>   "Riviera.rb"
Main usage ,[object Object]
Use the agent to  perform HTTP(S) requests  (get, post). Each request gives a Nokogiri page.
Parse the page  using CSS selectors, XPath, DOM iterators.
Fill and post forms  using intuitive helpers.
Common requests page = agent. get ( 'http://rivierarb.fr' ) page2 = page. links_with ( :text   =>   'Green King' ) . first . click page3 = agent. back agent. user_agent  =  'My user agent'
Common parsing Selectors page. root . css ( 'body div.myclass' ) . each   {   | element |  …  } page. root . xpath ( '//h3/a[@class="l"]' ) . eac h   {   | element |  …  }
Common parsing Elements < div > < a   href = &quot; http://www.google.com &quot; > Click here < img   src = &quot; http://www.google.com/favicon.ico &quot; / > < / a > < / div > element [ 'href' ] =>   &quot;http: // www.google.com&quot; element. content =>   &quot;    Click here       &quot; element. children . second . name =>   &quot;img&quot; element. parent . name =>   &quot;div&quot; element
Filling and submitting forms Basic example Google search form =  agent. get ( 'http://www.google.com' ) . forms . first form. q  =  'Rivierarb' results_page = form. submit

More Related Content

What's hot

A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...
A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...
A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...Ho Chi Minh City Software Testing Club
 
Haml in 5 minutes
Haml in 5 minutesHaml in 5 minutes
Haml in 5 minutescameronbot
 
Evolution of API With Blogging
Evolution of API With BloggingEvolution of API With Blogging
Evolution of API With BloggingTakatsugu Shigeta
 
Enabling agile devliery through enabling BDD in PHP projects
Enabling agile devliery through enabling BDD in PHP projectsEnabling agile devliery through enabling BDD in PHP projects
Enabling agile devliery through enabling BDD in PHP projectsKonstantin Kudryashov
 
WordPress as a Content Management System
WordPress as a Content Management SystemWordPress as a Content Management System
WordPress as a Content Management SystemValent Mustamin
 
Add row in asp.net Gridview on button click using C# and vb.net
Add row in asp.net Gridview on button click using C# and vb.netAdd row in asp.net Gridview on button click using C# and vb.net
Add row in asp.net Gridview on button click using C# and vb.netVijay Saklani
 
Dicas de palestra
Dicas de palestraDicas de palestra
Dicas de palestraFabio Akita
 
How to Create simple One Page site
How to Create simple One Page siteHow to Create simple One Page site
How to Create simple One Page siteMoneer kamal
 
Gadgets Intro (Plus Mapplets)
Gadgets Intro (Plus Mapplets)Gadgets Intro (Plus Mapplets)
Gadgets Intro (Plus Mapplets)Pamela Fox
 
Rugalytics | Ruby Manor Nov 2008
Rugalytics | Ruby Manor Nov 2008Rugalytics | Ruby Manor Nov 2008
Rugalytics | Ruby Manor Nov 2008Rob
 
シックス・アパート・フレームワーク
シックス・アパート・フレームワークシックス・アパート・フレームワーク
シックス・アパート・フレームワークTakatsugu Shigeta
 
Components are the Future of the Web: It’s Going To Be Okay
Components are the Future of the Web: It’s Going To Be OkayComponents are the Future of the Web: It’s Going To Be Okay
Components are the Future of the Web: It’s Going To Be OkayFITC
 
Building high-fidelity interactive prototypes with jQuery
Building high-fidelity interactive prototypes with jQueryBuilding high-fidelity interactive prototypes with jQuery
Building high-fidelity interactive prototypes with jQueryDavid Park
 
Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...
Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...
Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...Richard Rabins
 
HTML5 Forms - KISS time - Fronteers
HTML5 Forms - KISS time - FronteersHTML5 Forms - KISS time - Fronteers
HTML5 Forms - KISS time - FronteersRobert Nyman
 

What's hot (20)

Changing Template Engine
Changing Template EngineChanging Template Engine
Changing Template Engine
 
A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...
A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...
A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...
 
Haml in 5 minutes
Haml in 5 minutesHaml in 5 minutes
Haml in 5 minutes
 
Evolution of API With Blogging
Evolution of API With BloggingEvolution of API With Blogging
Evolution of API With Blogging
 
Enabling agile devliery through enabling BDD in PHP projects
Enabling agile devliery through enabling BDD in PHP projectsEnabling agile devliery through enabling BDD in PHP projects
Enabling agile devliery through enabling BDD in PHP projects
 
WordPress as a Content Management System
WordPress as a Content Management SystemWordPress as a Content Management System
WordPress as a Content Management System
 
Capybara
CapybaraCapybara
Capybara
 
Add row in asp.net Gridview on button click using C# and vb.net
Add row in asp.net Gridview on button click using C# and vb.netAdd row in asp.net Gridview on button click using C# and vb.net
Add row in asp.net Gridview on button click using C# and vb.net
 
Ajax
AjaxAjax
Ajax
 
Dicas de palestra
Dicas de palestraDicas de palestra
Dicas de palestra
 
How to Create simple One Page site
How to Create simple One Page siteHow to Create simple One Page site
How to Create simple One Page site
 
Gadgets Intro (Plus Mapplets)
Gadgets Intro (Plus Mapplets)Gadgets Intro (Plus Mapplets)
Gadgets Intro (Plus Mapplets)
 
Rugalytics | Ruby Manor Nov 2008
Rugalytics | Ruby Manor Nov 2008Rugalytics | Ruby Manor Nov 2008
Rugalytics | Ruby Manor Nov 2008
 
BDD with cucumber
BDD with cucumberBDD with cucumber
BDD with cucumber
 
シックス・アパート・フレームワーク
シックス・アパート・フレームワークシックス・アパート・フレームワーク
シックス・アパート・フレームワーク
 
Components are the Future of the Web: It’s Going To Be Okay
Components are the Future of the Web: It’s Going To Be OkayComponents are the Future of the Web: It’s Going To Be Okay
Components are the Future of the Web: It’s Going To Be Okay
 
Building high-fidelity interactive prototypes with jQuery
Building high-fidelity interactive prototypes with jQueryBuilding high-fidelity interactive prototypes with jQuery
Building high-fidelity interactive prototypes with jQuery
 
Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...
Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...
Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...
 
Symfony2
Symfony2Symfony2
Symfony2
 
HTML5 Forms - KISS time - Fronteers
HTML5 Forms - KISS time - FronteersHTML5 Forms - KISS time - Fronteers
HTML5 Forms - KISS time - Fronteers
 

Similar to Mechanize at the Ruby Drink-up of Sophia, November 2011

jQuery for Sharepoint Dev
jQuery for Sharepoint DevjQuery for Sharepoint Dev
jQuery for Sharepoint DevZeddy Iskandar
 
10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]
10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]
10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]Chris Toohey
 
IBM Lotus Notes Domino XPages and XPages for Mobile
IBM Lotus Notes Domino XPages and XPages for MobileIBM Lotus Notes Domino XPages and XPages for Mobile
IBM Lotus Notes Domino XPages and XPages for MobileChris Toohey
 
Getting the Most Out of OpenSocial Gadgets
Getting the Most Out of OpenSocial GadgetsGetting the Most Out of OpenSocial Gadgets
Getting the Most Out of OpenSocial GadgetsAtlassian
 
Component and Event-Driven Architectures in PHP
Component and Event-Driven Architectures in PHPComponent and Event-Driven Architectures in PHP
Component and Event-Driven Architectures in PHPStephan Schmidt
 
Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010
Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010
Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010Sergey Ilinsky
 
Javascript: Ajax & DOM Manipulation v1.2
Javascript: Ajax & DOM Manipulation v1.2Javascript: Ajax & DOM Manipulation v1.2
Javascript: Ajax & DOM Manipulation v1.2borkweb
 
Yahoo Mobile Widgets
Yahoo Mobile WidgetsYahoo Mobile Widgets
Yahoo Mobile WidgetsJose Palazon
 
Flex For Flash Developers Ff 2006 Final
Flex For Flash Developers Ff 2006 FinalFlex For Flash Developers Ff 2006 Final
Flex For Flash Developers Ff 2006 Finalematrix
 
Flash Templates- Joomla!Days NL 2009 #jd09nl
Flash Templates- Joomla!Days NL 2009 #jd09nlFlash Templates- Joomla!Days NL 2009 #jd09nl
Flash Templates- Joomla!Days NL 2009 #jd09nlJoomla!Days Netherlands
 
Flash templates for Joomla!
Flash templates for Joomla!Flash templates for Joomla!
Flash templates for Joomla!Herman Peeren
 
Mashups as Collection of Widgets
Mashups as Collection of WidgetsMashups as Collection of Widgets
Mashups as Collection of Widgetsgiurca
 
Lecture 6 - Comm Lab: Web @ ITP
Lecture 6 - Comm Lab: Web @ ITPLecture 6 - Comm Lab: Web @ ITP
Lecture 6 - Comm Lab: Web @ ITPyucefmerhi
 
Master pages ppt
Master pages pptMaster pages ppt
Master pages pptIblesoft
 
HTML5 Overview
HTML5 OverviewHTML5 Overview
HTML5 Overviewreybango
 

Similar to Mechanize at the Ruby Drink-up of Sophia, November 2011 (20)

jQuery for Sharepoint Dev
jQuery for Sharepoint DevjQuery for Sharepoint Dev
jQuery for Sharepoint Dev
 
10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]
10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]
10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]
 
IBM Lotus Notes Domino XPages and XPages for Mobile
IBM Lotus Notes Domino XPages and XPages for MobileIBM Lotus Notes Domino XPages and XPages for Mobile
IBM Lotus Notes Domino XPages and XPages for Mobile
 
ASP_NET Features
ASP_NET FeaturesASP_NET Features
ASP_NET Features
 
SlideShare Instant
SlideShare InstantSlideShare Instant
SlideShare Instant
 
SlideShare Instant
SlideShare InstantSlideShare Instant
SlideShare Instant
 
Getting the Most Out of OpenSocial Gadgets
Getting the Most Out of OpenSocial GadgetsGetting the Most Out of OpenSocial Gadgets
Getting the Most Out of OpenSocial Gadgets
 
Vb.Net Web Forms
Vb.Net  Web FormsVb.Net  Web Forms
Vb.Net Web Forms
 
Component and Event-Driven Architectures in PHP
Component and Event-Driven Architectures in PHPComponent and Event-Driven Architectures in PHP
Component and Event-Driven Architectures in PHP
 
Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010
Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010
Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010
 
Javascript: Ajax & DOM Manipulation v1.2
Javascript: Ajax & DOM Manipulation v1.2Javascript: Ajax & DOM Manipulation v1.2
Javascript: Ajax & DOM Manipulation v1.2
 
BluePrint Mobile Framework
BluePrint Mobile FrameworkBluePrint Mobile Framework
BluePrint Mobile Framework
 
Yahoo Mobile Widgets
Yahoo Mobile WidgetsYahoo Mobile Widgets
Yahoo Mobile Widgets
 
Flex For Flash Developers Ff 2006 Final
Flex For Flash Developers Ff 2006 FinalFlex For Flash Developers Ff 2006 Final
Flex For Flash Developers Ff 2006 Final
 
Flash Templates- Joomla!Days NL 2009 #jd09nl
Flash Templates- Joomla!Days NL 2009 #jd09nlFlash Templates- Joomla!Days NL 2009 #jd09nl
Flash Templates- Joomla!Days NL 2009 #jd09nl
 
Flash templates for Joomla!
Flash templates for Joomla!Flash templates for Joomla!
Flash templates for Joomla!
 
Mashups as Collection of Widgets
Mashups as Collection of WidgetsMashups as Collection of Widgets
Mashups as Collection of Widgets
 
Lecture 6 - Comm Lab: Web @ ITP
Lecture 6 - Comm Lab: Web @ ITPLecture 6 - Comm Lab: Web @ ITP
Lecture 6 - Comm Lab: Web @ ITP
 
Master pages ppt
Master pages pptMaster pages ppt
Master pages ppt
 
HTML5 Overview
HTML5 OverviewHTML5 Overview
HTML5 Overview
 

More from rivierarb

Ruby 2.0 at the Ruby drink-up of Sophia, February 2013
Ruby 2.0 at the Ruby drink-up of Sophia, February 2013Ruby 2.0 at the Ruby drink-up of Sophia, February 2013
Ruby 2.0 at the Ruby drink-up of Sophia, February 2013rivierarb
 
Ruby object model at the Ruby drink-up of Sophia, January 2013
Ruby object model at the Ruby drink-up of Sophia, January 2013Ruby object model at the Ruby drink-up of Sophia, January 2013
Ruby object model at the Ruby drink-up of Sophia, January 2013rivierarb
 
Ruby and Twitter at the Ruby drink-up of Sophia, January 2013
Ruby and Twitter at the Ruby drink-up of Sophia, January 2013Ruby and Twitter at the Ruby drink-up of Sophia, January 2013
Ruby and Twitter at the Ruby drink-up of Sophia, January 2013rivierarb
 
PoBot at the Ruby drink-up of Sophia, July 2012
PoBot at the Ruby drink-up of Sophia, July 2012PoBot at the Ruby drink-up of Sophia, July 2012
PoBot at the Ruby drink-up of Sophia, July 2012rivierarb
 
Ruby C extensions at the Ruby drink-up of Sophia, April 2012
Ruby C extensions at the Ruby drink-up of Sophia, April 2012Ruby C extensions at the Ruby drink-up of Sophia, April 2012
Ruby C extensions at the Ruby drink-up of Sophia, April 2012rivierarb
 
Pry at the Ruby Drink-up of Sophia, February 2012
Pry at the Ruby Drink-up of Sophia, February 2012Pry at the Ruby Drink-up of Sophia, February 2012
Pry at the Ruby Drink-up of Sophia, February 2012rivierarb
 
Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012
Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012
Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012rivierarb
 
DRb at the Ruby Drink-up of Sophia, December 2011
DRb at the Ruby Drink-up of Sophia, December 2011DRb at the Ruby Drink-up of Sophia, December 2011
DRb at the Ruby Drink-up of Sophia, December 2011rivierarb
 
FPM at the Ruby Drink-up of Sophia, September 2011
FPM at the Ruby Drink-up of Sophia, September 2011FPM at the Ruby Drink-up of Sophia, September 2011
FPM at the Ruby Drink-up of Sophia, September 2011rivierarb
 

More from rivierarb (9)

Ruby 2.0 at the Ruby drink-up of Sophia, February 2013
Ruby 2.0 at the Ruby drink-up of Sophia, February 2013Ruby 2.0 at the Ruby drink-up of Sophia, February 2013
Ruby 2.0 at the Ruby drink-up of Sophia, February 2013
 
Ruby object model at the Ruby drink-up of Sophia, January 2013
Ruby object model at the Ruby drink-up of Sophia, January 2013Ruby object model at the Ruby drink-up of Sophia, January 2013
Ruby object model at the Ruby drink-up of Sophia, January 2013
 
Ruby and Twitter at the Ruby drink-up of Sophia, January 2013
Ruby and Twitter at the Ruby drink-up of Sophia, January 2013Ruby and Twitter at the Ruby drink-up of Sophia, January 2013
Ruby and Twitter at the Ruby drink-up of Sophia, January 2013
 
PoBot at the Ruby drink-up of Sophia, July 2012
PoBot at the Ruby drink-up of Sophia, July 2012PoBot at the Ruby drink-up of Sophia, July 2012
PoBot at the Ruby drink-up of Sophia, July 2012
 
Ruby C extensions at the Ruby drink-up of Sophia, April 2012
Ruby C extensions at the Ruby drink-up of Sophia, April 2012Ruby C extensions at the Ruby drink-up of Sophia, April 2012
Ruby C extensions at the Ruby drink-up of Sophia, April 2012
 
Pry at the Ruby Drink-up of Sophia, February 2012
Pry at the Ruby Drink-up of Sophia, February 2012Pry at the Ruby Drink-up of Sophia, February 2012
Pry at the Ruby Drink-up of Sophia, February 2012
 
Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012
Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012
Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012
 
DRb at the Ruby Drink-up of Sophia, December 2011
DRb at the Ruby Drink-up of Sophia, December 2011DRb at the Ruby Drink-up of Sophia, December 2011
DRb at the Ruby Drink-up of Sophia, December 2011
 
FPM at the Ruby Drink-up of Sophia, September 2011
FPM at the Ruby Drink-up of Sophia, September 2011FPM at the Ruby Drink-up of Sophia, September 2011
FPM at the Ruby Drink-up of Sophia, September 2011
 

Recently uploaded

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 

Recently uploaded (20)

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 

Mechanize at the Ruby Drink-up of Sophia, November 2011

  • 1. Simple web-scraping with Mechanize and Nokogiri Nov 8 th 2011 Muriel Salvan Open Source Lead developer and architect X-Aeon Solutions http://x-aeon.com
  • 2.
  • 3. Parse HTML pages (DOM) => Nokogiri
  • 4.
  • 5. ! Version 1.0.0 can be more stable than 2.x.x for some complex queries (&quot; gem install mechanize -v 1.0.0 &quot; to enforce it).
  • 6. Basic example require 'mechanize' agent = Mechanize. new page = agent. get ( 'http://rivierarb.fr' ) element = page. root . css ( 'h1.logo' ) . first element . content => &quot;Riviera.rb&quot;
  • 7.
  • 8. Use the agent to perform HTTP(S) requests (get, post). Each request gives a Nokogiri page.
  • 9. Parse the page using CSS selectors, XPath, DOM iterators.
  • 10. Fill and post forms using intuitive helpers.
  • 11. Common requests page = agent. get ( 'http://rivierarb.fr' ) page2 = page. links_with ( :text => 'Green King' ) . first . click page3 = agent. back agent. user_agent = 'My user agent'
  • 12. Common parsing Selectors page. root . css ( 'body div.myclass' ) . each { | element | … } page. root . xpath ( '//h3/a[@class=&quot;l&quot;]' ) . eac h { | element | … }
  • 13. Common parsing Elements < div > < a href = &quot; http://www.google.com &quot; > Click here < img src = &quot; http://www.google.com/favicon.ico &quot; / > < / a > < / div > element [ 'href' ] => &quot;http: // www.google.com&quot; element. content => &quot; Click here &quot; element. children . second . name => &quot;img&quot; element. parent . name => &quot;div&quot; element
  • 14. Filling and submitting forms Basic example Google search form = agent. get ( 'http://www.google.com' ) . forms . first form. q = 'Rivierarb' results_page = form. submit
  • 15. Filling and submitting forms Fields When your HTML form has < input … name = &quot;myfield&quot; >...< / input > you can write form. myfield = 'The field value' form. field_with ( :name => 'myfield' ) . value = 'The field value' form. checkboxfield = '1' form. selectfield = '5'
  • 16. Filling and submitting forms Buttons ! Mechanize does not add the value of the button being clicked ! If the web server cares for buttons values in POST data, add them manually. < input type = &quot;submit&quot; name = &quot;btn1&quot; value = &quot;Clicked&quot;>...< / input > form. add_field ! ( 'btn1' , 'Clicked' ) b utton = form. button_with ( :name => 'btn1' ) page = form. click_button ( button )
  • 17.
  • 21. Use HTML parsers other than Nokogiri
  • 22. Does not have JavaScript engine (therefore no Ajax)
  • 23.
  • 26. Nokogiri element API This presentation is available under CC-BY license by Muriel Salvan
  • 27. Q/A