From ac026647f4f5e20f0201e55973660bde1504e9d3 Mon Sep 17 00:00:00 2001 From: Pysis Date: Fri, 18 Feb 2022 09:01:57 -0500 Subject: [PATCH] xidel: add page (#7678) --- pages/common/xidel.md | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) create mode 100644 pages/common/xidel.md diff --git a/pages/common/xidel.md b/pages/common/xidel.md new file mode 100644 index 000000000..b130b9151 --- /dev/null +++ b/pages/common/xidel.md @@ -0,0 +1,36 @@ +# xidel + +> Download and extract data from HTML/XML pages as well as JSON APIs. +> More information: . + +- Print all URLs found by a Google search: + +`xidel {{https://www.google.com/search?q=test}} --extract "//a/extract(@href, 'url[?]q=([^&]+)&', 1)[. != '']"` + +- Print the title of all pages found by a Google search and download them: + +`xidel {{https://www.google.com/search?q=test}} --follow "{{//a/extract(@href, 'url[?]q=([^&]+)&', 1)[. != '']}}" --extract {{//title}} --download {{'{$host}/'}}` + +- Follow all links on a page and print the titles, with XPath: + +`xidel {{https://example.org}} --follow {{//a}} --extract {{//title}}` + +- Follow all links on a page and print the titles, with CSS selectors: + +`xidel {{https://example.org}} --follow "{{css('a')}}" --css {{title}}` + +- Follow all links on a page and print the titles, with pattern matching: + +`xidel {{https://example.org}} --follow "{{{.}*}}" --extract "{{{.}}}"` + +- Read the pattern from example.xml (which will also check if the element containing "ood" is there, and fail otherwise): + +`xidel {{path/to/example.xml}} --extract "{{ood{.}}}"` + +- Print all newest Stack Overflow questions with title and URL using pattern matching on their RSS feed: + +`xidel {{http://stackoverflow.com/feeds}} --extract "{{{title:=.}{uri:=@href}+}}"` + +- Check for unread Reddit mail, Webscraping, combining CSS, XPath, JSONiq, and automatically form evaluation: + +`xidel {{https://reddit.com}} --follow "{{form(css('form.login-form')[1], {'user': '$your_username', 'passwd': '$your_password'})}}" --extract "{{css('#mail')/@title}}"`