Cheatography

JSOUP Use of Selectors and Syntax Cheat Sheet by

Basic commands and explanations for HTML parsing with the Java library Jsoup, using CSS or jQuery-like selectors

Selector overview

*
All elements
document.select("*")
tagname
Find elements by tag
document.select("h1")
#id
Find elements by ID
document.select("#subtitle")
.class
Find elements by class name
document.select(".list")
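
The basic selectors above can be tried with a short sketch like the following. The HTML string, the class name BasicSelectorSketch and the helper select are made up for illustration, and the sketch assumes the Jsoup library (org.jsoup) is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

class BasicSelectorSketch {
    // Hypothetical sample page, invented for this illustration.
    static final String HTML =
          "<html><head><title>Demo</title></head><body>"
        + "<h1>Title</h1>"
        + "<p id=\"subtitle\">Subtitle</p>"
        + "<ul class=\"list\"><li>One</li><li>Two</li></ul>"
        + "</body></html>";

    // Parse the sample and run one CSS query against it.
    static Elements select(String cssQuery) {
        Document document = Jsoup.parse(HTML);
        return document.select(cssQuery);
    }

    public static void main(String[] args) {
        System.out.println(select("h1").text());                // by tag name
        System.out.println(select("#subtitle").text());         // by ID
        System.out.println(select(".list").first().tagName());  // by class name
        System.out.println(select("*").size());                 // every element
    }
}
```

select returns an Elements collection, so calls such as text(), first() and size() work directly on the result.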

Attribute Selection

[attr]
Elements that have the attribute
document.select("[href]")
[^attrPrefix]
Elements that have an attribute whose name starts with the prefix
document.select("[^data-]")
[attr=value]
Elements whose attribute equals the value
document.select("[width=100]")
[attr^=value]
Elements whose attribute value starts with the value
document.select("[class^=button]")
[attr$=value]
Elements whose attribute value ends with the value
document.select("[href$=example.com]")
[attr*=value]
Elements whose attribute value contains the value
document.select("[class*=button]")
[attr~=regex]
Elements whose attribute value matches the regular expression
document.select("img[src~=(?i)\\.(png|jpe?g)]")
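
As a rough illustration of how the attribute selectors differ, the sketch below counts matches for each of them on a small, made-up HTML fragment. The class name AttributeSelectorSketch and the helper count are hypothetical, and Jsoup is assumed to be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class AttributeSelectorSketch {
    // Hypothetical sample markup, invented for this illustration.
    static final String HTML =
          "<a href=\"https://example.com\" class=\"button-primary\">Home</a>"
        + "<a href=\"/docs/index.html\" class=\"nav button\">Docs</a>"
        + "<img src=\"photo.JPG\" width=\"100\" data-id=\"7\">"
        + "<img src=\"icon.gif\" width=\"16\">";

    // Count how many elements of the sample match a CSS query.
    static int count(String cssQuery) {
        Document document = Jsoup.parse(HTML);
        return document.select(cssQuery).size();
    }

    public static void main(String[] args) {
        String[] queries = {
            "[href]", "[^data-]", "[width=100]", "[class^=button]",
            "[href$=example.com]", "[class*=button]",
            "img[src~=(?i)\\.(png|jpe?g)]"
        };
        for (String query : queries) {
            System.out.println(query + " -> " + count(query));
        }
    }
}
```

On this sample, [class^=button] matches only the first link, because the class attribute of the second link starts with "nav", while [class*=button] matches both links.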

Pseudo selectors

:lt(n)
Find elements whose sibling index is less than n (indexes start at 0)
document.select("td:lt(3)")
:gt(n)
Find elements whose sibling index is greater than n
document.select("td:gt(2)")
:eq(n)
Find elements whose sibling index is equal to n
document.select("td:eq(2)")
:has(selector)
Find elements that contain at least one element matching the selector
document.select("li:has(a)")
:not(selector)
Find elements that do not match the selector
document.select("li:not(#justLink)")
:contains(text)
Find elements that contain the given text, in the element itself or in any descendant (case-insensitive)
document.select(":contains(world)")
:containsOwn(text)
Find elements that directly contain the given text
document.select(":containsOwn(world)")
:matches(regex)
Find elements whose text matches the specified regular expression
document.select(":matches(^Button 1$)")
:matchesOwn(regex)
Find elements whose own text matches the specified regular expression
document.select(":matchesOwn(1)")
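
The difference between text and own text, and the zero-based sibling index, are easier to see on a concrete fragment. The following is a minimal sketch on made-up markup; the class name PseudoSelectorSketch and the helper select are hypothetical, and Jsoup is assumed to be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

class PseudoSelectorSketch {
    // Hypothetical sample markup, invented for this illustration.
    static final String HTML =
          "<table><tr><td>a</td><td>b</td><td>c</td><td>d</td></tr></table>"
        + "<ul>"
        + "<li id=\"justLink\"><a href=\"#\">Link</a></li>"
        + "<li>Hello <b>World</b></li>"
        + "<li>Button 1</li>"
        + "</ul>";

    // Parse the sample and run one CSS query against it.
    static Elements select(String cssQuery) {
        return Jsoup.parse(HTML).select(cssQuery);
    }

    public static void main(String[] args) {
        // Sibling index: the first <td> in the row has index 0.
        System.out.println(select("td:lt(3)").text());
        System.out.println(select("td:gt(2)").text());
        System.out.println(select("td:eq(2)").text());

        // Structural pseudo selectors.
        System.out.println(select("li:has(a)").size());
        System.out.println(select("li:not(#justLink)").size());

        // "World" sits inside <b>, so it is part of the <li> text
        // but not part of the <li> own text.
        System.out.println(select("li:contains(world)").size());
        System.out.println(select("li:containsOwn(world)").size());
        System.out.println(select("b:containsOwn(world)").size());

        // Regular expressions against text and own text.
        System.out.println(select("li:matches(^Button 1$)").size());
        System.out.println(select("li:matchesOwn(1)").size());
    }
}
```

The queries are prefixed with a tag name here because a bare :contains(world) also matches every ancestor of the element that holds the text.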

Selector combinations

el#id
Element with ID
document.select("li#justLink")
el.class
Elements with class
document.select("li.sale")
el[attr]
Elements with attribute
document.select("li[data-price]")
el[attr][attr].class ...
Any combination
document.select("img[src][width]")
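
Combined selectors narrow a match step by step: every part of the compound selector has to apply to the same element. A small sketch on made-up markup follows; the class name CombinedSelectorSketch and the helper count are hypothetical, and Jsoup is assumed to be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class CombinedSelectorSketch {
    // Hypothetical sample markup, invented for this illustration.
    static final String HTML =
          "<ul>"
        + "<li id=\"justLink\"><a href=\"#\">Link</a></li>"
        + "<li class=\"sale\" data-price=\"9.99\">Shirt</li>"
        + "<li class=\"sale\">Socks</li>"
        + "<li data-price=\"20\">Hat</li>"
        + "</ul>"
        + "<img src=\"a.png\" width=\"10\"><img src=\"b.png\">"
        + "<p class=\"sale\">Note</p>";

    // Count how many elements of the sample match a CSS query.
    static int count(String cssQuery) {
        Document document = Jsoup.parse(HTML);
        return document.select(cssQuery).size();
    }

    public static void main(String[] args) {
        System.out.println(count("li#justLink"));         // tag + ID
        System.out.println(count("li.sale"));             // tag + class, skips <p class="sale">
        System.out.println(count("li[data-price]"));      // tag + attribute
        System.out.println(count("img[src][width]"));     // tag + two attributes
        System.out.println(count("li[data-price].sale")); // tag + attribute + class
    }
}
```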

Navigation Through the DOM

ancestor child
Child elements that descend from ancestor (at any depth)
document.select("ul li a")
parent > child
Child elements that descend directly from parent
document.select("body > ul > li > ul > li > a")
siblingA + siblingB
Sibling B element immediately preceded by sibling A
document.select(".child1 + .child2")
siblingA ~ siblingB
Sibling B element preceded by sibling A (not necessarily immediately)
document.select(".child1 ~ div")
el, el, el ...
Unique elements that match any of the selectors
document.select("div.masthead, div.logo")
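
The combinators above can be compared side by side on a nested fragment. This is only a sketch on made-up markup; the class name CombinatorSketch and the helper select are hypothetical, and Jsoup is assumed to be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

class CombinatorSketch {
    // Hypothetical sample markup, invented for this illustration.
    static final String HTML =
          "<ul><li><a href=\"/a\">A</a>"
        + "<ul><li><a href=\"/b\">B</a></li></ul>"
        + "</li></ul>"
        + "<div class=\"masthead\">"
        + "<div class=\"child1\">1</div><div class=\"child2\">2</div>"
        + "<p>p</p><div class=\"child3\">3</div>"
        + "</div>"
        + "<div class=\"logo\">Logo</div>";

    // Parse the sample and run one CSS query against it.
    static Elements select(String cssQuery) {
        return Jsoup.parse(HTML).select(cssQuery);
    }

    public static void main(String[] args) {
        // Descendant: links at any depth below a <ul> and an <li>.
        System.out.println(select("ul li a").size());
        // Direct children only: just the link in the nested list.
        System.out.println(select("body > ul > li > ul > li > a").text());
        // Adjacent sibling: .child2 directly follows .child1.
        System.out.println(select(".child1 + .child2").text());
        // General sibling: every later <div> sibling of .child1.
        System.out.println(select(".child1 ~ div").size());
        // Grouping: elements matching either selector.
        System.out.println(select("div.masthead, div.logo").size());
    }
}
```

In this fragment, .child1 ~ div skips the <p> between the siblings and still reaches .child3, whereas .child1 + .child2 only looks at the element right after .child1.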
 
