List of extractors
Extractors are functions that implicitly take an HTML element and extract data from it as JSON. They are used to specify what raw information you want to scrape from a webpage. You can read more on extractors and how they fit into the bigger picture here. This is only a reference list.
Extractors acting directly on the “focus element”
name: retrieves the name of the element as a string. A ul element yields the value "ul", an a element yields "a", and so on.
text: retrieves the inner text of the element, removing all HTML markup. If the text is divided into multiple chunks, the chunks are joined by a single space.
html: retrieves the whole element HTML as text.
inner-html: retrieves the HTML of the element's child nodes as text, not including the element itself.
attr "an-attribute": retrieves the value of an attribute of a given element if present; otherwise returns null. For example, attr "href" retrieves the href attribute when applied to an anchor (a elements), that is, where a given link points to.
attrs: retrieves all the attributes of a given element, returning the result as a map of attribute names to attribute values.
classes: retrieves all the CSS classes of a given element, returning an array where each element is a string containing a class name.
id: retrieves the id of a given element, as specified by the id attribute.
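To make the descriptions above concrete, here is a minimal Python sketch of what each of these extractors computes. It is an illustration only, not how the tool implements them: it assumes a well-formed snippet so that the standard library's xml.etree.ElementTree can parse it, and the sample element and the function names (with underscores in place of hyphens) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical focus element; well-formed so the stdlib XML parser accepts it.
element = ET.fromstring(
    '<a id="home" class="nav link" href="/index.html">Go <b>home</b> now</a>'
)

def name(el):
    return el.tag

def text(el):
    # Strip each text chunk and join the non-empty ones with a single space.
    chunks = (chunk.strip() for chunk in el.itertext())
    return " ".join(chunk for chunk in chunks if chunk)

def html(el):
    # The element itself, including its own tags.
    return ET.tostring(el, encoding="unicode")

def inner_html(el):
    # Leading text plus each serialized child (a child's trailing text is included).
    return (el.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in el
    )

def attr(el, key):
    return el.get(key)  # None plays the role of null

def attrs(el):
    return dict(el.attrib)

def classes(el):
    return el.get("class", "").split()

def element_id(el):
    return el.get("id")
```

With the sample element, text(element) gives "Go home now" and attr(element, "title") gives None, since the anchor has no title attribute.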
Extractors acting on the “immediate family” of an element
parent(extractor): applies an extractor on the parent of a given element. This extractor returns null if the node has no parent element (e.g., it is the html element right at the beginning of the document).
children(extractor): applies an extractor on all children of a given element, returning all results collected in an array.
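The behaviour of these two extractors can be sketched in Python as higher-order functions that take another extractor as an argument. This is an illustrative model under stated assumptions, not the tool's implementation: the sample document, the parent_of lookup table, and the tag helper are hypothetical, and ElementTree is used only because it ships with Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
root = ET.fromstring("<ul><li>one</li><li>two</li></ul>")

# ElementTree elements do not know their parent, so build a lookup table.
parent_of = {child: parent for parent in root.iter() for child in parent}

def parent(extractor, el):
    # Returns None (standing in for null) when the element has no parent.
    p = parent_of.get(el)
    return None if p is None else extractor(p)

def children(extractor, el):
    # Applies the extractor to every direct child and collects the results.
    return [extractor(child) for child in el]

def tag(el):
    return el.tag
```

Here parent(tag, root) is None because the root has no parent, while children(tag, root) collects one result per li child.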
Extractors acting on sub-selections of an element
select-any(extractor, a .selector): selects a descendant of the element that matches the supplied CSS selector and extracts from it. This is analogous to the querySelector function in JavaScript. If no matching descendant is found, this extractor returns null.
select-all(extractor, a .selector): selects all descendants of the element that match the supplied CSS selector and extracts from each one, collecting all results in an array. This is analogous to the querySelectorAll function in JavaScript. If no matching descendant is found, this extractor returns an empty array.
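The difference between the two, including what each returns when nothing matches, can be sketched in Python. This is a simplified model, not the tool's implementation: it assumes selectors that are plain tag names, because the standard library's ElementTree supports only a small XPath subset rather than CSS selectors, and the sample page and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page.
page = ET.fromstring("<div><p>intro</p><ul><li>one</li><li>two</li></ul></div>")

def select_any(extractor, el, tag):
    # First matching descendant in document order, or None (null) if absent.
    match = el.find(f".//{tag}")
    return None if match is None else extractor(match)

def select_all(extractor, el, tag):
    # Every matching descendant; an empty list when nothing matches.
    return [extractor(match) for match in el.findall(f".//{tag}")]

def text(el):
    return el.text
```

With the sample page, select_any(text, page, "li") yields "one" and select_all(text, page, "li") yields both items, while a tag that does not occur gives None and an empty list respectively.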