URL                 Total Nodes  (Main Block) Route  (Main Block) Nodes  (Content Filtering) Route  (Content Filtering) Nodes  Recall  Precision  F1
en.citizendium.org  1645         2-1-3-9             1478                2-1-3-9                    1478                       100     100        100


The meaning of each column is:

URL: Webpage used for content extraction.
Total Nodes: total number of nodes in the DOM tree of the webpage.
(Main Block) Route: route to the main content block, starting from the body node of the webpage. The numbers are child indices; e.g., 5-2 denotes the second child of the fifth child of the body node.
(Main Block) Nodes: number of DOM nodes in the main content block.
(Content Filtering) Route: route of the retrieved content block, starting from the body node.
(Content Filtering) Nodes: number of DOM nodes retrieved as the main content block.
Recall: number of relevant nodes retrieved divided by the total number of relevant nodes.
Precision: number of relevant nodes retrieved divided by the total number of retrieved nodes.
F1: computed as (2 * P * R) / (P + R), where P is the precision and R is the recall.
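
As an illustration of how the Route and the three scores relate, here is a minimal Python sketch. It is not the benchmark's own code: the Node class, the helper names (follow_route, subtree, scores) and the toy page are hypothetical. It assumes that child indices in a route are 1-based and that the scores are reported as percentages, as the values in the table suggest.

```python
class Node:
    """Hypothetical stand-in for a DOM node; only the children matter here."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []


def follow_route(body, route):
    """Walk a route such as "5-2" from the body node (1-based child indices assumed)."""
    node = body
    for step in route.split("-"):
        node = node.children[int(step) - 1]
    return node


def subtree(node):
    """Return the set of all nodes in the subtree rooted at `node`, root included."""
    nodes = {node}
    for child in node.children:
        nodes |= subtree(child)
    return nodes


def scores(relevant, retrieved):
    """Return (recall, precision, F1) as percentages for two sets of nodes."""
    hits = len(relevant & retrieved)
    recall = 100.0 * hits / len(relevant) if relevant else 0.0
    precision = 100.0 * hits / len(retrieved) if retrieved else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return recall, precision, f1


# Toy page: body -> [header, main -> [p1, p2], footer]
p1, p2 = Node("p"), Node("p")
main = Node("div", [p1, p2])
body = Node("body", [Node("header"), main, Node("footer")])

relevant = subtree(follow_route(body, "2"))          # the true main block
same = scores(relevant, subtree(follow_route(body, "2")))
partial = scores(relevant, subtree(follow_route(body, "2-1")))
```

When the retrieved route equals the main block route, as in the en.citizendium.org row, all three scores are 100. In the toy page, retrieving only the first paragraph (route 2-1) keeps the precision at 100 but lowers the recall to one third, since only one of the three relevant nodes is retrieved.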



Webpage Image   Main Block Content of the Webpage   Main Block Content Extracted with the Algorithm