| URL | Total Nodes | (Main Block) Route | (Main Block) Nodes | (Content Filtering) Route | (Content Filtering) Nodes | Recall | Precision | F1 |
|---|---|---|---|---|---|---|---|---|
| en.citizendium.org | 1645 | 2-1-3-9 | 1478 | 2-1-3-9 | 1478 | 100 | 100 | 100 |
URL: Webpage used for content extraction.
Total nodes: total number of nodes in the DOM tree of the webpage.
(Main Block) Route: route to the main content block, starting from the body node of the webpage. Each number identifies a child; e.g., 5-2 denotes the second child of the fifth child of the body node.
(Main Block) Nodes: number of DOM nodes in the main content block.
(Content Filtering) Route: route of the retrieved content block, starting from the body node.
(Content Filtering) Nodes: number of DOM nodes retrieved as the main content block.
Recall: number of relevant nodes retrieved divided by the total number of relevant nodes.
Precision: number of relevant nodes retrieved divided by the total number of retrieved nodes.
F1: the harmonic mean of precision and recall, computed as (2 * P * R) / (P + R), where P is the precision and R is the recall.
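
The route and metric definitions above can be illustrated with a minimal Python sketch. It is not the benchmark's own code: the `Node` class and the functions `resolve_route`, `subtree_nodes`, and `evaluate` are hypothetical names introduced here. The sketch assumes that child identifiers are 1-based, that a block's node count includes the block's root node, and that the metrics are reported as percentages, as the table values suggest.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal DOM node: a tag name plus an ordered list of children."""
    tag: str
    children: list["Node"] = field(default_factory=list)


def resolve_route(body: Node, route: str) -> Node:
    """Follow a route such as "5-2" from the body node (1-based child indices assumed)."""
    node = body
    for step in route.split("-"):
        node = node.children[int(step) - 1]
    return node


def subtree_nodes(node: Node) -> list[Node]:
    """Return the node itself followed by all of its descendants."""
    nodes = [node]
    for child in node.children:
        nodes.extend(subtree_nodes(child))
    return nodes


def evaluate(relevant: list[Node], retrieved: list[Node]) -> tuple[float, float, float]:
    """Return (recall, precision, F1) as percentages, comparing nodes by identity."""
    relevant_ids = {id(n) for n in relevant}
    retrieved_ids = {id(n) for n in retrieved}
    hits = len(relevant_ids & retrieved_ids)
    recall = 100.0 * hits / len(relevant_ids) if relevant_ids else 0.0
    precision = 100.0 * hits / len(retrieved_ids) if retrieved_ids else 0.0
    # F1 = (2 * P * R) / (P + R); defined as 0 when both P and R are 0.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1


# Toy tree: the main block is the first child of the second child of body ("2-1").
body = Node("body", [
    Node("div"),
    Node("div", [Node("article", [Node("p"), Node("p")])]),
])
main_block = subtree_nodes(resolve_route(body, "2-1"))
extracted = subtree_nodes(resolve_route(body, "2-1"))
print(evaluate(main_block, extracted))
```

When the extracted route matches the main block route, as in the en.citizendium.org row, every relevant node is retrieved and nothing else, so all three metrics equal 100. If the algorithm instead returned an enclosing block (route "2" in the toy tree), recall would stay at 100 while precision would drop, because the extra wrapper node is retrieved but not relevant.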
[Figures not available: Webpage Image; Main Block Content of the Webpage; Main Block Content Extracted with the Algorithm]