Other Template Extractors

The following algorithms are alternative approaches to template extraction. We have reimplemented all of them and made our implementations publicly available in the Downloads section.

RTDM-TD

  • Karane Vieira, Altigran S. da Silva, Nick Pinto, Edleno S. de Moura, Joao M. B. Cavalcanti, Juliana Freire
    A Fast and Robust Method for Web Page Template Detection and Removal.
    Proceedings of the 15th ACM International Conference on Information and Knowledge Management (CIKM 2006).
    © ACM Press, 2006.
    Available: ACM webpage

RBM-TD

  • Karane Vieira, André Luiz da Costa Carvalho, Klessius Berlt, Edleno S. de Moura, Altigran S. da Silva, Juliana Freire
    On finding templates on web collections.
    World Wide Web journal, Volume 12, Issue 2, pp. 171-211.
    © Springer-Verlag, 2009.
    Available: Springer webpage

SST

  • Lan Yi, Bing Liu, Xiaoli Li
    Eliminating noisy information in Web pages for data mining.
    Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2003).
    © ACM Press, 2003.
    Available: ACM webpage

Incremental

  • Yu Wang, Bingxing Fang, Xueqi Cheng, Li Guo, Hongbo Xu
    Incremental Web Page Template Detection.
    Proceedings of the 17th International Conference on World Wide Web (WWW 2008).
    © ACM Press, 2008.
    Available: ACM webpage