These content extraction tools have been implemented as WebExtensions.
The source code of both plugins is publicly available, but using it in any form requires permission from the authors. If you want to use this code, please contact the authors.
Both extensions contain the following files:
- manifest.json: Identifies the plugin and specifies its internal organization, its interface, and the permissions the extension requires.
- background-scripts/background.js: Loaded as soon as the extension is loaded, this script stays active until the user disables the extension. It contains the listeners that respond to the user's actions.
- content-scripts/createNamespaces.js: Creates the needed namespaces.
- content-scripts/browserOverlay.js: Implements the methods needed to run the extension (load web pages, extract the main content, toggle the view, etc.).
- _locales: This folder contains translations into English, Spanish, French, and German.
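The contents of createNamespaces.js and browserOverlay.js are not reproduced here. As a rough illustration of how such a layout commonly works in a WebExtension, the sketch below creates nested namespaces without overwriting existing ones and dispatches messages from the background script to content-script methods. Everything named here is an assumption: the namespace paths (`ConEx.loader`, `ConEx.site`, etc.) merely mirror the folder layout, and the helper `createNamespace`, the `overlay` object, the `handleMessage` function, and the action names `"extract"` and `"toggle"` are hypothetical. Only `browser.runtime.onMessage` (and, in the trailing comment, `browser.browserAction.onClicked` and `browser.tabs.sendMessage`) are standard WebExtensions APIs, and the listener is registered only when that API is present.

```javascript
// Hypothetical sketch: namespace names, action names, and the overlay object
// are illustrative assumptions, not the extensions' actual identifiers.

// createNamespaces.js-style helper: walks a dotted path and creates each
// missing level as an empty object, leaving existing levels untouched.
function createNamespace(root, path) {
  let current = root;
  for (const part of path.split(".")) {
    if (typeof current[part] !== "object" || current[part] === null) {
      current[part] = {};
    }
    current = current[part];
  }
  return current;
}

// Assumed namespaces, mirroring the content-scripts/ConEx/ folder layout.
["ConEx.loader", "ConEx.misc", "ConEx.site", "ConEx.util", "ConEx.algorithm"]
  .forEach((path) => createNamespace(globalThis, path));

// browserOverlay.js-style object with placeholder methods; the real ones
// load pages and run the extraction algorithm.
const overlay = {
  showingContent: false,
  extractMainContent() {
    this.showingContent = true;
    return "content";
  },
  toggleView() {
    this.showingContent = !this.showingContent;
    return this.showingContent ? "content" : "original";
  },
};

// Maps a message sent by the background script to an overlay method.
function handleMessage(message) {
  switch (message && message.action) {
    case "extract":
      return overlay.extractMainContent();
    case "toggle":
      return overlay.toggleView();
    default:
      return undefined; // unknown actions are ignored
  }
}

// Register the listener only inside a WebExtension, where `browser` exists.
// Returning a Promise sends the result back to the sender.
if (typeof browser !== "undefined" && browser.runtime && browser.runtime.onMessage) {
  browser.runtime.onMessage.addListener((message) =>
    Promise.resolve(handleMessage(message)));
}

// A background.js counterpart could send the message when the toolbar button
// is clicked, e.g. (assuming a browser_action is declared in manifest.json):
// browser.browserAction.onClicked.addListener((tab) =>
//   browser.tabs.sendMessage(tab.id, { action: "toggle" }));
```

Keeping the dispatcher a plain function, separate from the listener registration, lets it be exercised outside the browser.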
The core algorithm of both techniques is implemented in JavaScript.
Site-level ConEx is composed of:
- content-scripts/ConEx/ContentExtractor.js
- content-scripts/ConEx/loader/PageLoader.js
- content-scripts/ConEx/misc/Misc.js
- content-scripts/ConEx/site/Website.js
- content-scripts/ConEx/site/Webpage.js
- content-scripts/ConEx/site/Link.js
- content-scripts/ConEx/util/Hashtable.js
- content-scripts/ConEx/util/TreeSearch.js
- content-scripts/ConEx/algorithm/ConEx/ConEx.js
- content-scripts/ConEx/algorithm/ConEx/Config.js
- content-scripts/ConEx/algorithm/ConEx/Map.js
- content-scripts/ConEx/algorithm/ConEx/HierarchyLinks.js
Page-level ConEx is composed of:
- content-scripts/ConEx/ContentExtractor.js
- content-scripts/ConEx/loader/PageLoader.js
- content-scripts/ConEx/misc/Misc.js
- content-scripts/ConEx/site/Website.js
- content-scripts/ConEx/site/Webpage.js
- content-scripts/ConEx/site/Link.js
- content-scripts/ConEx/util/Hashtable.js
- content-scripts/ConEx/util/TreeSearch.js
- content-scripts/ConEx/algorithm/ConEx/ConEx.js
- content-scripts/ConEx/algorithm/ConEx/Config.js
- content-scripts/ConEx/algorithm/ConEx/Content.js
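The lists above show that both variants share ten files and differ only in their algorithm-specific modules (Map.js and HierarchyLinks.js for site-level ConEx, Content.js for page-level ConEx). The source does not say how these files are injected into a page. One common approach, shown here purely as an assumption, is for the background script to inject them in dependency order with `tabs.executeScript`, so that createNamespaces.js runs before any module that uses the namespaces. The helper names `scriptsFor` and `injectInOrder`, the variant keys `"site"` and `"page"`, the order of injection (ConEx files in the listed order, browserOverlay.js last) are all hypothetical; the real extensions may instead declare these files under `content_scripts` in manifest.json.

```javascript
// Hypothetical injection order; the real extensions may declare these files
// in manifest.json instead. Paths are taken from the lists above and start
// with "/" so they resolve from the extension root in every browser.
const BASE = "/content-scripts/ConEx/";

// Files shared by both variants, in the order listed above.
const SHARED = [
  "ContentExtractor.js",
  "loader/PageLoader.js",
  "misc/Misc.js",
  "site/Website.js",
  "site/Webpage.js",
  "site/Link.js",
  "util/Hashtable.js",
  "util/TreeSearch.js",
  "algorithm/ConEx/ConEx.js",
  "algorithm/ConEx/Config.js",
];

// Variant-specific algorithm files.
const EXTRA = {
  site: ["algorithm/ConEx/Map.js", "algorithm/ConEx/HierarchyLinks.js"],
  page: ["algorithm/ConEx/Content.js"],
};

// Builds the full injection list: namespaces first, overlay last (assumed).
function scriptsFor(variant) {
  if (!Object.prototype.hasOwnProperty.call(EXTRA, variant)) {
    throw new Error(`unknown variant: ${variant}`);
  }
  const modules = [...SHARED, ...EXTRA[variant]].map((f) => BASE + f);
  return [
    "/content-scripts/createNamespaces.js",
    ...modules,
    "/content-scripts/browserOverlay.js",
  ];
}

// Injects files one at a time, awaiting each, so every script runs after the
// ones it depends on. `executeScript` is passed in so this can be tested
// without a browser.
async function injectInOrder(executeScript, tabId, files) {
  for (const file of files) {
    await executeScript(tabId, { file });
  }
}

// In background.js (Manifest V2; needs host permission for the page or the
// "activeTab" permission), e.g.:
// injectInOrder((id, details) => browser.tabs.executeScript(id, details),
//               tab.id, scriptsFor("site"));
```

Passing the injection function as a parameter keeps the ordering logic independent of the browser API.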