Introduction
While there are many useful tools for building tailored wordlists (such as CeWL - https://github.com/digininja/CeWL and CUPP - https://github.com/Mebus/cupp), saving words as you browse a website is often overlooked, yet it can help create an ideal wordlist for file/directory discovery and further enumeration.
This post covers web2words, a new Firefox extension we’ve created that saves all words from websites as you browse.
Warning: This extension is currently in testing, so be sure to read and understand the code prior to use. You will need to load it through Firefox debugging as a temporary add-on.
Use it only on your testing instance of Firefox (e.g. Firefox Developer Edition - https://www.mozilla.org/en-US/firefox/developer/), not on your main browsing instance.
Firefox Extension Creation
Mozilla provides excellent, detailed information on creating extensions at https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/Your_first_WebExtension
This post skips the basics and simply shows the extension file contents, how to load the extension into Firefox, and how to use it in your testing activities.
Before starting, create a directory to hold all of the files, e.g. web2words
manifest.json
Save the following to web2words/manifest.json
{
  "manifest_version": 3,
  "name": "Webpage Text Saver",
  "version": "1.0",
  "description": "Saves all text from a webpage and updates on changes.",
  "permissions": ["activeTab", "tabs", "storage", "downloads"],
  "background": {
    "scripts": ["backgroundScript.js"]
  },
  "content_scripts": [
    {
      "matches": ["<all_urls>"],
      "js": ["contentScript.js"]
    }
  ]
}
Further reading on manifest.json https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/manifest.json
contentScript.js
Save the following to web2words/contentScript.js
// Listen for the page to fully load
window.addEventListener("load", function () {
  // Once the page is loaded, extract all text
  const pageText = document.body.innerText; // Get all visible text from the page
  browser.runtime.sendMessage({ action: "saveText", text: pageText });
  // Initialize the MutationObserver
  const observer = new MutationObserver(function (mutations) {
    console.log("DOM has changed");
    const updatedPageText = document.body.innerText; // Get updated text from the page
    browser.runtime.sendMessage({ action: "updateText", text: updatedPageText });
  });
  observer.observe(document, {
    childList: true,
    subtree: true
  });
});
// Listen for URL changes (hash changes)
window.addEventListener("hashchange", function () {
  const pageText = document.body.innerText; // Get all visible text from the page
  browser.runtime.sendMessage({ action: "saveText", text: pageText });
});
// Listen for refresh messages from background script
browser.runtime.onMessage.addListener(function (request, sender, sendResponse) {
  if (request.action === "refreshText") {
    const pageText = document.body.innerText; // Get all visible text from the page
    browser.runtime.sendMessage({ action: "saveText", text: pageText });
  }
});
Further reading on content scripts - https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/Content_scripts
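The MutationObserver callback above runs on every DOM change, so a busy page can send many updateText messages in quick succession, each of which triggers a download. One possible refinement, which is not part of the extension as published, is to debounce the callback so that only the last change in a burst is sent. The sketch below is a minimal illustration under that assumption; the debounce helper, its timers parameter and the 500 ms delay are hypothetical names and values chosen for this example.

```javascript
// Hypothetical helper, not part of the extension as published.
// Returns a wrapper that calls fn once, delayMs after the most recent call.
// The timers parameter is an assumption added so the helper can be exercised
// without real timers; by default it wraps setTimeout and clearTimeout.
function debounce(
  fn,
  delayMs,
  timers = {
    set: (callback, ms) => setTimeout(callback, ms),
    clear: (handle) => clearTimeout(handle),
  }
) {
  let handle = null;
  return function (...args) {
    // Cancel the pending call, if any, and schedule a new one
    if (handle !== null) {
      timers.clear(handle);
    }
    handle = timers.set(() => {
      handle = null;
      fn(...args);
    }, delayMs);
  };
}

// Possible use inside contentScript.js (left as comments because browser
// and document only exist inside the extension):
//
// const sendUpdate = debounce(function () {
//   browser.runtime.sendMessage({ action: "updateText", text: document.body.innerText });
// }, 500);
// const observer = new MutationObserver(sendUpdate);
```

The trade-off is that text which appears and disappears within the delay window is never captured, so a shorter delay may suit pages with transient content.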
backgroundScript.js
Save the following to web2words/backgroundScript.js
let storedText = "";
// Listen for messages from content scripts
browser.runtime.onMessage.addListener(async function (request, sender, sendResponse) {
  if (request.action === "saveText") {
    storedText = request.text; // Initialize stored text
    // Save to storage
    await browser.storage.local.set({ text: storedText });
    // Save stored text to a file
    await saveToFile();
  } else if (request.action === "updateText") {
    storedText += "\n--- Updated Text ---\n" + request.text; // Append updated text
    // Save to storage
    await browser.storage.local.set({ text: storedText });
    // Save updated text to a file
    await saveToFile();
  }
});
// Listen for URL changes
browser.tabs.onUpdated.addListener(function (tabId, changeInfo, tab) {
  if (changeInfo.status === "complete") {
    // Tabs without the content script (e.g. about: pages) reject; ignore those
    browser.tabs.sendMessage(tabId, { action: "refreshText" }).catch(function () {});
  }
});
// Function to save stored text to a file
async function saveToFile() {
  const text = await browser.storage.local.get("text");
  if (text.text) {
    const blob = new Blob([text.text], { type: "text/plain" });
    const url = URL.createObjectURL(blob);
    const downloadId = await browser.downloads.download({
      url: url,
      filename: "webpageText.txt",
      saveAs: false
    });
    // download() resolves once the download has started, so only revoke the
    // object URL after this download has completed or been interrupted
    browser.downloads.onChanged.addListener(function onChanged(delta) {
      if (delta.id === downloadId && delta.state && delta.state.current !== "in_progress") {
        browser.downloads.onChanged.removeListener(onChanged);
        URL.revokeObjectURL(url);
      }
    });
  }
}
Further reading on background scripts - https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/manifest.json/background
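To make the two message types easier to follow, the sketch below restates the listener's text handling as a plain function that can be run outside the browser. The function name nextStoredText and the sample messages are hypothetical and exist only for this illustration; the storage and download steps are deliberately left out.

```javascript
// Hypothetical helper (not part of the extension): mirrors how the
// background listener above derives the next value of storedText.
function nextStoredText(storedText, request) {
  if (request.action === "saveText") {
    // A fresh snapshot replaces whatever was stored before
    return request.text;
  }
  if (request.action === "updateText") {
    // A DOM change appends the new snapshot after a separator line
    return storedText + "\n--- Updated Text ---\n" + request.text;
  }
  // Any other action leaves the stored text untouched
  return storedText;
}

// Replay a made-up message sequence for one page
const messages = [
  { action: "saveText", text: "Home About" },
  { action: "updateText", text: "Home About Login" },
];
const finalText = messages.reduce(nextStoredText, "");
console.log(finalText);
```

Because every saveText resets the stored text while every updateText appends a full snapshot, the downloaded files contain a lot of repeated words, which is why the wordlist is de-duplicated afterwards.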
Bash helper scripts
Two helper scripts, gen-words.sh and clean.sh, generate a clean wordlist and delete the downloaded files respectively. They will likely be phased out as the extension matures.
Adapt them as required for your use.
gen-words.sh
This script will generate a words.txt file with a clean wordlist.
Save the following to web2words/gen-words.sh
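#!/bin/bash
# Split the downloaded text on whitespace, one token per line (keeps tokens such as "zero-day")
cat ~/Downloads/webpageText* | tr -s '[:space:]' '\n' >> words.txt
# Also split every token on non-alphanumeric characters; a temp file avoids reading and appending to words.txt at the same time
tr -c '[:alnum:]' '\n' < words.txt > words.tmp
cat words.tmp >> words.txt && rm words.tmp
# De-duplicate and drop empty lines
sort -u words.txt | sed '/^$/d' > words.tmp && mv words.tmp words.txt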
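For readers who prefer to see the splitting logic in one place, here is a rough JavaScript approximation of what gen-words.sh does to a single string. It is a sketch rather than a replacement for the script: the function name toWordlist is hypothetical, the character class is ASCII-only whereas [:alnum:] depends on your locale, and JavaScript's default sort order can differ from that of sort -u.

```javascript
// Hypothetical helper: approximates gen-words.sh for a single string.
function toWordlist(text) {
  // Tokens split on whitespace (keeps punctuation, e.g. "zero-day")
  const tokens = text.split(/\s+/);
  // The same tokens split again on every non-alphanumeric character
  const parts = tokens.flatMap((token) => token.split(/[^A-Za-z0-9]/));
  // De-duplicate, drop empty strings and sort
  return [...new Set([...tokens, ...parts])].filter(Boolean).sort();
}

console.log(toWordlist("Nmap zero-day API"));
// [ 'API', 'Nmap', 'day', 'zero', 'zero-day' ]
```

Keeping both the whole token and its parts means that hyphenated or dotted names are tried as a single path segment as well as piece by piece during discovery.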
clean.sh
This script cleans up the downloaded webpageText files and deletes your words.txt wordlist.
Save the following to web2words/clean.sh
#!/bin/bash
# -f suppresses the error when a file does not exist yet
rm -f words.txt
rm -f ~/Downloads/webpageText*
Loading the extension
In Firefox, navigate to about:debugging#/runtime/this-firefox, then click “Load Temporary Add-on”.
Select any file in the web2words directory (e.g. manifest.json) and confirm that the extension is listed as loaded.
Testing it out
Navigate to any website (e.g. https://nmap.org) and confirm that the Firefox Downloads progress indicator flashes.
Clicking the Downloads button reveals the filename of each download: webpageText.txt for the first, then a uniquely numbered variant (typically webpageText(1).txt and so on) for each subsequent download.
Generating and viewing the wordlist
To generate the wordlist, make the script executable (chmod +x gen-words.sh) and run ./gen-words.sh
View the contents as follows:
cat words.txt
...
administrators
...
API
...
capture
...
Ncat
...
Ndiff
...
nmap
...
At this point we’ve confirmed everything is functional. Continue browsing your target website and run the ./gen-words.sh script from time to time to update the wordlist.
Conclusion
This concludes the basic setup and use of our web2words Firefox Extension.
Feel free to contact us if you have any suggestions, questions or would like to schedule a meeting to discuss anything further.