ejabberd-contrib/mod_pottymouth
Badlop 2bf5a362ce This doesn't stop banword, but at least satisfies Dialyzer 2022-09-06 18:02:16 +02:00
..
conf Modules that require configuration, provide it commented (#303) 2021-07-07 21:53:44 +02:00
src This doesn't stop banword, but at least satisfies Dialyzer 2022-09-06 18:02:16 +02:00
COPYING adding mod_pottymouth 2016-06-24 15:34:37 +00:00
README.md Convert modules README.txt to markdown syntax 2022-08-12 10:45:09 +02:00
mod_pottymouth.spec code cleanup. add optional normalize_leet to do dynamic substitution of one-to-one mappings (reduce blacklist size). 2016-07-19 21:34:04 +00:00

README.md

mod_pottymouth - Filter bad words in messages

  • Author: Tom Quackenbush

This module allows individual whole words of a message to be filtered against a blacklist.

It allows multiple blacklists sharded by language. The internal bloomfilter can support arbitrary blacklist sizes. Using a large list (say, 87M terms) will slow down the initial server boot time (to about 15 minutes respectively), but once loaded lookups are very speedy.

mod_pottymouth aims to fill the void left by mod_shit which has disappeared from the net.

Configuration

modules:
  mod_pottymouth:
    blacklists:
      default: /home/your_user/blacklist_en.txt
      en: /home/your_user/blacklist_en.txt
      cn: /home/your_user/blacklist_cn.txt
      fr: /home/your_user/blacklist_fr.txt
    charmaps:
      default: /etc/ejabberd/modules/mod_pottymouth/charmap_en.txt
      en: /etc/ejabberd/modules/mod_pottymouth/charmap_en.txt

For each language (en,cn,fr,...whatever) provide a full path to a backlist file. The blacklist file is a plain text file with blacklisted words listed one per line.

You can also provide an optional 'charmap' for each language. This allows you to specify simple substitutions that will be made on the fly so you don't need to include those permutations in the blacklist. This keeps the blacklist small and reduces server startup time. For example, if you included the word: 'xyza' in the blacklist, adding the following substitutions in the charmap would filter permutations such as XYZA, xYz4, or Xyz@ automatically.

Charmap format:

[
 {"X", "x"},
 {"Y", "y"},
 {"Z", "z"},
 {"@", "a"},
 {"4", "a"}
].

Gotchas

The language will be looked up by whatever value is passed in the xml:lang attribute of the xml message. So, any xml:lang value to be supported will need a corresponding entry/blacklist in the config file. If xml:lang is missing, the 'default' entry in config will be used.

For xml:lang attribute docs, see: http://wiki.xmpp.org/web/Programming_XMPP_Clients#Sending_a_message

Blacklist helper

Thinking of a bunch of swear words and all the permutations can be tough. We made a helper script to take a bare wordlist and generate permutations given a dictionary of substitution characters: https://github.com/madglory/permute_wordlist