From 610c0e72eb0016fad7c04c783470438cdc6c4009 Mon Sep 17 00:00:00 2001
From: Tom Quackenbush
Date: Tue, 2 Aug 2016 15:52:41 +0000
Subject: [PATCH] update README.txt with charmap description. remove README.md.

---
 README.md                 | 66 ---------------------------------------
 mod_pottymouth/README.txt | 36 ++++++++++++++++-----
 2 files changed, 28 insertions(+), 74 deletions(-)
 delete mode 100644 README.md

diff --git a/README.md b/README.md
deleted file mode 100644
index 49970e4..0000000
--- a/README.md
+++ /dev/null
@@ -1,66 +0,0 @@
-ejabberd-contrib
-================
-
-This is a collaborative development area for ejabberd module developers
-and users.
-
-
-For users
----------
-
-To use an ejabberd module coming from this repository:
-
-- You need to have ejabberd installed.
-
-- If you have not already done it, run `ejabberdctl modules_update_specs`
-  to retrieve the list of available modules.
-
-- Run `ejabberdctl module_install ` to get the source code and to
-  compile and install the `beam` file into ejabberd's module search path.
-  This path is either `~/.ejabberd-modules` or defined by the
-  `CONTRIB_MODULES_PATH` setting in `ejabberdctl.cfg`.
-
-- Edit the configuration file provided in the `conf` directory of the
-  installed module and update it to your needs. Then apply the changes to
-  your main ejabberd configuration. In a future release, ejabberd will
-  automatically add this file to its runtime configuration without
-  changes.
-
-- Run `ejabberdctl module_uninstall ` to remove a module from
-  ejabberd.
-
-
-For developers
---------------
-
-The following organization has been set up for the development:
-
-- Development and compilation of modules is done by ejabberd. You need
-  ejabberd installed. Use `ejabberdctl module_check ` to ensure it
-  compiles correctly before committing your work. The sources of your
-  module must be located in `$CONTRIB_MODULES_PATH/sources/`.
-
-- Compilation can by done manually (if you know what you are doing) so you
-  don't need ejabberd running:
-  ```
-  cd /path/of/module
-  mkdir ebin
-  /path/of/ejabberd's/erlc \
-     -o ebin \
-     -I include -I /path/of/ejabberd/lib/ejabberd-XX.YY/include \
-     -DLAGER -DNO_EXT_LIB \
-     src/*erl
-  ```
-
-- The module directory structure is usually the following:
-  * `README.txt`: Module description.
-  * `COPYING`: License for the module.
-  * `doc/`: Documentation directory.
-  * `src/`: Erlang source directory.
-  * `lib/`: Elixir source directory.
-  * `priv/msgs/`: Directory with translation files (pot, po and msg).
-  * `conf/.yml`: Configuration for your module.
-  * `.spec`: Yaml description file for your module.
-
-- Module developers should note in the `README.txt` file whether the
-  module has requirements or known incompatibilities with other modules.
diff --git a/mod_pottymouth/README.txt b/mod_pottymouth/README.txt
index 9d7649d..809a59a 100644
--- a/mod_pottymouth/README.txt
+++ b/mod_pottymouth/README.txt
@@ -1,8 +1,10 @@
 The 'mod_pottymouth' ejabberd module aims to fill the void left by 'mod_shit'
 which has disappeared from the net. It allows individual whole words of a
 message to be filtered against a blacklist. It allows multiple blacklists
-sharded by language. To make use of this module the client must add the xml:lang
-attribute to the message xml.
+sharded by language. The internal bloomfilter can support arbitrary blacklist
+sizes. Using a large list (say, 87M terms) will slow down the initial server
+boot time (to about 15 minutes respectively), but once loaded lookups are very
+speedy.
 
 To install in ejabberd:
 
@@ -25,11 +27,31 @@ modules:
       en: /home/your_user/blacklist_en.txt
       cn: /home/your_user/blacklist_cn.txt
       fr: /home/your_user/blacklist_fr.txt
+    charmaps:
+      default: /etc/ejabberd/modules/mod_pottymouth/charmap_en.txt
+      en: /etc/ejabberd/modules/mod_pottymouth/charmap_en.txt
 
 For each language (en,cn,fr,...whatever) provide a full path to a backlist file.
 The blacklist file is a plain text file with blacklisted words listed one per
 line.
 
+You can also provide an optional 'charmap' for each language. This allows you
+to specify simple substitutions that will be made on the fly so you don't need
+to include those permutations in the blacklist. This keeps the blacklist small
+and reduces server startup time. For example, if you included the word:
+'xyza' in the blacklist, adding the following substitutions in the charmap
+would filter permutations such as 'XYZA', 'xYz4', or 'Xyz@' automatically.
+
+charmap format:
+
+[
+  {"X", "x"},
+  {"Y", "y"},
+  {"Z", "z"},
+  {"@", "a"},
+  {"4", "a"}
+].
+
 Gotchas:
 
 The language will be looked up by whatever value is passed in the xml:lang
@@ -40,13 +62,11 @@ the 'default' entry in config will be used.
 For xml:lang attribute docs, see:
 http://wiki.xmpp.org/web/Programming_XMPP_Clients#Sending_a_message
 
-The internal bloomfilter used to ingest the blacklists currently requires about
-4,000 entries in the blacklist to ensure acceptable error probability. (We've
-gotten around this by duplicating entries in a short list)
+Blacklist helper
 
-Todo:
-
-Look into acceptable error probabilities for shorter blacklists.
+Thinking of a bunch of swear words and all the permutations can be tough. We made
+a helper script to take a bare wordlist and generate permutations given a
+dictionary of substitution characters: https://github.com/madglory/permute_wordlist
 
 Tip of the hat: