update README.txt with charmap description. remove README.md.

Tom Quackenbush 2016-08-02 15:52:41 +00:00
parent 265ff3dc70
commit 610c0e72eb
2 changed files with 28 additions and 74 deletions


@ -1,66 +0,0 @@
ejabberd-contrib
================
This is a collaborative development area for ejabberd module developers
and users.
For users
---------
To use an ejabberd module coming from this repository:
- You need to have ejabberd installed.
- If you have not already done it, run `ejabberdctl modules_update_specs`
to retrieve the list of available modules.
- Run `ejabberdctl module_install <module>` to get the source code and to
compile and install the `beam` file into ejabberd's module search path.
This path is either `~/.ejabberd-modules` or defined by the
`CONTRIB_MODULES_PATH` setting in `ejabberdctl.cfg`.
- Edit the configuration file provided in the `conf` directory of the
installed module and update it to your needs. Then apply the changes to
your main ejabberd configuration. In a future release, ejabberd will
automatically add this file to its runtime configuration without
changes.
- Run `ejabberdctl module_uninstall <module>` to remove a module from
ejabberd.
For developers
--------------
The following organization has been set up for the development:
- Development and compilation of modules is done by ejabberd. You need
ejabberd installed. Use `ejabberdctl module_check <module>` to ensure it
compiles correctly before committing your work. The sources of your
module must be located in `$CONTRIB_MODULES_PATH/sources/<module>`.
- Compilation can be done manually (if you know what you are doing), so you
don't need ejabberd running:
```
cd /path/of/module
mkdir ebin
/path/of/ejabberd's/erlc \
-o ebin \
-I include -I /path/of/ejabberd/lib/ejabberd-XX.YY/include \
-DLAGER -DNO_EXT_LIB \
src/*.erl
```
- The module directory structure is usually the following:
* `README.txt`: Module description.
* `COPYING`: License for the module.
* `doc/`: Documentation directory.
* `src/`: Erlang source directory.
* `lib/`: Elixir source directory.
* `priv/msgs/`: Directory with translation files (pot, po and msg).
* `conf/<module>.yml`: Configuration for your module.
* `<module>.spec`: Yaml description file for your module.
- Module developers should note in the `README.txt` file whether the
module has requirements or known incompatibilities with other modules.


@ -1,8 +1,10 @@
The 'mod_pottymouth' ejabberd module aims to fill the void left by 'mod_shit'
which has disappeared from the net. It allows individual whole words of a
message to be filtered against a blacklist. It allows multiple blacklists
sharded by language. To make use of this module the client must add the xml:lang
attribute to the message xml.
sharded by language. The internal bloomfilter can support arbitrary blacklist
sizes. A large list (say, 87M terms) will slow down the initial server boot
(to about 15 minutes for a list of that size), but once the list is loaded,
lookups are very fast.
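To illustrate why lookups stay fast regardless of list size, here is a minimal
Bloom filter sketch in Python. It is not the module's implementation (the module
is written in Erlang); the class name, bit-array size, hash count, and the use of
SHA-256 are all assumptions chosen for the example.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: hypothetical illustration, not mod_pottymouth's code."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        data = item.encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        # Loading cost grows with the number of words added.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # Lookup cost depends only on num_hashes, not on how many words were added.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


blacklist = BloomFilter(num_bits=1024, num_hashes=3)
for word in ["xyza", "abcd"]:
    blacklist.add(word)
```

A word that was added is always reported as present. A word that was never added
is usually reported as absent, but a Bloom filter can return false positives, and
their likelihood depends on the bit-array size, the hash count, and the number of
entries.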
To install in ejabberd:
@ -25,11 +27,31 @@ modules:
en: /home/your_user/blacklist_en.txt
cn: /home/your_user/blacklist_cn.txt
fr: /home/your_user/blacklist_fr.txt
charmaps:
default: /etc/ejabberd/modules/mod_pottymouth/charmap_en.txt
en: /etc/ejabberd/modules/mod_pottymouth/charmap_en.txt
For each language (en, cn, fr, and so on) provide a full path to a blacklist file.
The blacklist file is a plain text file with blacklisted words listed one per
line.
You can also provide an optional 'charmap' for each language. This allows you
to specify simple substitutions that will be made on the fly so you don't need
to include those permutations in the blacklist. This keeps the blacklist small
and reduces server startup time. For example, if the blacklist contains the word
'xyza', adding the following substitutions to the charmap would also filter
permutations such as 'XYZA', 'xYz4', or 'Xyz@' automatically.
charmap format:
[
{"X", "x"},
{"Y", "y"},
{"Z", "z"},
{"@", "a"},
{"4", "a"}
].
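The substitution step can be sketched as follows. This is a rough Python
illustration, not the module's Erlang code: it assumes the charmap is applied
character by character before the blacklist lookup, and the names CHARMAP,
BLACKLIST, normalize and is_blacklisted are made up for the example.

```python
# Hypothetical illustration; mirrors the charmap entries listed above.
CHARMAP = {"X": "x", "Y": "y", "Z": "z", "@": "a", "4": "a"}
BLACKLIST = {"xyza"}


def normalize(word, charmap):
    """Replace each character that has a charmap entry; keep the rest as-is."""
    return "".join(charmap.get(ch, ch) for ch in word)


def is_blacklisted(word, blacklist, charmap):
    """Check the normalized form of the word against the blacklist."""
    return normalize(word, charmap) in blacklist


for candidate in ["xYz4", "Xyz@", "xyzb"]:
    print(candidate, is_blacklisted(candidate, BLACKLIST, CHARMAP))
```

Under this pure-substitution assumption, every character that should be folded
needs its own entry; an uppercase 'A', for instance, would only map to 'a' if the
charmap also contained {"A", "a"}.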
Gotchas:
The language will be looked up by whatever value is passed in the xml:lang
@ -40,13 +62,11 @@ the 'default' entry in config will be used.
For xml:lang attribute docs, see:
http://wiki.xmpp.org/web/Programming_XMPP_Clients#Sending_a_message
The internal bloomfilter used to ingest the blacklists currently requires about
4,000 entries in the blacklist to ensure acceptable error probability. (We've
gotten around this by duplicating entries in a short list)
Blacklist helper
Todo:
Look into acceptable error probabilities for shorter blacklists.
Thinking of a bunch of swear words and all the permutations can be tough. We made
a helper script to take a bare wordlist and generate permutations given a
dictionary of substitution characters: https://github.com/madglory/permute_wordlist
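The idea behind generating permutations from a substitution dictionary can be
sketched like this. It is not the permute_wordlist script itself, whose interface
is not described here; the SUBSTITUTIONS table and the permutations_of function
are assumptions for illustration only.

```python
from itertools import product

# Hypothetical substitution dictionary: each character maps to its alternates.
SUBSTITUTIONS = {"a": ["@", "4"], "x": ["X"]}


def permutations_of(word, substitutions):
    """Return every spelling of word obtained by swapping in alternates."""
    # For each position, the choices are the original character plus alternates.
    options = [[ch] + substitutions.get(ch, []) for ch in word]
    return ["".join(combo) for combo in product(*options)]


variants = permutations_of("xyza", SUBSTITUTIONS)
print(variants)
```

The number of variants is the product of the choices per character, so it grows
quickly with word length and with the size of the substitution dictionary.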
Tip of the hat: