This is aimed at developers who would like to help enhance mmftpd, mmmail
and/or the pthread_utils library. Note that the stable mmsoftware code base
is not affected by this experimental work, which only occurs on the
pthread-branch. We would also like mmftpd and mmmail users to help stress
test the system and report any problems.

UPDATE: mmmail (mmsmtpd and mmpop3d) has also been ported from pth to
pthread + pthread_utils, again only on the pthread-branch.

NOTES: all three daemons run very well under NetBSD 2.1. On Linux 2.4,
however, there are several problems. First, if the daemon is set to be
launched as root and to change credentials, on Linux the pool of threads
still has threads running as root, because threads are actually processes
there. There are other problems which I haven't diagnosed yet. It is quite
possible that SIGUSR2 does not propagate to all threads. It is also possible
that the interthread messaging, which uses condition variables, experiences
problems. What seems more obvious, however, is the broken recursive mutex
support, which causes mmfd library calls to hang. Moreover, a few
#ifdef __linux__ or #ifdef __GLIBC__ blocks currently need to be added for
the code to compile on Linux.

It is quite possible that, apart from the compilation fixes, the code will
work better on Linux 2.6. Since I only have one Linux box, a Debian desktop
which I would like to keep in a stable working state, I'm unsure when I'll
attempt another switch to 2.6. The last two attempts were problematic, but
that was some time ago and the 2.6 code base was extremely unstable back
then. If 2.4 at least used LWPs for threads, instead of full-fledged
processes created with __clone(2), it would probably have solved the
credentials problem and followed the POSIX specification more closely. This
needs to be verified.

OVERVIEW
========

mmftpd is one of the daemons I wrote several years back which I continued to
maintain until it could be used on production systems.
It currently is used on several production systems and performs its tasks
well. However, there is an intent to eventually migrate it to POSIX threads,
particularly now that NetBSD provides a proper native threading system. This
is a long term effort rather than a short term one, and should help keep the
mmftpd code base from becoming antiquated in a few years.

We believe that mmftpd has been innovative in its unusual approach to
implementing an FTP server, and that porting it to pthread will lead to
innovative ways to use the native POSIX thread libraries which most
unix-like OSs now provide. For users, one of its innovative aspects has been
the ability to function under normal user privileges under all conditions,
at a time when most FTP servers required superuser privileges. From a
programmer's standpoint, however, what we find innovative is the way the
application attempts to fuse efficient interthread messages (implemented on
top of pthread mutexes and condition variables) with traditional unix file
descriptor events (which POSIX threads lack support for).

On unix systems, particularly in networking software, we are stuck with file
descriptors and the polling of their events. These are incompatible with the
efficient interthread messaging techniques which can be built using standard
pthread, and this continues to make most software developers stick to
extensively used approaches:

- The multiprocess approach, using one process per served client, with a
  fork(2) for each client (BSD inetd and ftpd use this method).

- The multiprocess approach using a pool of pre-fork(2)ed processes which
  can all accept(2), are synchronized using a lock such as flock(2), and can
  each be recycled after serving a client. The pool is managed by a main
  process. This approach is used by apache 1.3 and mmlib/mmserver2(3) (which
  is used by mmsoftware/mmspawnd2(8) and several closed source daemons I
  wrote for NinjaSystems, Tact group, Logatec and Pulsar-Zone).
  So far this appears to be one of the most reliable approaches: a process
  can be replaced after a certain number of requests have been served, which
  even copes with potential memory leaks, and performance is decent since a
  ready pool of processes is used and each process is reused a certain
  number of times. This system still lacks performance when shared memory is
  used, because interprocess synchronization is more costly than interthread
  synchronization within a common process.

- The single process approach with non-blocking descriptors, using a main
  loop driving a state machine to serve many clients (ircds commonly use
  this approach, given how frequently they operate on shared state, which
  would otherwise require some form of synchronization).

- A hybrid of the two concepts above, where a pool of processes is used and
  each process can serve a certain class of clients at once, using either
  threads or (more commonly) a state machine. Classes whose clients all need
  the same shared memory can use a common process, which reduces the
  synchronization overhead.

Introducing threads can raise a multitude of problems. It is a more recent
research area, especially on unix systems. Issues arise from several
factors, such as:

- Preemptiveness (or lack thereof for certain functions).

- Blocking and non-blocking syscalls (and thread-safe and non-thread-safe
  ones).

- Reentrancy.

- Security (since threads share the same process, a successful attack
  against one of them can affect the whole application). Privilege
  separation is less intuitive and requires other processes rather than
  threads.

- The inconsistency among pthread implementations, although all claim to be
  POSIX compliant. For instance, on most Linux kernels a thread can be
  considered a process (created using __clone(2), somewhat similar to an
  LWP): it can have independent privileges, and it receives signals through
  traditional signal(2) handling, so they do not propagate to all threads.
  Another example is the pthread interface layer over pth, which is not
  reentrant unless patched to override syscalls, and even that does not
  solve userspace reentrancy problems.

- The inability to use execve(2) easily.

- The inability of some implementations to use fork(2) after a main thread
  was created (this was observed on Linux 2.2 with linuxthreads).

- The special care that must be taken to execute functions which could
  exceed thread stack limits or which are considered thread unfriendly, i.e.
  dispatching such functions to other processes (this approach was also used
  with pth), or enclosing them within a global mutex (which is only
  acceptable for functions which are certain to return very quickly).

In an attempt to experiment with other techniques, mmftpd has at its
disposal both a pool of threads and a pool of processes. The application is
mainly thread driven, but can also dispatch functions to full-fledged
processes as needed. With pth, many functions which caused problems because
of its non-preemptive nature (or its fixed stacks) needed to be dispatched
to these processes; with pthread, we believe that such cases will be rare
and that a performance gain should result. Moreover, the system should also
scale better on SMP systems using pthread. With pth, the pool of processes
was the only facility which allowed mmftpd to make use of any other
processor.

PTHREAD_UTILS
=============

I wrote a library (pthread_utils) which provides the following:

- Easy creation of a pool of ready threads to which tasks can be dispatched
  as necessary, without the need to create new threads. The number of
  threads in the pool grows as necessary, and also shrinks if too many
  threads remain unused for some time. Depends on the mmlib/mmpool(3)
  library.
  (mmpool(3) needs an enhancement in the way it calculates its
  averages/statistics: it is believed that the application should schedule a
  function at regular intervals to evaluate these, rather than attempting to
  do it at pool_free(). This is another matter, however, which will be dealt
  with in time.)

- An efficient interthread notification and messaging system, similar to
  AmigaOS intertask messaging or the AmigaOS-inspired pth messages. Rings
  are used for event/message notification, and messages can be queued to
  ports. Contrary to those implementations, an arbitrary number of ports can
  notify a single ring of their events. On AmigaOS, creating a port caused a
  signal bit to be reserved, and a task could only allocate a limited number
  of user signal bits.

- A poll(2) replacement which can deal with both FD and ring events. These
  two systems are generally incompatible, and unfortunately the POSIX
  pthread standard lacks important functionality in this area. The system
  basically sets up a polling thread with which communication is maintained
  through efficient interthread messaging. This polling thread operates as
  an independent device which can be passed descriptor sets and from which
  FD event notifications can be obtained. To allow the polling thread to
  wake up upon reception of ring events while it is waiting in poll(2),
  SIGUSR2 is taken over with a no-op signal handler function. The goal is to
  cause poll(2) to return immediately with EINTR. This also means that all
  threads should block that signal, except the polling thread, which
  generally can unblock it just before going into poll(2). It is currently
  uncertain whether it would be adequate to simply always leave SIGUSR2
  unblocked in the polling thread.

- pthread_connect_ring() and pthread_accept_ring() functions, built on
  pthread_poll_ring(). These functions can return upon a timeout or a ring
  event.
- Various timed thread suspension functions such as pthread_nanosleep(),
  pthread_sleep(), etc. These are based around pthread_cond_timedwait().

This library attempts not to stray from the BSD/unix and POSIX standards,
but to assemble the two into a pthread-unix fusion library. As such, it
should require no modifications to the standard libraries provided by an OS,
and it should be rather portable.

MMSERVER
========

To further complicate things, mmftpd also makes use of an old library,
mmlib/mmserver(3). This allows the network client accepting code to be
isolated, but also allows mmftpd to easily dispatch functions to processes
when such functions are considered inadequate to execute in a thread
context. This could for instance include a recursive function which is
considered to require too much stack space, or a function which should
execute under different privileges, using privilege isolation. When pth was
used, it also served to execute functions which were CPU intensive enough,
or slow enough to return, to require preemption if executed in a thread
context (pth does not provide such preemption unless pth_yield() calls are
inserted into the loops). As another example, getnameinfo(3) can take an
arbitrarily long time to resolve the hostname for an IP address.

It was originally developed because pth could not take advantage of SMP and
was also inefficient at certain tasks (pth is based around a main select(2)
loop and is entirely a userspace non-preemptive thread library).

mmserver's async system works in a similar fashion to the previously
described subsystem:

- A device thread is dedicated to dispatching syscall-like functions,
  requested through interthread messaging. It queues the requests and
  dispatches them to the process pool whenever a process is ready to be
  served a request. It communicates with those processes using BSD IPC,
  through AF_UNIX SOCK_DGRAM socketpairs.
  The device then uses a special poll(2) replacement (pthread_poll_ring())
  which allows it to keep processing interthread message requests as well as
  result messages from the busy processes. When a result comes back through
  a socketpair, a reply packet carrying the function results is sent through
  interthread messaging to the thread which issued the request.

MMFTPD
======

Moreover, each mmftpd client currently requires two threads: the client
thread, and one which deals with transfers, which is another device thread
through which communication occurs using interthread messaging. This makes
it easy to isolate the control and data aspects of the FTP server. The
transfer thread requires two special functions based on the previously
described pthread_poll_ring() from pthread_utils: pthread_accept_ring() and
pthread_connect_ring(), which serve as accept(2) and connect(2) replacements
that also wake up upon interthread ring events.

Eventually, it is believed that a single transfer thread device will be used
for all client threads. Alternatively, we could use a pool of transfer
threads and allow each transfer thread to serve requests for several client
threads. This is not a priority at the current time.

STRESSING THE SYSTEM
====================

Because of the complexity of mmftpd and its heavy usage of the pthread_utils
library, it is considered a good testing platform for the library. It was
thus chosen as the first application to migrate from pth to pthreads using
the new library.

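One assumption the poll(2) replacement described under PTHREAD_UTILS relies
on can be checked in isolation: that a SIGUSR2 aimed at the polling thread
makes poll(2) return with EINTR. The following is only a minimal sketch of
such a check and is not taken from pthread_utils. The names polling_thread()
and demo_sigusr2_wakeup(), the idle pipe and the retry loop are all
hypothetical and exist for illustration only. Here the polling thread simply
leaves SIGUSR2 unblocked at all times, and the sender retries because a
signal which lands just before poll(2) is entered only runs the handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static atomic_int poll_interrupted;  /* set once poll(2) returned EINTR */
static atomic_int stop_polling;      /* asks the polling thread to exit */

/* No-op handler: exists only so that SIGUSR2 interrupts poll(2). */
static void sigusr2_noop(int sig) { (void)sig; }

/* Hypothetical polling thread: the only thread with SIGUSR2 unblocked. */
static void *polling_thread(void *arg)
{
    sigset_t set;
    struct pollfd pfd;

    assert(arg != NULL);
    pfd.fd = *(int *)arg;
    pfd.events = POLLIN;
    pfd.revents = 0;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR2);
    pthread_sigmask(SIG_UNBLOCK, &set, NULL);

    while (!atomic_load(&stop_polling)) {
        /* Nothing is ever written to the pipe, so only a signal (or the
         * 100 ms timeout) ends the wait. */
        if (poll(&pfd, 1, 100) == -1 && errno == EINTR) {
            atomic_store(&poll_interrupted, 1);
            break;
        }
    }
    return NULL;
}

/* Returns 0 if SIGUSR2 woke the polling thread out of poll(2), -1 if not. */
int demo_sigusr2_wakeup(void)
{
    struct sigaction sa;
    struct timespec delay = { 0, 1000000 };  /* 1 ms between attempts */
    sigset_t set;
    pthread_t tid;
    int fds[2], tries;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigusr2_noop;            /* sa_flags 0: no SA_RESTART */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR2, &sa, NULL) == -1 || pipe(fds) == -1)
        return -1;

    /* Block SIGUSR2 first: threads created afterwards inherit the mask. */
    sigemptyset(&set);
    sigaddset(&set, SIGUSR2);
    pthread_sigmask(SIG_BLOCK, &set, NULL);

    if (pthread_create(&tid, NULL, polling_thread, &fds[0]) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Resend for roughly two seconds at most, then give up. */
    for (tries = 0; tries < 2000 && !atomic_load(&poll_interrupted); tries++) {
        pthread_kill(tid, SIGUSR2);
        nanosleep(&delay, NULL);
    }
    atomic_store(&stop_polling, 1);
    pthread_join(tid, NULL);
    close(fds[0]);
    close(fds[1]);
    return atomic_load(&poll_interrupted) ? 0 : -1;
}
```

Calling demo_sigusr2_wakeup() from a small main() and linking against the
platform's thread library should return 0 where the technique works. A
return value of -1 would suggest that the signal never reached the thread
sitting in poll(2), which is one of the suspected Linux 2.4 problems.
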
HELPING
=======

How to get the sources and compile them
---------------------------------------

    $ mkdir work && cd work
    $ cvs -q -d:pserver:anoncvs@cvs.pulsar-zone.net:/cvsroot \
        co mmondor/tests/pthread_utils
    $ cvs -q -d:pserver:anoncvs@cvs.pulsar-zone.net:/cvsroot \
        co -rpthread-branch mmondor/mmsoftware
    $ cd mmondor/mmsoftware/mmftpd

The pthread_utils library is in ../../tests/pthread_utils/
mmftpd is src/mmftpd.c
mmlib is in ../mmlib

Note that if you are not using i386 you will need to edit ../mmlib/mmarch.h.
You can also set any additional compiler flags you want in the CFLAGS
environment variable.

    $ gmake

The mmftpd binary is src/mmftpd. You'll need to set up basic
/usr/local/etc/mmftpd.conf and /usr/local/etc/mmftpdpasswd configuration
files. Optionally, you can use gmake install if you want mmftpd to be
stripped and installed under the /usr/local prefix.

If you want to get rid of the mmstat related messages shown via syslog, you
should also build and install mmstatd, and launch it before launching
mmftpd. This will also allow useful statistics to be maintained.

Check the provided mmstat(8), mmstatd(8), mmstatd.conf(5), mmftpd(8),
mmftpd.conf(5) and mmftpdpasswd(5) manual pages for more information.

Debugging
---------

You should uncomment the debugging CFLAGS and LDFLAGS options in GNUmakefile
and execute mmftpd through gdb:

    # cd [...]/work/mmondor/mmsoftware/mmftpd/src
    # gdb ./mmftpd
    > run /usr/local/etc/mmftpd.conf

NetBSD 2.1 appears to support the "info threads" and "thread <n>" directives,
which allow switching between threads and querying their stack traces. These
are probably useful to discover what's wrong.

If you happen to find a problem, you are welcome to provide me with the
results of a "cvs -q diff -uN" command :) Note that I must still be able to
continue to release the code under the current BSD-style license and that
the diff might be applied differently, but the goal is to discover and fix
any potential problem.

Because of the limited resources and time I can put into this, I highly
appreciate any help.
I shall resume serious debugging at some point and possibly write a complete
test suite for the pthread_utils library. It has not been possible for me to
write this yet.

OTHER EVENTUAL ENHANCEMENTS
===========================

These are other enhancements that the code base should eventually undergo:

- The mmlib/string library should be optimized to use bitwise operators to
  perform long-aligned move optimizations. A debate originally arose at
  NinjaSystems about this, and less efficient macros are still in the code.

- mmlib/mmpool should probably base its statistics on time rather than on
  the frequency of pool_free() calls, without needing an extra syscall if
  possible, and without taking over a signal on its own. It should probably
  let the application call its GC function at regular intervals, or at rare
  events where the extra syscall would be considered acceptable.

- Possibly implement a simple and fast syntax checker which could be shared
  and used in the main command loop, instead of having every command sanity
  check its own arguments.

- Possibly use a single transfer thread for all clients. If a performance
  loss is then observed on SMP systems, use a pool of transfer threads, but
  not one per client. Each thread should then be able to handle a number of
  clients. A booking system would be used.

CONTACT
=======

Matthew Mondor
mmsoftware@pulsar-zone.net