Some problems and their solutions.

When writing Pl/Java, which maps the JVM into the same process space as the PostgreSQL backend code, some concerns were raised regarding multiple threads, exception handling, and memory management. Here is a brief text explaining how those issues were resolved.

Multi-threading

Problem

Java is inherently multi-threaded. The PostgreSQL backend is not. Nothing stops a developer from using the Thread class to start multiple threads in the code. Finalizers that call out to the backend might be run from a background garbage collection thread. Several Java packages that are likely to be used make use of multiple threads. How can this model coexist with the PostgreSQL backend in the same process without creating havoc?

Solution

The solution consists of two mechanisms which together form a watertight protection against multiple threads in the backend code.

a)      All calls from Java out to the backend functions are synchronized on one and the same object. All native methods are private to ensure that the synchronization cannot be bypassed. The result is that only one thread at a time can make a call from the JVM out to the backend functions.

b)      A control flag, managed outside of the JVM, keeps track of when the call handler is waiting for a call into the JVM to return. If this flag is set, the JVM is allowed to make synchronized calls to the backend. If it is not set, the JVM is denied all access, and any attempt to call the backend results in an exception. This prevents a stray JVM thread from entering the backend at a time when the backend is not expecting it.

The analogy would be to see the JVM as a monster with multiple swords. The backend can cope as long as the monster swings one sword at a time. The synchronization mechanism ensures this. The backend needs to turn its back to the monster and do other things every now and then. The control flag ensures that the monster doesn’t stab the backend from behind.
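
The two mechanisms above can be sketched in plain Java. This is only an illustration under stated assumptions: the class and method names are hypothetical, the native method is replaced by an ordinary private method, and the control flag (which the text says is managed outside the JVM) is simulated with an AtomicBoolean so that the sketch runs without native code.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names are taken from Pl/Java itself.
final class BackendGate {
    // The single object that every outbound call synchronizes on (mechanism a).
    private static final Object THREAD_LOCK = new Object();

    // Stand-in for the control flag (mechanism b). In the text the flag lives
    // outside the JVM; an AtomicBoolean is an assumption made for this sketch.
    private static final AtomicBoolean callInProgress = new AtomicBoolean(false);

    private BackendGate() {}

    // Stand-in for a private native method. It is private so that callers
    // cannot bypass the synchronized wrapper below.
    private static String nativeExecute(String statement) {
        return "executed: " + statement;
    }

    // The only way into the simulated backend: one thread at a time, and only
    // while the call handler is waiting for the JVM to return.
    static String execute(String statement) {
        synchronized (THREAD_LOCK) {
            if (!callInProgress.get()) {
                throw new IllegalStateException("backend is not accepting calls");
            }
            return nativeExecute(statement);
        }
    }

    // Simulates the call handler entering the JVM: the flag is set for the
    // duration of the call and cleared again when the call returns.
    static <T> T callIntoJvm(Supplier<T> function) {
        callInProgress.set(true);
        try {
            return function.get();
        } finally {
            callInProgress.set(false);
        }
    }
}
```

A call made through callIntoJvm reaches the simulated backend, while a call made from a stray thread after the function has returned is rejected with an exception.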

Exception handling

Problem

Java makes frequent use of try/catch/finally blocks. PostgreSQL sometimes uses an exception mechanism that calls longjmp to transfer control to a known state. This jump effectively bypasses the whole JVM and cannot be caught from Java. Consequently, no finally block will be executed either.

Solution

The current state of the jump buffer (Warn_restart for hackers) is saved by the call handler prior to all calls into the JVM. Every call from the JVM into the backend that might result in a longjmp sets up its own local jump buffer. If a longjmp occurs, the jump is caught, remembered by raising a flag, and replaced with a Java exception that is thrown. From that point on, and until the JVM returns, the JVM is denied all access to the PostgreSQL backend code. Once the JVM returns (typically immediately due to the exception), the flag state is examined and the jump “continues” to its intended destination (the original state of the Warn_restart buffer).

This allows the JVM to trap all exceptions and to do normal catch/finally processing. The database cannot be accessed during that processing, but other housekeeping can be done.
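
Seen from the Java side, the behaviour described above might look roughly like the following sketch. It simulates the mechanism in pure Java: there is no real longjmp, the boolean parameter stands in for "the backend would have jumped", and all class and method names are hypothetical.

```java
// Hypothetical sketch: class and method names are illustrative only.
final class BackendErrorDemo {
    // Raised when a simulated longjmp has been trapped. It stays raised
    // until the JVM returns to the call handler.
    private static boolean errorPending = false;

    static final class BackendException extends RuntimeException {
        BackendException(String message) {
            super(message);
        }
    }

    private BackendErrorDemo() {}

    // Stand-in for a call from the JVM into the backend. The parameter
    // simulates whether the backend would have performed a longjmp.
    static void callBackend(boolean backendFails) {
        if (errorPending) {
            throw new BackendException("backend access denied: an error is pending");
        }
        if (backendFails) {
            errorPending = true;                         // remember the jump
            throw new BackendException("backend error"); // replace it with a Java exception
        }
    }

    // Stand-in for the call handler. It runs the Java function and then
    // examines the flag. A return value of true means that the real call
    // handler would now let the jump continue to its original destination.
    static boolean invoke(Runnable javaFunction) {
        errorPending = false;
        try {
            javaFunction.run();
        } catch (BackendException e) {
            // The exception propagated all the way out of the Java function.
        }
        boolean mustContinueJump = errorPending;
        errorPending = false;
        return mustContinueJump;
    }
}
```

A function run through invoke can catch the exception and execute its finally blocks as usual, but any further call to callBackend made while the error is pending is rejected.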

Java Garbage Collector versus palloc()

Problem

Primitive types are always passed by value. This includes the String type (a must, since Java uses double-byte, i.e. 16-bit, characters). Complex types, however, are often wrapped in Java objects and passed by reference, i.e., a Java object will contain a pointer to palloc’ed memory and use native JNI calls to extract and manipulate data. Such data becomes “stale” once a call has ended. Further attempts to access such data will at best give very unpredictable results, but will more likely cause a memory fault and a crash.

Solution

Pl/Java contains code that ensures that stale pointers are cleared when the pointer is freed or when the MemoryContext that they were allocated in goes out of scope. The Java wrapper objects might live on, but any attempt to use them will result in a “closed native handle” exception.
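
A minimal sketch of such a wrapper is shown below. The names are hypothetical and the pointer is just a number here; in a real implementation the invalidate method would presumably be driven from native code when the memory is freed or its MemoryContext goes away.

```java
// Hypothetical sketch: names are illustrative, not the Pl/Java API.
class NativeStruct {
    // Simulated address of palloc'ed memory. Zero means the handle is closed.
    private long pointer;

    NativeStruct(long pointer) {
        this.pointer = pointer;
    }

    // Clears the stale pointer. The Java object itself lives on.
    synchronized void invalidate() {
        pointer = 0L;
    }

    synchronized boolean isValid() {
        return pointer != 0L;
    }

    // Every accessor checks the handle before the pointer would be used.
    synchronized long getNativePointer() {
        if (pointer == 0L) {
            throw new IllegalStateException("closed native handle");
        }
        return pointer;
    }
}
```

Because the check happens in Java before any native access, a stale wrapper produces an ordinary exception instead of a memory fault.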

Loader semantics

Problem

Java and JNI will use the following naming scheme when finding the shared library on a Unix box:

Prepend “lib” to the name and then append “.so”. Find the resulting file using LD_LIBRARY_PATH.

On a Windows box, it works like this:

Append “.dll” (nothing is prepended) and then use PATH to find the resulting file.

PostgreSQL has a scheme of its own. Apparently, it does not prepend “lib” on a Unix box, and it uses Dynamic_library_path instead of LD_LIBRARY_PATH/PATH to find the module. Unfortunately, Dynamic_library_path is not seen by the system loader, so if a module needs to load other dynamic libraries, it will fail unless LD_LIBRARY_PATH/PATH is set correctly.
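
The JVM side of the naming scheme described above can be illustrated with a few lines of Java. The helper names are hypothetical; System.mapLibraryName is the standard JDK call that reports the file name the running JVM would look for on the current platform.

```java
// Hypothetical helper names. Only System.mapLibraryName is a real JDK call.
final class LibraryNames {
    private LibraryNames() {}

    // Unix rule from the text: prepend "lib", append ".so".
    static String unixFileName(String name) {
        return "lib" + name + ".so";
    }

    // Windows rule from the text: append ".dll", prepend nothing.
    static String windowsFileName(String name) {
        return name + ".dll";
    }

    // The file name the running JVM itself would use for this library.
    static String currentPlatformFileName(String name) {
        return System.mapLibraryName(name);
    }
}
```

For the name “pljava”, the two rules give “libpljava.so” and “pljava.dll” respectively, whereas the value returned by currentPlatformFileName depends on the platform the JVM runs on.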

Solution

The Pl/Java runtime merges the Dynamic_library_path with LD_LIBRARY_PATH or PATH and uses the result in the JVM. That seems to work fine (the JVM will attempt to load the shared library too, and unless it finds the already loaded one, it will fail). In addition, the deploy program that initializes the Java language in the database has an option, –windows, that allows different SQL syntax to be used on different platforms. On Unix, the module name used will be “libpljava”; on Windows, just “pljava”.
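
The merge step could look something like the sketch below. It is not the actual Pl/Java code: the class name is hypothetical, and both the ordering (Dynamic_library_path entries first) and the removal of duplicate entries are assumptions made for the example.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch: the ordering and de-duplication are assumptions.
final class PathMerger {
    private PathMerger() {}

    // Merges two path lists that use the given separator (":" on Unix,
    // ";" on Windows). Either list may be null. Empty entries are skipped
    // and each directory is kept only the first time it appears.
    static String merge(String dynamicLibraryPath, String loaderPath, String separator) {
        Set<String> dirs = new LinkedHashSet<>();
        for (String list : new String[] { dynamicLibraryPath, loaderPath }) {
            if (list == null) {
                continue;
            }
            for (String dir : list.split(Pattern.quote(separator))) {
                if (!dir.isEmpty()) {
                    dirs.add(dir);
                }
            }
        }
        return String.join(separator, dirs);
    }
}
```

In practice the second argument would come from something like System.getenv("LD_LIBRARY_PATH") or System.getenv("PATH"), and the separator from java.io.File.pathSeparator.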

PostgreSQL could be improved so that:

a)      It would merge the Dynamic_library_path with the current LD_LIBRARY_PATH or PATH and change the environment used by the forked backend processes. That way, the system loader would function correctly and PostgreSQL would not need any specific code that prepends paths to module names etc. The system loader needs to be functional anyway when a module is dependent on other dynamic libraries.

b)      On Unix systems, the “lib” prefix should be used by default. Backward compatibility can be obtained by testing without “lib” if no dynamic library is found.