PL/Java maps the JVM into the same process space as the PostgreSQL backend code. While it was being written, concerns were raised regarding multiple threads, exception handling, and memory management. Here is a brief text explaining how those issues were resolved.
Java is inherently multithreaded. The PostgreSQL backend is not. There’s nothing stopping a developer from using multiple threads in their code. Finalizers that call out to the backend might be run from a background garbage collection thread. Several Java packages that are likely to be used make use of multiple threads. How can this model coexist with the PostgreSQL backend in the same process without creating havoc?
The solution consists of two mechanisms which together form a watertight protection against multiple threads in the backend code.
a) All calls from Java out to the backend functions are synchronized on one and the same object. All native methods are private to ensure that the synchronization cannot be bypassed. The result is that only one thread at a time can make a call from the JVM out to the backend functions.
b) A control flag, managed outside of the JVM, keeps track of when the call handler is waiting for the return of a call into the JVM. If this flag is set, the JVM is allowed to make synchronized calls to the backend. If it is not set, the JVM is denied all access, and any attempt to call the backend results in an exception. This prevents a stray thread in the JVM from entering the backend at a time when the backend is not expecting it.
The analogy would be to see the JVM as a monster with multiple swords. The backend can cope as long as the monster swings one sword at a time. The synchronization mechanism ensures this. The backend needs to turn its back to the monster and do other things every now and then. The control flag ensures that the monster doesn’t stab the backend from behind.
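The two mechanisms can be sketched in plain Java. This is only an illustration under stated assumptions: the class `BackendGate`, its methods, and the exception type are hypothetical names, the "native" call is replaced by an ordinary private method, and the control flag is kept in a Java field purely so the sketch can run (in PL/Java the flag is managed outside of the JVM).

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of the gate between the JVM and the backend. */
final class BackendGate {
    // Mechanism a): the one object that every outbound call synchronizes on.
    private static final Object BACKEND_LOCK = new Object();

    // Mechanism b): set only while the call handler waits for the JVM.
    // Assumption: a Java field stands in for the flag kept outside the JVM.
    private static volatile boolean handlerWaiting = false;

    private static final AtomicInteger calls = new AtomicInteger();

    private BackendGate() {}

    /** Simulates the call handler invoking a Java function and waiting for it. */
    static void runInsideCallHandler(Runnable javaFunction) {
        boolean previous = handlerWaiting;
        handlerWaiting = true;
        try {
            javaFunction.run();
        } finally {
            handlerWaiting = previous;
        }
    }

    /** The only way out to the backend: synchronized and flag-checked. */
    static int executeQuery(String sql) {
        synchronized (BACKEND_LOCK) {
            if (!handlerWaiting) {
                throw new IllegalStateException(
                    "backend is not waiting for a call from the JVM");
            }
            return nativeExecute(sql);
        }
    }

    // Private, so no caller can bypass the synchronization above.
    // Stand-in for a native method; it just counts the calls it receives.
    private static int nativeExecute(String sql) {
        return calls.incrementAndGet();
    }
}
```

A thread that calls `executeQuery` while no call handler is waiting, for example a finalizer running on a garbage collection thread, gets an exception instead of entering the backend.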
Java makes frequent use of try/catch/finally blocks. PostgreSQL sometimes uses an exception mechanism that calls longjmp to transfer control to a known state. Left alone, such a jump bypasses the whole JVM and is impossible to catch, and of course no finally block is executed either.
The call handler saves the current state of the jump buffer (Warn_restart, for hackers) before every call into the JVM. Every call from the JVM into the backend that might result in a longjmp sets up its own local jump buffer. If a longjmp occurs, the jump is caught, remembered by raising a flag, and replaced with a Java exception that is thrown. From that point on, and until the JVM returns, the JVM is denied all access to the PostgreSQL backend code. Once the JVM returns (typically immediately, because of the exception), the flag is examined and the jump “continues” to its intended destination (the original state of the Warn_restart buffer).
This allows the JVM to trap all exceptions and to do normal catch/finally processing. The database can of course not be accessed during that processing, but other housekeeping can be done.
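The flag handling described above can be modelled in Java for illustration. The real mechanism is C code built around setjmp/longjmp; here a thrown exception stands in for the caught jump, and every name (`ErrorTrap`, `BackendError`, `callBackend`, `invoke`) is hypothetical.

```java
/** Runnable model of the error flag; not the actual PL/Java implementation. */
final class ErrorTrap {
    /** Hypothetical Java exception that stands in for a caught longjmp. */
    static final class BackendError extends RuntimeException {
        BackendError(String message) {
            super(message);
        }
    }

    // Raised when a backend error was caught and still has to be resumed.
    private static boolean errorPending = false;

    private ErrorTrap() {}

    /** Simulated call from the JVM into the backend. */
    static void callBackend(boolean simulateError) {
        if (errorPending) {
            throw new IllegalStateException(
                "backend access denied: an error is pending");
        }
        if (simulateError) {
            errorPending = true; // remember the jump
            throw new BackendError("simulated backend error");
        }
    }

    /**
     * Simulated call handler. Returns true when the original jump would now
     * be continued to the saved jump buffer.
     */
    static boolean invoke(Runnable javaFunction) {
        errorPending = false;
        try {
            javaFunction.run();
        } catch (RuntimeException e) {
            // An exception escaped the Java function; the flag decides what follows.
        }
        boolean resumeJump = errorPending;
        errorPending = false;
        return resumeJump;
    }
}
```

A Java function run through `invoke` can catch `BackendError` and run its finally blocks as usual, but any further `callBackend` inside those blocks is refused until the function has returned.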
Primitive types are always passed by value. So is the String type (this is a must, since Java uses 16-bit characters internally). Complex types, however, are often wrapped in Java objects and passed by reference. That is, a Java object contains a pointer to palloc’ed memory and uses native JNI calls to extract and manipulate the data. Such data becomes “stale” once a call has ended. Further attempts to access it will at best give very unpredictable results, but more likely cause a memory fault and a crash.
PL/Java contains code that ensures that stale pointers are cleared when the pointer is freed or when the MemoryContext that they were allocated in goes out of scope. The Java wrapper objects might live on, but any attempt to use them will result in a “closed native handle” exception.
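A minimal Java sketch of such a wrapper follows. It is an assumption-laden model, not PL/Java's actual classes: `ContextScope` and `Handle` are hypothetical names, the "pointer" is just a long value, and invalidation is triggered from Java rather than from native code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a memory context that owns native handles. */
final class ContextScope {
    /** Java wrapper around a (simulated) pointer to palloc'ed memory. */
    static final class Handle {
        private long pointer; // 0 means the handle is stale

        private Handle(long pointer) {
            this.pointer = pointer;
        }

        /** Returns the pointer, or fails if the memory is no longer valid. */
        long address() {
            if (pointer == 0) {
                throw new IllegalStateException("closed native handle");
            }
            return pointer;
        }

        private void invalidate() {
            pointer = 0;
        }
    }

    private final List<Handle> handles = new ArrayList<>();

    /** Wraps a pointer and remembers the wrapper so it can be cleared later. */
    Handle wrap(long pointer) {
        Handle handle = new Handle(pointer);
        handles.add(handle);
        return handle;
    }

    /** Simulates the context going out of scope: every handle becomes stale. */
    void close() {
        for (Handle handle : handles) {
            handle.invalidate();
        }
        handles.clear();
    }
}
```

The point of the design is that a wrapper which outlives its memory fails with a Java exception instead of dereferencing a dangling pointer.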
Java and JNI will use the following naming scheme when finding the shared library on a Unix box:
Prepend the name with “lib” and then append “.so”.
Find the resulting file using LD_LIBRARY_PATH.
On a Windows box, it works like this:
Append “.dll” (nothing is prepended) and then use PATH to find the resulting file.
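The naming rule can be written out as a small Java function. The class `LibraryNames` and its method are hypothetical and exist only to make the rule explicit; inside a running JVM the standard `System.mapLibraryName(String)` performs this mapping for the current platform.

```java
/** Hypothetical helper that spells out the Java/JNI library naming rule. */
final class LibraryNames {
    private LibraryNames() {}

    /**
     * Maps a library name to the file name the loader looks for.
     * Only the two cases named in the text are covered: Unix (".so") and Windows.
     */
    static String fileName(String libraryName, boolean windows) {
        if (windows) {
            return libraryName + ".dll";        // nothing is prepended
        }
        return "lib" + libraryName + ".so";     // "lib" prefix, ".so" suffix
    }
}
```

For the name “pljava” this gives “libpljava.so” on Unix and “pljava.dll” on Windows.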
PostgreSQL has a scheme of its own. Apparently it doesn’t prepend “lib” on a Unix box, and it uses Dynamic_library_path instead of LD_LIBRARY_PATH/PATH to find the module. Unfortunately, Dynamic_library_path is not seen by the system loader, so a module that needs to load other dynamic libraries will fail unless LD_LIBRARY_PATH/PATH is set correctly.
The PL/Java runtime merges Dynamic_library_path with LD_LIBRARY_PATH or PATH and uses the result in the JVM. That seems to work fine (the JVM will attempt to load the shared library too, and it will fail unless it finds the one that is already loaded). In addition, the deploy program that initializes the Java language in the database has an option, -windows, that allows different SQL syntax to be used on different platforms. On Unix, the module name used will be “libpljava”; on Windows, just “pljava”.
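One way to picture the merge is as an order-preserving union of two path lists. The sketch below is an assumption about what such a merge could look like, not the code PL/Java uses: `LibraryPath.merge` is a hypothetical helper, it drops empty entries and duplicates, and it does not expand any special tokens that a PostgreSQL path setting may contain.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical helper that merges two directory search paths. */
final class LibraryPath {
    private LibraryPath() {}

    /**
     * Concatenates the directories of both paths, first path first,
     * keeping only the first occurrence of each directory.
     * The separator is ":" on Unix and ";" on Windows.
     */
    static String merge(String dynamicLibraryPath, String loaderPath, String separator) {
        Set<String> directories = new LinkedHashSet<>();
        for (String path : new String[] { dynamicLibraryPath, loaderPath }) {
            if (path == null) {
                continue; // the environment variable may be unset
            }
            for (String directory : path.split(Pattern.quote(separator))) {
                if (!directory.isEmpty()) {
                    directories.add(directory);
                }
            }
        }
        return String.join(separator, directories);
    }
}
```

In a real setting the second argument would come from something like `System.getenv("LD_LIBRARY_PATH")` and the separator from `java.io.File.pathSeparator`.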
PostgreSQL could be improved so that:
a) It would merge Dynamic_library_path with the current LD_LIBRARY_PATH or PATH and change the environment used by the forked backend processes. That way, the system loader would function correctly and PostgreSQL would not need any specific code that prepends paths to module names, etc. The system loader needs to be functional anyway when a module depends on other dynamic libraries.
b) On Unix systems, the “lib” prefix should be used by default. Backward compatibility can be obtained by testing without “lib” if no dynamic library is found.