bpm is explicit about the interface that it provides to your job. If there is anything in this specification which is unclear or non-specific then please let us know so that we can be explicit about the interface and guarantees provided.
Your process is started and has an unlimited amount of time to start up. If present, a bpm pre-start script can run before your process' pre-start and start. Bpm's pre-start script must complete in within 30 seconds to avoid a monit timeout. A job's pre-start script is not bound by this timeout. You should use a post-start script and a health check if you want your job to only say it has completed deploying after it has started up. You do not need to manage any PID files yourself.
On shutdown your job will receive a
Starting in version
1.1.14, you may choose to have your job receive a
SIGINT instead (see shutdown_signal).
You then have 20 seconds to shutdown your application before it will be sent
SIGQUIT to dump the stack
(this is default behavior in the Go and Java runtimes) before being forcibly
If you require longer than this then you should use a drain script for your server. The drain script should put your server in such a state that it can shutdown within 15 seconds. It is acceptable and supported to terminate your process while running the drain script. However, if you do terminate the process then you should also delete the PID file.
Zombie Processes and Forwarding Signals¶
bpm will run an
init process which will start the process your configuration
init process will reap any zombie processes and forward signals
to your process.
Older versions of bpm did not do this and so we suggested that people use
bash-c to start their process which would reap zombie processes. Unfortunately
this would not forward signals. You can now remove this workaround.
Your process should write logs to standard output and standard error file
descriptors. This data will be written to
Any other files which are written to
/var/vcap/sys/log/JOB inside the
container will be written to
/var/vcap/sys/log/JOB in the host system.
bpm can enforce various resource limits on your processes. There are currently 3 different types: memory, open files, and processes.
If your process tries to allocate more memory that your configuration allows then it will be immediately killed by the OOM killer. This setting is more useful for agent jobs which do not use more memory under user load and do not want to affect the more important user-facing processes.
The open files setting sets a limit on the number of open files (including
network connections) which your job is allowed to have open at once. This is
equivalent to setting
ulimit -n for your process.
This setting places a limit on the number of PIDs which your process is allowed to create. This is to protect against fork-bombs and other resource exhaustion mistakes or attacks. Threads also count towards this limit as they are also given PIDs.
Applications may store temporary files in either
/var/vcap/data/JOB/tmp (as per the BOSH recommendation). Both paths,
/var/tmp are mapped onto the same host volume
/var/vcap/data/JOB/tmp) and changes made to one will instantly appear
in the other.
bpm will set the
TMPDIR environment variable when executing your job to one
of the paths listed above. This environment variable is respected by the
majority of in-use standard libraries used by Cloud Foundry.
Note: Per the BOSH team's guidance the temporary filesystem is not mounted as
You should store ephemeral data in
/var/vcap/data/JOB. There are no
guarantees that this data will be present in a different invocation of your
job. The lifecycle of this data is tied with the underlying BOSH stemcell.
This storage area is useful for cached data that can be re-created by other means when necessary.
This storage area is shared between all processes in your job. It is your responsibility to make sure that the data saved by each of your job's processes does not collide.
When the path
/var/vcap/store exists bpm will mount the path
/var/vcap/store/JOB into the job filesystem. As with
/var/vcap/data/JOB bpm will create the leaf directory if it does not
NOTE: Because persistent data access defaults to
/var/vcap/store/JOB, job name changes will cause persistent data to no longer be accessible. When changing the name of a job the BOSH
pre-startscript should idempotently move the persistent data from
Note: This feature has been added to enable support for older jobs which use the file system to communicate with other jobs. It is preferable in nearly all cases to use the network to communicate across job boundaries. Using the network reduces scheduling constraints such that a job can be un-collocated and allows more modern access control (key rotation, etc.) of the information being transferred between your jobs.
If your job configuration lists shared volumes (
additional_volumes: key) then these
paths will be created if they do not exist before being mounted into the job
filesystem. These volumes will be mounted read-write. All collocated jobs which
list a particular volume path will be given a volume they can all share.
Only volume paths inside the
/var/vcap/data, and (when present) the
/var/vcap/store directories are currently permitted. Specifying paths which
are not inside this directory will cause the job to fail to start.
There are various
monit quirks that bpm attempts to hide or smooth over.
If an operator runs
monit stop JOB && monit start JOB then
monit will wait
until the monitored process has completely stopped and the
listed in the configuration has completely finished before attempting to start
the job again. This is unsurprising behavior.
monit restart does not wait until the existing
job has stopped before starting the same job again. As soon as the existing job
has removed its PID file
monit forks and starts the new process. This
behavior is unexpected, subtle, and full of race conditions. A considerable
number of BOSH releases are unknowingly suffering from bugs caused by this.
bpm enforces its own locking around process operations to avoid these race
conditions. It is completely safe (from a correctness perspective, you may
still break your service) to run
monit restart on a job which uses bpm.
In some cases Monit reports the process as execution failed when the process is actually healthy. There is a known race condition that can occur in certain unavoidable circumstances. This same failure will also appear if bpm's pre-start script exceeds monit's timeout when executing. It is recommended to move time consuming logic from bpm's pre-start to the job's pre-start script.