Fix training issues with TensorFlow
Description
This PR fixes issues with the training process in DEEPaaS.
The current DEEPaaS runs the training in a separate child process so that it can be cancelled. This process is created using the multiprocessing module. It is well known that CUDA and multiprocessing do not work well together out of the box [1]. In addition, in the case of TensorFlow this is problematic even when using only CPUs [2].
The proposed fix changes the process start method from fork (the default on Linux) to spawn [3].
[1] https://pytorch.org/docs/stable/notes/multiprocessing.html#cuda-in-multiprocessing
[2] https://github.com/tensorflow/tensorflow/issues/5448#issuecomment-258934405
[3] https://docs.python.org/3.6/library/multiprocessing.html#contexts-and-start-methods
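For illustration, a minimal sketch (not the actual DEEPaaS code) of what the change amounts to, with a hypothetical train function standing in for the real training routine:

```python
# Run a cancellable training job in a child process created with the
# "spawn" start method, so CUDA/TensorFlow state is not inherited via fork.
import multiprocessing


def train(steps):
    # Placeholder for the real training routine; TensorFlow would be
    # imported and used inside the child process only.
    for _ in range(steps):
        pass
    return steps


if __name__ == "__main__":
    ctx = multiprocessing.get_context("spawn")  # instead of the Linux default "fork"
    p = ctx.Process(target=train, args=(100,))
    p.start()
    # Cancellation is still possible by terminating the child process:
    # p.terminate()
    p.join()
```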
Type of change
Bug fix (non-breaking change which fixes an issue)
How Has This Been Tested?
Tests of the training function (and its cancellation) have been performed using TensorFlow, both on CPU and GPU.