
unify train and predict pools

Ignacio Heredia Cachá requested to merge ignacio-br0 into master

This fixes GPU out-of-memory problems that occurred when we had two separate pools (one for predict, one for train). When running train then predict sequentially (or vice versa), each pool tried to claim the whole GPU, causing out-of-memory errors. This does not fix out-of-memory errors when running parallel tasks on the GPU (those errors also happened before this change).

CPU deployments shouldn't be affected.
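For context, here is a minimal sketch of the idea: a single pool shared by train and predict, so only one worker ever holds the GPU context at a time. This is an illustrative example, not the actual code in this MR; the function names (`run_train`, `run_predict`) and the `concurrent.futures` setup are assumptions.

```python
# Sketch: one shared pool for both train and predict, so a single worker
# process owns the GPU context at a time. Names are illustrative only.
from concurrent.futures import ProcessPoolExecutor

# Before: two separate pools, each worker grabbing the whole GPU.
# train_pool = ProcessPoolExecutor(max_workers=1)
# predict_pool = ProcessPoolExecutor(max_workers=1)

# After: a single pool shared by train and predict.
pool = ProcessPoolExecutor(max_workers=1)


def run_train(args):
    # placeholder for the package's actual training entry point
    return {"status": "trained", "args": args}


def run_predict(args):
    # placeholder for the package's actual prediction entry point
    return {"status": "predicted", "args": args}


def train(args):
    return pool.submit(run_train, args)


def predict(args):
    return pool.submit(run_predict, args)


if __name__ == "__main__":
    # Sequential train -> predict now reuses the same worker
    # (and therefore the same GPU context).
    print(train({"epochs": 1}).result())
    print(predict({"image": "cat.jpg"}).result())
```

Since both task types go through the same worker, sequential train/predict no longer allocates the GPU twice; parallel tasks would still contend for GPU memory, which matches the test results below.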

This has been tested with the image classification package on TF 1.14 with a GPU (GeForce GTX 1080). Summary of results:

  • predict then train: OK
  • train then predict: OK
  • train then train: OK
  • predict then predict: OK
  • predict in parallel (2 workers): Out of memory.
  • train in parallel (2 workers): Out of memory.

Additional tests on CPU:

  • warm: OK
  • predict in parallel (2 workers): OK
