@JulianSMoore
Hi Julian,
So whilst those installed perfectly, my model now crashes.
I created a new environment in Python and installed PerceptiLabs and CUDA using the following:
conda create -n myenv python=3.8
conda activate myenv
pip install perceptilabs
Then, as administrator:
call conda install -y -q -c conda-forge cudatoolkit=11.2.2
call conda install -y -q -c conda-forge cudnn=8.1.0.77
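To double-check that the toolkit and cuDNN actually landed in the env, the runtime libraries can be listed from the env prefix. This is just a rough sketch of my own (the `find_cuda_libs` helper and glob patterns are mine, assuming conda-forge's default layout, where the DLLs go under `<env>\Library\bin` on Windows or `lib` on Linux):

```python
# Sanity-check that CUDA/cuDNN runtime libraries exist in a conda env.
import glob
import os
import sys

def find_cuda_libs(prefix):
    """Return CUDA/cuDNN runtime libraries found under a conda env prefix."""
    patterns = ["cudart64*", "cudnn64*", "libcudart*", "libcudnn*"]
    hits = []
    for sub in ("Library/bin", "lib"):  # Windows vs Linux conda layouts
        for pat in patterns:
            hits.extend(glob.glob(os.path.join(prefix, sub, pat)))
    return sorted(hits)

# Run inside the activated env; an empty list means the libraries
# were not installed where TensorFlow would look for them.
print(find_cuda_libs(sys.prefix))
```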
Then I received the following error in PL when I tried to run a model.
Error during training!
Traceback (most recent call last):
File "perceptilabs\coreInterface.py", line 32, in perceptilabs.coreInterface.TrainingSessionInterface.run_stepwise
File "perceptilabs\coreInterface.py", line 33, in perceptilabs.coreInterface.TrainingSessionInterface.run_stepwise
File "perceptilabs\coreInterface.py", line 52, in _main_loop
File "perceptilabs\trainer\base.py", line 174, in run_stepwise
File "perceptilabs\trainer\base.py", line 282, in _loop_over_dataset
File "c:\users\james\anaconda3\envs\myenv2\lib\site-packages\tensorflow\python\eager\def_function.py", line 889, in __call__
result = self._call(*args, **kwds)
File "c:\users\james\anaconda3\envs\myenv2\lib\site-packages\tensorflow\python\eager\def_function.py", line 950, in _call
return self._stateless_fn(*args, **kwds)
File "c:\users\james\anaconda3\envs\myenv2\lib\site-packages\tensorflow\python\eager\function.py", line 3023, in __call__
return graph_function._call_flat(
File "c:\users\james\anaconda3\envs\myenv2\lib\site-packages\tensorflow\python\eager\function.py", line 1960, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "c:\users\james\anaconda3\envs\myenv2\lib\site-packages\tensorflow\python\eager\function.py", line 591, in call
outputs = execute.execute(
File "c:\users\james\anaconda3\envs\myenv2\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[64,3,256,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node training_model/deep_learning_conv__convolution_1_keras/batch_normalization/FusedBatchNormV3 (defined at <rendered-code: 1 [DeepLearningConv]>:29) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Identity_13/_6]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
(1) Resource exhausted: OOM when allocating tensor with shape[64,3,256,512] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node training_model/deep_learning_conv__convolution_1_keras/batch_normalization/FusedBatchNormV3 (defined at <rendered-code: 1 [DeepLearningConv]>:29) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
0 successful operations.
0 derived errors ignored. [Op:__inference__work_on_batch_3679]
Errors may have originated from an input operation.
Input Source operations connected to node training_model/deep_learning_conv__convolution_1_keras/batch_normalization/FusedBatchNormV3:
training_model/deep_learning_conv__convolution_1_keras/depthwise_conv2d/BiasAdd (defined at <rendered-code: 1 [DeepLearningConv]>:25)
Input Source operations connected to node training_model/deep_learning_conv__convolution_1_keras/batch_normalization/FusedBatchNormV3:
training_model/deep_learning_conv__convolution_1_keras/depthwise_conv2d/BiasAdd (defined at <rendered-code: 1 [DeepLearningConv]>:25)
Function call stack:
_work_on_batch -> _work_on_batch
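For context on why the GPU runs out of memory: the single tensor named in the OOM message is already sizeable. A rough back-of-envelope calculation (my own, assuming float32, which the message's "type float" suggests):

```python
# Memory footprint of the tensor in the OOM message:
# shape [64, 3, 256, 512], float32 (4 bytes per element).
batch, channels, height, width = 64, 3, 256, 512
num_elements = batch * channels * height * width
bytes_needed = num_elements * 4  # float32
print(num_elements)              # 25165824 elements
print(bytes_needed / 2**20)      # 96.0 MiB for this one activation alone
```

And that is just one activation; training holds many such tensors (plus gradients) at once. Halving the batch size (64 to 32) would halve every activation of this shape, so presumably that is the first thing to try.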