How to run DeepRacer locally on Mac

These instructions are adapted from https://github.com/crr0004/deepracer

Here are the revised instructions for macOS (steps that consist only of a command are meant to be run in the terminal):

  1. Change to a folder in terminal that is not case-sensitive. ~/ should be fine
  2. git clone --recurse-submodules https://github.com/crr0004/deepracer.git
  3. brew install minio/stable/minio (if Homebrew is not installed yet, install it first: /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)")
  4. install vncviewer from here https://www.realvnc.com/download/file/viewer.files/VNC-Viewer-6.19.325-MacOSX-x86_64.dmg
  5. cd rl_coach
  6. vim env.sh
  7. Replace $(hostname -i) with your IP address (in vim: i to edit; Esc, then :wq to save and quit). To find your address, run: ifconfig | grep -e 'inet [197][970]'
  8. Add a "g" before readlink so that it reads greadlink (greadlink is installed with coreutils in step 10)
  9. save and exit
  10. brew install coreutils
  11. The "source" command runs a shell script in the current shell, so the variables it exports stay set. "." is the portable equivalent and works on Mac.
  12. . ./env.sh
  13. minio server data
  14. Browse to http://127.0.0.1:9000 and use the credentials the minio command gave you to login
  15. Create a bucket called "bucket"
  16. Now edit the env.sh file again, this time replacing "minio" with the minio access key and "miniokey" with the access secret.
  17. Now you're all done setting up your fake s3 bucket/server
  18. Let's start the Sagemaker setup. Press Command-T to open a new terminal tab
  19. Go back to the repo root folder ("deepracer"): cd ..
  20. python3 -m venv sagemaker_venv
  21. This assumes you already have python3 installed. You probably need both pythons installed, 2 and 3.
  22. . sagemaker_venv/bin/activate
  23. pip install PyYAML==3.11
  24. pip install urllib3==1.21.1
  25. pip install -U sagemaker-python-sdk/ awscli ipython pandas
  26. docker pull crr0004/sagemaker-rl-tensorflow:console
  27. docker tag crr0004/sagemaker-rl-tensorflow:console 520713654638.dkr.ecr.us-east-1.amazonaws.com/sagemaker-rl-tensorflow:coach0.11-cpu-py3
  28. I'm also assuming you already have docker installed and logged in with a docker account
  29. mkdir -p ~/.sagemaker && cp config.yaml ~/.sagemaker
  30. cd rl_coach
  31. export LOCAL_ENV_VAR_JSON_PATH=$(greadlink -f ./env_vars.json)
  32. mkdir ~/robo
  33. mkdir ~/robo/container
  34. ipython rl_deepracer_coach_robomaker.py
  35. NOW SAGEMAKER LOCAL should be working
  36. Now for Robomaker
  37. Command T to open new terminal window
  38. cd ..
  39. . sagemaker_venv/bin/activate
  40. cd rl_coach
  41. . ./env.sh
  42. docker pull crr0004/deepracer_robomaker:console
  43. cd ..
  44. edit the robomaker.env file to also reference your local ip address and your aws key and secret
  45. docker run --rm --name dr --env-file ./robomaker.env --network sagemaker-local -p 8080:5900 -it crr0004/deepracer_robomaker:console
  46. Command Space, open vnc viewer, connect to 127.0.0.1:8080 to view Gazebo
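Step 7 asks for your machine's IP address in place of $(hostname -i). As an alternative to the ifconfig/grep one-liner, here is a minimal Python sketch that asks the OS which local address it would use for outbound traffic. The function name local_ip and the probe address 192.0.2.1 (a reserved documentation address) are my own choices, not part of the original instructions.

```python
import socket


def local_ip(probe_host="192.0.2.1", probe_port=80):
    """Return the local IPv4 address the OS would use to reach probe_host.

    Connecting a UDP socket only selects a route; no packet is sent.
    Falls back to 127.0.0.1 when no route is available (e.g. offline).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect((probe_host, probe_port))
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"
    finally:
        sock.close()


if __name__ == "__main__":
    print(local_ip())
```

If this prints 127.0.0.1, the machine probably has no active network route, and that address is unlikely to be reachable from inside the Docker containers.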

keisuke-umezawa commented Jul 27, 2019

When I cloned the repo into ~/dev/, I needed to do the following:

mkdir -p ~/dev/robo/container

iamharbie commented Aug 4, 2019

At step #34, I got the error below

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_problems
for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: /usr/lib/x86_64-linux-gnu/libcuda.so.1: file too short

daniel-cooper commented Aug 4, 2019

I'm running into a different problem on #34. It seems like s3_client is not a valid parameter when setting up the local Sagemaker session. I've been messing around with different ways to get sage_session to initialize properly, but haven't gotten much of anywhere.

Any tips?

(sagemaker_venv)  ~/workspace/deepracer/rl_coach   master ●  ipython rl_deepracer_coach_robomaker.py
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/workspace/deepracer/rl_coach/rl_deepracer_coach_robomaker.py in <module>
     27 endpoint_url=os.environ.get("S3_ENDPOINT_URL", "http://127.0.0.1:9000"))
     28
---> 29 sage_session = sagemaker.local.LocalSession(boto_session=boto_session, s3_client=s3Client)
     30 s3_bucket = os.environ.get("MODEL_S3_BUCKET", "bucket") #sage_session.default_bucket()
     31 s3_output_path = 's3://{}/'.format(s3_bucket) # SDK appends the job name and output folder

TypeError: __init__() got an unexpected keyword argument 's3_client'

kimwooglae commented Aug 5, 2019

I'm running into a different problem on #34. It seems like s3_client is not a valid parameter when setting up the local Sagemaker session. I've been messing around with different ways to get sage_session to initialize properly, but haven't gotten much of anywhere.

Any tips?

(sagemaker_venv)  ~/workspace/deepracer/rl_coach   master ●  ipython rl_deepracer_coach_robomaker.py
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/workspace/deepracer/rl_coach/rl_deepracer_coach_robomaker.py in <module>
     27 endpoint_url=os.environ.get("S3_ENDPOINT_URL", "http://127.0.0.1:9000"))
     28
---> 29 sage_session = sagemaker.local.LocalSession(boto_session=boto_session, s3_client=s3Client)
     30 s3_bucket = os.environ.get("MODEL_S3_BUCKET", "bucket") #sage_session.default_bucket()
     31 s3_output_path = 's3://{}/'.format(s3_bucket) # SDK appends the job name and output folder

TypeError: __init__() got an unexpected keyword argument 's3_client'

I think http://127.0.0.1:9000 causes the problem.
The minio server has to be reachable from inside the docker container.

When you start the minio server, it prints a list of endpoint URLs.
Try one of the other endpoint addresses.
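A note on the TypeError itself: it says that the LocalSession class that was imported does not accept an s3_client argument, which is a different problem from an unreachable endpoint. One possible cause (an assumption on my part, not confirmed in this thread) is that the venv contains a sagemaker package other than the one installed from the repo's sagemaker-python-sdk/ folder in step 25. The sketch below shows a generic way to check whether a class accepts a keyword before calling it. accepts_keyword and the two stand-in classes are hypothetical names used only for illustration.

```python
import inspect


def accepts_keyword(cls, name):
    """Return True if cls.__init__ accepts `name` as a keyword argument."""
    params = inspect.signature(cls.__init__).parameters
    if name in params:
        kind = params[name].kind
        return kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        )
    # A **kwargs catch-all also accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Stand-ins for illustration only; these are not the real sagemaker classes.
class StockSession:
    def __init__(self, boto_session=None):
        self.boto_session = boto_session


class PatchedSession:
    def __init__(self, boto_session=None, s3_client=None):
        self.boto_session = boto_session
        self.s3_client = s3_client


print(accepts_keyword(StockSession, "s3_client"))    # False
print(accepts_keyword(PatchedSession, "s3_client"))  # True
```

With the venv active, the same check could be run against the real class as accepts_keyword(sagemaker.local.LocalSession, "s3_client").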

kimwooglae commented Aug 5, 2019

At step #34, I got the error below

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_problems
for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: /usr/lib/x86_64-linux-gnu/libcuda.so.1: file too short

I modified instance_type and image_name in rl_deepracer_coach_robomaker.py.

# 'local' for cpu, 'local_gpu' for nvidia gpu (and then you don't have to set default runtime to nvidia)
instance_type = "local"


estimator = RLEstimator(entry_point="training_worker.py",
                        source_dir='src',
                        dependencies=["common/sagemaker_rl"],
                        toolkit=RLToolkit.COACH,
                        toolkit_version='0.11',
                        framework=RLFramework.TENSORFLOW,
                        sagemaker_session=sage_session,
                        #bypass sagemaker SDK validation of the role
                        role="aaa/",
                        train_instance_type=instance_type,
                        train_instance_count=1,
                        output_path=s3_output_path,
                        base_job_name=job_name_prefix,
                        image_name="crr0004/sagemaker-rl-tensorflow:console",

...

Smiffyk commented Aug 6, 2019

I'm having fierce issues with this. No idea where to go :(. Are there any videos or anything?

I'm getting the below error, Robomaker in docker.

And when I run the Roboracer I get:

S3 bucket: bucket
S3 prefix: rl-deepracer-sagemaker
Traceback (most recent call last):
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/s3_client.py", line 155, in download_file
    s3_client.download_file(self.bucket, s3_key, local_path)
  File "/usr/local/lib/python3.5/dist-packages/boto3/s3/inject.py", line 172, in download_file
    extra_args=ExtraArgs, callback=Callback)
  File "/usr/local/lib/python3.5/dist-packages/boto3/s3/transfer.py", line 307, in download_file
    future.result()
  File "/usr/local/lib/python3.5/dist-packages/s3transfer/futures.py", line 73, in result
    return self._coordinator.result()
  File "/usr/local/lib/python3.5/dist-packages/s3transfer/futures.py", line 233, in result
    raise self._exception
  File "/usr/local/lib/python3.5/dist-packages/s3transfer/tasks.py", line 255, in _main
    self._submit(transfer_future=transfer_future, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/s3transfer/download.py", line 353, in _submit
    **transfer_future.meta.call_args.extra_args
  File "/usr/local/lib/python3.5/dist-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.5/dist-packages/botocore/client.py", line 661, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/rollout_worker.py", line 303, in <module>
    main()
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/rollout_worker.py", line 239, in main
    load_model_metadata(s3_client, args.model_metadata_s3_key, model_metadata_local_path)
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/utils.py", line 134, in load_model_metadata
    local_path=model_metadata_local_path)
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/s3_client.py", line 164, in download_file
    *utils.build_user_error_dict(utils.SIMAPP_S3_DATA_STORE_EXCEPTION, utils.SIMAPP_EVENT_ERROR_CODE_401))
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/utils.py", line 58, in json_format_logger
    message = msg.format(args)
KeyError: "'Message'"
================================================================================REQUIRED process [agent-9] has died!
process has died [pid 1333, exit code 1, cmd /app/robomaker-deepracer/simulation_ws/install/deepracer_simulation/lib/deepracer_simulation/run_rollout_rl_agent.sh __name:=agent __log:=/root/.ros/log/b7b7e5d4-b837-11e9-9cb4-0242ac130002/agent-9.log].
log file: /root/.ros/log/b7b7e5d4-b837-11e9-9cb4-0242ac130002/agent-9
.log
Initiating shutdown!
================================================================================
[agent-9] killing on exit
[better_odom-8] killing on exit
[car_reset_node-7] killing on exit
[robot_state_publisher-6] killing on exit
[racecar/controller_manager-5] killing on exit
[INFO] [1565088599.510748, 10.252000]: Shutting down spawner. Stopping and unloading controllers...
[INFO] [1565088599.514290, 10.252000]: Stopping all controllers...

Has anyone had these issues?

iamharbie commented Aug 6, 2019

Make sure you have your reward.py and model_metadata.json in the custom_files directory in your bucket

SmiffyKMc commented Aug 6, 2019

@iamharbie where does the custom_files directory go? When following the steps above, nothing was mentioned about the custom_files? Thanks for your reply also :)

iamharbie commented Aug 6, 2019

I admit the tutorials are not beginner friendly. Had a lot of issues too.
To start with the default configs for your training, just copy the custom_files folder from ~/deepracer to ~/deepracer/rl_coach/data/bucket
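The copy described above can be sketched in Python as follows. This is a minimal sketch: copy_custom_files is a hypothetical helper name, it assumes custom_files contains only plain files (such as reward.py and model_metadata.json), and it places them under a custom_files folder inside the bucket directory.

```python
import shutil
from pathlib import Path


def copy_custom_files(repo_root, bucket_dir=None):
    """Copy the files in <repo_root>/custom_files into <bucket_dir>/custom_files.

    bucket_dir defaults to <repo_root>/rl_coach/data/bucket, the path
    mentioned in the comment above.
    """
    repo_root = Path(repo_root).expanduser()
    src = repo_root / "custom_files"
    if bucket_dir is None:
        bucket_dir = repo_root / "rl_coach" / "data" / "bucket"
    dest = Path(bucket_dir).expanduser() / "custom_files"
    dest.mkdir(parents=True, exist_ok=True)
    # Copy plain files only; re-running simply overwrites earlier copies.
    for item in src.iterdir():
        if item.is_file():
            shutil.copy2(item, dest / item.name)
    return dest
```

For the layout used in this thread the call would be copy_custom_files("~/deepracer").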

SmiffyKMc commented Aug 6, 2019

@iamharbie it is a bit all over the place, but still a great purpose :). Perfect! I'm going to try that when I get home! If I manage it, I'll get you a coffee xD. Do you use the Mac for training often? I hear people set it up and then leave it as it's slow?

iamharbie commented Aug 6, 2019

Yes, I currently use a Mac. Note that you'd have to change the sagemaker image to :console instead of Nvidia since Mac doesn't support GPU; that fixed the error I had above.

It is a bit slow and I just let it be. If I need to see what is going on, I restart my system so I can have enough memory to run the training.

I also just started training so I am still new

SmiffyKMc commented Aug 6, 2019

Oh very handy!! You've been a great help in clearing up some unknowns. I don't mind if it's taking its time as long as it finishes correctly. Nothing an overnight run can't do! Will try this when I make it home and will let you know, thanks for your help!

shugert commented Aug 7, 2019

When I run the last command I get the following message: docker: Error response from daemon: network sagemaker-local not found.
Any idea? Thanks

iamharbie commented Aug 7, 2019

That means step 34 either has not been run or did not succeed. The error is Docker-specific.
Step 34 is meant to pull a very large Docker image (~1GB), run it, and create a network called sagemaker-local. Usually it doesn't display any output until it has finished pulling the image.
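To confirm whether the sagemaker-local network exists before starting the Robomaker container, you can list Docker's networks. A minimal sketch, assuming the docker CLI is on your PATH; the two helper names are my own.

```python
import subprocess


def parse_network_names(ls_output):
    """Split the output of `docker network ls --format '{{.Name}}'` into names."""
    return [line.strip() for line in ls_output.splitlines() if line.strip()]


def docker_network_exists(name):
    """Return True if Docker reports a network with the given name."""
    result = subprocess.run(
        ["docker", "network", "ls", "--format", "{{.Name}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return name in parse_network_names(result.stdout)
```

Calling docker_network_exists("sagemaker-local") after step 34 should return True if the network was created. If it returns False, the docker run command in step 45 can be expected to fail with the error quoted above.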

iamharbie commented Aug 7, 2019

That reminds me, you've already pulled the image in step 26. If step 34 is still trying to pull an image when you run it, that indicates you're pulling the NVIDIA version of the image, which will not work in the end (I faced the same issue) since Mac doesn't support GPU. You'd need to edit the Python script.

Kindly read the instructions here https://github.com/kevinmarlis/deep-racer/blob/master/Mac-Local-Training-Installation.md as it is an updated version of this gist

shugert commented Aug 7, 2019

I made the changes from the link you shared and am still getting issues while running ipython rl_deepracer_coach_robomaker.py:

Looking for config file: /Users/SamuelNoriega/.sagemaker/config.yaml
Model checkpoints and other metadata will be stored at: s3://bucket/rl-deepracer-sagemaker
Uploading to s3://bucket/rl-deepracer-sagemaker
WARNING:sagemaker:Parameter `image_name` is specified, `toolkit`, `toolkit_version`, `framework` are going to be ignored when choosing the image.
s3.ServiceResource()
Using provided s3_client
INFO:sagemaker:Creating training-job with name: rl-deepracer-sagemaker
Starting training job
Using /Users/SamuelNoriega/Documents/Clientes/DeepRacer/robo/container for container temp files
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
~/Documents/Clientes/DeepRacer/deepracer/rl_coach/rl_deepracer_coach_robomaker.py in <module>
    130 
    131 
--> 132 estimator.fit(job_name=job_name, wait=False)

~/Documents/Clientes/DeepRacer/deepracer/sagemaker_venv/lib/python3.7/site-packages/sagemaker/estimator.py in fit(self, inputs, wait, logs, job_name)
    232         self._prepare_for_training(job_name=job_name)
    233 
--> 234         self.latest_training_job = _TrainingJob.start_new(self, inputs)
    235         if wait:
    236             self.latest_training_job.wait(logs=logs)

~/Documents/Clientes/DeepRacer/deepracer/sagemaker_venv/lib/python3.7/site-packages/sagemaker/estimator.py in start_new(cls, estimator, inputs)
    581             train_args['image'] = estimator.train_image()
    582 
--> 583         estimator.sagemaker_session.train(**train_args)
    584 
    585         return cls(estimator.sagemaker_session, estimator._current_job_name)

~/Documents/Clientes/DeepRacer/deepracer/sagemaker_venv/lib/python3.7/site-packages/sagemaker/session.py in train(self, input_mode, input_config, role, job_name, output_config, resource_config, vpc_config, hyperparameters, stop_condition, tags, metric_definitions, enable_network_isolation, image, algorithm_arn, encrypt_inter_container_traffic)
    326         LOGGER.info('Creating training-job with name: {}'.format(job_name))
    327         LOGGER.debug('train request: {}'.format(json.dumps(train_request, indent=4)))
--> 328         self.sagemaker_client.create_training_job(**train_request)
    329 
    330     def compile_model(self, input_model_config, output_model_config, role,

~/Documents/Clientes/DeepRacer/deepracer/sagemaker_venv/lib/python3.7/site-packages/sagemaker/local/local_session.py in create_training_job(self, TrainingJobName, AlgorithmSpecification, OutputDataConfig, ResourceConfig, InputDataConfig, **kwargs)
     76         hyperparameters = kwargs['HyperParameters'] if 'HyperParameters' in kwargs else {}
     77         print("Starting training job")
---> 78         training_job.start(InputDataConfig, OutputDataConfig, hyperparameters, TrainingJobName)
     79 
     80         LocalSagemakerClient._training_jobs[TrainingJobName] = training_job

~/Documents/Clientes/DeepRacer/deepracer/sagemaker_venv/lib/python3.7/site-packages/sagemaker/local/entities.py in start(self, input_data_config, output_data_config, hyperparameters, job_name)
     68         self.state = self._TRAINING
     69 
---> 70         self.model_artifacts = self.container.train(input_data_config, output_data_config, hyperparameters, job_name)
     71         self.end = datetime.datetime.now()
     72         self.state = self._COMPLETED

~/Documents/Clientes/DeepRacer/deepracer/sagemaker_venv/lib/python3.7/site-packages/sagemaker/local/image.py in train(self, input_data_config, output_data_config, hyperparameters, job_name)
     97         Returns (str): Location of the trained model.
     98         """
---> 99         self.container_root = self._create_tmp_folder()
    100         os.mkdir(os.path.join(self.container_root, 'output'))
    101         # create output/data folder since sagemaker-containers 2.0 expects it

~/Documents/Clientes/DeepRacer/deepracer/sagemaker_venv/lib/python3.7/site-packages/sagemaker/local/image.py in _create_tmp_folder(self)
    482         print("Using {} for container temp files".format(root_dir))
    483 
--> 484         working_dir = tempfile.mkdtemp(dir=root_dir)
    485 
    486         # Docker cannot mount Mac OS /var folder properly see

~/anaconda3/lib/python3.7/tempfile.py in mkdtemp(suffix, prefix, dir)
    364         file = _os.path.join(dir, prefix + name + suffix)
    365         try:
--> 366             _os.mkdir(file, 0o700)
    367         except FileExistsError:
    368             continue    # try again

FileNotFoundError: [Errno 2] No such file or directory: '/Users/SamuelNoriega/Documents/Clientes/DeepRacer/robo/container/tmp_rlz587w'

Any recommendations?

iamharbie commented Aug 7, 2019

Trying to make sense of the error messages.... does the dir in the last line exist?

shugert commented Aug 7, 2019

The folder does exist but the file doesn't. Does it get generated by the estimator fit?

decarvalhohenrique commented Aug 13, 2019

@shugert The temp file is only created at that moment, so it is the path leading to it, and hence the folder, that probably doesn't exist. Maybe you could check again? I ran into the same error. The variable root_dir is assigned as robo/container, so when the absolute path is resolved it doesn't point to ~.../deepracer/robo/container. In short, try creating the folder /Users/SamuelNoriega/Documents/Clientes/DeepRacer/robo/container/
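Both reports in this thread (a repo under ~/dev/ needing ~/dev/robo/container, and the path in the traceback above) point to the container folder being expected next to the repo folder rather than in ~/. A minimal sketch that creates it from the repo location; ensure_container_dir is a hypothetical helper name, and the "sibling of the repo" rule is inferred from those two reports, not from the SDK source.

```python
from pathlib import Path


def ensure_container_dir(repo_root):
    """Create <parent of repo_root>/robo/container and return its path."""
    repo_root = Path(repo_root).expanduser().resolve()
    container = repo_root.parent / "robo" / "container"
    # parents=True creates robo/ as well; exist_ok makes re-runs harmless.
    container.mkdir(parents=True, exist_ok=True)
    return container
```

For a repo cloned to ~/dev/deepracer, ensure_container_dir("~/dev/deepracer") would create ~/dev/robo/container.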

maxchen1220 commented Aug 15, 2019

I had the same issue at step #34

maxchen1220 commented Aug 15, 2019

I had the same issue with Step #34

TypeError                                 Traceback (most recent call last)
~/deepracer/rl_coach/rl_deepracer_coach_robomaker.py in <module>
     26 endpoint_url=os.environ.get("S3_ENDPOINT_URL", "http://127.0.0.1:9000"))
     27
---> 28 sage_session = sagemaker.local.LocalSession(boto_session=boto_session, s3_client=s3Client)
     29 s3_bucket = os.environ.get("MODEL_S3_BUCKET", "bucket") #sage_session.default_bucket()
     30 s3_output_path = 's3://{}/'.format(s3_bucket) # SDK appends the job name and output folder

TypeError: __init__() got an unexpected keyword argument 's3_client'

Not sure what to do. Can anyone suggest?

decarvalhohenrique commented Aug 17, 2019

process [agent-9] has died!

Were you able to fix this issue? I can't find a way around it.

naninuneno1703 commented Aug 22, 2019

I'm running into a different problem on #34. It seems like s3_client is not a valid parameter when setting up the local Sagemaker session. I've been messing around with different ways to get sage_session to initialize properly, but haven't gotten much of anywhere.
Any tips?

(sagemaker_venv)  ~/workspace/deepracer/rl_coach   master ●  ipython rl_deepracer_coach_robomaker.py
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/workspace/deepracer/rl_coach/rl_deepracer_coach_robomaker.py in <module>
     27 endpoint_url=os.environ.get("S3_ENDPOINT_URL", "http://127.0.0.1:9000"))
     28
---> 29 sage_session = sagemaker.local.LocalSession(boto_session=boto_session, s3_client=s3Client)
     30 s3_bucket = os.environ.get("MODEL_S3_BUCKET", "bucket") #sage_session.default_bucket()
     31 s3_output_path = 's3://{}/'.format(s3_bucket) # SDK appends the job name and output folder

TypeError: __init__() got an unexpected keyword argument 's3_client'

I think http://127.0.0.1:9000 causes the problem.
The minio server has to be reachable from inside the docker container.

When you start the minio server, it prints a list of endpoint URLs.
Try one of the other endpoint addresses.

kimwooglae, which endpoint URL has to be changed: the one for the minio server or the one in env.sh (step 6)? And which IP address should we use? Thanks in advance.

kenpeter commented Aug 25, 2019

I wonder why step 34 causes so many issues. Here is mine:

Failed to upload /var/folders/x9/q1g5m54n1s18sp9_k683h0_r0000gn/T/tmp48tp6xoj/source.tar.gz to bucket/rl-deepracer-sagemaker/source/sourcedir.tar.gz: An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The access key ID you provided does not exist in our records.
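An InvalidAccessKeyId error suggests that the access key being sent is not one the minio server knows. One thing worth ruling out is that env.sh still holds the "minio" / "miniokey" placeholders that step 16 says to replace. A minimal sketch that scans a shell file for such leftovers, assuming it uses plain `export NAME=value` lines; the variable names in the sample are made up for illustration and are not taken from the real env.sh.

```python
import re

PLACEHOLDERS = {"minio", "miniokey"}  # the values step 16 says to replace


def find_placeholder_exports(env_text):
    """Return names of `export NAME=value` lines whose value is still a placeholder."""
    leftovers = []
    for line in env_text.splitlines():
        match = re.match(r"\s*export\s+(\w+)=(.*)$", line)
        if not match:
            continue
        name = match.group(1)
        value = match.group(2).strip().strip("\"'")
        if value in PLACEHOLDERS:
            leftovers.append(name)
    return leftovers


# Hypothetical file contents, for illustration only.
sample = """
export EXAMPLE_ACCESS_KEY=minio
export EXAMPLE_SECRET_KEY="miniokey"
export EXAMPLE_BUCKET=bucket
"""
print(find_placeholder_exports(sample))  # ['EXAMPLE_ACCESS_KEY', 'EXAMPLE_SECRET_KEY']
```

An empty list only means no placeholder is left. It does not prove that the keys match the ones minio printed at startup, and env.sh has to be sourced again in every terminal after it is edited.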

vsay01 commented Aug 26, 2019

  1. Has anyone been able to resolve the error at step 34 that @SmiffyKMc mentioned above?

  2. In addition to that, I get the error below at step 45:

Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/rollout_worker.py", line 303, in <module>
    main()
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/rollout_worker.py", line 239, in main
    load_model_metadata(s3_client, args.model_metadata_s3_key, model_metadata_local_path)
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/utils.py", line 134, in load_model_metadata
    local_path=model_metadata_local_path)
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/s3_client.py", line 145, in download_file
    s3_client = self.get_client()
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/s3_client.py", line 32, in get_client
    return session.client('s3', region_name=self.aws_region, endpoint_url=s3_url)
  File "/usr/local/lib/python3.5/dist-packages/boto3/session.py", line 263, in client
    aws_session_token=aws_session_token, config=config)
  File "/usr/local/lib/python3.5/dist-packages/botocore/session.py", line 839, in create_client
    client_config=config, api_version=api_version)
  File "/usr/local/lib/python3.5/dist-packages/botocore/client.py", line 86, in create_client
    verify, credentials, scoped_config, client_config, endpoint_bridge)
  File "/usr/local/lib/python3.5/dist-packages/botocore/client.py", line 328, in _get_client_args
    verify, credentials, scoped_config, client_config, endpoint_bridge)
  File "/usr/local/lib/python3.5/dist-packages/botocore/args.py", line 85, in get_client_args
    client_cert=new_config.client_cert)
  File "/usr/local/lib/python3.5/dist-packages/botocore/endpoint.py", line 261, in create_endpoint
    raise ValueError("Invalid endpoint: %s" % endpoint_url)
ValueError: Invalid endpoint: 192.168.1.76

Note: 192.168.1.76 is my local IP; I changed S3_ENDPOINT_URL as described in step 44.

  3. Despite the error above, when I connect to 127.0.0.1:9000 in VNC Viewer, I get this program running; however, I don't know if this is the program we want:

[Screenshot: Screen Shot 2019-08-25 at 7 17 28 PM]

Any help would be appreciated. Thanks.

hemantpahil44 commented Aug 29, 2019

I was able to run using the above instructions earlier, but I have started getting errors. Below is the new error. Any idea what could be wrong? I suspect that the following command pulled the latest image, which caused the issue:
docker pull crr0004/deepracer_robomaker:console

## Creating agent - name: agent
2019-08-29 21:15:45.640320: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-08-29 21:15:45.885102: W tensorflow/core/framework/allocator.cc:113] Allocation of 23068672 exceeds 10% of system memory.
2019-08-29 21:15:45.943487: W tensorflow/core/framework/allocator.cc:113] Allocation of 23068672 exceeds 10% of system memory.
2019-08-29 21:15:46.010246: W tensorflow/core/framework/allocator.cc:113] Allocation of 23068672 exceeds 10% of system memory.
2019-08-29 21:15:46.054399: W tensorflow/core/framework/allocator.cc:113] Allocation of 23068672 exceeds 10% of system memory.
2019-08-29 21:15:46.120578: W tensorflow/core/framework/allocator.cc:113] Allocation of 23068672 exceeds 10% of system memory.
## Loading checkpoint: ./checkpoint/0_Step-0.ckpt
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/rollout_worker.py", line 303, in <module>
    main()
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/rollout_worker.py", line 298, in main
    memory_backend_params = memory_backend_params
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/rollout_worker.py", line 142, in rollout_worker
    graph_manager.create_graph(task_parameters)
  File "/usr/local/lib/python3.5/dist-packages/rl_coach/graph_managers/graph_manager.py", line 153, in create_graph
    self.create_session(task_parameters=task_parameters)
  File "/usr/local/lib/python3.5/dist-packages/rl_coach/graph_managers/graph_manager.py", line 265, in create_session
    self.restore_checkpoint()
  File "/usr/local/lib/python3.5/dist-packages/rl_coach/graph_managers/graph_manager.py", line 572, in restore_checkpoint
    self.checkpoint_saver.restore(self.sess, checkpoint.model_checkpoint_path)
  File "/usr/local/lib/python3.5/dist-packages/rl_coach/saver.py", line 118, in restore
    saver.restore(sess, self._full_path(restore_path, saver))
  File "/usr/local/lib/python3.5/dist-packages/rl_coach/architectures/tensorflow_components/savers.py", line 82, in restore
    sess.run(self._variable_update_ops, placeholder_dict)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 887, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1086, in _run
    str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (512, 6) for Tensor 'Placeholder_21:0', which has shape '(512, 10)'
================================================================================REQUIRED process [agent-9] has died!
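The final ValueError says that a saved value of shape (512, 6) is being loaded into a tensor of shape (512, 10). One plausible reading (my assumption, not confirmed in this thread) is that the checkpoint in the bucket was trained with 6 actions while the model now being built has 10, for example because model_metadata.json changed between runs. The sketch below counts the actions defined in such a file; the "action_space" key and the field names in the sample are assumptions about the file format.

```python
import json


def count_actions(metadata_text):
    """Count the entries in the "action_space" list of a model_metadata.json document."""
    metadata = json.loads(metadata_text)
    return len(metadata["action_space"])


# Hypothetical two-action file, for illustration only.
sample = (
    '{"action_space": ['
    '{"steering_angle": -30, "speed": 0.5, "index": 0}, '
    '{"steering_angle": 30, "speed": 0.5, "index": 1}'
    "]}"
)
print(count_actions(sample))  # 2
```

If the count for your own file does not match the second dimension of the checkpoint's shape, starting from an empty training prefix in the bucket, or restoring the earlier model_metadata.json, might be worth trying.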

vsay01 commented Sep 13, 2019

  1. Has anyone been able to resolve the error at step 34 that @SmiffyKMc mentioned above?
  2. In addition to that, I get the error below at step 45:

Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/rollout_worker.py", line 303, in <module>
    main()
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/rollout_worker.py", line 239, in main
    load_model_metadata(s3_client, args.model_metadata_s3_key, model_metadata_local_path)
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/utils.py", line 134, in load_model_metadata
    local_path=model_metadata_local_path)
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/s3_client.py", line 145, in download_file
    s3_client = self.get_client()
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/s3_client.py", line 32, in get_client
    return session.client('s3', region_name=self.aws_region, endpoint_url=s3_url)
  File "/usr/local/lib/python3.5/dist-packages/boto3/session.py", line 263, in client
    aws_session_token=aws_session_token, config=config)
  File "/usr/local/lib/python3.5/dist-packages/botocore/session.py", line 839, in create_client
    client_config=config, api_version=api_version)
  File "/usr/local/lib/python3.5/dist-packages/botocore/client.py", line 86, in create_client
    verify, credentials, scoped_config, client_config, endpoint_bridge)
  File "/usr/local/lib/python3.5/dist-packages/botocore/client.py", line 328, in _get_client_args
    verify, credentials, scoped_config, client_config, endpoint_bridge)
  File "/usr/local/lib/python3.5/dist-packages/botocore/args.py", line 85, in get_client_args
    client_cert=new_config.client_cert)
  File "/usr/local/lib/python3.5/dist-packages/botocore/endpoint.py", line 261, in create_endpoint
    raise ValueError("Invalid endpoint: %s" % endpoint_url)
ValueError: Invalid endpoint: 192.168.1.76

Note: 192.168.1.76 is my local IP; I changed S3_ENDPOINT_URL as mentioned in line 44.
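
One possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: the value that reached botocore is a bare IP address with no scheme and no port, whereas the env.sh from the instructions builds the endpoint in the form http://<ip>:9000. The sketch below uses only the Python standard library to flag those gaps. check_endpoint is a hypothetical helper written for this illustration and is not part of the DeepRacer code.

```python
from urllib.parse import urlsplit


def check_endpoint(url):
    """Return a list of problems with an S3 endpoint URL (empty list if none).

    Hypothetical helper for illustration; not part of the DeepRacer code.
    """
    parts = urlsplit(url)
    problems = []
    if parts.scheme not in ("http", "https"):
        problems.append("missing http:// or https:// scheme")
    if not parts.hostname:
        problems.append("no host name or IP address found")
    try:
        port = parts.port
    except ValueError:
        # urlsplit raises ValueError when the port is not a valid number
        port = None
    if port is None:
        problems.append("no valid port (minio in this guide listens on 9000)")
    return problems


# Compare the value from the error message with the form used in env.sh.
for candidate in ("192.168.1.76", "http://192.168.1.76:9000"):
    print(candidate, "->", check_endpoint(candidate) or "looks fine")
```

If the first value reports problems and the second does not, it may be worth re-checking how S3_ENDPOINT_URL was written in env.sh.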

  1. Despite the error above, when I connect to 127.0.0.1:9000 in VNC Viewer, I get this program running; however, I don't know if it is the program we want:
[Screenshot: Screen Shot 2019-08-25 at 7 17 28 PM]

Any help would be appreciated. Thanks.

This gist helped clarify and resolve my issue:

Same as these steps, except that on a Mac we need to use the CPU:

  • In ~/deepracer/rl_coach, open rl_deepracer_coach_robomaker.py in an editor.
    Make sure that your endpoint_url (line 27) is the URL of your minio server (e.g. 192.168.1.xxx:9000)
    For CPU training, make sure instance_type (line 92) is "local" and image_name (line 108) is "crr0004/sagemaker-rl-tensorflow:console"
    Save the file
  • ipython rl_deepracer_coach_robomaker.py

scumola commented Oct 14, 2019

When spawning the robomaker container, I'm getting this:

[INFO] [1571063966.818759, 3.258000]: Controller Spawner: Loaded controllers: left_rear_wheel_velocity_controller, right_rear_wheel_velocity_controller, left_front_wheel_velocity_controller, right_front_wheel_velocity_controller, left_steering_hinge_position_controller, right_steering_hinge_position_controller, joint_state_controller
[INFO] [1571063966.830892, 3.267000]: Started controllers: left_rear_wheel_velocity_controller, right_rear_wheel_velocity_controller, left_front_wheel_velocity_controller, right_front_wheel_velocity_controller, left_steering_hinge_position_controller, right_steering_hinge_position_controller, joint_state_controller
[ERROR] Unable to import the waypoints [Errno 2] No such file or directory: '/app/robomaker-deepracer/simulation_ws/install/deepracer_simulation/share/deepracer_simulation/routes/Mexico_track.npy'
[car_reset_node-7] process has died [pid 1326, exit code 1, cmd /app/robomaker-deepracer/simulation_ws/install/deepracer_simulation/lib/deepracer_simulation/car_node.py __name:=car_reset_node __log:=/root/.ros/log/67a95d68-ee90-11e9-979d-0242ac130003/car_reset_node-7.log].
log file: /root/.ros/log/67a95d68-ee90-11e9-979d-0242ac130003/car_reset_node-7*.log

I can see the ROS GUI in VNC with a car (that drives forward), but no track.

[Screenshot: Fullscreen_10_14_19__8_40_AM]
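
The log above says the waypoint file for the requested track is missing from the routes directory inside the container. One way to narrow this down, sketched here under assumptions, is to list which .npy files are actually present and compare them with the track name in use. The directory path is copied from the error message. available_tracks and has_track are hypothetical helpers written for this illustration, and the script would need to run inside the robomaker container, where that path exists.

```python
from pathlib import Path

# Directory taken from the error message above; it exists only inside the container.
ROUTES_DIR = (
    "/app/robomaker-deepracer/simulation_ws/install/deepracer_simulation"
    "/share/deepracer_simulation/routes"
)


def available_tracks(routes_dir=ROUTES_DIR):
    """Return the sorted names of the .npy files in routes_dir, without extension."""
    root = Path(routes_dir)
    if not root.is_dir():
        # Outside the container the directory will not exist.
        return []
    return sorted(p.stem for p in root.glob("*.npy"))


def has_track(name, routes_dir=ROUTES_DIR):
    """Return True if a waypoint file called <name>.npy is present."""
    return name in available_tracks(routes_dir)


print("Tracks with waypoint files:", available_tracks())
print("Mexico_track present:", has_track("Mexico_track"))
```

If Mexico_track is not in the list, choosing one of the tracks that is listed may be a workaround; that is a guess, not something confirmed in this thread.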

daj commented Nov 11, 2019

For step #7, it might be better to update the env.sh script so it copes if your IP address changes.

Before:

export S3_ENDPOINT_URL=http://$(hostname -i):9000

After:

IPADDR=$(ifconfig|grep -e 'inet [197][970]' | awk '{print $2}')
export S3_ENDPOINT_URL=http://$IPADDR:9000

I forked your Gist and made the change above and a couple of others too: https://gist.github.com/daj/ae0ab3853e2dffe2f9edced2327f6ee1
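
One caveat with the grep pattern above: if more than one interface has an address starting with 10, 17 or 19, it matches several lines and IPADDR ends up holding more than one address. As a rough sketch of the same idea that keeps only the first match, here is a Python version. first_matching_ip is a hypothetical helper, and the sample text is invented to resemble macOS ifconfig output; it was not captured from a real machine.

```python
import re

# Same prefix test as the grep pattern: "inet " followed by one of 1/9/7,
# then one of 9/7/0. The capture group plays the role of awk's $2.
INET_PATTERN = re.compile(r"inet ([197][970]\S*)")


def first_matching_ip(ifconfig_output):
    """Return the first address matched by the pattern, or None if there is none."""
    for line in ifconfig_output.splitlines():
        match = INET_PATTERN.search(line)
        if match:
            return match.group(1)
    return None


# Invented sample for illustration only.
SAMPLE = (
    "lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384\n"
    "\tinet 127.0.0.1 netmask 0xff000000\n"
    "en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500\n"
    "\tinet 192.168.1.76 netmask 0xffffff00 broadcast 192.168.1.255\n"
    "en1: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500\n"
    "\tinet 10.0.0.5 netmask 0xff000000 broadcast 10.255.255.255\n"
)

ip = first_matching_ip(SAMPLE)
print("http://{}:9000".format(ip))  # prints http://192.168.1.76:9000
```

The loopback address is skipped because 127 does not fit the pattern, and the second interface is ignored because the function returns on the first match.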

ebisbe commented Dec 2, 2019

@scumola I'm having the same error. I see that the link does not exist. Have you managed to solve it?

[Errno 2] No such file or directory: '/app/robomaker-deepracer/simulation_ws/install/deepracer_simulation/share/deepracer_simulation/routes/Mexico_track.npy'

lvthillo commented Dec 10, 2019

@scumola @ebisbe I have the same issue. I've a driving car but no track. Any solution yet?
