
@joezen777
Last active April 15, 2023 21:46
How to run DeepRacer locally on Mac

These instructions are adapted from https://github.com/crr0004/deepracer

Here are the revised instructions for OSX:

  1. In Terminal, change to a folder that is not case-sensitive; ~/ should be fine.
  2. git clone --recurse-submodules https://github.com/crr0004/deepracer.git
  3. brew install minio/stable/minio (you may need to install Homebrew first: /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)")
  4. install vncviewer from here https://www.realvnc.com/download/file/viewer.files/VNC-Viewer-6.19.325-MacOSX-x86_64.dmg
  5. cd deepracer/rl_coach
  6. vim env.sh
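  7. Replace $(hostname -i) with your IP address, since the macOS hostname command has no -i option. (In vim, press i to edit; press Esc and type :wq to save and quit.) To find your IP address: ifconfig | grep -e 'inet [197][970]'
  8. Add a "g" before readlink so that it reads greadlink (GNU readlink, which coreutils installs in step 10).
  9. Save and exit.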
  10. brew install coreutils
  11. The "source" command runs a script in the current shell, so the variables it exports stay set. On a Mac you can use "." instead of "source".
  12. . ./env.sh
  13. minio server data
  14. Browse to http://127.0.0.1:9000 and log in with the credentials the minio command printed.
  15. Create a bucket called "bucket"
  16. Now edit the env.sh file again, this time replacing "minio" with the minio access key and "miniokey" with the access secret.
  17. Now you're done setting up your fake S3 bucket/server.
  18. Next, set up SageMaker: press Command-T to open a new Terminal tab.
  19. Go back to the "deepracer" repo root folder: cd ..
  20. python3 -m venv sagemaker_venv
  21. This assumes you already have python3 installed. You probably need both Python 2 and Python 3 installed.
  22. . sagemaker_venv/bin/activate
  23. pip install PyYAML==3.11
  24. pip install urllib3==1.21.1
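  25. pip install -U sagemaker-python-sdk/ awscli ipython pandas (the trailing slash makes pip install the repo's sagemaker-python-sdk folder, not the sagemaker package from PyPI)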
  26. docker pull crr0004/sagemaker-rl-tensorflow:console
  27. docker tag crr0004/sagemaker-rl-tensorflow:console 520713654638.dkr.ecr.us-east-1.amazonaws.com/sagemaker-rl-tensorflow:coach0.11-cpu-py3
  28. I'm also assuming you already have Docker installed and are logged in with a Docker account.
  29. mkdir -p ~/.sagemaker && cp config.yaml ~/.sagemaker
  30. cd rl_coach
  31. export LOCAL_ENV_VAR_JSON_PATH=$(greadlink -f ./env_vars.json)
  32. mkdir ~/robo
  33. mkdir ~/robo/container
  34. ipython rl_deepracer_coach_robomaker.py
  35. SageMaker local mode should now be working.
  36. Now for RoboMaker:
  37. Press Command-T to open a new Terminal tab.
  38. cd ..
  39. . sagemaker_venv/bin/activate
  40. cd rl_coach
  41. . ./env.sh
  42. docker pull crr0004/deepracer_robomaker:console
  43. cd ..
  44. Edit the robomaker.env file so that it also references your local IP address and your AWS key and secret.
  45. docker run --rm --name dr --env-file ./robomaker.env --network sagemaker-local -p 8080:5900 -it crr0004/deepracer_robomaker:console
  46. Press Command-Space, open VNC Viewer, and connect to 127.0.0.1:8080 to view Gazebo.
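Steps 7 and 8 are two plain text substitutions in env.sh. The Python sketch below applies them, assuming env.sh contains the literal text $(hostname -i) and calls readlink, as those steps describe. patch_env_sh and patch_env_file are hypothetical helper names, and you still supply your own IP address.

```python
import re
from pathlib import Path


def patch_env_sh(text: str, ip: str) -> str:
    """Return env.sh contents with the macOS edits from steps 7 and 8 applied."""
    # Step 7: macOS `hostname` has no -i option, so hard-code the LAN IP instead.
    text = text.replace("$(hostname -i)", ip)
    # Step 8: Homebrew's coreutils installs GNU readlink as `greadlink`.
    # \b skips words that only contain "readlink" (such as an existing
    # "greadlink"), so running the patch twice changes nothing.
    return re.sub(r"\breadlink\b", "greadlink", text)


def patch_env_file(path: str, ip: str) -> None:
    """Rewrite env.sh in place; copy the file first if you want a backup."""
    env = Path(path)
    env.write_text(patch_env_sh(env.read_text(), ip))
```

For example, patch_env_file("rl_coach/env.sh", "192.168.1.10") from the repo root, with your own address in place of the example one.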
@iamharbie

Make sure you have your reward.py and model_metadata.json in the custom_files directory in your bucket.

@SmiffyKMc

@iamharbie where does the custom_files directory go? The steps above don't mention custom_files. Thanks for your reply, too :)

@iamharbie

iamharbie commented Aug 6, 2019

I admit the tutorials are not beginner-friendly. I had a lot of issues too.
To start with the default configs for your training, just copy the custom_files folder in ~/deepracer to ~/deepracer/rl_coach/data/bucket.
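A sketch of that copy in Python, assuming the repo lives at ~/deepracer, MinIO was started from rl_coach with minio server data (step 13), and MinIO runs in its older filesystem mode, where folders under data/bucket show up as objects in the bucket. Newer MinIO releases may not expose files copied into the data folder this way, so uploading through the web console from step 14 is the safer fallback. stage_custom_files is a hypothetical helper; it also checks for the two files mentioned above.

```python
import shutil
from pathlib import Path

# reward.py and model_metadata.json are the files named in the comment above.
REQUIRED_FILES = ("reward.py", "model_metadata.json")


def stage_custom_files(repo_root: str) -> Path:
    """Copy <repo_root>/custom_files into the folder MinIO serves as "bucket".

    Assumes MinIO was started from rl_coach with `minio server data`, so the
    bucket is the directory rl_coach/data/bucket.
    """
    root = Path(repo_root).expanduser()
    src = root / "custom_files"
    dest = root / "rl_coach" / "data" / "bucket" / "custom_files"
    missing = [name for name in REQUIRED_FILES if not (src / name).is_file()]
    if missing:
        raise FileNotFoundError("missing in {}: {}".format(src, ", ".join(missing)))
    dest.mkdir(parents=True, exist_ok=True)
    # Copy top-level files only; re-running overwrites the earlier copies.
    for item in src.iterdir():
        if item.is_file():
            shutil.copy2(item, dest / item.name)
    return dest
```

Usage: stage_custom_files("~/deepracer"), then refresh the bucket in the MinIO browser to confirm the files appear.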

@SmiffyKMc

@iamharbie it is a bit all over the place, but it still serves a great purpose :). Perfect! I'm going to try that when I get home! If I manage it, I'll buy you a coffee xD. Do you use the Mac for training often? I hear people set it up and then leave it because it's slow?

@iamharbie

Yes, I currently use a Mac. Note that you'd have to change the SageMaker image to :console instead of the NVIDIA one, since Mac doesn't support GPU; that fixed the error I had above.

It is a bit slow and I just let it be. If I need to see what is going on, I restart my system so I can have enough memory to run the training.

I also just started training, so I am still new.

@SmiffyKMc

Oh, very handy!! You've been a great help in clearing up some unknowns. I don't mind if it takes its time, as long as it finishes correctly. Nothing an overnight run can't handle! I'll try this when I get home and let you know. Thanks for your help!

@shugert

shugert commented Aug 7, 2019

When I run the last command I get the following message: docker: Error response from daemon: network sagemaker-local not found.
Any idea? Thanks

@iamharbie

That means line 34 either isn't done or wasn't successful; the error comes from Docker.
Line 34 pulls a very large Docker image (~1GB), runs it, and creates a network called sagemaker-local. It usually doesn't display any output until it has finished pulling the image.
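To confirm that line 34 has got that far before running line 45, you can look for the network with docker network ls. The sketch below wraps that check in Python, assuming the docker CLI is on your PATH; network_names and sagemaker_network_ready are hypothetical helper names.

```python
import subprocess


def network_names(ls_output: str) -> set:
    """Parse the output of `docker network ls --format '{{.Name}}'` into a set."""
    return {line.strip() for line in ls_output.splitlines() if line.strip()}


def sagemaker_network_ready(name: str = "sagemaker-local") -> bool:
    """True once Docker reports a network with this name (requires Docker)."""
    result = subprocess.run(
        ["docker", "network", "ls", "--format", "{{.Name}}"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return name in network_names(result.stdout)
```

If sagemaker_network_ready() stays False for a long time, line 34 is most likely still pulling an image.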

@iamharbie

That reminds me: you already pulled the image in line 26. If line 34 is still pulling an image when you run it, that indicates you're pulling the NVIDIA version of the image, which won't work in the end since Mac doesn't support GPU (I faced the same issue). You'd need to edit the Python script.

Please read the instructions at https://github.com/kevinmarlis/deep-racer/blob/master/Mac-Local-Training-Installation.md, which are an updated version of this gist.

@shugert

shugert commented Aug 7, 2019

I made the changes from the link you share and still getting issues while running ipython rl_deepracer_coach_robomaker.py:

Looking for config file: /Users/SamuelNoriega/.sagemaker/config.yaml
Model checkpoints and other metadata will be stored at: s3://bucket/rl-deepracer-sagemaker
Uploading to s3://bucket/rl-deepracer-sagemaker
WARNING:sagemaker:Parameter `image_name` is specified, `toolkit`, `toolkit_version`, `framework` are going to be ignored when choosing the image.
s3.ServiceResource()
Using provided s3_client
INFO:sagemaker:Creating training-job with name: rl-deepracer-sagemaker
Starting training job
Using /Users/SamuelNoriega/Documents/Clientes/DeepRacer/robo/container for container temp files
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
~/Documents/Clientes/DeepRacer/deepracer/rl_coach/rl_deepracer_coach_robomaker.py in <module>
    130 
    131 
--> 132 estimator.fit(job_name=job_name, wait=False)

~/Documents/Clientes/DeepRacer/deepracer/sagemaker_venv/lib/python3.7/site-packages/sagemaker/estimator.py in fit(self, inputs, wait, logs, job_name)
    232         self._prepare_for_training(job_name=job_name)
    233 
--> 234         self.latest_training_job = _TrainingJob.start_new(self, inputs)
    235         if wait:
    236             self.latest_training_job.wait(logs=logs)

~/Documents/Clientes/DeepRacer/deepracer/sagemaker_venv/lib/python3.7/site-packages/sagemaker/estimator.py in start_new(cls, estimator, inputs)
    581             train_args['image'] = estimator.train_image()
    582 
--> 583         estimator.sagemaker_session.train(**train_args)
    584 
    585         return cls(estimator.sagemaker_session, estimator._current_job_name)

~/Documents/Clientes/DeepRacer/deepracer/sagemaker_venv/lib/python3.7/site-packages/sagemaker/session.py in train(self, input_mode, input_config, role, job_name, output_config, resource_config, vpc_config, hyperparameters, stop_condition, tags, metric_definitions, enable_network_isolation, image, algorithm_arn, encrypt_inter_container_traffic)
    326         LOGGER.info('Creating training-job with name: {}'.format(job_name))
    327         LOGGER.debug('train request: {}'.format(json.dumps(train_request, indent=4)))
--> 328         self.sagemaker_client.create_training_job(**train_request)
    329 
    330     def compile_model(self, input_model_config, output_model_config, role,

~/Documents/Clientes/DeepRacer/deepracer/sagemaker_venv/lib/python3.7/site-packages/sagemaker/local/local_session.py in create_training_job(self, TrainingJobName, AlgorithmSpecification, OutputDataConfig, ResourceConfig, InputDataConfig, **kwargs)
     76         hyperparameters = kwargs['HyperParameters'] if 'HyperParameters' in kwargs else {}
     77         print("Starting training job")
---> 78         training_job.start(InputDataConfig, OutputDataConfig, hyperparameters, TrainingJobName)
     79 
     80         LocalSagemakerClient._training_jobs[TrainingJobName] = training_job

~/Documents/Clientes/DeepRacer/deepracer/sagemaker_venv/lib/python3.7/site-packages/sagemaker/local/entities.py in start(self, input_data_config, output_data_config, hyperparameters, job_name)
     68         self.state = self._TRAINING
     69 
---> 70         self.model_artifacts = self.container.train(input_data_config, output_data_config, hyperparameters, job_name)
     71         self.end = datetime.datetime.now()
     72         self.state = self._COMPLETED

~/Documents/Clientes/DeepRacer/deepracer/sagemaker_venv/lib/python3.7/site-packages/sagemaker/local/image.py in train(self, input_data_config, output_data_config, hyperparameters, job_name)
     97         Returns (str): Location of the trained model.
     98         """
---> 99         self.container_root = self._create_tmp_folder()
    100         os.mkdir(os.path.join(self.container_root, 'output'))
    101         # create output/data folder since sagemaker-containers 2.0 expects it

~/Documents/Clientes/DeepRacer/deepracer/sagemaker_venv/lib/python3.7/site-packages/sagemaker/local/image.py in _create_tmp_folder(self)
    482         print("Using {} for container temp files".format(root_dir))
    483 
--> 484         working_dir = tempfile.mkdtemp(dir=root_dir)
    485 
    486         # Docker cannot mount Mac OS /var folder properly see

~/anaconda3/lib/python3.7/tempfile.py in mkdtemp(suffix, prefix, dir)
    364         file = _os.path.join(dir, prefix + name + suffix)
    365         try:
--> 366             _os.mkdir(file, 0o700)
    367         except FileExistsError:
    368             continue    # try again

FileNotFoundError: [Errno 2] No such file or directory: '/Users/SamuelNoriega/Documents/Clientes/DeepRacer/robo/container/tmp_rlz587w'

Any recommendations?

@iamharbie

Trying to make sense of the error messages: does the directory in the last line exist?

@shugert

shugert commented Aug 7, 2019

The folder does exist, but the file doesn't. Does it get generated by estimator.fit?

@decarv

decarv commented Aug 13, 2019

@shugert The file is created at that moment, so it's the file path, and hence the folder, that probably doesn't exist. Maybe you could check again? I ran into the same error. The variable root_dir is set to robo/container, so when it is turned into an absolute path it doesn't point to ~.../deepracer/robo/container. In short, try creating the folders /Users/SamuelNoriega/Documents/Clientes/DeepRacer/robo/container/
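A sketch of that fix in Python. It assumes root_dir may be a relative path (presumably set in the config.yaml that step 29 copies to ~/.sagemaker); a relative path is resolved against the current working directory, so creating the folder from the same place you run ipython, or using an absolute path, avoids the mismatch. ensure_container_root is a hypothetical helper.

```python
import os


def ensure_container_root(root_dir: str) -> str:
    """Create root_dir (and its parents) if needed and return its absolute path.

    A relative root_dir is resolved against the current working directory,
    so "robo/container" means different folders depending on where you run
    ipython from.
    """
    path = os.path.abspath(os.path.expanduser(root_dir))
    os.makedirs(path, exist_ok=True)
    return path
```

Run it from rl_coach (the folder you start ipython in) with the same root_dir value SageMaker reports in its "Using ... for container temp files" line.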

@maxchen1220

I had the same issue at step #34

@maxchen1220

maxchen1220 commented Aug 15, 2019

I had the same issue with Step #34

TypeError                                 Traceback (most recent call last)
~/deepracer/rl_coach/rl_deepracer_coach_robomaker.py in <module>
     26 endpoint_url=os.environ.get("S3_ENDPOINT_URL", "http://127.0.0.1:9000"))
     27
---> 28 sage_session = sagemaker.local.LocalSession(boto_session=boto_session, s3_client=s3Client)
     29 s3_bucket = os.environ.get("MODEL_S3_BUCKET", "bucket") #sage_session.default_bucket()
     30 s3_output_path = 's3://{}/'.format(s3_bucket) # SDK appends the job name and output folder

TypeError: __init__() got an unexpected keyword argument 's3_client'

Not sure what to do. Can anyone suggest?
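This TypeError means the LocalSession class in the sagemaker package imported inside sagemaker_venv does not take an s3_client argument. One plausible cause, not confirmed in this thread, is that the stock PyPI sagemaker package got installed instead of the repo's sagemaker-python-sdk folder from step 25. The standard-library sketch below checks whether a callable accepts a given keyword; accepts_keyword is a hypothetical helper, and the commented lines show how you might point it at LocalSession.

```python
import inspect


def accepts_keyword(func, name: str) -> bool:
    """True if func can be called with name=... as a keyword argument.

    A **kwargs parameter counts as accepting it, even though code that
    forwards those kwargs elsewhere may still reject the name later.
    """
    for param in inspect.signature(func).parameters.values():
        if param.kind == inspect.Parameter.VAR_KEYWORD:
            return True
        if param.name == name and param.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return False


# Run inside the activated sagemaker_venv:
#   import sagemaker, sagemaker.local
#   print(sagemaker.__file__)  # which copy of the SDK ipython actually imports
#   print(accepts_keyword(sagemaker.local.LocalSession.__init__, "s3_client"))
```

If sagemaker.__file__ points into site-packages rather than the repo's sagemaker-python-sdk folder, re-running step 25 from the repo root is a reasonable next thing to try.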

@decarv

decarv commented Aug 17, 2019

process [agent-9] has died!

Were you able to fix this issue? I can't find a way around it.

@naninuneno1703

I'm running into a different problem on #34. It seems like s3_client is not a valid parameter when setting up the local Sagemaker session. I've been messing around with different ways to get sage_session to initialize properly, but haven't gotten much of anywhere.
Any tips?

(sagemaker_venv)  ~/workspace/deepracer/rl_coach   master ●  ipython rl_deepracer_coach_robomaker.py
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~/workspace/deepracer/rl_coach/rl_deepracer_coach_robomaker.py in <module>
     27 endpoint_url=os.environ.get("S3_ENDPOINT_URL", "http://127.0.0.1:9000"))
     28
---> 29 sage_session = sagemaker.local.LocalSession(boto_session=boto_session, s3_client=s3Client)
     30 s3_bucket = os.environ.get("MODEL_S3_BUCKET", "bucket") #sage_session.default_bucket()
     31 s3_output_path = 's3://{}/'.format(s3_bucket) # SDK appends the job name and output folder

TypeError: __init__() got an unexpected keyword argument 's3_client'

I think http://127.0.0.1:9000 causes the problem.
The MinIO server needs to be accessible from inside the Docker container.

When you start the MinIO server, you can see the list of endpoint URLs.
Try another endpoint address.

kimwooglae, which endpoint URL must be changed: the MinIO server's, or the one in env.sh (step 6)? And which IP address should we use? Thanks in advance.
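On the question of which address to use: inside a Docker container, 127.0.0.1 refers to the container itself, not your Mac, so an S3_ENDPOINT_URL pointing at loopback cannot reach MinIO from inside the containers. That is presumably why step 7 uses the Mac's LAN IP. A small standard-library sketch, with a hypothetical helper name, that flags loopback endpoints:

```python
import ipaddress
from urllib.parse import urlsplit


def endpoint_is_loopback(endpoint_url: str) -> bool:
    """True if the URL's host is localhost or a loopback address like 127.0.0.1."""
    host = urlsplit(endpoint_url).hostname or ""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:  # not an IP literal, e.g. a hostname or an empty string
        return False
```

For example, endpoint_is_loopback(os.environ.get("S3_ENDPOINT_URL", "")) after sourcing env.sh; a True result means the containers will not find MinIO at that address.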

@kenpeter

kenpeter commented Aug 25, 2019

I wonder why line 34 has so many issues. Here is mine:

Failed to upload /var/folders/x9/q1g5m54n1s18sp9_k683h0_r0000gn/T/tmp48tp6xoj/source.tar.gz to bucket/rl-deepracer-sagemaker/source/sourcedir.tar.gz: An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The access key ID you provided does not exist in our records.
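InvalidAccessKeyId means the S3 endpoint (here, presumably MinIO) did not recognize the access key boto3 sent. The sketch below assumes env.sh exports the keys as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, the standard variables boto3 reads, with the "minio"/"miniokey" placeholders mentioned in step 16; check your own env.sh for the real names. It also catches a Terminal tab where env.sh was never sourced, since variables exported in one tab are not visible in another. credential_problems is a hypothetical helper.

```python
import os

# Assumed variable names and the placeholder values from step 16; check env.sh.
PLACEHOLDERS = {"AWS_ACCESS_KEY_ID": "minio", "AWS_SECRET_ACCESS_KEY": "miniokey"}


def credential_problems(env=None) -> list:
    """Describe missing or placeholder S3 credentials in env (default: os.environ)."""
    env = os.environ if env is None else env
    problems = []
    for var, placeholder in PLACEHOLDERS.items():
        value = env.get(var)
        if not value:
            problems.append("{} is not set in this shell".format(var))
        elif value == placeholder:
            problems.append("{} still has the placeholder value {!r}".format(var, placeholder))
    return problems
```

Run it with python3 in the same tab you start ipython from; an empty list only means the variables are set, not that they match the keys MinIO printed at startup.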

@vsay01

vsay01 commented Aug 26, 2019

  1. Has anyone been able to resolve the error at line 34 that @SmiffyKMc mentioned above?

  2. In addition, I get the error below at line 45:

Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/rollout_worker.py", line 303, in <module>
    main()
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/rollout_worker.py", line 239, in main
    load_model_metadata(s3_client, args.model_metadata_s3_key, model_metadata_local_path)
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/utils.py", line 134, in load_model_metadata
    local_path=model_metadata_local_path)
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/s3_client.py", line 145, in download_file
    s3_client = self.get_client()
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/s3_client.py", line 32, in get_client
    return session.client('s3', region_name=self.aws_region, endpoint_url=s3_url)
  File "/usr/local/lib/python3.5/dist-packages/boto3/session.py", line 263, in client
    aws_session_token=aws_session_token, config=config)
  File "/usr/local/lib/python3.5/dist-packages/botocore/session.py", line 839, in create_client
    client_config=config, api_version=api_version)
  File "/usr/local/lib/python3.5/dist-packages/botocore/client.py", line 86, in create_client
    verify, credentials, scoped_config, client_config, endpoint_bridge)
  File "/usr/local/lib/python3.5/dist-packages/botocore/client.py", line 328, in _get_client_args
    verify, credentials, scoped_config, client_config, endpoint_bridge)
  File "/usr/local/lib/python3.5/dist-packages/botocore/args.py", line 85, in get_client_args
    client_cert=new_config.client_cert)
  File "/usr/local/lib/python3.5/dist-packages/botocore/endpoint.py", line 261, in create_endpoint
    raise ValueError("Invalid endpoint: %s" % endpoint_url)
ValueError: Invalid endpoint: 192.168.1.76

Note: 192.168.1.76 is my local IP. I changed S3_ENDPOINT_URL as described in line 44.

  3. Despite the error above, when I connect to 127.0.0.1:9000 in VNC Viewer, I get this program running; however, I don't know if this is the program we want:

[Screenshot: Screen Shot 2019-08-25 at 7 17 28 PM]

Any help would be appreciated. Thanks.
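Two quick checks for the problems above, as a standard-library sketch with hypothetical helper names. First, the traceback ends in botocore's "Invalid endpoint: 192.168.1.76", which suggests S3_ENDPOINT_URL was set to a bare IP address; botocore expects a full URL with a scheme, such as http://192.168.1.76:9000. Second, port 9000 is MinIO (steps 13 and 14), while step 45 publishes the container's VNC server (port 5900) on host port 8080, which is the address step 46 gives VNC Viewer.

```python
import socket
from urllib.parse import urlsplit


def check_endpoint(endpoint_url: str) -> str:
    """Raise ValueError unless endpoint_url looks like http(s)://host[:port]."""
    parts = urlsplit(endpoint_url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(
            "{!r} is not a full URL; use something like http://<your-ip>:9000".format(endpoint_url)
        )
    return endpoint_url


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if something on host:port accepts a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports in these instructions: 9000 is MinIO; 8080 is the host side of
# `-p 8080:5900` in step 45, i.e. the Gazebo VNC server.
#   check_endpoint("http://192.168.1.76:9000")
#   port_is_open("127.0.0.1", 8080)
```

port_is_open only shows that something is listening; it cannot tell a VNC server from MinIO, so match the port to the step that opened it.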

@hemantpahil44

I was able to run using the above instructions earlier, but I have started getting errors. Below is the new error. Any idea what could be wrong? I suspect that the following command pulled the latest image, which caused the issue:
docker pull crr0004/deepracer_robomaker:console

## Creating agent - name: agent
2019-08-29 21:15:45.640320: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-08-29 21:15:45.885102: W tensorflow/core/framework/allocator.cc:113] Allocation of 23068672 exceeds 10% of system memory.
2019-08-29 21:15:45.943487: W tensorflow/core/framework/allocator.cc:113] Allocation of 23068672 exceeds 10% of system memory.
2019-08-29 21:15:46.010246: W tensorflow/core/framework/allocator.cc:113] Allocation of 23068672 exceeds 10% of system memory.
2019-08-29 21:15:46.054399: W tensorflow/core/framework/allocator.cc:113] Allocation of 23068672 exceeds 10% of system memory.
2019-08-29 21:15:46.120578: W tensorflow/core/framework/allocator.cc:113] Allocation of 23068672 exceeds 10% of system memory.
## Loading checkpoint: ./checkpoint/0_Step-0.ckpt
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/rollout_worker.py", line 303, in <module>
    main()
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/rollout_worker.py", line 298, in main
    memory_backend_params = memory_backend_params
  File "/app/robomaker-deepracer/simulation_ws/install/sagemaker_rl_agent/lib/python3.5/site-packages/markov/rollout_worker.py", line 142, in rollout_worker
    graph_manager.create_graph(task_parameters)
  File "/usr/local/lib/python3.5/dist-packages/rl_coach/graph_managers/graph_manager.py", line 153, in create_graph
    self.create_session(task_parameters=task_parameters)
  File "/usr/local/lib/python3.5/dist-packages/rl_coach/graph_managers/graph_manager.py", line 265, in create_session
    self.restore_checkpoint()
  File "/usr/local/lib/python3.5/dist-packages/rl_coach/graph_managers/graph_manager.py", line 572, in restore_checkpoint
    self.checkpoint_saver.restore(self.sess, checkpoint.model_checkpoint_path)
  File "/usr/local/lib/python3.5/dist-packages/rl_coach/saver.py", line 118, in restore
    saver.restore(sess, self._full_path(restore_path, saver))
  File "/usr/local/lib/python3.5/dist-packages/rl_coach/architectures/tensorflow_components/savers.py", line 82, in restore
    sess.run(self._variable_update_ops, placeholder_dict)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 887, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1086, in _run
    str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (512, 6) for Tensor 'Placeholder_21:0', which has shape '(512, 10)'
================================================================================REQUIRED process [agent-9] has died!
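The final ValueError says the checkpoint holds a (512, 6) tensor where the graph expects (512, 10). If the second dimension is the number of actions in the action space (an assumption, not confirmed in this thread), the checkpoint was trained with a different action space than the model_metadata.json now in use. The sketch below, with hypothetical helper names, pulls both shapes out of the message and counts the actions in a model_metadata.json, assuming the file has a top-level action_space list.

```python
import json
import re


def parse_shape_mismatch(message: str):
    """Return (fed_shape, expected_shape) from a TensorFlow 'Cannot feed value' error."""
    shapes = [(int(a), int(b)) for a, b in re.findall(r"\((\d+), (\d+)\)", message)]
    if len(shapes) < 2:
        raise ValueError("expected two 2-D shapes in the message")
    return shapes[0], shapes[1]


def action_count(model_metadata_path: str) -> int:
    """Number of entries in the action_space list of a model_metadata.json."""
    with open(model_metadata_path) as f:
        return len(json.load(f)["action_space"])


# Hypothetical usage, with the message from the log above:
#   fed, expected = parse_shape_mismatch(message)      # ((512, 6), (512, 10))
#   action_count("custom_files/model_metadata.json")   # compare with 6 and 10
```

If the counts disagree, either restore the model_metadata.json the checkpoint was trained with or start from a fresh checkpoint.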

@vsay01

vsay01 commented Sep 13, 2019

Following up on my earlier comment about the line 34 error, the "Invalid endpoint: 192.168.1.76" error at line 45, and the VNC question:

This gist helped clarify and resolve my issue:

The steps are the same, except that on a Mac we need to use the CPU:

  • In ~/deepracer/rl_coach, open rl_deepracer_coach_robomaker.py in an editor.
    Make sure that your endpoint_url (line 27) is the URL of your MinIO server (e.g. http://192.168.1.xxx:9000)
    For CPU training, make sure instance_type (line 92) is "local" and image_name (line 108) is "crr0004/sagemaker-rl-tensorflow:console"
    Save the file
  • ipython rl_deepracer_coach_robomaker.py
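The two CPU settings in the list above can be sanity-checked before running the script. A sketch with hypothetical helpers: to_cpu_image swaps whatever tag an image name carries for :console, and cpu_settings_ok checks the pair of values. SageMaker local mode uses instance_type "local" for CPU and "local_gpu" for GPU.

```python
def to_cpu_image(image_name: str, tag: str = "console") -> str:
    """Replace an image's tag (or add one) so it points at the CPU build."""
    name, sep, old_tag = image_name.rpartition(":")
    # A colon followed later by '/' belongs to a registry host:port, not a tag.
    if sep and "/" not in old_tag:
        return "{}:{}".format(name, tag)
    return "{}:{}".format(image_name, tag)


def cpu_settings_ok(instance_type: str, image_name: str) -> bool:
    """True for the CPU combination: local mode plus a :console image."""
    return instance_type == "local" and image_name.endswith(":console")
```

For example, to_cpu_image("crr0004/sagemaker-rl-tensorflow:nvidia") gives "crr0004/sagemaker-rl-tensorflow:console"; the :nvidia tag here is only an illustration.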

@scumola

scumola commented Oct 14, 2019

When spawning the robomaker container, I'm getting this:

[INFO] [1571063966.818759, 3.258000]: Controller Spawner: Loaded controllers: left_rear_wheel_velocity_controller, right_rear_wheel_velocity_controller, left_front_wheel_velocity_controller, right_front_wheel_velocity_controller, left_steering_hinge_position_controller, right_steering_hinge_position_controller, joint_state_controller
[INFO] [1571063966.830892, 3.267000]: Started controllers: left_rear_wheel_velocity_controller, right_rear_wheel_velocity_controller, left_front_wheel_velocity_controller, right_front_wheel_velocity_controller, left_steering_hinge_position_controller, right_steering_hinge_position_controller, joint_state_controller
[ERROR] Unable to import the waypoints [Errno 2] No such file or directory: '/app/robomaker-deepracer/simulation_ws/install/deepracer_simulation/share/deepracer_simulation/routes/Mexico_track.npy'
[car_reset_node-7] process has died [pid 1326, exit code 1, cmd /app/robomaker-deepracer/simulation_ws/install/deepracer_simulation/lib/deepracer_simulation/car_node.py __name:=car_reset_node __log:=/root/.ros/log/67a95d68-ee90-11e9-979d-0242ac130003/car_reset_node-7.log].
log file: /root/.ros/log/67a95d68-ee90-11e9-979d-0242ac130003/car_reset_node-7*.log

I can see the ROS GUI in VNC with a car (that drives forward), but no track.

[Screenshot: Fullscreen_10_14_19__8_40_AM]
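Judging by the error, the car node loads waypoints from routes/<track>.npy, so the image has no Mexico_track.npy. One guess, not confirmed in this thread, is that the track name comes from a WORLD_NAME setting in robomaker.env and names a track the pulled image does not ship. You can list the folder inside the running container (named dr in step 45) with docker exec dr ls followed by the routes path from the error, or copy it out with docker cp, then check your track name against it. A sketch with hypothetical helper names:

```python
from pathlib import Path


def available_tracks(routes_dir: str) -> list:
    """Track names that have a waypoint file (<name>.npy) in routes_dir."""
    return sorted(p.stem for p in Path(routes_dir).glob("*.npy"))


def check_track(track_name: str, routes_dir: str) -> None:
    """Raise FileNotFoundError listing the valid choices if the track is missing."""
    tracks = available_tracks(routes_dir)
    if track_name not in tracks:
        raise FileNotFoundError(
            "no {}.npy in {}; available: {}".format(
                track_name, routes_dir, ", ".join(tracks) or "none"
            )
        )
```

Picking a name from available_tracks for your track setting should at least get past the waypoint error.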

@daj

daj commented Nov 11, 2019

For step #7, it might be better to update the env.sh script so it copes if your IP address changes.

Before:

export S3_ENDPOINT_URL=http://$(hostname -i):9000

After:

IPADDR=$(ifconfig|grep -e 'inet [197][970]' | awk '{print $2}')
export S3_ENDPOINT_URL=http://$IPADDR:9000

I forked your Gist and made the change above and a couple of others too: https://gist.github.com/daj/ae0ab3853e2dffe2f9edced2327f6ee1
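The grep pattern 'inet [197][970]' is a character-class heuristic that happens to match common private ranges such as 192.168.x.x, 172.x.x.x and 10.x.x.x. For a stricter check, a Python sketch with hypothetical helper names can parse the same ifconfig output and keep only private, non-loopback IPv4 addresses using the standard ipaddress module:

```python
import ipaddress
import re
import subprocess


def lan_ipv4_addresses(ifconfig_output: str) -> list:
    """Private, non-loopback, non-link-local IPv4 addresses in ifconfig output."""
    found = []
    # "inet " (with the space) skips the "inet6" lines.
    for match in re.finditer(r"\binet (\d{1,3}(?:\.\d{1,3}){3})\b", ifconfig_output):
        try:
            addr = ipaddress.ip_address(match.group(1))
        except ValueError:
            continue  # e.g. an out-of-range octet
        if addr.is_private and not (addr.is_loopback or addr.is_link_local):
            found.append(str(addr))
    return found


def mac_lan_ips() -> list:
    """Run ifconfig on this machine and return its LAN addresses."""
    out = subprocess.run(
        ["ifconfig"], stdout=subprocess.PIPE, universal_newlines=True, check=True
    ).stdout
    return lan_ipv4_addresses(out)
```

If mac_lan_ips() returns more than one address (for example Wi-Fi plus a VPN), pick the one for the network you actually use.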

@ebisbe

ebisbe commented Dec 2, 2019

@scumola I'm having the same error. I see that the link does not exist. Have you managed to solve it?

[Errno 2] No such file or directory: '/app/robomaker-deepracer/simulation_ws/install/deepracer_simulation/share/deepracer_simulation/routes/Mexico_track.npy'

@lvthillo

@scumola @ebisbe I have the same issue. I've a driving car but no track. Any solution yet?

@ebisbe

ebisbe commented Dec 10, 2019

@lvthillo I gave up...

@kkrglyt

kkrglyt commented Jun 13, 2020

When I try to start SageMaker, I get the exception below. I also get InvalidAccessKeyId, even though I have entered the correct MinIO access key.

Model checkpoints and other metadata will be stored at: s3://bucket/rl-deepracer-sagemaker
Uploading to s3://bucket/rl-deepracer-sagemaker
WARNING:sagemaker:Parameter image_name is specified, toolkit, toolkit_version, framework are going to be ignored when choosing the image.
s3.ServiceResource()
Using provided s3_client

ClientError Traceback (most recent call last)
/deepracer/sagemaker_venv/lib/python3.6/site-packages/boto3/s3/transfer.py in upload_file(self, filename, bucket, key, callback, extra_args)
278 try:
--> 279 future.result()
280 # If a client error was raised, add the backwards compatibility layer

/deepracer/sagemaker_venv/lib/python3.6/site-packages/s3transfer/futures.py in result(self)
105 # out of this and propogate the exception.
--> 106 return self._coordinator.result()
107 except KeyboardInterrupt as e:

/deepracer/sagemaker_venv/lib/python3.6/site-packages/s3transfer/futures.py in result(self)
264 if self._exception:
--> 265 raise self._exception
266 return self._result

/deepracer/sagemaker_venv/lib/python3.6/site-packages/s3transfer/tasks.py in call(self)
125 if not self._transfer_coordinator.done():
--> 126 return self._execute_main(kwargs)
127 except Exception as e:

/deepracer/sagemaker_venv/lib/python3.6/site-packages/s3transfer/tasks.py in _execute_main(self, kwargs)
149
--> 150 return_value = self._main(**kwargs)
151 # If the task is the final task, then set the TransferFuture's

/deepracer/sagemaker_venv/lib/python3.6/site-packages/s3transfer/upload.py in _main(self, client, fileobj, bucket, key, extra_args)
691 with fileobj as body:
--> 692 client.put_object(Bucket=bucket, Key=key, Body=body, **extra_args)
693

/deepracer/sagemaker_venv/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
315 # The "self" in this scope is referring to the BaseClient.
--> 316 return self._make_api_call(operation_name, kwargs)
317

/deepracer/sagemaker_venv/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
634 error_class = self.exceptions.from_code(error_code)
--> 635 raise error_class(parsed_response, operation_name)
636 else:

ClientError: An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The Access Key Id you provided does not exist in our records.

During handling of the above exception, another exception occurred:

S3UploadFailedError Traceback (most recent call last)
/deepracer/rl_coach/rl_deepracer_coach_robomaker.py in <module>
128 )
129
--> 130 estimator.fit(job_name=job_name, wait=False)

/deepracer/sagemaker_venv/lib/python3.6/site-packages/sagemaker/estimator.py in fit(self, inputs, wait, logs, job_name)
230 based on the training image name and current timestamp.
231 """
--> 232 self._prepare_for_training(job_name=job_name)
233
234 self.latest_training_job = _TrainingJob.start_new(self, inputs)

/deepracer/sagemaker_venv/lib/python3.6/site-packages/sagemaker/estimator.py in _prepare_for_training(self, job_name)
849 script = self.entry_point
850 else:
--> 851 self.uploaded_code = self._stage_user_code_in_s3()
852 code_dir = self.uploaded_code.s3_prefix
853 script = self.uploaded_code.script_name

/deepracer/sagemaker_venv/lib/python3.6/site-packages/sagemaker/estimator.py in _stage_user_code_in_s3(self)
892 dependencies=self.dependencies,
893 kms_key=kms_key,
--> 894 s3_client=self.sagemaker_session.s3_client)
895
896 def _model_source_dir(self):

/deepracer/sagemaker_venv/lib/python3.6/site-packages/sagemaker/fw_utils.py in tar_and_upload_dir(session, bucket, s3_key_prefix, script, directory, dependencies, kms_key, s3_client)
193 else:
194 print("Using provided s3_client")
--> 195 s3_client.Object(bucket, key).upload_file(tar_file, ExtraArgs=extra_args)
196 finally:
197 shutil.rmtree(tmp)

/deepracer/sagemaker_venv/lib/python3.6/site-packages/boto3/s3/inject.py in object_upload_file(self, Filename, ExtraArgs, Callback, Config)
278 return self.meta.client.upload_file(
279 Filename=Filename, Bucket=self.bucket_name, Key=self.key,
--> 280 ExtraArgs=ExtraArgs, Callback=Callback, Config=Config)
281
282

/deepracer/sagemaker_venv/lib/python3.6/site-packages/boto3/s3/inject.py in upload_file(self, Filename, Bucket, Key, ExtraArgs, Callback, Config)
129 return transfer.upload_file(
130 filename=Filename, bucket=Bucket, key=Key,
--> 131 extra_args=ExtraArgs, callback=Callback)
132
133

/deepracer/sagemaker_venv/lib/python3.6/site-packages/boto3/s3/transfer.py in upload_file(self, filename, bucket, key, callback, extra_args)
285 raise S3UploadFailedError(
286 "Failed to upload %s to %s: %s" % (
--> 287 filename, '/'.join([bucket, key]), e))
288
289 def download_file(self, bucket, key, filename, extra_args=None,

S3UploadFailedError: Failed to upload /var/folders/kv/qfrzr8td1vsck5z77f1lv6xr0000gn/T/tmptehvrg1n/source.tar.gz to bucket/rl-deepracer-sagemaker/source/sourcedir.tar.gz: An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The Access Key Id you provided does not exist in our records.

@kkrglyt

kkrglyt commented Jun 13, 2020

@kenpeter, about your line 34 error (the InvalidAccessKeyId failure when uploading sourcedir.tar.gz): did you solve it?

@mangoez

mangoez commented Sep 17, 2020

Hi!
I'm getting this error:

Could not connect to the endpoint URL: "http://127.0.0.1:9000/bucket/rl-deepracer-sagemaker/source/sourcedir.tar.gz"

I have checked the MinIO username, password, etc., and I can even see sourcedir.tar.gz in the right bucket and the right directories.

What's up?

@bhavik161

I am stuck at step 5.
I cloned the repository and installed MinIO and VNC Viewer.
Step 5 says to cd to rl_coach, but that directory doesn't exist in my cloned repository.
