Thanks Jacob... there's a scheduler typo under Create Task Definitions, btw.
I'm getting the same timeout as ks233ever, yet I can connect to the IP and port with telnet from the same machine where the notebook is running.
@adair-kovac this gist is pretty old now. All current work is being done in dask-cloudprovider and things here are definitely not supported.
Is there an example of the above, but using the Dask Cloudprovider FargateCluster() class?
Thanks. I've read this one a few times. I am struggling with the networking setup.
# Older dask-cloudprovider releases: from dask_cloudprovider import FargateCluster
from dask_cloudprovider.aws import FargateCluster

cluster = FargateCluster(
    cluster_name_template='Dask-Test-30',
    image="daskdev/dask:latest",
    #environment={'EXTRA_PIP_PACKAGES':'joblib'},
    scheduler_cpu=1024,
    scheduler_mem=4096,
    worker_cpu=4096,
    worker_mem=16384,
    execution_role_arn="arn:aws:iam::260849320:role/dask-fargate-execution", #UPDATED
    task_role_arn='arn:aws:iam::260849720:role/dask-fargate-task', #UPDATED
    #task_role_policies=[]
    #vpc='vpc-0280b92031b9f010c',
    subnets=[
        'subnet-06cc237e',
        'subnet-2a505861',
        'subnet-cf04f2',
        'subnet-3a2756',
        'subnet-08ba9c01b59b6'
    ], # updated
    security_groups=['sg-02fe57ad943901'], #updated
    n_workers=25,
    fargate_use_private_ip=False,
    scheduler_timeout="15 minutes"
)
cluster
When I go to generate the link to the scheduler, I get network / connectivity issues, which makes me think my company's firewall is messing it up and that my security group rules need to be updated, but I don't know how to troubleshoot this.
from IPython.display import display, HTML
display(HTML('<a href="{url}" target="_blank">{url}</a>'.format(url='http://{}:8787/status'.format(scheduler_public_ip))))
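One way to start troubleshooting the security group side is to check whether any inbound rule covers the scheduler port (8786) and the dashboard port (8787). The following is a minimal sketch, not an official dask-cloudprovider utility: the function name `ports_allowed` and the example rules are hypothetical, and the input is assumed to follow the shape of the `IpPermissions` list that boto3's `describe_security_groups` returns. It only looks at protocol and port range; it does not check whether the rule's source CIDR includes your machine.

```python
def ports_allowed(ip_permissions, ports=(8786, 8787)):
    """Return {port: bool} saying whether any inbound rule covers each port.

    `ip_permissions` is assumed to look like the `IpPermissions` list of a
    security group as returned by boto3's `describe_security_groups`.
    """
    result = {}
    for port in ports:
        allowed = False
        for rule in ip_permissions:
            protocol = rule.get("IpProtocol")
            if protocol == "-1":
                # "-1" means all protocols, which implies all ports.
                allowed = True
            elif protocol == "tcp":
                low = rule.get("FromPort", 0)
                high = rule.get("ToPort", 65535)
                if low <= port <= high:
                    allowed = True
        result[port] = allowed
    return result


# Hypothetical rules: only the scheduler port is open, the dashboard is not.
example_rules = [
    {
        "IpProtocol": "tcp",
        "FromPort": 8786,
        "ToPort": 8786,
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
    },
]
print(ports_allowed(example_rules))  # {8786: True, 8787: False}
```

With boto3 installed and AWS credentials configured, the real rules could be fetched with `boto3.client("ec2").describe_security_groups(GroupIds=[...])["SecurityGroups"][0]["IpPermissions"]` and passed to the function. If both ports show as allowed and the link still fails, the block is more likely on the corporate network side than in the security group.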
Yeah, an AWS issue there, had the same problem.
One of the AWS Data Scientists suggested this post to me about using the load balancer to monitor: https://aws.amazon.com/blogs/machine-learning/machine-learning-on-distributed-dask-using-amazon-sagemaker-and-aws-fargate/
I've tried this as well. The CloudFormation template doesn't work either. I essentially need to get my Python instance into the same VPC that the Dask Fargate cluster is in, or alternatively use a proxy server.
@jacobtomlinson if we already have a Fargate cluster set up, how are we able to connect to it in this step?
c = distributed.Client('tcp://{}:8786'.format(scheduler_public_ip))
Following the above steps returns a timeout error.
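Before debugging the Client itself, it can help to confirm from the notebook's Python process that the scheduler port is reachable at all, which is the same check as the telnet test mentioned earlier in the thread. This is a minimal sketch using only the standard library; the helper name `port_is_open` is hypothetical.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `port_is_open(scheduler_public_ip, 8786)` checks the scheduler port and `port_is_open(scheduler_public_ip, 8787)` checks the dashboard. If this returns False, the problem is at the network level (security group, subnet routing, or a local firewall). If it returns True and the Client still times out, the address being used or a version mismatch between client and scheduler is a more likely suspect.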
Hi @ks233ever and @adair-kovac
Any resolution of this Client timeout issue?
To reiterate:
All current work is being done in dask-cloudprovider and things here are definitely not supported.
Your cluster needs to expose the Dask scheduler on the scheduler_public_ip on port 8786. If it isn't, you will need to modify this to use the correct URL.