@jacobtomlinson
Last active February 26, 2024 14:35
Dask on Fargate from scratch
@jacobtomlinson
Author

@adair-kovac this gist is pretty old now. All current work is being done in dask-cloudprovider and things here are definitely not supported.

@adamjacobgilbert

Is there an example of the above, but using the Dask Cloudprovider FargateCluster() class?

@adamjacobgilbert

Thanks. I've read this one a few times. I am struggling with the networking setup.

from dask_cloudprovider.aws import FargateCluster

cluster = FargateCluster(
    cluster_name_template='Dask-Test-30',
    image="daskdev/dask:latest",
    # environment={'EXTRA_PIP_PACKAGES': 'joblib'},
    scheduler_cpu=1024,  # CPU units (1024 = 1 vCPU)
    scheduler_mem=4096,  # MB
    worker_cpu=4096,
    worker_mem=16384,
    execution_role_arn="arn:aws:iam::260849320:role/dask-fargate-execution",  # UPDATED
    task_role_arn='arn:aws:iam::260849720:role/dask-fargate-task',  # UPDATED
    # task_role_policies=[],
    # vpc='vpc-0280b92031b9f010c',
    subnets=[
        'subnet-06cc237e',
        'subnet-2a505861',
        'subnet-cf04f2',
        'subnet-3a2756',
        'subnet-08ba9c01b59b6',
    ],  # updated
    security_groups=['sg-02fe57ad943901'],  # updated
    n_workers=25,
    fargate_use_private_ip=False,
    scheduler_timeout="15 minutes",
)

# Display the cluster widget in the notebook
cluster

@adamjacobgilbert

When I go to generate the link to the scheduler, I get network/connectivity issues, which makes me think my company's firewall is interfering and that my security group rules need to be updated, but I don't know how to troubleshoot this.

from IPython.display import display, HTML

# Build the dashboard URL from the scheduler's public IP and render it as a link
url = 'http://{}:8787/status'.format(scheduler_public_ip)
display(HTML('<a href="{url}" target="_blank">{url}</a>'.format(url=url)))
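
One way to narrow this down is to test raw TCP reachability of the scheduler from the machine running the notebook, before involving Dask at all. The sketch below is only an illustration using the Python standard library; the helper names `port_is_reachable` and `check_scheduler_ports` are hypothetical, and the ports are the ones used elsewhere in this thread (8786 for the scheduler, 8787 for the dashboard).

```python
import socket


def port_is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts/networks.
        return False


def check_scheduler_ports(host, ports=(8786, 8787), timeout=5.0):
    """Check each port and return a {port: reachable} mapping.

    8786 (scheduler) and 8787 (dashboard) are the ports used in this thread.
    """
    return {port: port_is_reachable(host, port, timeout) for port in ports}
```

Calling `check_scheduler_ports(scheduler_public_ip)` from both the office network and an outside network may help separate the two suspects: if the ports are reachable from outside but not from the office, the firewall is the more likely cause; if they are unreachable from everywhere, the security group inbound rules are worth checking first.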

@RichardScottOZ

Yeah, an AWS issue there, had the same problem.

@RichardScottOZ

One of the AWS data scientists suggested this post to me about using a load balancer to monitor the cluster: https://aws.amazon.com/blogs/machine-learning/machine-learning-on-distributed-dask-using-amazon-sagemaker-and-aws-fargate/

@adamjacobgilbert

I've tried this as well. The CloudFormation template doesn't work either. I essentially need to get my Python instance into the same VPC as the Dask Fargate cluster or, alternatively, use a proxy server.

@c-leber

c-leber commented Oct 7, 2021

@jacobtomlinson if we already have a Fargate cluster set up, how can we connect to it in this step?

c = distributed.Client('tcp://{}:8786'.format(scheduler_public_ip))

Following the above steps returns a timeout error.

Hi @ks233ever and @adair-kovac
Any resolution of this Client timeout issue?
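
For the `distributed.Client` timeout above, a minimal sketch of making the failure quicker and more explicit is to pass a short `timeout` and catch the connection error. The helper names `scheduler_address` and `connect` are hypothetical, and catching `OSError` assumes that is how the installed `distributed` version reports a failed connection; this does not fix the underlying networking problem, it only surfaces it sooner.

```python
def scheduler_address(host, port=8786):
    """Build the scheduler address in the same form used in this thread."""
    return "tcp://{}:{}".format(host, port)


def connect(host, timeout=10):
    """Try to connect to the scheduler, failing after `timeout` seconds."""
    # Imported lazily so scheduler_address() works without distributed installed.
    import distributed

    address = scheduler_address(host)
    try:
        return distributed.Client(address, timeout=timeout)
    except OSError as exc:
        # Assumption: connection failures surface as OSError in this version.
        raise RuntimeError(
            "Could not reach {} within {}s; check that the client can route to "
            "the scheduler (same VPC, public IP, security group rules)".format(
                address, timeout
            )
        ) from exc
```

Usage would be `c = connect(scheduler_public_ip)`. If this fails while a plain TCP connection to port 8786 also fails, the problem is network reachability rather than Dask itself.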

@jacobtomlinson
Author

To reiterate

All current work is being done in dask-cloudprovider and things here are definitely not supported.
