@jacobtomlinson
Last active February 26, 2024 14:35
Dask on Fargate from scratch
@KseniiaPalin

Thank you!

Probably the first guide of its sort that works on the first attempt :)

P.S. Not counting setting the region. I had to do the following to override my default region:

import boto3

region = 'eu-west-2'
ec2 = boto3.client('ec2', region_name=region)
ecs = boto3.client('ecs', region_name=region)
iam = boto3.client('iam', region_name=region)
logs = boto3.client('logs', region_name=region)

@jacobtomlinson
Author

@KseniiaPalin great to hear it worked for you! I usually set my region once with aws configure in the CLI tool, so I always forget about that :).

This work is actually now wrapped up in dask-cloudprovider. Would love some folks to test it out and give feedback. It should now be as easy as:

from dask_cloudprovider import FargateCluster
cluster = FargateCluster()  # This runs basically the whole notebook above

from distributed import Client
client = Client(cluster)

@KseniiaPalin

Thank you @jacobtomlinson!

With dask-cloudprovider I am currently having problems specifying configuration.
My default AWS environment is eu-north-1 and I don't want to change it.
Here is the code I use to create a cluster:

from dask_cloudprovider import FargateCluster
cluster = FargateCluster(
    environment={ 'AWS_REGION': 'eu-west-2'}, # The resources are being created in `eu-north-1` anyway
    name='dask_fargate_cluster' # The name is being generated anyway
)  

What is the correct way to configure the AWS environment for a cluster?

Also, in this notebook you are re-using resources when they have already been created, but dask-cloudprovider raises an exception when, e.g., a CloudWatch log group name is already taken.

@jacobtomlinson
Author

Thanks for testing this out. If you have issues like these, could you please raise them on dask-cloudprovider?

Quick answers: you are setting the environment for your workers there, not the current session. I recommend using os.environ to set your local variables. This does suggest that there should be a kwarg to set your region in FargateCluster, and perhaps to set your keys too.
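
A minimal sketch of the os.environ suggestion, assuming the local session picks its region up from the standard AWS_DEFAULT_REGION variable that boto3 reads. The helper name use_region is hypothetical and not part of dask-cloudprovider, and it would need to run before any boto3 client or FargateCluster is created:

```python
import os
from typing import Optional


def use_region(region: str) -> Optional[str]:
    """Point the local process at `region`; return the previous setting, if any."""
    previous = os.environ.get("AWS_DEFAULT_REGION")
    # Assumption: the AWS SDK in this process reads AWS_DEFAULT_REGION
    # when it builds its session, as boto3 does by default.
    os.environ["AWS_DEFAULT_REGION"] = region
    return previous


# Override the configured default for this process only.
previous_region = use_region("eu-west-2")
```

This only changes the environment of the current Python process, so the region set with aws configure stays untouched for everything else.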

Would you be able to raise all of this in issues on dask-cloudprovider? It would be super helpful!

@ks233ever

ks233ever commented Feb 18, 2020

@jacobtomlinson if we already have a fargate cluster set up, how are we able to connect to it in this step

c = distributed.Client('tcp://{}:8786'.format(scheduler_public_ip))

following the above steps returns a timeout error

@jacobtomlinson
Author

Your cluster needs to expose the Dask scheduler on scheduler_public_ip at port 8786. If it doesn't, you will need to modify this to use the correct URL.
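
One way to check this before creating a Client is a plain TCP probe, sketched below with the standard library only. The function names are hypothetical; port 8786 is the scheduler port used in this gist, and 8787 is assumed to be the dashboard port. An open port only shows that the address is reachable; it does not guarantee that the Dask client connection itself will succeed.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def check_scheduler(host: str, ports=(8786, 8787)) -> dict:
    """Probe the scheduler (8786) and assumed dashboard (8787) ports on `host`."""
    return {port: port_is_open(host, port) for port in ports}
```

For example, check_scheduler(scheduler_public_ip) returns a dict mapping each port to True or False. If 8786 comes back False, the security group rules or a firewall between the notebook and the task are the first things to look at.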

@RichardScottOZ

Thanks Jacob...scheduler typo under Create Task Definitions btw

@adair-kovac

I'm getting the same timeout as ks233ever was, yet can connect to the ip and port from telnet on the same machine where the notebook is running.

@jacobtomlinson
Author

@adair-kovac this gist is pretty old now. All current work is being done in dask-cloudprovider and things here are definitely not supported.

@adamjacobgilbert

Is there an example of the above, but using the Dask Cloudprovider FargateCluster() class?

@adamjacobgilbert

Thanks. I've read this one a few times. I am struggling with the networking setup.

cluster = FargateCluster(
    cluster_name_template='Dask-Test-30',
    image="daskdev/dask:latest",
    #environment={'EXTRA_PIP_PACKAGES':'joblib'},
    scheduler_cpu=1024,
    scheduler_mem=4096,
    worker_cpu=4096,
    worker_mem=16384,
    execution_role_arn="arn:aws:iam::260849320:role/dask-fargate-execution", #UPDATED
    task_role_arn='arn:aws:iam::260849720:role/dask-fargate-task', #UPDATED
    #task_role_policies=[]
    #vpc='vpc-0280b92031b9f010c',
    subnets=[
        'subnet-06cc237e',
        'subnet-2a505861',
        'subnet-cf04f2',
        'subnet-3a2756',
        'subnet-08ba9c01b59b6'
    ], # updated
    security_groups=['sg-02fe57ad943901'], #updated
    n_workers=25,
    fargate_use_private_ip=False,
    scheduler_timeout="15 minutes"
)

cluster

@adamjacobgilbert

When I go to generate the link to the scheduler, I get network/connectivity issues, which makes me think my company's firewall is interfering and that my security group rules need to be updated, but I don't know how to troubleshoot this.

from IPython.display import display, HTML

# Build the dashboard URL from the scheduler's public IP and render it as a link
url = 'http://{}:8787/status'.format(scheduler_public_ip)
display(HTML('<a href="{url}" target="_blank">{url}</a>'.format(url=url)))

@RichardScottOZ

Yeah, an AWS issue there, had the same problem.

@RichardScottOZ

One of the AWS Data Scientists suggested this post to me about using the load balancer to monitor: https://aws.amazon.com/blogs/machine-learning/machine-learning-on-distributed-dask-using-amazon-sagemaker-and-aws-fargate/

@adamjacobgilbert

I've tried this as well. The CloudFormation template doesn't work either. I essentially need to get my Python instance into the same VPC that the Dask Fargate cluster is in or, alternatively, use a proxy server.

@c-leber

c-leber commented Oct 7, 2021

@jacobtomlinson if we already have a fargate cluster set up, how are we able to connect to it in this step

c = distributed.Client('tcp://{}:8786'.format(scheduler_public_ip))

following the above steps returns a timeout error

Hi @ks233ever and @adair-kovac
Any resolution of this Client timeout issue?

@jacobtomlinson
Author

To reiterate:

All current work is being done in dask-cloudprovider and things here are definitely not supported.
