@KseniiaPalin great to hear it worked for you! I usually set my region once with aws configure using the CLI tool, so I always forget about that :).
This work is actually now wrapped up in dask-cloudprovider. Would love some folks to test it out and give feedback. It should now be as easy as:
from dask_cloudprovider import FargateCluster
cluster = FargateCluster() # This runs basically the whole notebook above
from distributed import Client
client = Client(cluster)
Thank you @jacobtomlinson!
With dask-cloudprovider I am currently having problems specifying configuration.
My default AWS environment is eu-north-1 and I don't want to change it.
Here is the code I use to create a cluster:
from dask_cloudprovider import FargateCluster
cluster = FargateCluster(
    environment={'AWS_REGION': 'eu-west-2'},  # The resources are being created in `eu-north-1` anyway
    name='dask_fargate_cluster'  # The name is being generated anyway
)
What is a correct way to configure the AWS environment for a cluster?
Also, in this notebook you are re-using resources once they've been created, but dask-cloudprovider raises an exception when, e.g., a CloudWatch log group name is already taken.
Thanks for testing this out. If you have issues like these could you please raise them on dask-cloudprovider.
Quick answers: you are setting the environment for your workers there, not the current session. I recommend using os.environ to set your local variables. This does suggest that there should be a kwarg to set your region in FargateCluster, and perhaps one to set your keys too.
Would you be able to raise all of this in issues on dask-cloudprovider? It would be super helpful!
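A minimal sketch of the os.environ approach suggested above. It assumes the library resolves its region through botocore, which reads the AWS_DEFAULT_REGION variable. The helper name use_region is hypothetical, and the FargateCluster call is left commented out because it would create real AWS resources.

```python
import os


def use_region(region: str) -> str:
    """Set the region for botocore-based clients created later in this process.

    Returns the previous value ("" if it was unset) so it can be restored.
    """
    previous = os.environ.get("AWS_DEFAULT_REGION", "")
    os.environ["AWS_DEFAULT_REGION"] = region
    return previous


# Override the default region for the current Python session only.
previous_region = use_region("eu-west-2")

# Assumption: FargateCluster picks the region up from the environment.
# from dask_cloudprovider import FargateCluster
# cluster = FargateCluster()
```

This only changes the local session. The environment kwarg in the snippet earlier in the thread is passed to the workers instead.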
@jacobtomlinson if we already have a Fargate cluster set up, how are we able to connect to it in this step?
c = distributed.Client('tcp://{}:8786'.format(scheduler_public_ip))
Following the above steps returns a timeout error.
Your cluster needs to expose the Dask scheduler on the scheduler_public_ip on port 8786. If it doesn't, you will need to modify this to use the correct URL.
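One way to narrow down a timeout like this is to check whether the port accepts TCP connections at all before creating the Client. The sketch below uses only the standard library. The function name can_connect is hypothetical, and a successful check shows only that the port is reachable, not that a Dask scheduler is answering on it.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, can_connect(scheduler_public_ip, 8786) tests the scheduler port and can_connect(scheduler_public_ip, 8787) tests the dashboard port.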
Thanks Jacob...scheduler typo under Create Task Definitions btw
I'm getting the same timeout as ks233ever was, yet can connect to the ip and port from telnet on the same machine where the notebook is running.
@adair-kovac this gist is pretty old now. All current work is being done in dask-cloudprovider and things here are definitely not supported.
Is there an example of the above, but using the Dask Cloudprovider FargateCluster() class?
Thanks. I've read this one a few times. I am struggling with the networking setup.
cluster = FargateCluster(
    cluster_name_template='Dask-Test-30',
    image="daskdev/dask:latest",
    #environment={'EXTRA_PIP_PACKAGES':'joblib'},
    scheduler_cpu=1024,
    scheduler_mem=4096,
    worker_cpu=4096,
    worker_mem=16384,
    execution_role_arn="arn:aws:iam::260849320:role/dask-fargate-execution",  # UPDATED
    task_role_arn='arn:aws:iam::260849720:role/dask-fargate-task',  # UPDATED
    #task_role_policies=[]
    #vpc='vpc-0280b92031b9f010c',
    subnets=[
        'subnet-06cc237e',
        'subnet-2a505861',
        'subnet-cf04f2',
        'subnet-3a2756',
        'subnet-08ba9c01b59b6'
    ],  # updated
    security_groups=['sg-02fe57ad943901'],  # updated
    n_workers=25,
    fargate_use_private_ip=False,
    scheduler_timeout="15 minutes"
)
cluster
When I go to generate the link to the scheduler, I get network/connectivity issues, which makes me think my company's firewall is interfering and that my security group rules need to be updated, but I don't know how to troubleshoot this.
from IPython.core.display import display, HTML
display(HTML('<a href="{url}" target="_blank">{url}</a>'.format(url='http://{}:8787/status'.format(scheduler_public_ip))))
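To troubleshoot the security group side, one option is to check whether its ingress rules allow your client IP on ports 8786 (scheduler) and 8787 (dashboard). The sketch below is a pure-Python check. It assumes the rules have the shape of the "IpPermissions" list in a boto3 describe_security_groups response. The function name port_open_to and the example rules are hypothetical, and only IPv4 CIDR rules are considered.

```python
import ipaddress


def port_open_to(ip_permissions, port, client_ip):
    """Check whether any ingress rule allows TCP `port` from `client_ip`.

    `ip_permissions` is assumed to look like the "IpPermissions" list of a
    security group. Only IPv4 CIDR ranges ("IpRanges") are inspected.
    """
    addr = ipaddress.ip_address(client_ip)
    for rule in ip_permissions:
        protocol = rule.get("IpProtocol")
        # "-1" means all protocols and all ports.
        if protocol not in ("tcp", "-1"):
            continue
        if protocol == "tcp" and not (rule["FromPort"] <= port <= rule["ToPort"]):
            continue
        for ip_range in rule.get("IpRanges", []):
            if addr in ipaddress.ip_network(ip_range["CidrIp"]):
                return True
    return False


# Hypothetical rules for illustration only, not taken from a real account.
example_rules = [
    {
        "IpProtocol": "tcp",
        "FromPort": 8786,
        "ToPort": 8787,
        "IpRanges": [{"CidrIp": "203.0.113.0/24"}],
    },
]
```

The real rules can be fetched with the EC2 client's describe_security_groups call, reading ["SecurityGroups"][0]["IpPermissions"] from the response. If the check passes and the connection still times out, the block is more likely on the client side, such as a corporate firewall.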
Yeah, an AWS issue there, had the same problem.
One of the AWS Data Scientists suggested this post to me about using the load balancer to monitor: https://aws.amazon.com/blogs/machine-learning/machine-learning-on-distributed-dask-using-amazon-sagemaker-and-aws-fargate/
I've tried this as well. The CloudFormation template doesn't work either. I essentially need to get my Python instance into the same VPC that the Dask Fargate cluster is in, or alternatively, use a proxy server.
Hi @ks233ever and @adair-kovac
Any resolution of this Client timeout issue?
To reiterate
All current work is being done in dask-cloudprovider and things here are definitely not supported.
Thank you!
Probably the first guide of its sort which works on the first attempt :)
P.S. Not counting setting the region. I had to do the following to override my default region:
import boto3

# Pass the region explicitly to each client instead of relying on the default.
region = 'eu-west-2'
ec2 = boto3.client('ec2', region_name=region)
ecs = boto3.client('ecs', region_name=region)
iam = boto3.client('iam', region_name=region)
logs = boto3.client('logs', region_name=region)