I wanted to find out how much "power" Python shell scripts are afforded in AWS Glue.
For context, AWS Glue lets you choose either a Spark or a Python Shell type for Jobs.
Pricing differs between the two because of the minimum billed duration and the minimum DPUs that can be allocated.
In our case, we want to run simple Python scripts (no need for PySpark), hence this investigation into Python Shell.
I declared a Python Shell (Glue 1.0, Python 3) Glue Job whose script attempts to output the disk space and CPU cores afforded.
See test_environment.py.
This is tested with standard worker.
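The script itself is not shown here, but a minimal sketch of what test_environment.py could look like is below, using only the standard library (`shutil.disk_usage` and `os.cpu_count`); the exact script contents and GiB formatting are my assumptions.

```python
import os
import shutil

def report_environment():
    """Print total/used/free disk space on the root volume and the CPU count."""
    total, used, free = shutil.disk_usage("/")
    gib = 2 ** 30  # bytes per GiB
    print(f"Total: {total // gib} GiB")
    print(f"Used: {used // gib} GiB")
    print(f"Free: {free // gib} GiB")
    print(f"no. of CPUs: {os.cpu_count()}")

if __name__ == "__main__":
    report_environment()
```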
When running with Maximum capacity: 0.0625 (the default DPU capacity):

```
Total: 19 GiB
Used: 5 GiB
Free: 14 GiB
no. of CPUs: 1
```
When running with Maximum capacity: 1:

```
Total: 19 GiB
Used: 5 GiB
Free: 14 GiB
no. of CPUs: 4
```
As noted in the AWS documentation, the disk space is the same at both capacities.
The number of available CPUs differs, as expected. With multiple CPUs, we can achieve multiprocessing, so this is certainly powerful 💪.
Some good resources I found on Python's multiprocessing module:
*This was tested as of Dec 11, 2020.