Folding at Home Babysitter - Original Code @danielocdh
#!/usr/bin/python3
# 1.0 - Original Code Belongs to @danielocdh
# 1.1 - Added ability to check FAH Control APIs on other ports
# 1.2 - Added support for Linux Control API (Slight changes in response checks)
# 1.3 - Fix for latest version of FAH Control API and Client (7.6.9)
# 1.4 - Added @danielocdh's feedback and his local changes around !# and spacing in the substitution
# 1.5 - @danielocdh updated expected result for authentication issues
# 1.6 - Removed un-needed tEnd references to end of readResult
# 1.7 - Added getting started
# 1.8 - Added "hacky" check to see if there is a new version of the gist using github rest api
################################################################################
## getting started ##
################################################################################
# 1. Install Python 3
# - https://www.howtogeek.com/197947/how-to-install-python-on-windows/
# - Scroll down the page until you get to "How to Install Python 3" and follow it
# 2. Create a folder somewhere memorable and easy to reach, e.g. C:\babysitter
# 3. Copy and paste the contents of this gist into notepad and save it as "babysitter.py" in C:\babysitter\
# 4. Next open a Command Prompt
# - Start Button > Command Prompt
# - Type: cd C:\babysitter
# - Hit Enter
# - Type: python babysitter.py
# 5. That should be all you need to do if you are just running 1 computer.
# If you need to run on more machines there are further instructions below
################################################################################
## options ##
################################################################################
hosts = [ #list of quoted strings, hosts or IPs, with optional colon separated port (e.g. localhost:36331), separated by comma
    'localhost'
]
hostsPassword = '' #quoted string, if the host(s) don't use a password just leave it as: ''
restartLimit = 10 * 60 #in seconds, pause+unpause if next attempt to get WU is this or more
checkEvery = 2 * 60 #in seconds, do a check for all hosts every this many seconds
checkUpdate = True # True or False, check for update in the script
checkUpdateCycles = 30 # number, multiply this by checkEvery and it will tell you how long between checks (defaults: 30 * 2 * 60 = 1 hour)
tConTimeout = 15 #in seconds, connection timeout
tReadTimeout = 10 #in seconds, read timeout
testMode = False # if set to True: checkEvery=6 and restartLimit=0 but won't actually pause+unpause slots
################################################################################
## code ##
################################################################################
import json
import re
import telnetlib
import time
import datetime
import urllib.request
import urllib.parse
if testMode:
    restartLimit = 0
    checkEvery = 6
version = 10 # internal version number that equals the number of commits in https://api.github.com/gists/1f3ac2f27790506b5e9bd0c1ec356d49/commits
countCycles = checkUpdateCycles # counter for cycles passed, set initially to the same as the interval so it'll give user feedback
countEvery = 1 #seconds, have to be a factor of checkEvery, default: 1
countEveryDec = max(0, str(countEvery)[::-1].find('.'))
countEveryDecStr = f'{{:.{countEveryDec}f}}'
def remSeconds(seconds):
    # count down to the next check, printing the remaining time on one line
    if seconds > 0:
        if (seconds * 10000) % (countEvery * 10000) == 0:
            secondsP = countEveryDecStr.format(seconds)
            pr(f'Next check in {secondsP} seconds', same=True)
        time.sleep(countEvery)
        seconds = round((seconds - countEvery) * 10000) / 10000
        remSeconds(seconds)
checkUpdateEnabled = checkUpdate # keep the option's value, the function below reuses the name checkUpdate
def checkUpdate():
    global countCycles
    countCycles += 1
    if checkUpdateEnabled and countCycles >= checkUpdateCycles:
        countCycles = 0
        try:
            resp = urllib.request.urlopen('https://api.github.com/gists/1f3ac2f27790506b5e9bd0c1ec356d49/commits')
            if (resp):
                commits = json.loads(resp.read().decode('utf-8'))
                if (len(commits) > version):
                    print("New version of babysitter script is available at https://gist.github.com/jhutchings/1f3ac2f27790506b5e9bd0c1ec356d49")
        except Exception as err:
            print("Error checking version, continuing to run. Will check later!")
prLastLen = 0
prLastSame = False
def pr(t, indent=0, same=False, overPrev=False):
    # print helper: same=True keeps the cursor on the line, overPrev=True overwrites the previous same-line output
    global prLastLen, prLastSame
    if not overPrev and not same and prLastSame:
        prLastLen = 0
        print('')
    t = str(t)
    toPrint = (' ' * indent) + t
    tLen = len(toPrint)
    print(toPrint + (' ' * max(0, prLastLen - tLen)), end='\r')
    prLastSame = same
    prLastLen = tLen
    if not same:
        print('')
        prLastLen = 0
def checkKeep():
    while (True):
        checkAll()
        checkUpdate()
        remSeconds(checkEvery)
def checkAll():
    for host in hosts: check(host)
    now = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    pr(f'check complete at {now}', 0, False, True)
prompt = r'\n*>\s*'.encode('utf-8')
pyonEnd = '\n---\n'.encode('utf-8')
def readResult(expected, expectedResult=''):
    # expected is the (index, match, bytes) tuple returned by Telnet.expect
    index = expected[0]
    readB = expected[2]
    read = readB.decode('utf-8')
    #nothing
    if index < 0 or read == '': return [False, 'nothing was read']
    #expected result
    if expectedResult:
        result = re.sub(r'\s+>$', '', read.strip())
        if (result != expectedResult):
            return [False, f'{readB}']
    #PyON->json
    match = re.search(r'\n*PyON\s+(\d+)\s+([-_a-zA-Z\d]+)\n(.*)\n---\n', read, re.DOTALL)
    #print('');print('');print('');print(index);print(match);print("read");print(read);print("readB");print(readB);print('');
    if match:
        version = match.group(1)
        if version != '1': raise Exception('Response data version does not match')
        data = match.group(3)
        #to json
        data = re.sub(r'(:\s*)False', r'\1false', data)
        data = re.sub(r'(:\s*)True', r'\1true', data)
        data = re.sub(r'(:\s*)None', r'\1null', data)
        data = json.loads(data)
        return [True, data]
    #auth error
    match = re.search(r'\nERROR: unknown command or variable', read, re.DOTALL)
    if match:
        raise Exception('error sending command, wrong password?')
    #return read
    return [True, read]
def tnCreate(host):
    # split an optional ":port" suffix off the host, defaulting to 36330
    match = re.search(r'(.*):(\d+)', host)
    port = 36330
    tEnd = [prompt]
    if match:
        host = match.group(1)
        port = int(match.group(2))
    tn = telnetlib.Telnet(host, port, tConTimeout)
    readResult(tn.expect(tEnd, tReadTimeout),)
    return tn
def sendCmd(tn, cmd, par=''):
    #print(cmd);
    if cmd == 'auth':
        tEnd = [prompt]
        if hostsPassword:
            cmdStr = f'auth {hostsPassword}'
            tn.write(f'{cmdStr}\n'.encode('utf-8'))
            res = readResult(tn.expect(tEnd, tReadTimeout), 'OK')
            if not res[0]: raise Exception(f'Error with {cmd}, {res[1]}')
            return res[1]
        return True
    elif cmd == 'exit':
        cmdStr = f'{cmd}'
        tn.write(f'{cmdStr}\n'.encode('utf-8'))
        tEnd = [prompt]
        res = readResult(tn.expect(tEnd, tReadTimeout))
        if not res[0]: raise Exception(f'Error with {cmd}, {res[1]}')
        return res[1]
    elif cmd == 'slot-info' or cmd == 'queue-info':
        cmdStr = f'{cmd}'
        tn.write(f'{cmdStr}\n'.encode('utf-8'))
        tEnd = [pyonEnd]
        res = readResult(tn.expect(tEnd, tReadTimeout))
        if not res[0]: raise Exception(f'Error with {cmd}, {res[1]}')
        return res[1]
    elif cmd == 'get-info-and-restart':
        queueData = sendCmd(tn, 'queue-info')
        slotData = sendCmd(tn, 'slot-info')
        ###
        #if type(queueData) == str: print('');print('');print('');print(queueData);print(queueData.encode('utf-8'));print('');
        #if type(slotData) == str: print('');print('');print('');print(slotData);print(slotData.encode('utf-8'));print('');
        restarted = []
        for slot in slotData:
            isStillRunning = False
            queueDl = False
            for queue in queueData:
                if queue['slot'] == slot['id']:
                    if queue['state'] == 'RUNNING': isStillRunning = True
                    if queue['state'] == 'DOWNLOAD': queueDl = queue
            # only touch slots that are not running and are stuck waiting for a WS assignment
            if not isStillRunning and queueDl and queueDl['waitingon'] == 'WS Assignment':
                match = re.match(r'\s?(\d+ days?)?\s?(\d+ hours?)?\s?(\d+ mins?)?\s?([\d.]+ secs?)?', queueDl['nextattempt'])
                if match:
                    seconds = 0
                    if match.group(1): seconds += int(re.sub(r'[^\d.]', '', match.group(1))) * 3600 * 24
                    if match.group(2): seconds += int(re.sub(r'[^\d.]', '', match.group(2))) * 3600
                    if match.group(3): seconds += int(re.sub(r'[^\d.]', '', match.group(3))) * 60
                    if match.group(4): seconds += round(float(re.sub(r'[^\d.]', '', match.group(4))) * 1)
                    if seconds >= restartLimit:
                        if not testMode:
                            sendCmd(tn, 'pause', queueDl['slot'])
                            time.sleep(1)
                            sendCmd(tn, 'unpause', queueDl['slot'])
                        restarted.append([queueDl['slot'], queueDl['nextattempt']])
                else: raise Exception(f'Error with {cmd}, parsing queue nextattempt:{queueDl["nextattempt"]}')
        return restarted
    elif par and (cmd == 'pause' or cmd == 'unpause'):
        tEnd = [prompt]
        cmdStr = f'{cmd} {par}'
        tn.write(f'{cmdStr}\n'.encode('utf-8'))
        res = readResult(tn.expect(tEnd, tReadTimeout))
        if not res[0]: raise Exception(f'Error with {cmd}, {res[1]}')
        return res[1]
    else: return False
def check(host):
    st = time.time()
    pr(f'checking {host}', 1, True)
    try:
        tn = tnCreate(host)
        sendCmd(tn, 'auth')
        restarted = sendCmd(tn, 'get-info-and-restart')
        if len(restarted):
            pr(f'{host}: restarted {len(restarted)} slot{"s" if len(restarted) > 1 else ""}: ' + ', '.join(map(lambda item: '' + (' with '.join(item)), restarted)), 1, False, True)
        sendCmd(tn, 'exit')
        ed = time.time()
        # make each host check take at least one second
        time.sleep(max(0, 1 - (ed - st)))
    except Exception as err:
        pr(f'{host} error: {err}', 1, False, True)
checkKeep()

danielocdh commented Apr 19, 2020

I was still testing the Linux issues, but it seems your script might solve that. Just one thing: are you sure the Linux client responds with a prompt on exit? I remember the 7.5.1 client not sending any response, so I just used this for the exit command:

    elif cmd == 'exit':
        cmdStr = f'{cmd}';
        tn.write(f'{cmdStr}\n'.encode('utf-8'))
        tn.close();
        return True

I also made a small fix, changing the false/true/null regexps to:

        data = re.sub('(:\s*)False', r'\1false', data)
        data = re.sub('(:\s*)True', r'\1true', data)
        data = re.sub('(:\s*)None', r'\1null', data)

I also added the shebang line for python3 at the start of file: #!/usr/bin/python3
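
For reference, here is a minimal, self-contained sketch of the PyON-to-JSON step those three substitutions belong to. The sample message and the function name `pyon_to_data` are made up for illustration, and the approach assumes the payload differs from JSON only in its `True`/`False`/`None` literals.

```python
import json
import re

def pyon_to_data(message):
    """Extract the payload of a 'PyON 1 <name> ... ---' message and parse it as JSON."""
    match = re.search(r'PyON\s+(\d+)\s+([-_a-zA-Z\d]+)\n(.*)\n---\n', message, re.DOTALL)
    if not match:
        return None
    payload = match.group(3)
    # Rewrite Python literals only where they directly follow a colon (a dict value)
    payload = re.sub(r'(:\s*)False', r'\1false', payload)
    payload = re.sub(r'(:\s*)True', r'\1true', payload)
    payload = re.sub(r'(:\s*)None', r'\1null', payload)
    return json.loads(payload)

# Made-up sample message, not captured from a real client
sample = 'PyON 1 slots\n[{"id": "00", "idle": False, "reason": None, "description": "True story"}]\n---\n'
slots = pyon_to_data(sample)
print(slots)
```

Anchoring on the colon leaves a value such as "True story" alone, but a string that itself contains `: True` would still be rewritten, so this is a pragmatic conversion rather than a real PyON parser.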

Owner Author

jhutchings commented Apr 19, 2020

> I was still testing the Linux issues, but it seems your script might solve that. Just one thing: are you sure the Linux client responds with a prompt on exit? I remember the 7.5.1 client not sending any response, so I just used this for the exit command:
>
>     elif cmd == 'exit':
>         cmdStr = f'{cmd}';
>         tn.write(f'{cmdStr}\n'.encode('utf-8'))
>         tn.close();
>         return True

It seems to, on the command prompt. In reviewing some of the readB and read information there was a response, but I think the difference is in the ordering of the information: Linux doesn't send \n>\s\n, I think it sends \n>, which gets caught by the prompt regex change. Originally I just had that elif catching an EOFError and eating it until I caught that variance. That being said, it probably makes more sense to return True rather than the message sent back, since it's kinda garbage :D
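
To illustrate the difference, here is a small sketch showing the relaxed prompt pattern matching both prompt shapes. The two byte strings are assumptions reconstructed from the description above, not captured client output, and `strip_prompt` is a hypothetical helper.

```python
import re

# Same pattern the script uses for the prompt, compiled for bytes
prompt = re.compile(rb'\n*>\s*')

def strip_prompt(raw):
    """Return everything before the first prompt, or the input unchanged if none is found."""
    match = prompt.search(raw)
    return raw[:match.start()] if match else raw

windows_style = b'OK\n> \n'  # assumed shape: newline, '>', space, newline
linux_style = b'OK\n>'       # assumed shape: newline, '>' and nothing after it

for raw in (windows_style, linux_style):
    print(raw, '->', strip_prompt(raw))
```

Because both the leading newlines and the trailing whitespace are optional in the pattern, the same expression covers either variant.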

> I also made a small fix, changing the false/true/null regexps to:
>
>     data = re.sub('(:\s*)False', r'\1false', data)
>     data = re.sub('(:\s*)True', r'\1true', data)
>     data = re.sub('(:\s*)None', r'\1null', data)
>
> I also added the shebang line for python3 at the start of file: #!/usr/bin/python3

I'll add the shebang and the sub changes in. Does this gist work for you, or should we set up something easier to collaborate on?
@bafoah has some interesting stuff going too, dunno if it makes sense to merge everything together or keep them as separate projects. I don't know @bafoah's ultimate goals, as theirs seems to have a lot more logic behind it and gathers other metrics. One thing that might simplify all of our lives is actually just using FAHControl's Connection class and layering the retry logic on top of it, instead of trying to stay on top of control API differences, since their tool uses the API the same way we do.

Also @bafoah, they have an example of the safe eval that I was talking about: they just eval(data, {}, {}) since they need neither global nor local scoped functions/variables for converting PyON to Python objects.

TBH I'm personally interested in working with @tamaracha as well, as they had the same idea as me to do a native NodeJS version 😃. Originally I was going to write my own grammar, as they were using antlr4 and I wanted to stay native JavaScript, but they changed to nearley w/ moo, which was literally what I was researching, I believe, at the same time they were writing their update and pushing to their repo hahaha.


danielocdh commented Apr 19, 2020

I think for this file this gist should be enough.
Originally I just wanted something simple to restart the slots automatically, and I think that many people might just want that. If there are people who want something else, that could be made into another project.

About eval: I honestly don't know much about Python, but I just tested a simple eval('open("file.txt", "w")', {}, {}) and, as expected, it created/cleared the file. It was good I decided not to use eval; having an exploit in something as simple as this script would be really bad.
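
As a point of comparison, Python's standard library has `ast.literal_eval`, which only accepts literals (strings, numbers, lists, dicts, `True`/`False`/`None`) and refuses function calls. This is a sketch of that alternative, not what the script or FAHControl does, and whether every PyON payload is a valid Python literal is an assumption here. `parse_literal` is a hypothetical wrapper name.

```python
import ast

def parse_literal(text):
    """Parse a Python literal without executing any code."""
    return ast.literal_eval(text)

# Plain data parses, including the Python-style True/False/None
data = parse_literal('[{"id": "00", "idle": False, "reason": None}]')
print(data)

# A function call is rejected instead of being run, so no file is touched
try:
    parse_literal('open("file.txt", "w")')
    rejected = False
except ValueError:
    rejected = True
print('call rejected:', rejected)
```

With this approach the true/false/null substitutions would not be needed, since the payload is read as Python literals rather than as JSON.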

Owner Author

jhutchings commented Apr 19, 2020

Yeah, I figured the babysitter was good as something just simple: did its job and got out. No need to be overly fancy, so I can agree with your sentiment that if people want something else there can be other projects for that!

Yeah, eval is a scary beast. As in most languages, executing code sent to you by a third party or a user is generally a worrisome prospect. I am surprised that there don't seem to be many PyON libraries to serialize/deserialize (at least that I'm aware of) if it's a "standard" that someone was working on. I really hope they just switch to JSON, as there are JSON libs like the one you are using and it's a standard for many other systems; failing that, protobuf or something would be nice too. PyON just seems like an obtuse choice right now haha. The biggest challenge I have with their implementation of PyON is that they aren't consistent in using it...

Funnily enough, on the eval side, the FAHControl API technically exposes a vulnerability: the socket protocol is not secured, so if something MITMs that stream it could send commands for FAHControl to eval with no global/local scopes, like your file example (https://github.com/FoldingAtHome/fah-control/blob/master/fah/Connection.py#L201)


danielocdh commented Apr 19, 2020

Oh wow, that's... wow. Gonna have to think what to do about it.

Anyway... the authentication was not working with 1.4. I made some changes and tested with Windows (7.5.1 and 7.6.9) and Ubuntu (7.5.1 and 7.6.9) fahclients, and it seems to be working now.
I changed prompt to prompt = '\n*>\s*'.encode('utf-8') and replaced the #expected result block inside readResult with:

    #expected result
    if expectedResult:
        result = re.sub('\s+>$', '', read.strip())
        if (result != expectedResult):
            return [False, f'{readB}']

I didn't have any "waiting on WS assignment" slots on the 4 systems, so I tested pause/unpause semi-manually.
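
When no slot is actually waiting on a WS assignment, the part of the restart decision that can still be exercised offline is the conversion of the queue's nextattempt text into seconds. Below is a sketch of that step using the same pattern as the script; the function name `next_attempt_seconds` and the sample strings are made up for illustration, in the shape the pattern expects.

```python
import re

# Same pattern the script applies to the queue's 'nextattempt' field
PATTERN = re.compile(r'\s?(\d+ days?)?\s?(\d+ hours?)?\s?(\d+ mins?)?\s?([\d.]+ secs?)?')

def digits(part):
    """Strip everything except digits and dots, e.g. '14 mins' -> '14'."""
    return re.sub(r'[^\d.]', '', part)

def next_attempt_seconds(text):
    """Convert text like '1 hours 5 mins' into a whole number of seconds."""
    days, hours, mins, secs = PATTERN.match(text).groups()
    total = 0
    if days: total += int(digits(days)) * 3600 * 24
    if hours: total += int(digits(hours)) * 3600
    if mins: total += int(digits(mins)) * 60
    if secs: total += round(float(digits(secs)))
    return total

restart_limit = 10 * 60  # mirrors the script's default restartLimit
for sample in ('14 mins 38 secs', '2 hours 5 mins', '12.2 secs'):
    seconds = next_attempt_seconds(sample)
    print(sample, '->', seconds, 'restart' if seconds >= restart_limit else 'leave alone')
```

Every group in the pattern is optional, so an unexpected string simply yields 0 seconds rather than an error.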


tamaracha commented Apr 19, 2020

Hi, I was not aware of this gist until I was mentioned here. Indeed, both decisions (PyON and telnet) look really weird from today's technical point of view. ;-) I have no experience in Python, but I thought that YAML was the Python way of serialization, if JSON was not sufficient. Both would be fine in my opinion. The telnet interface serves as a frontend for humans and as an API for machines at the same time, which leads to mixed concerns and makes it difficult for both kinds of users.

I didn't find anything about PyON online, which is why I wrote my fah-pyon package. You are welcome to try it out, if you find it useful. It's not on npm, but it can be installed from GitHub releases. I may need to update the readme since I migrated to nearley.

I also have a fah-client package which contains helper stuff for command generation and response parsing. This is more related to the telnet topic, but it uses fah-pyon.

In fact, at the beginning of this journey, I wanted to create a more screenreader-friendly web frontend. The FahControl GUI is completely inaccessible and the WebControl could be nicer. I am trying to create an electron app which uses my fah libraries. It seems to work well, but I haven't published a project yet, because I wanted to explore my architecture decisions a bit further.

@jhutchings, if you're still interested in working with me (I am only one person), we should discuss the fields and tasks that are still open.

Owner Author

jhutchings commented Apr 19, 2020

@danielocdh updated to 1.5 and then 1.6 to clean up reference to tEnd that was no longer needed after cleaning the expectedResult check

@tamaracha I wanted to bring visibility to your projects from these two, as I've been checking out your updates and am excited to have another person working on this. Do you have a board that you're working from? I'll try to be as active as possible to help update these libs. I like the idea of an electron app; I haven't done any work in electron yet, but I can do some research. I was hoping to write a client on top of either an initial grammar & lib or, now that you've advanced so far, to work with you and what you have developed. After that, maybe host a service that the client communicates with and build a UI on top of that. Potentially, long term, pull together data from other users of said client, as it's more real-time and direct from the client, and then cross-relate it to the results from F@H Stats to shore up the values later or show the delta between client predictions and WS/CS actual values. Extreme OC's F@H stats are great, but the piece they lack is that they are fully dependent on the F@H team's updates, which are done only periodically and in bulk.

Granted, I have other after-work projects that I should be working on, so it might be a slow burn. The more help I can provide to any of the aforementioned project owners, alongside anything I might try to play with myself, the better 😃

Owner Author

jhutchings commented Apr 20, 2020

Added 1.7 with getting started info based on what marknd59 wrote in the forum


bafoah commented Apr 21, 2020

Hi, I just started writing my own code because I didn't know this was already written. My goal is just to get to know Python better (I also find this project could be interesting and useful).

Besides the WU short-out (and this pause-unpause solution), I think it would be better if this babysitter could give us a notification if there is something wrong (like FAHClient freezing, with no log updates for a looooong time). For now I am just thinking of a Telegram bot.

I also fold on several clients, so I think it would be better if I could "simplify" the process. For example, after installing FAHClient, instead of setting up the machine by hand, I want to just run this babysitter, and voilà!! Yes, I know it's only a one-time setup (and I'm just very lazy), but just imagine if you fold in the cloud, and if you're abusing Google free credit (yes, I do it, so I know the pain...)

I use eval just because I follow this reference https://github.com/FoldingAtHome/fah-control/wiki/3rd-party-FAHClient-API but since @jhutchings mentioned it, I plan to make my own "parsing" method (basically it just converts a string to a Python variable), maybe....

My babysitter future plan

  1. Have a config file, so babysitter can run on several machines with different configurations (host list etc.)
  2. Manage FAHClient settings (user, passkey, team, next-unit-percentage, client-type etc.)
  3. Summarize all machines and folding slots into a simple matrix (e.g. PPD, GPU/CPU count)
  4. Display other information from external sites (like the project cause, because I think many people will be excited if they know they fold for Covid-19, team info, total points, etc.)

Maybe not all of this fits well with Python (because it involves some kind of "GUI"). Maybe in the future I can create an "http-server" and just display everything in a web browser... maybe, I just don't know...

Maybe I'm just trying to make folding a little bit fun... for me and everyone else... Because, you know, it's boring, and it eats my electricity like a monster...

Owner Author

jhutchings commented Apr 21, 2020

Added 1.8 with an optional check for script updates 😃
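
A rough sketch of how that check works: the script counts the entries returned by the gist's commits endpoint and compares the count with a number stored in the script. The names `update_available` and `fetch_commits_json` are hypothetical, and the sample response is made up so the example runs offline. One caveat, stated as an assumption: GitHub's REST API returns list endpoints in pages (30 items by default, as far as I know), so a plain count would stop growing once the gist has more commits than one page holds.

```python
import json
import urllib.request

COMMITS_URL = 'https://api.github.com/gists/1f3ac2f27790506b5e9bd0c1ec356d49/commits'

def update_available(commits_json, known_commits):
    """Return True when the commit list holds more entries than the script knows about."""
    commits = json.loads(commits_json)
    return len(commits) > known_commits

def fetch_commits_json(url=COMMITS_URL, timeout=10):
    """Download the raw commit list (not called below, so the example needs no network)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode('utf-8')

# Offline demonstration with a made-up three-commit response
fake_response = json.dumps([{'version': 'c'}, {'version': 'b'}, {'version': 'a'}])
print(update_available(fake_response, 2))
print(update_available(fake_response, 3))
```

Keeping the comparison separate from the download makes the decision testable without touching the network.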

Owner Author

jhutchings commented Apr 21, 2020

@bafoah I completely agree with just writing something; it's fun to do, right? I was about to start writing something for the pause issue, then found this and modified it. Then I was going to write some node stuff and found the node-fah-xyz stuff, so I figured I might as well help that out.

The Telegram stuff is kinda cool, I haven't used that in a while (one of my servers used to Telegram me if there were issues with downloads). It can likely be useful for some people too! I kinda like the idea of having the default values for your setup in babysitter as well, since it's not too hard and it makes sure people are configured correctly. Heck, the other day I found out my Azure client had lost a config entry and I was folding for Default on Anonymous for I don't know how long.

But honestly keep on building! It's fun right? Honestly whenever something I'm playing with has an API I have to immediately look at what data is available via that API to imagine new things I could build from it...
I have similar ideas to you for that UI I was talking about above 👍

As far as the eval stuff goes, the comments above are purely observational and nothing against the approach. Heck, the Folding@Home people are using it for their implementation of FAHControl. There are inherent risks with it, but if you look at the possibility of attack it's fairly low, right?
