Created
October 13, 2014 23:07
Request #302541 | |
inconsistent backend state ? | |
Andre Merzky | |
Jul 03 06:18 | |
I have trouble understanding the semantics of various Globus Online commands, as the results do not match my expectations. In particular, I would expect that an operation which returns no error has actually *completed* on the backend, i.e. that the respective changes are committed to the storage system. But I get a different impression, and I am not sure whether that is due to aggressive caching, to different, conflicting code paths, or to something else. My best guess is that GO uses different protocols for different operations, and that state is getting out of sync? | |
As an example I include a Globus Online shell session below, which is exemplary of the kind of problem I am also encountering in other contexts (i.e. with other operations). [Some output lines of ls which refer to other people's files have been omitted from the session log, for clarity] | |
--------------------------------------------------------------------------------- | |
$ ls -l gsiftp_gridftp.stampede.tacc.xsede.org:/tmp/ | |
drwxrwxr-x tg803521 G-81625 4096 2014-07-03 10:52 am/ | |
drwxrwxr-x tg803521 G-81625 4096 2014-07-02 22:43 am1/ | |
$ rm -r -f gsiftp_gridftp.stampede.tacc.xsede.org:/tmp/am | |
Task ID: 7f1dd2f9-02a0-11e4-b581-12313940394d | |
Type <CTRL-C> to cancel or bg<ENTER> to background | |
[XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX] 1/1 0.00 mbps | |
$ rm -r -f gsiftp_gridftp.stampede.tacc.xsede.org:/tmp/am1 | |
Task ID: 81c16d8b-02a0-11e4-b581-12313940394d | |
Type <CTRL-C> to cancel or bg<ENTER> to background | |
[XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX] 1/1 0.00 mbps | |
$ ls -l gsiftp_gridftp.stampede.tacc.xsede.org:/tmp/ | |
drwxrwxr-x tg803521 G-81625 4096 2014-07-03 10:52 am/ | |
drwxrwxr-x tg803521 G-81625 4096 2014-07-02 22:43 am1/ | |
$ rm -r -f gsiftp_gridftp.stampede.tacc.xsede.org:/tmp/am/ | |
Task ID: 879bf14f-02a0-11e4-b581-12313940394d | |
Type <CTRL-C> to cancel or bg<ENTER> to background | |
[XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX] 1/1 0.00 mbps | |
$ ls -l gsiftp_gridftp.stampede.tacc.xsede.org:/tmp/ | |
$ mkdir gsiftp_gridftp.stampede.tacc.xsede.org:/tmp/am | |
Error: Path already exists | |
Details: Error (mkdir) | |
Server: andremerzky#gsiftp_gridftp.stampede.tacc.xsede.org | |
(gridftp.stampede.tacc.xsede.org:2811) | |
Message: Path '/tmp/am' already exists | |
$ ls -l gsiftp_gridftp.stampede.tacc.xsede.org:/tmp/ | |
$ mkdir gsiftp_gridftp.stampede.tacc.xsede.org:/tmp/am/ | |
$ ls -l gsiftp_gridftp.stampede.tacc.xsede.org:/tmp/ | |
drwxrwxr-x tg803521 G-81625 4096 2014-07-03 10:52 am/ | |
drwxrwxr-x tg803521 G-81625 4096 2014-07-02 22:43 am1/ | |
--------------------------------------------------------------------------------- | |
Note that some operations seem to fail for the wrong reason, others seem to succeed without doing anything, and the file system entries seem to come and go somewhat randomly. | |
What am I missing? Is this expected behavior? Is this a problem of this specific backend (I did not test other backends thoroughly)? | |
I am trying to use those commands programmatically, and the (apparently) inconsistent state is causing me quite some grief, to be honest... For completeness, some details on the current setup of the shell: | |
--------------------------------------------------------------------------------- | |
$ profile | |
User Name: andremerzky | |
DN: /C=US/O=National Center for Supercomputing Applications/CN=Andre Merzky | |
Email Address: andre--globus@merzky.net | |
Task Notifications: No | |
$ endpoint-list -v | |
Name : andremerzky#gsiftp_gridftp.stampede.tacc.xsede.org | |
Host(s) : gsiftp://gridftp.stampede.tacc.xsede.org:2811 | |
Subject(s) : | |
Target Endpoint : n/a | |
Default Directory : n/a | |
Force Encrypted Transfer: No | |
Disable Verify : No | |
MyProxy Server : n/a | |
MyProxy DN : n/a | |
MyProxy OAuth Server : n/a | |
Credential Status : ACTIVE | |
Credential Expires : 2014-07-12 09:03:41Z | |
Credential Subject : /C=US/O=National Center for Supercomputing Applications/CN=Andre Merzky/CN=1960031471 | |
S3 URL : n/a | |
Owner Activated : No | |
Name : andremerzky#gsisftp_trestles-dm1.sdsc.edu | |
Host(s) : gsiftp://trestles-dm1.sdsc.edu:2811 | |
Subject(s) : | |
Target Endpoint : n/a | |
Default Directory : n/a | |
Force Encrypted Transfer: No | |
Disable Verify : No | |
MyProxy Server : n/a | |
MyProxy DN : n/a | |
MyProxy OAuth Server : n/a | |
Credential Status : ACTIVE | |
Credential Expires : 2014-07-12 09:03:41Z | |
Credential Subject : /C=US/O=National Center for Supercomputing Applications/CN=Andre Merzky/CN=1960031471 | |
S3 URL : n/a | |
Owner Activated : No | |
--------------------------------------------------------------------------------- | |
Thanks, Andre. | |
Comments | |
Globus Team - Diane | |
globus support | |
Hello Andre, | |
Thanks for reaching out to Support! We are researching your questions. An engineer will get back to you when they have more information to provide, or need to ask questions. | |
Thanks & Regards, | |
Diane Collins | |
Globus HelpDesk | |
July 03, 2014 09:53 | |
Globus Team - Stephen | |
globus support | |
Hello Andre, | |
To the best of our ability to tell, after some investigation, everything is working correctly. | |
The issues that you are experiencing arise from the semantics of Globus Transfer operations. | |
As you note below, when a command in the CLI returns, it does not indicate that the requested Transfer operation has been completed. | |
This is not an accident, but part of the CLI design. The return of a CLI operation without error messages indicates that the Transfer task has been submitted. | |
You can check on the state of running Transfers with the CLI's `status` command, which by default lists all of your active Transfer tasks. | |
In one of our CLI tutorial documents, there is a section on "Monitoring" that you may find useful: https://support.globus.org/entries/29642203 | |
I hope that this answers your questions satisfactorily. If you have further issues, or want more information than is provided in the above document, please don't hesitate to contact us again. | |
Thanks, | |
-Stephen | |
July 08, 2014 14:31 | |
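For programmatic use, the output of the `status` command can be turned into a lookup table before its `Status` field is checked. The following is a minimal sketch only, assuming the `Key : value` layout that appears in the `status` output quoted elsewhere in this thread; `parse_status` is a hypothetical helper name, not part of the Globus CLI.

```python
def parse_status(text):
    """Parse 'Key : value' lines (as printed by `status`) into a dict.

    Assumption: every field is on its own line and the first ':' separates
    the key from the value, as in the output quoted in this thread.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that do not contain a colon
            fields[key.strip()] = value.strip()
    return fields


sample = """\
Task ID : 5aaf9c91-06dc-11e4-b589-12313940394d
Request Time: 2014-07-08 20:13:41Z
Command : rm -r -f gsiftp_gridftp.stampede.tacc.xsede.org:/tmp/am
Label : n/a
Status : SUCCEEDED
"""

fields = parse_status(sample)
print(fields["Status"])
```

Splitting on the first colon only keeps values that themselves contain colons (the timestamp, the endpoint path) intact.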
Andre Merzky | |
But alas, I don't think that has anything to do with tasks, really. I don't use '-D' on the rm calls, so the call should (according to the man page) only return success if the task was successfully *completed*. I can also inspect the listed task ID afterwards and see it in 'SUCCEEDED' state, and still the result is not as expected. Worse, I see directories which are unrelated to the last command *appearing* on the next ls: | |
---------------------------------------------------------------------------------- | |
merzky@cameo:~ $ gsissh andremerzky@cli.globusonline.org | |
Welcome to globusonline.org, andremerzky. Type 'help' for help. | |
$ ls -l gsiftp_gridftp.stampede.tacc.xsede.org:/tmp/ | |
drwxrwxr-x tg803521 G-81625 4096 2014-07-03 19:25 am/ | |
-rw------- root root 4294967296 2013-03-11 17:00 swapfile | |
... | |
$ rm -r -f gsiftp_gridftp.stampede.tacc.xsede.org:/tmp/am | |
Task ID: 5aaf9c91-06dc-11e4-b589-12313940394d | |
Type <CTRL-C> to cancel or bg<ENTER> to background | |
[XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX] 1/1 0.00 mbps | |
$ status 5aaf9c91-06dc-11e4-b589-12313940394d | |
Task ID : 5aaf9c91-06dc-11e4-b589-12313940394d | |
Request Time: 2014-07-08 20:13:41Z | |
Command : rm -r -f gsiftp_gridftp.stampede.tacc.xsede.org:/tmp/am | |
Label : n/a | |
Status : SUCCEEDED | |
$ ls -l gsiftp_gridftp.stampede.tacc.xsede.org:/tmp/ | |
drwxrwxr-x tg803521 G-81625 4096 2014-07-03 12:27 am/ | |
drwxrwxr-x tg803521 G-81625 4096 2014-07-02 22:43 am1/ | |
-rw------- root root 4294967296 2013-03-11 17:00 swapfile | |
... | |
----------------------------------------------------------------------------------------- | |
Can I really be sure that those operations end up on the same storage backend? It kind of looks like this, as some files have the exact same size and access time (which would be very surprising otherwise) -- I included an example ('swapfile') in the listing above. | |
If you insist that this is expected behavior, I can accept that -- but that makes it nigh impossible for me to use the CLI programmatically, I'm afraid :/ | |
Thanks :) Andre. | |
July 08, 2014 15:22 | |
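The mismatch described above (a task in 'SUCCEEDED' state while the directory still shows up) can be detected in a script by comparing the listing against the name that was supposed to be removed. This is a rough sketch under stated assumptions: it relies on the seven-column `ls -l` layout seen in the session logs (mode, owner, group, size, date, time, name), and `listing_names` and `entry_present` are hypothetical helper names. Names containing runs of whitespace would not be reproduced exactly.

```python
def listing_names(text):
    """Extract entry names from `ls -l` output as shown in this thread.

    Assumed columns: mode owner group size date time name.
    Lines that do not match (e.g. the '...' elision) are skipped.
    """
    names = set()
    for line in text.splitlines():
        parts = line.split()
        looks_like_entry = (
            len(parts) >= 7
            and len(parts[0]) == 10      # e.g. 'drwxrwxr-x'
            and parts[0][0] in "-dl"     # file, directory, or symlink
        )
        if looks_like_entry:
            # Drop the trailing '/' that marks directories.
            names.add(" ".join(parts[6:]).rstrip("/"))
    return names


def entry_present(listing_text, name):
    """Return True if `name` (with or without trailing '/') is listed."""
    return name.rstrip("/") in listing_names(listing_text)


sample = """\
drwxrwxr-x tg803521 G-81625 4096 2014-07-03 12:27 am/
drwxrwxr-x tg803521 G-81625 4096 2014-07-02 22:43 am1/
-rw------- root root 4294967296 2013-03-11 17:00 swapfile
...
"""

# After an `rm` that reported success, 'am' should no longer be listed.
print(entry_present(sample, "am"))
```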
Globus Team - Stephen | |
globus support | |
Hi Andre, | |
I apologize for misunderstanding your issue. You are correct, the `rm` calls without `-D` are synchronous commands that return upon completion. | |
I'm not aware of any instances of this type of problem appearing on smaller systems, so it seems that this is an issue with GridFTP with a distributed Lustre backend. | |
I've opened up discussion within our team about this issue, and I will get back to you when I know more. | |
Thanks, | |
-Stephen | |
July 09, 2014 14:17 | |
Andre Merzky | |
Hi Stephen, | |
thanks for the follow-up! I didn't bother to test other systems similarly, but if you have the feeling it might be caused by the backend FS type, I'll run a couple of tests on non-Lustre machines. Honestly, I can't imagine this problem being very prevalent; it would have triggered problems all over the place... | |
Thanks again, Andre. | |
July 09, 2014 14:24 | |
Globus Team - Stephen | |
globus support | |
Hi Andre, | |
After discussing this with our team, we have found two possible sources for this issue. | |
The first is that you are using the `-f` option to `rm` in the CLI. This ignores some classes of errors silently, so the files may not actually be deleted by these `rm` commands. To determine if this is interfering, simply run without the `-f` option. | |
The second is a delayed write of file metadata in Lustre: https://jira.hpdd.intel.com/browse/LU-274 | |
The bug reported there is not identical to your issue, but it may be related. Unfortunately, I'm not sure how we can determine whether or not this is the case unless Stampede is running an unaffected version of Lustre. | |
We will continue to look into possible sources of this problem, but it seems likely at this stage that they are related to the configuration of Stampede, not of Globus. | |
Thanks, | |
-Stephen | |
July 14, 2014 10:22 | |
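If delayed metadata visibility really is involved (the thread leaves this unconfirmed), a script could defensively re-check the listing for a bounded time rather than trusting the first `ls` after an `rm`. The sketch below shows only that generic polling pattern; `wait_until` and its parameters are hypothetical, and the `check` callable stands in for whatever code runs `ls` and inspects the result.

```python
import time


def wait_until(check, timeout=30.0, interval=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call `check()` repeatedly until it returns True or `timeout` elapses.

    Returns True if the condition was observed, False on timeout.
    `sleep` and `clock` are injectable so the helper can be tested
    without real delays.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Illustration with a stand-in check that succeeds on the third attempt.
attempts = iter([False, False, True])
observed = wait_until(lambda: next(attempts), timeout=5.0, interval=0.0,
                      sleep=lambda seconds: None)
print(observed)
```

A timeout keeps the script from waiting indefinitely when the entry never disappears, which would point to a failed delete (for example one masked by `-f`) rather than to a delay.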
Globus Team - Stephen | |
globus support | |
Hello Andre, | |
We haven't heard from you on this ticket for a while, and there has been no new information at our end. | |
As a result, we're going to assume that the issue has been resolved to your satisfaction, or that you have resolved it yourself, and close the ticket. | |
If you have any further questions, please don't hesitate to contact us again. | |
Thanks, | |
-Stephen | |
July 24, 2014 14:41 |