The following are created using Ian's ID so that the Extractor he trained can be used.
If not logged in, run the following from the command line.
curl -c da-user-cookies.txt -XPOST -d "username=your-name&password=your-password" "https://api.staging-owl.com/auth/login"
The Extractor can only keep track of one run at a time. If several people follow these instructions at the same time, their runs will interfere with each other; this is expected behaviour, not a bug. The UI prevents this, but curl does not.
The URLs the batch will use must be attached to the Extractor's urlList field. Note: in the example below the final URL is a deliberate error. Replace the attachment if different URLs are required.
curl -vv -b da-user-cookies.txt -H "Content-Type: text/plain" -XPUT "https://store.staging-owl.com/extractor/f1b0eb39-d617-49f1-8571-6183881c9895/_attachment/urlList" -d 'http://doom.import.io/php/playback/playback-simple-1-results.php?query=asdar
http://doom.import.io/php/playback/playback-simple-1-results.php?query=asda2
http://doom.import.io/php/playback/playback-simple-1-results.php?query=asda4
http://doom.import.io/php/playback/playback-simple-1-results.php?query=asdar3
sttp://doom.import.io/php/playback/playback-simple-1-results.php?query=asdar
'
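When the list is kept in a file, it can be checked for accidental scheme typos before it is uploaded. The following is a minimal sketch, not part of the service: invalid_urls is a hypothetical helper, urls.txt is an assumed file name, and the only check made is that each non-empty line starts with http:// or https://.

```shell
#!/bin/sh
# Hypothetical helper: print every non-empty line of the URL list file "$1"
# that does not start with http:// or https://, prefixed by its line number.
# Prints nothing when every line passes the check.
invalid_urls() {
    grep -n -v -E '^(https?://|$)' "$1"
}

# Assumed usage: upload only when the check finds nothing.
# --data-binary @file sends the file as-is, newlines included.
# if [ -z "$(invalid_urls urls.txt)" ]; then
#     curl -b da-user-cookies.txt -H "Content-Type: text/plain" -XPUT \
#         "https://store.staging-owl.com/extractor/f1b0eb39-d617-49f1-8571-6183881c9895/_attachment/urlList" \
#         --data-binary @urls.txt
# fi
```

Run against the example list above, the helper reports line 5, the deliberate sttp:// error.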
If the CrawlRun referenced by extractor.nextCrawlRunId has a non-terminal status, the existing job must be cancelled first. Note: because you are working as Ian, you may be cancelling someone else's (likely Ian's) job.
curl -b da-user-cookies.txt -XPOST 'https://run.staging-owl.com/f1b0eb39-d617-49f1-8571-6183881c9895/cancel'
To start the run:
curl -b da-user-cookies.txt -XPOST 'https://run.staging-owl.com/f1b0eb39-d617-49f1-8571-6183881c9895/start'
When completed, the CrawlRun looks like this:
{
_meta: {
timestamp: 1458835191309,
lastEditorGuid: "7c574c79-0a4e-40af-8a9d-c6234137c230",
ownerGuid: "7c574c79-0a4e-40af-8a9d-c6234137c230",
creatorGuid: "84920b9e-9578-3948-0174-5f15b344d094",
creationTimestamp: 1458835174211
},
guid: "8761e499-b82e-4109-af38-e8c263b80850",
runtimeConfigId: "6d508bc2-10cf-4893-b1c1-4d9f05e682e8",
extractorId: "f1b0eb39-d617-49f1-8571-6183881c9895",
stoppedAt: 1458835175618,
totalUrlCount: 5,
successUrlCount: 4,
failedUrlCount: 1,
state: "FINISHED",
urlListId: "d01de56b-b2e5-4781-83e4-945b6578d7c1",
json: "ab62191f-5027-420a-8c39-6138c3e8e6be",
csv: "f35baf94-62ff-428d-b35f-b55b4f1f611a",
log: "4026cc68-b910-4f97-bbcd-f3205c680c86",
sample: "94f14818-c307-4141-bf52-f73a41655c40"
}
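To check the outcome from a script rather than by eye, the string fields can be pulled out of a saved copy of this response with sed. This is a rough sketch only: crawlrun_field is a hypothetical helper, response.json is an assumed file name, and the pattern assumes one field per line as in the listing above (keys with or without quotes). It handles string values only, so the numeric counts are not covered. If jq is installed it is likely the more robust choice.

```shell
#!/bin/sh
# Hypothetical helper: read a CrawlRun listing on stdin and print the value
# of the first string field named "$1" (for example state, log, csv, json).
# The key must be at the start of the line or follow whitespace, '{', ','
# or a double quote, so "guid" does not match inside "ownerGuid".
crawlrun_field() {
    sed -n -E "s/^(.*[[:space:]{,\"])?$1\"?[[:space:]]*:[[:space:]]*\"([^\"]*)\".*/\2/p" | head -n 1
}

# Assumed usage, with the response saved to response.json:
# state=$(crawlrun_field state < response.json)
# [ "$state" = "FINISHED" ] && echo "log attachment: $(crawlrun_field log < response.json)"
```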
The attachments for each can be retrieved using a command like the following:
curl -b da-user-cookies.txt --remote-name -H "Accept-Encoding: gzip" -XGET 'https://store.staging-owl.com/store/crawlRun/aaeed814-f78f-49cb-95b6-3d1963ee9a36/_attachment/log/7d421728-53b5-4107-ba32-7eb80503ec07'
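The URL in the command above appears to follow the pattern store/crawlRun/<CrawlRun guid>/_attachment/<field name>/<attachment id>, where the attachment id is the value of the json, csv, log or sample field in the CrawlRun. A small sketch under that assumption; attachment_url is a hypothetical helper name, not something the service provides.

```shell
#!/bin/sh
# Hypothetical helper: build an attachment download URL.
#   $1 = CrawlRun guid
#   $2 = attachment field name (json, csv, log or sample)
#   $3 = attachment id (the value of that field in the CrawlRun)
attachment_url() {
    printf 'https://store.staging-owl.com/store/crawlRun/%s/_attachment/%s/%s\n' "$1" "$2" "$3"
}

# Assumed usage: fetch the log of a given run.
# curl -b da-user-cookies.txt --remote-name -H "Accept-Encoding: gzip" \
#     "$(attachment_url aaeed814-f78f-49cb-95b6-3d1963ee9a36 log 7d421728-53b5-4107-ba32-7eb80503ec07)"
```

With --remote-name, curl names the local file after the last path segment, which here is the attachment id. Because the Accept-Encoding header is set by hand rather than with --compressed, curl saves the body exactly as received, so the file may need to be gunzipped if the server compresses it.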