Async cutter notes

Assumptions

It is unsafe to perform any kind of r2 operation in parallel. This means that all access to the r2 API must be controlled by a lock (this is already being done). Any r2 API call can block for an unknown amount of time, so no r2 API call may be made in the GUI thread.
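
A minimal sketch of that rule, assuming a hypothetical CoreAccess wrapper (the class and method names are illustrative, not Cutter's actual API): every r2 call goes through a helper that holds the core mutex for the duration of the call, and that helper is never used from the GUI thread.

```cpp
#include <QMutex>
#include <QMutexLocker>

struct RCore; // opaque r2 core handle

class CoreAccess
{
public:
    explicit CoreAccess(RCore *core) : core(core) {}

    // Run a callable with exclusive access to the core. Must never be called
    // from the GUI thread, since both the lock and the callable itself can
    // block for an unknown amount of time.
    template<typename Func>
    auto locked(Func func)
    {
        QMutexLocker locker(&mutex);
        return func(core);
    }

private:
    RCore *core;
    QMutex mutex;
};
```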

Proposed architecture version 2

  • run all interaction with r2 through r2 tasks, except the task launching itself and some setup code
  • required r2 task changes
    • running a C function as a task
    • thread pool
  • create easy-to-use helpers in Cutter for common use cases
    • run(lambda)->Future - a function which accepts a lambda and returns a QFuture or similar interface object that can be used for controlling the task and obtaining the result; it should support Qt signal binding for state changes.
    • run(cmd)->Future
    • a version which allows easily replacing the queued query, somewhat similar to ReplacingRefreshDeferrer
    • Three ways of handling task results
      • as a QFuture or future-like object. Useful when full control is necessary and can be used to build the other variations. It should be possible to connect Qt signals to it (see the sketch after this list).
      • Passing a second lambda which receives the task result as an argument and gets executed in the GUI thread. Requires less boilerplate for connecting signals than the previous variation and discourages accessing the GUI from within the task function, which would be wrong. Useful for simple cases like "run this r2 command and set the result as the value of this GUI field".
      • A global signal which can be connected independently of invoking the task. In this approach the widget can immediately forget about the task, and multiple widgets can benefit from the updated information, not only the one requesting it.
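
A rough sketch of how a widget could consume the Future-style variant, assuming a hypothetical Core()->runCommandAsync(cmd) helper that returns a QFuture<QString> backed by an r2 task (the helper, the widget, and setStringsJson are placeholders, not existing Cutter API). Only the Qt side is shown; the task plumbing behind the future is assumed.

```cpp
#include <QFuture>
#include <QFutureWatcher>

void StringsWidget::refreshStrings()
{
    // Assumed API: schedules "izzj" as an r2 task and returns immediately.
    QFuture<QString> future = Core()->runCommandAsync("izzj");

    auto *watcher = new QFutureWatcher<QString>(this);
    // finished() is delivered in the GUI thread, so touching widgets here is safe.
    connect(watcher, &QFutureWatcher<QString>::finished, this, [this, watcher]() {
        setStringsJson(watcher->result()); // hypothetical helper updating the view
        watcher->deleteLater();
    });
    watcher->setFuture(future);
}
```

The lambda-callback and global-signal variations would hide this boilerplate behind the wrapper.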

Interaction with r_event_hook

The architecture proposal above mentions the "global signals" approach for handling results. r_event_hook callbacks can be considered one more way of triggering those. Assuming widgets have been written so that they can respond to signals about new data at any time, they don't need any additional code for handling this. As r_event_hook gets improved over time to report changes more reliably, the amount of manual refreshes can be reduced. It isn't strictly necessary, but if skipping of outdated queries is implemented well, there won't be too much duplicate work.
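
A sketch of bridging an r_event callback into such a global signal. CoreSignals and the chosen event type are illustrative, and the exact REvent callback signature and registration call are assumptions to verify against the r2 headers.

```cpp
#include <QObject>

class CoreSignals : public QObject
{
    Q_OBJECT
signals:
    void metaChanged();
    void functionsChanged();
};

// Assumed r_event callback shape: (event, type, user pointer, event payload).
static void onR2Event(REvent *ev, int type, void *user, void *data)
{
    Q_UNUSED(ev)
    Q_UNUSED(data)
    auto *hub = static_cast<CoreSignals *>(user);
    // Emitting from the task thread is fine: widgets connected to the signal
    // receive it as a queued call in the GUI thread.
    if (type == R_EVENT_META_SET) { // illustrative event type
        emit hub->metaChanged();
    }
}

// Registration, done once during core setup (assumed call):
// r_event_hook(core->ev, R_EVENT_ALL, onR2Event, coreSignals);
```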

Responsiveness and feedback

  • Busy/non-empty queue indicator - the spinning thing we already have; could be expanded to show more information like the task name
  • Need to prevent situations where the user can queue the same action multiple times
  • If the widget is still loading data, there should be a visible indicator
    • "Loading" placeholder or disabled state for an empty widget
    • some kind of indicator for a partially filled list which is loading more items while the existing ones can already be interacted with (search-results-like widgets)
    • Obvious placeholder values for tables where data is requested from r2 on demand
  • When the user does something, the UI should react immediately
    • When possible, open the dialog or context menu first and fill in the data afterwards. Don't wait silently while the data loads and only then open the dialog.
    • For actions that are expected to be instantaneous most of the time (like set base, set as code, delete item from list) but do not trigger opening of a dialog or context menu, temporarily disable the widget or open a popup with a cancel button to indicate that the action is in progress and prevent the user from triggering it repeatedly.
    • Actions that already have a blocking popup for entering arguments, like the rename dialog, can keep the dialog open with inputs disabled while the action is being performed and close it only after it has finished. This also allows cancelling the action while it is still queued.
    • Buttons and menu items can be disabled after clicking. This isn't ideal if the action gets stuck, but it is better than seemingly doing nothing while the user clicks a bunch of times.
  • For fast actions when the queue is empty, showing the popup or placeholder for only a single frame can make the UI feel less responsive. This could be improved with a fast path: when calling a short high-priority command, try locking the r2 access object; if locking succeeds, do all the work in the same thread immediately and trigger the result signal, otherwise queue the request as a task. This logic should be implemented within the r2 access wrapper layer, invisible to most code within widgets (see the sketch below).
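
A sketch of that fast path, assuming a hypothetical R2AccessWrapper whose queued path emits the same result signal (all names here are illustrative):

```cpp
void R2AccessWrapper::runHighPriority(const QString &cmd)
{
    if (coreMutex.tryLock()) {
        // Nobody is using the core: run the command synchronously so the
        // result is available in the same frame.
        QString result = executeWithCoreLocked(cmd); // assumed helper
        coreMutex.unlock();
        emit commandFinished(cmd, result);           // same signal as the queued path
    } else {
        // The core is busy: fall back to the normal task queue.
        enqueueCommandTask(cmd);                     // assumed helper
    }
}
```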

On demand table models

Many parts of Cutter currently use the Qt model mechanism in a very simplistic way - query all the items from r2 into a list and have the model return data from that list. The Qt model interface and the views are written to support smart models which fetch only the required data; examples of this are QFileSystemModel and QSqlTableModel. The latter even allows sorting by column using the functionality built into the database.

Even with the async architecture proposed above, making large queries isn't great. Querying only the currently visible data would avoid long blocking actions and reduce the overall amount of data exchanged between r2 and Cutter. There are two approaches to this: full and partial.

A fully on-demand model requires an std::vector-like interface from r2: querying the number of items and querying an item by index. In some cases this can be obtained by accessing the r2 structure directly using the C API. It could also be made an optional feature of the r2 table API. The code providing data to the table API needs to be aware of this, as generating the full table and then dropping items within the table API would make the performance worse than a non-on-demand model which requests everything once.
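
A minimal sketch of a fully on-demand model, assuming hypothetical helpers that read the r2 structures under the core lock and return the count and a single item by index (flags are used only as an example):

```cpp
#include <QAbstractTableModel>

struct FlagItem
{
    QString name;
    quint64 offset;
};

class OnDemandFlagsModel : public QAbstractTableModel
{
public:
    int rowCount(const QModelIndex &parent = QModelIndex()) const override
    {
        return parent.isValid() ? 0 : flagCountLocked(); // short locked count query
    }

    int columnCount(const QModelIndex &parent = QModelIndex()) const override
    {
        return parent.isValid() ? 0 : 2;
    }

    QVariant data(const QModelIndex &index, int role) const override
    {
        if (role != Qt::DisplayRole)
            return QVariant();
        // Only the rows actually shown by the view are ever fetched.
        FlagItem item = flagAtLocked(index.row()); // short locked lookup by index
        return index.column() == 0 ? QVariant(item.name) : QVariant(item.offset);
    }

private:
    int flagCountLocked() const;          // assumed: cheap count from the r2 structure
    FlagItem flagAtLocked(int row) const; // assumed: cheap lookup by index
};
```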

For operations like search, where the count and item-by-index lookup can't be obtained without performing the full operation, the on-demand approach can be applied partially. When performing the search query, return only a minimal amount of information describing each item, like offset and length. The rest of the information can be queried later on demand using the offset. The same approach can also be applied if items are stored in a dictionary keyed by offset, which allows efficient queries by offset but not by index. It is possible to implement a binary search tree which supports querying both by key and by index, but such structures are more complicated and take more space, so they should only be used if other r2 operations require them.
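
A sketch of the partial approach: the search task returns only (offset, length) per hit, and the remaining columns are filled in lazily, keyed by offset (all names are illustrative):

```cpp
#include <QHash>
#include <QString>

struct SearchHit
{
    quint64 offset;
    int length;
};

struct SearchHitDetails
{
    QString disassembly;
    QString sectionName;
};

class SearchHitCache
{
public:
    // Returns cached details, querying r2 by offset on first access only.
    const SearchHitDetails &details(quint64 offset)
    {
        auto it = cache.find(offset);
        if (it == cache.end())
            it = cache.insert(offset, queryDetailsLocked(offset)); // assumed short r2 query
        return it.value();
    }

private:
    SearchHitDetails queryDetailsLocked(quint64 offset); // assumed helper
    QHash<quint64, SearchHitDetails> cache;
};
```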

On-demand querying could be implemented without async r2 operations, but both techniques can also be applied together.

An on-demand table model should be used in combination with a sort/filter proxy model that is aware of it. Depending on how the data is stored and can be queried from r2, when sorting it might be better to fetch and cache either all the data or only the column used for sorting. In the case of an async on-demand model, this needs to be taken into account to avoid sorting by the placeholder values which are returned immediately.

Keeping blocking r2 operations short

Since all r2 API accesses block each other, long operations like searching, performing analysis, or querying a large list will block all other r2 operations. On-demand models solve one of these cases. Other cases could be mitigated by splitting the long operations into smaller chunks as much as possible, so that higher-priority actions triggered by UI events can be scheduled between them. What can be done without large changes in r2? Analysis could be split into stages; from what I understand, the "a"*N commands are aliases for specific analysis commands. Would calling each step from Cutter cause a lot of code duplication, or is there another approach? Can function analysis be done for each function separately? Search operations could be split into memory chunks (see the sketch below). How to deal with results overlapping chunk borders?
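
A sketch of splitting a search into memory chunks so that queued higher-priority tasks can run in between. The chunk size and the scheduling hook are assumptions, and matches crossing chunk borders are left open, as noted above.

```cpp
#include <QByteArray>
#include <QtGlobal>

void searchChunkLocked(quint64 from, quint64 to, const QByteArray &pattern); // assumed helper
void yieldToPendingTasks();                                                  // assumed hook

void runChunkedSearch(quint64 from, quint64 to, const QByteArray &pattern)
{
    const quint64 chunkSize = 1 << 20; // 1 MiB per locked slice; tuning needed
    for (quint64 addr = from; addr < to; addr += chunkSize) {
        quint64 end = qMin(addr + chunkSize, to);
        searchChunkLocked(addr, end, pattern); // holds the core lock only for this slice
        yieldToPendingTasks();                 // let queued UI tasks run before the next slice
    }
}
```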

More concurrency

In situations where processing of the results returned by r2 takes some time, use the concurrency tools provided by Qt, like QtConcurrent::run and QFuture. There shouldn't be too many of these cases, as most of the logic should live within r2. One candidate is processing large graphs, especially when using the Graphviz layout mode.
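
A sketch of offloading such post-processing with QtConcurrent; the graph types and the computeLayout/applyLayout helpers are illustrative. The point is only that this work never touches r2 and can therefore run outside the r2 task queue.

```cpp
#include <QtConcurrent/QtConcurrent>
#include <QFutureWatcher>

struct GraphData { /* nodes and edges already fetched from r2 */ };
struct LayoutResult { /* computed node positions */ };

LayoutResult computeLayout(const GraphData &graph); // assumed: pure CPU work, no r2 access

void GraphWidget::relayoutAsync(const GraphData &graph)
{
    auto *watcher = new QFutureWatcher<LayoutResult>(this);
    connect(watcher, &QFutureWatcher<LayoutResult>::finished, this, [this, watcher]() {
        applyLayout(watcher->result()); // back in the GUI thread
        watcher->deleteLater();
    });
    // Heavy layout work runs on Qt's global thread pool, not in the GUI thread
    // and not in the r2 task queue.
    watcher->setFuture(QtConcurrent::run(computeLayout, graph));
}
```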

Once most of Cutter is converted, it will become apparent that some r2 commands take a long time and block everything else. Those should be made more concurrency-friendly using r_cons_is_breaked() and r_core_task_sleep_begin().
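
A sketch of what that looks like inside a long-running r2 command (r2 is C; the snippet uses only C-compatible constructs). r_cons_break_push/pop, r_cons_is_breaked(), and r_core_task_sleep_begin/end are existing r2 functions, but how the current RCoreTask is obtained and the chunking helpers are assumptions to verify against the r2 sources.

```cpp
static int chunk_count(RCore *core);                // assumed helper
static void process_chunk(RCore *core, int chunk);  // assumed helper
static void wait_without_core(void);                // assumed helper

static void long_running_command(RCore *core, RCoreTask *task)
{
    r_cons_break_push(NULL, NULL);              // make the loop interruptible
    for (int chunk = 0; chunk < chunk_count(core); chunk++) {
        if (r_cons_is_breaked()) {              // cancelled by the user or by Cutter
            break;
        }
        process_chunk(core, chunk);             // work that needs the core

        // While waiting on something that does not need the core (I/O, an
        // external process), let other queued tasks grab the core:
        r_core_task_sleep_begin(task);
        wait_without_core();
        r_core_task_sleep_end(task);
    }
    r_cons_break_pop();
}
```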

Incremental implementation

Implementing this in a single PR doesn't seem feasible. Since all the r2 operations in Cutter already lock the core object, code can be converted to the queue approach suggested above one piece at a time. The initial versions should probably implement it once for each case and make sure the design is solid before starting to convert everything.
