Arkatufus/report.md

## report.md

      
### Hunches


- BatchingSqlJournal expects a Terminated message in order to dispose of itself properly, but it never receives one.
  - Related to akkadotnet/Akka.Persistence.SqlServer#114 (comment)
  - "the number of recovering actors are more that 3500!" (strong possible clue)
  - Possible fix: add Context.Watch(Self); to the constructor.
  - Context.Watch(Self) is called in AddTagSubscriber and AddAllSubscriber; this needs further investigation.


- The combination of BatchingSqlJournal and sharding supervision is causing the system to recreate failing actors far too frequently, spamming the system with "CircuitBreaker is failing fast" debug messages.
  - Have not checked the sharding code yet; this is still a hypothesis.


### Tested hypotheses


- CircuitBreaker.AttemptReset() never gets called.
  - Wrote a spec using pure PersistentTestkit that takes a failing SnapshotStore database through every step of a database failure.
  - Result: CircuitBreaker is working properly.
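
For reference, the behaviour that spec was checking can be modelled roughly as below. This is a simplified, language-neutral sketch in Python, not the Akka.NET CircuitBreaker code: the class, its method names and the injectable clock are all made up for illustration, and the real breaker schedules its reset attempt on a timer rather than checking elapsed time on the next call. It only shows why "failing fast" should stop once a reset attempt lets a trial call through.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected without being attempted (fail fast)."""


class CircuitBreaker:
    """Toy three-state breaker: closed -> open -> half-open -> closed."""

    def __init__(self, max_failures, reset_timeout, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so tests can control time
        self.state = "closed"
        self.failures = 0
        self.opened_at = None

    def attempt_reset(self):
        # Hypothetical stand-in for the reset step: let one trial call through.
        self.state = "half-open"

    def call(self, fn):
        if self.state == "open":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.attempt_reset()
            else:
                # Rejected calls are not counted as new failures.
                raise CircuitOpenError("failing fast")
        try:
            result = fn()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        self.failures = 0
        self.state = "closed"

    def _on_failure(self):
        self.failures += 1
        # A failed trial call in half-open reopens the breaker immediately.
        if self.state == "half-open" or self.failures >= self.max_failures:
            self.state = "open"
            self.opened_at = self.clock()
```

In this model, if attempt_reset() were never reached, the breaker would stay open and every call would raise CircuitOpenError indefinitely, which is the symptom the hypothesis was trying to explain.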


- Something in Akka.Persistence.SqlServer.BatchingSqlServerJournal is faulty.
  - Made a "chaos monkey" test console app to manually drop and restart different parts of a SqlServer-backed persistent actor system at run time.
  - Wasted too much time making sure that Docker.DotNet reports the proper Docker container and networking statuses (the monitoring part is glitchy as hell and prone to deadlocking).
  - Tried various failure scenarios to see how the persistent actor behaves.
  - Result: the actor seemed to work properly; found no combination of scenario conditions that causes a failure.


- Something in the BatchingSqlJournal base class isn't working properly.
  - BatchingSqlJournal does not inherit from AsyncWriteJournal, so it does not work with PersistentTestkit.
  - Tried to create a PersistentTestkit look-alike that works with BatchingSqlJournal, so that failure scenarios can be tested against the cheaper Sqlite database.
  - Work in progress; still not working as intended.