Zardoz89/DMA.txt

## DMA.txt
Devices and memory can interact in three ways:
- Passive read
- Copy from RAM to device
- Copy from device to RAM

A passive read works by "watching" the data/address bus while the CPU (or another device) writes to RAM. This lets a device keep its internal memory/buffer in sync with the changes made to a given memory block. It is simple to implement: the device just reads the desired RAM block with VComputer.RAM(). If a device uses passive reads to keep its internal memory/buffer in sync, it needs an initial Copy from RAM to device to set up that memory/buffer.

A Copy from RAM to device copies data into the device's internal memory/buffers. This needs active use of the data and address buses, so only one device can use them at any given moment. This does not mean that only one device can be doing DMA; it means that if more than one device tries to do DMA, each has to share bus time with the others, and the transfer rate of each drops accordingly (it is halved with two devices).

A Copy from device to RAM copies data from the device's internal memory/buffers and writes it to a particular place in RAM. Like Copy from RAM to device, it needs active use of the data and address buses, so only one device can use them at any given moment.

With a 32 bit data bus, any transfer in a single clock cycle is limited to 4 bytes. Since the devices work on a 100 kHz device clock, the max. transfer rate is 4 bytes x 100,000 ticks/s = 400,000 bytes/s (3.2 Mbit/s) -> ~390 KiB/s.
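
The arithmetic above can be checked with a few lines of Python. This is only a sketch: the constant and function names are made up for illustration and are not part of the Virtual Computer code base. It assumes one full bus word is moved on every device clock tick.

```python
# Bus parameters taken from the text above.
BUS_WIDTH_BITS = 32          # data bus width
DEVICE_CLOCK_HZ = 100_000    # 100 kHz device clock


def max_transfer_rate_bytes(bus_width_bits=BUS_WIDTH_BITS, clock_hz=DEVICE_CLOCK_HZ):
    """Peak DMA rate in bytes per second: one full bus word per device clock tick."""
    return (bus_width_bits // 8) * clock_hz


rate = max_transfer_rate_bytes()
print(rate, "bytes/s")            # 400000 bytes/s
print(rate * 8 / 1e6, "Mbit/s")   # 3.2 Mbit/s
print(rate / 1024, "KiB/s")       # 390.625 KiB/s
```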

We have two options to choose from:
- Enforce a single device doing DMA at a time, which can then use the max. transfer rate.
- Allow multiple simultaneous DMAs, with the max. transfer rate divided between the devices doing DMA.
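
To get a feeling for the second option, here is a small sketch of how the rate per device would drop. It assumes the bus time is split equally between the active devices, which is only one possible policy, and `per_device_rate` is a hypothetical helper name.

```python
MAX_RATE_BYTES = 400_000  # 4 bytes per tick at a 100 kHz device clock


def per_device_rate(active_dma_count, max_rate=MAX_RATE_BYTES):
    """Bytes per second each device gets if the bus is shared equally."""
    if active_dma_count < 1:
        raise ValueError("at least one device must be doing DMA")
    return max_rate / active_dma_count


for count in (1, 2, 4):
    print(count, "device(s):", per_device_rate(count), "bytes/s each")
```

With the first option, `active_dma_count` is always 1 and the single device gets the whole bus.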

And two ways to implement it on the Virtual Computer code base:
- Copy 4 or fewer bytes of data on each IDevice Tick(). This could introduce a lot of overhead and be less cache friendly, and if we allow multiple DMA operations, more than 4 of them at once will be more problematic to implement this way (each one would get less than one byte per tick).
- Wait X clock ticks, as many as the transfer would need, and then do a simple block copy. This is more cache friendly and should have less overhead, but could behave unexpectedly if software tries to write data over the RAM block that is being copied to the device (we should advise that software should not do that).
  - If we do this with a single DMA at a time (at max. transfer rate), it could be implemented with a counter that is decreased on every device clock tick and, when it reaches 0, does the copy. The initial counter value would be calculated from how much data needs to be transferred.
  - If we do this with multiple DMAs, we could use multiple counters, each assigned to a device doing DMA, where the amount decreased per tick depends on how many devices are doing DMA.
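
The "wait, then block copy" approach with counters could look roughly like the sketch below. It is not the real implementation: `DmaScheduler`, `start` and `tick` are hypothetical names, the counters hold remaining bytes instead of remaining ticks, and the bus time is assumed to be shared equally between the active transfers. `Fraction` is used so that a share like 4/3 bytes per tick stays exact.

```python
from fractions import Fraction

BYTES_PER_TICK = 4  # 32 bit data bus, one word per device clock tick


class DmaScheduler:
    """Hypothetical helper: delays each block copy by the ticks the transfer would take."""

    def __init__(self):
        self.pending = []  # each entry: [remaining_bytes, copy_callback]

    def start(self, size_bytes, copy_callback):
        """Queue a transfer; copy_callback does the real block copy when it is due."""
        self.pending.append([size_bytes, copy_callback])

    def tick(self):
        """Call once per device clock tick."""
        if not self.pending:
            return
        # Work on a snapshot, so a callback may safely queue a new transfer.
        active, self.pending = self.pending, []
        # Bus time is shared equally between the transfers active on this tick.
        share = Fraction(BYTES_PER_TICK, len(active))
        for entry in active:
            entry[0] -= share
            if entry[0] <= 0:
                entry[1]()                  # counter reached zero: do the block copy
            else:
                self.pending.append(entry)


# Usage: an 8 byte copy from RAM to a device buffer takes two ticks.
ram = bytearray(b"\x01\x02\x03\x04\x05\x06\x07\x08")
device_buffer = bytearray(8)


def copy_block():
    device_buffer[:] = ram[0:8]


dma = DmaScheduler()
dma.start(8, copy_block)
dma.tick()   # 4 bytes' worth of bus time elapsed, nothing copied yet
dma.tick()   # 8 bytes' worth elapsed, the block copy happens here
```

With a single DMA at a time this behaves like one counter at max. transfer rate; with several queued transfers each counter simply goes down more slowly.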

This affects the devices in these ways:
- It allows us to enforce the transfer rates of the floppy device (and hard disk, etc.) between its internal buffer and RAM.
- It forces a delay when a passive read device (graphics card) changes/sets up the address where it reads its data. In this case, the TDA would need a delay big enough to transfer 2400 bytes (~6 ms). I think this last effect is not strictly necessary and that we could omit it, to avoid making the TDA or any other graphics device too complex.
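
The ~6 ms figure for the TDA comes from the same bus numbers. A quick sketch, where `dma_delay_ms` is a hypothetical helper and the transfer is assumed to have the whole bus for itself:

```python
def dma_delay_ms(size_bytes, bytes_per_tick=4, clock_hz=100_000):
    """Milliseconds needed to move size_bytes at the full bus rate."""
    ticks = -(-size_bytes // bytes_per_tick)   # ceiling division: a partial word still costs a tick
    return ticks * 1000 / clock_hz


print(dma_delay_ms(2400))   # 6.0 (600 ticks at 100 kHz)
```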