milesrout/DMA.txt

## DMA.txt
There are three types of interaction between devices and memory:
- Passive read
- Copy from RAM to device
- Copy from device to RAM

Passive reads work by "watching" the data/address bus while the CPU
(or another device) writes to RAM. This lets a device keep its
internal memory/buffer in sync with the changes made to a given
memory block. It is simple to implement: the device reads the desired
RAM block with VComputer.RAM(). A device that uses passive reads to
keep its internal memory/buffer in sync needs an initial Copy from RAM
to device to set up that memory/buffer.

Copy from RAM to device copies data to the device's internal
memory/buffers. This actively uses the data and address buses, and
only a single device can use them at a time. This doesn't mean that
only one device can have a DMA in progress: if more than one device
tries to do DMA, they have to share the bus time with the other
devices doing DMA, and the transfer rate of each one is halved, cut
to a third, etc.

Copy from device to RAM copies data from a device's internal
memory/buffers and writes it to a particular place in RAM. Like Copy
from RAM to device, this requires active use of the data and address
buses, and can only be done by a single device at a time.

With a 32-bit data bus, at most 4 bytes can be transferred per clock
cycle. Given that devices run on a 100 kHz device clock, the max.
transfer rate is 400,000 bytes/s (3.2 Mbit/s), i.e. ~390 KiB/s.
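
The arithmetic above can be checked with a few lines of C++. This is
only a sketch: the constant and function names are hypothetical and
not taken from the Virtual Computer code base, and the even split
between devices assumes the bus time is shared fairly.

```cpp
#include <cstdint>

// Hypothetical names; the values come from the text above.
constexpr std::uint32_t kBusWidthBytes = 4;       // 32-bit data bus
constexpr std::uint32_t kDeviceClockHz = 100000;  // 100 kHz device clock

// Peak rate when a single device owns the bus on every tick.
constexpr std::uint32_t MaxBytesPerSecond() {
    return kBusWidthBytes * kDeviceClockHz;
}

// Rate each device gets when `active_devices` share the bus evenly.
constexpr std::uint32_t BytesPerSecondPerDevice(std::uint32_t active_devices) {
    return active_devices == 0 ? 0 : MaxBytesPerSecond() / active_devices;
}

// 400,000 bytes/s is 390.625 KiB/s, which the text rounds to ~390 KiB/s.
static_assert(MaxBytesPerSecond() == 400000, "4 bytes x 100 kHz");
static_assert(BytesPerSecondPerDevice(2) == 200000, "halved with two devices");
```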

We have two options to choose from:
- Enforce a single device doing DMA at a time, which can then use the
  max. transfer rate.
- Allow multiple simultaneous DMAs, with the transfer rate divided
  between the devices doing DMA.

And how to implement it in the Virtual Computer code base:
- Copy 4 or fewer bytes of data on each IDevice Tick(). This could
  introduce a lot of overhead and be less cache friendly, and if we
  allow multiple DMA operations, more than 4 at once will be more
  problematic to implement this way.
- Wait X clock ticks, as many as the transfer would take, and then do
  a simple block copy. This is more cache friendly and should have
  less overhead, but could behave unexpectedly if software writes
  data to the affected RAM block while it is being copied to the
  device (we should advise that software not do that).
  - With a single DMA at a time (at max. transfer rate), this could
    be implemented with a counter that is decremented on every device
    clock tick and triggers the copy when it reaches 0. The initial
    counter value would be calculated from how much data needs to be
    transferred.
  - With multiple DMAs, we could use multiple counters, one assigned
    to each device doing DMA, where the amount decremented depends on
    how many devices are doing DMA.
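
The "wait, then block copy" approach with counters could be sketched
as follows. Everything here is an assumption for illustration: DmaBus
and its methods are hypothetical and not part of the Virtual Computer
code base. Instead of decrementing every counter by a fraction, this
sketch gives each tick's bus slot to one pending transfer in
round-robin order, which shares the bus time evenly on average and
reduces to the single-counter case when only one transfer is pending.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical sketch of deferred DMA with one counter per transfer.
class DmaBus {
public:
    static constexpr std::size_t kBusWidthBytes = 4;  // 32-bit data bus

    // Registers a pending copy of `size` bytes and returns its id.
    // Both buffers must stay valid until the transfer completes.
    std::size_t Start(const std::uint8_t* src, std::uint8_t* dst,
                      std::size_t size) {
        assert(src != nullptr && dst != nullptr);
        Transfer t;
        t.src = src;
        t.dst = dst;
        t.size = size;
        // Bus slots the real transfer would need, rounded up.
        t.ticks_left = (size + kBusWidthBytes - 1) / kBusWidthBytes;
        transfers_.push_back(t);
        return transfers_.size() - 1;
    }

    bool Done(std::size_t id) const {
        return transfers_.at(id).ticks_left == 0;
    }

    // Call once per device clock tick. The slot of this tick goes to
    // the next pending transfer; the block copy happens when that
    // transfer's counter reaches 0.
    void Tick() {
        const std::size_t n = transfers_.size();
        for (std::size_t i = 0; i < n; ++i) {
            Transfer& t = transfers_[(next_ + i) % n];
            if (t.ticks_left == 0) continue;  // already finished
            if (--t.ticks_left == 0) std::memcpy(t.dst, t.src, t.size);
            next_ = (next_ + i + 1) % n;      // resume after this one
            return;
        }
    }

private:
    struct Transfer {
        const std::uint8_t* src;
        std::uint8_t* dst;
        std::size_t size;
        std::size_t ticks_left;
    };
    std::vector<Transfer> transfers_;
    std::size_t next_ = 0;
};
```

A device would call Start() when software requests a transfer and
poll Done() from its own Tick(). As noted above, data written to the
source block before the counter reaches 0 ends up in the copy, so
software should avoid touching that block during the transfer.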

This affects the devices in the following ways:
- It allows enforcing the transfer rates of the floppy device (and
  hard disk, etc.) between its internal buffer and RAM.
- It forces a delay when a passive read device (graphics card)
  changes/sets up the address to read its data from. In this case,
  TDA would need a delay long enough to transfer 2400 bytes (~6ms). I
  think this last effect is not strictly necessary and that we could
  omit it, to avoid making the TDA or any other graphics device too
  complex.
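
The ~6ms figure follows from the bus numbers given earlier: 2400
bytes at 4 bytes per tick is 600 ticks, and 600 ticks at 100 kHz is
6 ms. A minimal sketch of that calculation, with hypothetical names
that are not part of the Virtual Computer code base:

```cpp
#include <cstdint>

// Hypothetical constants, taken from the numbers in this note.
constexpr std::uint64_t kBusWidthBytes = 4;       // 32-bit data bus
constexpr std::uint64_t kDeviceClockHz = 100000;  // 100 kHz device clock

// Microseconds a transfer of `bytes` bytes occupies the bus at the
// max. transfer rate (tick count rounded up to whole ticks).
constexpr std::uint64_t TransferDelayMicros(std::uint64_t bytes) {
    return ((bytes + kBusWidthBytes - 1) / kBusWidthBytes) * 1000000
           / kDeviceClockHz;
}

static_assert(TransferDelayMicros(2400) == 6000, "2400 bytes take 6 ms");
```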