snickerbockers

## gist:ebab00e7145e7417a086
(c-add-style "custom-C++"
             '("stroustrup"

               (c-indent-comments-syntactically-p 1)
               (c-offsets-alist
                (substatement-open . 0)
                (defun-open . 0)
                (defun-close . 0)
                (defun-block-intro . 4)
                (class-open . 0)

## test.c
#include <stdlib.h>
#include <stdio.h>

/*
 * Options:
 *     PAREN    - use parentheses around the ternary
 *     GREATER  - make the root_node's key greater than the key sent to less
 *                (default is less)
 */

## sa2_chao_log
Last updated: 11:20 PM, 11/12/2017
Name:         Dave
Swim:         26
Fly:          28
Run:          41
Power:        38
Luck:         00
Intelligence: 15
Stamina:      18

## keybase.md

      
Created November 19, 2017 01:20
keybase authentication

Keybase proof

I hereby claim:

I am snickerbockers on github.
I am snickerbockers (https://keybase.io/snickerbockers) on keybase.
I have a public key ASCtACoOrMLdcwaEPscgEjMLxff6WEZexDFQGLLFUJLc1go

To claim this, I am signing this object:

  
## washingtondc_gprof.txt
This is with the native x86_64 jit backend (-x option).
I'm not sure if the generated code counts towards dreamcast_run or not.
It does seem like the code cache is one of my primary bottlenecks, though.

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 20.67      2.77     2.77                             dreamcast_run

## upside_down.diff
diff --git a/src/gfx/opengl/opengl_output.c b/src/gfx/opengl/opengl_output.c
index e00a4df..172596f 100644
--- a/src/gfx/opengl/opengl_output.c
+++ b/src/gfx/opengl/opengl_output.c
@@ -56,17 +56,11 @@ static struct shader fb_shader;
 #define FB_VERT_LEN 5
 #define FB_VERT_COUNT 4
 static GLfloat fb_quad_verts[FB_VERT_LEN * FB_VERT_COUNT] = {
-    /*
-     * it is not a mistake that the texture-coordinates are upside-down

## washingtondc_pmp.txt
jay@sbckrs_desktop ~/programs/washingtondc/build $ bash ~/pmp.sh
    200 epoll_wait,epoll_dispatch,event_base_loop,io_main,start_thread,clone
     31 ??,??,dreamcast_run,main
     18 dreamcast_run,main
     17 ??,??,??,??,??,??,??
     14 ??,??,??,??
     13 sh4_read_mem_32,??,bios,??
     11 ??,bios,??
     10 sh4_read_mem_32,??,??,??,??,??,??,??
     10 memory_map_read_32,sh4_read_mem_32,??,??,??,??,??,??,??

## wash_jit_perf_stats.txt
==== Code Cache perf stats ====
JIT: 828742910 total accesses
JIT: 185777 total tree searches
JIT: 177434 table evictions
JIT: max depth was 13
JIT: max cache size was 4259
JIT: height of root at shutdown is 13
JIT: balance of root at shutdown is 0
JIT: The top 10 most popular code blocks were accessed:
JIT: 	0x8c0b5d8c - 30683 times
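
To put the counters above in proportion, here is a minimal arithmetic sketch. The helper name `percent_of` is hypothetical and is not part of the emulator's source. The interpretation is also an assumption: the output does not say so, but a "tree search" is read here as the slower lookup path and a "table eviction" as an entry being replaced in a faster front-end table.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the emulator's source):
 * returns part as a percentage of whole.
 */
static double percent_of(uint64_t part, uint64_t whole)
{
    assert(whole > 0); /* avoid dividing by zero */
    return 100.0 * (double)part / (double)whole;
}

/* Counters copied from the stats output above. */
static const uint64_t total_accesses = 828742910ULL;
static const uint64_t tree_searches  = 185777ULL;
static const uint64_t table_evicts   = 177434ULL;
```

With these numbers, `percent_of(tree_searches, total_accesses)` comes to roughly 0.022, so under the assumption above only about two accesses in ten thousand reached the tree. `percent_of(table_evicts, tree_searches)` comes to roughly 95.5, meaning nearly every tree search coincided with an eviction.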

## fb_unify.txt
Need to unify texture cache and framebuffer to support render-to-texture
without needlessly copying the framebuffer back to texture memory after every
STARTRENDER command.

Also need to support framebuffer access from the SH4 (meaning situations where
the CPU writes directly to the framebuffer without using the graphics hw at
all, or situations where the CPU reads from something that's already been
rendered).
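
The two requirements above can be sketched as a single surface object that tracks which copy is current and copies only on demand. This is a rough illustration, not the emulator's design: `struct surface`, its fields, and every function name are hypothetical, and plain byte arrays stand in for both emulated texture memory and the GPU-side texture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SURF_BYTES 16 /* tiny hypothetical surface, for illustration only */

/* Hypothetical unified surface: both a render target and a texture source. */
struct surface {
    uint8_t guest_mem[SURF_BYTES]; /* copy visible to the emulated CPU */
    uint8_t host_tex[SURF_BYTES];  /* stand-in for the GPU-side texture */
    bool guest_dirty;              /* CPU wrote guest_mem; host_tex is stale */
    bool host_dirty;               /* a render wrote host_tex; guest_mem is stale */
    unsigned copies;               /* how many copies were actually performed */
};

/* Bring the host copy up to date, but only if the CPU changed something. */
static void surface_sync_to_host(struct surface *s)
{
    if (s->guest_dirty) {
        memcpy(s->host_tex, s->guest_mem, SURF_BYTES);
        s->guest_dirty = false;
        s->copies++;
    }
}

/* Bring the guest copy up to date, but only if a render changed something. */
static void surface_sync_to_guest(struct surface *s)
{
    if (s->host_dirty) {
        memcpy(s->guest_mem, s->host_tex, SURF_BYTES);
        s->host_dirty = false;
        s->copies++;
    }
}

/* Stand-in for a render pass: fills the host copy and marks it newer. */
static void surface_render(struct surface *s, uint8_t fill)
{
    surface_sync_to_host(s); /* pick up pending CPU writes first */
    memset(s->host_tex, fill, SURF_BYTES);
    s->host_dirty = true;
}

/* Render-to-texture: no copy is needed when the newest data is already on the host. */
static const uint8_t *surface_use_as_texture(struct surface *s)
{
    surface_sync_to_host(s);
    return s->host_tex;
}

/* CPU read of something already rendered: copy back only at this point. */
static uint8_t surface_cpu_read(struct surface *s, size_t idx)
{
    assert(idx < SURF_BYTES);
    surface_sync_to_guest(s);
    return s->guest_mem[idx];
}

/* Direct CPU write to the framebuffer, bypassing the graphics hardware. */
static void surface_cpu_write(struct surface *s, size_t idx, uint8_t val)
{
    assert(idx < SURF_BYTES);
    surface_sync_to_guest(s); /* keep rendered bytes that are not overwritten */
    s->guest_mem[idx] = val;
    s->guest_dirty = true;
}
```

In this sketch, rendering and then sampling the result as a texture performs no copy at all. A copy happens only when the CPU reads rendered data, or when CPU-written data is next needed on the host side.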

Previously I've attempted to solve this problem the "easy" way by only copying

## gprof_crazy_taxi_attract
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 10.76      4.22     4.22 52871895     0.00     0.00  on_packet_received
  8.39      7.51     3.29 508938409     0.00     0.00  memory_map_write_32
  6.14      9.92     2.41 422975160     0.00     0.00  pvr2_ta_fifo_poly_write_32
  5.58     12.11     2.19                             sh4_fpu_inst_fmov_indgeninc_fpu
  5.10     14.11     2.00 519802356     0.00     0.00  memory_map_read_float