Thibault Schueller Ryp

## gist:2144712
// half->float variants.
// by Fabian "ryg" Giesen.
//
// I hereby place this code in the public domain.
//
// half_to_float_fast: table based
// tables could be done in a more compact fashion (in particular, can store tab2 in low word of tab1!)
// but something of a dead end since not very SIMD-friendly. pretty much abandoned at this point.
//
// half_to_float_fast2: use FP adder hardware to deal with denormals.

## cone-culling-experiments.log
Algorithms used for Cone* preprocess the mesh in some way, then split sequentially into 64-triangle clusters:
ConeBase: optimize mesh for transform cache
ConeSort: split mesh into large planar connected clusters, bin clusters into 6 buckets by cardinal axes, optimize each bucket for transform cache
ConeAcmr: optimize mesh for transform cache, split sequentially into variable length clusters that are relatively planar, sort clusters by avg normal
ConeCash: optimize mesh for transform cache, picking triangles that reduce ACMR but prioritizing those that keep current cluster planar
MaskBase: split sequentially into 64-triangle clusters, store a 64-bit conservative triangle mask for 6 frustums (cube faces)
ManyConeN: split sequentially into 64-triangle clusters, store N (up to 4) cones for each cluster and a cone id per triangle (2 bit)

Note that all Cone* solutions get significantly worse results with 128 or 256 triangle clusters; it doesn't matter much for Mask.
The biggest challenge with Cone* algorithms is t
	// half->float variants.
	// by Fabian "ryg" Giesen.
	//
	// I hereby place this code in the public domain.
	//
	// half_to_float_fast: table based
	// tables could be done in a more compact fashion (in particular, can store tab2 in low word of tab1!)
	// but something of a dead end since not very SIMD-friendly. pretty much abandoned at this point.
	//
	// half_to_float_fast2: use FP adder hardware to deal with denormals.
	Algorithms used for Cone* preprocess the mesh in some way, then split sequentially into 64-triangle clusters:
	ConeBase: optimize mesh for transform cache
	ConeSort: split mesh into large planar connected clusters, bin clusters into 6 buckets by cardinal axes, optimize each bucket for transform cache
	ConeAcmr: optimize mesh for transform cache, split sequentially into variable length clusters that are relatively planar, sort clusters by avg normal
	ConeCash: optimize mesh for transform cache, picking triangles that reduce ACMR but prioritizing those that keep current cluster planar
	MaskBase: split sequentially into 64-triangle clusters, store a 64-bit conservative triangle mask for 6 frustums (cube faces)
	ManyConeN: split sequentially into 64-triangle clusters, store N (up to 4) cones for each cluster and a cone id per triangle (2 bit)

	Note that all Cone* solutions get significantly worse results with 128 or 256 triangle clusters; it doesn't matter much for Mask.
	The biggest challenge with Cone* algorithms is t