* added bitwise operation based sqrt16
- replacement for fastled, it is about 10% slower for numbers smaller 128 but faster for larger numbers. speed difference is irrelevant to WLED but it saves some flash.
* updated to 32bit, improved for typical WLED use
- making it 32bits allows for larger numbers
- added another initial condition check for medium sized numbers
- increased the "small number" optimization to larger numbers: the function is currently only used to calculate sqrt(x^2+y^2) which even for small segments is larger than the initially used 64, so optimizing for 1024 makes more sense, although the value is arbitrarily chosen
-> distortion waves 3x speedup
-> hiphotic 2x speedup
-> waving cell 1.5x speedup
* replace sin8_t by lookup-table with pre-computed values
* moved integer sin and cos to fcn_declare.h (inlined by the compiler)
* moved gamma32 to fcn_declare.h (inlined by the compiler)
* a few other small tweaks