@dogtopus
Last active April 27, 2024 01:39
Besta RTOS reversing notes

Some random notes about Besta RTOS. These will probably end up on a wiki somewhere after Project Muteki becomes mostly usable.

Windows CE?

NOPE. Not even close.

Then why is there Windows CE stuff all over?

For unknown reasons, instead of using e.g. GCC, it seems that Besta decided to bend the MSVC CE toolchain to build applets for their own not-Windows-CE OS. It's speculated that coredll.dll was included solely for the compiler helpers it provides (namely soft floating point emulation, 64-bit arithmetic and integer division routines). As an unfortunate side effect, this also caused a lot of things to be (seemingly) broken, including C++ exceptions and threading/TLS, because the Windows CE-specific helpers obviously do not work on a completely different OS.

To develop Besta RTOS applets with the Windows CE toolchain, a custom CRT0 must be used, and coredll functions must NOT be used for anything unless they are OS-independent (like the previously mentioned helper routines). All syscalls must either go through sdklib/krnllib or be invoked via bare SVCs (e.g. with help from mutekishims).

There are also some Win32 API-looking routines provided by sdklib/krnllib, although there is no technical reason for them to be Windows-ish (other than providing some familiarity to Windows devs, sometimes with huge caveats).

If it's not Windows CE, what is it?

Looking at the scheduler code, it seems that Besta RTOS is based on a heavily modified uC/OS-II kernel with a drastically different set of OS APIs exposed to user code. The rest of the system seems to be developed in-house, sometimes utilizing various existing open-source (e.g. SQLite3, FFmpeg, YAFFS2) and closed-source (e.g. Voxware) components.

You mentioned FFmpeg. Does that mean I can GPL troll Besta?

Maybe. I haven't tried it yet and I'm not a lawyer but maybe.

Diagnostic mode

Diagnostic mode can be used to identify the board type (usually a 5-character identifier starting with BA, CA, EA, etc. that is different from the device model number) and verify the integrity of the system ROM. On some newer TLCS-900 based systems where a diagnostic menu is known to exist (tested on CA736), it can also be used to dump the system ROM currently installed on the on-board NAND flash.

Entering diagnostic mode

Open the TAD app (it's usually called Service Home, 服务中心 (服務中心), etc. depending on your language settings) and type "diagnostic" using the keyboard. This works on all Arm-based devices (except for HP Prime since it does not have the TAD app installed) and some later devices based on the TLCS-900 architecture.

For HP Prime calculators, holding the C, F and O buttons while pressing the Reset button enters the diagnostic mode.

Syscall scheme

A syscall shim, once called (in Arm mode), pushes r0 and lr to the stack in this exact order (this needs 2 instructions) and then issues an SVC with the desired syscall number.

Syscalls will not work directly from THUMB mode due to the instruction size limit (the Thumb SVC encoding only carries an 8-bit immediate, versus 24 bits in Arm mode). Interworking is needed in order to make a syscall from THUMB code.
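The shim sequence can be sketched as raw Arm encodings. This is a host-side illustration only, not the actual mutekishims code: the helper names are hypothetical, and it assumes the syscall number travels in the SVC immediate. The instruction encodings themselves are standard Arm/Thumb.

```c
#include <stdbool.h>
#include <stdint.h>

// Arm-mode encodings (condition AL).
#define ARM_PUSH_R0 0xE52D0004u // str r0, [sp, #-4]!
#define ARM_PUSH_LR 0xE52DE004u // str lr, [sp, #-4]!
#define ARM_SVC     0xEF000000u // svc #imm24

// Hypothetical helper: emit the 3-instruction shim for one syscall number.
// Returns false if the number does not fit the 24-bit Arm SVC immediate.
static bool emit_arm_syscall_shim(uint32_t syscall_no, uint32_t out[3]) {
  if (syscall_no > 0x00FFFFFFu) {
    return false;
  }
  out[0] = ARM_PUSH_R0; // r0 goes first...
  out[1] = ARM_PUSH_LR; // ...then lr, as two separate stores
  out[2] = ARM_SVC | syscall_no;
  return true;
}

// Thumb svc is 0xDF00 | imm8, so only numbers up to 0xFF are encodable.
static bool fits_thumb_svc(uint32_t syscall_no) {
  return syscall_no <= 0xFFu;
}
```

Two stores are needed because a single `push {r0, lr}` always places the lower-numbered register at the lower address, which is the opposite of pushing r0 first and lr second.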

Memory management

Besta RTOS uses a single address space memory layout with the kernel, all applets and shared libraries sharing the same address space. The MMU is not used even on SoCs that have one, so there's zero memory protection, meaning user space code has direct access to hardware registers, etc. Beware that this also makes NULL a valid address, which makes NULL dereferences harder to debug.

At first it looked like there might be separate heaps for kernel and userspace, but that doesn't seem to be the case under further inspection. There is, however, an AllocBlock() function that is stubbed in BA110L. What is it?

Applet executables

Applet executables are mainly in PE format, with ELF being an alternative option. The Windows CE subsystem type is not required, although elf2bestape adds it for consistency with some newer Besta RTOS applets. An applet can either be relocatable (true for most of the PE files) or loadable to an absolute base address, although an applet of the latter type is in practice only runnable if it is the "init" program (i.e. the first program to run after the OS is initialized).

It's unclear whether ELF applets support relocation or not, since the only ELF applet known to exist is Prime G1's armfir.elf and it loads to an absolute address.
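A minimal sketch of telling the two container formats apart by their leading magic bytes. It assumes Besta applets carry the standard magics ("MZ" for the DOS stub in front of a PE image, 0x7f "ELF" for ELF); the enum and function names are made up for illustration and say nothing about how the OS loader itself decides.

```c
#include <stddef.h>
#include <string.h>

typedef enum { APPLET_UNKNOWN, APPLET_PE, APPLET_ELF } applet_format_t;

// Hypothetical helper: classify a file image by its first bytes.
static applet_format_t detect_applet_format(const unsigned char *data, size_t len) {
  // The string is split so that 'E' is not parsed as part of the hex escape.
  if (len >= 4 && memcmp(data, "\x7f" "ELF", 4) == 0) {
    return APPLET_ELF;
  }
  if (len >= 2 && memcmp(data, "MZ", 2) == 0) {
    return APPLET_PE;
  }
  return APPLET_UNKNOWN;
}
```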

Shared libraries

Like applets, shared libraries can either be in PE format or ELF. The _start function seems to get ignored when loading them.

Like under Windows, putting a shared library in the same directory as the applet overshadows the system version. This could be used to e.g. trace syscalls.

Threads

The thread model seems to be very similar to uC/OS-II (down to the algorithm level almost source-line-to-source-line), although the public API is totally different.

A total of 64 threads can exist at the same time, with 38 of them accessible directly via OSCreateThread().

Broken THUMB? (Maybe not)

The code to handle THUMB mode in the CPU context initializer, which is present in stock uC/OS-II's Arm Generic port, seems to be missing. Does that mean THUMB mode is broken? (Maybe not, but using a THUMB function as an entry point might not work without workarounds, i.e. an interwork function or patching the saved CPSR. Given that we might need a stack aligner for EABI->OABI conversion anyway, this might not be so bad.)
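For reference, the missing logic boils down to deriving the initial saved CPSR from bit 0 of the entry address. The sketch below follows the idea of the stock port rather than quoting it; the function names are hypothetical, and which CPU mode Besta threads actually start in is left as a parameter because I have not confirmed it.

```c
#include <stdint.h>

#define CPSR_MODE_SVC 0x13u // Supervisor mode bits
#define CPSR_T_BIT    0x20u // Thumb state bit (bit 5)

// Initial saved CPSR for a new thread: set T if the entry point is Thumb.
static uint32_t initial_cpsr(uint32_t entry, uint32_t mode_bits) {
  uint32_t cpsr = mode_bits;
  if (entry & 1u) {
    cpsr |= CPSR_T_BIT;
  }
  return cpsr;
}

// Initial saved PC: the entry address with the interworking bit cleared.
static uint32_t initial_pc(uint32_t entry) {
  return entry & ~1u;
}
```

Without the T bit handling, a Thumb entry point would start executing as Arm code, which is why an Arm-mode trampoline or a patched CPSR is needed.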

Thread priority

Priority is implied by the natural order of the threads in the global thread table (uC/OS-II just calls this the priority table). Some slots in the table seem to be reserved (8 at the top and 18 at the bottom) and are not accessible by just allocating a thread with OSCreateThread(). Users can move threads into these reserved slots by calling the OSSetThreadPriority() function.
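The slot arithmetic can be written down as a quick sanity check. This sketch assumes "top" means the lowest indices (the highest priorities, as in uC/OS-II); the constant and function names are mine, not from the ROM.

```c
#include <stdbool.h>

enum {
  THREAD_SLOTS = 64,    // total entries in the thread table
  RESERVED_TOP = 8,     // reserved slots at the top (assumed: indices 0..7)
  RESERVED_BOTTOM = 18, // reserved slots at the bottom (assumed: indices 46..63)
};

// True if a slot could be handed out by a plain OSCreateThread() call.
static bool slot_is_directly_allocatable(int slot) {
  return slot >= RESERVED_TOP && slot < THREAD_SLOTS - RESERVED_BOTTOM;
}

// Count the directly allocatable slots by brute force.
static int count_allocatable_slots(void) {
  int n = 0;
  for (int i = 0; i < THREAD_SLOTS; i++) {
    if (slot_is_directly_allocatable(i)) {
      n++;
    }
  }
  return n;
}
```

64 - 8 - 18 leaves 38 slots, which matches the number of threads reachable through OSCreateThread().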

Scheduling

The scheduler always runs the ready thread with the highest priority, so calling OSSleep() is necessary to prevent one thread from hogging the CPU for too long.
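For context, stock uC/OS-II finds the highest-priority ready task with a two-level bitmap (OSRdyGrp and OSRdyTbl[]). Below is a host-side sketch of that technique with my own names; it scans for the lowest set bit in a loop instead of using the OSUnMapTbl lookup table, and whether Besta kept this exact layout is an assumption.

```c
#include <stdint.h>

typedef struct {
  uint8_t grp;    // bit y is set when tbl[y] is nonzero
  uint8_t tbl[8]; // bit x of tbl[y] marks priority (y << 3) + x as ready
} ready_list_t;

static void ready_set(ready_list_t *r, unsigned prio) {
  r->grp |= (uint8_t)(1u << (prio >> 3));
  r->tbl[prio >> 3] |= (uint8_t)(1u << (prio & 7u));
}

static void ready_clear(ready_list_t *r, unsigned prio) {
  r->tbl[prio >> 3] &= (uint8_t)~(1u << (prio & 7u));
  if (r->tbl[prio >> 3] == 0) {
    r->grp &= (uint8_t)~(1u << (prio >> 3));
  }
}

// Index of the lowest set bit; the caller guarantees v is nonzero.
static unsigned lowest_set_bit(uint8_t v) {
  unsigned i = 0;
  while (!(v & 1u)) {
    v >>= 1;
    i++;
  }
  return i;
}

// Lowest number means highest priority. Returns -1 if nothing is ready.
static int ready_highest(const ready_list_t *r) {
  if (r->grp == 0) {
    return -1;
  }
  unsigned y = lowest_set_bit(r->grp);
  unsigned x = lowest_set_bit(r->tbl[y]);
  return (int)((y << 3) + x);
}
```

Nothing in this lookup rotates between threads, so a thread that never sleeps or blocks keeps winning it.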

(TODO figure out if the thread can be yielded when waiting for IO)

1 jiffy is 1ms.

Delay and OSSleep

The OSSleep(jiffies) syscall calls the OSTimeDly() function in the uC/OS-II scheduler, which then puts the thread to sleep for the specified number of jiffies. Since 1 jiffy is 1ms in Besta RTOS, this practically delays the thread for less than or equal to the specified number of milliseconds.

There's also a Delay() syscall that can delay for more than INT16_MAX jiffies, all within a single SVC call.
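To see what Delay() saves, here is a sketch of doing a long delay by hand with a sleep primitive limited to INT16_MAX jiffies per call. That per-call limit is my reading of the note above, and the sleep primitive is passed in as a function pointer (with a fake recorder for host-side testing) because the real OSSleep() prototype is not reproduced here.

```c
#include <stdint.h>

#define MAX_SLEEP_JIFFIES 32767u // INT16_MAX, assumed per-call limit

typedef void (*sleep_fn_t)(uint16_t jiffies);

// Sleep for an arbitrary number of jiffies in chunks. Returns the call count.
static unsigned delay_in_chunks(uint32_t jiffies, sleep_fn_t sleep_fn) {
  unsigned calls = 0;
  while (jiffies > 0) {
    uint16_t chunk = jiffies > MAX_SLEEP_JIFFIES ? (uint16_t)MAX_SLEEP_JIFFIES
                                                 : (uint16_t)jiffies;
    sleep_fn(chunk);
    jiffies -= chunk;
    calls++;
  }
  return calls;
}

// Host-side stand-in for the sleep syscall: just records the total.
static uint32_t g_slept_total;
static void fake_sleep(uint16_t jiffies) {
  g_slept_total += jiffies;
}
```

A 70000 jiffy delay takes three sleep calls this way, where Delay() would need a single SVC.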

Events

Events are a stripped-down version of uC/OS-II mboxes. They don't have the ability to pass arbitrary message pointers like mboxes do.

Events have one extra flag compared to mboxes. When set via OSCreateEvent(), it prevents the event flag from being cleared when an OSWaitForEvent() call completes without hitting a timeout or error.
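The effect of the extra flag can be modelled on the host as follows. This is a toy model with invented names, not the kernel's data structure, and the wait is non-blocking so it can run outside the RTOS.

```c
#include <stdbool.h>

typedef struct {
  bool signaled;      // the event flag itself
  bool keep_signaled; // the extra flag chosen at creation time
} model_event_t;

static void model_event_init(model_event_t *e, bool keep_signaled) {
  e->signaled = false;
  e->keep_signaled = keep_signaled;
}

static void model_event_set(model_event_t *e) {
  e->signaled = true;
}

// Returns true if the wait would complete; false stands in for a timeout.
static bool model_event_try_wait(model_event_t *e) {
  if (!e->signaled) {
    return false;
  }
  if (!e->keep_signaled) {
    e->signaled = false; // a successful wait consumes the event
  }
  return true;
}
```

In Win32 terms this is roughly the difference between an auto-reset and a manual-reset event.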

Critical sections (aka locks, mutexes, etc.)

Critical sections provide mutually exclusive access to shared resources between threads. They seem to be recursive (as the context struct seems to hold a copy of the reent struct whenever it enters from the same thread/has the same reent struct pointer).

When a thread acquires a free critical section, it only changes the state of that critical section and nothing else on the kernel side is touched. (Unless, of course, when another thread tries to acquire the same critical section. Then that thread will be set to wait for that critical section.)
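A toy model of the recursive behaviour described above, using the two fields that matter for it (owner and reference count). The struct layout, the names and the exact refcount values are assumptions for illustration, and a failed acquire simply returns false where the real kernel would put the caller to sleep.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
  const void *thr;   // owning thread, NULL when free
  unsigned refcount; // how many times the owner has entered
} model_cs_t;

// Try to enter on behalf of thread `self`.
static bool model_cs_try_enter(model_cs_t *cs, const void *self) {
  if (cs->thr == NULL) {
    cs->thr = self; // free: only the descriptor itself changes
    cs->refcount = 1;
    return true;
  }
  if (cs->thr == self) {
    cs->refcount++; // recursive entry from the owner
    return true;
  }
  return false; // the real kernel would make the caller wait here
}

// Leave once; the section becomes free when the count drops to zero.
static void model_cs_leave(model_cs_t *cs, const void *self) {
  if (cs->thr != self || cs->refcount == 0) {
    return;
  }
  if (--cs->refcount == 0) {
    cs->thr = NULL;
  }
}
```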

They also have some kind of index value and a byte array. These initially looked like they served an unknown purpose, but they turn out to be the standard uC/OS-II thread wait state bookkeeping.

Get current thread

Since the context holds a copy of the current thread pointer, it is possible to use critical sections to find out which thread is currently running. To do this, first create a critical section locally. This ensures that no other thread can be holding it. After this, simply acquire it with OSEnterCriticalSection() and read out the pointer.

One safe implementation (4 syscalls) is shown as follows:

#include <muteki/threading.h>

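thread_t *get_current_thread(void) {
  thread_t *thr = NULL;
  critical_section_t mutex;
  // Fresh, local critical section: no other thread can be holding it.
  OSInitCriticalSection(&mutex);
  // Acquiring it records the calling thread in the descriptor.
  OSEnterCriticalSection(&mutex);
  thr = mutex.thr;
  // Release and destroy it so that nothing leaks.
  OSLeaveCriticalSection(&mutex);
  OSDeleteCriticalSection(&mutex);
  return thr;
}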

There is also a faster but hackier way. It abuses an implementation detail of the critical section: no other resource is allocated and no other state is changed when a free critical section is acquired for the first time. By only barely initializing the critical section and calling OSEnterCriticalSection() without any cleanup, the number of syscalls required drops to just 1. This works on both CD-580+ and WuDi V7.

#include <muteki/threading.h>

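thread_t *get_current_thread(void) {
  critical_section_t mutex;
  // Magic is not checked so not needed here
  // Make the descriptor look free: no owner, zero refcount.
  mutex.thr = NULL;
  mutex.refcount = 0;
  // The only syscall: it fills in the owner with the calling thread.
  OSEnterCriticalSection(&mutex);
  return mutex.thr;
}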

Error code

The error code is stored in the thread descriptor.

A code set via OSSetLastError() will have the flag 0x20000000 set when read back by _GetLastError().

See muteki/errno.h for error codes documented by parsing FormatMessage() string table.
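A small sketch of working with that flag when comparing error codes. The macro and helper names are invented for illustration; only the 0x20000000 value comes from the observation above.

```c
#include <stdbool.h>
#include <stdint.h>

#define ERR_USER_FLAG 0x20000000u // set on codes stored via OSSetLastError()

// What a code set by user code is expected to look like when read back.
static uint32_t model_tag_user_error(uint32_t code) {
  return code | ERR_USER_FLAG;
}

// True if the code carries the user-set flag.
static bool is_user_error(uint32_t err) {
  return (err & ERR_USER_FLAG) != 0;
}

// Recover the value that was originally passed in.
static uint32_t strip_user_flag(uint32_t err) {
  return err & ~ERR_USER_FLAG;
}
```

Bit 29 is also the bit Win32 reserves for application-defined error codes, which may be where this convention comes from.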

HCA

See hca.xxdm (the annotated format description at the end of these notes).
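As a rough companion to hca.xxdm, here is a sketch of parsing the fixed 12-byte part of the header in the field order listed there (little endian). The struct and field names are mine, the two "count" fields are as unclear here as they are in the description, and anything after the fixed header (palette, sizes, frame offsets) is not handled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HCA_FIXED_HEADER_SIZE 12u

typedef struct {
  uint8_t pixel_format;      // 0x0f: 4bpp, 0xff: 8bpp, 0xc0: 12bpp
  uint16_t height;
  uint16_t width;
  uint8_t count_a;           // number of frames or frame buffers (unclear)
  uint8_t palette_size;      // valid palette size
  uint8_t count_b;           // number of frames or frame buffers (unclear)
  uint8_t transparent_index; // only meaningful in indexed mode
} hca_header_t;

static uint16_t read_u16le(const uint8_t *p) {
  return (uint16_t)(p[0] | (p[1] << 8));
}

// Returns false if the buffer is too short or the magic does not match.
static bool hca_parse_header(const uint8_t *data, size_t len, hca_header_t *out) {
  if (len < HCA_FIXED_HEADER_SIZE || memcmp(data, "HCA", 3) != 0) {
    return false;
  }
  out->pixel_format = data[3];
  out->height = read_u16le(data + 4);
  out->width = read_u16le(data + 6);
  out->count_a = data[8];
  out->palette_size = data[9];
  out->count_b = data[10];
  out->transparent_index = data[11];
  return true;
}
```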

Direct hardware access

Would be useful for e.g. emulators.

Framebuffer

GetActiveVRamAddress() returns a framebuffer descriptor. It includes the framebuffer as well as its format. This could be a potential way of accessing the framebuffer with e.g. a software renderer that has no ties to the kernel.

Audio

TODO.

(OpenPCMCodec and ClosePCMCodec look suspicious)

Load an executable at boot

This can be used to e.g. simplify syscall black box testing or implement untethered booting of another OS.

Create the file C:\SYSTEM\DESKTOP.INI with DOS (CRLF) line endings and the following contents:

[DESKTOP SETTING]
ENTRY = <dos-8.3-path-to-exe-you-want-to-run>
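Since the line endings matter, it may be easier to generate the file contents than to rely on an editor. A minimal sketch follows; the function name and the example path are hypothetical, and writing the buffer out to C:\SYSTEM\DESKTOP.INI is left to whatever file API is at hand.

```c
#include <stddef.h>
#include <stdio.h>

// Format the DESKTOP.INI contents with explicit CRLF line endings.
// Returns the length written (excluding NUL), or -1 if it does not fit.
static int format_desktop_ini(char *buf, size_t size, const char *entry_path) {
  int n = snprintf(buf, size, "[DESKTOP SETTING]\r\nENTRY = %s\r\n", entry_path);
  if (n < 0 || (size_t)n >= size) {
    return -1;
  }
  return n;
}
```

For example, calling it with the made-up path C:\APPS\HELLO.EXE produces the two lines shown above, each terminated with CRLF.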

WARNING: This will replace the home screen with the file you specified and might cause the system to not boot properly. If this happens, a full system reset (clearing settings and wiping the C: drive) will fix it, although it will erase all data in system memory and all settings. Alternatively, if chainloading a secondary program is possible, you can also run a program that helps you recover from this situation, e.g. by using \\.\EXPLORE.ROM to delete the ini file and reboot. Note that this will not be possible on most systems without a loader that strips the v4 args.

PATH_MAX

PATH_MAX is 256 UTF-16 code units (512 bytes) including the NUL terminator. For the 8.3 paths used by CreateFile(), PATH_MAX is 80 bytes including the NUL terminator. For 8.3 paths in the CWD, PATH_MAX seems to be 64 bytes including the NUL terminator.
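These limits can be captured as constants plus a length check. The macro names are made up, and the check assumes the limits count the NUL terminator, which is how I read the numbers above.

```c
#include <stdbool.h>
#include <string.h>

#define PATH_MAX_UTF16_UNITS 256 // long paths, in UTF-16 code units
#define PATH_MAX_DOS83 80        // 8.3 paths passed to CreateFile(), in bytes
#define PATH_MAX_CWD83 64        // 8.3 paths in the CWD, in bytes

// True if an 8.3 path plus its NUL terminator fits the CreateFile() limit.
static bool fits_dos83_path(const char *path) {
  return strlen(path) + 1 <= PATH_MAX_DOS83;
}
```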

I am not happy with this ROM.

Neither am I.

Known glitches/vulnerabilities

"Continuous Moan of Death"

When using the Chinese -> English full sentence translation feature, fill the text box with the Chinese character "哼" and press Enter. The system will crash shortly after.

The name of this glitch comes from the translation result, which contains a huge number of repetitions of the phrase "moan continually".

This is at least fixed on BA110L and not present on the HJ translation engine (legacy engine used on TLCS-900 systems).

No length check in INI parser

_GetPrivateProfileString() does not check the size parameter. Therefore, if a string property is longer than the buffer the caller provided, reading it will overflow that buffer on the stack or heap.
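For comparison, this is roughly what the missing check looks like: a copy that honours the destination size and always terminates the string. It is a generic sketch with a made-up function name, useful for a replacement INI reader in applet code, and not a patch for the system routine.

```c
#include <stddef.h>
#include <string.h>

// Copy at most dst_size - 1 bytes of value and always NUL-terminate.
// Returns the number of bytes copied (excluding the terminator).
static size_t copy_ini_value(char *dst, size_t dst_size, const char *value) {
  if (dst_size == 0) {
    return 0;
  }
  size_t n = strlen(value);
  if (n > dst_size - 1) {
    n = dst_size - 1; // truncate instead of overflowing
  }
  memcpy(dst, value, n);
  dst[n] = '\0';
  return n;
}
```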

No known system fixes this vulnerability.

.coding ascii
.endian little
// HCA format
// Related patent (Chinese): https://patents.google.com/patent/CN1281063C/zh
// Magic
"HCA"
// Pixel format. 0f: 4bpp, ff: 8bpp, c0: 12bpp
c0
// Height and width
(u2:8) (u2:8)
// Number of frames? Number of frame buffers?
(u1:1)
// Valid palette size
(u1:0)
// Number of frames? Number of frame buffers?
(u1:1)
// Index of transparent color (when in indexed mode). This color will be treated as transparent. In indexed mode
(u1:255)
// Palette
// The pixels in indexed image are always processed as a vector of 2 pixels.
// For 4bpp mode, the palette is indexed per-vector. This gives the fixed 1024 bytes palette size.
// For 8bpp mode, 2 palettes (one verbatim, one rotated 4 bits left) are used to take care of unaligned bits.
// (u4:0x00000000) ...
// Size of image data (excluding offset table)
(u4:268)
// Frame offsets (excluding the table)
// (u4:0x0) ...
(u4:0x0)
// Framebuffer 0
// FU: Uncompressed, FC: Compressed (?)
"FU" 00 00 00000000
// Framebuffer type (?) 00000000: Normal, 00264c00: Indexed
00000000
// Framebuffer data
// ...