The audio is encoded as 8-bit, mono, linear PCM at 25 kHz. Standard Roland fare.
The sound ROM (IC11) has a smattering of jumbled data and address lines that makes it hard to read or write. Most of the pins can be unscrambled in software, but the actual chip (TC531000P) has a non-standard pinout: its A16 sits where most 28-pin chips have an output enable line. A little rewiring is required to read it, and a larger chip is required to duplicate it.
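Unscrambling in software amounts to permuting address and data bits. Here is a minimal sketch of that idea in Python. The function name `unscramble` and the two mapping lists are my own placeholders, and the mappings in the usage example are made up for illustration; the real wiring has to be traced from the schematic. It assumes the dump was read in chip pin order.

```python
from typing import Sequence


def unscramble(raw: bytes, addr_map: Sequence[int], data_map: Sequence[int]) -> bytes:
    """Reorder a raw dump so it is indexed by logical address and data bits.

    addr_map[i] = chip address bit that carries logical address bit i.
    data_map[i] = chip data bit that carries logical data bit i.
    Both mappings are assumptions supplied by the caller.
    """
    if len(raw) != 1 << len(addr_map):
        raise ValueError("dump size must be 2 ** len(addr_map)")

    out = bytearray(len(raw))
    for logical in range(len(raw)):
        # Build the chip-side address from the logical address bits.
        physical = 0
        for i, chip_bit in enumerate(addr_map):
            physical |= ((logical >> i) & 1) << chip_bit

        # Pull the byte and put its data bits back in logical order.
        byte = raw[physical]
        fixed = 0
        for i, chip_bit in enumerate(data_map):
            fixed |= ((byte >> chip_bit) & 1) << i
        out[logical] = fixed
    return bytes(out)


# Toy example: 8-byte "ROM" with A0/A1 swapped and data lines straight.
toy = unscramble(bytes(range(8)), [1, 0, 2], list(range(8)))
```

For a 128 KB dump the address map would have 17 entries (A0 to A16) and the data map 8.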
Even with the pins in order, the sounds are still scrambled. The short interleaved samples and the long chunked-up samples are sorted just like the 626, so I'll just reiterate a bit.
The short samples are combined: one sound on the odd addresses and one sound on the even addresses.
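Pulling an interleaved pair apart is a one-liner per sound. This is just a sketch; `split_interleaved` is my own name, and `block` is assumed to be the slice of the (already unscrambled) ROM that holds one pair of short samples.

```python
def split_interleaved(block: bytes):
    """Split a block holding two interleaved sounds.

    Returns (even_sound, odd_sound): the bytes found at even
    addresses and the bytes found at odd addresses of the block.
    """
    return block[0::2], block[1::2]


even_sound, odd_sound = split_interleaved(bytes([10, 20, 11, 21, 12, 22]))
```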
The long samples are put into four "piles": A, B, C and D. Each pile contains every fourth byte. A has every fourth byte starting from byte 0, B every fourth byte starting from byte 1, C every fourth from byte 2, and D every fourth from byte 3.
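The pile arrangement is a stride-4 deal, like dealing cards to four players. The sketch below shows both directions, since which one you need depends on how the piles sit in your dump. The names `deal_into_piles` and `merge_piles` are mine, and the four piles are assumed to be the same length.

```python
def deal_into_piles(data: bytes):
    """Deal bytes into four piles: A gets bytes 0, 4, 8, ...; B gets 1, 5, 9, ...; and so on."""
    return tuple(data[k::4] for k in range(4))


def merge_piles(a: bytes, b: bytes, c: bytes, d: bytes) -> bytes:
    """Inverse of deal_into_piles: weave four equal-length piles back together."""
    if not (len(a) == len(b) == len(c) == len(d)):
        raise ValueError("piles must all be the same length")
    out = bytearray(4 * len(a))
    out[0::4] = a
    out[1::4] = b
    out[2::4] = c
    out[3::4] = d
    return bytes(out)


piles = deal_into_piles(bytes(range(8)))
restored = merge_piles(*piles)
```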
As one final point of confusion, A0 is inverted. Some basic glue logic was used to generate A0 (and A13-A16) and it was probably just easier to invert A0 in the ROM rather than invert the signal. Based on the binary I've found, no other address lines are inverted.
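With A0 inverted, every byte sits at its address XOR 1, so undoing it means swapping each adjacent pair of bytes. A minimal sketch, assuming an even-length dump; `fix_inverted_a0` is my own name.

```python
def fix_inverted_a0(data: bytes) -> bytes:
    """Undo an inverted A0 by swapping the bytes at addresses 2n and 2n+1."""
    if len(data) % 2:
        raise ValueError("dump length must be even")
    out = bytearray(len(data))
    out[0::2] = data[1::2]  # even addresses take what was stored at odd ones
    out[1::2] = data[0::2]  # and vice versa
    return bytes(out)


fixed = fix_inverted_a0(bytes([1, 2, 3, 4]))
```

The swap is its own inverse, so running it twice gives back the original dump.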
Here is the ROM table from the service manual.