Carbonite Audio Data Interface¶
Overview¶
The carb::audio::IAudioData
interface provides access to audio assets and their general management. All
audio assets are stored in sound data objects (SDOs). The audio playback and audio group interfaces
operate only on sound data objects as the asset data for their various operations. This interface allows
assets to be loaded, created, converted, and saved to file. It also provides methods for sending output
to a file and for encoding and decoding sound data objects.
Sound data objects support loading and manipulating sound assets in the following formats (PCM formats allow up to 64 channels):
8-bit unsigned integer PCM data.
16-, 24-, and 32-bit signed integer PCM data.
32-bit floating point PCM data.
Vorbis.
FLAC.
Opus.
MP3 (encoding to MP3 is not supported).
A sound data object consists of a set of information about the sound data’s format (e.g. its frame rate, channel count, and sample size and type), the length of the asset, and potentially a buffer of data to be processed by a decoder. A sound data object may be fully decoded in memory, streamed from an encoded buffer in memory, or streamed from disk. These objects may also contain additional optional information such as metadata strings (e.g. authoring, genre, and format information), peak volume information, event point records, loop points, playlist information, etc.
Each sound data object may have a single ‘user data’ object associated with it. This user data consists
of a block of data and an optional destructor function for the user data object. The user data object
is never accessed internally by any functionality, but the host app may use it to associate its own
object with the sound data object. The host app may retrieve this object from the sound data object
at any time with IAudioData::getUserData()
. When the sound data object that holds the user data
block is destroyed, its optional destructor is called. Similarly, if the user data block is replaced,
the destructor for the previous user data block will be called.
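The user data lifetime rules above can be modeled with a small sketch. This is an illustrative stand-in, not the actual carb::audio implementation; the member names and the `UserDataDestructor` type are hypothetical, though the behavior (destructor runs on replacement and on object destruction) follows the description above.

```cpp
#include <cassert>

// Illustrative model (not the Carbonite API): a sound data object owning a
// single user data block with an optional destructor callback.
using UserDataDestructor = void (*)(void* userData);

struct SoundDataModel {
    void* userData = nullptr;
    UserDataDestructor destructor = nullptr;

    // Replacing the user data block invokes the previous block's destructor.
    void setUserData(void* data, UserDataDestructor dtor) {
        if (userData != nullptr && destructor != nullptr)
            destructor(userData);
        userData = data;
        destructor = dtor;
    }

    void* getUserData() const { return userData; }

    // Destroying the object also destroys any attached user data block.
    ~SoundDataModel() {
        if (userData != nullptr && destructor != nullptr)
            destructor(userData);
    }
};

// Counts destructor invocations, purely for demonstration.
static int g_destroyCount = 0;
static void countingDestructor(void*) { ++g_destroyCount; }
```

Since the user data is never touched internally, the destructor callback is the only hook the host app gets for cleanup, which is why replacing the block must fire it.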
Creating and Loading¶
Sound data objects may be created in several ways, all through the same call: IAudioData::createSound().
The method of loading or creating the object depends on the settings passed in the SoundDataLoadDesc
descriptor. A sound asset can be loaded from a disk file, a blob in memory, or a ‘user decoded stream’.
If the asset is loaded from a blob in memory, it may either be copied into internally owned memory or
reference the user memory that was originally passed in. The asset may also either be fully decoded on
load or decoded at runtime as it plays in order to save memory. Additionally, a new sound data object may
be created empty, with a given length. When an asset is loaded from a file that contains format
information, that information is always used instead of any format information in the load descriptor. If the
asset data does not contain format information (e.g. a user decoded stream) or there is no asset data (e.g.
an empty asset), the format information must be provided in the load descriptor.
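The format-resolution rule above can be sketched as a small model. The real descriptor is carb::audio::SoundDataLoadDesc and its members differ; the `LoadDescModel`, `FormatInfo`, and `resolveFormat` names here are hypothetical, used only to show how embedded format information takes precedence and when the descriptor's format becomes mandatory.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins modeling the source-selection and format rules
// described above; not the real carb::audio types.
enum class SourceType { eFile, eMemoryBlob, eUserDecodedStream, eEmpty };

struct FormatInfo {
    size_t frameRate = 0;   // zero means "not provided"
    size_t channels = 0;
};

struct LoadDescModel {
    SourceType source = SourceType::eFile;
    bool copyBlob = true;       // copy the blob vs. reference caller memory
    bool decodeOnLoad = false;  // decode fully now vs. stream at runtime
    FormatInfo format;          // required when the source carries no format
};

// Format found in the asset data itself (if any) always wins over the
// descriptor's format; sources without embedded format need the descriptor's.
bool resolveFormat(const LoadDescModel& desc, const FormatInfo* embedded, FormatInfo& out) {
    if (embedded != nullptr) {
        out = *embedded;   // file-provided format takes precedence
        return true;
    }
    if (desc.format.frameRate == 0 || desc.format.channels == 0)
        return false;      // no format available anywhere -> load fails
    out = desc.format;
    return true;
}
```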
Once created, all sound data objects are reference counted. When a sound data object is played, a reference
to the object is taken internally for as long as it is in use. The external caller may release only the
references that it has acquired. One reference is acquired on object creation, and another for each call to
IAudioData::acquire(). A sound data object is only destroyed once all references have been released.
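The counting rules can be shown with a minimal sketch. This is a toy model of the semantics described above, not the real carb::audio implementation; `SoundRefModel` and its members are hypothetical.

```cpp
#include <cassert>

// Toy model of the reference-counting rules described above.
struct SoundRefModel {
    int refCount = 1;      // one reference is granted at creation
    bool destroyed = false;

    void acquire() { ++refCount; }   // models IAudioData::acquire()

    // Releasing the final reference destroys the object.
    void release() {
        if (--refCount == 0)
            destroyed = true;
    }
};
```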
A sound asset does not necessarily have to be loaded into the sound data object in its original format. It may be decoded on load or converted to a different format on load. This is especially true for user decoded streams. A user decoded stream allows the caller to provide a stream of PCM data decoded from an arbitrary format. This effectively allows the caller to perform the initial decoding of the asset data from a proprietary format, or allows a data generator system to be used. The user decoded stream consists of a single callback function that is used to provide a specified number of frames of PCM data. A user decoded stream can either just provide the data at load time or be used as a streaming source. When it is used as a streaming source, a positioning callback function must be provided as well. This allows the stream to be repositioned as needed for playback instead of always having to restart from the beginning.
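The shape of a user decoded stream might look like the following sketch. The names and signatures are hypothetical (the real callbacks are declared in the Carbonite headers); a simple ramp generator stands in for decoding a proprietary format, and the positioning callback shows why streaming sources can seek instead of restarting.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a user decoded stream: a read callback that fills a
// buffer with PCM frames, plus a positioning callback that is only required
// when the stream is used as a streaming source.
struct UserStreamModel {
    size_t position = 0;  // current frame index

    // Produce up to 'frameCount' mono float frames; a ramp generator stands
    // in for decoding from an arbitrary or proprietary format.
    size_t readFrames(float* buffer, size_t frameCount) {
        for (size_t i = 0; i < frameCount; i++)
            buffer[i] = static_cast<float>(position + i);
        position += frameCount;
        return frameCount;
    }

    // Positioning callback: lets playback seek instead of restarting.
    void setPosition(size_t frame) { position = frame; }
};
```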
Converting¶
Once created, a sound data object can be converted to any other supported format as needed. The conversion can either be done in-place and replace the existing object’s internal asset data, or done by creating a new sound data object. Whether the conversion is done in place or not, the returned object will be given a new reference that will need to be released by the caller at some point.
The conversion is performed in the least destructive manner possible. However, depending on the original format and the selected destination format, a loss of information may be unavoidable. For example, converting from 32-bit float PCM to 8-bit integer PCM discards a significant amount of precision. Note that a conversion only changes the sample format, not other aspects of the asset’s format (e.g. its channel count).
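The float-to-8-bit example can be made concrete. This sketch uses the conventional PCM mappings (unsigned 8-bit with 128 as silence), not any carb::audio conversion routine; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Why 32-bit float -> 8-bit unsigned PCM is lossy: the float sample range
// [-1.0, 1.0] collapses onto just 256 discrete levels.
uint8_t floatToU8(float sample) {
    if (sample > 1.0f) sample = 1.0f;
    if (sample < -1.0f) sample = -1.0f;
    // Map [-1, 1] onto [0, 255]; 128 is silence for unsigned 8-bit PCM.
    return static_cast<uint8_t>(std::lround((sample + 1.0f) * 127.5f));
}

float u8ToFloat(uint8_t sample) {
    return (sample / 127.5f) - 1.0f;
}
```

A round trip through the 8-bit format recovers the sample only to within one quantization step, which is the information loss the text describes.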
Decoding & Encoding¶
When a sound data object needs to be played or processed, its decoded PCM data can be accessed using a
decoder. In most cases, this is not necessary for external callers. This will be done automatically
internally when the asset is played through the carb::audio::IAudioPlayback
interface. The
decoder can be used to get access to a stream of PCM data for the asset regardless of its encoded format.
Similarly, an encoder can consume a stream of PCM data and encode it into another format in a
sound data object.
Encoding or decoding a sound has the following limitations:
decoding from a sound data object may start from any sample format, but must always produce a PCM format.
encoding into a sound data object may target any sample format, but the source must always be PCM.
Performing an encode or decode operation requires that a codec state object be created first. The codec
state is created with the IAudioData::createCodecState()
function. The codec state allows
for a sound data object to be decoded multiple times simultaneously without affecting any other instances.
Each codec state may only perform operations in a single direction - encoding or decoding. Once the
codec state is created, buffers of data may either be received from the decoder or submitted to the
encoder (depending on the direction of the codec state). A codec state’s current position may be queried
or changed as needed, though the accuracy of the positioning depends on the format. For example, some
compressed formats may not support frame accurate seeking. Once the encode or decode operation is
complete, the codec state can be destroyed with IAudioData::destroyCodecState()
.
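The per-state independence can be sketched as follows. This is an illustrative model, not the real IAudioData codec state: the type and method names are hypothetical, but it shows why each state carrying its own cursor lets one sound data object be decoded multiple times simultaneously.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative model: each codec state carries its own read position over
// shared, immutable asset data, so states never affect each other.
struct CodecStateModel {
    const std::vector<float>* sound;  // shared asset data (not owned)
    size_t position = 0;              // per-state decode cursor

    explicit CodecStateModel(const std::vector<float>& data) : sound(&data) {}

    // Decode up to 'frameCount' frames into 'out'; returns frames produced.
    size_t decode(float* out, size_t frameCount) {
        size_t available = sound->size() - position;
        size_t count = frameCount < available ? frameCount : available;
        for (size_t i = 0; i < count; i++)
            out[i] = (*sound)[position + i];
        position += count;
        return count;
    }

    // Seeking accuracy depends on the format; raw PCM is frame accurate,
    // while some compressed formats can only seek to block boundaries.
    void setPosition(size_t frame) { position = frame; }
};
```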
Saving & Output Streams¶
A sound data object may be written to a file on disk if needed. There may be an optional format change when writing the asset to disk or it may be written in its current format. For some formats, a conversion or re-encoding may need to occur in order to write it to disk. Future versions may allow the file to be written to a blob in memory as well.
An output stream is similar to saving a sound to file (in fact, an output stream is used internally when saving to file). As with saving to file, an optional format conversion may occur in the output stream. The output stream allows an arbitrary stream of PCM data to be sent to a file. The PCM data is sent in anonymous buffers by ‘writing’ it to the stream. Any conversion to the destination format is performed on the buffer before attempting to write it to the stream. Depending on the destination format, the converted data may not be flushed to disk immediately. The only time that it will be guaranteed to be flushed to disk is when the output stream is closed.
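The write-then-flush contract can be modeled with a short sketch. The types here are hypothetical (a 16-bit destination format and an in-memory "file" stand in for the real stream); what it demonstrates is the rule above: conversion happens per written buffer, but delivery is only guaranteed once the stream is closed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of the output-stream contract described above.
struct OutputStreamModel {
    std::vector<short> pending;   // converted but not yet flushed
    std::vector<short> file;      // stands in for the on-disk contents
    bool closed = false;

    // 'Write' a buffer of float PCM; conversion to the destination format
    // (16-bit here) is performed on the buffer immediately.
    void write(const float* frames, size_t count) {
        for (size_t i = 0; i < count; i++)
            pending.push_back(static_cast<short>(frames[i] * 32767.0f));
    }

    // Closing the stream is the only point where a flush is guaranteed.
    void close() {
        file.insert(file.end(), pending.begin(), pending.end());
        pending.clear();
        closed = true;
    }
};
```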
Event Points & Loop Points¶
Each sound data object may optionally include a set of caller specified event points or loop points. An event point is a spot in the asset where an event is expected to occur. An event point may or may not be used when playing the sound - that is decided by the caller when playing the sound. An event point consists of the frame number where the event point is expected to be triggered, an arbitrary identifier (used to match it when updating, deleting, or adding event points), a text name (for UI display), optional text, and a user data object. A loop point is a specialized event point that also specifies a region length and optional play index. There is no limit on the number of event points or loop points that a sound data object may contain.
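A record with the fields described above might look like the following. The real structure is carb::audio::EventPoint and differs in detail; `EventPointModel` and `findById` are illustrative only. A non-zero region length marks the record as a loop point.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative layout of an event point record (not carb::audio::EventPoint).
struct EventPointModel {
    size_t frame;            // frame at which the event triggers
    size_t id;               // arbitrary identifier used to add/update/delete
    std::string label;       // text name, e.g. for UI display
    std::string text;        // optional text (e.g. a caption line)
    size_t loopLength = 0;   // non-zero -> this record is a loop point
    size_t playIndex = 0;    // optional ordering for loop/play list use
};

// Find an event point by its identifier; returns nullptr when absent.
const EventPointModel* findById(const std::vector<EventPointModel>& points, size_t id) {
    for (const EventPointModel& p : points)
        if (p.id == id)
            return &p;
    return nullptr;
}
```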
Some asset file formats may include event point information that can be parsed out of the file on load. Parsing of these event points is only guaranteed for RIFF/WAV files. However, few authoring tools have the ability to create or store this event point information in the file.
Event points and loop points are set on a sound data object using IAudioData::setEventPoints()
.
These are always set in a group. In a single call, individual event points may be added, deleted, or
modified depending on the arbitrary identifier value in the source buffer and the given frame number.
Event points may be retrieved either in groups or individually by various criteria (e.g. by identifier,
by play index, or by index).
When playing a sound data object, event points may be enabled using the fPlayFlagUseEventPoints
flag. When this flag is used, a callback function is also expected to be specified in the play descriptor.
When one of the sound data object’s event points is hit, the callback will be performed with the
VoiceCallbackType::eEventPoint
value set in its type
parameter. The triggered event
point descriptor will be passed in the callback’s data
parameter. This callback and event point
data may be used to trigger some external action in the program. Note that if the
fPlayFlagRealtimeCallbacks
flag is also used, the callback should execute and return as
quickly as possible; otherwise it will stall the audio processing engine. In general, the callback should
just flag that something needs to occur and store any required information for another thread to handle.
If the fPlayFlagRealtimeCallbacks
flag is not used, the IAudioPlayback::update()
function must be called in order for the event point callbacks to be performed.
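The "flag it and hand it off" pattern recommended above can be sketched like this. The queue and names are hypothetical, not part of the Carbonite API; the point is that the audio-thread side only records the event, and the heavy handling happens on another thread.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <queue>

// Sketch of deferring realtime event-point callbacks to another thread so the
// audio processing engine is never stalled. Names are illustrative.
struct PendingEvent {
    size_t frame;
    size_t id;
};

class EventQueue {
public:
    // Called from the audio thread: must be quick, so just enqueue.
    void onEventPoint(size_t frame, size_t id) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_pending.push(PendingEvent{frame, id});
    }

    // Called from a worker/main thread: pops one event for real handling.
    bool poll(PendingEvent& out) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_pending.empty())
            return false;
        out = m_pending.front();
        m_pending.pop();
        return true;
    }

private:
    std::mutex m_mutex;
    std::queue<PendingEvent> m_pending;
};
```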
Some examples of using event points may be:
apply closed captioning text to a dialogue sound track. The event point’s EventPoint::text member would contain the text to be displayed or added to a scrolling display. Each new line of text would show up when the audio sound track’s playback reached each event point frame.
trigger the next part of an animation that is synchronized to the sound a character is making. When the event point is fired, the character’s animation state would be updated to continue the sequence. This would allow for audio data driven animation sequences without needing to modify code.
trigger an in-game cut-scene sequence, such as unlocking a door or starting an automated fight sequence at a certain point during a voice-over. Again, this would be data driven, so re-recording or editing the dialogue would not require code changes.
Typical Usage¶
This interface should be used to handle all operations on sound assets. This typically begins by calling
IAudioData::createData()
to create or load the sound data object from some form of source data
(e.g. a file on disk, a blob in memory, or an empty buffer). The sound data object is then passed
to a playback context to play on a new voice using IAudioPlayback::playSound()
. Once the play
task is done, the sound data object is released using IAudioData::release()
.
Another usage scenario would be to convert a sound asset from one format to another. The original sound
asset is loaded with IAudioData::createData()
. A conversion request is then set up and passed to IAudioData::convert()
. This can either create a new sound data object with the conversion result,
or replace the data in the original sound data object. The resulting sound data object can then be
played, saved to disk, or otherwise operated on. The new and original sound data objects are then
released with IAudioData::release()
when they are no longer needed. Both IAudioData::release()
calls are needed even if the conversion replaces the original object’s contents.
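The two-release rule for in-place conversion can be made concrete with a toy model. This is not the real carb::audio reference handling; `SoundHandleModel` and `convertInPlace` are hypothetical stand-ins showing why the returned object carries a fresh reference even when no new object is created.

```cpp
#include <cassert>

// Toy model: convert() hands back an object carrying a new reference even
// when it converts in place, so the caller owes two release() calls.
struct SoundHandleModel {
    int refCount = 1;   // creation reference

    SoundHandleModel* convertInPlace() {
        ++refCount;     // the "returned" object carries a fresh reference
        return this;    // same object; its data was replaced in place
    }

    // Returns true once the final reference is gone and the object would die.
    bool release() { return --refCount == 0; }
};
```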
The IAudioData
interface can also be used to manually decode a sound data object to raw PCM or
encode it to another format from raw PCM. This is done by creating a ‘codec state’ object (with
IAudioData::createCodecState()
) for an existing sound data object, then either decoding or
encoding buffers of data with IAudioData::decodeData()
or IAudioData::encodeData()
.