Carbonite Audio Data Interface

Overview

The carb::audio::IAudioData interface provides access to audio assets and their general management. All audio assets are stored in sound data objects (SDOs). The audio playback interface and audio group interface operate only on sound data objects as the asset data for their various operations. This interface allows assets to be loaded, created, converted, and saved to file. It also provides methods for sending output to a file and for encoding and decoding sound data objects.

Sound data objects support loading and manipulating sound assets in the following formats (PCM formats allow up to 64 channels):

  • 8-bit unsigned integer PCM data.

  • 16-, 24-, and 32-bit signed integer PCM data.

  • 32-bit floating-point PCM data.

  • Vorbis.

  • FLAC.

  • Opus.

  • MP3 (encoding to MP3 is not supported).

A sound data object consists of a set of information about the sound data’s format (i.e. its frame rate, channel count, sample size and type, etc.), the length of the asset, and potentially a buffer of data to be processed by a decoder. A sound data object may be fully decoded in memory, streamed from an encoded buffer in memory, or streamed from disk. These objects may also contain additional optional information such as metadata strings (i.e. authoring, genre, format information), peak volume information, event point records, loop points, playlist information, etc.

Each sound data object may have a single ‘user data’ object associated with it. This user data consists of a block of data and an optional destructor function for the user data object. The user data object is never accessed internally by any functionality, but the host app may use it to associate its own object with the sound data object. The host app may retrieve this object from the sound data object at any time with IAudioData::getUserData(). When the sound data object that holds the user data block is destroyed, its optional destructor is called. Similarly, if the user data block is replaced, the destructor for the previous user data block will be called.
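
As a minimal sketch of this usage, assuming an acquired IAudioData interface pointer named iData and an existing sound data object named sound (the UserData member names here are approximations; see the carb::audio headers for the exact layout):

    // A host-app object to associate with the sound data object.
    struct GameSoundInfo
    {
        int category;
        float defaultVolume;
    };

    // Destructor callback: invoked when the sound data object is destroyed
    // or when this user data block is replaced with another one.
    void destroyGameSoundInfo(void* data)
    {
        delete static_cast<GameSoundInfo*>(data);
    }

    // NOTE: member names are illustrative approximations.
    carb::audio::UserData userData = {};
    userData.data = new GameSoundInfo{ 1, 0.8f };
    userData.destructor = destroyGameSoundInfo;
    iData->setUserData(sound, &userData);

    // The host app can retrieve the block again at any time (assuming
    // getUserData() returns the stored UserData block).
    auto* info = static_cast<GameSoundInfo*>(iData->getUserData(sound)->data);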

Creating and Loading

Sound data objects may be created in several ways, but all through the same call: IAudioData::createData(). The method of loading or creating the object depends on the settings passed in the SoundDataLoadDesc descriptor. A sound asset can be loaded from a disk file, a blob in memory, or a ‘user decoded stream’. If the asset is loaded from a blob in memory, it may either be copied into internally owned memory or reference the caller’s memory directly. The asset may also either be decoded on load or decoded at runtime as it plays in order to save memory. Additionally, a new sound data object may be created empty with a given length. When an asset is loaded from a file that contains format information, that format information will always be used instead of any provided in the load descriptor. If the asset data does not contain format information (i.e. a user decoded stream) or there is no asset data (i.e. an empty asset), the format information must be provided in the load descriptor.
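
For example, loading an asset from a disk file and requesting that it be fully decoded at load time might look like the following sketch (the flag and descriptor member names are illustrative approximations; consult the SoundDataLoadDesc definition for the exact members):

    #include <carb/audio/IAudioData.h>

    // 'iData' is assumed to be the acquired carb::audio::IAudioData interface.
    carb::audio::SoundDataLoadDesc desc = {};
    desc.name = "sounds/explosion.ogg";        // asset filename (illustrative).
    desc.flags = carb::audio::fDataFlagDecode; // decode fully on load instead
                                               // of streaming (illustrative).

    carb::audio::SoundData* sound = iData->createData(&desc);
    if (sound == nullptr)
    {
        // the file could not be opened or its format was not recognized.
    }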

Once created, all sound data objects are reference counted. When a sound data object is played, a reference to it is held internally for as long as it is in use. The external caller may release only the references that it has acquired: one reference is acquired on object creation, and one more for each call to IAudioData::acquire(). A sound data object will only be destroyed once all of its references have been released.
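
For instance, a system that stores its own pointer to a sound data object would manage references along these lines:

    // Creation returns the object with one reference owned by the creator.
    carb::audio::SoundData* sound = iData->createData(&desc);

    // A second system that caches the pointer takes its own reference...
    iData->acquire(sound);

    // ...and releases only that reference once it is done with the object.
    iData->release(sound);

    // The creator releases its reference as well; the object is destroyed
    // only when the last reference is released (including any internal
    // references held while the sound is playing).
    iData->release(sound);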

A sound asset does not necessarily have to be loaded into the sound data object in its original format. It may be decoded on load or converted to a different format on load. This is especially true for user decoded streams. A user decoded stream allows the caller to provide a stream of PCM data decoded from an arbitrary format. This effectively allows the caller to perform the initial decoding of the asset data from a proprietary format, or to use a data generator as the source. The user decode stream consists of a single callback function that provides a requested number of frames of PCM data. A user decode stream can either provide the data once at load time or act as a streaming source. When it is used as a streaming source, a positioning callback function must be provided as well. This allows the stream to be repositioned as needed for playback instead of always having to restart from the beginning.
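
A read callback for a user decoded stream might look like the following sketch. The real callback signature is defined in the carb::audio headers; the one used here is a simplified assumption, with a sine-wave generator standing in for a proprietary decoder:

    #include <cmath>
    #include <cstddef>

    // State for a trivial generator that stands in for a real decoder.
    struct SineState
    {
        double phase = 0.0;
        double frequency = 440.0;   // tone frequency in Hz.
        double frameRate = 48000.0;
    };

    // Fill 'buffer' with 'framesToRead' frames of mono 32-bit float PCM.
    // Returns the number of frames actually produced.  (Approximated
    // signature; the real callback type may take additional parameters.)
    size_t readFrames(void* context, void* buffer, size_t framesToRead)
    {
        const double kTwoPi = 6.283185307179586;
        auto* state = static_cast<SineState*>(context);
        auto* samples = static_cast<float*>(buffer);

        for (size_t i = 0; i < framesToRead; i++)
        {
            samples[i] = static_cast<float>(std::sin(state->phase));
            state->phase += kTwoPi * state->frequency / state->frameRate;
        }
        return framesToRead;
    }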

Converting

Once created, a sound data object can be converted to any other supported format as needed. The conversion can either be done in place, replacing the existing object’s internal asset data, or produce a new sound data object. Whether or not the conversion is done in place, the returned object carries a new reference that the caller must eventually release.

The conversion is performed in the least destructive manner possible. However, depending on the original format and the selected destination format, a loss of information may be unavoidable. For example, converting from 32-bit floating-point PCM to 8-bit integer PCM will lose a significant amount of precision. Note that a conversion only changes the sample format, not other aspects of the asset’s format (i.e. its channel count).
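
A conversion to a new object might look like this sketch (the descriptor member names are approximations of the actual ConversionDesc):

    // Convert an existing sound data object to 16-bit signed integer PCM.
    carb::audio::ConversionDesc conv = {};
    conv.soundData = sound;                             // object to convert.
    conv.newFormat = carb::audio::SampleFormat::ePcm16; // destination format.
    conv.flags = 0; // no in-place flag, so a new object is created.

    carb::audio::SoundData* converted = iData->convert(&conv);

    // The returned object always carries a new reference, even for an
    // in-place conversion, and must eventually be released.
    if (converted != nullptr)
        iData->release(converted);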

Decoding & Encoding

When a sound data object needs to be played or processed, its decoded PCM data can be accessed using a decoder. In most cases, this is not necessary for external callers; it is done automatically when the asset is played through the carb::audio::IAudioPlayback interface. The decoder gives access to a stream of PCM data for the asset regardless of its encoded format. Similarly, an encoder can be used to produce a sound data object in another format from a stream of PCM data.

Encoding or decoding a sound has the following limitations:

  • decoding from a sound data object may start from any sample format, but must always produce a PCM format.

  • encoding into a sound data object may target any sample format, but must always come from a PCM source.

Performing an encode or decode operation requires that a codec state object be created first with the IAudioData::createCodecState() function. The codec state allows a sound data object to be decoded multiple times simultaneously without affecting any other instance. Each codec state may only perform operations in a single direction: encoding or decoding. Once the codec state is created, buffers of data may either be received from the decoder or submitted to the encoder (depending on the direction of the codec state). A codec state’s current position may be queried or changed as needed, though the accuracy of the positioning depends on the format; for example, some compressed formats may not support frame-accurate seeking. Once the encode or decode operation is complete, the codec state can be destroyed with IAudioData::destroyCodecState().
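
A decode loop using a codec state might look like the following sketch (the decodeData() signature and descriptor members shown here are assumptions; see the headers for the exact forms):

    #include <vector>

    // Create a decoding codec state for an existing sound data object.
    carb::audio::CodecStateDesc stateDesc = {};
    stateDesc.soundData = sound; // illustrative member name.

    carb::audio::CodecState* state = iData->createCodecState(&stateDesc);

    const size_t kFramesPerChunk = 4096;
    const size_t channelCount = 2; // taken from the sound's format in practice.
    std::vector<float> buffer(kFramesPerChunk * channelCount);

    size_t decoded = 0;
    do
    {
        // Request up to kFramesPerChunk frames of PCM data from the decoder.
        decoded = iData->decodeData(state, buffer.data(), kFramesPerChunk);
        // ...process 'decoded' frames from 'buffer' here...
    } while (decoded == kFramesPerChunk);

    iData->destroyCodecState(state);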

Saving & Output Streams

A sound data object may be written to a file on disk if needed. The asset may be written in its current format, or an optional format change may be applied on write. For some formats, a conversion or re-encoding may be required in order to write the file to disk. Future versions may also allow the file to be written to a blob in memory.

An output stream is similar to saving a sound to file (in fact, an output stream is used internally when saving to file). As with saving to file, an optional format conversion may occur in the output stream. The output stream allows an arbitrary stream of PCM data to be sent to a file. The PCM data is sent in anonymous buffers by ‘writing’ them to the stream. Any conversion to the destination format is performed on each buffer before it is written to the stream. Depending on the destination format, the converted data may not be flushed to disk immediately; it is only guaranteed to be flushed once the output stream is closed.
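
Saving a sound data object to disk with an optional format change might look like this sketch (the save function and descriptor members are approximations; check the carb::audio headers for the exact API):

    // Re-encode an asset to FLAC while writing it to disk.
    carb::audio::SoundDataSaveDesc save = {};
    save.soundData = sound;                         // object to be written.
    save.filename = "output/voice.flac";            // destination file.
    save.format = carb::audio::SampleFormat::eFlac; // optional format change.

    if (!iData->saveToFile(&save))
    {
        // the file could not be opened or the re-encoding failed.
    }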

Event Points & Loop Points

Each sound data object may optionally include a set of caller specified event points or loop points. An event point is a spot in the asset where an event is expected to occur. An event point may or may not be used when playing the sound - that is decided by the caller when playing the sound. An event point consists of the frame number where the event point is expected to be triggered, an arbitrary identifier (used to match it when updating, deleting, or adding event points), a text name (for UI display), optional text, and a user data object. A loop point is a specialized event point that also specifies a region length and optional play index. There is no limit on the number of event points or loop points that a sound data object may contain.

Some asset file formats may include event point information that is parsed out of the file on load. Parsing of these event points is only guaranteed for RIFF/WAV files. However, not many authoring tools are able to create or store this event point information in the file.

Event points and loop points are set on a sound data object using IAudioData::setEventPoints(). These are always set as a group. In a single call, individual event points may be added, deleted, or modified depending on the arbitrary identifier value in the source buffer and the given frame number. Event points may be retrieved either in groups or individually by various criteria (i.e. by identifier, by play index, or by index).
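
Setting a group of event points might look like the following sketch (the EventPoint member names are approximations of the actual struct):

    // Add one plain event point and one loop point in a single call.
    carb::audio::EventPoint points[2] = {};

    points[0].id = 1;                // arbitrary caller-chosen identifier.
    points[0].frame = 48000;         // trigger one second in (at 48kHz).
    points[0].label = "line1";       // text name for UI display.
    points[0].text = "Hello there!"; // optional text (e.g. a caption).

    points[1].id = 2;
    points[1].frame = 96000;
    points[1].length = 24000; // a non-zero region length makes a loop point.
    points[1].playIndex = 1;  // optional play index.

    // Entries whose identifier matches an existing event point modify it;
    // new identifiers add new event points.
    iData->setEventPoints(sound, points, 2);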

When playing a sound data object, event points may be enabled using the fPlayFlagUseEventPoints flag. When this flag is used, a callback function is also expected to be specified in the play descriptor. When one of the sound data object’s event points is hit, the callback will be performed with the VoiceCallbackType::eEventPoint value set in its type parameter. The triggered event point descriptor will be passed in the callback’s data parameter. This callback and event point data may be used to trigger some external action in the program. Note that if the fPlayFlagRealtimeCallbacks flag is also used, the callback should execute and return as quickly as possible; otherwise it will stall the audio processing engine. In general, the callback should just flag that something needs to occur and store any required information for another thread to handle. If the fPlayFlagRealtimeCallbacks flag is not used, the IAudioPlayback::update() function must be called in order for the event point callbacks to be performed.
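
A play call with event point callbacks enabled might look like this sketch. The callback signature and the descriptor’s callback members are approximations, and queueEventForGameThread() is a hypothetical host-app helper:

    // Voice callback; invoked for event points among other callback types.
    carb::audio::AudioResult onVoiceEvent(carb::audio::Voice* voice,
                                          carb::audio::VoiceCallbackType type,
                                          void* data, void* context)
    {
        if (type == carb::audio::VoiceCallbackType::eEventPoint)
        {
            auto* point = static_cast<carb::audio::EventPoint*>(data);

            // With fPlayFlagRealtimeCallbacks, just record the event and
            // return quickly; let another thread do the real work.
            queueEventForGameThread(point->id); // hypothetical helper.
        }
        return carb::audio::AudioResult::eOk; // illustrative return value.
    }

    carb::audio::PlaySoundDesc play = {};
    play.sound = sound;
    play.flags = carb::audio::fPlayFlagUseEventPoints;
    play.callback = onVoiceEvent; // illustrative member names.
    play.callbackContext = nullptr;

    carb::audio::Voice* voice = iPlayback->playSound(context, &play);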

Some examples of using event points may be:

  • apply closed-captioning text to a dialogue sound track. The event point’s EventPoint::text member would contain the text to be displayed or added to a scrolling display. Each new line of text would appear when the sound track’s playback reaches the corresponding event point frame.

  • trigger the next part of an animation that is synchronized to the sound a character is making. When the event point is fired, the character’s animation state would be updated to continue the sequence. This would allow for audio data driven animation sequences without needing to modify code.

  • trigger an in-game cut-scene sequence, such as unlocking a door or starting an automated fight sequence, at a certain point during a voice-over. Again, this would be data driven, so re-recording or editing the dialogue would not require code changes.

Typical Usage

This interface should be used to handle all operations on sound assets. This typically begins by calling IAudioData::createData() to create or load the sound data object from some form of source data (i.e. a file on disk or a blob in memory), or to create an empty buffer. The sound data object is then passed to a playback context to play on a new voice using IAudioPlayback::playSound(). Once playback is finished, the sound data object is released using IAudioData::release().
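
Put together, that typical flow might look like the following sketch (assuming acquired iData and iPlayback interfaces and an existing playback context; descriptor member names are illustrative):

    // Load a sound asset from disk.
    carb::audio::SoundDataLoadDesc loadDesc = {};
    loadDesc.name = "sounds/music.flac";

    carb::audio::SoundData* sound = iData->createData(&loadDesc);

    // Play it on a new voice in an existing playback context.
    carb::audio::PlaySoundDesc playDesc = {};
    playDesc.sound = sound;
    carb::audio::Voice* voice = iPlayback->playSound(context, &playDesc);

    // ...once playback has finished and the object is no longer needed...
    iData->release(sound);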

Another usage scenario would be to convert a sound asset from one format to another. The original sound asset is loaded with IAudioData::createData(). A conversion request is then set up and passed to IAudioData::convert(). This can either create a new sound data object with the conversion result, or replace the data in the original sound data object. The resulting sound data object can then be played, saved to disk, or otherwise operated on. The new and original sound data objects are then released with IAudioData::release() when they are no longer needed. Both IAudioData::release() calls are needed even if the conversion replaces the original object’s contents.

The IAudioData interface can also be used to manually decode a sound data object to raw PCM or encode it to another format from raw PCM. This is done by creating a ‘codec state’ object (with IAudioData::createCodecState()) for an existing sound data object, then either decoding or encoding buffers of data with IAudioData::decodeData() or IAudioData::encodeData().