[ previous ] [ next ] [ threads ]
 To :  <yate@v...>
 From :  "Andrew McDonald" <andrew.mcdonald@n...>
 Subject :  [yate] Incorrect rtp timestamping causes poor audio
 Date :  Thu, 31 Aug 2006 14:34:29 +0800

I recently found a bug in rtp timestamping in Yate which leads to poor audio for (conference) call participants, as it interferes with remote jitter buffers that rely on timestamps. 

The key ingredient that exposes this bug is running in an environment where mixed RTP packets sizes were in use i.e some participants were using 20ms audio and some using 30 ms audio.

I currently have a 1 line alteration to 
void YRTPConsumer::Consume(const DataBlock &data, unsigned long tStamp)
that corrects issue, which I have included in this posting. 

However I wanted to take the time to explain the background to finding this bug as it revealed other potential issues and I think the 1 line change is possibly a "hack",  and may not be the best solution, so my apologies for the long post.....please bear with me ;)

I was testing Yate 1.0 and had connected several parties into a conference. The resulting audio mixes when sent back to participants sounded _very_ poor in a fashion that usually indicates bad jitter. This was the case both for our own endpoints (using the oRtp stack) and also a Sipura SPA-841 handset.

Captures of the rtp stream by Wireshark (Ethereal) showed that no audio was being lost and also that audio was jittered in the fashion typical of simple unbuffered translation of from 30ms to 20ms packet sizes i.e. after 30ms elapsed time a single 20ms packet, then after another 30ms, two 20 ms packets, for a total of 60ms audio in 60ms. This last fact was not surprising as the conference contained participants using each sizing. 

As an aside current Asterisk releases are hardcoded to use 20ms and when passed 30ms audio will introduce similar jitter patterns into the output 20ms stream.

However, there were some puzzling aspects that countered the theory that the jitter was the cause. 

Firstly the observed level of jitter should have been handled by our jitter buffers. In fact increasing the sizes of our jitter buffers, and also those on the Sipura, made no difference to the audio quality. Additionally if the audio was sent to an X-lite UA the audio sounded fine. This was quite puzzling and so we began wondering if the oSip jitter buffer actually had a bug (and the Sipura ?!? ). A variety of tests ruled this out. 

Even more puzzling, if we made direct calls to a 30ms source via Asterisk or even Yate itself, the resulting return audio stream would have the same expected characteristic jitter pattern but would sound perfectly fine on all the endpoints. This also tended to suggest it was not simply jitter that was causing the problem.

It was at this point we noticed that the in the original scenario the timestamps in the rtp stream in the from Yate would move _backwards_ on occasion !!

This can been seen in Wireshark capture file OldTimestamps.cap in the attached Timestamps.zip file. The captures contain the SIP signalling and one of the return rtp streams. I suggest viewing with a colouring rule set (rtp && (ip.len != 200)) to highlight the changes in packet length and resulting timestamps.

For example, from OldTimestamps.cap we have the following sequence of packets;

seq=34		len=160		ts=5440
seq=35		len=240		ts=5760
seq=36		len=80		ts=6000
seq=37		len=160		ts=5920

When we look at this sequence there are actually two things wrong.
Firstly, and most obviously, the timestamp on packet 37 is earlier than packet 36. 

Secondly, and more subtly, the timestamp on packet 35 has advanced by 320 compared to packet 34, even though packet 3 contained only 160 bytes. 

It is this latter fact that hints at what is going wrong. Basically timestamps are being _pre_incremented based on packet size. A further hint in this direction is that packet 1 has timestamp 160. 

(As an aside, you can use the rtp analysis on this stream in WireShark will reveal the typical 30ms->20ms audio jitter pattern.)

This timestamping explains the earlier puzzling results. In the direct calls, to 30ms sources, timestamps do not move backwards. This also suggests that the X-lite jitter buffer uses only sequence numbers and that Wireshark payload exports are ordered strictly by sequence number.

Here is an outline of what was occurring inside Yate. 

Firstly a conference is created with two participants, one using 20ms audio, the other using 30ms audio. 

There are buffering checks at the tail end of;  
void ConfConsumer::Consume(const DataBlock& data, unsigned long tStamp)
and at the beginning of 
void ConfRoom::mix(ConfConsumer* cons) 
which are based on the following constants;

// size of the outgoing data blocks in bytes - divide by 2 to get samples
#define DATA_CHUNK 320
// minimum amount of buffered data when we start mixing
#define MIN_BUFFER 480
// maximum size we allow the buffer to grow
#define MAX_BUFFER 960

Note that these constants mean that mixed audio from the conference will be produced in 20ms (160 sample) chunks. 

I will use the word "chunk" to denote the size of the data buffers being passed internally through the data end-points in Yate to distinguish this from the  actual packet sizes that may ultimately be sent onto the network.

The presence of a 30ms and 20ms source means the current buffering algorithms in the conference will mostly emit 160 samples but sometimes emit 320 or more. The chunks will typically have the jitter pattern mentioned earlier. As an experiment I rewrote the buffering in a form based on a simple jitter buffer and this can mix 20ms and 30ms with little or no jitter. This also works around the problem as it means there very few changes in chunk sizes being passed to the DataSource, and it is these changes in chunk sizes which provokes the timestamping bug.

The chunks from the conference are passed to 
void DataSource::Forward(const DataBlock& data, unsigned long tStamp)
typically with tStamp set to -1.

The following logic is run on the timestamp _prior_ to passing the chunk on via DataConsumer::Consume;

    // no timestamp provided - try to guess
    if (tStamp == (unsigned long)-1) {
	tStamp = m_timestamp;
	const FormatInfo* f = m_format.getInfo();
	if (f)
	    tStamp += f->guessSamples(data.length());

The guessSamples() method will essentially return an increment directly proportional to the number of bytes, for linear audio.
After calling DataConsumer::Consume the new timestamp is then stored in m_timestamp for subseuqnt use in calculating the timestamp of the next chunk.

The key issue with this code is that it is _pre_ incrementing the timestamp. 

From RFC3550, 5.1 RTP Fixed Header Fields

   timestamp: 32 bits
      The timestamp reflects the sampling instant of the first octet in
      the RTP data packet.  The sampling instant MUST be derived from a
      clock that increments monotonically and linearly in time to allow
      synchronization and jitter calculations (see Section 6.4.1). 
      As an example, for fixed-rate audio
      the timestamp clock would likely increment by one for each
      sampling period.  If an audio application reads blocks covering
      160 sampling periods from the input device, the timestamp would be
      increased by 160 for each such block, regardless of whether the
      block is transmitted in a packet or dropped as silent.

      The initial value of the timestamp SHOULD be random, as for the
      sequence number.  Several consecutive RTP packets will have equal
      timestamps if they are (logically) generated at once, e.g., belong
      to the same video frame.  Consecutive RTP packets MAY contain
      timestamps that are not monotonic if the data is not transmitted
      in the order it was sampled, as in the case of MPEG interpolated
      video frames.  (The sequence numbers of the packets as transmitted
      will still be monotonic.)

The key phrase is that the timestamp reflects the sampling instance of the _first_ octet i.e. the relative time for the beginning of the packet. This means that typically for audio, the timestamp of any packet must be moved forward based on the size of the _previous_ packet. The code in DataSource::Forward moves the timestamp forward based on the time of the _current_ chunk. Note also that DataSource forward is (implicitly) initialising the timestamp to 0 when not supplied, whereas RFC3550 recommends initialising it to a random value.

This obviously makes little or no difference when all the chunks are the same size, however to see the problem imagine chunks change sizes (as they can and do). For example a chunk of 160 followed by 320 then 160.

Chunk 1 will generate a timestamp of 160 then use it and save it. Note the fact that this chunk is 160 in size means the _next_ timestamp should be 160 higher.
Chunk 2 will increment the timestamp by 320 then use it and save it. Note this is incorrect as the timestamp should only move forward 160.
Chunk 3  will increment the timestamp by 160 then use it and save it. Note that this is also incorrect as the timestamp should have moved forward by 320. 

The effect of this is that even if all three chunks are sent with zero jitter,  the incorrect timestamps will effectively fool remote jitter buffers into thinking that there is more jitter than is really the case.

This explains the first timestamping error, but not how the timestamps go backwards. 

DataSource::Forward calls 

void YRTPConsumer::Consume(const DataBlock &data, unsigned long tStamp)

via (DataSource::Forward -> DataConsumer::Consume ->  SimpleTranslator::Consume -> DataSource::Forward -> DataConsumer::Consume)
Note that each of these intermediary components tend to contain logic that can operate on timestamps.

In YRTPConsumer::Consume code exists to break large chunks into smaller packets; 

    if (!(m_wrap && m_wrap->bufSize() && m_wrap->rtp()))
    unsigned long delta = (tStamp && m_timestamp) ? (tStamp - m_timestamp) : 0;
    XDebug(&splugin,DebugAll,"YRTPConsumer writing %d bytes, delta=%lu ts=%lu [%p]",
    unsigned int buf = m_wrap->bufSize();
    const char* ptr = (const char*)data.data();
    unsigned int len = data.length();
    // make it safe to break a long octet buffer
    if (len == delta)
	delta = 0;
    while (len && m_wrap && m_wrap->rtp()) {
	unsigned int sz = len;
	if (m_wrap->isAudio() && (sz > buf) && !delta) {
	    DDebug(&splugin,DebugAll,"Creating %u bytes fragment of %u bytes buffer",buf,len);
            sz = buf;
	// if timestamp increment is not provided we have to guess...
	tStamp += delta ? delta : sz;
	len -= sz;
	ptr += sz;

Note that this logic is only activated if the size of the chunk is greater than the value returned from bufSize() on the wrapper (m_wrap) and if the size of the chunk exactly matches the delta in the timestamps.

One issue here is that m_wrap->bufSize() returns a value based on the static value in 
#define BUF_SIZE 240
static int s_bufsize = BUF_SIZE;

This means yrtpchan.cpp defaults to breaking large chunks into 30ms packets !! Thus a 320 byte chunk is broken into a 240 and 80 !!

Basically BUF_SIZE here is mismatched compared to DATA_CHUNK in conference.cpp, with one specifiying 30ms and the other 20ms ?
This leads to emitted packets varing from 160 to 240 and 80.

RFC3550 does _not_ require any specific consistency in packet sizes, but it may be preferable to match these sizes ??

When YRTPConsumer::Consume receives a 320 byte chunk the timestamp has already been (pre) incremented (erroneously) by DataSource::Forward. 
In other words tStamp is already at the value it should take _after_ all the data is sent. YRTPConsumer::Consume sends 240 bytes then increments the timestamp by 80 for the 80 byte packet. Thus resulting in a time on the 80 byte packet that is advanced by 80 over the correct final timestamp. When the next data chunk is passed through the next timestamp calculated is thus 80 _behind_ the timestamp used on the short 80 byte packet. 

My initial attempt at fixing this consisted of changing DataSource::Forward to supply an initial value (of 0, but should be random) when the timestamp is not supplied (-1), pass the timestamp to DataSource::Forward and to increment and store the result _after_ this.  However I was concerned as to whether this would produce regression problems with other functions in the handling the stream that may be (implicitly) relying on the pre-increment.

This change corrected the timestamps and eliminated the bad audio,  however it had the sideeffect of altering the packetisation strategy of YRTPConsumer::Consume. This was because because now len != delta and thus the fragmentation code would not run and basically a single 360 byte packet was sent. Again while this is perfectly legal I was unhappy that my change had this side-effect, especially as code in this function was implicitly relying on the pre-increment. 

Instead I altered YRTPConsumer::Consume to subtract the chunk length from the passed timestamp. 

The diffs for this change was;

>     tStamp -= len; 

This is effectively reversing the pre-increment done by DataSource::Forward and thus eliminates the timestamp problems and retains the previous packetisation i.e. 320 bytes are broken into 240 and 80. The resulting audio stream is in NewTimestamps.cap in the attached zip file. 

While happier that this change might be less likely to have regression effects, from a design point-of-view it is a hack that increases coupling, as one piece of code is effectively compensating for another from a completely different module and function. 

To summarise;  

Current timestamping is incorrect in two ways.

(a) Larger packets will bear a timestamp that is too late, and the subsequent packet one that is too early.
(b) A larger than normal packet followed by a smaller than normal packet will lead to an earlier time on the following packet. 

The timestamping issues can affect remote jitter buffers, leading to inappropriately jittered audio. 
Timestamping errors will occur if chunk sizes change. 
This can occur if conferencing togther sources with different packet sizes. 
The first problem may also occur when directly calling between audio sources of different packet sizes (I have not checked).

It would be useful if someone could advise me what is the appropriate action for the following issues:

(1) Timestamping.  

It seems the best fix for this, from a design point of view would be to alter DataSource::Forward, however I am unsure of how many of the other functions that can perform timestamp operations would also need to change, and whether clients of DataSource::Forward would also be affected. The single line change to YRTPConsumer::Consume minimises this risk for us right now bit seems to be a bit of hack. 

What would be most appropriate ?

(2) Mismatch in default buffer sizes between conference.cpp and yrtpchan.cpp. 

Should they be brought into line with each other ?

(3) Start timestamps from random number. 

Not required, by recommended by RFC. Should this be done ?

(4) Conference audio buffering can probably be altered to reduce/eliminate current jitter in output stream. Is this worthwhile ? 

Once the timestamping is corrected this jitter is readily handled by remote jitter buffers, however a rewritten version would not take more CPU or introduce more latency. 

As mentioned earlier I have tested a very simple rewrite of the code that eliminates almost all introduced jitter.

If this does seem worthwhile, then I would appreciate it if someone could clarify the mathematical goals/invariants of the current scheme as I am having trouble undertanding what it is trying to achieve, compared to a (very simple) jitter buffer.  (???)

Once again apologies for the long post, and I hope it all made sense.


Andrew McDonald
System Architect

> Norwood Systems Australia Pty Ltd
71 Troy Terrace
PO Box 1281
Subiaco, WA 6904

> Tel	+61 8 9380 7766		
> Fax	+61 8 9380 7733  		
> The information in this email, and any attachments, may contain confidential information and is intended solely for the attention and use of the named addressee (s). It must not be disclosed to any person(s) without authorization. If you are not the intended recipient, or a person responsible for delivering it to the intended recipient, you are not authorized to, and must not, disclose, copy, distribute, or retain this message or any part of it. If you have received this communication in error, please notify the sender immediately.