Not Just AstLinux Stuff: 2013

Tuesday, December 10, 2013

High Quality Entropy

This is a (relatively) short post that I’ve been meaning to write for a while. A recent Ars Technica story (and the resulting HN discussion) prompted me to finally sit down and write this.

I’ve been aware of software defined radios for quite some time. The entire concept seemed very cool (especially because I’ve always been interested in radio). Unfortunately they were also somewhat expensive.

The RTL-SDR series of dongles changed all of that. For less than $20 you could enter the exciting world of software defined radio - using GNU Radio and other programs to apply open source and programming to radio. It’s all very cool stuff but I struggled to find a practical application. Then I found rtl_entropy.

This time last year you would never hear words like “encryption”, “cryptography”, and “entropy” on news broadcasts like CNN. Then Edward Snowden started revealing the reach of the National Security Agency’s various programs and since then we’ve been bombarded with a new revelation on an almost weekly basis.

Ideas that would have seemed crazy and paranoid just a year ago now seem remarkably likely. Backdoors in closed source programs. Submarines tapping undersea cables. Data collection on a massive scale. Tampering of hardware by various state agencies (the NSA and counterparts in China, for example).

Just today Ars Technica posted a story about the FreeBSD developers no longer explicitly trusting hardware RNG modules from Intel (RDRAND) and VIA (PadLock). I say explicitly because, unlike Linux (which has always mixed all available sources of entropy), FreeBSD used a single hardware source directly. With all of the questions surrounding the (potential) involvement of the NSA in these designs this probably wasn’t the best idea. In FreeBSD 10 that’s going to change.

However, RDRAND and PadLock were always (supposedly) very good sources of LOTS of entropy. With their use being called into question, what are we to do for applications that require large amounts of system entropy from sources that are readily available and (hopefully) free from tampering?

That’s where rtl_entropy comes in. I first found this project back in August. rtl_entropy uses atmospheric radio noise from an RTL-SDR dongle to generate entropy. Because the RTL-SDR hardware is able to sample at about 3.2 MS/s, this translates into about 3.4 Mbit/s of entropy. That’s a lot, especially when you consider what some high-priced (yet still proprietary and unknown) hardware entropy sources cost.

rtl_entropy is open source and available for inspection. It runs with cheap, mass-produced radio dongles which are (presumably) difficult to “cook” for this purpose. As the author states “it samples atmospheric noise, does Von-Neumann debiasing, runs it through the FIPS 140-2 tests, optionally (-e) does Kaminsky debiasing if it passes the FIPS tests, then writes to the output. It can be run as a daemon which by default writes to a FIFO, which can be read by rngd to add entropy to the system pool”. All of this for less than $20.
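The processing chain in that quote can be sketched in Python. This is a simplified illustration, not rtl_entropy's code. The function names are hypothetical, and simulated biased bits stand in for radio samples. Only the monobit test from the FIPS 140-2 battery is shown, and Kaminsky debiasing is omitted.

```python
import random

def bytes_to_bits(data: bytes) -> list[int]:
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]

def von_neumann_debias(bits: list[int]) -> list[int]:
    """Von Neumann extractor over non-overlapping bit pairs:
    01 -> 0, 10 -> 1, and 00 / 11 are discarded. If the input bits are
    independent with a fixed bias, the output bits are unbiased."""
    out = []
    for i in range(0, len(bits) - 1, 2):
        first, second = bits[i], bits[i + 1]
        if first != second:
            out.append(first)
    return out

def fips_monobit_ok(block: list[int]) -> bool:
    """FIPS 140-2 monobit test: a 20,000-bit block passes when the
    number of ones X satisfies 9725 < X < 10275."""
    if len(block) != 20000:
        raise ValueError("the monobit test needs exactly 20,000 bits")
    return 9725 < sum(block) < 10275

if __name__ == "__main__":
    # Stand-in for radio noise: independent bits that are 1 about 60% of the time.
    rng = random.Random(2013)
    raw = [1 if rng.random() < 0.6 else 0 for _ in range(100_000)]
    clean = von_neumann_debias(raw)
    print(f"{len(raw)} raw bits -> {len(clean)} debiased bits")
    if len(clean) >= 20000:
        print("raw block passes monobit:", fips_monobit_ok(raw[:20000]))
        print("debiased block passes monobit:", fips_monobit_ok(clean[:20000]))
```

Each pair of input bits yields at most one output bit, and only when the two differ. The debiased stream is therefore at most a quarter of the raw bit rate, and less when the input is biased.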

When compared to hardware random number generator devices from the Wikipedia table we can see just how revolutionary this is. With proprietary hardware (and in some cases software), which is still unavailable for inspection, the next cheapest source of high quality entropy is $235 per Mbit/s. With a $15-20 RTL-SDR dongle producing about 3.4 Mbit/s, rtl_entropy comes in at roughly $4-6 per Mbit/s and the entire software stack is open source. As mentioned before, it’s also ready to use by just about any Linux system via rngd.

Also, because Linux always mixes the available sources of entropy, rtl_entropy can be safely combined with various other mechanisms: RDRAND, haveged, aed, etc.
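The simplest possible combiner shows why mixing is safe. The Linux kernel's actual pool mixing is more involved than this. The XOR sketch below only illustrates the property that matters here, and `xor_mix` is a hypothetical helper.

```python
import os

def xor_mix(*sources: bytes) -> bytes:
    """XOR equal-length byte strings together.

    If at least one input is uniformly random and independent of all the
    others, the output is uniformly random too, so adding a weak (but
    independent) source cannot make the result worse. A source that can
    observe the others and adapt its output is not covered by this."""
    if not sources:
        raise ValueError("need at least one source")
    length = len(sources[0])
    if any(len(s) != length for s in sources):
        raise ValueError("all sources must be the same length")
    mixed = bytearray(length)
    for source in sources:
        for i, byte in enumerate(source):
            mixed[i] ^= byte
    return bytes(mixed)

if __name__ == "__main__":
    # Placeholders: os.urandom() stands in for real RDRAND and radio output.
    hw_rng = os.urandom(16)
    radio = os.urandom(16)
    stuck_source = bytes(16)  # a broken source that only emits zeros
    print(xor_mix(hw_rng, radio, stuck_source).hex())
```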

I’m proud my company and I have been able to sponsor Paul and his work on rtl_entropy. This is what open source is all about!

Monday, September 23, 2013

Learning "Stupid NAT Tricks" from Apple

In my last post I spent some time (and many words) describing the what, how, and why behind some of the changes Apple has made to the Facetime protocol over the past three years.

Needless to say I was impressed. There are several things those of us in the open source world can learn from that analysis:

- SIP, STUN, RTP, RTCP port multiplexing (while not including the a=rtcp-mux attribute)

- Compact SIP headers

- SDP minimization (removing rtpmap lines, etc)

- SDP compression

The IETF is currently working on port multiplexing (mainly via WebRTC) but I have yet to see a proposal that includes multiplexing signaling and media on the same port (with the architecture of WebRTC that's not a surprise). Either way I'll consider this "in progress". I guess there's always IAX ;).

Compact SIP headers are already supported by a variety of open source applications. We'll consider this one done.

SDP minimization is something that can't generally be done in a standards compliant way. You'll note I labelled these optimizations as a "slight protocol violation" in my previous post. We'll table this one for now.

SDP compression, however, is very interesting. SIP bears more than a passing resemblance to HTTP, and HTTP compression has been standard and well supported for quite some time. With that in mind I see no reason why it can't become more widely supported for SIP applications as well. It's also important to point out that while I'm calling this "SDP compression" it's really SIP body compression that could be used for any MIME type or combinations of MIME type (multipart).

As we saw from my last post this is no easy task. The parsing of plaintext SDPs is so easy and commonplace today that anything else is a significant undertaking. For compressed SIP bodies to be transparent to me in my everyday life they would need to be supported in at least the following projects/products:

- FreeSWITCH

- Kamailio

- PJSIP

- Wireshark

- HOMER

- Ixia IXLoad (probably not going to happen)

Fortunately I have good news! Shortly after my last post Daniel-Constantin Mierla from Kamailio contacted me and let me know that he had written the gzcompress module for "the fun of a quick coding in the evening". There's still work to be done on this module but this was an awesome (and unexpected) start!

In the same day I was also contacted by ashdnazg from the Pyreshark project. First of all, for a protocol junkie like myself the ability to write Wireshark protocol dissectors on the fly in Python is very, very exciting. I can't believe I hadn't heard of this project before! Anyway, within hours of receiving the Facetime packet capture ashdnazg was able to provide me with a Pyreshark-based Wireshark dissector (in 26 lines of Python, no less)! My local (Ubuntu) instance of Wireshark now looks like this when loading a Facetime packet capture:

Oh yeah!

This is another marvel of open source - within 24 hours of my last post the leaders of two open source projects from across the globe were able to collaborate with me to support the functionality I described here. Amazing.

Of course we're not done yet. I created a bounty for FreeSWITCH to support this functionality as well. Anthony and the boys are very busy but we'll see if I can get any traction on this one.

That leaves PJSIP and HOMER. Seeing as HOMER is based on Kamailio it's probably not that difficult to implement this. I'd also like to think (based on the use case and project goals) that they may be more interested in this than anyone.

Stay tuned!

Friday, September 20, 2013

Apple's new Facetime - a SIP Perspective

A colleague of mine recently sent over a PCAP file containing an Apple Facetime session between two iPhone devices running the just-released iOS 7. Knowing I’m a protocol junkie, he thought I might be interested in seeing it. Clearly that’s the case!

As always (because I’m lazy) I looked around the internet to see what other research people had done on the “Facetime protocol”. I found several excellent (though now somewhat dated) articles. Without giving it away just yet it looks like Apple has been very busy over the last three years (like you needed me to tell you that). As we usually do here, let’s start looking at packets!

At first glance Facetime 2013 resembles any normal SIP capture:

Then some interesting details begin to emerge… What are these unknown packets? What’s up with this INVITE?

Let’s talk about the unknown UDP packets. We’ll select one and force Wireshark to decode it as RTP:

That worked and we can see this is RTP using a dynamic payload type. Our RTP packets are now correctly dissected but Wireshark seems to have confused the SIP and STUN packets for RTP. Why?

Forcing Wireshark to decode a given stream (where “stream” is a SRC IP:PORT pair and a DST IP:PORT pair) attempts to force a decode of that protocol type on all packets belonging to that stream. So what happened here?

If we look closer at the UDP layer we can see that STUN, SIP, and RTP all appear to be using the same port number on each endpoint (16402 in this case). We also have no idea what codec payload type 104 is (we’d need to see the SDP for that). Now’s probably a good time to look at the SIP signaling a bit closer.
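Because everything arrives on one port, the receiver (or Wireshark) has to tell the protocols apart by looking at the first bytes of each datagram. Apple's actual logic isn't visible in the capture, so the following is a hypothetical Python sketch built on published conventions:

- STUN messages begin with two zero bits and carry the magic cookie 0x2112A442 (RFC 5389).
- RTP and RTCP both begin with version 2.
- RTCP packet types land in 192-223 in the second byte (RFC 5761).

Treating a leading SIP method or "SIP/2.0" as SIP is my own assumption.

```python
STUN_MAGIC_COOKIE = b"\x21\x12\xa4\x42"  # RFC 5389
SIP_METHODS = {b"INVITE", b"ACK", b"BYE", b"CANCEL", b"OPTIONS", b"REGISTER",
               b"INFO", b"UPDATE", b"PRACK", b"SUBSCRIBE", b"NOTIFY",
               b"MESSAGE", b"REFER"}

def classify(datagram: bytes) -> str:
    """Guess which protocol a datagram on a shared UDP port carries."""
    # STUN: 20-byte header, first byte 0-3, magic cookie in bytes 4-7.
    if len(datagram) >= 20 and datagram[0] < 4 and datagram[4:8] == STUN_MAGIC_COOKIE:
        return "stun"
    # RTP and RTCP both start with version 2 (top two bits 10).
    if len(datagram) >= 2 and datagram[0] >> 6 == 2:
        # RFC 5761: a second byte of 192-223 indicates an RTCP packet type.
        return "rtcp" if 192 <= datagram[1] <= 223 else "rtp"
    # Assumption: SIP requests start with a method, responses with "SIP/2.0".
    if datagram.startswith(b"SIP/2.0") or datagram.split(b" ", 1)[0] in SIP_METHODS:
        return "sip"
    return "unknown"

if __name__ == "__main__":
    samples = {
        "binding request": b"\x00\x01\x00\x00" + STUN_MAGIC_COOKIE + bytes(12),
        "audio, PT 104": bytes([0x80, 104]) + bytes(10),
        "sender report": bytes([0x80, 200]) + bytes(26),
        "INVITE": b"INVITE sip:user@192.168.231.100 SIP/2.0\r\n",
    }
    for label, data in samples.items():
        print(f"{label}: {classify(data)}")
```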

Apple is using compact SIP headers (interesting). The User-Agent uses the codename for Facetime and GK more than likely implies the use of Apple’s GameKit. That SDP, however, looks a little strange. Certainly not the simple, printable ASCII we’re used to seeing in SIP bodies!

However, not all hope is lost. SIP compact header “e” maps to Content-Encoding. Encoding “deflate” is standard HTTP/1.1 compression: DEFLATE-compressed data (RFC 1951) wrapped in the zlib format (RFC 1950). Wireshark clearly doesn’t support this with SIP so we’ll have to do a little more work…

I saved the SIP message bodies (SDPs) from the INVITE and 200 OK into separate files. Here’s what hexdump had to say about the INVITE:

0000000 da78 4f75 6e41 3083 bc10 f123 3f07 4da0
0000010 36bc e021 0f95 2886 d46d 4224 8a26 057a
0000020 554a 20d4 9010 bfbc a6eb 7352 d641 3d6a
0000030 3b33 f5e3 734d ebdf cbf4 b9db aa6b fd3a
0000040 a62a 1ebc 746e 9c65 eece 76c8 c059 1620
0000050 080b 85a3 b090 8800 6f7c 6dd4 3657 da97
0000060 2af7 373d 6a53 2b93 39c1 4fe5 c29a af7c
0000070 dbd0 8e7d 6b67 c714 bdc3 6a59 00a0 28dd
0000080 721e 0cf5 3fb8 4c59 624d 4c52 92ad dfb8
0000090 323a ee23 0555 ac68 960a 49f2 032e b77c
00000a0 22e8 8737 bac4 9a9e f7cc 5d5a 3f5c 8e9a
00000b0 1841 c170 29ec 9a5b c673 d380 7c7a 1545
00000c0 98b2 057e b082 5410 9ce0 54c3 eaf5 e1d7
00000d0 67d0 f53b 98ca e594 db45 ea5f ab31 e487
00000e0 55d2 2cdf f888 ba7d 6ddf 8494 8e28 5ae5
00000f0 d2c4 c571 8555 75ab effc 2f77 e58e e580
0000100 f3a3 594f 2acd 1b21 0aeb 5467 97da 07d4
0000110 770c 03fc 5b01 3c74
0000118
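A quick check on the first word of the dump shows what kind of data this is. hexdump's default format prints 16-bit words in host byte order, so on a little-endian machine `da78` is really the bytes 0x78 0xDA. That is a zlib header (RFC 1950), not the gzip magic number 0x1F 0x8B, so tools that expect gzip input won't accept it. The helper names below are illustrative.

```python
def hexdump_word_to_bytes(word: str) -> bytes:
    """hexdump's default output prints 16-bit words in host byte order;
    on a little-endian machine the word "da78" is the bytes 0x78 0xDA."""
    return bytes.fromhex(word)[::-1]

def identify_stream(header: bytes) -> str:
    """Classify a compressed stream by its first two bytes."""
    if header[:2] == b"\x1f\x8b":
        return "gzip"
    cmf, flg = header[0], header[1]
    # zlib (RFC 1950): CM must be 8 (deflate) and CMF*256 + FLG must be a multiple of 31.
    if cmf & 0x0F == 8 and (cmf * 256 + flg) % 31 == 0:
        return "zlib"
    return "unknown"

if __name__ == "__main__":
    first = hexdump_word_to_bytes("da78")
    print(first.hex(), identify_stream(first))
```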

Oh, you can’t read that? Yeah, I can’t either. We’ll have to inflate these. Unfortunately no standard utility (gunzip, etc) seemed to want to uncompress these binary blobs so I had to hack up some PHP (first thing I could find with Google, whatever):

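#!/usr/bin/php
<?php
// Read the raw, compressed SIP body saved from the capture
$filename = "./my-sdp";
$handle = fopen($filename, "rb");
if ($handle === false) {
    exit("Unable to open $filename\n");
}
$contents = fread($handle, filesize($filename));
fclose($handle);
// gzuncompress() expects zlib-wrapped DEFLATE data (RFC 1950)
$uncompressed = gzuncompress($contents);
if ($uncompressed === false) {
    exit("$filename is not valid zlib data\n");
}
echo $uncompressed;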
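For comparison, here's a Python sketch of the same round trip using the standard `zlib` module, which handles the same zlib-wrapped format as `gzuncompress()`. It recompresses the INVITE's SDP offer reproduced in this post to show the size savings. The exact byte counts won't match Apple's, since line endings and compressor settings may differ.

```python
import zlib

def inflate_sip_body(raw: bytes) -> str:
    """Python counterpart of PHP's gzuncompress(): zlib-wrapped DEFLATE in, text out."""
    return zlib.decompress(raw).decode("utf-8")

def deflate_sip_body(text: str) -> bytes:
    """Compress a SIP body; level 9 produces the same 0x78 0xDA header seen in the dump."""
    return zlib.compress(text.encode("utf-8"), 9)

SDP_OFFER = "\r\n".join([
    "v=0",
    "o=GKVoiceChatService 0 0 IN IP4 192.168.231.118",
    "s=mobile",
    "c=IN IP4 192.168.231.118",
    "b=AS:2000",
    "t=0 0",
    "a=FLS;VRA:0;MVRA:0;RVRA1:1;AS:2;MS:-1;LTR;CABAC;CR:3;LF:-1;PR;CH:4;AR:4/3,3/4;XR;",
    "a=DMBR",
    "a=CAP",
    "m=audio 16402 RTP/AVP 104 105 106 9 0 124 122 121",
    "a=rtcp:16402",
    "a=fmtp:AAC SamplesPerBlock 480",
    "a=rtpID:3189937293",
    "a=au:65792",
    "a=fmtp:104 sbr;block 480",
    "a=fmtp:105 sbr;block 480",
    "a=fmtp:106 sec;sbr;block 480",
    "a=fmtp:122 sec",
    "a=fmtp:121 sec",
]) + "\r\n"

if __name__ == "__main__":
    # To decode the saved body instead: inflate_sip_body(open("./my-sdp", "rb").read())
    compressed = deflate_sip_body(SDP_OFFER)
    print(f"plain SDP: {len(SDP_OFFER)} bytes, compressed: {len(compressed)} bytes")
    assert inflate_sip_body(compressed) == SDP_OFFER
```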

Which gave me a legible SDP offer from the INVITE:

v=0
o=GKVoiceChatService 0 0 IN IP4 192.168.231.118
s=mobile
c=IN IP4 192.168.231.118
b=AS:2000
t=0 0
a=FLS;VRA:0;MVRA:0;RVRA1:1;AS:2;MS:-1;LTR;CABAC;CR:3;LF:-1;PR;CH:4;AR:4/3,3/4;XR;
a=DMBR
a=CAP
m=audio 16402 RTP/AVP 104 105 106 9 0 124 122 121
a=rtcp:16402
a=fmtp:AAC SamplesPerBlock 480
a=rtpID:3189937293
a=au:65792
a=fmtp:104 sbr;block 480
a=fmtp:105 sbr;block 480
a=fmtp:106 sec;sbr;block 480
a=fmtp:122 sec
a=fmtp:121 sec

Now we can begin to analyze what’s actually happening here. Our INVITE contains a perfectly valid audio offer advertising support for PCMU (payload type 0), 16kHz G722 (payload 9), and six dynamic payload types. The IANA has defined static payload types for RTP. Simply put, we know payload type 0 is PCMU but anything between 96-127 SHOULD (RFC 4566) have a corresponding rtpmap line to map RTP payload type to a “media encoding name”. Looking at this trace alone I don’t know what these RTP payload types are.

However, it’s very likely that the payload types have stayed the same even though Apple has now removed the suggested rtpmap lines. More than likely Apple has “hardcoded” these payload type codec maps internally. We’ll get to why in just a bit.
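Spotting the missing rtpmap lines can be automated. The sketch below is a simplified, hypothetical parser, not a full RFC 4566 implementation. It collects payload types from m= lines and names the two static types that appear in this trace (per RFC 3551). It also lists any dynamic types (96-127) that have no a=rtpmap line.

```python
# Static assignments from the RTP/AVP profile (RFC 3551) that appear in this trace.
STATIC_PT = {0: "PCMU/8000", 9: "G722/8000"}

def analyze_payload_types(sdp: str) -> dict:
    """Split the payload types on m= lines into static ones (0-95) and dynamic
    ones (96-127) that lack an a=rtpmap line. Simplification: rtpmap lines are
    collected across the whole SDP rather than per media section."""
    pts, mapped = [], set()
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> <fmt> ...
            pts.extend(int(fmt) for fmt in line[2:].split()[3:])
        elif line.startswith("a=rtpmap:"):
            mapped.add(int(line[len("a=rtpmap:"):].split()[0]))
    return {
        "static": {pt: STATIC_PT.get(pt, "unknown static") for pt in pts if pt < 96},
        "unmapped_dynamic": [pt for pt in pts if 96 <= pt <= 127 and pt not in mapped],
    }

if __name__ == "__main__":
    offer = "m=audio 16402 RTP/AVP 104 105 106 9 0 124 122 121\r\na=rtcp:16402\r\n"
    print(analyze_payload_types(offer))
```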

The other iPhone 5 running iOS 7 responds with the following SDP answer in the 200 OK:

v=0
o=GKVoiceChatService 0 0 IN IP4 192.168.231.100
s=mobile
c=IN IP4 192.168.231.100
b=AS:2000:2000
t=0 0
a=FLS;VRA:0;MVRA:0;RVRA1:1;AS:2;MS:-1;LTR;CABAC;CR:3;LF:-1;PR;CH:4;AR:4/3,3/4;XR;
a=DMBR
a=CAP
m=audio 16402 RTP/AVP 104 106 121 122
a=rtcp:16402
a=fmtp:AAC SamplesPerBlock 480
a=rtpID:3770747611
a=au:65792
a=fmtp:104 sbr;block 480
a=fmtp:106 sec;sbr;block 480
a=fmtp:121 sec
a=fmtp:122 sec

Using what we know from the 2010 analysis (complete with rtpmap lines) it seems we have agreed to use the AAC_ELD codec at 24kHz and 16kHz sample rates. However, absent analysis of an RTP stream with payloads 121 or 122 (and missing rtpmap lines) it’s hard for me to say what those other payload types represent.
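Comparing the two m= lines side by side makes the negotiation result explicit. `compare_offer_answer` is a hypothetical helper, and the payload type lists are copied from the SDPs above.

```python
def compare_offer_answer(offer_pts: list[int], answer_pts: list[int]):
    """Return (kept, dropped): payload types the answer kept, in the answer's
    order, and offered types the answerer left out."""
    kept = [pt for pt in answer_pts if pt in offer_pts]
    dropped = [pt for pt in offer_pts if pt not in answer_pts]
    return kept, dropped

if __name__ == "__main__":
    offer = [104, 105, 106, 9, 0, 124, 122, 121]  # m= line in the INVITE
    answer = [104, 106, 121, 122]                 # m= line in the 200 OK
    kept, dropped = compare_offer_answer(offer, answer)
    print("kept:", kept, "dropped:", dropped)
```

For this capture the answer keeps 104, 106, 121 and 122. It drops 105, 124, PCMU (0) and G.722 (9), so the call relies entirely on dynamic payload types whose meaning is only known from Apple's (presumably hardcoded) mappings.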

A casual reader may have read this far and thought to themselves: at the end of the day we’re ending up with 24kHz AAC_ELD audio just like we were in 2010. Apple has gone through all of this work for nothing. Oh and by the way, how/why are SIP, STUN, RTP, and RTCP using the same UDP port?

It’s called port multiplexing and it has become all the rage these days (it’s standard in WebRTC, for example - although not to this extent). Unfortunately I’m not able to determine if port multiplexing is new to Facetime 2013 (I’m sure someone will chime in here). Either way it’s an important technical distinction. Typically a SIP session of this type would require at least three UDP ports:

- SIP signaling

- RTP for audio

- RTCP for, well, RTCP

What’s wrong with three ports? Three ports make it much more difficult (if not impossible) to cross some types of NAT devices and firewalls. Three ports and three possibly bad interactions with some firewall or other device. Three times as much exposure to Murphy's law. With port multiplexing Apple has greatly increased the chances that Facetime will work through challenging network environments. With this aggressive (I’ve never seen it before) use of UDP multiplexing for all of these standard protocols Apple has virtually guaranteed that if the signalling works the media and everything else will too.

UPDATE: The always on-point Olle Johansson has postulated that Apple might be multiplexing everything over a single port for greater compatibility with carrier grade NAT (CGN) implementations, which place specific limits on the number of ports used per client. Thanks Olle!

Any SIP engineer will tell you NAT is the bane of our existence. That’s why (even from 2010) Apple has made use of protocols and features such as STUN, ICE, TURN, etc. These are discussed all over the internet so I won’t get into them here but in summary they are all technologies used to traverse NAT devices and network firewalls.

It is very clear Apple has made Facetime (finally) ready for primetime. Let me explain why and how I came to that conclusion. First let’s talk strategy.

Without going into all of the details it is my opinion that Skype became as popular as it is because (like many successes) “it just works”. The main reason Skype “just works” is its almost-magical NAT traversal. The creators of Skype learned a lot from their previous gig defeating firewalls for peer to peer music sharing with Kazaa. Both Skype and Kazaa have an almost legendary reputation for NAT and firewall traversal. If there is a way through a NAT device or firewall they will probably figure it out. It’s with this technology (and timing, codecs, etc) that Skype became so popular. When nothing else worked, Skype would (and still does). It just works (and as we’ve seen that’s hugely valuable).

The more endpoints that “just work” the more Metcalfe’s law comes into effect. If Apple can succeed in making as many endpoints as possible “just work” they have a hugely valuable real-time communications network with Facetime.

A skeptic might point out that Apple is already using STUN, TURN, and ICE. They’ve already got NAT “figured out”. Generally speaking, yes. However, they’ve now taken some extraordinary steps to take their NAT handling (and by extension the “value” of Facetime) to the next level. Apple wants Facetime to work on as many networks as possible and they’ve spent a lot of time making sure of that. Let’s look at the changes from Facetime of 2010 to now:

- SIP, STUN, RTP, RTCP port multiplexing (while not including the a=rtcp-mux attribute)

- Compact SIP headers

- SDP minimization (removing rtpmap lines, etc)

- SDP compression

This clearly isn’t an off the shelf SIP and RTP based solution anymore, but how does it all add up? Also, why are three of these efforts focused on minimizing packet size (even if it means violating standards)?

I know this has been a long and winding road. Hopefully you’re still with me! To understand the value of minimizing packet size you need to peek into another little-known area of the internet: IP fragments. I’m going to butcher a lot of this but at this point you just need to get the broad strokes.

Each network link type is configured for a maximum transmission unit (MTU). For Ethernet this is typically 1500 bytes, meaning the largest IP packet a single Ethernet frame can carry is 1500 bytes. In many cases, however, the various links and links inside of links (encapsulation), etc mean that the effective end-to-end MTU is significantly smaller than that (we won’t get into ATM, etc). There are two ways this can be addressed:

- PMTU (Path MTU discovery)

- IP fragmentation
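The arithmetic behind MTU and fragmentation is simple enough to sketch. This assumes IPv4 without options (20-byte header) and UDP (8-byte header). The function names are hypothetical.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes

def max_udp_payload(mtu: int = 1500) -> int:
    """Largest UDP payload that fits in one IPv4 packet on a link with this MTU."""
    return mtu - IPV4_HEADER - UDP_HEADER

def ipv4_fragments(udp_payload: int, mtu: int = 1500) -> list[int]:
    """IP payload bytes carried by each fragment of one UDP datagram.
    Every fragment but the last must carry a multiple of 8 bytes, because
    the IPv4 fragment offset field counts in 8-byte units."""
    remaining = UDP_HEADER + udp_payload
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    sizes = []
    while remaining > per_fragment:
        sizes.append(per_fragment)
        remaining -= per_fragment
    sizes.append(remaining)
    return sizes

if __name__ == "__main__":
    for mtu in (1500, 1492):
        print(f"MTU {mtu}: up to {max_udp_payload(mtu)} bytes of UDP payload unfragmented")
    print("1473-byte payload at MTU 1500 ->", ipv4_fragments(1473), "bytes per fragment")
```

With a 1500-byte MTU a UDP datagram can carry at most 1472 bytes of payload, and a single extra byte forces a second fragment. On a PPPoE link (1492-byte MTU) the limit drops to 1464. The same boundary appears to line up with the NANOG results further down. There, a 1481-byte ICMP packet becomes a 1501-byte IPv4 packet, one byte over a 1500-byte MTU, and that is where the failure rate jumps.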

With these two technologies who cares about packet size? Firewalls and NAT devices, that’s who. First, many firewall administrators or vendors carelessly block all ICMP packets. Path MTU discovery depends on ICMP “fragmentation needed” messages reaching the sender, so that means PMTU is out. If Apple Facetime sent packets larger than the end-to-end MTU and depended on functioning PMTU discovery there would be many instances where it would not work.

IP fragmentation poses another problem. Once again, many network vendors and firewall administrators outright block IP fragments. To add insult to injury (or is that injury to insult?) many of these devices have broken and/or buggy support for IP fragment reassembly. This especially goes for UDP (which to be fair is not a “connection oriented” protocol). I have personally witnessed many instances where firewall devices could successfully reassemble IP+TCP fragments but not IP+UDP fragments.

This leads to an almost unwinnable situation. Apple could use TCP for the signalling and UDP for media but then they’d lose the benefits of single port multiplexing. They could use TCP over a single port for everything but that would just be crazy. In either case TCP has a larger header anyway (larger header = larger packet for the same amount of data).

This has posed such a longstanding problem that some network operators and academics study the behavior of MTU interactions and IP fragmentation on the internet. It came up most recently on NANOG a few weeks ago. Let’s look at the results of Emile Aben’s mini-study:

Results:

size = ICMP packet size, add 20 for IPv4 packet size
fail% = % of vantage points where 5 packets were sent, 0 were received.
#size   fail%   vantage points
100     0.88    2963
300     0.77    3614
500     0.88    1133
700     1.07    3258
900     1.13    3614
1000    1.04    770
1100    2.04    3525
1200    1.91    3303
1300    1.76    681
1400    2.06    3014
1450    2.53    3597
1470    3.01    2192
1470    3.12    3592
1473    4.96    3566
1475    4.96    3387
1480    6.04    679
1480    4.93    3492 [*]
1481    9.86    3489
1482    9.81    3567
1483    9.94    3118

From these results we can see that failure rates hover around 1% up to about 1000 bytes and roughly double, to about 2%, once packet size exceeds 1000 bytes.

The Facetime 2010 INVITE packet was 1093 bytes. The Facetime 2013 INVITE packet is 714 bytes. In any case there is a greater chance that the 714 byte packet will reach its intended destination. From these results we can also see that you start to run into real trouble once you reach 1400 bytes. Sure that’s twice the size of our current packet but who knows what the future holds for the capabilities of Facetime? Screen sharing, multi-party conferencing, file transfer, etc all depend on larger SDP descriptions. With the changes Apple has made they’ve not only increased the robustness of Facetime today, they’ve given themselves “room to grow” in the future.

Apple has spent some time in the trenches over the last few years and found out how difficult real-time communications can be in the real world. Facetime isn't playing around anymore and Apple is becoming a serious networking (services?) company that's poised to take on current best-of-breed solutions.

If you're interested to see how this might be implemented in various open source projects, check out the update!

Wednesday, February 20, 2013

findpod - Find Your Own "Packets of Death"

After my original post on “packets of death” I’ve spent the last couple of weeks receiving reports from users all over the world ranging from:

- Able to reproduce on 82574L with your “packets of death”
- Not able to reproduce on 82574L or any other controller
- Not able to reproduce with your “packets of death” but experiencing identical, sporadic failures across a wide range of controllers

Of course the last category intrigued me the most. There seem to be an awful lot of people experiencing sporadic failures of their ethernet controllers but many of them (as I noted in another update) don’t have the time or tools to diagnose the issue further. In most cases the symptoms are identical to what I described in my original post:

- Ethernet controller loses link (or reports some other hardware error)
- Varying amounts of time since boot (hours, days, weeks)
- Can only be resolved by a reboot or in some cases a complete power cycle

Many of these users have been dealing with these failures in various ways but have been unable to find a root cause. I’ve created a tool to help them with their diagnosis. It’s called findpod and it’s been tested on various Debian-based Linux distributions. Findpod uses the excellent ifplugd daemon, the venerable tcpdump, and screen.

Once installed and started, ifplugd will patiently wait to receive link status notifications from the Linux kernel. Once link is detected on the target interface it will start a tcpdump session running inside of screen. This tcpdump session will log all packets sent and received on that interface. Here’s the thing - many of these failures are reported after days or weeks of processed traffic - the tcpdump capture file could easily reach several gigabytes or more! Here’s where one key trick in findpod comes into play - by default findpod will only log the last 100MB sent or received on the target interface. As long as ifplugd doesn’t report any link failures tcpdump will keep writing to the same 100MB circular capture file.
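findpod leaves the capture itself to tcpdump, so the following is not findpod's code. (tcpdump's `-C` and `-W` options can rotate through a fixed set of files.) It's an in-memory Python sketch of the same "keep only the last N bytes" policy, and `RingCapture` is a hypothetical name.

```python
from collections import deque

class RingCapture:
    """Keep the most recent packets whose combined size fits in max_bytes,
    evicting the oldest first. The newest packet is always kept, even if it
    alone exceeds the budget."""

    def __init__(self, max_bytes: int = 100 * 1024 * 1024):  # ~100MB, like findpod's default
        self.max_bytes = max_bytes
        self.packets = deque()
        self.total = 0

    def add(self, packet: bytes) -> None:
        self.packets.append(packet)
        self.total += len(packet)
        while self.total > self.max_bytes and len(self.packets) > 1:
            self.total -= len(self.packets.popleft())

    def last(self):
        """The most recently captured packet, or None if nothing was captured."""
        return self.packets[-1] if self.packets else None

if __name__ == "__main__":
    ring = RingCapture(max_bytes=3000)
    for n in range(10):
        ring.add(bytes([n]) * 1000)  # ten 1000-byte dummy packets
    print(len(ring.packets), "packets kept, newest starts with", ring.last()[0])
```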

What happens when the interface loses link? This is the second unique feature of findpod. When ifplugd reports a loss of link it will wait for 30 seconds before stopping the packet capture and moving the capture file to a meaningful (and known) name. If you think your ethernet controller failures could be related to the types of traffic you’re sending or receiving (as I discovered with my “packets of death”) findpod will help you narrow it down to (at most) 100MB of network traffic, even if the capture runs for weeks and your interface handles GBs of data!
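The link-down behavior can be sketched the same way. Everything here is hypothetical, including the handler name and the `findpod-<iface>-linkdown-<timestamp>.pcap` naming scheme. `stop_capture`, `sleep` and `now` are passed in so the logic can be exercised without ifplugd or tcpdump.

```python
import os
import tempfile
import time
from datetime import datetime

def on_link_down(iface, capture_path, stop_capture, delay=30,
                 sleep=time.sleep, now=datetime.now):
    """Hypothetical ifplugd 'down' handler modeled on findpod's description:
    keep capturing for `delay` seconds, stop the capture, then move the
    capture file to a predictable, timestamped name next to the original."""
    sleep(delay)
    stop_capture()
    stamp = now().strftime("%Y%m%d-%H%M%S")
    target = os.path.join(os.path.dirname(capture_path),
                          f"findpod-{iface}-linkdown-{stamp}.pcap")
    os.replace(capture_path, target)
    return target

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        live = os.path.join(tmp, "current.pcap")
        open(live, "wb").close()
        saved = on_link_down("eth0", live, stop_capture=lambda: None,
                             sleep=lambda seconds: None)
        print("capture saved as", os.path.basename(saved))
```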

Of course (more than likely) it’s even easier than that: if your link loss is being caused by a specific received packet it will be the last packet in the capture file provided by findpod and you’ll only have a 100MB capture file to work with. If your issue is anything like mine you should be able to isolate it down to a specific packet that you can feed to tcpreplay, reproducing your controller issue on demand.
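Pulling that last packet out of a capture takes only a few lines. This sketch assumes a classic libpcap file (the format `tcpdump -w` writes), not pcapng. It keeps the global header and the final complete record, producing a one-packet file for tcpreplay. `last_packet_pcap` is a hypothetical helper.

```python
import struct
import sys

PCAP_MAGICS = {0xA1B2C3D4, 0xA1B23C4D}  # microsecond and nanosecond timestamp variants

def last_packet_pcap(data: bytes) -> bytes:
    """Return a new pcap file containing only the final complete packet of `data`."""
    if len(data) < 24:
        raise ValueError("too short for a pcap global header")
    for endian in ("<", ">"):  # the magic number reveals the file's byte order
        if struct.unpack(endian + "I", data[:4])[0] in PCAP_MAGICS:
            break
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    offset, last = 24, None
    while offset + 16 <= len(data):
        # 16-byte record header: ts_sec, ts_usec, incl_len, orig_len
        incl_len = struct.unpack(endian + "I", data[offset + 8:offset + 12])[0]
        end = offset + 16 + incl_len
        if end > len(data):
            break  # truncated final record, e.g. the capture stopped mid-write
        last = data[offset:end]
        offset = end
    if last is None:
        raise ValueError("no complete packet records found")
    return data[:24] + last

if __name__ == "__main__":
    # Usage: python last_packet.py capture.pcap last.pcap
    if len(sys.argv) == 3:
        with open(sys.argv[1], "rb") as src, open(sys.argv[2], "wb") as dst:
            dst.write(last_packet_pcap(src.read()))
```

The resulting file could then be sent toward the affected interface from another machine, for example with `tcpreplay -i eth0 last.pcap`, where `eth0` is a placeholder.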

Please tell me about your experiences with findpod. As always, comments and suggestions are welcome!

Friday, February 8, 2013

Packets of Death - UPDATE

UPDATE - Intel has pointed me towards the successor to the 82574, which includes some of the features I suggest here. I think it's safe to say we're all looking forward to this chip hitting the streets!

The last 48 hours have been interesting, to say the least.

My original post gathered much more attention than I originally thought. I’ll always remember being on a conference call and having someone tell me “Hey, you’re on Slashdot”. Considering the subject matter I suppose I should have expected that.

As of today, here’s what I know:

Many of you have shared the results of your testing. The vast majority of tested Intel ethernet controllers do not appear to be affected by this issue.

Intel has responded with an expanded technical explanation of the issue. I also received a very pleasant and professional phone call from Douglas Boom (at Intel) to update me on their assessment of the situation and discuss any ongoing concerns or problems I may have. Thank you Doug and well done Intel! Note to other massive corporations that could be presented with issues like this: do what Intel did.

To summarize their response, Intel says that a misconfigured EEPROM caused this issue. The EEPROM is written by the motherboard manufacturer and not Intel. Intel says my motherboard manufacturer did not follow their published guidelines for this process. Based on what I’ve seen, how my issue was fixed, and what I’m learning from the crowdsourced testing process this seems like a perfectly plausible explanation. Once again, thanks Intel!

However, I still don’t believe this issue is completely isolated to this specific instance and one motherboard manufacturer. For one, I have received at least two confirmed reports from people who were able to reproduce this issue - my “packet of death” shutting down 82574L hardware from different motherboard manufacturers. This doesn’t surprise me at all.

One thing we’re reminded of in this situation is just how complex all of these digital systems have become. We’re a long way from configuring ethernet adapters with IRQ jumpers. Intel has designed an incredibly complex ethernet controller - the datasheet is 490 pages long! Of course I’m not faulting them for this - the features available in this controller (or many other controllers) dictate this level of complexity - management/IPMI features, WOL, various offloading mechanisms, interrupt queues, and more. This complexity doesn’t even scratch the surface of the various other systems involved in getting data across the internet and into your eyeballs!

Like any sufficiently advanced product all of these features are driven by a configuration mechanism. The Linux kernel module for the 82574L (e1000e) has various options that can be passed to modify the behavior of the adapter. Makes sense. If I passed some stupid or unknown parameter to this module I would expect it to return with some kind of error informing me of my mistake. I’m only human, mistakes are going to happen.

At a lower level Intel has exposed these EEPROM configuration parameters to control various aspects of the controller. As Intel says these EEPROM values are to be set by the motherboard manufacturer. Here’s where the problem lies - it’s certainly possible this could be done incorrectly. Motherboard manufacturers are human and they make mistakes too.

Unfortunately, as we’ve learned in this case, there isn’t quite the same level of feedback when EEPROM misconfigurations happen. In my previous example if I pass unknown parameters to a kernel module it’s going to come back and say “Hey - I don’t know what that is (dummy)” and exit with an error.

As I’ve shown in some cases (mine) if an EEPROM is misconfigured everything appears normal until some insanely specific packet is received. Then the controller shuts down, for some reason.

Does that behavior make sense to anyone?

I suggest the following:

1) Make future controllers have as much in-hardware sane behavior as possible when unknown conditions are encountered. Error checking, basically. Users can input data on a web form, that’s why there’s error checking. Everyone knows users do stupid things. Clearly some of the people programming Intel EEPROMs for motherboard OEMs do stupid things too. What is sane default behavior? EEPROM error encountered = adapter shutdown and error message. Give the user notification and provide some mechanism for EEPROM validation and management...

2) Put more EEPROM validation in operating system drivers. Intel maintains ethernet drivers for various platforms. Why aren’t these drivers doing more validation of the adapter EEPROM? If my EEPROM was so badly misconfigured, why couldn’t the e1000e module have discovered that and notified me?

3) Produce and support an external tool for EEPROM testing, programming, and updating. In the course of working with Intel last fall I was provided a version of this tool for my testing so I know it exists. While I can understand why you don’t want random users messing with their EEPROM (and causing potential support nightmares) it seems the benefits would clearly outweigh any potential problems (of which there are already plenty).
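For what it's worth, Linux's e1000e driver already performs one basic EEPROM check when it loads. The 16-bit sum of NVM words 0x00 through 0x3F must equal 0xBABA, and the driver refuses the adapter if it doesn't. The simplified sketch below reproduces that arithmetic on an in-memory image, and the helper names are hypothetical. It also shows why a checksum alone can't catch a problem like mine: any misconfiguration passes as long as the checksum word is recomputed afterwards.

```python
NVM_CHECKSUM_REG = 0x3F  # word index holding the checksum (as in e1000e)
NVM_SUM = 0xBABA         # words 0x00-0x3F must sum to this, modulo 2**16

def nvm_checksum_ok(words: list[int]) -> bool:
    """True if the first 64 16-bit words sum to NVM_SUM (mod 2**16)."""
    if len(words) <= NVM_CHECKSUM_REG:
        raise ValueError("need at least 64 16-bit EEPROM words")
    return (sum(words[:NVM_CHECKSUM_REG + 1]) & 0xFFFF) == NVM_SUM

def with_valid_checksum(words: list[int]) -> list[int]:
    """Return a copy whose checksum word makes the image pass the check.
    Operates on an in-memory list only; it never touches real hardware."""
    fixed = list(words)
    partial = sum(fixed[:NVM_CHECKSUM_REG]) & 0xFFFF
    fixed[NVM_CHECKSUM_REG] = (NVM_SUM - partial) & 0xFFFF
    return fixed

if __name__ == "__main__":
    image = with_valid_checksum([0] * 64)
    print("blank image passes:", nvm_checksum_ok(image))
    image[0x10] = 0x1234  # change a hypothetical configuration word
    print("after edit:", nvm_checksum_ok(image))
    print("after recomputing checksum:", nvm_checksum_ok(with_valid_checksum(image)))
```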

The reality is Intel has no idea how many systems are affected by this issue or could be affected by issues like it. How could they? They’re expecting motherboard OEMs to follow their instructions (and understandably so). Just look at the combination of variables required to reproduce this issue:

- Intel 82574L
- Various specific misconfigured bytes in the EEPROM
- An insanely specific packet with the right value at just the right byte, received at a specific time

While most people weren’t able to reproduce this issue with their controller and EEPROM combinations it did kick off various discussions of periodic, random, sporadic failures across a wide range of ethernet adapters and general computing weirdness. A quick Google search returns a wide assortment of complaints with these adapters (and others like it) from a whole slew of users. EEPROM corruption. Random adapter resets. Packet drops. Various latency issues. PCI bus scan errors. ASPM problems. The list goes on and on.

Perhaps the “packet of death” for a slightly misconfigured Intel 82579 (for example) is my packet shifted 20 bytes in one direction or the other. Who knows? Please, please, please Intel - let’s do everyone a favor and get these EEPROMs under control. End users update firmware all of the time - routers, set-top boxes, sometimes even their cars! Why can’t we have some utility to make sure our ethernet adapters aren’t just waiting to freak out when they receive the wrong packet?

I don’t believe in magic, swamp gas, sun spots, or any of the other “explanations” offered for some of the random strange behavior we often see with these complex devices (ethernet adapters or otherwise). That’s why I spent so long working on this issue to find a root cause (well that and screaming customers). I, like anyone else, encounter bugs and general weirdness in devices and software every day in my life. Most of the time how do I respond to these bugs? I reboot, shrug my shoulders, say “that was weird”, and move on. Meanwhile I know, deep down, that there is a valid explanation for what just happened. Just like there was with my ethernet controllers.

Even with the explanation offered by Intel we could go much deeper. Why these bytes at that location? Why this packet? What’s up with the “inoculation” effect of some of the values? There are still many unanswered questions.

I’ve enjoyed reading many others’ tales of “extreme debugging” with the digital devices in their lives. It seems I’m not the only one who isn’t always satisfied with saying “that was weird” and moving on.

I've said it before and I'll say it again - I love the internet!

Wednesday, February 6, 2013

Packets of Death

When you're done reading this, check out my update! Experiencing similar ethernet controller issues but don't know where to start?

UPDATE: See the packets and find out if you're affected here.

UPDATE 2: Yes, I've reproduced this issue regardless of OS, ASPM state/settings, or software firewall settings. Obviously if you have a layer 2/3 firewall in front of an affected interface you'll be ok.

Packets of death. I started calling them that because that’s exactly what they are.

Star2Star has a hardware OEM that has built the last two versions of our on-premise customer appliance. I’ll get more into this appliance and the magic it provides in another post. For now let’s focus on these killer packets.

About a year ago we released a refresh of this on-premise equipment. It started off simple enough, pretty much just standard Moore’s Law stuff. Bigger, better, faster, cheaper. The new hardware was 64-bit capable, had 8X as much RAM, could accommodate additional local storage, and had four Intel (my preferred ethernet controller vendor) gigabit ethernet ports. We had (and have) all kinds of ideas for these four ports. All in all it was pretty exciting.

This new hardware flew through performance and functionality testing. The speed was there and the reliability was there. Perfect. After this extensive testing we slowly rolled the hardware out to a few beta sites. Sure enough, problems started to appear.

All it takes is a quick Google search to see that the Intel 82574L ethernet controller has had at least a few problems, including (but not necessarily limited to) EEPROM issues, ASPM bugs, MSI-X quirks, etc. We spent several months dealing with each and every one of these. We thought we were done.

We weren’t. It was only going to get worse.

I thought I had the perfect software image (and BIOS) developed and deployed. However, that’s not what the field was telling us. Units kept failing. Sometimes a reboot would bring the unit back; usually it wouldn’t. When the unit was shipped back, however, it would work when tested.

Wow. Things just got weird.

The weirdness continued and I finally got to the point where I had to roll my sleeves up. I was lucky enough to find a very patient and helpful reseller in the field to stay on the phone with me for three hours while I collected data. This customer location, for some reason or another, could predictably bring down the ethernet controller with voice traffic on their network.

Let me elaborate on that for a second. When I say “bring down” an ethernet controller I mean BRING DOWN an ethernet controller. The system and ethernet interfaces would appear fine and then after a random amount of traffic the interface would report a hardware error (lost communication with PHY) and lose link. Literally the link lights on the switch and interface would go out. It was dead.

Nothing but a power cycle would bring it back. Attempting to reload the kernel module or reboot the machine would result in a PCI scan error. The interface was dead until the machine was physically powered down and powered back on. In many cases, for our customers, this meant a truck roll.

While debugging with this very patient reseller I started stopping the packet captures as soon as the interface dropped. Eventually I caught on to a pattern: the last packet out of the interface was always a 100 Trying provisional response, and it was always a specific length. Not only that, I ended up tracing this (Asterisk) response to a specific phone manufacturer’s INVITE.

I got off the phone with the reseller, grabbed some guys and presented my evidence. Even though it was late in the afternoon on a Friday, everyone did their part to scramble and put together a test configuration with our new hardware and phones from this manufacturer.

We sat there, in a conference room, and dialed as fast as our fingers could. Eventually we found that we could duplicate the issue! Not on every call, and not on every device, but every once in a while we could crash the ethernet controller. Other times we couldn’t crash it at all. After a power cycle we’d try again and hit it. Either way, as anyone who’s tried to diagnose a technical issue knows, the first step is duplicating the problem. We were finally there.

Believe me, it took a long time to get here. I know how the OSI stack works. I know how software is segmented. I know that the contents of a SIP packet shouldn’t do anything to an ethernet adapter. It just doesn’t make any sense.

Between packet captures on our device and packet captures from the mirror port on the switch we were finally able to isolate the problem packet. Turns out it was the received INVITE, not the transmitted 100 Trying! The mirror port capture never saw the 100 Trying hit the wire.

Now we needed to look at this INVITE. Maybe the userspace daemon processing the INVITE was the problem? Maybe it was the transmitted 100 Trying? One of my colleagues suggested we shut down the SIP software and see if the issue persisted. No SIP software running, no transmitted 100 Trying.

First we needed a better way to transmit the problem packet. We isolated the INVITE transmitted from the phone and used tcpreplay to play it back on command. Sure enough it worked. Now, for the first time in months, we could shut down these ports on command with a single packet. This was significant progress and it was time to go home, which really meant it was time to set this up in the lab at home!
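tcpreplay did the actual replaying; the step before that is pulling the one offending frame out of a capture so it can be saved and fed back on its own. A minimal sketch of that extraction, assuming a classic libpcap file (not pcapng) already read into memory; `iter_pcap_frames` is a hypothetical helper, not part of tcpreplay or any library:

```python
import struct
from typing import Iterator

# Classic libpcap magic numbers: microsecond and nanosecond timestamp variants.
PCAP_MAGICS = {0xA1B2C3D4, 0xA1B23C4D}


def iter_pcap_frames(data: bytes) -> Iterator[bytes]:
    """Yield the raw link-layer frames stored in a classic pcap capture."""
    if len(data) < 24:
        raise ValueError("too short for a pcap global header")
    # The magic number tells us the byte order the file was written in.
    for endian in ("<", ">"):
        (magic,) = struct.unpack(endian + "I", data[:4])
        if magic in PCAP_MAGICS:
            break
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")

    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(data):
        # Per-record header: ts_sec, ts_usec/nsec, captured length, original length.
        _ts_sec, _ts_frac, incl_len, _orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16]
        )
        offset += 16
        yield data[offset:offset + incl_len]
        offset += incl_len
```

Picking, say, the Nth frame from the list this yields gives the exact bytes that went over the wire, which is what a replay tool needs to reproduce the failure on demand.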

Before I go any further I need to give another shout out to an excellent open source piece of software I found. Ostinato turns you into a packet ninja. There’s literally no limit to what you can do with it. Without Ostinato I never could have gotten beyond this point.

With my packet Swiss army knife in hand I started poking and prodding. What I found was shocking.

It all starts with a strange SIP/SDP quirk. Take a look at this SDP:

v=0
o=- 20047 20047 IN IP4 10.41.22.248
s=SDP data
c=IN IP4 10.41.22.248
t=0 0
m=audio 11786 RTP/AVP 18 0 18 9 9 101
a=rtpmap:18 G729/8000
a=fmtp:18 annexb=no
a=rtpmap:0 PCMU/8000
a=rtpmap:18 G729/8000
a=fmtp:18 annexb=no
a=rtpmap:9 G722/8000
a=rtpmap:9 G722/8000
a=fmtp:101 0-15
a=rtpmap:101 telephone-event/8000
a=ptime:20
a=sendrecv

Yes, I saw it right away too. The audio offer is duplicated, and that’s a problem, but again, what difference should that make to an ethernet controller?!? Well, if nothing else it makes the ethernet frame larger...
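Spotting the duplication by eye is easy here, but a quick check makes it mechanical. A minimal sketch, assuming the SDP has a single media section (it treats the whole body as one block); `find_sdp_duplicates` is a hypothetical helper:

```python
from collections import Counter


def find_sdp_duplicates(sdp: str) -> dict:
    """Report payload types repeated on m= lines and a= lines that appear more than once."""
    lines = [ln.strip() for ln in sdp.strip().splitlines() if ln.strip()]

    dup_payloads = {}
    for ln in lines:
        if ln.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...: payload types start at the 4th field.
            fields = ln[2:].split()
            counts = Counter(fields[3:])
            dup_payloads[fields[0]] = sorted(f for f, c in counts.items() if c > 1)

    attr_counts = Counter(ln for ln in lines if ln.startswith("a="))
    dup_attrs = sorted(a for a, c in attr_counts.items() if c > 1)
    return {"payload_types": dup_payloads, "attributes": dup_attrs}
```

Run against the SDP above, it flags payload types 18 and 9 on the audio line, plus the repeated `a=rtpmap:18`, `a=fmtp:18`, and `a=rtpmap:9` lines.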

But wait, there were plenty of successful ethernet frames in these packet captures. Some of them were smaller, some were larger. No problems with them. It was time to dig into the problem packet. After some more Ostinato-fu and plenty of power cycles I was able to isolate the problem pattern (with a problem frame).

Warning: we’re about to get into some hex.

The interface shutdown is triggered by a specific byte value at a specific offset. In this case the specific value was hex 32 at 0x47f. Hex 32 is an ASCII 2. Guess where the 2 was coming from?

a=ptime:20

All of our SDPs were identical (including ptime, obviously). All of the source and destination URIs were identical. The only differences were the Call-IDs, tags, and branches. Problem packets had just the right Call-ID, tags, and branches to cause the “2” in the ptime to line up with 0x47f.

BOOM! With the right Call-IDs, tags, and branches (or any random garbage) a “good packet” could turn into a “killer packet” as long as that ptime line ended up at the right address. Things just got weirder.
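The alignment is plain arithmetic. Assuming SIP over UDP in an untagged Ethernet II frame carrying IPv4 with no options, there are 14 + 20 + 8 = 42 bytes of headers ahead of the SIP message, so the ptime digit sits at 42 plus its index in the SIP payload, and every extra byte of Call-ID, tag, or branch earlier in the message pushes it one byte later. A minimal sketch for checking a captured payload; the function names are hypothetical, and a VLAN tag or IP options would change the header total:

```python
from typing import Optional

TRIGGER_OFFSET = 0x47F  # frame offset reported in this post (1151 decimal)
# Assumption: untagged Ethernet II (14) + IPv4 without options (20) + UDP (8).
HEADERS = 14 + 20 + 8
PTIME = b"a=ptime:"


def ptime_digit_offset(sip_payload: bytes) -> Optional[int]:
    """Frame offset of the first digit after 'a=ptime:', or None if there is none."""
    idx = sip_payload.find(PTIME)
    if idx < 0:
        return None
    return HEADERS + idx + len(PTIME)


def digit_at_trigger(sip_payload: bytes) -> Optional[int]:
    """The ptime digit's byte value if it lands exactly on TRIGGER_OFFSET, else None."""
    off = ptime_digit_offset(sip_payload)
    if off != TRIGGER_OFFSET:
        return None
    idx = off - HEADERS
    return sip_payload[idx] if idx < len(sip_payload) else None
```

With 1101 bytes of SIP text before `a=ptime:20`, the “2” (0x32) lands exactly on 0x47f; one byte fewer and it misses.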

While generating packets I experimented with various hex values. As if this problem couldn’t get any weirder, it did. I found that the behavior of the controller depended entirely on the value at this specific offset in the first received packet long enough to contain that offset. It broke down to something like this:

Byte 0x47f = 31 HEX (1 ASCII) - No effect
Byte 0x47f = 32 HEX (2 ASCII) - Interface shutdown
Byte 0x47f = 33 HEX (3 ASCII) - Interface shutdown
Byte 0x47f = 34 HEX (4 ASCII) - Interface inoculation

When I say “no effect” I mean it didn’t kill the interface but it didn’t inoculate the interface either (more on that later). When I say the interface shutdown, well, remember my description of this issue - the interface went down. Hard.

With even more testing I discovered this issue with every version of Linux I could find, FreeBSD, and even when the machine was powered up complaining about missing boot media! It’s in the hardware; the OS has nothing to do with it. Wow.

To make matters worse, using Ostinato I was able to craft various versions of this packet: an HTTP POST, ICMP echo-request, etc. Pretty much whatever I wanted. With a modified HTTP server configured to place the right byte value at the right offset (based on headers, host, etc.) you could easily configure an HTTP 200 response to contain the packet of death, and kill client machines behind firewalls!

I know I’ve been pointing out how weird this whole issue is. The inoculation part is by far the strangest. It turns out that if the first packet received contains any value (that I can find) other than 1, 2, or 3 the interface becomes immune from any death packets (where the value is 2 or 3). Also, valid ptime attributes are defined in ~~powers~~ (edit: multiples) of 10 - 10, 20, 30, 40. Depending on Call-ID, tag, branch, IP, URI, etc (with this buggy SDP) these valid ptime attributes line up perfectly. Really, what are the chances?!?
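To make the first-packet behavior concrete, here is a toy state model of the observations in the table and the paragraph above. It is not a model of the controller’s internals. Two readings are assumptions: that “first packet” means the first frame long enough to reach 0x47f, and that a 0x31 (“no effect”) leaves the interface still susceptible to a later frame. `State`, `step`, and `run` are hypothetical names:

```python
from enum import Enum
from typing import Iterable

TRIGGER_OFFSET = 0x47F  # byte offset reported in this post


class State(Enum):
    SUSCEPTIBLE = "susceptible"
    DEAD = "dead"              # link lost; only a power cycle brought it back
    INOCULATED = "inoculated"  # immune to later death packets


def step(state: State, frame: bytes) -> State:
    """Advance the modeled interface state by one received frame."""
    if state is not State.SUSCEPTIBLE or len(frame) <= TRIGGER_OFFSET:
        return state  # dead/inoculated stick; short frames never reach the byte
    value = frame[TRIGGER_OFFSET]
    if value == 0x31:          # ASCII "1": no effect (assumed: still susceptible)
        return State.SUSCEPTIBLE
    if value in (0x32, 0x33):  # ASCII "2" or "3": interface shutdown
        return State.DEAD
    return State.INOCULATED    # any other value found so far: immune


def run(frames: Iterable[bytes]) -> State:
    state = State.SUSCEPTIBLE
    for frame in frames:
        state = step(state, frame)
    return state
```

Since a ptime of 10, 20, 30, or 40 starts with 0x31, 0x32, 0x33, or 0x34, every one of those common values falls on exactly the bytes in the table.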

All of a sudden it’s become clear why this issue was so sporadic. I’m amazed I tracked it down at all. I’ve been working with networks for over 15 years and I’ve never seen anything like this. I doubt I’ll ever see anything like it again. At least I hope I don’t...

I was able to get in touch with two engineers at Intel and send them a demo unit to reproduce the issue. After working with them for a couple of weeks they determined there was an issue with the EEPROM on our 82574L controllers.

They were able to provide new EEPROM contents and a tool to write them out. Unfortunately we weren’t able to distribute this tool, and it required unloading and reloading the e1000e kernel module, so it wouldn’t be preferred in our environment. Fortunately (with a little knowledge of the EEPROM layout) I was able to work up some bash scripting and ethtool magic to save the “fixed” EEPROM values and write them out on affected systems. We now have a way to detect and fix these problematic units in the field. We’ve communicated with our vendor to make sure this fix is applied to units before they are shipped to us. What isn’t clear, however, is just how many other affected Intel ethernet controllers are out there.
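The bash and ethtool scripts aren’t shown here, so this is not them. As a minimal sketch of the “detect” half, assuming the EEPROM has been dumped to raw bytes (for example with `ethtool -e <iface> raw on`) and a known-good reference image is on hand, one could compare 16-bit words and validate the checksum. The rule that words 0x00 through 0x3F sum to 0xBABA is the convention the Linux e1000e driver uses for this controller family (its `NVM_SUM`); treat it as an assumption to confirm for your part. Writing values back is deliberately left out, and all names below are hypothetical:

```python
import struct
from typing import List, Tuple

NVM_SUM = 0xBABA       # assumed checksum total (e1000e convention)
CHECKSUM_WORDS = 0x40  # words 0x00..0x3F; word 0x3F holds the checksum


def words(image: bytes) -> List[int]:
    """Split a raw EEPROM dump into little-endian 16-bit words."""
    if len(image) % 2:
        raise ValueError("EEPROM image must have an even number of bytes")
    return list(struct.unpack("<%dH" % (len(image) // 2), image))


def checksum_ok(image: bytes) -> bool:
    w = words(image)
    if len(w) < CHECKSUM_WORDS:
        raise ValueError("image too small to hold the checksummed region")
    return sum(w[:CHECKSUM_WORDS]) & 0xFFFF == NVM_SUM


def diff_words(dump: bytes, reference: bytes) -> List[Tuple[int, int, int]]:
    """(word_offset, dumped_value, reference_value) for every word that differs."""
    a, b = words(dump), words(reference)
    if len(a) != len(b):
        raise ValueError("images differ in size")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def fix_checksum(image: bytes) -> bytes:
    """Recompute the checksum word so the checksummed region sums to NVM_SUM."""
    w = words(image)
    if len(w) < CHECKSUM_WORDS:
        raise ValueError("image too small to hold the checksummed region")
    w[CHECKSUM_WORDS - 1] = (NVM_SUM - sum(w[:CHECKSUM_WORDS - 1])) & 0xFFFF
    return struct.pack("<%dH" % len(w), *w)
```

A non-empty `diff_words` result flags a unit for repair; any patched image needs its checksum word recomputed before it is written back, or the driver will reject it.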

I guess we’ll just have to see...

Not Just AstLinux Stuff