There is a decent amount of documentation out there that details all of the tunable parameters on the Mac OSX IP stack. However, most of these documents either provide basic suggestions without much background on a particular setting, or they discuss some of the implications of changing certain parameters but don’t give you very solid guidance or recommendations on the best configuration in a particular scenario. Many of the parameters are dependent upon others, so the configuration should be approached with that in mind. This document applies to OSX 10.5 Leopard, 10.6 Snow Leopard, 10.7 Lion, and 10.8 Mountain Lion.
The closest thing I have found to a full discussion and tutorial on the topic can be found here. Unfortunately, that link is now offline. For now, you can at least find it in the Wayback Machine archive here. Thank you to martineau(at)linxure.net for pointing this out.
The above document was a great reference to bookmark, but I thought I would also include my own thoughts on this topic to shed some additional light on the subject.
LAST UPDATED – 10/23/13
This post has generated a fair amount of feedback, due to issues encountered. I have attempted to provide an update to address certain strange connectivity behaviors. I believe that most issues were caused by the aggressive TCP keepalive timers. I have not been able to recreate any of the strange side effects, yet, with these updates. So far, so good. After over a year of using these settings on 10.6 Snow Leopard and the past year on 10.8 Mountain Lion, I can report that all local applications and systems are running well. I definitely notice more zip in webpage display inside Chrome and I am able to sustain higher throughputs on various speed tests compared to before.
For reference, here are the custom settings I have added to my own sysctl.conf file:
The easiest way to edit this file is to open a Terminal window and execute ‘sudo nano /etc/sysctl.conf’. The sudo command allows you to elevate your rights to admin. You will be prompted to enter your password if you have admin rights. nano is the name of the command line text editor program. The above entries just get added to this file one line at a time.
You can also update your running settings without rebooting by using the ‘sudo sysctl -w’ command. Just append each of the above settings one at a time after this command. kern.ipc.maxsockets and kern.ipc.nmbclusters can only be modified from the sysctl.conf file upon reboot.
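As a rough illustration of the runtime approach, the sketch below builds one ‘sudo sysctl -w’ command per setting and simply prints them for review; nothing is executed. The helper name is made up for this example, and the values are ones discussed in this post, so adjust them for your own connection.

```python
# Hypothetical helper (not part of any Apple tool): turn a dict of sysctl
# settings into 'sudo sysctl -w name=value' command strings.
# Nothing is executed here; the commands are only printed for review.

def sysctl_commands(settings):
    """Return one 'sudo sysctl -w name=value' string per setting, in order."""
    return [f"sudo sysctl -w {name}={value}" for name, value in settings.items()]


# Values taken from the discussion in this post; adjust for your own link.
settings = {
    "net.inet.tcp.sendspace": 1042560,
    "net.inet.tcp.recvspace": 1042560,
    "net.inet.tcp.msl": 15000,
    "net.inet.tcp.delayed_ack": 3,
}

for command in sysctl_commands(settings):
    print(command)
```

Printing the commands first makes it easy to double check each value before pasting it into a Terminal window.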
Following you will find my explanations about each of the parameters I have customized or included in my sysctl.conf file:
- One suggestion out there is to set the kern.ipc.maxsockbuf value to the sum of the net.inet.tcp.sendspace and net.inet.tcp.recvspace variables. The key is that this value can’t be any less than the sum of those 2 values or it can cause some fatal errors with your system. The default value, at least what I found on 10.6.5, is 4194304. That is more than enough. In my case, I have just hard coded this value into my sysctl.conf file to ensure that it does not change to prevent problems. If you are trying to tune for high throughput with Gigabit connectivity you may want to increase this value as recommended in the TCP Tuning Guide for FreeBSD. Generally the suggestion seems to be to minimally set this value to twice the Bandwidth Delay Product. For the majority of the world that is using their Mac on a DSL or Cable connection, that value would be much less than what you would need to support local transfers on your LAN. Personally, I’d leave this at the default but hard code it just to be sure.
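The rule of thumb above (kern.ipc.maxsockbuf must not be smaller than sendspace plus recvspace) is easy to sanity check before editing sysctl.conf. The following is only a minimal sketch; the function name is hypothetical and the numbers are the ones used in this post.

```python
# Minimal sketch of the maxsockbuf sanity check described above.
# The function name is hypothetical; the rule itself comes from the text.

def maxsockbuf_is_safe(maxsockbuf, sendspace, recvspace):
    """True when maxsockbuf is at least the sum of the two TCP buffer sizes."""
    return maxsockbuf >= sendspace + recvspace


DEFAULT_MAXSOCKBUF = 4194304  # default observed on 10.6.5, per the text

# 1042560 bytes each way is the send/receive size used later in this post.
print(maxsockbuf_is_safe(DEFAULT_MAXSOCKBUF, 1042560, 1042560))
```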
- kern.ipc.somaxconn limits the size of the listen queue, that is, the number of incoming connections that can be waiting for an application to accept them at any one time. It does not cap the total number of open sockets. The default here is just 128. If an attacker can flood you with a sufficiently high number of SYN packets in a short enough period of time, the queue fills up and new connection attempts are refused, thus successfully denying your users access to the service. Increasing this value is also beneficial if you run any automated programs like P2P clients that can drain your connection pool very quickly.
- kern.ipc.maxsockets is now a baseline setting and the ceiling is dynamically calculated based on available system memory. This setting is no longer relevant and is not user configurable even at boot time.
- kern.ipc.nmbclusters sets the number of network buffer (mbuf) clusters, and with it the connection thresholds, for the entire system. One socket is created per network connection, and one per Unix domain socket connection. While remote servers and clients will connect to you on the network, more and more local applications are taking advantage of Unix domain sockets for inter-process communication. There is far less overhead as full TCP packets don’t have to be constructed. Unix domain socket communication is also much faster, as data does not have to go over the network stack but can instead go almost directly to the application. The number of sockets you’ll need depends on what applications will be running. I would recommend starting with a value matching the number of network buffers, and then tuning it as appropriate. You can find out how many network buffer clusters are in use with the command netstat -m. The defaults are usually fine for most people. However, if you want to host Torrents, you will likely want to tune these values to 2 or 4 times the default of 512. The kern.ipc.nmbclusters value appears to default to 32768 on 10.7 Lion. So, this should not be something you have to tune going forward.
- I have hard-coded the enabling of RFC1323 (net.inet.tcp.rfc1323) which is the TCP High Performance Options (Window Scaling). This should be on by default on all future OSX builds. It should be noted that this setting also enables TCP Timestamps by default. This adds an additional 12 bytes to the TCP header, thus reducing the MSS to 1448 bytes. The default Window Scale Factor, at this point, is arbitrarily set to 3. I have hard-coded the Window Scale Factor to 4 because it matches my need to fill up my existing Internet connection. Ensuring this value is set to 4 allows me to fully utilize my 45Meg AT&T Uverse connection. I calculated this based on my Internet connection’s bandwidth-delay product. On average I should be able to achieve 45Mbps or 45 x 10^6 bits per second. My average maximum roundtrip latency is somewhere around 50 milliseconds or 0.05 seconds. 45 x 10^6 x 0.05 = 2,250,000 bits. So, my network can sustain approximately 2,250,000 bits / 8 bits per byte = 281,250 Bytes of outstanding, unacknowledged data on the network if my aim is to fully utilize my bandwidth. The TCP window field is 16 bits wide, yielding a maximum value of 65535 Bytes. A Window Scale Factor of 3, which multiplies the window by 2^3 = 8, is more than enough to fill my connection. If the TCP window is set to 65535 with a Window Scale Factor of 3, I would be able to transmit 2^3 x 65,535 Bytes = 524,280 Bytes on the network before requiring an ACK packet. So, a value of 3 for the Window Scale Factor setting should be more than adequate for the vast majority of individuals’ Internet connections. Once you get beyond 100Mbps with an average peak latency around 50 milliseconds, you might want to consider bumping the Window Scale Factor up to 4.
If you notice unacceptably poor performance with key applications you use, I would suggest you disable this option and make sure your net.inet.tcp.sendspace and net.inet.tcp.recvspace values are set no higher than 65535. Any applications with load balanced servers that are using a Layer 5 type ruleset can exhibit performance problems with window scaling if the explicit window scaling configuration has not been properly addressed on the Load Balancer.
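The bandwidth-delay arithmetic above can be reproduced with a few lines of Python. This is only a sketch under the same assumptions as the text (line rate and round trip time are your own estimates); the function names are hypothetical, and the cap of 14 on the scale factor is the maximum shift count allowed by RFC1323.

```python
# Sketch of the bandwidth-delay product (BDP) and Window Scale Factor math.
# Function names are hypothetical; integer math avoids float rounding.

MAX_UNSCALED_WINDOW = 65535  # largest value of the 16-bit TCP window field
MAX_SCALE = 14               # largest shift count permitted by RFC1323


def bdp_bytes(rate_bps, rtt_ms):
    """Bytes in flight needed to fill the link: bits/s x seconds / 8."""
    return rate_bps * rtt_ms // 8000


def min_window_scale(bdp):
    """Smallest scale factor s with 65535 x 2^s >= bdp (capped at 14)."""
    scale = 0
    while MAX_UNSCALED_WINDOW * (2 ** scale) < bdp and scale < MAX_SCALE:
        scale += 1
    return scale


bdp = bdp_bytes(45_000_000, 50)    # 45Mbps line, 50 ms round trip
print(bdp, min_window_scale(bdp))  # 281250 bytes, scale factor 3
```

Running the same helper for a 100Mbps line at 50 milliseconds gives a BDP of 625,000 bytes and a scale factor of 4, which lines up with the advice above.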
- The next option I have set is net.inet.tcp.sockthreshold. This parameter sets the number of open sockets at which the system will begin obeying the values you set in the net.inet.tcp.recvspace and net.inet.tcp.sendspace parameters. The default value is 64. Essentially think of this as a Quality of Service threshold. Prior to reaching this number of simultaneous sockets, the system will restrict itself to a max window of 65536 bytes per connection. As long as your window sizes are set above 65536, once you hit the socket threshold, the performance should always be better than anything to which you were previously accustomed. The higher you set this value, the more opportunity you give the system to “take over” all of your network resources.
- The net.inet.tcp.sendspace and net.inet.tcp.recvspace settings control the maximum TCP window size the system will allow sending from the machine or receiving to the machine when the connection counts are over a pre-defined threshold. Up until the latest releases of most operating systems, these values defaulted to 65535 bytes. This has been the de facto standard essentially from the beginning of the TCP protocol’s existence in the 1970s. Now that the RFC1323 High Performance TCP Options are starting to be more widely accepted and configured, these values can be increased to improve performance. I have set mine both to 1042560 bytes. That is essentially 16 times the previous limit. I arrived at this value using the following calculation:
MSS x 45 x 2^4 = 1448 x 45 x 16 = 1042560
- The MSS I am using is 1448 because I have RFC1323 enabled which enables TCP Timestamps and reduces the default MSS of 1460 bytes by 12 bytes to 1448 bytes.
- 2^4 = 16 matches the Window Scaling Factor of 4 I have chosen to configure.
- The value of 45 is a little bit more convoluted to figure out. It is the largest whole number of MSS-sized segments that fits within the max TCP Window field value of 65535 bytes: 65535 / 1448 = 45.26, which rounds down to 45, and 1448 x 45 = 65160. If you were using an MSS of 1460, this value would be 44. But, in the case of OSX, since TCP Timestamps are automatically enabled when you enable RFC1323, you shouldn’t set the MSS higher than 1448. It might be less if you have additional overhead on your line such as PPPoE on a DSL line etc.
You must have the RFC1323 options enabled, in order to set these values above 65535.
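Here is the same buffer calculation as a short Python sketch, in case you want to plug in a different MSS or Window Scale Factor. The function name is hypothetical; the formula is just the one described above.

```python
# Sketch of the sendspace/recvspace calculation described above.
# The function name is hypothetical.

MAX_UNSCALED_WINDOW = 65535  # max value of the 16-bit TCP window field


def socket_buffer_size(mss, window_scale):
    """Whole MSS-sized segments that fit in 65535 bytes, times 2^window_scale."""
    segments = MAX_UNSCALED_WINDOW // mss  # 45 for an MSS of 1448
    return mss * segments * (2 ** window_scale)


print(socket_buffer_size(1448, 4))  # 1448 x 45 x 16 = 1042560
```

With an MSS of 1460 the segment count drops to 44 and the result is 1460 x 44 x 16 = 1027840.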
- The net.inet.tcp.mssdflt setting seems simple to configure on the surface. However, arriving at the optimum setting for your particular network setup and requirements can be a mathematical exercise that is not straightforward. The default MSS value that Apple has configured is a measly 512 bytes. That value is better suited to dial-up users. The impact is not really noticeable on a high speed LAN segment, but it can be a performance bottleneck across a typical residential broadband connection. This setting adjusts the Maximum Segment Size that your system can transmit. You need to understand the characteristics of your own network connection in order to determine the appropriate value. For a machine that only communicates with other hosts across a normal Ethernet network, the answer is very simple. The value should be set to 1460 bytes, as this is the standard MSS on Ethernet networks. The IP and TCP headers together take up a standard 40 bytes (20 bytes each). With a standard MTU of 1500 bytes on Ethernet, that leaves 1460 bytes for TCP payload in each packet. In my case, I had a DSL line that used PPPoE for its transport protocol. To get the most out of that DSL line and avoid wasteful protocol overhead, I wanted this value to be exactly equal to the amount of payload data I can attach within a single PPPoE frame. Fragmenting segments causes additional PPPoE frames and ATM cells to be created, which adds to the overall overhead on my DSL line and reduces my effective bandwidth. There are quite a few references out there to help you determine the appropriate setting. So, to configure for a DSL line that uses PPPoE like mine, an appropriate MSS value would be 1452 bytes: 1460 bytes is the normal MSS on Ethernet for IP traffic, as I described earlier, and with PPPoE you have to subtract an additional 8 bytes of overhead for the PPPoE header. There is one other element to account for: ATM.
Many DSL providers, like mine, use the ATM protocol as the underlying transport carrier for your PPPoE data. That used to be the only way it was done. ATM uses 53 byte cells, each of which has a 5 byte header. That leaves 48 bytes for payload in each cell. If I set my MSS to 1452 bytes, that does not divide evenly across ATM’s 48 byte cell payloads: 1452 / 48 = 30.25, so after 30 full cells I am left with 12 bytes of additional data to send at the end. Ultimately ATM will fill the last cell with 36 bytes of null data in that scenario. To avoid this overhead, I reduce the MSS to 1440 bytes so that it will evenly fit into the ATM cells: 30 x 48 = 1440 < 1452.
I now have AT&T Uverse which uses VDSL with Packet Transfer Mode (PTM) as the transport protocol. It provides an MTU of 1500. So this eliminates all the complexity of the above calculations and takes things back to the default of 1460 bytes. However, if you have enabled the RFC1323 option for TCP Window Scaling, the MSS should be set to 1448 to account for the 12 byte TCP Timestamp headers that OSX includes when that option is enabled.
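The MSS cases walked through above (plain Ethernet, PPPoE, PPPoE over ATM, and Ethernet with TCP Timestamps) can be sketched in Python as follows. The function and flag names are hypothetical. Combining ATM alignment with Timestamps is not a case this post covers, so the order in which the two adjustments are applied here is purely an assumption.

```python
# Sketch of the MSS arithmetic described above. Names are hypothetical.

IP_TCP_HEADERS = 40    # 20 byte IP header + 20 byte TCP header
PPPOE_HEADER = 8       # PPPoE overhead per frame
TIMESTAMP_OPTION = 12  # TCP Timestamps option enabled along with RFC1323
ATM_CELL_PAYLOAD = 48  # 53 byte cell minus the 5 byte header


def mss_for_link(mtu=1500, pppoe=False, atm=False, timestamps=False):
    """Return a suggested MSS for the given link characteristics."""
    mss = mtu - IP_TCP_HEADERS
    if pppoe:
        mss -= PPPOE_HEADER
    if atm:
        # Round down to a whole number of 48 byte ATM cell payloads.
        mss -= mss % ATM_CELL_PAYLOAD
    if timestamps:
        # Assumption: applied after ATM rounding; the post does not cover
        # the combination of ATM alignment and Timestamps.
        mss -= TIMESTAMP_OPTION
    return mss


print(mss_for_link())                       # Ethernet: 1460
print(mss_for_link(pppoe=True))             # PPPoE: 1452
print(mss_for_link(pppoe=True, atm=True))   # PPPoE over ATM: 1440
print(mss_for_link(timestamps=True))        # Ethernet + Timestamps: 1448
```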
- net.inet.tcp.msl defines the Maximum Segment Life. This is the maximum amount of time to wait for an ACK in reply to a SYN-ACK or FIN-ACK, in milliseconds. If an ACK is not received in this time, the segment can be considered “lost” and the network connection is freed. This setting is primarily about DoS protection but it is also important when it comes to TCP sequence reuse or Twrap. There are two implications for this. When you are trying to close a connection, if the final ACK is lost or delayed, the socket will still close, and more quickly. However, if a client is trying to open a connection to you and their ACK is delayed by more than the configured MSL, the connection will not form. RFC 793 defines the MSL as 120 seconds (120000ms); however, it was written in 1981 and timing issues have changed slightly since then. Today, FreeBSD’s default is 30000ms. This is sufficient for most conditions, but for stronger DoS protection you will want to lower this. I have set mine to 15000 or 15 seconds. This will work best for speeds up to 1Gbps. See Section 1.2 on TCP Reliability starting on Page 4 of RFC1323 for a good description of the importance of TCP MSL as it relates to link bandwidth and TCP sequence reuse or Twrap. If you are using Gig links, you should set this value shorter than 17 seconds or 17000 milliseconds to prevent TCP sequence reuse issues.
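The 17 second figure for Gigabit links falls out of the Twrap formula in RFC1323 Section 1.2, which is 2^31 bytes divided by the link rate in bytes per second. A small Python sketch, with hypothetical function names, shows how to check an MSL value against a given link speed.

```python
# Sketch of the Twrap check from RFC1323 Section 1.2: Twrap = 2^31 / B,
# where B is the link rate in bytes per second. Names are hypothetical.

def twrap_seconds(rate_bps):
    """Seconds until the TCP sequence space can wrap at the given bit rate."""
    bytes_per_second = rate_bps / 8
    return 2 ** 31 / bytes_per_second


def msl_is_safe(msl_ms, rate_bps):
    """True when the MSL (in milliseconds) is shorter than Twrap."""
    return msl_ms / 1000 < twrap_seconds(rate_bps)


print(round(twrap_seconds(1_000_000_000), 1))  # roughly 17.2 seconds at 1Gbps
print(msl_is_safe(15000, 1_000_000_000))       # 15 second MSL fits under that
```

By the same formula, a 30000ms MSL is longer than the wrap time on a Gigabit link, which is why a lower value is suggested there.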
- It appears that the aggressive TCP keepalive timers below are not well liked by quite a few built-in applications. I have removed these adjustments and kept the defaults for the time being.
net.inet.tcp.keepidle sets how long, in milliseconds, a connection must be idle before the system sends a keepalive packet to test whether it is still active. I set this to 120000 or 120 seconds which is a fairly common interval. The default is 2 hours. net.inet.tcp.keepinit sets the keepalive probe interval in milliseconds during initiation of a TCP connection. I have set mine to the same as the regular interval which is 1500 or 1.5 seconds. net.inet.tcp.keepintvl sets the interval, in milliseconds, between keepalive probes sent to remote machines. After TCPTV_KEEPCNT (default 8) probes are sent, with no response, the (TCP) connection is dropped. I have set this value to 1500 or 1.5 seconds.
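To get a feel for how aggressive these timers are, here is a rough Python estimate of how long an unresponsive idle connection survives. It assumes a simplified model (idle time plus one probe interval per unanswered probe), which may not match the kernel's exact behavior; the function name is hypothetical.

```python
# Rough estimate only: idle time before the first probe, plus one probe
# interval for each unanswered probe. The real kernel timing may differ.

def keepalive_drop_time_ms(keepidle_ms, keepintvl_ms, keepcnt=8):
    """Approximate milliseconds until a dead idle connection is dropped."""
    return keepidle_ms + keepcnt * keepintvl_ms


# 120 second idle timer, 1.5 second probe interval, 8 probes.
print(keepalive_drop_time_ms(120000, 1500))  # 132000 ms, about 132 seconds
```

In other words, once probing starts, a peer that is slow to answer has only about 12 seconds before the connection is torn down.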
- net.inet.tcp.delayed_ack controls the behavior when sending TCP acknowledgements. Allowing delayed ACKs can cause pauses at the tail end of data transfers and used to be a known problem for Macs. This was due to a known poor interaction with the Nagle algorithm in the TCP stack when dealing with slow start and congestion control. I previously had recommended disabling this feature by setting it to “0”. I have learned that Apple has updated the behavior of Delayed ACK, since the release of OSX 10.5 Leopard, to support Greg Minshall’s “Proposed Modification to Nagle’s Algorithm”. I have now reverted this setting back to the default and enabled this feature in auto-detect mode by setting the value to “3”. This effectively enables the Nagle algorithm but prevents the unacknowledged runt packet problem causing an ACK deadlock which can unnecessarily pause transfers and cause significant delays. For your reference, following are the available options:
- delayed_ack=0 responds after every packet (OFF)
- delayed_ack=1 always employs delayed ack, 6 packets can get 1 ack
- delayed_ack=2 immediate ack after 2nd packet, 2 packets per ack (Compatibility Mode)
- delayed_ack=3 should auto detect when to employ delayed ack, 4 packets per ack. (DEFAULT)
- net.inet.tcp.slowstart_flightsize sets the number of outstanding packets permitted with non-local systems during the slowstart phase of TCP ramp up. In order to more quickly overcome TCP slowstart, I have bumped this up to a value of 20. This allows my system to use up to 10% of my bandwidth during TCP ramp up. I calculated this by figuring my Bandwidth-Delay Product and taking 10% of that value divided by the MSS of 1448 bytes to get a rough packet count. So, taking the line rate at 45Mbps, 45 x 10^6 bits per second x 0.05 seconds / 8 bits per byte = 281,250 bytes. 10% of that is 28,125 bytes, and 28,125 / 1448 bytes per packet is about 19.4, so I came up with roughly 20 packets.
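If you want to redo that slowstart calculation for your own line, a minimal Python sketch looks like this. The function name and the percent parameter are hypothetical; the 10% share is simply the figure chosen in this post.

```python
# Sketch of the slowstart_flightsize estimate: a share of the
# bandwidth-delay product, expressed in MSS-sized packets, rounded up.
# Function and parameter names are hypothetical.

def slowstart_flightsize(rate_bps, rtt_ms, mss, percent=10):
    """Packets needed to cover `percent` of the BDP, rounded up."""
    bdp = rate_bps * rtt_ms // 8000  # bandwidth-delay product in bytes
    budget = bdp * percent // 100    # share of the BDP to use in slowstart
    return -(-budget // mss)         # integer ceiling division


print(slowstart_flightsize(45_000_000, 50, 1448))  # 28125 / 1448 -> 20
```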
- net.inet.tcp.blackhole defines what happens when a TCP packet is received on a closed port. When set to ‘1’, SYN packets arriving on a closed port will be dropped without an RST packet being sent back. When set to ‘2’, all packets arriving on a closed port are dropped without an RST being sent back. This saves both CPU time, because packets don’t need to be processed as much, and outbound bandwidth, as packets are not sent out.
- net.inet.udp.blackhole is similar to net.inet.tcp.blackhole in its function. As the UDP protocol does not have states like TCP, there is only a need for one choice when it comes to dropping UDP packets. When net.inet.udp.blackhole is set to ‘1’, all UDP packets arriving on a closed port will be dropped.
- The name ‘net.inet.icmp.icmplim’ is somewhat misleading. This sysctl controls the maximum number of ICMP “Unreachable” and also TCP RST packets that will be sent back every second. It helps curb the effects of attacks which generate a lot of reply packets. I have set mine to a value of 50.