Snabb Reference Manual

Luke Gorrie

Max Rottenkolber

Nikolay Nikolaev

Hans Huebner

Alexander Gall

Javier Guerra

Diego Pino Garcia

Andy Wingo

Antonio Nikishaev

Alexander Altshuler

Rolf Sommerhalder

Pete Kazmier

Mikhail Nazarov

Marcel Wiget

Justin Cormack

Katerina Barone-Adesi

Felix Geißler

Cosmin Apreutesei

Carlos Alberto Lopez Perez

Jeff Loughridge

Kacper Wysocki

Domen Kožar

Diego Pino

Vladimir Fedin

Tomas Korcak

Tim Upthegrove

Tim LaBerge

Stevan Markovic

Simon Leinen

Kristian Larsson

Kristian Kielhofner

Jon Olsson

Jay Fenton

James Cunningham

Hui Xiang

Gernot Nusshall

Edward Hope-Morley

Anshul Makkar

Andy Chong

Alex Kordic

Alexander Spyridakis

Adrián Pérez de Castro

Version 5c4d5dc, Tue May 3 17:46:45 2016 +0200

Introduction
Snabb API
Basic Apps (apps.basic.basic_apps)
- Source
- Join
- Split
- Sink
- Tee
- Repeater
Intel 82599 Ethernet Controller Apps
Solarflare Ethernet Controller Apps
- Solarflare (apps.solarflare.solarflare)
  - Configuration
RateLimiter App (apps.rate_limiter.rate_limiter)
- Configuration
- Performance
PcapFilter App (apps.packet_filter.pcap_filter)
- Configuration
IPv6 Apps
- Nd_light (apps.ipv6.nd_light)
  - Configuration
- SimpleKeyedTunnel (apps.keyed_ipv6_tunnel.tunnel)
  - Configuration
VhostUser App (apps.vhost.vhost_user)
- Configuration
PcapReader and PcapWriter Apps (apps.pcap.pcap)
- Configuration
VPWS App (apps.vpn.vpws)
- Configuration
RawSocket App (apps.socket.raw)
- Configuration
Libraries

Note: This reference manual is a draft. The API defined in this document is not guaranteed to be stable or complete and future versions of Snabb will introduce backwards incompatible changes. With that being said, discrepancies between this document and the actual Snabb Switch implementation are considered to be bugs. Please report them in order to help improve this document.

Introduction

Snabb is an extensible, virtualized, Ethernet networking toolkit. With Snabb you can implement networking applications using the Lua language. Snabb includes all the tools you need to quickly realize your network designs and its really fast too! Furthermore, Snabb is extensible and encourages you to grow the ecosystem to match your requirements.

Architecture

The Snabb Core forms a runtime environment (engine) which executes your design. A design is simply a Lua script used to drive the Snabb stack, you can think of it as your top-level “main” routine.

In order to add functionality to the Snabb stack you can load modules into the Snabb engine. These can be Lua modules as well as native code objects. We differentiate between two classes of modules, namely libraries and Apps. Libraries are simple collections of program utilities to be used in your designs, apps or other libraries, just as you might expect. Apps, on the other hand, are code objects that implement a specific interface, which is used by the Snabb engine to organize an App Network.

Network

Usually, a Snabb design will create a series of apps, interconnect these in a desired way using links and finally pass the resulting app network on to the Snabb engine. The engine’s job is to:

Pump traffic through the app network
Keep the app network running (e.g. restart failed apps)
Report on the network status

Snabb API

The core modules defined below can be loaded using Lua’s require. For example:

local config = require("core.config")

local c = config.new()
...

App

An app is an isolated implementation of a specific networking function. For example, a switch, a router, or a packet filter.

Apps receive packets on input ports, perform some processing, and transmit packets on output ports. Each app has zero or more input and output ports. For example, a packet filter may have one input and one output port, while a packet recorder may have only an input port. Every app must implement the interface below. Methods which may be left unimplemented are marked as “optional”.

— Method myapp:new arg

Required. Create an instance of the app with a given argument arg. Myapp:new must return an instance of the app. The handling of arg is up to the app but it is encouraged to use core.config’s parse_app_arg to parse arg.

— Field myapp.input

— Field myapp.output

Tables of named input and output links. These tables are initialized by the engine for use in processing and are read-only.

— Method myapp:pull

Optional. Pull packets into the network.

For example: Pull packets from a network adapter into the app network by transmitting them to output ports.

— Method myapp:push

Optional. Push packets through the system.

For example: Move packets from input ports to output ports or to a network adapter.

— Method myapp:reconfig arg

Optional. Reconfigure the app with a new arg. If this method is not implemented the app instance is discarded and a new instance is created.

— Method myapp:report

Optional. Print a report of the current app status.

— Method myapp:stop

Optional. Stop the app and release associated external resources.

— Field myapp.zone

Optional. Name of the LuaJIT profiling zone used for this app (descriptive string). The default is the module name.

Config (core.config)

A config is a description of a packet-processing network. The network is a directed graph. Nodes in the graph are apps that each process packets in a specific way. Each app has a set of named input and output ports—often called rx and tx. Edges of the graph are unidirectional links that carry packets from an output port to an input port.

The config is a purely passive data structure. Creating and manipulating a config object does not immediately affect operation. The config has to be activated using engine.configure.

— Function config.new

Creates and returns a new empty configuration.

— Function config.app config, name, class, arg

Adds an app of class with arg to the config where it will be assigned to name.

Example:

config.app(c, "nic", Intel82599, {pciaddr = "0000:00:00.0"})

— Function config.link config, linkspec

Add a link defined by linkspec to the config config. Linkspec must be a string of the format

app_name1.output_port->app_name2.input_port

where app_name1 and app_name2 are names of apps in config and output_port and input_port are valid output and input ports of the referenced apps respectively.

Example:

config.link(c, "nic1.tx->nic2.rx")

Engine (core.app)

The engine executes a config by initializing apps, creating links, and driving the flow of execution. The engine also performs profiling and reporting functions. It can be reconfigured during runtime. Within Snabb Switch scripts the core.app module is bound to the global engine variable.

— Function engine.configure config

Configure the engine to use a new config config. You can safely call this method many times to incrementally update the running app network. The engine updates the app network as follows:

Apps that did not exist in the old configuration are started.
Apps that do not exist in the new configuration are stopped. (The app stop() method is called if defined.)
Apps with unchanged configurations are preserved.
Apps with changed configurations are updated by calling their reconfig() method. If the reconfig() method is not implemented then the old instance is stopped a new one started.
Links with unchanged endpoints are preserved.

— Function engine.main options

Run the Snabb engine. Options is a table of key/value pairs. The following keys are recognized:

duration - Duration in seconds to run the engine for (as a floating point number). If this is set you cannot supply done.
done - A function to be called repeatedly by engine.main until it returns true. Once it returns true the engine will be stopped and engine.main will return. If this is set you cannot supply duration.
report - A table which configures the report printed before engine.main() returns. The keys showlinks and showapps can be set to boolean values to force or suppress link and app reporting individually. By default `engine.main()’ will report on links but not on apps.
measure_latency - By default, the breathe() loop is instrumented to record the latency distribution of running the app graph. This information can be processed by the snabb top program. Passing measure_latency=false in the options will disable this instrumentation.
no_report - A boolean value. If true no final report will be printed.

— Function engine.now

Returns monotonic time in seconds as a floating point number. Suitable for timers.

— Variable engine.busywait

If set to true then the engine polls continuously for new packets to process. This consumes 100% CPU and makes processing latency less vulnerable to kernel scheduling behavior which can cause pauses of more than one millisecond.

Default: false

— Variable engine.Hz

Frequency at which to poll for new input packets. The default value is ‘false’ which means to adjust dynamically up to 100us during low traffic. The value can be overridden with a constant integer saying how many times per second to poll.

This setting is not used when engine.busywait is true.

Link (core.link)

A link is a ring buffer used to store packets between apps. Links can be treated either like arrays—accessing their internal structure directly—or as streams of packets by using their API functions.

— Function link.empty link

Predicate used to test if a link is empty. Returns true if link is empty and false otherwise.

— Function link.full link

Predicate used to test if a link is full. Returns true if link is full and false otherwise.

— Function link.receive link

Returns the next available packet (and advances the read cursor) on link. If the link is empty an error is signaled.

— Function link.front link

Return the next available packet without advancing the read cursor on link. If the link is empty, nil is returned.

— Function link.transmit link, packet

Transmits packet onto link. If the link is full packet is dropped (and the drop counter increased).

— Function link.stats link

Returns a structure holding ring statistics for the link:

txbytes, rxbytes: Counts of transferred bytes.
txpackets, rxpackets: Counts of transferred packets.
txdrop: Count of packets dropped due to ring overflow.

Packet (core.packet)

A packet is a data structure describing one of the network packets that is currently being processed. The packet is used to explicitly manage the life cycle of the packet. Packets are explicitly allocated and freed by using packet.allocate and packet.free. When a packet is received using link.receive its ownership is acquired by the calling app. The app must then ensure to either transfer the packet ownership to another app by calling link.transmit on the packet or free the packet using packet.free. Apps may only use packets they own, e.g. packets that have not been transmitted or freed. The number of allocatable packets is limited by the size of the underlying “freelist”, e.g. a pool of unused packet objects from and to which packets are allocated and freed.

— Function packet.allocate

Returns a new empty packet. An an error is raised if there are no packets left on the freelist.

— Function packet.free packet

Frees packet and puts in back onto the freelist.

— Function packet.data packet

Returns a pointer to the payload of packet.

— Function packet.length packet

Returns the payload length of packet.

— Function packet.clone packet

Returns an exact copy of packet.

— Function packet.append packet, pointer, length

Appends length bytes starting at pointer to the end of packet. An error is raised if there is not enough space in packet to accomodate length additional bytes.

— Function packet.prepend packet, pointer, length

Prepends length bytes starting at pointer to the front of packet. An error is raised if there is not enough space in packet to accomodate length additional bytes.

— Function packet.shiftleft packet, length

Truncates packet by length bytes from the front.

— Function packet.from_pointer pointer, length

Allocate packet and fill it with length bytes from pointer.

— Function packet.from_string string

Allocate packet and fill it with the contents of string.

Memory (core.memory)

Snabb allocates special DMA memory that can be accessed directly by network cards. The important characteristic of DMA memory is being located in contiguous physical memory at a stable address.

— Function memory.dma_alloc bytes

Returns a pointer to bytes of new DMA memory.

— Function memory.virtual_to_physical pointer

Returns the physical address (uint64_t) the DMA memory at pointer.

— Variable memory.huge_page_size

Size of a single huge page in bytes. Read-only.

Lib (core.lib)

The core.lib module contains miscellaneous utilities.

— Function lib.equal x, y

Predicate to test if x and y are structurally similar (isomorphic).

— Function lib.can_open filename, mode

Predicate to test if file at filename can be successfully opened with mode.

— Function lib.can_read filename

Predicate to test if file at filename can be successfully opened for reading.

— Function lib.can_write filename

Predicate to test if file at filename can be successfully opened for writing.

— Function lib.readcmd command, what

Runs Unix shell command and returns what of its output. What must be a valid argument to file:read.

— Function lib.readfile filename, what

Reads and returns what from file at filename. What must be a valid argument to file:read.

— Function lib.writefile filename, value

Writes value to file at filename using file:write. Returns the value returned by file:write.

— Function lib.readlink filename

Returns the true name of symbolic link at filename.

— Function lib.dirname filename

Returns the dirname(3) of filename.

— Function lib.basename filename

Returns the basename(3) of filename.

— Function lib.firstfile directory

Returns the filename of the first file in directory.

— Function lib.firstline filename

Returns the first line of file at filename as a string.

— Function lib.files_in_directory directory

Returns an array of filenames in directory.

— Function lib.load_string string

Evaluates and returns the value of the Lua expression in string.

— Function lib.load_conf filename

Evaluates and returns the value of the Lua expression in file at filename.

— Function lib.store_conf filename, value

Writes value to file at filename as a Lua expression. Supports tables, strings and everything that can be readably printed using print.

— Function lib.bits bitset, basevalue

Returns a bitmask using the values of bitset as indexes. The keys of bitset are ignored (and can be used as comments).

Example:

bits({RESET=0,ENABLE=4}, 123) => 1<<0 | 1<<4 | 123

— Function lib.bitset value, n

Predicate to test if bit number n of value is set.

— Function lib.bitfield size, struct, member, offset, nbits, value

Combined accesor and setter function for bit ranges of integers in cdata structs. Sets nbits (number of bits) starting from offset to value. If value is not given the current value is returned.

Size may be one of 8, 16 or 32 depending on the bit size of the integer being set or read.

Struct must be a pointer to a cdata object and member must be the literal name of a member of struct.

Example:

local struct_t = ffi.typeof[[struct { uint16_t flags; }]]
-- Assuming `s' is an instance of `struct_t', set bits 4-7 to 0xF:
lib.bitfield(16, s, 'flags', 4, 4, 0xf)
-- Get the value:
lib.bitfield(16, s, 'flags', 4, 4) -- => 0xF

— Function string:split pattern

Returns an iterator over the string split by pattern. Pattern must be a valid argument to string:gmatch.

Example:

for word, sep in ("foo!bar!baz"):split("(!)") do
    print(word, sep)
end

> foo   !
> bar   !
> baz   nil

— Function lib.hexdump string

Returns hexadecimal string for bytes in string.

— Function lib.hexundump hexstring

Returns byte string for hexstring.

— Function lib.comma_value n

Returns a string for decimal number n with magnitudes separated by commas. Example:

comma_value(1000000) => "1,000,000"

— Function lib.random_data length

Returns a string of length bytes of random data.

— Function lib.bounds_checked type, base, offset, size

Returns a table that acts as a bounds checked wrapper around a C array of type and size starting at base plus offset. Type must be a ctype and the caller must ensure that the allocated memory region at base/offset is at least sizeof(type)*size bytes long.

— Function lib.timer s

Returns a function that accepts no parameters and acts as a predicate to test if ns nanoseconds have elapsed.

— Function lib.waitfor condition

Blocks until the function condition returns a true value.

— Function lib.waitfor2 name, condition, attempts, interval

Repeatedly calls the function condition in interval (milliseconds). If condition returns a true value waitfor2 returns. If condition does not return a true value after attempts waitfor2 raises an error identified by name.

— Function lib.yesno flag

Returns the string "yes" if flag is a true value and "no" otherwise.

— Function lib.align value, size

Return the next integer that is a multiple of size starting from value.

— Function lib.csum pointer, length

Computes and returns the “IP checksum” length bytes starting at pointer.

— Function lib.update_csum pointer, length, checksum

Returns checksum updated by length bytes starting at pointer. The default of checksum is 0LL.

— Function lib.finish_csum checksum

Returns the finalized checksum.

— Function lib.malloc etype

Returns a pointer to newly allocated DMA memory for etype.

— Function lib.deepcopy object

Returns a copy of object. Supports tables as well as ctypes.

— Function lib.array_copy array

Returns a copy of array. Array must not be a “sparse array”.

— Function lib.htonl n

— Function lib.htons n

Host to network byte order conversion functions for 32 and 16 bit integers n respectively.

— Function lib.ntohl n

— Function lib.ntohs n

Network to host byte order conversion functions for 32 and 16 bit integers n respectively.

Main

Snabb designs can be run either with:

snabb <snabb-arg>* <design> <design-arg>*

#!/usr/bin/env snabb <snabb-arg>*
...

The main module provides an interface for running Snabb scripts. It exposes various operating system functions to scripts.

— Field main.parameters

A list of command-line arguments to the running script. Read-only.

— Function main.exit status

Cleanly exists the process with status.

Basic Apps (apps.basic.basic_apps)

The module apps.basic.basic_apps provides apps with general functionality for use in you app networks.

Source

The Source app is a synthetic packet generator. On each breath it fills each attached output link with new packets. The packet data is uninitialized garbage and each packet is 60 bytes long.

Source

Join

The Join app joins together packets from N input links onto one output link. On each breath it outputs as many packets as possible from the inputs onto the output.

Join

Split

The Split app splits packets from multiple inputs across multiple outputs. On each breath it transfers as many packets as possible from the input links to the output links.

Split

Sink

The Sink app receives all packets from any number of input links and discards them. This can be handy in combination with a Source.

Sink

Tee

The Tee app receives all packets from any number of input links and transfers each received packet to all output links. It can be used to merge and/or duplicate packet streams

Tee

Repeater

The Repeater app collects all packets received from the input link and repeatedly transfers the accumulated packets to the output link. The packets are transmitted in the order they were received.

Repeater

Intel 82599 Ethernet Controller Apps

Intel10G (apps.intel.intel_app)

The Intel10G drives one port of an Intel 82599 Ethernet controller. Packets taken from the rx port are transmitted onto the network. Packets received from the network are put on the tx port.

Intel10G

— Method Intel10G.dev:get_rxstats

Returns a table with the following keys:

counter_id - Counter id
packets - Number of packets received
dropped - Number of packets dropped
bytes - Total bytes received

— Method Intel10G.dev:get_txstats

Returns a table with the following keys:

counter_id - Counter id
packets - Number of packets sent
bytes - Total bytes sent

Configuration

The Intel10G app accepts a table as its configuration argument. The following keys are defined:

— Key pciaddr

Required. The PCI address of the NIC as a string.

— Key macaddr

Optional. The MAC address to use as a string. The default is a wild-card (e.g. accept all packets).

— Key vlan

Optional. A twelve bit integer (0-4095). If set, incoming packets from other VLANs are dropped and outgoing packets are tagged with a VLAN header.

— Key vmdq

Optional. Boolean, defaults to false. Enables interface virtualization. Allows to have multiple Intel10G apps per port. If enabled, macaddr must be specified.

— Key mirror

Optional. A table. If set, this app will receive copies of all selected packets on the physical port. The selection is configured by setting keys of the mirror table. Either mirror.pool or mirror.port may be set.

If mirror.pool is true all pools defined on this physical port are mirrored. If mirror.pool is an array of pool numbers then the specified pools are mirrored.

If mirror.port is one of “in”, “out” or “inout” all incoming and/or outgoing packets on the port are mirrored respectively. Note that this does not include internal traffic which does not enter or exit through the physical port.

— Key rxcounter

— Key txcounter

Optional. Four bit integers (0-15). If set, incoming/outgoing packets will be counted in the selected statistics counter respectively. Multiple apps can share a counter. To retrieve counter statistics use Intel10G.dev:get_rxstats() and Intel10G.dev:get_txstats().

— Key rate_limit

Optional. Number. Limits the maximum Mbit/s to transmit. Default is 0 which means no limit. Only applies to outgoing traffic.

— Key priority

Optional. Floating point number. Weight for the round-robin algorithm used to arbitrate transmission when rate_limit is not set or adds up to more than the line rate of the physical port. Default is 1.0 (scaled to the geometric middle of the scale which goes from 1/128 to 128). The absolute value is not relevant, instead only the ratio between competing apps controls their respective bandwidths. Only applies to outgoing traffic.

For example, if two apps without rate_limit set have the same priority, both get the same output bandwidth. If the priorities are 3.0/1.0, the output bandwidth is split 75%/25%. Likewise, 1.0/0.333 or 1.5/0.5 yield the same result.

Note that even a low-priority app can use the whole line rate unless other (higher priority) apps are using up the available bandwidth.

Performance

The Intel10G app can transmit and receive at approximately 10 Mpps per processor core.

Hardware limits

Each physical Intel 82599 port supports the use of up to:

64 pools (virtualized Intel10G app instances)
127 MAC addresses (see the macaddr configuration option)
64 VLANs (see the vlan configuration option)
4 mirror pools (see the mirror configuration option)

Intel1G (apps.intel.intel1g.intel1g)

The intel1g app drives one port of an Intel Gigabit Ethernet controller.

Hardware support: - Intel i210, i350 (in progress)

Features: - Optionally attach to a pre-initialized NIC. - Optionally use specific TX and RX queue numbers (or none). - Configuration and statistics registers are mirrored to shared memory objects (NYI). - Receive and transmit links are optional and can have any name.

Intel1g

Configuration

— Key pciaddr

Required. The PCI address of the NIC as a string.

— Key attach

Optional. True means attach to a transmit and/or receive queue of an already-initialized NIC.

— Key txq

Optional. Transmit queue number to use. false means no transmit function. Default: 0.

— Key rxq

Optional. Receive queue number to use. false means no receive function. Default: 0.

— Key ndesc

Optional. Number of DMA descriptors to use i.e. size of the DMA transmit and receive queues. Must be a multiple of 128. Default is not specified but assumed to be broadly applicable.

— Key rxburst

Optional. Maximum number of packets to receive on one breath. Default is not specified but assumed to be broadly applicable.

+— Key loopback Optional. Set to "MAC" for MAC loopback, or to "PHY" for PHY loopback modes.

LoadGen (apps.intel.loadgen)

LoadGen is a load generator app based on the Intel 82599 Ethernet controller. It reads up to 32,000 packets from the input port and transmits them repeatedly onto the network. All incoming packets are dropped.

LoadGen

Configuration

The LoadGen app accepts a string as its configuration argument. The given string denotes the PCI address of the NIC to use.

Performance

The LoadGen app can transmit at line-rate (14 Mpps) without significant CPU usage.

Solarflare Ethernet Controller Apps

Solarflare (apps.solarflare.solarflare)

The Solarflare app drives one port of a Solarflare SFN7 Ethernet controller. Multiple instances of the Solarflare app can be instantiated on the same PCI device. Packets received from the network will be dispatched between apps based on destination MAC address and VLAN. Packets taken from the rx port are transmitted onto the network. Packets received from the network are put on the tx port.

Solarflare

The Solarflare app requires OpenOnload version 201502 to be installed and the sfc module to be loaded.

Configuration

The Solarflare app accepts a table as its configuration argument. The following keys are defined:

— Key pciaddr

Required. The PCI address of the NIC as a string.

— Key macaddr

Optional. The MAC address to use as a string. The default is a wild-card (e.g. accept all packets).

— Key vlan

Optional. A twelve bit integer (0-4095). If set, incoming packets from other VLANs are dropped and outgoing packets are tagged with a VLAN header.

RateLimiter App (apps.rate_limiter.rate_limiter)

The RateLimiter app implements a Token bucket algorithm with a single bucket dropping non-conforming packets. It receives packets on the input port and transmits conforming packets to the output port.

RateLimiter

— Method RateLimiter:snapshot

Returns throughput statistics in form of a table with the following fields:

rx - Number of packets received
tx - Number of packets transmitted
time - Current time in nanoseconds

Configuration

The RateLimiter app accepts a table as its configuration argument. The following keys are defined:

— Key rate

Required. Rate in bytes per second to which throughput should be limited.

— Key bucket_capacity

Required. Bucket capacity in bytes. Should be equal or greater than rate. Otherwise the effective rate may be limted.

— Key initial_capacity

Optional. Initial bucket capacity in bytes. Defaults to bucket_capacity.

Performance

The RateLimiter app is able to process more than 20 Mpps per CPU core. Refer to its selftest for details.

PcapFilter App (apps.packet_filter.pcap_filter)

The PcapFilter app receives packets on the input port and transmits conforming packets to the output port. In order to conform, a packet must match the pcap-filter expression of the PcapFilter instance and/or belong to a sanctioned connection. For a connection to be sanctioned it must be tracked in a state table by a PcapFilter app using the same state table. All PcapFilter apps share a global namespace of state table identifiers. Multiple PcapFilter apps—e.g. for inbound and outbound traffic—can refer to the same connection by sharing a state table identifer.

PcapFilter

Configuration

The PcapFilter app accepts a table as its configuration argument. The following keys are available:

— Key filter

Required. A string containing a pcap-filter expression.

— Key state_table

Optional. A string naming a state table. If set, packets passing any rule will be tracked in the specified state table and any packet that belongs to a tracked connection in the specified state table will be let pass.

IPv6 Apps

Nd_light (apps.ipv6.nd_light)

The nd_light app implements a small subset of IPv6 neighbor discovery (RFC4861). It has two duplex ports, north and south. The south port attaches to a network on which neighbor discovery (ND) must be performed. The north port attaches to an app that processes IPv6 packets (including full ethernet frames). Packets transmitted to the north port must be wrapped in full Ethernet frames (which may be empty).

The nd_light app replies to neighbor solicitations for which it is configured as a target and performs rudimentary address resolution for its configured next-hop address. If address resolution succeeds, the Ethernet headers of packets from the north port will be overwritten with headers containing the discovered destination address and the configured source address before they are transmitted over the south port. All packets from the north port are discarded as long as ND has not yet succeeded. Packets received from the south port are transmitted to the north port unaltered.

nd_light

Configuration

The nd_light app accepts a table as its configuration argument. The following keys are defined:

— Key local_mac

Required. Local MAC address as a string or in binary representation.

— Key local_ip

Required. Local IPv6 address as a string or in binary representation.

— Key next_hop

Required. IPv6 address of next hop as a string or in binary representation.

— Key delay

Optional. Neighbor solicitation retransmission delay in milliseconds. Default is 1,000ms.

— Key retrans

Optional. Number of neighbor solicitation retransmissions. Default is unlimited retransmissions.

SimpleKeyedTunnel (apps.keyed_ipv6_tunnel.tunnel)

The SimpleKeyedTunnel app implements “a simple L2 Ethernet over IPv6 tunnel encapsulation” as described in Keyed IPv6 Tunnel. It has two duplex ports, encapsulated and decapsulated. Packets transmitted on the decapsulated input port will be encapsulated and put on the encapsulated output port. Packets transmitted on the encapsulated input port will be decapsulated and put on the decapsulated output port.

SimpleKeyedTunnel

Configuration

The SimpleKeyedTunnel app accepts a table as its configuration argument. The following keys are defined:

— Key local_address

Required. Local IPv6 address as a string.

— Key remote_address

Required. Remote IPv6 address as a string.

— Key local_cookie

Required. Local cookie, 8 bytes encoded in a hexadecimal string.

— Key remote_cookie

Required. Remote cookie, 8 bytes encoded in a hexadecimal string.

— Key local_session

Optional. Unsigned integer, 32 bit. If set, the session_id field of the L2TPv3 header will be overwritten with this value.

— Key hop_limit

Optional. Unsigned integer. Sets the hop limit. Default is 64.

— Key default_gateway_MAC

Optional. Destination MAC as a string. Not required if overwritten by an app such as nd_light.

VhostUser App (apps.vhost.vhost_user)

The VhostUser app implements portions of the Virtio protocol for virtual ethernet I/O interfaces. In particular, VhostUser supports the virtio vring data structure for packet I/O in shared memory (DMA) and the Linux vhost API for creating vrings attached to tuntap devices.

With VhostUser SnabbSwitch can be used as a virtual ethernet interface by QEMU virtual machines. When connected via a UNIX socket, packets can be sent to the virtual machine by transmitting them on the rx port and packets send by the virtual machine will arrive on the tx port.

VhostUser

Configuration

The VhostUser app accepts a table as its configuration argument. The following keys are defined:

— Key socket_path

Optional. A string denoting the path to the UNIX socket to connect on. Unless given all incoming packets will be dropped.

— Key is_server

Optional. Listen and accept an incoming connection on socket_path instead of connecting to it.

PcapReader and PcapWriter Apps (apps.pcap.pcap)

The PcapReader and PcapWriter apps can be used to inject and log raw packet data into and out of the app network using the Libpcap File Format. PcapReaderreads raw packets from a PCAP file and transmits them on its output port while PcapWriter writes packets received on its input port to a PCAP file.

PcapReader

Configuration

Both PcapReader and PcapWriter expect a filename string as their configuration arguments to read from and write to respectively.

VPWS App (apps.vpn.vpws)

The VPWS app implements a so-called Virtual Private Wire Service (VPWS). It provides a L2 VPN on top of IP (v4/v6) and GRE.

This app has two duplex ports, customer and uplink. The former transports Ethernet frames while the latter transports Ethernet frames encapsulated in IP/GRE. Packets transmitted on the customer input port are encapsulated and put on the uplink output port. Packets transmitted on the uplink input port are decapsulated and put on the customer output port.

VPWS

Configuration

The vpws app accepts a table as its configuration argument. The following keys are defined:

— Key local_vpn_ip

Required. Local VPN IPv6 address as a string or in binary representation.

— Key remote_vpn_ip

Required. Remote VPN IPv6 address as a string or in binary representation.

— Key local_mac

Required. Local MAC address as a string or in binary representation.

— Key remote_mac

Optional. Remote MAC address as a string or in binary representation. Default is 02:00:00:00:00:00.

If remote_mac is not supplied, the uplink port needs to be connected to something that performs address resolution and overwrites the Ethernet header (e.g. the nd_light app).

— Key checksum

Optional. Enables GRE checksumming if set to a true value.

— Key label

Optional. A GRE key (32 bit unsigned integer). Enables GRE keying if supplied.

RawSocket App (apps.socket.raw)

The RawSocket app is a bridge between Linux network interfaces (eth0, lo, etc.) and a Snabb app network. Packets taken from the rx port are transmitted over the selected interface. Packets received on the interface are put on the tx port.

RawSocket

Configuration

The RawSocket app accepts a string as its configuration argument. The string denotes the interface to bridge to.

Libraries

IP checksum (lib.checksum)

The checksum module provides an optimized ones-complement checksum routine.

— Function ipsum pointer length initial

Return the ones-complement checksum for the given region of memory.

pointer is a pointer to an array of data to be checksummed. initial is an unsigned 16-bit number in host byte order which is used as the starting value of the accumulator. The result is the IP checksum over the data in host byte order.

The initial argument can be used to verify a checksum or to calculate the checksum in an incremental manner over chunks of memory. The synopsis to check whether the checksum over a block of data is equal to a given value is the following

if ipsum(pointer, length, value) == 0 then
  -- checksum correct
else
  -- checksum incorrect
end

To chain the calculation of checksums over multiple blocks of data together to obtain the overall checksum, one needs to pass the one’s complement of the checksum of one block as initial value to the call of ipsum() for the following block, e.g.

local sum1 = ipsum(data1, length1, 0)
local total_sum = ipsum(data2, length2, bit.bnot(sum1))

This function takes advantage of SIMD hardware when available.

Ctable (lib.ctable)

A ctable is a hash table whose keys and values are instances of FFI data types. In Lua parlance, an FFI value is a “cdata” value, hence the name “ctable”.

A ctable is parameterized for the specific types for its keys and values. This allows for the table to be stored in an efficient manner. Adding an entry to a ctable will copy the value into the table. Logically, the table “owns” the value. Lookup can either return a pointer to the value in the table, or copy the value into a user-supplied buffer, depending on what is most convenient for the user.

As an implementation detail, the table is stored as an open-addressed robin-hood hash table with linear probing. This means that to look up a key in the table, we take its hash value (using a user-supplied hash function), map that hash value to an index into the table by scaling the hash to the table size, and then scan forward in the table until we find an entry whose hash value is greater than or equal to the hash in question. Each entry stores its hash value, and empty entries have a hash of 0xFFFFFFFF. If the entry’s hash matches and the entry’s key is equal to the one we are looking for, then we have our match. If the entry’s hash is greater than our hash, then we have a failure. Hash collisions are possible as well of course; in that case we continue scanning forward.

The distance travelled while scanning for the matching hash is known as the displacement. The table measures its maximum displacement, for a number of purposes, but you might be interested to know that a maximum displacement for a table with 2 million entries and a 40% load factor is around 8 or 9. Smaller tables will have smaller maximum displacements.

The ctable has two lookup interfaces. One will perform the lookup as described above, scanning through the hash table in place. The other will fetch all entries within the maximum displacement into a buffer, then do a branchless binary search over that buffer. This second streaming lookup can also fetch entries for multiple keys in one go. This can amortize the cost of a round-trip to RAM, in the case where you expect to miss cache for every lookup.

To create a ctable, first create a parameters table specifying the key and value types, along with any other options. Then call ctable.new on those parameters. For example:

local ctable = require('lib.ctable')
local ffi = require('ffi')
local params = {
   key_type = ffi.typeof('uint32_t'),
   value_type = ffi.typeof('int32_t[6]'),
   hash_fn = ctable.hash_i32,
   max_occupancy_rate = 0.4,
   initial_size = math.ceil(occupancy / 0.4)
}
local ctab = ctable.new(params)

— Function ctable.new parameters

Create a new ctable. parameters is a table of key/value pairs. The following keys are required:

key_type: An FFI type (LuaJIT “ctype”) for keys in this table.
value_type: An FFI type (LuaJT “ctype”) for values in this table.
hash_fn: A function that takes a key and returns a hash value.

Hash values are unsigned 32-bit integers in the range [0, 0xFFFFFFFF). That is to say, 0xFFFFFFFF is the only unsigned 32-bit integer that is not a valid hash value. The hash_fn must return a hash value in the correct range.

Optional entries that may be present in the parameters table include:

initial_size: The initial size of the hash table, including free space. Defaults to 8 slots.
max_occupancy_rate: The maximum ratio of occupancy/size, where occupancy denotes the number of entries in the table, and size is the total table size including free entries. Trying to add an entry to a “full” table will cause the table to grow in size by a factor of

Defaults to 0.9, for a 90% maximum occupancy ratio.

min_occupancy_rate: Minimum ratio of occupancy/size. Removing an entry from an “empty” table will shrink the table.

Methods

Users interact with a ctable through methods. In these method descriptions, the object on the left-hand-side of the method invocation should be a ctable.

— Method :resize size

Resize the ctable to have size total entries, including empty space.

— Method :insert hash, key, value, updates_allowed

An internal helper method that does the bulk of updates to hash table. hash is the hash of key. This method takes the hash as an explicit parameter because it is used when resizing the table, and that way we avoid calling the hash function in that case. key and value are FFI values for the key and the value, of course.

updates_allowed is an optional parameter. If not present or false, then the :insert method will raise an error if the key is already present in the table. If updates_allowed is the string "required", then an error will be raised if key is not already in the table. Any other true value allows updates but does not require them. An update will replace the existing entry in the table.

Returns the index of the inserted entry.

— Method :add key, value, updates_allowed

Add an entry to the ctable, returning the index of the added entry. See the documentation for :insert for a description of the parameters.

— Method :update key, value

Update the entry in a ctable with the key key to have the new value value. Throw an error if key is not present in the table.

— Method :lookup_ptr key

Look up key in the table, and if found return a pointer to the entry. Return nil if the value is not found.

An entry pointer has three fields: the hash value, which must not be modified; the key itself; and the value. Access them as usual in Lua:

local ptr = ctab:lookup(key)
if ptr then print(ptr.value) end

Note that pointers are only valid until the next modification of a table.

— Method :lookup_and_copy key, entry

Look up key in the table, and if found, copy that entry into entry and return true. Otherwise return false.

— Method :remove_ptr entry

Remove an entry from a ctable. entry should be a pointer that points into the table. Note that pointers are only valid until the next modification of a table.

— Method :remove key, missing_allowed

Remove an entry from a ctable, keyed by key.

Return true if we actually do find a value and remove it. Otherwise if no entry is found in the table and missing_allowed is true, then return false. Otherwise raise an error.

— Method :selfcheck

Run an expensive internal diagnostic to verify that the table’s internal invariants are fulfilled.

— Method :dump

Print out the entries in a table. Can be expensive if the table is large.

— Method :iterate

Return an iterator for use by for in. For example:

for entry in ctab:iterate() do
   print(entry.key, entry.value)
end

Hash functions

Any hash function will do, as long as it produces values in the [0, 0xFFFFFFFF) range. In practice we include some functions for hashing byte sequences of some common small lengths.

— Function ctable.hash_32 number

Hash a 32-bit integer. As a hash_fn parameter, this will only work if your key type’s Lua representation is a Lua number. For example, use hash_32 on ffi.typeof('uint32_t'), but use hashv_32 on ffi.typeof('uint8_t[4]').

— Function ctable.hashv_32 ptr

Hash the first 32 bits of a byte sequence.

— Function ctable.hashv_48 ptr

Hash the first 48 bits of a byte sequence.

— Function ctable.hashv_64 ptr

Hash the first 64 bits of a byte sequence.

PMU (lib.pmu)

The CPU’s PMU (Performance Monitoring Unit) collects information about specific performance events such as cache misses, branch mispredictions, and utilization of internal CPU resources like execution units. This module provides an API for counting events with the PMU.

Hundreds of low-level counters are available. The exact list depends on CPU model. See pmu_cpu.lua for our definitions.

High-level interface

— Function is_available

If the PMU hardware is available then return true. Otherwise return two values: false and a string briefly explaining why. (Cooperation from the Linux kernel is required to acess the PMU.)

— Function profile function [event_list] [aux]

Call function, return the result, and print a human-readable report of the performance events that were counted during execution.

— Function measure function [event_list]

Call function and return two values: the result and a table of performance event counter tallies.

Low-level interface

— Function setup event_list

Setup the hardware performance counters to track a given list of events (in addition to the built-in fixed-function counters).

Each event is a Lua string pattern. This could be a full event name:

mem_load_uops_retired.l1_hit

or a more general pattern that matches several counters:

mem_load.*l._hit

Return the number of overflowed counters that could not be tracked due to hardware constraints. These will be the last counters in the list.

Example:

setup({"uops_issued.any",
       "uops_retired.all",
       "br_inst_retired.conditional",
       "br_misp_retired.all_branches"}) => 0

— Function new_counter_set

Return a counter_set object that can be used for accumulating events. The counter_set will be valid only until the next call to setup().

— Function switch_to counter_set

Switch to a new set of counters to accumulate events in. Has the side-effect of committing the current accumulators to the previous record.

If counter_set is nil then do not accumulate events.

— Function to_table counter_set

Return a table containing the values accumulated in counter_set.

Example:

to_table(cs) =>
  {
   -- Fixed-function counters
   instructions                 = 133973703,
   cycles                       = 663011188,
   ref-cycles                   = 664029720,
   -- General purpose counters selected with setup()
   uops_issued.any              = 106860997,
   uops_retired.all             = 106844204,
   br_inst_retired.conditional  =  26702830,
   br_misp_retired.all_branches =       419
  }

— Function report counter_set [aux]

Print a textual report on the values accumulated in a counter set. Optionally include auxiliary application-level counters. The ratio of each event to each auxiliary counter is also reported.

Example:

report(my_counter_set, {packet = 26700000, breath = 208593})

prints output approximately like:

EVENT                                   TOTAL     /packet     /breath
instructions                      133,973,703       5.000     642.000
cycles                            663,011,188      24.000    3178.000
ref-cycles                        664,029,720      24.000    3183.000
uops_issued.any                   106,860,997       4.000     512.000
uops_retired.all                  106,844,204       4.000     512.000
br_inst_retired.conditional        26,702,830       1.000     128.000
br_misp_retired.all_branches              419       0.000       0.000
packet                             26,700,000       1.000     128.000
breath                                208,593       0.008       1.000

Hardware

PCI (lib.hardware.pci)

The lib.hardware.pci module provides functions that abstract common operations on PCI devices on Linux. In order to drive a PCI device using Direct memory access (DMA) one must:

Unbind the PCI device using pci.unbind_device_from_linux.
Enable PCI bus mastering for device using pci.set_bus_master in order to enable DMA.
Memory map PCI device configuration space using pci.map_pci_memory.
Control the PCI device by manipulating the memory referenced by the pointer returned by pci.map_pci_memory.
Disable PCI bus master for device using pci.set_bus_master.
Unmap PCI device configuration space using pci.close_pci_resource.

The correct ordering of these steps is absolutely critical.

— Variable pci.devices

An array of supported hardware devices. Must be populated by calling pci.scan_devices. Each entry is a table as returned by pci.device_info.

— Function pci.canonical pciaddress

Returns the canonical representation of a PCI address. The canonical representation is preferred internally in Snabb and for presenting to users. It shortens addresses with leading zeros like this: 0000:01:00.0 becomes 01:00.0.

— Function pci.qualified pciaddress

Returns the fully qualified representation of a PCI address. Fully qualified addresses have the form 0000:01:00.0 and so this function undoes any abbreviation in the canonical representation.

— Function pci.scan_devices

Scans for available PCI devices and populates the pci.devices table.

— Function pci.device_info pciaddress

Returns a table containing information about the PCI device by pciaddress. The table has the following keys:

pciaddress—String denoting the PCI address of the device. E.g. "0000:83:00.1".
vendor—Identification string e.g. "0x8086" for Intel.
device—Identification string e.g. "0x10fb" for 82599 chip.
interface—Name of Linux interface using this device e.g. "eth0".
status—String denoting the Linux operational status, or nil if not known.
driver—String denoting the Lua module that supports this hardware e.g. "apps.intel.intel10g".
usable—String denoting if the device was suitable to use when scanned. One of "yes" or "no".

— Function pci.which_driver vendor, model

Returns the module name for a suitable device driver (if available) for a device of model from vendor.

— Function pci.unbind_device_from_linux pciaddress

Forces Linux to unbind the device identified by pciaddress from any kernel drivers.

— Function pci.set_bus_master pciaddress, enable

Enables or disables PCI bus mastering for device identified by pciaddress depending on whether enable is a true or a false value. PCI bus mastering must be enabled in order to perform DMA on the PCI device.

— Function pci.map_pci_memory pciaddress, n

Memory maps configuration space n of PCI device identified by pciaddress. Returns a pointer to the memory mapped region and a file descriptor of the opened sysfs resource file. PCI bus mastering must be enabled on device identified by pciaddress before calling his function.

— Function pci.close_pci_resource file_descriptor, pointer

Closes memory mapped file_descriptor of sysfs resource file and unmaps it from pointer as returned by pci.map_pci_memory.

Register (lib.hardware.register)

The lib.hardware.register module provides an abstraction for hardware device registers. This abstraction can be used to declaratively specify and conveniently manipulate structured memory regions via DMA. The functions register.define and register.define_array construct Register objects based on a register description string. The resulting Register objects can be used to manipulate the defined registers using the methods Register:read, Register:write, Register:set, Register:clr, Register:wait and Register:reset (exact set depends on the register mode).

A register description is a string with one Register object definition per line. A Register object definition must be expressed using the following grammar:

Register   ::= Name Offset Indexing Mode Longname
Name       ::= <identifier>
Indexing   ::= "-"
           ::= "+" OffsetStep "*" Min ".." Max
Mode       ::= "RO" | "RW" | "RC"
Longname   ::= <string>
Offset ::= OffsetStep ::= Min ::= Max ::= <number>

A Register object definition is made up of the following properties:

Name—A string to be used to refer to the Register object. Must be a valid Lua identifier, e.g. "foo", "foo_bar", "FOO" etc.
Offset—Integer specifying the offset from the base pointer (as supplied to register.define and register.define_array).
Indexing—Optional. Three integers specifying the offset step as well as minimum and maximum indexes in bytes.
Mode—One of "RO", "RW" or "RC" standing for read-only, read-write and counter modes respectively. Counter mode is for counter registers that clear back to zero when read.
Longname—A string describing the register (used for self-documentation).

For instance, the following Register object definition defines a register range “TXDCTL” in read-write mode starting at offset 0x06028 with 128 registers each of length 0x40.

TXDCTL 0x06028 +0x40*0..127 RW Transmit Descriptor Control

The next example defines a singular register “TPT” in counter mode located at offset 0x01428.

TPT 0x01428 - RC Total Packets Transmitted

— Function register.define description, table, base_pointer, n

Creates Register objects for description relative to base_pointer. The resulting Register objects will become a named entries in table using the names defined in description. If an entry in description defines an indexing range then n specifies the index of the register within that range. N defaults to 0.

— Function register.define_array description, table, base_pointer

Creates Register objects for description relative to base_pointer. The resulting Register objects will become a named entries in table using the names defined in description. If an entry in description defines an indexing range, an array of Register objects will be created instead of a singular Register object.

— Function register.dump table

Prints a pretty-printed register dump of a table of registers.

— Method Register:read

Returns the value of register. For convenience register objects can be called without arguments instead of calling Register:read. E.g. reg:read() is equivalent to reg().

— Method Register:write value

Sets the value of register to value. Only available on registers in read-write mode. For convenience register objects can be called with an argument instead of calling Register:write. E.g. reg:write(value) is equivalent to reg(value).

If register is in counter mode it is assumed that the register will be reset to zero upon reading. The read value is added to a register accumulator and the sum of all reads is returned.

— Method Register:set bitmask

Sets bits of register according to bitmask. Only available on registers in read-write mode.

— Method Register:clr bitmask

Clears bits of register according to bitmask. Only available on registers in read-write mode.

— Method Register:wait bitmask, value

Blocks until applying bitmask to the register equals value. If value is not supplied blocks until all bits in the mask are set instead. Only available on registers in read-write and read-only modes.

— Method Register:reset

Reset the register accumulator to 0. Only available on registers in counter mode.

— Method Register:print

Prints the register state to standard output.

Protocols

Protocol Header (lib.protocol.header)

The lib.protocol.header module contains the base class from which the supported protocol classes are derived. It defines generic methods on all protocol subclasses.

— Method header:new_from_mem memory, length

Creates and returns a header object by “overlaying” the respective header structure over length bytes of memory.

— Method header:header

Returns the raw header as a cdata object.

— Method header:sizeof

Returns the byte size of header.

— Method header:eq header

Generic equality predicate. Returns true if header is equal to self and false otherwise.

— Method header:copy destination, relocate

Copies the header to destination. The caller must ensure that there is enough space at destination. If relocate is a true value, destination is promoted to be the active storage for the header.

— Method header:clone

Returns a copy of the header object.

— Method header:upper_layer

Returns the protocol class that can handle the “upper layer protocol” or nil if the protocol is not supported or the protocol has no upper layer.

For instance, on an Ethernet header object this method might return a IPv4 or IPv6 header class.

Ethernet (lib.protocol.ethernet)

The lib.protocol.ethernet module contains a class for representing Ethernet headers. The ethernet protocol class supports two upper layer protocols: lib.protocol.ipv4 and lib.protocol.ipv6.

— Method ethernet:new config

Returns a new Ethernet header for config. Config must a be a table which may contain the following keys:

dst - Destination MAC (binary representation). Default is 00:00:00:00:00:00.
src - Source MAC (binary representation). Default is 00:00:00:00:00:00.
type - Either 0x0800 or 0x86dd for IPv4/6 individually. Default is 0x0.

— Method ethernet:src mac

— Method ethernet:dst mac

— Method ethernet:type type

Combined accessor and setter methods. These methods set the values of the source, destination and type fields of an Ethernet header. If no argument is given the current value is returned.

Example:

local eth = ethernet:new({src = ethernet:pton("00:00:00:00"),
                          dst = ethernet:pton("00:00:00:00"),
                          type = 0x86dd})
eth:dst(ethernet:pton("54:52:00:01"))
ethernet:ntop(eth:dst()) => "54:52:00:01"

— Method ethernet:src_eq mac

— Method ethernet:dst_eq mac

Predicate methods to test if mac is equal to the source or destination addresses individually.

— Method ethernet:swap

Swaps the values of the source and destination fields.

— Function ethernet:pton string

Returns the binary representation of MAC address denoted by string.

— Function ethernet:ntop mac

Returns the string representation of mac address.

— Function ethernet:ipv6_mcast ip

Returns the MAC address for IPv6 multicast ip as defined by RFC2464, section 7.

IPv4 (lib.protocol.ipv4)

The lib.protocol.ipv4 module contains a class for representing IPv4 headers. The ipv4 protocol class supports four upper layer protocols: lib.protocol.tcp, lib.protocol.udp, lib.protocol.gre and lib.protocol.icmp.header.

— Method ipv4:new config

Returns a new IPv4 header for config. Config must a be a table which may contain the following keys:

dst - Destination IPv4 address (binary representation). Default is 0.0.0.0.
src - Source IPv4 address (binary representation). Default is 0.0.0.0.
protocol - The upper layer protocol, can be 6 (TCP), 17 (UDP), 47 (GRE) or 58 (ICMP). Default is 255.
dscp - “Differentiated Services Code Point” field (6 bit unsigned integer). Default is 0.
ecn - “Explicit Congestion Notification” field (2 bit unsigned integer). Default is 0.
id - “Identification” field (16 bit unsigned integer). Default is 0.
flags - “Don’t Fragment (DF)” and “More Fragments (MF)” fields (3 bit unsigned integer). Default is 0.
frag_off - “Fragment Offset” field (13 bit unsigned integer). Default is 0.
ttl - “Time To Live” field (8 bit unsigned integer). Default is 0.

— Method ipv4:dst ip

— Method ipv4:src ip

— Method ipv4:protocol protocol

— Method ipv4:dscp dscp

— Method ipv4:ecn ecn

— Method ipv4:id id

— Method ipv4:flags flags

— Method ipv4:frag_off frag_off

— Method ipv4:ttl ttl

Combined accessor and setter methods. These methods set the values of the instance fields (see new) of an IPv4 header. If no argument is given the current value is returned.

— Method ipv4:version version

Combined accessor and setter method for the “Version” field (4 bit unsigned integer). Defaults to 4 (set automatically by new). Sets the “Version” field to version. If no argument is given the current value is returned.

— Method ipv4:ihl ihl

Combined accessor and setter method for the “Internet Header Length” field (4 bit unsigned integer). Set automatically by new. Sets the “Internet Header Length” field to ihl. If no argument is given the current value is returned.

— Method ipv4:total_length length

Combined accessor and setter method for the “Total Length” field (16 bit unsigned integer). Defaults to header length (set automatically by new). Sets the “Total Length” field to length. If no argument is given the current value is returned.

— Method ipv4:checksum

Computes and sets the IPv4 header checksum. Its called automatically by new but must be called after the header is changed.

— Method ipv4:dst_eq ip

— Method ipv4:src_eq ip

Predicate methods to test if ip is equal to the source or destination addresses individually.

— Function ipv4:pton string

Returns the binary representation of IPv4 address denoted by string.

— Function ipv4:ntop ip

Returns the string representation of ip address.

IPv6 (lib.protocol.ipv6)

The lib.protocol.ipv6 module contains a class for representing IPv6 headers. The ipv6 protocol class supports four upper layer protocols: lib.protocol.tcp, lib.protocol.udp, lib.protocol.gre and lib.protocol.icmp.header.

— Method ipv6:new config

Returns a new IPv6 header for config. Config must a be a table which may contain the following keys:

dst - Destination IPv6 address (binary representation). Default is 0::0.
src - Source IPv6 address (binary representation). Default is 0::0.
traffic_class - “Traffic Class” field (8 bit unsigned integer). Default is 0.
flow_label - “Flow Label” field (20 bit unsigned integer). Default is 0.
next_header - “Next Header” field (8 bit unsigned integer). Default is 0.
hop_limit - “Hop Limit” field (8 bit unsigned integer). Default is 0.

— Method ipv6:dst ip

— Method ipv6:src ip

— Method ipv6:traffic_class traffic_class

— Method ipv6:flow_label flow_label

— Method ipv6:next_header next_header

— Method ipv6:hop_limit hop_limit

Combined accessor and setter methods. These methods set the values of the instance fields (see new) of an IPv6 header. If no argument is given the current value is returned.

— Method ipv6:version version

Combined accessor and setter method for the version field (4 bit unsigned integer). Defaults to 4 (set automatically by new). Sets the “Version” field to version. If no argument is given the current value is returned.

— Method ipv6:dscp dscp

Combined accessor and setter method for the “Differentiated Services Code Point” field (6 bit unsigned integer). Default is 0. This is a sub-field of the “Traffic Class” field. Sets the “Differentiated Services Code Point” field to dscp. If no argument is given the current value is returned.

— Method ipv6:ecn ecn

Combined accessor and setter method for the “Explicit Congestion Notification” (2 bit unsigned integer). Default is 0. This is a sub-field of the “Traffic Class” field. Sets the “Explicit Congestion Notification” field to ecn. If no argument is given the current value is returned.

— Method ipv6:payload_length length

Combined accessor and setter method for the “Payload Length” field (16 bit unsigned integer). Default is 0. Sets the “Payload Length” field to length. If no argument is given the current value is returned.

— Method ipv6:dst_eq ip

— Method ipv6:src_eq ip

Predicate methods to test if ip is equal to the source or destination addresses individually.

— Function ipv6:pton string

Returns the binary representation of IPv6 address denoted by string.

— Function ipv6:ntop ip

Returns the string representation of ip address.

— Function ipv6:solicited_node_mcast ip

Returns the solicited-node multicast address from the given unicast ip.

TCP (lib.protocol.tcp)

The lib.protocol.tcp module contains a class for representing TCP headers.

— Method tcp:new config

Returns a new TCP header for config. Config must a be a table which may contain the following keys:

src_port - “Source Port Number” field (16 bit unsigned integer). Default is 0.
dst_port - “Destination Port Number” field (16 bit unsigned integer). Default is 0.
seq_num - “Sequence Number” field (32 bit unsigned integer). Default is 0.
ack_num - “Acknowledgement Number” field (32 bit unsigned integer). Default is 0.
window_size - “Window Size” field (16 bit unsigned integer). Default is 0.
offset - “Data Offset” field (4 bit unsigned integer). Default is 0.
ns - “NS” flag (1 bit). Default is 0.
cwr - “CWR” flag (1 bit). Default is 0.
ece - “ECE” flag (1 bit). Default is 0.
urg - “URG” flag (1 bit). Default is 0.
ack - “ACK” flag (1 bit). Default is 0.
psh - “PSH” flag (1 bit). Default is 0.
rst - “RST” flag (1 bit). Default is 0.
syn - “SYN” flag (1 bit). Default is 0.
fin - “FIN” flag (1 bit). Default is 0.

— Method tcp:src_port port

— Method tcp:dst_port port

— Method tcp:seq_num seq_num

— Method tcp:ack_num ack_num

— Method tcp:window_size window_size

— Method tcp:offset offset

— Method tcp:ns ns

— Method tcp:cwr cwr

— Method tcp:ece ece

— Method tcp:urg urg

— Method tcp:ack ack

— Method tcp:psh psh

— Method tcp:rst rst

— Method tcp:syn syn

— Method tcp:fin fin

Combined accessor and setter methods. These methods set the values of the instance fields (see new) of a TCP header. If no argument is given the current value is returned.

— Method tcp:flags flags

Combined accessor and setter method for the TCP header flags (NS, CRW, ECE, URG, ACK, PSH, RST, SYN and FIN). Sets the header’s flags accoring to flags (9 bit unsigned intetger). If no argument is given the current flags are returned.

— Method tcp:checksum payload, length, ip

Computes and sets the “Checksum” field for length bytes of payload and optionally ip. If no argument is given the current value of the “Checksum” field is returned.

UDP (lib.protocol.udp)

The lib.protocol.udp module contains a class for representing UDP headers.

— Method udp:new config

Returns a new UDP header for config. Config must a be a table which may contain the following keys:

src_port - “Source Port Number” field (16 bit unsigned integer). Default is 0.
dst_port - “Destination Port Number” field (16 bit unsigned integer). Default is 0.

— Method udp:src_port port

— Method udp:dst_port port

Combined accessor and setter methods for the source and destination port fields. Sets the source or destination port individually. Returns the current port if called without arguments. Default is 8 (the UDP header length).

— Method udp:length length

Combined accessor and setter method for the “Length” field. Sets the “Length” field* to length (a 16 bit unsigned integer). If no argument is given the current value of the “Length” field is returned.

— Method udp:checksum payload, length, ip

Computes and sets the “Checksum” field for length bytes of payload and optionally ip. If no argument is given the current value of the “Checksum” field is returned.

GRE (lib.protocol.gre)

The lib.protocol.gre module contains a class for representing GRE headers. The gre protocol class only supports the checksum and key extensions and the lib.protocol.ethernet upper layer protocol.

— Method gre:new config

Returns a new GRE header for config. Config must a be a table which may contain the following keys:

protocol - Upper layer protocol. May be 0x6558 (Ethernet). Default is nil.
checksum - Set to true to enable checksumming. Default is false.
key - 32 bit unsigned integer. Enables keying if supplied. Default is nil.

— Method gre:checksum payload, length

Combined accessor and setter method for the checksum field. Computes and sets the checksum field for length bytes of payload. If no argument is given the current checksum is returned. Returns nil if checksumming is disabled.

— Method gre:checksum_check payload, length

Predicate to verify length bytes of payload against the header checkum. Return nil if checksumming is disabled.

— Method gre:key key

Combined accessor and setter method for the key field. Sets the key field to key. If no argument is given the current key is returned. Returns nil if keying is disabled.

— Method gre:protocol protocol

Combined accessor and setter method for the upper layer protocol. Sets the upper layer protocol to protocol. If no argument is given the current upper layer protocol is returned.

The lib.protocol.icmp.header module contains a class for representing ICMP headers. The icmp protocol class currently supports two upper layer protocols: lib.protocol.icmp.nd.ns and lib.protocol.icmp.nd.na. These upper layer protocols implement the headers necessary to perform “Neighbor Discovery”.

— Method icmp:new type, code

Returns a new ICMP header of type which may be either 135 or 136 for lib.protocol.icmp.nd.ns or lib.protocol.icmp.nd.na respectively. Optionally code can be supplied to set the “Code” field for the type.

— Method icmp:type type

— Method icmp:code code

Combined accessor and setter methods. These methods set the values of the instance fields (see new) of an ICMP header. If no argument is given the current value is returned.

— Method icmp:checksum payload, length, ipv6

Computes and sets the “Checksum” field for length bytes of payload. If the lower protocol layer is lib.protocol.ipv6 then ipv6 must be set to a true value.

— Method icmp:checksum_check payload, length, ipv6

Predicate to test if the header’s “Checksum” field matches length bytes of payload. If the lower protocol layer is lib.protocol.ipv6 then ipv6 must be set to a true value.

Neighbor Solicitation (lib.protocol.icmp.nd.ns)

— Method ns:new target

Returns a new Neighbor Solicitation header. Target is the IP address used for the “Target Address” field.

— Method ns:target target

Combined accessor and setter method for the “Target Address” field. Sets the “Target Address” field to target. If no argument is given the current value is returned.

— Method ns:target_eq target

Predicate to test if the header’s value in the “Target Address” field is equivalent to target.

Neighbor Advertisement (lib.protocol.icmp.nd.na)

— Method na:new target, router, solicited, override

Returns a new Neighbor Advertisement header. Target is the IP address used for the “Target Address” field. Router, solicited and override can be boolean values to set the “Router”, “Solicited” and “Override” flags respectively. The default for the flags is 0.

— Method ns:target target

— Method ns:router router

— Method ns:solicited solicited

— Method ns:override override

Combined accessor and setter methods. These methods set the values of the instance fields (see new) of an Neighbor Advertisement header. If no argument is given the current value is returned.

— Method ns:target_eq target

Predicate to test if the header’s value in the “Target Address” field is equivalent to target.

Both Neighbor Solicitation and Advertisement (lib.protocol.icmp.nd.ns and lib.protocol.icmp.nd.na) headers implement an options method for parsing TLV Options contained in the their payloads.

Example:

 -- Parse datagram with ICMP/NA packet
local na = dgram:parse()
 -- Parse TLV Options
local options = na:options(dgram:payload())

— Method nd:options payload, length

Parses and returns an array of TLV Options (see lib.protocol.icmp.nd.options.tlv) from length bytes of payload.

TLV Option (lib.protocol.icmp.nd.options.tlv)

The lib.protocol.icmp.nd.options.tlv module contains a class for representing TLV Options. Currently only two types of options are implemented: “Source Link-Layer Address” ("src_ll_addr") and “Target Link-Layer Address” ("tgt_ll_address"). Both are represented by the lladdr class (see lib.protocol.icmp.nd.options.lladdr).

— Method tlv:new type, data

Returns a new TLV Option object for data of type. Type may be either 1 for “Source Link-Layer Address” or 2 for “Target Link-Layer Address”. Data must be a lladdr object.

— Method tlv:name

Returns a string denoting the type of the option. Either "src_ll_addr" for “Source Link-Layer Address” or "tgt_ll_address" for “Target Link-Layer Address”.

— Method tlv:length

Returns the the size of the TLV Option as multiples of 8 bytes.

— Method tlv:type type

Combined accessor and setter method. Sets the type field (see new) to type. If no argument is given the current value of the type field is returned.

— Method tlv:option

Returns an object of the class denoted by the type field. Currently that only includes lladdr instances.

Link-Layer Address Option (lib.protocol.icmp.nd.options.lladdr)

The lib.protocol.icmp.nd.options.lladdr module contains a class for representing Link-Layer Address Options.

— Method lladdr:new address

Returns a new Link-Layer Option object for MAC address in binary representation.

— Method lladdr:name

Returns the string "ll_addr".

— Method lladdr:addr address

Combined accessor and setter method. Sets the address field (see new) to address. If no argument is given the current value of the address field is returned.

Datagram (lib.protocol.datagram)

The lib.protocol.datagram module provides basic mechanisms for parsing, building and manipulating a hierarchy of protocol headers and the associated payload contained in a data packet. In particular, it supports:

Parsing and in-place manipulation of protocol headers in a received packet
In-place decapsulation by removing leading protocol headers
Adding headers to an existing packet
Creation of a new packet
Appending payload to a packet

It mediates between packets as defined in core.packet and protocol classes which are defined as classes derived from the protocol header base class in the lib.protocol.header module.

The contents of a datagram instance are logically divided into three areas: The payload, parsed headers and pushed headers. The datagram payload is a sequence of bytes either inherited from the packet given to datagram:new or appended using datagram:payload. The headers in the payload can be parsed using datagram:parse_match, which will shrink the payload by the header. Finally, synthetic headers can be prepended to the datagram using datagram:push. To get the whole datagram as a packet use datagram:packet.

Datagram

A datagram can be used in two modes of operation, called “immediate commit” and “delayed commit”. In immediate commit mode, the push and pop methods immediately modify the underlying packet. However, this can be undesireable.

Even though the manipulations are relatively fast by using SIMD instructions to move and copy data when possible, performance-aware applications usually try to avoid as much of them as possible. This creates a conflict if the caller performs operations to push or parse a sequence of protocol headers in immediate commit mode.

This problem can be avoided by using delayed commit mode. In this mode, the push methods add the data to a separate buffer as intermediate storage. The buffer is prepended to the actual packet in a single operation by calling datagram:commit.

The pop methods are made light-weight in delayed commit mode as well by keeping track of an additional offset that indicates where the actual packet starts in the packet buffer. Each call to one of the pop methods simply increases the offset by the size of the popped piece of data. The accumulated actions will be applied as a single operation by datagram:commit.

The push and pop methods can be freely mixed in delayed commit mode.

Due to the destructive nature of these methods in immediate commit mode, they cannot be applied when the parse stack is not empty, because moving the data in the packet buffer will invalidate the parsed headers. The push and pop methods will raise an error in that case.

The buffer used in delayed commit mode has a fixed size of 512 bytes. This limits the size of data that can be pushed in a single operation. A sequence of push/commit operations can be used to push an arbitrary amount of data in chunks of up to 512 bytes.

— Method datagram:new packet, protocol, options

Creates a datagram for packet or from scratch if packet is nil. Protocol will be used by parse_match to parse the packet payload. If protocol is not nil it is set as the initial upper layer protocol. If options is not nil it must be a table that selects configurable properties of the class. Currently, the only option is the selection of immediate or delayed commit mode by setting the key delayed_commit to false or true, respectively. The default is immediate commit mode.

— Method datagram:push header

Prepends header to the front of the datagram. This method destructively modifies the underlying packet in immediate commit mode and raises an error if the parse stack is not empty.

In delayed commit mode, header is prepended to an intermediate buffer.

— Method datagram:push_raw data, length

This method behaves like the datagram:push method for an arbitrary chunk of memory of length length located at the address pointed to by data.

— Method datagram:parse_match protocol, check

Attempts to parse the next header in the datagram, thereby removing it from the payload. Returns a header instance of class protocol on success. If protocol is nil the current upper layer protocol as set by datagram:new or previous calls to parse_match is used.

If neither protocol nor the upper layer protocol is set or the constructor of the protocol class returns nil, the parsing operation has failed and parse_match returns nil. The datagram remains unchanged.

If the protocol class instance has been created successfully, it is passed as single argument to the anonymous function check.

If check returns a false value, the parsing has failed and parse_match returns nil. The packet remains unchanged.

If check is not supplied or if it returned a true value, the parsing has succeeded and the current upper layer protocol of the datagram is set to the value returned by header:upper_layer.

— Method datagram:parse protocols_and_checks

A wrapper around parse_match that allows parsing of a sequence of headers with a single method call.

If protocols_and_checks is a sequence of protocol class and check function pairs, parse_match is called for each pair. Returns the header object of the last header parsed or nil if any of the calls to parse_match return nil.

If called with a nil argument, this method is equivalent to parse_match called without arguments.

— Method datagram:parse_n n

A wrapper around parse_match that parses the next n protocol headers using the current upper layer protocol and subsequent values of header:upper_layer. It returns the last header object or nil if less than n headers could be parsed successfully.

— Method datagram:unparse n

Undoes the last n calls to parse_match on the datagram. E.g. prepends n parsed headers back to the payload. The sequence of parsed headers can be obtained by calling stack.

— Method datagram:pop n

Removes the leading n parsed headers from the datagram. Note that headers added via push can not be removed using pop. The caller has to ensure that the datagram contains at least n headers that were parsed using parse_match. The sequence of parsed headers can be obtained by calling stack. This method destructively modifies the underlying packet in immediate commit mode and raises an error if the parse stack is not empty.

In delayed commit mode, the packet is not modified and the parse stack remains valid.

For instance let d be an datagram with an Ethernet header followed by an IPv6 header. Assuming we have parsed both headers using d:parse_n(2), we could call d:pop(1) to decapsulate the IPv6 packet from its Ethernet header.

— Method datagram:pop_raw length, ulp

Removes length bytes from the beginning of the datagram. If ulp is given it is set as the current upper layer protocol. This method destructively modifies the underlying packet in immediate commit mode and raises an error if the parse stack is not empty.

In delayed commit mode, the packet is not modified and the parse stack remains valid.

— Method datagram:stack

Returns the parsed header objects as a sequence.

— Method datagram:packet

Returns a packet (see core.packet) containing the datagram (including pushed headers).

— Method datagram:payload pointer, length

Combined payload accessor and setter method. Returns a pointer to the datagram payload and its byte size.

If pointer and length are supplied then length bytes starting from pointer are appended to the datagram’s payload.

— Method datagram:data

Returns the location and size of the buffer of the underlying packet. This is a shortcut to datagram:packet followed by calls to packet.data and pakcet.length.

Method datagram:commit

If called in delayed commit mode, the operations accumulated by the push and pop methods since the creation of the datagram or the last invocation of datagram:commit are commited to the underlying packet. An error is raised if the parse stack is not empty.

The method can be safely called in immediate commit mode.

Snabb NFV

NFV config (program.snabbnfv.nfvconfig)

The program.snabbnfv.nfvconfig module implements a Network Functions Virtualization component based on Snabb. It introduces a simple configuration file format to describe NFV configurations which it then compiles to app networks. This NFV component is compatible with OpenStack Neutron.

NFV

— Function nfvconfig.load file, pci_address, socket_path

Loads the NFV configuration from file and compiles an app network using pci_address and socket_path for the underlying NIC driver and VhostUser apps. Returns the resulting engine configuration.

NFV Configuration Format

The configuration file format understood by program.snabbnfv.nfvconfig is based on Lua expressions. Initially, it contains a list of NFV ports:

return { <port-1>, ..., <port-n> }

Each port is defined by a range of properties which correspond to the configuration parameters of the underlying apps (NIC driver, VhostUser, PcapFilter, RateLimiter, nd_light and SimpleKeyedTunnel):

port := { port_id        = <id>,          -- A unique string
          mac_address    = <mac-address>, -- MAC address as a string
          vlan           = <vlan-id>,     -- ..
          ingress_filter = <filter>,       -- A pcap-filter(7) expression
          egress_filter  = <filter>,       -- ..
          tunnel         = <tunnel-conf>,
          rx_police_gbps = <n>,           -- Allowed input rate in Gbps
          tx_police_gbps = <n> }          -- Allowed output rate in Gbps

The tunnel section deviates a little from SimpleKeyedTunnel’s terminology:

tunnel := { type          = "L2TPv3",     -- The only type (for now)
            local_cookie  = <cookie>,     -- As for SimpleKeyedTunnel
            remote_cookie = <cookie>,     -- ..
            next_hop      = <ip-address>, -- Gateway IP
            local_ip      = <ip-address>, -- ~ `local_address'
            remote_ip     = <ip-address>, -- ~ `remote_address'
            session       = <32bit-int>   -- ~ `session_id' }

snabbnfv traffic

The snabbnfv traffic program loads and runs a NFV configuration using program.snabbnfv.nfvconfig. It can be invoked like so:

./snabb snabbnfv traffic <file> <pci-address> <socket-path>

snabbnfv traffic runs the loaded configuration indefinitely and automatically reloads the configuration file if it changes (at most once every second).

snabbnfv neutron2snabb

The snabbnfv neutron2snabb program converts Neutron database CSV dumps to the format used by program.snabbnfv.nfvconfig. For more info see Snabb NFV Architecture. It can be invoked like so:

./snabb snabbnfv neutron2snabb <csv-directory> <output-directory> [<hostname>]

snabbnfv neutron2snabb reads the Neutron configuration csv-directory and translates them to one lib.nfv.conig configuration file per physical network. If hostname is given, it overrides the hostname provided by hostname(1).

Watchdog (lib.watchdog.watchdog)

The lib.watchdog.watchdog module implements a per-thread watchdog functionality. Its purpose is to watch and kill processes which fail to call the watchdog periodically (e.g. hang).

It does so by using alarm(3) and ualarm(3) to have the OS send a SIGALRM to the process after a specified timeout. Because the process does not handle the signal it will be killed and exit with status 142.

— Function watchdog.set milliseconds

Set watchdog timeout to milliseconds. Values for milliseconds greater than 1,000 are truncated to the next second. For example:

watchdog.set(1100) == watchdog.set(2000)

— Function watchdog.reset

Starts the timout if the watchdog has not yet been started and resets the timeout otherwise. If the timeout is reached the process will be killed.

— Function watchdog.stop

Disables the timeout.