Note: This reference manual is a draft. The API defined in this document is not guaranteed to be stable or complete and future versions of Snabb will introduce backwards incompatible changes. With that being said, discrepancies between this document and the actual Snabb Switch implementation are considered to be bugs. Please report them in order to help improve this document.
Snabb is an extensible, virtualized, Ethernet networking toolkit. With Snabb you can implement networking applications using the Lua language. Snabb includes all the tools you need to quickly realize your network designs, and it’s really fast too! Furthermore, Snabb is extensible and encourages you to grow the ecosystem to match your requirements.
Architecture
The Snabb Core forms a runtime environment (engine) which executes your design. A design is simply a Lua script used to drive the Snabb stack; you can think of it as your top-level “main” routine.
In order to add functionality to the Snabb stack you can load modules into the Snabb engine. These can be Lua modules as well as native code objects. We differentiate between two classes of modules, namely libraries and Apps. Libraries are simple collections of program utilities to be used in your designs, apps or other libraries, just as you might expect. Apps, on the other hand, are code objects that implement a specific interface, which is used by the Snabb engine to organize an App Network.
Network
Usually, a Snabb design will create a series of apps, interconnect these in a desired way using links and finally pass the resulting app network on to the Snabb engine. The engine’s job is to:
The core modules defined below can be loaded using Lua’s require. For example:
local config = require("core.config")
local c = config.new()
...
An app is an isolated implementation of a specific networking function. For example, a switch, a router, or a packet filter.
Apps receive packets on input ports, perform some processing, and transmit packets on output ports. Each app has zero or more input and output ports. For example, a packet filter may have one input and one output port, while a packet recorder may have only an input port. Every app must implement the interface below. Methods which may be left unimplemented are marked as “optional”.
— Method myapp:new arg
Required. Create an instance of the app with a given argument arg. myapp:new must return an instance of the app. The handling of arg is up to the app, but using core.config’s parse_app_arg to parse arg is encouraged.
— Field myapp.input
— Field myapp.output
Tables of named input and output links. These tables are initialized by the engine for use in processing and are read-only.
— Field myapp.appname
Name of the app. Read-only.
— Field myapp.shm
Can be set to a specification for core.shm.create_frame. When set, this field will be initialized to a frame of shared memory objects by the engine.
— Field myapp.config
Can be set to a specification for core.lib.parse. When set, the specification will be used to validate the app’s arg when it is configured using config.app.
— Method myapp:link
Optional. Called any time the app’s links may have been changed (including on start-up). Guaranteed to be called before pull and push are called with new links.
— Method myapp:pull
Optional. Pull packets into the network.
For example: Pull packets from a network adapter into the app network by transmitting them to output ports.
— Method myapp:push
Optional. Push packets through the system.
For example: Move packets from input ports to output ports or to a network adapter.
— Method myapp:reconfig arg
Optional. Reconfigure the app with a new arg. If this method is not implemented the app instance is discarded and a new instance is created.
— Method myapp:report
Optional. Print a report of the current app status.
— Method myapp:stop
Optional. Stop the app and release associated external resources.
— Field myapp.zone
Optional. Name of the LuaJIT profiling zone used for this app (descriptive string). The default is the module name.
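The app interface above can be sketched in plain Lua. This is a minimal, self-contained illustration rather than code taken from Snabb: `Forwarder` is a hypothetical app, and the `stub_*` functions are hypothetical stand-ins for the engine's links so that the sketch runs without the Snabb runtime. In a real design the engine fills in `input` and `output`, and packets move through the link API.

```lua
-- Hypothetical stand-ins for Snabb links: a plain FIFO queue per link.
local function stub_link () return { packets = {} } end
local function stub_transmit (l, p) table.insert(l.packets, p) end
local function stub_receive (l) return table.remove(l.packets, 1) end
local function stub_empty (l) return #l.packets == 0 end

-- Hypothetical app: forwards every packet from its "input" port to its
-- "output" port and counts how many it has forwarded.
local Forwarder = {}
Forwarder.__index = Forwarder

-- Required: new() must return an instance of the app.
function Forwarder:new (arg)
   return setmetatable({ arg = arg, forwarded = 0 }, Forwarder)
end

-- Optional: push() moves packets from input ports to output ports.
function Forwarder:push ()
   local i, o = self.input.input, self.output.output
   while not stub_empty(i) do
      stub_transmit(o, stub_receive(i))
      self.forwarded = self.forwarded + 1
   end
end

-- Optional: report() prints the current app status.
function Forwarder:report ()
   print(("forwarded %d packets"):format(self.forwarded))
end

-- Play the role of the engine: instantiate the app, attach named links,
-- queue two packets and run a single push.
local app = Forwarder:new({})
app.input = { input = stub_link() }
app.output = { output = stub_link() }
stub_transmit(app.input.input, "packet-1")
stub_transmit(app.input.input, "packet-2")
app:push()
app:report()
```

Only `new` is mandatory; the sketch omits `pull`, `link`, `reconfig` and `stop`, which the interface marks as optional.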
A config is a description of a packet-processing network. The network is a directed graph. Nodes in the graph are apps that each process packets in a specific way. Each app has a set of named input and output ports—often called rx and tx. Edges of the graph are unidirectional links that carry packets from an output port to an input port.
The config is a purely passive data structure. Creating and manipulating a config object does not immediately affect operation. The config has to be activated using engine.configure.
— Function config.new
Creates and returns a new empty configuration.
— Function config.app config, name, class, arg
Adds an app of class with arg to the config where it will be assigned to name.
Example:
config.app(c, "nic", Intel82599, {pciaddr = "0000:00:00.0"})
— Function config.link config, linkspec
Add a link defined by linkspec to the config config. Linkspec must be a string of the format
app_name1.output_port->app_name2.input_port
where app_name1 and app_name2 are names of apps in config and output_port and input_port are valid output and input ports of the referenced apps respectively.
Example:
config.link(c, "nic1.tx->nic2.rx")
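To make the linkspec format concrete, the following sketch splits a linkspec string into its four parts. The helper name `parse_linkspec` and the exact pattern are assumptions made for illustration; Snabb's own parser may accept or reject different inputs.

```lua
-- Hypothetical helper: split a linkspec of the documented form
--   app_name1.output_port->app_name2.input_port
-- into its four components. Not Snabb's own parser.
local function parse_linkspec (spec)
   local from_app, from_port, to_app, to_port =
      spec:match("^%s*([%w_]+)%.([%w_]+)%s*%->%s*([%w_]+)%.([%w_]+)%s*$")
   if not from_app then
      error("invalid linkspec: " .. spec)
   end
   return from_app, from_port, to_app, to_port
end

local fa, fp, ta, tp = parse_linkspec("nic1.tx->nic2.rx")
print(fa, fp, ta, tp)
```

A string that lacks a port name, such as `nic1->nic2`, raises an error in this sketch.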
The engine executes a config by initializing apps, creating links, and driving the flow of execution. The engine also performs profiling and reporting functions. It can be reconfigured during runtime. Within Snabb Switch scripts the core.app module is bound to the global engine variable.
— Function engine.configure config
Configure the engine to use a new config config. You can safely call this method many times to incrementally update the running app network. The engine updates the app network as follows:
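Apps that are no longer part of the configuration are stopped. (Their stop() method is called if defined.)
Apps whose configuration has changed are updated by calling their reconfig() method. If the reconfig() method is not implemented then the old instance is stopped and a new one started.
— Function engine.main options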
Run the Snabb engine. Options is a table of key/value pairs. The following keys are recognized:
duration - Duration in seconds to run the engine for (as a floating point number). If this is set you cannot supply done.
done - A function to be called repeatedly by engine.main until it returns true. Once it returns true the engine will be stopped and engine.main will return. If this is set you cannot supply duration.
report - A table which configures the report printed before engine.main() returns. The keys showlinks and showapps can be set to boolean values to force or suppress link and app reporting individually. By default engine.main() will report on links but not on apps.
measure_latency - By default, the breathe() loop is instrumented to record the latency distribution of running the app graph. This information can be processed by the snabb top program. Passing measure_latency=false in the options will disable this instrumentation.
no_report - A boolean value. If true no final report will be printed.
— Function engine.now
Returns monotonic time in seconds as a floating point number. Suitable for timers.
— Variable engine.busywait
If set to true then the engine polls continuously for new packets to process. This consumes 100% CPU and makes processing latency less vulnerable to kernel scheduling behavior which can cause pauses of more than one millisecond.
Default: false
— Variable engine.Hz
Frequency at which to poll for new input packets. The default value is ‘false’ which means to adjust dynamically up to 100us during low traffic. The value can be overridden with a constant integer saying how many times per second to poll.
This setting is not used when engine.busywait is true.
A link is a ring buffer used to store packets between apps. Links can be treated either like arrays—accessing their internal structure directly—or as streams of packets by using their API functions.
— Function link.empty link
Predicate used to test if a link is empty. Returns true if link is empty and false otherwise.
— Function link.full link
Predicate used to test if a link is full. Returns true if link is full and false otherwise.
— Function link.nreadable link
Returns the number of packets on link.
— Function link.nwriteable link
Returns the remaining number of packets that fit onto link.
— Function link.receive link
Returns the next available packet (and advances the read cursor) on link. If the link is empty an error is signaled.
— Function link.front link
Return the next available packet without advancing the read cursor on link. If the link is empty, nil is returned.
— Function link.transmit link, packet
Transmits packet onto link. If the link is full packet is dropped (and the drop counter increased).
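The behavior of the functions above can be modeled in a few lines of plain Lua. This is an illustrative model only, not Snabb's ring buffer: it uses a Lua table as the queue, and the capacity of 2 is an arbitrary value chosen to show a drop.

```lua
-- Illustrative model of the documented link semantics (not Snabb's code).
local function new_link (capacity)
   return { capacity = capacity, packets = {}, txdrop = 0 }
end

local function empty (l) return #l.packets == 0 end
local function full (l) return #l.packets == l.capacity end
local function nreadable (l) return #l.packets end
local function nwriteable (l) return l.capacity - #l.packets end

-- Like link.transmit: a packet sent to a full link is dropped and counted.
local function transmit (l, p)
   if full(l) then
      l.txdrop = l.txdrop + 1
   else
      l.packets[#l.packets + 1] = p
   end
end

-- Like link.receive: signals an error when the link is empty.
local function receive (l)
   assert(not empty(l), "receive on empty link")
   return table.remove(l.packets, 1)
end

-- Like link.front: peek without consuming; nil when the link is empty.
local function front (l)
   return l.packets[1]
end

local l = new_link(2)
transmit(l, "a")
transmit(l, "b")
transmit(l, "c")              -- link is full, so "c" is dropped
print(nreadable(l), nwriteable(l), l.txdrop)
local first = front(l)        -- peeks at "a" without removing it
local p1 = receive(l)
local p2 = receive(l)
print(first, p1, p2, empty(l))
```

The model makes the asymmetry visible: transmitting to a full link silently drops the packet, while receiving from an empty link is an error, so callers are expected to check `empty` first.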
— Function link.stats link
Returns a structure holding ring statistics for the link:
txbytes, rxbytes: Counts of transferred bytes.
txpackets, rxpackets: Counts of transferred packets.
txdrop: Count of packets dropped due to ring overflow.
A packet is an FFI object of type struct packet representing a network packet that is currently being processed. The packet object is used to explicitly manage the packet’s life cycle. Packets are explicitly allocated and freed by using packet.allocate and packet.free. When a packet is received using link.receive its ownership is acquired by the calling app. The app must then either transfer the packet ownership to another app by calling link.transmit on the packet or free the packet using packet.free. Apps may only use packets they own, i.e. packets that have not been transmitted or freed. The number of allocatable packets is limited by the size of the underlying “freelist”, i.e. a pool of unused packet objects from and to which packets are allocated and freed.
— Type struct packet
struct packet {
uint16_t length;
uint8_t data[packet.max_payload];
};
— Constant packet.max_payload
The maximum payload length of a packet.
— Function packet.allocate
Returns a new empty packet. An error is raised if there are no packets left on the freelist. Initially the length of the allocated packet is 0, and its data is uninitialized garbage.
— Function packet.free packet
Frees packet and puts it back onto the freelist.
— Function packet.clone packet
Returns an exact copy of packet.
— Function packet.resize packet, length
Sets the payload length of packet, truncating or extending its payload. In the latter case the contents of the extended area at the end of the payload are filled with zeros.
— Function packet.append packet, pointer, length
Appends length bytes starting at pointer to the end of packet. An error is raised if there is not enough space in packet to accommodate length additional bytes.
— Function packet.prepend packet, pointer, length
Prepends length bytes starting at pointer to the front of packet, taking ownership of the packet and returning a new packet. An error is raised if there is not enough space in packet to accommodate length additional bytes.
— Function packet.shiftleft packet, length
Take ownership of packet, truncate it by length bytes from the front, and return a new packet. Length must be less than or equal to the length of packet.
— Function packet.shiftright packet, length
Take ownership of packet, move the packet payload to the right by length bytes, growing packet by length. Returns a new packet. The sum of length and the length of packet must be less than or equal to packet.max_payload.
— Function packet.from_pointer pointer, length
Allocate packet and fill it with length bytes from pointer.
— Function packet.from_string string
Allocate packet and fill it with the contents of string.
— Function packet.clone_to_memory pointer packet
Creates an exact copy of packet at the memory pointed to by pointer. Pointer must point to a packet.packet_t.
Snabb allocates special DMA memory that can be accessed directly by network cards. The important characteristic of DMA memory is being located in contiguous physical memory at a stable address.
— Function memory.dma_alloc bytes, [alignment]
Returns a pointer to bytes of new DMA memory.
Optionally a specific alignment requirement can be provided (in bytes). The default alignment is 128.
— Function memory.virtual_to_physical pointer
Returns the physical address (uint64_t) of the DMA memory at pointer.
— Variable memory.huge_page_size
Size of a single huge page in bytes. Read-only.
This module facilitates creation and management of named shared memory objects. Objects can be created using shm.create similar to ffi.new, except that separate calls to shm.open for the same name will each return a new mapping of the same shared memory. Different processes can share memory by mapping an object with the same name (and type). Each process can map any object any number of times.
Mappings are deleted on process termination or with an explicit shm.unmap. Names are unlinked from objects that are no longer needed using shm.unlink. Object memory is freed when the name is unlinked and all mappings have been deleted.
Names can be fully qualified or abbreviated to be within the current process. Here are examples of names and how they are resolved where <pid> is the PID of this process:
foo/bar ⇒ /var/run/snabb/<pid>/foo/bar
/1234/foo/bar ⇒ /var/run/snabb/1234/foo/bar
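The two resolution rules can be expressed as a small function. This is a sketch under stated assumptions: the helper name `resolve` and the explicit `pid` parameter are hypothetical, and only the two cases listed above are handled.

```lua
-- Hypothetical helper mirroring the two documented resolution rules.
-- The pid is passed in explicitly so the sketch has no OS dependencies.
local function resolve (name, pid)
   local root = "/var/run/snabb"
   if name:sub(1, 1) == "/" then
      -- Fully qualified: the name already starts with a process ID.
      return root .. name
   else
      -- Abbreviated: resolved relative to the current process.
      return root .. "/" .. pid .. "/" .. name
   end
end

print(resolve("foo/bar", 4321))
print(resolve("/1234/foo/bar", 4321))
```

With a pid of 4321 the first call yields `/var/run/snabb/4321/foo/bar` and the second yields `/var/run/snabb/1234/foo/bar`.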
Behind the scenes the objects are backed by files on ram disk (/var/run/snabb/<pid>) and accessed with the equivalent of POSIX shared memory (shm_overview(7)).
The practical limit on the number of objects that can be mapped will depend on the operating system limit for memory mappings. On Linux the default limit is 65,530 mappings:
$ sysctl vm.max_map_count
vm.max_map_count = 65530
— Function shm.create name, type
Creates and maps a shared object of type into memory via a hierarchical name. Returns a pointer to the mapped object.
— Function shm.open name, type, [readonly]
Maps an existing shared object of type into memory via a hierarchical name. If readonly is non-nil the shared object is mapped in read-only mode. Readonly defaults to nil. Fails if the shared object does not already exist. Returns a pointer to the mapped object.
— Function shm.alias new-path existing-path
Create an alias (symbolic link) for an object.
— Function shm.exists name
Returns a true value if shared object by name exists.
— Function shm.unmap pointer
Deletes the memory mapping for pointer.
— Function shm.unlink path
Unlinks the subtree of objects designated by path from the filesystem.
— Function shm.children path
Returns an array of objects in the directory designated by path.
— Function shm.register type, module
Registers an abstract shared memory object type implemented by module in shm.types. Module must provide the following functions:
and can optionally provide the function:
The module’s type variable must be bound to type. To register a new type a module might invoke shm.register like so:
type = shm.register('mytype', getfenv())
-- Now the following holds true:
-- shm.types[type] == getfenv()
— Variable shm.types
A table that maps types to modules. See shm.register.
— Function shm.create_frame path, specification
Creates and returns a shared memory frame by specification under path. A frame is a table of mapped (possibly abstract) shared memory objects. Specification must be of the form:
{ <name> = {<module>, ...},
... }
Module must implement an abstract type registered with shm.register, and is followed by additional initialization arguments to its create function. Example usage:
local shm = require("core.shm")
local counter = require("core.counter")
-- Assumption: get_unix_time is already declared to the FFI by Snabb's core.
local C = require("ffi").C
-- Create counters foo/bar/{dtime,rxpackets,txpackets}.counter
local f = shm.create_frame(
   "foo/bar",
   {dtime     = {counter, C.get_unix_time()},
    rxpackets = {counter},
    txpackets = {counter}})
counter.add(f.rxpackets)
counter.read(f.dtime)
— Function shm.open_frame path
Opens and returns the shared memory frame under path for reading.
— Function shm.delete_frame frame
Deletes/unmaps a shared memory frame. The frame directory is unlinked if frame was created by shm.create_frame.
Double-buffered shared memory counters. Counters are 64-bit unsigned values. Registered with core.shm as type counter.
— Function counter.create name, [initval]
Creates and returns a counter by name, initialized to initval. Initval defaults to 0.
— Function counter.open name
Opens and returns the counter by name for reading.
— Function counter.delete name
Deletes and unmaps the counter by name.
— Function counter.commit
Commits buffered counter values to public shared memory.
— Function counter.set counter, value
Sets counter to value.
— Function counter.add counter, [value]
Increments counter by value. Value defaults to 1.
— Function counter.read counter
Returns the value of counter.
Shared memory histogram with logarithmic buckets. Registered with core.shm as type histogram.
— Function histogram.new min, max
Returns a new histogram, with buckets covering the range from min to max. The range between min and max will be divided logarithmically.
— Function histogram.create name, min, max
Creates and returns a histogram as in histogram.new by name. If the file exists already, it will be cleared.
— Function histogram.open name
Opens and returns histogram by name for reading.
— Method histogram:add measurement
Adds measurement to histogram.
— Method histogram:iterate prev
When used as for count, lo, hi in histogram:iterate(), visits all buckets in histogram in order from lowest to highest. Count is the number of samples recorded in that bucket, and lo and hi are the lower and upper bounds of the bucket. Note that count is an unsigned 64-bit integer; to get it as a Lua number, use tonumber.
If prev is given, it should be a snapshot of the previous version of the histogram. In that case, the count values will be returned as a difference between their values in histogram and their values in prev.
— Method histogram:snapshot [dest]
Copies out the contents of histogram into the histogram dest and returns dest. If dest is not given, the result will be a fresh histogram.
— Method histogram:clear
Clears the buckets of histogram.
— Method histogram:wrap_thunk thunk, now
Returns a closure that wraps thunk, measuring and recording the difference between calls to now before and after thunk into histogram.
The core.lib module contains miscellaneous utilities.
— Function lib.equal x, y
Predicate to test if x and y are structurally similar (isomorphic).
— Function lib.can_open filename, mode
Predicate to test if file at filename can be successfully opened with mode.
— Function lib.can_read filename
Predicate to test if file at filename can be successfully opened for reading.
— Function lib.can_write filename
Predicate to test if file at filename can be successfully opened for writing.
— Function lib.readcmd command, what
Runs Unix shell command and returns what of its output. What must be a valid argument to file:read.
— Function lib.readfile filename, what
Reads and returns what from file at filename. What must be a valid argument to file:read.
— Function lib.writefile filename, value
Writes value to file at filename using file:write. Returns the value returned by file:write.
— Function lib.readlink filename
Returns the true name of symbolic link at filename.
— Function lib.dirname filename
Returns the dirname(3) of filename.
— Function lib.basename filename
Returns the basename(3) of filename.
— Function lib.firstfile directory
Returns the filename of the first file in directory.
— Function lib.firstline filename
Returns the first line of file at filename as a string.
— Function lib.files_in_directory directory
Returns an array of filenames in directory.
— Function lib.load_string string
Evaluates and returns the value of the Lua expression in string.
— Function lib.load_conf filename
Evaluates and returns the value of the Lua expression in file at filename.
— Function lib.store_conf filename, value
Writes value to file at filename as a Lua expression. Supports tables, strings and everything that can be readably printed using print.
— Function lib.bits bitset, basevalue
Returns a bitmask using the values of bitset as indexes. The keys of bitset are ignored (and can be used as comments).
Example:
bits({RESET=0,ENABLE=4}, 123) => 1<<0 | 1<<4 | 123
— Function lib.bitset value, n
Predicate to test if bit number n of value is set.
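The two bit helpers can be modeled in portable Lua arithmetic. This is an illustrative re-implementation rather than Snabb's source (which runs on LuaJIT and may use its bit operations); the assumptions are that bit numbers count from 0, that basevalue defaults to 0, and that all values are small non-negative integers.

```lua
-- True if bit number n (counting from 0) of value is set.
local function bitset (value, n)
   return math.floor(value / 2^n) % 2 == 1
end

-- Bitwise OR of 2^n for every value n in bitset_table, OR-ed into
-- basevalue. The keys of bitset_table are ignored.
local function bits (bitset_table, basevalue)
   local sum = basevalue or 0
   for _, n in pairs(bitset_table) do
      -- Adding 2^n only when bit n is clear is equivalent to OR.
      if not bitset(sum, n) then sum = sum + 2^n end
   end
   return sum
end

local mask = bits({RESET=0, ENABLE=4})
print(mask, bitset(mask, 4), bitset(mask, 1))
```

Here `mask` has bits 0 and 4 set, which is the value 17. Because 123 already has bits 0 and 4 set, `bits({RESET=0, ENABLE=4}, 123)` leaves it unchanged in this model.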
— Function lib.bitfield size, struct, member, offset, nbits, value
Combined accessor and setter function for bit ranges of integers in cdata structs. Sets nbits (number of bits) starting from offset to value. If value is not given the current value is returned.
Size may be one of 8, 16 or 32 depending on the bit size of the integer being set or read.
Struct must be a pointer to a cdata object and member must be the literal name of a member of struct.
Example:
local struct_t = ffi.typeof[[struct { uint16_t flags; }]]
-- Assuming `s' is an instance of `struct_t', set bits 4-7 to 0xF:
lib.bitfield(16, s, 'flags', 4, 4, 0xf)
-- Get the value:
lib.bitfield(16, s, 'flags', 4, 4) -- => 0xF
— Function string:split pattern
Returns an iterator over the string split by pattern. Pattern must be a valid argument to string:gmatch.
Example:
for word, sep in ("foo!bar!baz"):split("(!)") do
print(word, sep)
end
> foo !
> bar !
> baz nil
— Function lib.hexdump string
Returns hexadecimal string for bytes in string.
— Function lib.hexundump hexstring
Returns byte string for hexstring.
— Function lib.comma_value n
Returns a string for decimal number n with magnitudes separated by commas. Example:
comma_value(1000000) => "1,000,000"
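One way to produce this grouping is sketched below. It is an illustration of the described behavior, not Snabb's implementation, and it assumes n is a non-negative integer.

```lua
-- Illustrative version: group the decimal digits of n in threes.
-- Assumes n is a non-negative integer.
local function comma_value (n)
   local digits = string.format("%d", n)
   -- Work on the reversed string so groups are counted from the right.
   local grouped = digits:reverse():gsub("(%d%d%d)", "%1,"):reverse()
   -- A digit count that is a multiple of three leaves a leading comma.
   return (grouped:gsub("^,", ""))
end

print(comma_value(1000000))
print(comma_value(999))
```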
— Function lib.random_data length
Returns a string of length bytes of random data.
— Function lib.bounds_checked type, base, offset, size
Returns a table that acts as a bounds checked wrapper around a C array of type and size starting at base plus offset. Type must be a ctype and the caller must ensure that the allocated memory region at base/offset is at least sizeof(type)*size bytes long.
— Function lib.throttle seconds
Return a closure that returns true at most once during any seconds (a floating point value) time interval, otherwise false.
— Function lib.timeout seconds
Returns a closure that returns true if seconds (a floating point value) have elapsed since it was created, otherwise false.
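The closure behavior of throttle and timeout can be sketched as follows. The extra `now` parameter is an assumption added here so the example can run against a fake clock; the functions documented above take only seconds and use the engine's notion of time.

```lua
-- Illustrative closures; `now` is a hypothetical injected clock function.
local function timeout (seconds, now)
   local deadline = now() + seconds
   return function () return now() > deadline end
end

local function throttle (seconds, now)
   local deadline = now()
   return function ()
      if now() >= deadline then
         deadline = now() + seconds
         return true
      end
      return false
   end
end

-- Fake clock so the behavior is deterministic.
local t = 0
local function clock () return t end

local expired = timeout(1.5, clock)
local ready = throttle(1.0, clock)
local r1 = ready()     -- first call in the interval
local r2 = ready()     -- still within the same interval
t = 2
local r3 = ready()     -- a new interval has started
local e = expired()    -- 2 seconds have passed, more than 1.5
print(r1, r2, r3, e)
```

A throttle closure is useful for periodic work inside a busy loop, while a timeout closure gives a one-shot deadline.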
— Function lib.waitfor condition
Blocks until the function condition returns a true value.
— Function lib.waitfor2 name, condition, attempts, interval
Repeatedly calls the function condition in interval (milliseconds). If condition returns a true value waitfor2 returns. If condition does not return a true value after attempts waitfor2 raises an error identified by name.
— Function lib.yesno flag
Returns the string "yes" if flag is a true value and "no" otherwise.
— Function lib.align value, size
Return the next integer that is a multiple of size starting from value.
— Function lib.csum pointer, length
Computes and returns the “IP checksum” of length bytes starting at pointer.
— Function lib.update_csum pointer, length, checksum
Returns checksum updated by length bytes starting at pointer. The default of checksum is 0LL.
— Function lib.finish_csum checksum
Returns the finalized checksum.
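The relationship between the three checksum functions can be illustrated with the Internet checksum of RFC 1071 computed over a Lua string. This is a sketch in portable arithmetic, not Snabb's implementation: the functions documented above take a pointer and a length rather than a string, and this incremental form assumes every chunk except the last has even length.

```lua
-- Accumulate the sum of big-endian 16-bit words in data onto checksum.
local function update_csum (data, checksum)
   checksum = checksum or 0
   for i = 1, #data - 1, 2 do
      local hi, lo = data:byte(i, i + 1)
      checksum = checksum + hi * 256 + lo
   end
   if #data % 2 == 1 then
      -- Odd length: treat the final byte as padded with a zero byte.
      checksum = checksum + data:byte(#data) * 256
   end
   return checksum
end

-- Fold the carries back into 16 bits and take the ones' complement.
local function finish_csum (checksum)
   while checksum > 0xffff do
      checksum = checksum % 0x10000 + math.floor(checksum / 0x10000)
   end
   return 0xffff - checksum
end

-- One-shot form: equivalent to update followed by finish.
local function csum (data)
   return finish_csum(update_csum(data))
end

-- Example bytes from RFC 1071: 00 01 f2 03 f4 f5 f6 f7
local sample = string.char(0x00, 0x01, 0xf2, 0x03, 0xf4, 0xf5, 0xf6, 0xf7)
print(string.format("0x%04x", csum(sample)))   -- prints 0x220d
```

Splitting the input into chunks and chaining `update_csum` calls gives the same result as the one-shot form, which is the point of exposing the update and finish steps separately.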
— Function lib.malloc etype
Returns a pointer to newly allocated DMA memory for etype.
— Function lib.deepcopy object
Returns a copy of object. Supports tables as well as ctypes.
— Function lib.array_copy array
Returns a copy of array. Array must not be a “sparse array”.
— Function lib.htonl n
— Function lib.htons n
Host to network byte order conversion functions for 32 and 16 bit integers n respectively. Unsigned.
— Function lib.ntohl n
— Function lib.ntohs n
Network to host byte order conversion functions for 32 and 16 bit integers n respectively. Unsigned.
— Function lib.parse arg, config
Validates arg against the specification in config, and returns a fresh table containing the parameters in arg and any omitted optional parameters with their default values. Given arg, a table of parameters or nil, asserts that all of the keys required by config are present, fills in any missing values for optional keys, and raises an error if any unknown keys are found. Config has the following format:
config := { key = {[required=boolean], [default=value]}, ... }
Each key is optional unless required is set to a true value, and its default value defaults to nil.
Example:
lib.parse({foo=42, bar=43}, {foo={required=true}, bar={}, baz={default=44}})
=> {foo=42, bar=43, baz=44}
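The validation rules can be modeled in a few lines of plain Lua. This is an approximation written for illustration, not Snabb's source; the error messages in particular are made up.

```lua
-- Illustrative model of the documented behavior of lib.parse.
local function parse (arg, config)
   local ret = {}
   arg = arg or {}
   -- Reject keys that the specification does not mention.
   for k in pairs(arg) do
      assert(config[k], "unrecognized parameter: " .. tostring(k))
   end
   for k, spec in pairs(config) do
      if arg[k] ~= nil then
         ret[k] = arg[k]
      elseif spec.required then
         error("missing required parameter: " .. tostring(k))
      else
         -- Optional key that was omitted: use the default (possibly nil).
         ret[k] = spec.default
      end
   end
   return ret
end

local result = parse({foo=42, bar=43},
                     {foo={required=true}, bar={}, baz={default=44}})
print(result.foo, result.bar, result.baz)
```

Omitting `foo` or passing an unknown key such as `qux` raises an error in this model, matching the two failure cases described above.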
Snabb can operate as a group of cooperating processes. The main process is the initial one that you start directly. The optional worker processes are children spawned when the main process calls the core.worker module.
Multiprocessing
Each worker is a complete Snabb process. They can define app networks, run the engine, and do everything else that ordinary Snabb processes do. The exact behavior of each worker is determined by a Lua expression provided upon creation.
Groups of Snabb processes each have the following special properties:
… kill -9.
DMA memory pointers obtained with memory.dma_alloc() are usable by all processes in the group. This means that you can share DMA memory pointers between processes, for example via shm shared memory objects, and reference them from any process. (The memory is automatically mapped at the expected address via a SEGV signal handler.)
The core.worker API functions are available in the main process only:
— Function worker.start name luacode
Start a named worker process. The worker starts with a completely fresh Snabb process image (fork()+execve()) and then executes the string luacode as a Lua source code expression.
Example:
worker.start("myworker", [[
print("hello world, from a Snabb worker process!")
print("could configure and run the engine now...")
]])
— Function worker.stop name
Stop a named worker process. The worker is abruptly killed.
Example:
worker.stop("myworker")
— Function worker.status
Return a table summarizing the status of all workers. The table key is the worker name and the value is a table with pid and alive attributes.
Example:
for w, s in pairs(worker.status()) do
print((" worker %s: pid=%s alive=%s"):format(
w, s.pid, s.alive))
end
Output:
worker w3: pid=21949 alive=true
worker w1: pid=21947 alive=true
worker w2: pid=21948 alive=true
Snabb designs can be run either with:
snabb <snabb-arg>* <design> <design-arg>*
or
#!/usr/bin/env snabb <snabb-arg>*
...
The main module provides an interface for running Snabb scripts. It exposes various operating system functions to scripts.
— Field main.parameters
A list of command-line arguments to the running script. Read-only.
— Function main.exit status
Cleanly exits the process with status.
The module apps.basic.basic_apps provides apps with general functionality for use in your app networks.
The Source app is a synthetic packet generator. On each breath it fills each attached output link with new packets. It accepts a number as its configuration argument which is the byte size of the generated packets. By default, each packet is 60 bytes long. The packet data is initialized with zero bytes.
The Join app joins together packets from N input links onto one output link. On each breath it outputs as many packets as possible from the inputs onto the output.
The Split app splits packets from multiple inputs across multiple outputs. On each breath it transfers as many packets as possible from the input links to the output links.
The Sink app receives all packets from any number of input links and discards them. This can be handy in combination with a Source.
The Tee app receives all packets from any number of input links and transfers each received packet to all output links. It can be used to merge and/or duplicate packet streams.
The Repeater app collects all packets received from the input link and repeatedly transfers the accumulated packets to the output link. The packets are transmitted in the order they were received.
The Truncate app sends all packets received from the input to the output link and truncates or zero pads each packet to a given length. It accepts a number as its configuration argument which is the length of the truncated or padded packets.
The Sample app forwards every nth packet from the input link to the output link, and drops all other packets. It accepts a number as its configuration argument which is n.
The Intel10G app drives one port of an Intel 82599 Ethernet controller. Packets taken from the rx port are transmitted onto the network. Packets received from the network are put on the tx port.
— Method Intel10G.dev:get_rxstats
Returns a table with the following keys:
counter_id - Counter id
packets - Number of packets received
dropped - Number of packets dropped
bytes - Total bytes received
— Method Intel10G.dev:get_txstats
Returns a table with the following keys:
counter_id - Counter id
packets - Number of packets sent
bytes - Total bytes sent
The Intel10G app accepts a table as its configuration argument. The following keys are defined:
— Key pciaddr
Required. The PCI address of the NIC as a string.
— Key macaddr
Optional. The MAC address to use as a string. The default is a wild-card (i.e. accept all packets).
— Key vlan
Optional. A twelve bit integer (0-4095). If set, incoming packets from other VLANs are dropped and outgoing packets are tagged with a VLAN header.
— Key vmdq
Optional. Boolean, defaults to false. Enables interface virtualization, which allows multiple Intel10G apps per port. If enabled, macaddr must be specified.
— Key mirror
Optional. A table. If set, this app will receive copies of all selected packets on the physical port. The selection is configured by setting keys of the mirror table. Either mirror.pool or mirror.port may be set.
If mirror.pool is true all pools defined on this physical port are mirrored. If mirror.pool is an array of pool numbers then the specified pools are mirrored.
If mirror.port is one of “in”, “out” or “inout” all incoming and/or outgoing packets on the port are mirrored respectively. Note that this does not include internal traffic which does not enter or exit through the physical port.
— Key rxcounter
— Key txcounter
Optional. Four bit integers (0-15). If set, incoming/outgoing packets will be counted in the selected statistics counter respectively. Multiple apps can share a counter. To retrieve counter statistics use Intel10G.dev:get_rxstats() and Intel10G.dev:get_txstats().
— Key rate_limit
Optional. Number. Limits the maximum Mbit/s to transmit. Default is 0 which means no limit. Only applies to outgoing traffic.
— Key priority
Optional. Floating point number. Weight for the round-robin algorithm used to arbitrate transmission when rate_limit is not set or adds up to more than the line rate of the physical port. Default is 1.0 (scaled to the geometric middle of the scale which goes from 1/128 to 128). The absolute value is not relevant, instead only the ratio between competing apps controls their respective bandwidths. Only applies to outgoing traffic.
For example, if two apps without rate_limit set have the same priority, both get the same output bandwidth. If the priorities are 3.0/1.0, the output bandwidth is split 75%/25%. Likewise, 1.0/0.333 or 1.5/0.5 yield the same result.
Note that even a low-priority app can use the whole line rate unless other (higher priority) apps are using up the available bandwidth.
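The bandwidth split in the example follows from dividing each priority by the sum of all priorities. The sketch below works through that arithmetic; the function name is hypothetical, and it assumes every listed app has traffic to send and that the port is saturated.

```lua
-- Worked arithmetic: each competing app's share of the output bandwidth
-- is its priority divided by the sum of the priorities of all apps.
local function bandwidth_shares (priorities)
   local total = 0
   for _, p in ipairs(priorities) do total = total + p end
   local shares = {}
   for i, p in ipairs(priorities) do shares[i] = p / total end
   return shares
end

local s = bandwidth_shares({3.0, 1.0})
print(s[1], s[2])   -- 0.75 and 0.25, i.e. a 75%/25% split
```

Because only the ratio matters, priorities of 1.5 and 0.5 give the same shares as 3.0 and 1.0.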
The Intel10G app can transmit and receive at approximately 10 Mpps per processor core.
Each physical Intel 82599 port supports the use of up to:
(… Intel10G app instances)
(… macaddr configuration option)
(… vlan configuration option)
(… mirror configuration option)
LoadGen is a load generator app based on the Intel 82599 Ethernet controller. It reads up to 32,000 packets from the input port and transmits them repeatedly onto the network. All incoming packets are dropped.
The LoadGen app accepts a string as its configuration argument. The given string denotes the PCI address of the NIC to use.
The LoadGen app can transmit at line-rate (14 Mpps) without significant CPU usage.
The intel_mp.Intel app provides drivers for Intel i210/i350/82599 based network cards. The driver exposes multiple receive and transmit queues that can be attached to separate instances of the app on different processes.
The links are named input and output.
If attaching multiple processes to a single NIC, performance appears better with engine.busywait = false.
The intel_mp.Intel app can drive an Intel 82599 NIC at 14 million pps.
— Key pciaddr
Required. The PCI address of the NIC as a string.
— Key ndesc
Optional. Number of DMA descriptors to use, i.e. the size of the DMA transmit and receive queues. Must be a multiple of 128. The default is not specified here but is assumed to be broadly applicable.
— Key rxq
Optional. The receive queue to attach to, numbered from 0.
— Key txq
Optional. The transmit queue to attach to, numbered from 0.
— Key rsskey
Optional. The rsskey is a 32 bit integer that seeds the hash used to distribute packets across queues. If there are multiple levels of RSS Snabb devices in the packet flow, making this unique will help packet distribution.
— Key wait_for_link
Optional. Boolean that indicates whether new should block until there is a link light. The default is false.
— Key linkup_wait
Optional. Number of seconds new waits for the device to come up. The default is 120.
— Key mtu
Optional. The maximum packet length sent or received, excluding the trailing 4 byte CRC. The default is 9014.
— Key master_stats
Optional. Boolean indicating whether to elect an arbitrary app (the master) to collect device statistics. The default is true.
— Key run_stats
Optional. Boolean indicating whether this app instance should collect device statistics. One per physical NIC (conflicts with master_stats). There is a small but detectable run time performance hit incurred. The default is false.
RSS will distribute packets based on as many of the fields below as are present in the packet:
Packets that are not IPv4 or IPv6 will be delivered to receive queue 0.
Each chipset supports a differing number of receive / transmit queues:
The Solarflare app drives one port of a Solarflare SFN7 Ethernet controller. Multiple instances of the Solarflare app can be instantiated on the same PCI device. Packets received from the network will be dispatched between apps based on destination MAC address and VLAN. Packets taken from the rx port are transmitted onto the network. Packets received from the network are put on the tx port.
The Solarflare app requires OpenOnload version 201502 to be installed and the sfc module to be loaded.
The Solarflare app accepts a table as its configuration argument. The following keys are defined:
— Key pciaddr
Required. The PCI address of the NIC as a string.
— Key macaddr
Optional. The MAC address to use as a string. The default is a wild-card (i.e. accept all packets).
— Key vlan
Optional. A twelve bit integer (0-4095). If set, incoming packets from other VLANs are dropped and outgoing packets are tagged with a VLAN header.
The RateLimiter app implements a token bucket algorithm with a single bucket, dropping non-conforming packets. It receives packets on the input port and transmits conforming packets to the output port.
— Method RateLimiter:snapshot
Returns throughput statistics in the form of a table with the following fields:
rx - Number of packets received
tx - Number of packets transmitted
time - Current time in nanoseconds
The RateLimiter app accepts a table as its configuration argument. The following keys are defined:
— Key rate
Required. Rate in bytes per second to which throughput should be limited.
— Key bucket_capacity
Required. Bucket capacity in bytes. Should be equal to or greater than rate. Otherwise the effective rate may be limited.
— Key initial_capacity
Optional. Initial bucket capacity in bytes. Defaults to bucket_capacity.
The RateLimiter app is able to process more than 20 Mpps per CPU core. Refer to its selftest for details.
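To make the roles of rate, bucket_capacity and initial_capacity concrete, here is a minimal plain-Lua sketch of a single token bucket. It is not the app's actual code: the function names new_bucket and conforms are hypothetical, and time is passed in explicitly (in seconds) so the example runs outside of Snabb.

```lua
-- Single token bucket sketch. rate is in bytes per second, capacities
-- are in bytes; initial_capacity defaults to bucket_capacity.
local function new_bucket (rate, bucket_capacity, initial_capacity)
   return { rate = rate,
            capacity = bucket_capacity,
            tokens = initial_capacity or bucket_capacity,
            last = 0 }
end

-- Returns true if a packet of `length` bytes conforms at time `now`
-- (seconds). A caller would drop packets for which this returns false.
local function conforms (bucket, length, now)
   local elapsed = now - bucket.last
   bucket.last = now
   -- Refill at `rate` bytes per second, never beyond the capacity.
   bucket.tokens = math.min(bucket.capacity,
                            bucket.tokens + elapsed * bucket.rate)
   if length <= bucket.tokens then
      bucket.tokens = bucket.tokens - length
      return true
   end
   return false
end

local b = new_bucket(1000, 1500)   -- 1000 bytes/s, 1500 byte burst
print(conforms(b, 1500, 0))        -- true: the bucket starts full
print(conforms(b, 100, 0))         -- false: no tokens left
print(conforms(b, 100, 0.5))       -- true: 500 bytes were refilled
```

In this sketch the bucket capacity bounds the largest burst that can pass at once, while rate controls how quickly tokens are replenished.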
The PcapFilter app receives packets on the input port and transmits conforming packets to the output port. In order to conform, a packet must match the pcap-filter expression of the PcapFilter instance and/or belong to a sanctioned connection. For a connection to be sanctioned it must be tracked in a state table by a PcapFilter app using the same state table. All PcapFilter apps share a global namespace of state table identifiers. Multiple PcapFilter apps, e.g. for inbound and outbound traffic, can refer to the same connection by sharing a state table identifier.
The PcapFilter app accepts a table as its configuration argument. The following keys are available:
— Key filter
Required. A string containing a pcap-filter expression.
— Key state_table
Optional. A string naming a state table. If set, packets passing any rule will be tracked in the specified state table and any packet that belongs to a tracked connection in the specified state table will be allowed to pass.
— Key sessions_established
Total number of sessions established.
The nd_light app implements a small subset of IPv6 neighbor discovery (RFC4861). It has two duplex ports, north and south. The south port attaches to a network on which neighbor discovery (ND) must be performed. The north port attaches to an app that processes IPv6 packets (including full Ethernet frames). Packets transmitted to the north port must be wrapped in full Ethernet frames (which may be empty).
The nd_light app replies to neighbor solicitations for which it is configured as a target and performs rudimentary address resolution for its configured next-hop address. If address resolution succeeds, the Ethernet headers of packets from the north port will be overwritten with headers containing the discovered destination address and the configured source address before they are transmitted over the south port. All packets from the north port are discarded as long as ND has not yet succeeded. Packets received from the south port are transmitted to the north port unaltered.
The nd_light app accepts a table as its configuration argument. The following keys are defined:
— Key local_mac
Required. Local MAC address as a string or in binary representation.
— Key local_ip
Required. Local IPv6 address as a string or in binary representation.
— Key next_hop
Required. IPv6 address of next hop as a string or in binary representation.
— Key delay
Optional. Neighbor solicitation retransmission delay in milliseconds. Default is 1,000ms.
— Key retrans
Optional. Number of neighbor solicitation retransmissions. Default is unlimited retransmissions.
— Key ns_checksum_errors
Neighbor solicitation requests dropped due to invalid ICMP checksum.
— Key ns_target_address_errors
Neighbor solicitation requests dropped due to invalid target address.
— Key na_duplicate_errors
Neighbor advertisement requests dropped because next-hop is already resolved.
— Key na_target_address_errors
Neighbor advertisement requests dropped due to invalid target address.
— Key nd_protocol_errors
Neighbor discovery requests dropped due to protocol errors (invalid IPv6 hop-limit or invalid neighbor solicitation request options).
The SimpleKeyedTunnel app implements “a simple L2 Ethernet over IPv6 tunnel encapsulation” as described in Keyed IPv6 Tunnel. It has two duplex ports, encapsulated and decapsulated. Packets transmitted on the decapsulated input port will be encapsulated and put on the encapsulated output port. Packets transmitted on the encapsulated input port will be decapsulated and put on the decapsulated output port.
The SimpleKeyedTunnel app accepts a table as its configuration argument. The following keys are defined:
— Key local_address
Required. Local IPv6 address as a string.
— Key remote_address
Required. Remote IPv6 address as a string.
— Key local_cookie
Required. Local cookie, 8 bytes encoded in a hexadecimal string.
— Key remote_cookie
Required. Remote cookie, 8 bytes encoded in a hexadecimal string.
— Key local_session
Optional. Unsigned integer, 32 bit. If set, the session_id field of the L2TPv3 header will be overwritten with this value.
— Key hop_limit
Optional. Unsigned integer. Sets the hop limit. Default is 64.
— Key default_gateway_MAC
Optional. Destination MAC as a string. Not required if overwritten by an app such as nd_light.
— Key length_errors
Ingress packets dropped due to invalid length (packet too short).
— Key protocol_errors
Ingress packets dropped due to unrecognized IPv6 protocol ID.
— Key cookie_errors
Ingress packets dropped due to wrong cookie value.
— Key remote_address_errors
Ingress packets dropped due to wrong remote IPv6 endpoint address.
— Key local_address_errors
Ingress packets dropped due to wrong local IPv6 endpoint address.
The VhostUser app implements portions of the Virtio protocol for virtual Ethernet I/O interfaces. In particular, VhostUser supports the virtio vring data structure for packet I/O in shared memory (DMA) and the Linux vhost API for creating vrings attached to tuntap devices.
With VhostUser, Snabb can be used as a virtual Ethernet interface by QEMU virtual machines. When connected via a UNIX socket, packets can be sent to the virtual machine by transmitting them on the rx port and packets sent by the virtual machine will arrive on the tx port.
The VhostUser app accepts a table as its configuration argument. The following keys are defined:
— Key socket_path
Optional. A string denoting the path to the UNIX socket to connect on. If not given, all incoming packets will be dropped.
— Key is_server
Optional. Listen and accept an incoming connection on socket_path instead of connecting to it.
The VirtioNet app implements a subset of the driver part of the virtio-net specification. It can connect to a virtio-net device from within a QEMU virtual machine. Packets can be sent out of the virtual machine by transmitting them on the rx port, and packets sent to the virtual machine will arrive on the tx port.
The VirtioNet app accepts a table as its configuration argument. The following keys are defined:
— Key pciaddr
Required. The PCI address of the virtio-net device.
— Key use_checksum
Optional. Boolean value to enable the checksum offloading pre-calculations applied on IPv4/IPv6 TCP and UDP packets.
The PcapReader and PcapWriter apps can be used to inject and log raw packet data into and out of the app network using the Libpcap File Format. PcapReader reads raw packets from a PCAP file and transmits them on its output port while PcapWriter writes packets received on its input port to a PCAP file.
Both PcapReader and PcapWriter expect a filename string as their configuration arguments to read from and write to respectively. PcapWriter will alternatively accept an array as its configuration argument, with the first element being the filename and the second element being a mode argument to io.open.
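As a rough sketch of how the two apps could be wired together, the design below copies packets from one PCAP file to another. Several details are assumptions rather than facts from this manual: the module path "apps.pcap.pcap", the app names reader and writer, and the file names are all placeholders, and the script is meant to run inside Snabb where engine is available.

```lua
-- Sketch only: the module path and all names below are assumptions.
local config = require("core.config")
local pcap = require("apps.pcap.pcap")

local c = config.new()
-- PcapReader takes the file name to read from.
config.app(c, "reader", pcap.PcapReader, "input.pcap")
-- Array form for PcapWriter: file name plus an io.open mode ("a" appends).
config.app(c, "writer", pcap.PcapWriter, {"output.pcap", "a"})
-- Packets leave the reader's output port and enter the writer's input port.
config.link(c, "reader.output -> writer.input")

engine.configure(c)
engine.main({duration = 1})
```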
The RawSocket app is a bridge between Linux network interfaces (eth0, lo, etc.) and a Snabb app network. Packets taken from the rx port are transmitted over the selected interface. Packets received on the interface are put on the tx port.
The RawSocket app accepts a string as its configuration argument. The string denotes the interface to bridge to.
The UnixSocket app provides I/O for a named Unix socket.
The UnixSocket app takes a string argument which denotes the Unix socket file name to open, or a table with the fields:
filename - the Unix socket file name to open.
listen - if true, listen for incoming connections on the socket rather than connecting to the socket in client mode.
mode - can be “stream” or “packet” (the default is “stream”): the difference is that in packet mode, the packets are not split or merged (in both modes packets arrive in order).
NOTE: The socket is not opened until the first call to push() or pull(). If the connection is lost, the socket will be re-opened on the next call to push() or pull().
The Tap app is used to interact with a Linux tap device. Packets transmitted on the input port will be sent over the tap device, and packets that arrive on the tap device can be received on the output port.
The Tap app accepts a string that identifies an existing tap interface.
The Tap device can be configured using standard Linux tools:
ip tuntap add Tap345 mode tap
ip link set up dev Tap345
ip link set address 02:01:02:03:04:08 dev Tap345
There are three VLAN related apps, Tagger, Untagger and VlanMux. The Tagger and Untagger apps add or remove a VLAN tag whereas the VlanMux app can multiplex and demultiplex packets to different output ports based on tag.
The Tagger app adds a VLAN tag, with the configured value, to packets received on its input port and transmits them on its output port.
— Key tag
Required. VLAN tag to add or remove from the packet.
The Untagger app checks packets received on its input port for a VLAN tag, removes the tag if it matches the configured VLAN tag, and transmits the packets on its output port. Packets with VLAN tags other than the configured tag will be dropped.
— Key tag
Required. VLAN tag to add or remove from the packet.
Despite the name, the VlanMux app can act both as a multiplexer, i.e. receive packets from multiple different input ports, add a VLAN tag and transmit them out on a single port, and as a demultiplexer, i.e. receive packets from its trunk port and distribute them over many output ports based on the VLAN tag of the received packet.
Packets received on its trunk input port with Ethernet type 0x8100 are inspected for the VLAN tag and transmitted on an output port vlanX where X is the VLAN tag parsed from the packet. If no such output port exists the packet is dropped. Received packets with an Ethernet type other than 0x8100 are transmitted on its native output port.
Packets received on its native input port are transmitted verbatim on its trunk output port.
Packets received on input ports named vlanX, where X is a VLAN tag, will have the VLAN tag X added and then be transmitted on its trunk output port.
There is no configuration for the VlanMux app; simply connect it to your other apps and it will base its actions on the names of the ports.
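The trunk demultiplexing rule can be summarized as a small decision function. The following plain-Lua sketch is not the app's code: the helper name trunk_output_port is hypothetical, and outputs stands in for the set of output port names that have been linked.

```lua
-- Hypothetical helper: pick the output port for a packet received on
-- the trunk port. `outputs` is a set of existing output port names.
local function trunk_output_port (ethertype, vlan_tag, outputs)
   if ethertype ~= 0x8100 then
      return "native"          -- untagged traffic goes to the native port
   end
   local name = "vlan" .. vlan_tag
   if outputs[name] then return name end
   return nil                  -- no such port: the packet is dropped
end

local outputs = { vlan10 = true, vlan20 = true, native = true }
print(trunk_output_port(0x8100, 10, outputs))   -- vlan10
print(trunk_output_port(0x8100, 30, outputs))   -- nil (dropped)
print(trunk_output_port(0x0800, nil, outputs))  -- native
```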
A bridge app implements a basic Ethernet bridge with split-horizon semantics. It has an arbitrary number of ports. For each input port there must exist an output port with the same name. Each port name is a member of at most one split-horizon group. If it is not a member of a split-horizon group, the port is also called a free port. Packets arriving on a free input port may be forwarded to all other output ports. Packets arriving on an input port that belongs to a split-horizon group are never forwarded to any output port belonging to the same split-horizon group. There are two bridge implementations available: apps.bridge.flooding and apps.bridge.learning.
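The split-horizon rule can be expressed as a predicate over port names. This plain-Lua sketch is one reading of the rule above and not the bridge's implementation: the helper name may_forward and the port and group names are hypothetical, and the sketch assumes a packet is never sent back out of the port it arrived on.

```lua
-- Hypothetical helper. `groups` maps a port name to its split-horizon
-- group; free ports have no entry.
local function may_forward (groups, input, output)
   if input == output then return false end
   local group = groups[input]
   -- Ports in the same split-horizon group never forward to each other.
   return not (group ~= nil and group == groups[output])
end

local groups = { tunnel1 = "core", tunnel2 = "core" }  -- "access" is free
print(may_forward(groups, "access", "tunnel1"))   -- true
print(may_forward(groups, "tunnel1", "tunnel2"))  -- false
print(may_forward(groups, "tunnel1", "access"))   -- true
```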
A bridge app accepts a table as its configuration argument. The following keys are defined:
— Key ports
Optional. An array of free port names. The default is no free ports.
— Key split_horizon_groups
Optional. A table mapping split-horizon groups to arrays of port names. The default is no split-horizon groups.
— Key config
Optional. The configuration of the actual bridge implementation.
The flooding bridge app implements the simplest possible bridge, which floods a packet arriving on an input port to all output ports within its scope according to the split-horizon topology.
The flooding bridge app ignores the config key of its configuration.
The learning bridge app implements a learning bridge using a custom hash table to store the set of MAC source addresses of packets arriving on each input port. When a packet is received it is forwarded to all output ports whose corresponding input ports match the packet’s destination MAC address. When no input port matches, the packet is flooded to all output ports. Multicast MAC addresses are always flooded to all output ports associated with the input port. The scoping rules according to the split-horizon topology apply unchanged.
The learning bridge app accepts a table as the value of the config key of its configuration. The following keys are defined:
— Key mac_table
Optional. This is a table that defines the characteristics of the MAC table. The following keys are defined:
— Key size
Optional. The number of MAC addresses to be stored in the table. Default is 256. The size of the table is increased automatically if this limit is reached or if an overflow in one of the hash buckets occurs. This value is capped by resize_max.
— Key timeout
Optional. Timeout for learned MAC addresses in seconds. Default is 60.
— Key verbose
Optional. A boolean value. If true, statistics about table usage are logged during each timeout interval. Default is false.
— Key copy_on_resize
Optional. A boolean value. If true, the contents of the table are copied to the newly allocated table after a resize operation. Default is true.
— Key resize_max
Optional. An upper bound for the size of the table. Default is 65536.
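Putting the keys together, a configuration table for a learning bridge might look like the following sketch. The port and group names are made up for illustration, and size is set to a non-default value; the remaining mac_table values repeat the defaults listed above.

```lua
-- Example configuration table (illustrative names and values).
local bridge_config = {
   -- Free ports, not part of any split-horizon group.
   ports = { "access1", "access2" },
   -- One split-horizon group named "core" with two member ports.
   split_horizon_groups = { core = { "tunnel1", "tunnel2" } },
   -- Passed on to the learning bridge implementation.
   config = {
      mac_table = {
         size = 1024,          -- must not exceed resize_max
         timeout = 60,
         verbose = false,
         copy_on_resize = true,
         resize_max = 65536,
      },
   },
}
```

A flooding bridge would accept the same ports and split_horizon_groups keys and ignore config.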
The AES128gcm app implements ESP in transport mode using the AES-GCM-128 cipher. It encrypts packets received on its decapsulated port and transmits them on its encapsulated port, and vice-versa. Packets arriving on the decapsulated port must have an IPv6 header, and packets arriving on the encapsulated port must have an IPv6 header followed by an ESP header, otherwise they will be discarded.
References:
lib.ipsec.esp
The AES128gcm app accepts a table as its configuration argument. The following keys are defined:
— Key spi
Required. A 32 bit integer denoting the “Security Parameters Index” as specified in RFC 4303.
— Key transmit_key
Required. Hexadecimal string of 32 digits (two digits for each byte) that denotes a 128-bit AES key as specified in RFC 4106 used for the encryption of outgoing packets.
— Key transmit_salt
Required. Hexadecimal string of eight digits (two digits for each byte) that denotes four bytes of salt as specified in RFC 4106 used for the encryption of outgoing packets.
— Key receive_key
Required. Hexadecimal string of 32 digits (two digits for each byte) that denotes a 128-bit AES key as specified in RFC 4106 used for the decryption of incoming packets.
— Key receive_salt
Required. Hexadecimal string of eight digits (two digits for each byte) that denotes four bytes of salt as specified in RFC 4106 used for the decryption of incoming packets.
— Key receive_window
Optional. Minimum width of the window in which out of order packets are accepted as specified in RFC 4303. The default is 128.
— Key resync_threshold
Optional. Number of consecutive packets allowed to fail decapsulation before attempting “Re-synchronization” as specified in RFC 4303. The default is 1024.
— Key resync_attempts
Optional. Number of attempts to re-synchronize a packet that triggered “Re-synchronization” as specified in RFC 4303. The default is 8.
— Key auditing
Optional. A boolean value indicating whether to enable or disable “Auditing” as specified in RFC 4303. The default is nil (no auditing).
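The key and salt formats are easy to get wrong, so here is a sketch of a configuration table with the expected string lengths. All values are placeholders chosen for illustration and must never be used as real keys; the optional keys repeat the defaults listed above.

```lua
-- Example AES128gcm configuration (placeholder values only).
local esp_config = {
   spi = 0x100,
   -- 128-bit keys: 16 bytes, written as 32 hexadecimal digits.
   transmit_key = "00112233445566778899aabbccddeeff",
   receive_key  = "ffeeddccbbaa99887766554433221100",
   -- Salts: 4 bytes, written as 8 hexadecimal digits.
   transmit_salt = "00112233",
   receive_salt  = "33221100",
   receive_window = 128,
   resync_threshold = 1024,
   resync_attempts = 8,
}
```

The peer at the other end of the connection would use the same values with the transmit and receive keys and salts swapped.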
The Match app compares packets received on its input port rx with those received on the reference input port comparator, and reports mismatches as well as packets from comparator that were not matched.
— Method Match:errors
Returns the recorded errors as an array of strings.
The Match app accepts a table as its configuration argument. The following keys are defined:
— Key fuzzy
Optional. If this key is true, packets from rx that do not match the next packet from comparator are ignored. The default is false.
— Key modest
Optional. If this key is true, unmatched packets from comparator are ignored if at least one packet from rx was successfully matched. The default is false.
The Synth app generates synthetic packets with Ethernet headers and alternating payload sizes. On each breath it fills each attached output link with new packets.
The Synth app accepts a table as its configuration argument. The following keys are defined:
— Key src
— Key dst
Source and destination MAC addresses in human readable form. The default is "00:00:00:00:00:00".
— Key sizes
An array of numbers designating the packet payload sizes. The default is {64}.
L7Spy
The L7Spy app is a Snabb app that scans packets passing through it using an instance of the Scanner class. The scanner instance may be shared among several L7Spy instances or with an L7Fw app for filtering.
— Method L7Spy:new config
Construct a new L7Spy app instance based on a given configuration table. The table may contain the following key:
scanner (optional): Either a string identifying the kind of scanner to construct (currently only "ndpi" is accepted) or an existing scanner instance.
L7Fw
The L7Fw app implements a stateful firewall by querying the scanner state collected by an L7Spy app. It then filters packets based on a given set of rules.
— Method L7Fw:new config
Construct a new L7Fw app instance based on a given configuration table. The table may contain the following keys:
scanner: A Scanner instance shared with an L7Spy instance. The metadata in this scanner is used for packet filtering.
rules: A table mapping protocol names (as strings) to firewall actions. The accepted actions are "accept", "reject", "drop", or a pfmatch expression. The pfmatch expression may use the variable flow_count (as an arithmetic expression) to refer to the number of packets in a given protocol flow, and may call the accept, reject, or drop methods.
local_ipv4 (optional): An IPv4 address that identifies the host running the firewall. This is used as the source address in ICMPv4 or TCP reject responses.
local_ipv6 (optional): An IPv6 address that identifies the host running the firewall. This is used as the source address in ICMPv6 or TCP reject responses.
local_macaddr (optional): A MAC address that identifies the host running the firewall. This is used for the source address in Ethernet frames for reject responses.
logging (optional): A log level parameter that can be set to “on” or “off”. When set to “on”, it will report dropped/rejected packets to the system log.
Scanner objects are responsible for:
The class is not meant to be instantiated directly, but to be used as the basis for concrete implementations (e.g. NdpiScanner). It provides one function that subclasses can use:
Extracts fields from the headers of an IPv4 or IPv6 packet. The returned values are:
Key objects contain some of the returned information in a compact FFI representation, and can be used as an aid to uniquely identify a flow of packets. They provide the following attributes:
:eth_type(): Method which returns the type of the Ethernet frame payload, either ETH_TYPE_IPv4 or ETH_TYPE_IPv6.
:hash(): Method which returns an integer calculated by hashing all the other values in the key object.
.vlan_id: VLAN identifier. Zero for no VLAN tags.
.ip_proto: The IP protocol.
.lo_addr and .hi_addr: IP addresses (either v4 or v6).
.lo_port and .hi_port: For TCP and UDP, the ports as big-endian (network) integers.
This method can be very useful to implement scanners using backends which do not implement their own flow classification.
All the Scanner implementations conform to the Scanner base API.
— Method Scanner:scan_packet packet, time
Scans a packet.
The time parameter is the time (in seconds from the Epoch) at which the packet was received for processing. A suitable value can be obtained using engine.now().
— Method Scanner:get_flow packet
Obtains the traffic flow for a given packet. If the packet is determined to not match any of the detected flows, nil is returned. The returned flow object has at least the following fields:
protocol: The L7 protocol for the flow. A user-visible string can be obtained by passing this value to Scanner:protocol_name().
packets: Number of packets scanned which belong to the traffic flow.
last_seen: Last time (in seconds from the Epoch) at which a packet belonging to the flow has been scanned.
— Method Scanner:flows
Returns an iterator over all the traffic flows detected by the scanner. The returned value is suitable to be used in a for-loop:
for flow in my_scanner:flows() do
   -- Do something with "flow".
end
— Method Scanner:protocol_name protocol
Given a protocol identifier, returns a user-friendly name as a string. Typically the protocol is obtained from flow objects returned by Scanner:get_flow().
NdpiScanner uses the nDPI library (via the ljndpi FFI binding) to scan packets and determine L7 traffic flows. The nDPI library (libndpi.so) must be available in the host system. Versions 1.7 and 1.8 are supported.
— Method NdpiScanner:new ticks_per_second
Creates a new scanner, with a ticks_per_second resolution.
The apps.wall.util module contains miscellaneous utilities.
— Function util.ipv4_addr_cmp a, b
Compares two IPv4 addresses a and b. The returned value follows the same convention as for C.memcmp(): zero if both addresses are equal, or an integer value with the same sign as the sign of the difference between the first pair of bytes that differ in a and b.
— Function util.ipv6_addr_cmp a, b
Compares two IPv6 addresses a and b. The returned value follows the same convention as for C.memcmp(): zero if both addresses are equal, or an integer value with the same sign as the sign of the difference between the first pair of bytes that differ in a and b.
The SouthAndNorth application is not meant to be used directly, but rather as a building block for more complex applications which need two duplex ports (south and north) which forward packets between them, optionally doing some intermediate processing.
Packets arriving on the north port are passed to the :on_southbound_packet() method, which can be overridden in a subclass, and forwarded to the south port. Conversely, packets arriving on the south port are passed to the :on_northbound_packet() method, and finally forwarded to the north port.
The value returned by :on_southbound_packet() and :on_northbound_packet() determines what will be done to the packet being processed:
Returning false discards the packet: the packet will not be forwarded, and packet.free() will be called on it.
Returning a different packet replaces the original: the original packet has packet.free() called on it, and the returned packet is forwarded.
Returning the original packet forwards it unchanged; returning nil achieves the same effect.
The following snippet defines an application derived from SouthAndNorth which silently discards packets bigger than a certain size, and keeps a count of how many packets have been discarded and forwarded:
-- Setting SouthAndNorth as metatable "inherits" from it.
DiscardBigPackets = setmetatable({},
   require("apps.wall.util").SouthAndNorth)
-- Let instances find methods on DiscardBigPackets (and, through its
-- metatable, on SouthAndNorth).
DiscardBigPackets.__index = DiscardBigPackets
function DiscardBigPackets:new (max_length)
   return setmetatable({
      max_packet_length = max_length,
      discarded_packets = 0,
      forwarded_packets = 0,
   }, self)
end
function DiscardBigPackets:on_northbound_packet (pkt)
   if pkt.length > self.max_packet_length then
      self.discarded_packets = self.discarded_packets + 1
      return false  -- discard the packet
   end
   -- Returning nothing (nil) forwards the packet.
   self.forwarded_packets = self.forwarded_packets + 1
end
-- Apply the same logic for packets in the other direction.
DiscardBigPackets.on_southbound_packet =
   DiscardBigPackets.on_northbound_packet
The checksum module provides an optimized ones-complement checksum routine.
— Function ipsum pointer, length, initial
Return the ones-complement checksum for the given region of memory.
pointer is a pointer to an array of data to be checksummed and length is the size of that data in bytes. initial is an unsigned 16-bit number in host byte order which is used as the starting value of the accumulator. The result is the IP checksum over the data in host byte order.
The initial argument can be used to verify a checksum or to calculate the checksum in an incremental manner over chunks of memory. The synopsis to check whether the checksum over a block of data is equal to a given value is the following:
if ipsum(pointer, length, value) == 0 then
   -- checksum correct
else
   -- checksum incorrect
end
To chain the calculation of checksums over multiple blocks of data together to obtain the overall checksum, one needs to pass the one’s complement of the checksum of one block as initial value to the call of ipsum() for the following block, e.g.
local sum1 = ipsum(data1, length1, 0)
local total_sum = ipsum(data2, length2, bit.bnot(sum1))
This function takes advantage of SIMD hardware when available.
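The calling convention can be tried out with a plain-Lua reference routine. This is a sketch, not the optimized implementation: the name ipsum_ref is hypothetical, data is passed as a Lua string rather than a pointer, and 16-bit words are assumed to be read in network (big-endian) order. Chaining as shown assumes that every block except the last has an even length.

```lua
-- Reference ones-complement checksum following the convention above.
local function ipsum_ref (data, length, initial)
   -- Keep only the low 16 bits (also maps bit.bnot results into range).
   local sum = initial % 0x10000
   for i = 1, length - 1, 2 do
      sum = sum + data:byte(i) * 256 + data:byte(i + 1)
   end
   if length % 2 == 1 then
      sum = sum + data:byte(length) * 256   -- pad a trailing odd byte
   end
   -- Fold carries back into the low 16 bits, then complement.
   while sum > 0xffff do
      sum = (sum % 0x10000) + math.floor(sum / 0x10000)
   end
   return 0xffff - sum
end

local data1, data2 = "\1\2\3\4", "\5\6\7\8"
local whole = ipsum_ref(data1 .. data2, 8, 0)
-- Chaining: pass the complement of the first block's checksum.
local sum1 = ipsum_ref(data1, 4, 0)
local total = ipsum_ref(data2, 4, 0xffff - sum1)
print(whole == total)                            -- true
print(ipsum_ref(data1 .. data2, 8, whole) == 0)  -- true: checksum verifies
```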
A ctable is a hash table whose keys and values are instances of FFI data types. In Lua parlance, an FFI value is a “cdata” value, hence the name “ctable”.
A ctable is parameterized for the specific types for its keys and values. This allows for the table to be stored in an efficient manner. Adding an entry to a ctable will copy the value into the table. Logically, the table “owns” the value. Lookup can either return a pointer to the value in the table, or copy the value into a user-supplied buffer, depending on what is most convenient for the user.
As an implementation detail, the table is stored as an open-addressed robin-hood hash table with linear probing. This means that to look up a key in the table, we take its hash value (using a user-supplied hash function), map that hash value to an index into the table by scaling the hash to the table size, and then scan forward in the table until we find an entry whose hash value is greater than or equal to the hash in question. Each entry stores its hash value, and empty entries have a hash of 0xFFFFFFFF. If the entry’s hash matches and the entry’s key is equal to the one we are looking for, then we have our match. If the entry’s hash is greater than our hash, then we have a failure. Hash collisions are possible as well of course; in that case we continue scanning forward.
The distance travelled while scanning for the matching hash is known as the displacement. The table measures its maximum displacement, for a number of purposes, but you might be interested to know that a maximum displacement for a table with 2 million entries and a 40% load factor is around 8 or 9. Smaller tables will have smaller maximum displacements.
The ctable has two lookup interfaces. One will perform the lookup as described above, scanning through the hash table in place. The other will fetch all entries within the maximum displacement into a buffer, then do a branchless binary search over that buffer. This second streaming lookup can also fetch entries for multiple keys in one go. This can amortize the cost of a round-trip to RAM, in the case where you expect to miss cache for every lookup.
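The in-place lookup described above can be sketched in plain Lua. This is an illustration of the scanning rule only, not the FFI implementation: the function name lookup, the 0-based Lua table standing in for the entry array, and the hand-built example entries are all assumptions made for the sketch. The array is padded with empty entries so that a scan always terminates.

```lua
local EMPTY = 0xFFFFFFFF  -- hash value of empty entries

-- `entries` is a 0-based array ordered by hash, `size` the table size.
local function lookup (entries, size, hash, key)
   -- Scale the 32-bit hash to a starting index in [0, size).
   local index = math.floor(hash * size / 2^32)
   -- Scan forward past entries with smaller hashes.
   while entries[index].hash < hash do index = index + 1 end
   -- Entries with an equal hash may be collisions: compare keys.
   while entries[index].hash == hash do
      if entries[index].key == key then return entries[index] end
      index = index + 1
   end
   return nil  -- reached a greater hash or an empty entry
end

local entries = {
   [0] = { hash = 0x10000000, key = "a", value = 1 },
   [1] = { hash = 0x20000000, key = "b", value = 2 },  -- displaced by 1
   [2] = { hash = 0x90000000, key = "c", value = 3 },
   [3] = { hash = EMPTY }, [4] = { hash = EMPTY },
}
print(lookup(entries, 4, 0x20000000, "b").value)  -- 2
print(lookup(entries, 4, 0x30000000, "x"))        -- nil
```

In this example both "a" and "b" scale to index 0, so "b" sits one slot further on; that distance is its displacement.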
To create a ctable, first create a parameters table specifying the key and value types, along with any other options. Then call ctable.new on those parameters. For example:
local ctable = require('lib.ctable')
local ffi = require('ffi')
-- Expected number of entries; an example value used to size the table.
local occupancy = 2e6
local params = {
   key_type = ffi.typeof('uint32_t'),
   value_type = ffi.typeof('int32_t[6]'),
   hash_fn = ctable.hash_i32,
   max_occupancy_rate = 0.4,
   initial_size = math.ceil(occupancy / 0.4)
}
local ctab = ctable.new(params)
— Function ctable.new parameters
Create a new ctable. parameters is a table of key/value pairs. The following keys are required:
key_type: An FFI type (LuaJIT “ctype”) for keys in this table.
value_type: An FFI type (LuaJIT “ctype”) for values in this table.
Hash values are unsigned 32-bit integers in the range [0, 0xFFFFFFFF). That is to say, 0xFFFFFFFF is the only unsigned 32-bit integer that is not a valid hash value. The hash_fn must return a hash value in the correct range.
Optional entries that may be present in the parameters table include:
hash_fn: A function that takes a key and returns a hash value. If not given, defaults to the result of calling compute_hash_fn on the key type.
initial_size: The initial size of the hash table, including free space. Defaults to 8 slots.
max_occupancy_rate: The maximum ratio of occupancy/size, where occupancy denotes the number of entries in the table, and size is the total table size including free entries. Trying to add an entry to a “full” table will cause the table to grow in size by a factor of
min_occupancy_rate: Minimum ratio of occupancy/size. Removing an entry from an “empty” table will shrink the table.
— Function ctable.load stream parameters
Load a ctable that was previously saved out to a binary format. parameters are as for ctable.new. stream should be an object that has a :read_ptr(ctype) method, which returns a pointer to an embedded instance of ctype in the stream, advancing the stream over the object; and :read_array(ctype, count) which is the same but reading count instances of ctype instead of just one.
Users interact with a ctable through methods. In these method descriptions, the object on the left-hand-side of the method invocation should be a ctable.
— Method :resize size
Resize the ctable to have size total entries, including empty space.
— Method :insert hash, key, value, updates_allowed
An internal helper method that does the bulk of updates to the hash table. hash is the hash of key. This method takes the hash as an explicit parameter because it is used when resizing the table, and that way we avoid calling the hash function in that case. key and value are FFI values for the key and the value.
updates_allowed is an optional parameter. If not present or false, then the :insert
method will raise an error if the key is already present in the table. If updates_allowed is the string "required"
, then an error will be raised if key is not already in the table. Any other true value allows updates but does not require them. An update will replace the existing entry in the table.
Returns the index of the inserted entry.
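The three updates_allowed modes can be modelled with a plain Lua table. The following is a minimal sketch of the rules only, not the ctable implementation; model_insert and store are hypothetical names, and unlike :insert the model does not return an index.

```lua
-- Toy model of the updates_allowed rules, using a plain Lua table
-- in place of a ctable. All names here are illustrative.
local function model_insert(store, key, value, updates_allowed)
   local present = store[key] ~= nil
   if present and not updates_allowed then
      error("key is already present in the table")
   elseif not present and updates_allowed == "required" then
      error("key is not present in the table")
   end
   store[key] = value   -- insert, or replace the existing entry
end

local store = {}
model_insert(store, "a", 1)               -- plain insert
model_insert(store, "a", 2, true)         -- update allowed: value replaced
model_insert(store, "a", 3, "required")   -- update required: key exists

-- Both of the following calls raise an error, so pcall returns false.
local dup_ok = pcall(model_insert, store, "a", 4)
local missing_ok = pcall(model_insert, store, "b", 1, "required")
```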
— Method :add key, value, updates_allowed
Add an entry to the ctable, returning the index of the added entry. See the documentation for :insert
for a description of the parameters.
— Method :update key, value
Update the entry in a ctable with the key key to have the new value value. Throw an error if key is not present in the table.
— Method :lookup_ptr key
Look up key in the table, and if found return a pointer to the entry. Return nil if key is not found.
An entry pointer has three fields: the hash value, which must not be modified; the key itself; and the value. Access them as usual in Lua:
local ptr = ctab:lookup_ptr(key)
if ptr then print(ptr.value) end
Note that pointers are only valid until the next modification of a table.
— Method :lookup_and_copy key, entry
Look up key in the table, and if found, copy that entry into entry and return true. Otherwise return false.
— Method :remove_ptr entry
Remove an entry from a ctable. entry should be a pointer that points into the table. Note that pointers are only valid until the next modification of a table.
— Method :remove key, missing_allowed
Remove an entry from a ctable, keyed by key.
Return true if we actually do find a value and remove it. Otherwise if no entry is found in the table and missing_allowed is true, then return false. Otherwise raise an error.
— Method :save stream
Save a ctable to a byte sink. stream should be an object that has a :write_ptr(ctype) method, which writes an instance of a struct type out to a stream, and :write_array(ctype, count) which is the same but writing count instances of ctype instead of just one.
— Method :selfcheck
Run an expensive internal diagnostic to verify that the table’s internal invariants are fulfilled.
— Method :dump
Print out the entries in a table. Can be expensive if the table is large.
— Method :iterate
Return an iterator for use by for in
. For example:
for entry in ctab:iterate() do
   print(entry.key, entry.value)
end
As mentioned earlier, batching multiple lookups can amortize the cost of a round-trip to RAM. To do this, first prepare a LookupStreamer
for the batch size that you need. You will have to experiment to find the batch size that works best for your table’s entry sizes; for reference, for 32-byte entries a 32-wide lookup seems to be optimum.
-- Stream in 32 lookups at once.
local stride = 32
local streamer = ctab:make_lookup_streamer(stride)
Wiring up streaming lookup in a packet-processing network is a bit of a chore currently, as you have to maintain separate queues of lookup keys and packets, assuming that each lookup maps to a packet. Let’s make a little helper:
local lookups = {
   queue = ffi.new("struct packet * [?]", stride),
   queue_len = 0,
   streamer = streamer
}
local function flush(lookups)
   if lookups.queue_len > 0 then
      -- Here is the magic!
      lookups.streamer:stream()
      for i = 0, lookups.queue_len - 1 do
         local pkt = lookups.queue[i]
         if lookups.streamer:is_found(i) then
            local val = lookups.streamer.entries[i].value
            --- Do something cool here!
         end
      end
      lookups.queue_len = 0
   end
end
local function enqueue(lookups, pkt, key)
   local n = lookups.queue_len
   lookups.streamer.entries[n].key = key
   lookups.queue[n] = pkt
   -- Record the new entry before flushing so that flush sees all of them.
   lookups.queue_len = n + 1
   if lookups.queue_len == stride then
      flush(lookups)
   end
end
Then as you see packets, you enqueue them via enqueue
, extracting out the key from the packet in some way and passing that value as the argument. When enqueue
detects that the queue is full, it will flush it, performing the lookups in parallel and processing the results.
Any hash function will do, as long as it produces values in the [0, 0xFFFFFFFF)
range. In practice we include some functions for hashing byte sequences of some common small lengths.
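If you supply your own hash function, one way to guarantee the range is to wrap it. The sketch below assumes a hypothetical raw_hash function that returns a non-negative integer; make_valid_hash_fn is an illustrative helper and not part of lib.ctable.

```lua
-- Wrap an integer-valued hash function so that its results always
-- fall in [0, 0xFFFFFFFF). `raw_hash` is a hypothetical user function.
local function make_valid_hash_fn(raw_hash)
   return function(key)
      local h = raw_hash(key) % 2^32      -- reduce to 32 bits
      if h == 0xFFFFFFFF then h = 0 end   -- avoid the one invalid value
      return h
   end
end

local hash = make_valid_hash_fn(function(n) return n * 31 + 7 end)
```

Remapping 0xFFFFFFFF to 0 slightly biases the distribution towards 0, which is usually negligible for a 32-bit hash.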
— Function ctable.hash_32 number
Hash a 32-bit integer. As a hash_fn
parameter, this will only work if your key type’s Lua representation is a Lua number. For example, use hash_32
on ffi.typeof('uint32_t')
, but use hashv_32
on ffi.typeof('uint8_t[4]')
.
— Function ctable.hashv_32 ptr
Hash the first 32 bits of a byte sequence.
— Function ctable.hashv_48 ptr
Hash the first 48 bits of a byte sequence.
— Function ctable.hashv_64 ptr
Hash the first 64 bits of a byte sequence.
— Function ctable.compute_hash_fn ctype
Return a hashv_
-like hash function over the bytes in instances of ctype. Note that the same reservations apply as for hash_32
above.
The CPU’s PMU (Performance Monitoring Unit) collects information about specific performance events such as cache misses, branch mispredictions, and utilization of internal CPU resources like execution units. This module provides an API for counting events with the PMU.
Hundreds of low-level counters are available. The exact list depends on CPU model. See pmu_cpu.lua for our definitions.
— Function is_available
If the PMU hardware is available then return true. Otherwise return two values: false and a string briefly explaining why. (Cooperation from the Linux kernel is required to access the PMU.)
— Function profile function [event_list] [aux]
Call function, return the result, and print a human-readable report of the performance events that were counted during execution.
— Function measure function [event_list]
Call function and return two values: the result and a table of performance event counter tallies.
— Function setup event_list
Set up the hardware performance counters to track a given list of events (in addition to the built-in fixed-function counters).
Each event is a Lua string pattern. This could be a full event name:
mem_load_uops_retired.l1_hit
or a more general pattern that matches several counters:
mem_load.*l._hit
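To see how such a pattern selects counters, it can be tried against a list of names with plain Lua string matching. This is only a sketch: the available list is a small made-up sample rather than the real counter list, and whether the module anchors the pattern is an assumption (here it is not anchored).

```lua
-- Select the event names that match a Lua string pattern.
-- `available` is a made-up sample, not the contents of pmu_cpu.lua.
local available = {
   "mem_load_uops_retired.l1_hit",
   "mem_load_uops_retired.l2_hit",
   "mem_load_uops_retired.l1_miss",
   "uops_issued.any",
}

local function match_events(names, pattern)
   local matched = {}
   for _, name in ipairs(names) do
      if name:find(pattern) then
         matched[#matched + 1] = name
      end
   end
   return matched
end

local hits = match_events(available, "mem_load.*l._hit")
```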
Return the number of overflowed counters that could not be tracked due to hardware constraints. These will be the last counters in the list.
Example:
setup({"uops_issued.any",
"uops_retired.all",
"br_inst_retired.conditional",
"br_misp_retired.all_branches"}) => 0
— Function new_counter_set
Return a counter_set
object that can be used for accumulating events. The counter_set will be valid only until the next call to setup().
— Function switch_to counter_set
Switch to a new set of counters to accumulate events in. Has the side-effect of committing the current accumulators to the previous record.
If counter_set is nil then do not accumulate events.
— Function to_table counter_set
Return a table containing the values accumulated in counter_set.
Example:
to_table(cs) =>
{
-- Fixed-function counters
instructions = 133973703,
cycles = 663011188,
ref-cycles = 664029720,
-- General purpose counters selected with setup()
uops_issued.any = 106860997,
uops_retired.all = 106844204,
br_inst_retired.conditional = 26702830,
br_misp_retired.all_branches = 419
}
— Function report counter_set [aux]
Print a textual report on the values accumulated in a counter set. Optionally include auxiliary application-level counters. The ratio of each event to each auxiliary counter is also reported.
Example:
report(my_counter_set, {packet = 26700000, breath = 208593})
prints output approximately like:
EVENT                                TOTAL     /packet     /breath
instructions                   133,973,703       5.000     642.000
cycles                         663,011,188      24.000    3178.000
ref-cycles                     664,029,720      24.000    3183.000
uops_issued.any                106,860,997       4.000     512.000
uops_retired.all               106,844,204       4.000     512.000
br_inst_retired.conditional     26,702,830       1.000     128.000
br_misp_retired.all_branches           419       0.000       0.000
packet                          26,700,000       1.000     128.000
breath                             208,593       0.008       1.000
The lib.hardware.pci
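module provides functions that abstract common operations on PCI devices on Linux. In order to drive a PCI device using Direct Memory Access (DMA) one must:
1. Unbind the device from the Linux kernel using pci.unbind_device_from_linux.
2. Enable PCI bus mastering for the device using pci.set_bus_master in order to enable DMA.
3. Memory map the PCI device configuration space using pci.map_pci_memory.
4. Control the device by manipulating the memory referenced by the pointer returned by pci.map_pci_memory.
5. Disable PCI bus mastering for the device using pci.set_bus_master.
6. Unmap the PCI device configuration space using pci.close_pci_resource.
The correct ordering of these steps is absolutely critical.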
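The ordering can be captured in a small wrapper. The sketch below takes the pci module as a parameter so that it can be exercised with a stand-in; in a Snabb program it would be require("lib.hardware.pci"). The helper name with_pci_device, the use of the locked mapping variant, and resource index 0 are assumptions made for illustration.

```lua
-- Run `use_device` between setup and teardown, in the required order.
local function with_pci_device(pci, pciaddress, use_device)
   pci.unbind_device_from_linux(pciaddress)
   pci.set_bus_master(pciaddress, true)          -- enable DMA
   local base, fd = pci.map_pci_memory_locked(pciaddress, 0)
   local ok, err = pcall(use_device, base)       -- drive the device
   pci.set_bus_master(pciaddress, false)         -- disable DMA
   pci.close_pci_resource(fd, base)
   if not ok then error(err, 0) end
end

-- Stand-in module that only records the order of calls.
local calls = {}
local fake_pci = {
   unbind_device_from_linux = function() calls[#calls + 1] = "unbind" end,
   set_bus_master = function(_, enable)
      calls[#calls + 1] = enable and "master_on" or "master_off"
   end,
   map_pci_memory_locked = function()
      calls[#calls + 1] = "map"
      return "base", 3
   end,
   close_pci_resource = function() calls[#calls + 1] = "close" end,
}

with_pci_device(fake_pci, "0000:01:00.0", function(base)
   calls[#calls + 1] = "use"
end)
```

Wrapping the device code in pcall means bus mastering is disabled and the resource is closed even if the device code raises an error.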
— Variable pci.devices
An array of supported hardware devices. Must be populated by calling pci.scan_devices
. Each entry is a table as returned by pci.device_info
.
— Function pci.canonical pciaddress
Returns the canonical representation of a PCI address. The canonical representation is preferred internally in Snabb and for presenting to users. It shortens addresses with leading zeros like this: 0000:01:00.0
becomes 01:00.0
.
— Function pci.qualified pciaddress
Returns the fully qualified representation of a PCI address. Fully qualified addresses have the form 0000:01:00.0
and so this function undoes any abbreviation in the canonical representation.
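The two representations can be illustrated with plain string patterns. This is a sketch of the transformation described above and not necessarily how the module implements it.

```lua
-- Strip the default "0000:" domain prefix, if present.
local function canonical(pciaddress)
   return (pciaddress:gsub("^0000:", ""))
end

-- Prepend "0000:" only to the abbreviated bus:device.function form.
local function qualified(pciaddress)
   return (pciaddress:gsub("^(%x%x:%x%x%.%x+)$", "0000:%1"))
end
```

The extra parentheses around gsub drop its second return value (the substitution count), so each function returns a single string.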
— Function pci.scan_devices
Scans for available PCI devices and populates the pci.devices
table.
— Function pci.device_info pciaddress
Returns a table containing information about the PCI device identified by pciaddress. The table has the following keys:
pciaddress: String denoting the PCI address of the device. E.g. "0000:83:00.1".
vendor: Identification string e.g. "0x8086" for Intel.
device: Identification string e.g. "0x10fb" for 82599 chip.
interface: Name of Linux interface using this device e.g. "eth0".
status: String denoting the Linux operational status, or nil if not known.
driver: String denoting the Lua module that supports this hardware e.g. "apps.intel.intel10g".
usable: String denoting if the device was suitable to use when scanned. One of "yes" or "no".
— Function pci.which_driver vendor, model
Returns the module name for a suitable device driver (if available) for a device of model from vendor.
— Function pci.unbind_device_from_linux pciaddress
Forces Linux to unbind the device identified by pciaddress from any kernel drivers.
— Function pci.set_bus_master pciaddress, enable
Enables or disables PCI bus mastering for device identified by pciaddress depending on whether enable is a true or a false value. PCI bus mastering must be enabled in order to perform DMA on the PCI device.
— Function pci.map_pci_memory_unlocked pciaddress, n
— Function pci.map_pci_memory_locked pciaddress, n
Memory maps configuration space n of PCI device identified by pciaddress. Returns a pointer to the memory mapped region and a file descriptor of the opened sysfs resource file. PCI bus mastering must be enabled on the device identified by pciaddress before calling this function. The 2 variants indicate if the underlying memory mapped file should be exclusively flocked
or not.
— Function pci.close_pci_resource file_descriptor, pointer
Closes memory mapped file_descriptor of sysfs resource file and unmaps it from pointer as returned by pci.map_pci_memory
.
The lib.hardware.register
module provides an abstraction for hardware device registers. This abstraction can be used to declaratively specify and conveniently manipulate structured memory regions via DMA. The functions register.define
and register.define_array
construct Register
objects based on a register description string. The resulting Register
objects can be used to manipulate the defined registers using the methods Register:read
, Register:write
, Register:set
, Register:clr
, Register:wait
and Register:reset
(exact set depends on the register mode).
A register description is a string with one Register
object definition per line. A Register
object definition must be expressed using the following grammar:
Register ::= Name Offset Indexing Mode Longname
Name ::= <identifier>
Indexing ::= "-"
::= "+" OffsetStep "*" Min ".." Max
Mode ::= "RO" | "RW" | "RC" | "RCR" | "RW64" | "RO64" | "RC64" | "RCR64"
Longname ::= <string>
Offset ::= OffsetStep ::= Min ::= Max ::= <number>
A Register object definition is made up of the following properties:
Name: The name of the Register object. Must be a valid Lua identifier, e.g. "foo", "foo_bar", "FOO" etc.
Offset: The offset of the register, as a number.
Indexing: Either "-" for a singular register, or a range of the form "+" OffsetStep "*" Min ".." Max (see register.define and register.define_array).
Mode: One of "RO", "RW", "RC", "RCR", "RO64", "RW64", "RC64", "RCR64", standing for read-only, read-write and counter modes in 32 bit and 64 bit variants respectively. Counter mode is for counter registers that clear back to zero when read, RCR is for counters that wrap.
Longname: A string describing the register.
For instance, the following Register object definition defines a register range “TXDCTL” in read-write mode starting at offset 0x06028 with 128 registers each of length 0x40.
TXDCTL 0x06028 +0x40*0..127 RW Transmit Descriptor Control
The next example defines a singular register “TPT” in counter mode located at offset 0x01428.
TPT 0x01428 - RC Total Packets Transmitted
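To make the grammar concrete, the following sketch parses a single definition line into a plain Lua table. It is illustrative only and not the parser used by lib.hardware.register; parse_register and register_offset are hypothetical helpers, and the sketch assumes that the register at index i of a range lives at Offset + OffsetStep * i.

```lua
-- Parse one register definition line into a plain table.
local function parse_register(line)
   local name, offset, indexing, mode, longname =
      line:match("^%s*(%S+)%s+(%S+)%s+(%S+)%s+(%S+)%s+(.-)%s*$")
   assert(name, "malformed register definition: " .. line)
   local reg = { name = name, offset = tonumber(offset),
                 mode = mode, longname = longname }
   if indexing ~= "-" then
      local step, min, max = indexing:match("^%+(%w+)%*(%d+)%.%.(%d+)$")
      assert(step, "malformed indexing: " .. indexing)
      reg.step, reg.min, reg.max =
         tonumber(step), tonumber(min), tonumber(max)
   end
   return reg
end

-- Byte offset of the register at `index` (assumed layout, see above).
local function register_offset(reg, index)
   index = index or 0
   if reg.step then
      assert(index >= reg.min and index <= reg.max, "index out of range")
      return reg.offset + reg.step * index
   end
   return reg.offset
end

local txdctl = parse_register(
   "TXDCTL 0x06028 +0x40*0..127 RW Transmit Descriptor Control")
local tpt = parse_register("TPT 0x01428 - RC Total Packets Transmitted")
```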
— Function register.define description, table, base_pointer, n
Creates Register objects for description relative to base_pointer. The resulting Register objects will become named entries in table using the names defined in description. If an entry in description defines an indexing range then n specifies the index of the register within that range. n defaults to 0.
— Function register.define_array description, table, base_pointer
Creates Register objects for description relative to base_pointer. The resulting Register objects will become named entries in table using the names defined in description. If an entry in description defines an indexing range, an array of Register objects will be created instead of a singular Register object.
— Function register.dump table
Prints a pretty-printed register dump of a table of registers.
— Method Register:read
Returns the value of register. For convenience register objects can be called without arguments instead of calling Register:read
. E.g. reg:read()
is equivalent to reg()
.
— Method Register:write value
Sets the value of register to value. Only available on registers in read-write mode. For convenience register objects can be called with an argument instead of calling Register:write
. E.g. reg:write(value)
is equivalent to reg(value)
.
Reading a register in counter mode assumes that the hardware resets the register to zero upon reading. The read value is added to a register accumulator and the sum of all reads is returned.
— Method Register:set bitmask
Sets bits of register according to bitmask. Only available on registers in read-write mode.
— Method Register:clr bitmask
Clears bits of register according to bitmask. Only available on registers in read-write mode.
Get or set length bits at offset in register. Sets length bits at offset in register to bits if bits is supplied. Returns length bits at offset in register otherwise. Setting is only available on registers in read-write mode.
Get or set byte at offset in register. Sets byte at offset in register to byte if byte is supplied. Returns byte at offset in register otherwise. Setting is only available on registers in read-write mode.
— Method Register:wait bitmask, value
Blocks until applying bitmask to the register equals value. If value is not supplied blocks until all bits in the mask are set instead. Only available on registers in read-write and read-only modes.
— Method Register:reset
Reset the register accumulator to 0. Only available on registers in counter mode.
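The interplay between a clear-on-read counter and its accumulator can be modelled in a few lines. This is a sketch with hypothetical names (new_counter, hw); the hw table stands in for the hardware register and is not part of lib.hardware.register.

```lua
-- Model of a clear-on-read counter register with an accumulator.
local function new_counter(hw)
   local counter = { acc = 0 }
   function counter:read()
      self.acc = self.acc + hw.value   -- add what the hardware counted
      hw.value = 0                     -- hardware clears itself on read
      return self.acc
   end
   function counter:reset()
      self.acc = 0
   end
   return counter
end

local hw = { value = 0 }
local tpt = new_counter(hw)
hw.value = 10
local first = tpt:read()    -- accumulator now holds 10
hw.value = 5
local second = tpt:read()   -- sum of all reads so far
tpt:reset()
local third = tpt:read()    -- accumulator was cleared, nothing new counted
```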
— Method Register:print
Prints the register state to standard output.
The lib.protocol.header
module contains the base class from which the supported protocol classes are derived. It defines generic methods on all protocol subclasses.
— Method header:new_from_mem memory, length
Creates and returns a header object by “overlaying” the respective header structure over length bytes of memory.
— Method header:header
Returns the raw header as a cdata object.
— Method header:sizeof
Returns the byte size of header.
— Method header:eq header
Generic equality predicate. Returns true
if header is equal to self and false
otherwise.
— Method header:copy destination, relocate
Copies the header to destination. The caller must ensure that there is enough space at destination. If relocate is a true value, destination is promoted to be the active storage for the header.
— Method header:clone
Returns a copy of the header object.
— Method header:upper_layer
Returns the protocol class that can handle the “upper layer protocol” or nil
if the protocol is not supported or the protocol has no upper layer.
For instance, on an Ethernet header object this method might return an IPv4 or IPv6 header class.
The lib.protocol.ethernet
module contains a class for representing Ethernet headers. The ethernet
protocol class supports two upper layer protocols: lib.protocol.ipv4
and lib.protocol.ipv6
.
— Method ethernet:new config
Returns a new Ethernet header for config. config must be a table which may contain the following keys:
dst - Destination MAC (binary representation). Default is 00:00:00:00:00:00.
src - Source MAC (binary representation). Default is 00:00:00:00:00:00.
type - Either 0x0800 or 0x86dd for IPv4 or IPv6 respectively. Default is 0x0.
— Method ethernet:src mac
— Method ethernet:dst mac
— Method ethernet:type type
Combined accessor and setter methods. These methods set the values of the source, destination and type fields of an Ethernet header. If no argument is given the current value is returned.
Example:
local eth = ethernet:new({src = ethernet:pton("00:00:00:00:00:00"),
dst = ethernet:pton("00:00:00:00:00:00"),
type = 0x86dd})
eth:dst(ethernet:pton("54:52:00:01:00:00"))
ethernet:ntop(eth:dst()) => "54:52:00:01:00:00"
— Method ethernet:src_eq mac
— Method ethernet:dst_eq mac
Predicate methods to test if mac is equal to the source or destination address respectively.
— Method ethernet:swap
Swaps the values of the source and destination fields.
— Function ethernet:pton string
Returns the binary representation of MAC address denoted by string.
— Function ethernet:ntop mac
Returns the string representation of mac address.
— Function ethernet:is_mcast mac
Returns a true value if mac address denotes a Multicast address.
— Function ethernet:is_bcast mac
Returns a true value if mac address denotes a Broadcast address.
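The conversions and predicates above can be modelled on plain Lua arrays of six byte values. The Snabb methods operate on FFI byte arrays, so the functions below are only a sketch of the underlying rules and not the module's code.

```lua
-- Parse "xx:xx:xx:xx:xx:xx" into an array of six byte values.
local function pton(str)
   local bytes = {
      str:match("^(%x%x):(%x%x):(%x%x):(%x%x):(%x%x):(%x%x)$")
   }
   assert(#bytes == 6, "invalid MAC address: " .. str)
   for i = 1, 6 do bytes[i] = tonumber(bytes[i], 16) end
   return bytes
end

-- Format an array of six byte values as a MAC address string.
local function ntop(mac)
   return string.format("%02x:%02x:%02x:%02x:%02x:%02x",
      mac[1], mac[2], mac[3], mac[4], mac[5], mac[6])
end

-- Multicast addresses have the least significant bit of the first
-- byte set.
local function is_mcast(mac)
   return mac[1] % 2 == 1
end

-- The broadcast address is ff:ff:ff:ff:ff:ff.
local function is_bcast(mac)
   for i = 1, 6 do
      if mac[i] ~= 0xff then return false end
   end
   return true
end
```

Note that the broadcast address also has the multicast bit set, so is_mcast is true for it as well in this model.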
— Function ethernet:ipv6_mcast ip
Returns the MAC address for IPv6 multicast ip as defined by RFC2464, section 7.
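RFC 2464, section 7 maps an IPv6 multicast address to the MAC address 33:33 followed by the last four bytes of the IPv6 address. A minimal sketch of that mapping, using a Lua array of 16 byte values as a stand-in for the binary representation (ipv6_mcast_mac is a hypothetical name):

```lua
-- Build the Ethernet multicast MAC for an IPv6 multicast address.
-- `ip` is an array of 16 byte values.
local function ipv6_mcast_mac(ip)
   return { 0x33, 0x33, ip[13], ip[14], ip[15], ip[16] }
end

-- ff02::1:ff28:9c5a
local ip = {
   0xff, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
   0x00, 0x00, 0x00, 0x01, 0xff, 0x28, 0x9c, 0x5a,
}
local mac = ipv6_mcast_mac(ip)   -- 33:33:ff:28:9c:5a
```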
The lib.protocol.ipv4
module contains a class for representing IPv4 headers. The ipv4
protocol class supports four upper layer protocols: lib.protocol.tcp
, lib.protocol.udp
, lib.protocol.gre
and lib.protocol.icmp.header
.
— Method ipv4:new config
Returns a new IPv4 header for config. config must be a table which may contain the following keys:
dst - Destination IPv4 address (binary representation). Default is 0.0.0.0.
src - Source IPv4 address (binary representation). Default is 0.0.0.0.
protocol - The upper layer protocol, can be 6 (TCP), 17 (UDP), 47 (GRE) or 58 (ICMP). Default is 255.
dscp - “Differentiated Services Code Point” field (6 bit unsigned integer). Default is 0.
ecn - “Explicit Congestion Notification” field (2 bit unsigned integer). Default is 0.
id - “Identification” field (16 bit unsigned integer). Default is 0.
flags - “Don’t Fragment (DF)” and “More Fragments (MF)” fields (3 bit unsigned integer). Default is 0.
frag_off - “Fragment Offset” field (13 bit unsigned integer). Default is 0.
ttl - “Time To Live” field (8 bit unsigned integer). Default is 0.
— Method ipv4:dst ip
— Method ipv4:src ip
— Method ipv4:protocol protocol
— Method ipv4:dscp dscp
— Method ipv4:ecn ecn
— Method ipv4:id id
— Method ipv4:flags flags
— Method ipv4:frag_off frag_off
— Method ipv4:ttl ttl
Combined accessor and setter methods. These methods set the values of the instance fields (see new
) of an IPv4 header. If no argument is given the current value is returned.
— Method ipv4:version version
Combined accessor and setter method for the “Version” field (4 bit unsigned integer). Defaults to 4 (set automatically by new
). Sets the “Version” field to version. If no argument is given the current value is returned.
— Method ipv4:ihl ihl
Combined accessor and setter method for the “Internet Header Length” field (4 bit unsigned integer). Set automatically by new
. Sets the “Internet Header Length” field to ihl. If no argument is given the current value is returned.
— Method ipv4:total_length length
Combined accessor and setter method for the “Total Length” field (16 bit unsigned integer). Defaults to header length (set automatically by new
). Sets the “Total Length” field to length. If no argument is given the current value is returned.
— Method ipv4:checksum
Computes and sets the IPv4 header checksum. It is called automatically by new, but must be called again after the header is changed.
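The IPv4 header checksum is the one's complement of the one's complement sum of the header's 16 bit words, computed with the checksum field set to zero. The following sketch works on a Lua array of byte values rather than on a header object and is not the module's implementation; ipv4_checksum is a hypothetical name.

```lua
-- Checksum over a header given as an array of byte values, with the
-- checksum field (bytes 11 and 12) set to zero.
local function ipv4_checksum(bytes)
   local sum = 0
   for i = 1, #bytes, 2 do
      sum = sum + bytes[i] * 256 + (bytes[i + 1] or 0)
   end
   while sum > 0xFFFF do   -- fold the carries back into the low 16 bits
      sum = sum % 0x10000 + math.floor(sum / 0x10000)
   end
   return 0xFFFF - sum
end

-- A 20 byte header: UDP from 192.168.0.1 to 192.168.0.199.
local header = {
   0x45, 0x00, 0x00, 0x73, 0x00, 0x00, 0x40, 0x00,
   0x40, 0x11, 0x00, 0x00, 0xc0, 0xa8, 0x00, 0x01,
   0xc0, 0xa8, 0x00, 0xc7,
}
local csum = ipv4_checksum(header)
```

Because the checksum covers every header field, any later change (for example decrementing the TTL) invalidates it, which is why it has to be recomputed.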
— Method ipv4:dst_eq ip
— Method ipv4:src_eq ip
Predicate methods to test if ip is equal to the source or destination address respectively.
— Function ipv4:pton string
Returns the binary representation of IPv4 address denoted by string.
— Function ipv4:ntop ip
Returns the string representation of ip address.
The lib.protocol.ipv6
module contains a class for representing IPv6 headers. The ipv6
protocol class supports four upper layer protocols: lib.protocol.tcp
, lib.protocol.udp
, lib.protocol.gre
and lib.protocol.icmp.header
.
— Method ipv6:new config
Returns a new IPv6 header for config. config must be a table which may contain the following keys:
dst - Destination IPv6 address (binary representation). Default is 0::0.
src - Source IPv6 address (binary representation). Default is 0::0.
traffic_class - “Traffic Class” field (8 bit unsigned integer). Default is 0.
flow_label - “Flow Label” field (20 bit unsigned integer). Default is 0.
next_header - “Next Header” field (8 bit unsigned integer). Default is 0.
hop_limit - “Hop Limit” field (8 bit unsigned integer). Default is 0.
— Method ipv6:dst ip
— Method ipv6:src ip
— Method ipv6:traffic_class traffic_class
— Method ipv6:flow_label flow_label
— Method ipv6:next_header next_header
— Method ipv6:hop_limit hop_limit
Combined accessor and setter methods. These methods set the values of the instance fields (see new
) of an IPv6 header. If no argument is given the current value is returned.
— Method ipv6:version version
Combined accessor and setter method for the version field (4 bit unsigned integer). Defaults to 6 (set automatically by new
). Sets the “Version” field to version. If no argument is given the current value is returned.
— Method ipv6:dscp dscp
Combined accessor and setter method for the “Differentiated Services Code Point” field (6 bit unsigned integer). Default is 0. This is a sub-field of the “Traffic Class” field. Sets the “Differentiated Services Code Point” field to dscp. If no argument is given the current value is returned.
— Method ipv6:ecn ecn
Combined accessor and setter method for the “Explicit Congestion Notification” field (2 bit unsigned integer). Default is 0. This is a sub-field of the “Traffic Class” field. Sets the “Explicit Congestion Notification” field to ecn. If no argument is given the current value is returned.
— Method ipv6:payload_length length
Combined accessor and setter method for the “Payload Length” field (16 bit unsigned integer). Default is 0. Sets the “Payload Length” field to length. If no argument is given the current value is returned.
— Method ipv6:dst_eq ip
— Method ipv6:src_eq ip
Predicate methods to test if ip is equal to the source or destination address respectively.
— Function ipv6:pton string
Returns the binary representation of IPv6 address denoted by string.
— Function ipv6:ntop ip
Returns the string representation of ip address.
— Function ipv6:solicited_node_mcast ip
Returns the solicited-node multicast address from the given unicast ip.
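A solicited-node multicast address has the form ff02::1:ffXX:XXXX, where XX:XXXX are the low 24 bits of the unicast address. A minimal sketch on Lua arrays of 16 byte values (a stand-in for the binary representation, not the module's code):

```lua
-- Derive the solicited-node multicast address from a unicast address.
-- `ip` is an array of 16 byte values.
local function solicited_node_mcast(ip)
   return {
      0xff, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
      0x00, 0x00, 0x00, 0x01, 0xff, ip[14], ip[15], ip[16],
   }
end

-- 2001:db8::2aa:ff:fe28:9c5a
local unicast = {
   0x20, 0x01, 0x0d, 0xb8, 0x00, 0x00, 0x00, 0x00,
   0x02, 0xaa, 0x00, 0xff, 0xfe, 0x28, 0x9c, 0x5a,
}
local mcast = solicited_node_mcast(unicast)   -- ff02::1:ff28:9c5a
```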
The lib.protocol.tcp
module contains a class for representing TCP headers.
— Method tcp:new config
Returns a new TCP header for config. config must be a table which may contain the following keys:
src_port - “Source Port Number” field (16 bit unsigned integer). Default is 0.
dst_port - “Destination Port Number” field (16 bit unsigned integer). Default is 0.
seq_num - “Sequence Number” field (32 bit unsigned integer). Default is 0.
ack_num - “Acknowledgement Number” field (32 bit unsigned integer). Default is 0.
window_size - “Window Size” field (16 bit unsigned integer). Default is 0.
offset - “Data Offset” field (4 bit unsigned integer). Default is 0.
ns - “NS” flag (1 bit). Default is 0.
cwr - “CWR” flag (1 bit). Default is 0.
ece - “ECE” flag (1 bit). Default is 0.
urg - “URG” flag (1 bit). Default is 0.
ack - “ACK” flag (1 bit). Default is 0.
psh - “PSH” flag (1 bit). Default is 0.
rst - “RST” flag (1 bit). Default is 0.
syn - “SYN” flag (1 bit). Default is 0.
fin - “FIN” flag (1 bit). Default is 0.
— Method tcp:src_port port
— Method tcp:dst_port port
— Method tcp:seq_num seq_num
— Method tcp:ack_num ack_num
— Method tcp:window_size window_size
— Method tcp:offset offset
— Method tcp:ns ns
— Method tcp:cwr cwr
— Method tcp:ece ece
— Method tcp:urg urg
— Method tcp:ack ack
— Method tcp:psh psh
— Method tcp:rst rst
— Method tcp:syn syn
— Method tcp:fin fin
Combined accessor and setter methods. These methods set the values of the instance fields (see new
) of a TCP header. If no argument is given the current value is returned.
— Method tcp:flags flags
Combined accessor and setter method for the TCP header flags (NS, CWR, ECE, URG, ACK, PSH, RST, SYN and FIN). Sets the header’s flags according to flags (9 bit unsigned integer). If no argument is given the current flags are returned.
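A 9 bit flags value can be assembled from individual flag bits. The sketch below assumes the standard TCP header layout, with FIN as the least significant bit and NS as the most significant; FLAG, pack_flags and has_flag are hypothetical helpers and not part of lib.protocol.tcp.

```lua
-- Bit values of the nine TCP flags within the 9 bit flags field.
local FLAG = {
   ns = 0x100, cwr = 0x80, ece = 0x40, urg = 0x20, ack = 0x10,
   psh = 0x08, rst = 0x04, syn = 0x02, fin = 0x01,
}

-- Combine a list of flag names into one 9 bit value.
local function pack_flags(names)
   local value, seen = 0, {}
   for _, name in ipairs(names) do
      local bit_value = assert(FLAG[name], "unknown flag: " .. name)
      if not seen[name] then   -- count each flag only once
         seen[name] = true
         value = value + bit_value
      end
   end
   return value
end

-- Test whether a named flag is set in a 9 bit flags value.
local function has_flag(value, name)
   return math.floor(value / FLAG[name]) % 2 == 1
end

local syn_ack = pack_flags({ "syn", "ack" })
```

The resulting number is the kind of value that could be passed as the flags argument, for example a SYN/ACK segment.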
— Method tcp:checksum payload, length, ip
Computes and sets the “Checksum” field for length bytes of payload and optionally ip. If no argument is given the current value of the “Checksum” field is returned.
The lib.protocol.udp
module contains a class for representing UDP headers.
— Method udp:new config
Returns a new UDP header for config. config must be a table which may contain the following keys:
src_port - “Source Port Number” field (16 bit unsigned integer). Default is 0.
dst_port - “Destination Port Number” field (16 bit unsigned integer). Default is 0.
— Method udp:src_port port
— Method udp:dst_port port
Combined accessor and setter methods for the source and destination port fields. Sets the source or destination port respectively. Returns the current port if called without arguments.
— Method udp:length length
Combined accessor and setter method for the “Length” field. Sets the “Length” field to length (a 16 bit unsigned integer). If no argument is given the current value of the “Length” field is returned. Default is 8 (the UDP header length).
— Method udp:checksum payload, length, ip
Computes and sets the “Checksum” field for length bytes of payload and optionally ip. If no argument is given the current value of the “Checksum” field is returned.
The lib.protocol.gre
module contains a class for representing GRE headers. The gre
protocol class only supports the checksum and key extensions and the lib.protocol.ethernet
upper layer protocol.
— Method gre:new config
Returns a new GRE header for config. config must be a table which may contain the following keys:
protocol - Upper layer protocol. May be 0x6558 (Ethernet). Default is nil.
checksum - Set to true to enable checksumming. Default is false.
key - 32 bit unsigned integer. Enables keying if supplied. Default is nil.
— Method gre:checksum payload, length
Combined accessor and setter method for the checksum field. Computes and sets the checksum field for length bytes of payload. If no argument is given the current checksum is returned. Returns nil
if checksumming is disabled.
— Method gre:checksum_check payload, length
Predicate to verify length bytes of payload against the header checksum. Returns nil if checksumming is disabled.
— Method gre:key key
Combined accessor and setter method for the key field. Sets the key field to key. If no argument is given the current key is returned. Returns nil
if keying is disabled.
— Method gre:protocol protocol
Combined accessor and setter method for the upper layer protocol. Sets the upper layer protocol to protocol. If no argument is given the current upper layer protocol is returned.
The lib.protocol.icmp.header
module contains a class for representing ICMP headers. The icmp
protocol class currently supports two upper layer protocols: lib.protocol.icmp.nd.ns
and lib.protocol.icmp.nd.na
. These upper layer protocols implement the headers necessary to perform “Neighbor Discovery”.
— Method icmp:new type, code
Returns a new ICMP header of type which may be either 135 or 136 for lib.protocol.icmp.nd.ns
or lib.protocol.icmp.nd.na
respectively. Optionally code can be supplied to set the “Code” field for the type.
— Method icmp:type type
— Method icmp:code code
Combined accessor and setter methods. These methods set the values of the instance fields (see new
) of an ICMP header. If no argument is given the current value is returned.
— Method icmp:checksum payload, length, ipv6
Computes and sets the “Checksum” field for length bytes of payload. If the lower protocol layer is lib.protocol.ipv6
then ipv6 must be set to a true value.
— Method icmp:checksum_check payload, length, ipv6
Predicate to test if the header’s “Checksum” field matches length bytes of payload. If the lower protocol layer is lib.protocol.ipv6
then ipv6 must be set to a true value.
— Method ns:new target
Returns a new Neighbor Solicitation header. Target is the IP address used for the “Target Address” field.
— Method ns:target target
Combined accessor and setter method for the “Target Address” field. Sets the “Target Address” field to target. If no argument is given the current value is returned.
— Method ns:target_eq target
Predicate to test if the header’s value in the “Target Address” field is equivalent to target.
— Method na:new target, router, solicited, override
Returns a new Neighbor Advertisement header. Target is the IP address used for the “Target Address” field. Router, solicited and override can be boolean values to set the “Router”, “Solicited” and “Override” flags respectively. The default for the flags is 0.
— Method na:target target
— Method na:router router
— Method na:solicited solicited
— Method na:override override
Combined accessor and setter methods. These methods set the values of the instance fields (see new) of a Neighbor Advertisement header. If no argument is given the current value is returned.
— Method na:target_eq target
Predicate to test if the header’s value in the “Target Address” field is equivalent to target.
Both Neighbor Solicitation and Advertisement (lib.protocol.icmp.nd.ns
and lib.protocol.icmp.nd.na
) headers implement an options
method for parsing TLV Options contained in their payloads.
Example:
-- Parse datagram with ICMP/NA packet
local na = dgram:parse()
-- Parse TLV Options
local options = na:options(dgram:payload())
— Method nd:options payload, length
Parses and returns an array of TLV Options (see lib.protocol.icmp.nd.options.tlv
) from length bytes of payload.
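Neighbor Discovery options are encoded as a type byte, a length byte counting units of 8 bytes (including the type and length bytes themselves), and the option data. The following sketch walks such options in a Lua array of byte values; it is a model of the wire format only and does not return the tlv objects that the options method produces. parse_options is a hypothetical name.

```lua
-- Split a payload (array of byte values) into its ND options.
local function parse_options(payload)
   local options, pos = {}, 1
   while pos + 1 <= #payload do
      local opt_type, units = payload[pos], payload[pos + 1]
      assert(units > 0, "option length must not be zero")
      local size = units * 8   -- length is given in units of 8 bytes
      assert(pos + size - 1 <= #payload, "truncated option")
      local data = {}
      for i = pos + 2, pos + size - 1 do
         data[#data + 1] = payload[i]
      end
      options[#options + 1] =
         { type = opt_type, length = units, data = data }
      pos = pos + size
   end
   return options
end

-- One "Source Link-Layer Address" option (type 1) carrying a MAC.
local payload = { 1, 1, 0x54, 0x52, 0x00, 0x01, 0x00, 0x00 }
local options = parse_options(payload)
```

A link-layer address option for Ethernet is exactly 8 bytes (two header bytes plus a six byte MAC), so its length field is 1.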
The lib.protocol.icmp.nd.options.tlv
module contains a class for representing TLV Options. Currently only two types of options are implemented: “Source Link-Layer Address” ("src_ll_addr"
) and “Target Link-Layer Address” ("tgt_ll_address"
). Both are represented by the lladdr
class (see lib.protocol.icmp.nd.options.lladdr
).
— Method tlv:new type, data
Returns a new TLV Option object for data of type. Type may be either 1 for “Source Link-Layer Address” or 2 for “Target Link-Layer Address”. Data must be a lladdr
object.
— Method tlv:name
Returns a string denoting the type of the option. Either "src_ll_addr" for “Source Link-Layer Address” or "tgt_ll_address" for “Target Link-Layer Address”.
— Method tlv:length
Returns the size of the TLV Option in multiples of 8 bytes.
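As a worked illustration of the unit used by tlv:length, the sketch below computes the size of an option in 8-byte units. The helper option_length_units is hypothetical and not part of Snabb. The 2-byte type/length prefix and the 6-byte Ethernet address follow the option layout of RFC 4861.

```lua
-- Hypothetical helper (not part of Snabb): size of an ND option in units
-- of 8 bytes, given the size of its data in bytes.
local function option_length_units (data_bytes)
   local total = 2 + data_bytes      -- 1 byte type + 1 byte length + data
   assert(total % 8 == 0, "ND options must be padded to a multiple of 8 bytes")
   return total / 8
end

-- A link-layer address option carrying a 6-byte Ethernet MAC address
-- occupies exactly 8 bytes, that is, one unit.
print(option_length_units(6))
```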
— Method tlv:type type
Combined accessor and setter method. Sets the type field (see new) to type. If no argument is given the current value of the type field is returned.
— Method tlv:option
Returns an object of the class denoted by the type field. Currently that only includes lladdr instances.
The lib.protocol.icmp.nd.options.lladdr module contains a class for representing Link-Layer Address Options.
— Method lladdr:new address
Returns a new Link-Layer Option object for MAC address in binary representation.
— Method lladdr:name
Returns the string "ll_addr".
— Method lladdr:addr address
Combined accessor and setter method. Sets the address field (see new) to address. If no argument is given the current value of the address field is returned.
The lib.protocol.datagram module provides basic mechanisms for parsing, building and manipulating a hierarchy of protocol headers and the associated payload contained in a data packet. In particular, it supports:
It mediates between packets as defined in core.packet and protocol classes which are defined as classes derived from the protocol header base class in the lib.protocol.header module.
The contents of a datagram instance are logically divided into three areas: the payload, parsed headers and pushed headers. The datagram payload is a sequence of bytes either inherited from the packet given to datagram:new or appended using datagram:payload. The headers in the payload can be parsed using datagram:parse_match, which will shrink the payload by the header. Finally, synthetic headers can be prepended to the datagram using datagram:push. To get the whole datagram as a packet use datagram:packet.
A datagram can be used in two modes of operation, called “immediate commit” and “delayed commit”. In immediate commit mode, the push and pop methods immediately modify the underlying packet. However, this can be undesirable because every such call moves data in the packet buffer.
Even though these manipulations are relatively fast thanks to SIMD instructions that move and copy data when possible, performance-aware applications usually try to avoid as many of them as possible. This creates a conflict if the caller performs operations to push or parse a sequence of protocol headers in immediate commit mode.
This problem can be avoided by using delayed commit mode. In this mode, the push methods add the data to a separate buffer as intermediate storage. The buffer is prepended to the actual packet in a single operation by calling datagram:commit.
The pop methods are made light-weight in delayed commit mode as well by keeping track of an additional offset that indicates where the actual packet starts in the packet buffer. Each call to one of the pop methods simply increases the offset by the size of the popped piece of data. The accumulated actions will be applied as a single operation by datagram:commit.
The push and pop methods can be freely mixed in delayed commit mode.
Due to the destructive nature of these methods in immediate commit mode, they cannot be applied when the parse stack is not empty, because moving the data in the packet buffer will invalidate the parsed headers. The push and pop methods will raise an error in that case.
The buffer used in delayed commit mode has a fixed size of 512 bytes. This limits the size of data that can be pushed in a single operation. A sequence of push/commit operations can be used to push an arbitrary amount of data in chunks of up to 512 bytes.
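The bookkeeping behind delayed commit can be illustrated with a small stand-alone model. This is a toy sketch and not the Snabb implementation: the packet is modelled as a Lua string, and the names new_model, push, pop and commit are local helpers invented for the example. Only the 512-byte buffer size is taken from the text above.

```lua
-- Toy model of delayed commit (NOT the Snabb implementation): the packet is
-- a Lua string, pushes go to a bounded staging buffer, pops advance an offset.
local BUFFER_SIZE = 512   -- fixed buffer size stated above

local function new_model (packet_data)
   return { data = packet_data, staged = "", offset = 0, copies = 0 }
end

local function push (m, bytes)
   -- Prepending to the staging buffer does not touch the packet data.
   assert(#m.staged + #bytes <= BUFFER_SIZE, "push buffer full, commit first")
   m.staged = bytes .. m.staged
end

local function pop (m, n)
   -- Popping only moves the offset that marks where the packet starts.
   assert(m.offset + n <= #m.data, "not enough data to pop")
   m.offset = m.offset + n
end

local function commit (m)
   -- All accumulated pushes and pops are applied in one operation.
   m.data = m.staged .. m.data:sub(m.offset + 1)
   m.staged, m.offset = "", 0
   m.copies = m.copies + 1
end

local m = new_model("ETHIP6payload")
pop(m, 3)          -- drop the 3-byte "ETH" stand-in header
push(m, "GRE")     -- stage an inner header
push(m, "NEW")     -- stage an outer header in front of it
commit(m)
print(m.data, m.copies)
```

In this model the packet data is rewritten once, in commit, regardless of how many push and pop calls preceded it.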
— Method datagram:new packet, protocol, options
Creates a datagram for packet or from scratch if packet is nil. Protocol will be used by parse_match to parse the packet payload. If protocol is not nil it is set as the initial upper layer protocol. If options is not nil it must be a table that selects configurable properties of the class. Currently, the only option is the selection of immediate or delayed commit mode by setting the key delayed_commit to false or true, respectively. The default is immediate commit mode.
— Method datagram:push header
Prepends header to the front of the datagram. This method destructively modifies the underlying packet in immediate commit mode and raises an error if the parse stack is not empty.
In delayed commit mode, header is prepended to an intermediate buffer.
— Method datagram:push_raw data, length
This method behaves like the datagram:push method for an arbitrary chunk of memory of length length located at the address pointed to by data.
— Method datagram:parse_match protocol, check
Attempts to parse the next header in the datagram, thereby removing it from the payload. Returns a header instance of class protocol on success. If protocol is nil the current upper layer protocol as set by datagram:new or previous calls to parse_match is used.
If neither protocol nor the upper layer protocol is set or the constructor of the protocol class returns nil, the parsing operation has failed and parse_match returns nil. The datagram remains unchanged.
If the protocol class instance has been created successfully, it is passed as single argument to the anonymous function check.
If check returns a false value, the parsing has failed and parse_match returns nil. The packet remains unchanged.
If check is not supplied or if it returned a true value, the parsing has succeeded and the current upper layer protocol of the datagram is set to the value returned by header:upper_layer.
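The decision flow of parse_match described above can be traced with a stand-alone model. This is a sketch under stated assumptions, not Snabb code: the eth and ip tables are invented stand-ins for protocol classes, the payload is a Lua string, and the local parse_match function only mimics the documented behaviour.

```lua
-- Toy model of the parse_match decision flow. The protocol "classes" are
-- plain tables invented for illustration, not lib.protocol classes.
local ip = {
   parse = function (payload)
      if payload:sub(1, 2) ~= "IP" then return nil end
      return { name = "ip", size = 2, upper_layer = function () return nil end }
   end
}
local eth = {
   parse = function (payload)
      if payload:sub(1, 3) ~= "ETH" then return nil end
      return { name = "eth", size = 3, upper_layer = function () return ip end }
   end
}

local function parse_match (d, protocol, check)
   protocol = protocol or d.ulp
   if not protocol then return nil end                 -- no protocol known
   local header = protocol.parse(d.payload)
   if not header then return nil end                   -- constructor failed
   if check and not check(header) then return nil end  -- check rejected it
   -- Success: consume the header and remember the next protocol.
   d.payload = d.payload:sub(header.size + 1)
   d.ulp = header.upper_layer()
   return header
end

local d = { payload = "ETHIPdata", ulp = eth }
local rejected = parse_match(d, nil, function (h) return h.name == "ip" end)
local h1 = parse_match(d)     -- parses "ETH" using the initial protocol
local h2 = parse_match(d)     -- parses "IP" using the upper layer of eth
local h3 = parse_match(d)     -- fails: no upper layer protocol is set
print(rejected, h1.name, h2.name, h3, d.payload)
```

The first call fails because the check function rejects the header, and the model leaves the payload untouched, so the second call can still parse the same header.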
— Method datagram:parse protocols_and_checks
A wrapper around parse_match that allows parsing of a sequence of headers with a single method call.
If protocols_and_checks is a sequence of protocol class and check function pairs, parse_match is called for each pair. Returns the header object of the last header parsed or nil if any of the calls to parse_match return nil.
If called with a nil argument, this method is equivalent to parse_match called without arguments.
— Method datagram:parse_n n
A wrapper around parse_match that parses the next n protocol headers using the current upper layer protocol and subsequent values of header:upper_layer. It returns the last header object or nil if fewer than n headers could be parsed successfully.
— Method datagram:unparse n
Undoes the last n calls to parse_match on the datagram. That is, it prepends n parsed headers back to the payload. The sequence of parsed headers can be obtained by calling stack.
— Method datagram:pop n
Removes the leading n parsed headers from the datagram. Note that headers added via push cannot be removed using pop. The caller has to ensure that the datagram contains at least n headers that were parsed using parse_match. The sequence of parsed headers can be obtained by calling stack. This method destructively modifies the underlying packet in immediate commit mode and raises an error if the parse stack is not empty.
In delayed commit mode, the packet is not modified and the parse stack remains valid.
For instance, let d be a datagram with an Ethernet header followed by an IPv6 header. Assuming we have parsed both headers using d:parse_n(2), we could call d:pop(1) to decapsulate the IPv6 packet from its Ethernet header.
— Method datagram:pop_raw length, ulp
Removes length bytes from the beginning of the datagram. If ulp is given it is set as the current upper layer protocol. This method destructively modifies the underlying packet in immediate commit mode and raises an error if the parse stack is not empty.
In delayed commit mode, the packet is not modified and the parse stack remains valid.
— Method datagram:stack
Returns the parsed header objects as a sequence.
— Method datagram:packet
Returns a packet (see core.packet) containing the datagram (including pushed headers).
— Method datagram:payload pointer, length
Combined payload accessor and setter method. Returns a pointer to the datagram payload and its byte size.
If pointer and length are supplied then length bytes starting from pointer are appended to the datagram’s payload.
— Method datagram:data
Returns data and length of the underlying packet.
— Method datagram:commit
If called in delayed commit mode, the operations accumulated by the push and pop methods since the creation of the datagram or the last invocation of datagram:commit are committed to the underlying packet. An error is raised if the parse stack is not empty.
The method can be safely called in immediate commit mode.
The lib.ipsec.esp module contains two classes, esp_v6_encrypt and esp_v6_decrypt, which implement packet encryption and decryption with IPsec ESP using the AES-GCM-128 cipher in IPv6 transport mode. Packets are encrypted with the key and salt provided to the classes’ constructors. These classes do not implement any key exchange protocol.
The encrypt class accepts IPv6 packets and inserts a new ESP header between the outer IPv6 header and the inner protocol header (e.g. TCP, UDP, L2TPv3) and also encrypts the contents of the inner protocol header. The decrypt class does the reverse: it decrypts the inner protocol header and removes the ESP protocol header.
References:
— Method esp_v6_encrypt:new config
— Method esp_v6_decrypt:new config
Returns a new encryption/decryption context respectively. Config must be a table with the following keys:
mode - Encryption mode (string). The only accepted value is the string "aes-128-gcm".
spi - A 32 bit integer denoting the “Security Parameters Index” as specified in RFC 4303.
key - Hexadecimal string of 32 digits (two digits for each byte) that denotes a 128-bit AES key as specified in RFC 4106.
salt - Hexadecimal string of eight digits (two digits for each byte) that denotes four bytes of salt as specified in RFC 4106.
window_size - Optional. Minimum width of the window in which out of order packets are accepted as specified in RFC 4303. The default is 128. (esp_v6_decrypt only.)
resync_threshold - Optional. Number of consecutive packets allowed to fail decapsulation before attempting “Re-synchronization” as specified in RFC 4303. The default is 1024. (esp_v6_decrypt only.)
resync_attempts - Optional. Number of attempts to re-synchronize a packet that triggered “Re-synchronization” as specified in RFC 4303. The default is 8. (esp_v6_decrypt only.)
auditing - Optional. A boolean value indicating whether to enable or disable “Auditing” as specified in RFC 4303. The default is nil (no auditing). (esp_v6_decrypt only.)
— Method esp_v6_encrypt:encapsulate packet
Encapsulates packet and encrypts its payload. Returns true on success and false otherwise.
— Method esp_v6_decrypt:decapsulate packet
Decapsulates packet and decrypts its payload. Returns true on success and false otherwise.
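The formats of the required configuration keys can be checked before a context is created. The following is a minimal sketch: check_esp_config is a hypothetical helper that is not part of lib.ipsec.esp, and the key, salt and spi values are placeholders chosen for illustration.

```lua
-- Hypothetical helper (not part of lib.ipsec.esp): checks the required keys
-- of a configuration table against the formats listed above.
local function check_esp_config (conf)
   assert(conf.mode == "aes-128-gcm", "mode must be \"aes-128-gcm\"")
   assert(type(conf.spi) == "number" and conf.spi >= 0
             and conf.spi <= 0xffffffff and conf.spi % 1 == 0,
          "spi must be a 32 bit integer")
   assert(type(conf.key) == "string" and conf.key:match("^%x+$")
             and #conf.key == 32, "key must be 32 hexadecimal digits")
   assert(type(conf.salt) == "string" and conf.salt:match("^%x+$")
             and #conf.salt == 8, "salt must be 8 hexadecimal digits")
   return conf
end

-- Example values only; never use a fixed key or salt in production.
local conf = check_esp_config {
   mode = "aes-128-gcm",
   spi = 0x100,
   key = "00112233445566778899aabbccddeeff",
   salt = "00112233"
}
print(#conf.key / 2, #conf.salt / 2)  -- key and salt sizes in bytes
```

Within Snabb, a table of this shape would be the config argument passed to esp_v6_encrypt:new or esp_v6_decrypt:new.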
The program.snabbnfv.nfvconfig module implements a Network Functions Virtualization component based on Snabb. It introduces a simple configuration file format to describe NFV configurations which it then compiles to app networks. This NFV component is compatible with OpenStack Neutron.
— Function nfvconfig.load file, pci_address, socket_path
Loads the NFV configuration from file and compiles an app network using pci_address and socket_path for the underlying NIC driver and VhostUser apps. Returns the resulting engine configuration.
The configuration file format understood by program.snabbnfv.nfvconfig is based on Lua expressions. Initially, it contains a list of NFV ports:
return { <port-1>, ..., <port-n> }
Each port is defined by a range of properties which correspond to the configuration parameters of the underlying apps (NIC driver, VhostUser, PcapFilter, RateLimiter, nd_light and SimpleKeyedTunnel):
port := { port_id        = <id>,           -- A unique string
          mac_address    = <mac-address>,  -- MAC address as a string
          vlan           = <vlan-id>,      -- ..
          ingress_filter = <filter>,       -- A pcap-filter(7) expression
          egress_filter  = <filter>,       -- ..
          tunnel         = <tunnel-conf>,
          crypto         = <crypto-conf>,
          rx_police      = <n>,            -- Allowed input rate in Gbps
          tx_police      = <n> }           -- Allowed output rate in Gbps
The tunnel section deviates a little from SimpleKeyedTunnel’s terminology:
tunnel := { type          = "L2TPv3",      -- The only type (for now)
            local_cookie  = <cookie>,      -- As for SimpleKeyedTunnel
            remote_cookie = <cookie>,      -- ..
            next_hop      = <ip-address>,  -- Gateway IP
            local_ip      = <ip-address>,  -- ~ `local_address'
            remote_ip     = <ip-address>,  -- ~ `remote_address'
            session       = <32bit-int> }  -- ~ `session_id'
The crypto section allows configuration of traffic encryption based on apps.ipsec.esp:
crypto := { type          = "esp-aes-128-gcm",  -- The only type (for now)
            spi           = <spi>,              -- As for AES128gcm
            transmit_key  = <key>,
            transmit_salt = <salt>,
            receive_key   = <key>,
            receive_salt  = <salt>,
            auditing      = <boolean> }
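Following the port grammar above, a configuration file with a single port might look like the sketch below. Every value is a placeholder chosen for illustration. The tunnel and crypto entries are left out on the assumption that they are optional, which the text above does not state explicitly.

```lua
-- Illustrative NFV configuration; every value below is a placeholder.
return {
   { port_id        = "A",
     mac_address    = "52:54:00:00:00:01",
     vlan           = 42,
     ingress_filter = "ip6",   -- pcap-filter(7) expression
     egress_filter  = "ip6",
     rx_police      = 1,       -- Gbps
     tx_police      = 1 }      -- Gbps
}
```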
The snabbnfv traffic program loads and runs an NFV configuration using program.snabbnfv.nfvconfig. It can be invoked like so:
./snabb snabbnfv traffic <file> <pci-address> <socket-path>
snabbnfv traffic runs the loaded configuration indefinitely and automatically reloads the configuration file if it changes (at most once every second).
The snabbnfv neutron2snabb program converts Neutron database CSV dumps to the format used by program.snabbnfv.nfvconfig. For more info see Snabb NFV Architecture. It can be invoked like so:
./snabb snabbnfv neutron2snabb <csv-directory> <output-directory> [<hostname>]
snabbnfv neutron2snabb reads the Neutron CSV dumps in csv-directory and translates them into one lib.nfv.config configuration file per physical network. If hostname is given, it overrides the hostname provided by hostname(1).
The lib.watchdog.watchdog module implements per-thread watchdog functionality. Its purpose is to watch and kill processes which fail to call the watchdog periodically (e.g. because they hang).
It does so by using alarm(3) and ualarm(3) to have the OS send a SIGALRM to the process after a specified timeout. Because the process does not handle the signal it will be killed and exit with status 142.
— Function watchdog.set milliseconds
Sets the watchdog timeout to milliseconds. Values for milliseconds greater than 1,000 are rounded up to the next full second. For example:
watchdog.set(1100) == watchdog.set(2000)
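The granularity rule can be modelled in a few lines of plain Lua. The function effective_timeout below is a hypothetical helper, not part of lib.watchdog.watchdog. It only assumes what the example above implies: a timeout above one second takes effect at the next whole second.

```lua
-- Model of the granularity rule; effective_timeout is a hypothetical helper
-- and not part of lib.watchdog.watchdog.
local function effective_timeout (milliseconds)
   if milliseconds > 1000 then
      -- Above one second the timeout has whole-second granularity.
      return math.ceil(milliseconds / 1000) * 1000
   end
   return milliseconds
end

print(effective_timeout(1100) == effective_timeout(2000))
print(effective_timeout(250) == 250)
```

Both comparisons print true: 1100 ms and 2000 ms result in the same timeout, while values of up to one second are used as given.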
— Function watchdog.reset
Starts the timeout if the watchdog has not yet been started and resets the timeout otherwise. If the timeout is reached the process will be killed.
— Function watchdog.stop
Disables the timeout.
Servers devoted to the Snabb project and usable by all known developers.
Want to be a known developer? Sure! Just edit the user account list with your user and send a pull request. No fuss.
Run Snabb processes with sudo lock ./snabb .... The lock command will automatically wait if somebody else is running a Snabb process on the same machine, which helps us avoid conflicts over access to hardware resources.
Send luke@snabb.co your email address(es) to get an invitation to the Lab Slack.
Name | Purpose | SSH | Xeon model | NICs |
---|---|---|---|---|
lugano-1 | General use | lugano-1.snabb.co | E3 1650v3 | 2 x 10G (82599), 4 x 10G (X710), 2 x 40G (XL710) |
lugano-2 | General use | lugano-2.snabb.co | E3 1650v3 | 2 x 10G (82599), 4 x 10G (X710), 2 x 40G (XL710) |
lugano-3 | General use | lugano-3.snabb.co | E3 1650v3 | 2 x 10G (82599), 2 x 100G (ConnectX-4) |
lugano-4 | General use | lugano-4.snabb.co | E3 1650v3 | 2 x 10G (82599), 2 x 100G (ConnectX-4) |
davos | Continuous Integration tests & driver development | lab1.snabb.co port 2000 | 2x E5 2603 | Diverse 10G/40G: Intel, SolarFlare, Mellanox, Chelsio, Broadcom. Installed upon request. |
grindelwald | Snabb NFV testing | lab1.snabb.co port 2010 | 2x E5 2697v2 | 12 x 10G (Intel 82599) |
interlaken | Haswell/AVX2 testing | lab1.snabb.co port 2030 | 2x E5 2620v3 | 12 x 10G (Intel 82599) |
You are welcome to play, test, and develop on the lugano-1 .. lugano-4 servers. Once your account is added you can connect like this:
$ ssh user@lugano-1.snabb.co
and check the PCI devices and their addresses with lspci.
Certain cards (82599 and ConnectX-4) are cabled to themselves. That is, dual-port cards have their ports connected to each other. Certain other cards (X710/XL710) are currently not cabled. If you have special cabling needs then please open an issue on the snabblab-nixos repository.
All servers run the latest stable version of the NixOS Linux distribution.
To quickly install a package:
$ nox <search string>
For other operations such as uninstalling a package, refer to man nix-env.
If you have any questions or trouble, ask on the #lab channel or open an issue.
We are grateful to Silicom for their sponsorship in the form of discounted network cards for chur and to Netgate for giving us jura. Thanks gang!