Programming an Aggregation Network in SOS and Emstar

Rahul Balani, James Carwana and Laura Balzano

EE 206 A, Spring 2005

Here is a zipped file with the SOS code, the Emstar code, and the project report.



From the HW: “The high-level specification of the application is as follows: you have a network of sensor nodes with addresses 1 through N, laid out in a grid with internode spacing d. You do not know N, d, or the radio range R, which may be different from node to node and also change over time. Also, the node address to (row, column) mapping is arbitrary, but a node may know its (row, column) location besides its address. In other words, your application code cannot be configured for specific values of N, d, and R, and does not have knowledge of global topology.”


The nodes periodically measure temperature, and the user node wants to know the maximum temperature across the field, where that maximum is located (at which node), and the average temperature across the field. The nodes are synchronized, and there is no multi-hop routing available.


The Algorithm


Three main issues were kept in mind while devising the algorithm for this problem. First, we have a user node that wishes to know the maximum temperature value in the network, the location of this max, and the average temperature over the network. Averaging over all the temperature values requires us to keep careful track of which nodes have contributed to a result during any given measurement period. Second, lots of data will be generated and propagated at regular intervals of time, resulting in heavy network traffic. Finally, the solution should be resilient in the presence of lossy and broken links.


The solution we chose is a sink-tree algorithm that tries to maintain the tree structure using minimal network messages. It first sets up a tree, then it measures and sends temperature data every time period T, and finally it repairs the tree when necessary. Thus, the algorithm can be logically divided into three phases. First, the "discovery" phase is the initial phase, in which the nodes are discovered and a tree is formed. Second, in the "data" phase, nodes take measurements and send them to the root node while performing in-network aggregation to minimize network traffic. Third, the "help" phase kicks in when a node discovers that the link to its parent is broken and tries to find a new parent. The algorithm makes extensive use of two-way handshakes and timeouts for reliable data transfer.


Some of the major assumptions for this algorithm are that all the nodes are time-synced and that link breakages in a neighborhood occur one at a time; that is, if a link breaks, there must be enough time to reestablish that link before another link in its neighborhood breaks. We define a neighborhood here as a circle of radius R, where R is the radio range of the node whose link to its parent is broken.


The following sections describe each phase and provide a state diagram. Then we discuss the SOS implementation, the Emstar implementation, and the analysis of our algorithm.

Discovery Phase


The root node begins the discovery phase by broadcasting a "discovery" packet. All the nodes that receive this packet respond, requesting to be a child of the root node. The root node adds each responder to its child list and then sends a confirmation back to that child. The child registers its parent upon receipt of this packet, and then broadcasts a discovery packet of its own to find its own children. As discovery packets move through the network, nodes may receive multiple discovery packets. To minimize complexity, a node only responds to the first discovery packet it receives. The downside is that the first link, and not the best link, is chosen as the parent link. Of course, it is difficult to even define “the best link.” So, though the first discovery packet may represent a poor link, our algorithm provides resilience to bad and broken links in the “help” phase.



A node only stores the addresses of its children and its parent. The discovery packets, besides containing the source address, also contain the sending node’s distance to the root node. This is incremented and passed on as the tree is formed. This distance is used in the "help" phase as explained later in its section.


A timer is started as soon as a node sends out its discovery packet. If it doesn't get any response from potential children before the timer fires, the node declares itself a leaf of the tree. A leaf node then sends a packet up to its parent with its distance to the furthest leaf, which for a leaf node is of course 0. A parent receiving this packet updates its own distance to the furthest leaf and then sends the updated packet to its own parent, and so on, until the packet reaches the root. Whenever a node receives a packet with a higher distance than its own current distance to the furthest leaf, it updates that distance and forwards the new packet. By the end, each node knows the maximum distance from itself to a leaf in its sub-tree. This value is used to set an aggregation timer for each node, as explained in the "data" phase section.
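The depth-update rule just described can be sketched in C (a simplified sketch; the struct and function names are ours, not the actual SOS module's):

```c
#include <stdint.h>

/* Sketch of a depth-update packet: a child reports its distance to
 * its furthest leaf, and parents propagate increases up the tree. */
typedef struct {
    uint8_t src;           /* address of the sending child               */
    uint8_t dist_to_leaf;  /* sender's distance to its furthest leaf     */
} depth_pkt_t;

uint8_t my_dist_to_leaf = 0;   /* leaves start (and stay) at 0 */

/* Returns 1 if the updated distance should be forwarded to our parent. */
int handle_depth_pkt(const depth_pkt_t *pkt) {
    if (pkt->dist_to_leaf + 1 > my_dist_to_leaf) {
        my_dist_to_leaf = pkt->dist_to_leaf + 1;  /* one hop past the child */
        return 1;   /* new maximum: propagate up the tree */
    }
    return 0;       /* no change; nothing to forward */
}
```

Only increases are forwarded, so a node sends at most one update per distinct maximum it learns.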


A fixed, pre-decided amount of time is allocated for this phase to finish. After this, the periodic timer is set for the main "data" phase described in the section below. Nodes left undiscovered in this phase are picked up automatically in the help phase.


Data Phase



Field                               Size
Maximum Temperature                 2 bytes
ID of node with Max                 1 byte
Total temperature                   2 bytes
Number of Nodes incorporated here   2 bytes
Packet version number               2 bytes

A node takes a measurement every time period T after the end of the discovery phase. Leaf nodes immediately send their data to their parents. The basic packet structure is shown above. In the case of a leaf node, the max and total temperatures will be its own current measurement, the ID of the max node will be its own ID, and the number of nodes incorporated into the packet will be one (1). If a node is not a leaf node, at the point of measurement it starts the child_wait_timer, whose period is set according to the distance to the furthest leaf in that node’s subtree. The period of this timer also depends on the maximum number of retries that a node is allowed while trying to send a packet to its parent and the amount of time between two consecutive retries.


A non-leaf node aggregates data as soon as it receives it. If it receives all the data from its children before its child_wait_timer fires, then it sends the aggregated data to its parent. But, if the timer fires before it is able to receive all the data, then it removes its reference to the child from which it didn't receive any data, signifying that the link is broken. This immediate removal was a choice we made based on the complexity of finding out whether or not a particular child is still trying to reconnect this broken link. In the future, a set number of measurement periods could be defined such that the child will enter the help phase only after this number of periods, and equivalently a parent will only remove the child after that number of periods.
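The aggregation step can be sketched in C (a simplified sketch; the struct and function names are ours, and the version field corresponds to the packet version number the root later reports with each round of readings):

```c
#include <stdint.h>

/* Sketch of the 9-byte aggregation packet: 2+1+2+2+2 bytes of payload. */
typedef struct {
    uint16_t max_temp;    /* maximum temperature in the subtree    */
    uint8_t  max_id;      /* ID of the node that reported the max  */
    uint16_t total_temp;  /* sum of all readings aggregated so far */
    uint16_t num_nodes;   /* how many nodes contributed            */
    uint16_t version;     /* measurement-round version number      */
} agg_pkt_t;

/* A leaf's packet carries only its own reading. */
void leaf_init(agg_pkt_t *p, uint8_t my_id, uint16_t my_temp, uint16_t version) {
    p->max_temp   = my_temp;
    p->max_id     = my_id;
    p->total_temp = my_temp;
    p->num_nodes  = 1;
    p->version    = version;
}

/* Fold one child's packet into the running aggregate. */
void aggregate(agg_pkt_t *acc, const agg_pkt_t *child) {
    if (child->max_temp > acc->max_temp) {
        acc->max_temp = child->max_temp;
        acc->max_id   = child->max_id;
    }
    acc->total_temp += child->total_temp;
    acc->num_nodes  += child->num_nodes;
}
```

At the root, the average temperature is simply total_temp divided by num_nodes.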


At this point, the node then sends the stored aggregated data to its parent. The parent in turn confirms the receipt of the data by sending a measurement ack packet back to the child. The root node follows the same procedure, except that it does not forward the final aggregation. At this point, the root node divides the total temperature in the packet by the number of nodes incorporated to find the average temperature over the network. The root also has an idea of the confidence of the data in the packet depending on the number of nodes whose measurements have been aggregated there.


Help Phase


When a node is not able to send a packet to its parent even after a maximum number of retries, the link is declared broken and the node broadcasts a help packet containing its distance to the root node. All the nodes that hear the broadcast respond with a help_ack packet if their own distance to the root is less than or equal to the received distance to root. This prevents nodes in one’s own subtree from responding to a node's request for help, and thus avoids loops in the tree. A negative consequence is that we prevent other potential "helpers" in a different sub-tree, which have a greater distance to root, from responding to the node's request. This is a trade-off we are willing to make.


The node requesting help responds to the first help_ack packet it receives with a packet containing its latest data. It also includes its own distance from the furthest leaf, as the new parent-child link may increase the depth of the new parent. The new link is finally confirmed by the parent with one more packet to its new child. A new packet containing the distance to the furthest leaf is again propagated up the tree if this new link increased the parent's distance. Similarly, the distance to the root of all the nodes in the sub-tree rooted at this new connection must be updated. This is accomplished by sending the new distance-to-root value with measurement ack packets during the data phase. Thus, it takes on average (for a binary tree) O(log N) iterations for the sub-tree to settle down before any node in it can respond correctly to another help request. Uncertain behavior may result if this condition is violated.


SOS Implementation


We implemented this algorithm on the SOS mote operating system and simulated a network using both the source code simulator and Avrora.

Coding in SOS


As one network will in general be quite different from another, it is a great asset for a network simulator to easily allow the placement of nodes at given points in space. It is also important to be able to easily specify the radio range of each node, as some designers prefer a high-range, high-bit-rate radio whereas others prefer a low-range, low-bit-rate network. These variables are expressed in different ways in SOS and in Avrora, so the next two sections highlight each architecture respectively.


For SOS a network topology can be specified by changing the $sos_base/tools/topo.def file.  A sample file is shown below:



4
1     0.0     0.0    30.0
2     5.0     5.0    30.0
3     2.5     0.0    30.0
4    15.0    15.0    30.0


The first number, ‘4’ in the above example, is the number of nodes specified in the topology file. Each line that follows contains the node number, x-coordinate, y-coordinate, and radio range.


For Avrora, a network topology is passed to the simulator as a command line parameter, namely -topology=_path_to_topology_file. The format is similar to SOS's, and a simple example is shown below:


node0   0  0   0

node1   9  3   3

node2   12  2   4

node3   2  4   7

In this case the first parameter is the node name, and the following parameters are the x, y, and z coordinates of the node, respectively. The radio range, however, must be specified in the source code; this aspect is discussed later in this document.


A C program called produce_topodef.c was written to automate the process of creating a topology file. This file is included in the source code zip and is set to create a network of N nodes with a spacing d between each node. The values N and d are specified using #define statements at the beginning of the program. Therefore, if a user wishes to create a large network where the nodes form a grid, this program makes specifying the nodes painless compared to manually entering the location of each one.
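The core of such a generator is a short loop over the grid. The sketch below illustrates the idea; the actual produce_topodef.c in the zip may differ in details, and the N, D, and RANGE values here are only examples:

```c
#include <stdio.h>

#define N     16    /* number of nodes (assumed a perfect square here) */
#define D     5.0   /* internode spacing in meters                     */
#define RANGE 30.0  /* radio range in meters                           */

/* Smallest grid side length that fits n nodes. */
int grid_side(int n) {
    int s = 0;
    while (s * s < n) s++;
    return s;
}

/* Write a topo.def-style file: node count first, then one line per
 * node with its id, x-coordinate, y-coordinate, and radio range. */
void write_topodef(FILE *out) {
    int side = grid_side(N);
    fprintf(out, "%d\n", N);
    for (int i = 0; i < N; i++) {
        int row = i / side, col = i % side;
        fprintf(out, "%d\t%.1f\t%.1f\t%.1f\n",
                i + 1, col * D, row * D, RANGE);
    }
}
```

Changing the #define values regenerates the whole grid, so scaling from 10 to 1000 nodes is a one-line edit.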



In SOS the sample period T is a function of the maximum number of nodes possible in the network. Additionally, the amount of time allocated for the discovery phase is based on the maximum number of nodes. Although the equations that determine T and the discovery-phase time assume the worst possible scenario (each node sending the maximum number of retries), they allow the time to scale with the network size, so that moving from a 10-node to a 1000-node network involves no timer recalculation.


In the discovery phase a node responds to the discovery packet by sending a DISC_ACK packet. The sender of the DISCOVERY packet then adds this new node to its child list and sends back a DISC_ACK_ACK packet as confirmation. At this point the new node starts a timer called the leaf_wait_timer and begins a discovery sequence itself. If any node responds to its DISCOVERY packet, the above sequence is repeated. However, if no node responds, the leaf_wait_timer expires and the node knows that it is a leaf node. It then begins the ‘depth’ process, through which each node learns its maximum distance to a leaf. This is accomplished by having each leaf send a packet to its parent indicating its depth, where a leaf node has a depth of zero. This continues until the root node determines its maximum depth. This depth value then feeds a timer called the ‘child_wait_timer’, which determines how long each node should wait for data from its children. Therefore a node close to the leaves of the tree will not wait long for data from its children, whereas nodes closer to the root will wait for a longer period of time. If the child_wait_timer expires and a parent has not received all the data from its children, it forwards the data it currently has to its parent and removes the nodes that didn’t send data from its child list.


In the case of a broken link, each node tries to send data to its parent a maximum of MAX_RETRIES times. Once this limit is passed, the node broadcasts a HELP packet, attempting to join another branch of the tree. The node that answers this HELP request sends a HELP_ACK packet. The joining node then sends a HELP_ACK_MEAS_DEPTH packet, which includes its last measurement (in case the new parent has not yet sent the current aggregated version to its own parent) and its depth. If the new parent now has a path to a leaf longer than its current longest (its depth has increased), it updates its depth. A node’s current depth is always part of measurement packets, so the new depth will then also flow all the way to the root.
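The packet types named above can be collected in a single enum (a sketch; the exact constants and values in the source code may differ):

```c
/* Packet types used by the protocol (names follow the report; the
 * numeric values here are illustrative, not the module's actual ones). */
typedef enum {
    MSG_DISCOVERY,            /* parent searching for children            */
    MSG_DISC_ACK,             /* "I want to be your child"                */
    MSG_DISC_ACK_ACK,         /* parent confirms the new child            */
    MSG_DEPTH,                /* distance-to-furthest-leaf update         */
    MSG_MEASUREMENT,          /* aggregated temperature data              */
    MSG_MEASUREMENT_ACK,      /* parent confirms receipt (also carries
                                 the new distance-to-root after a repair) */
    MSG_HELP,                 /* broadcast: my parent link is broken      */
    MSG_HELP_ACK,             /* potential new parent answers             */
    MSG_HELP_ACK_MEAS_DEPTH   /* child joins: latest data plus its depth  */
} msg_type_t;
```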

Data in our case is a temperature reading, determined by indexing into an array based on the node’s id. Therefore, if a node has an id of 4, it reads element number four from this array. Obviously, if there is a large number of nodes in the network, it would be impossible to store an array of thousands of integers without overwhelming the node’s memory. To counter this, the actual index into the array is the node id mod the number of elements in the temperature array.
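A sketch of this mod-indexed lookup (the table contents here are made up for illustration):

```c
#include <stdint.h>

#define TEMP_TABLE_SIZE 8

/* Made-up temperature values; the real module's table differs. */
static const uint16_t temp_table[TEMP_TABLE_SIZE] = {
    68, 70, 72, 69, 75, 71, 73, 74
};

/* A node's "reading" is indexed by its id mod the table size, so even
 * a network of thousands of nodes needs only this small array. */
uint16_t read_temperature(uint16_t node_id) {
    return temp_table[node_id % TEMP_TABLE_SIZE];
}
```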



The Avrora simulator utilizes a free space radio model to emulate radio propagation. This model includes the ability to model not only packet collisions but also a fading transmission channel. The simulator detects a collision by monitoring the effective transmission range of the nodes and determining whether two nodes with overlapping transmission ranges are attempting to transmit simultaneously. The model of a fading transmission channel is very interesting and gives a sensor network programmer the ability to test under conditions much closer to the real world than traditional network emulators allow. The basic equation that the Avrora simulator uses for a fading channel is:


            rxPower = txPower * lightConst * (1 / (freq^2 * distance^2))


Basically, if the received power is above 9 uW, then the packet is admitted by the receiver. Otherwise, the packet is considered lost. The other equation parameters, including transmit power and frequency, can be selected in the source code. The default frequency is 433 MHz with a transmission power of 10 dBm, which allows the largest transmission radius. If a user decides to transmit at a higher frequency, say 868 MHz, the transmission power falls to 5 dBm. We tested the range under Avrora simulations, which produced a range of 18 meters, meaning a node located 18 meters away from a transmitter would be able to receive a packet while a node 19 meters away would be out of range.
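The admission rule can be sketched as follows (a sketch only; lightConst and the unit conventions here are placeholders for Avrora's internal constants):

```c
/* Packets above 9 uW of received power are admitted. */
#define RX_THRESHOLD_W 9e-6

/* Free-space model: rxPower = txPower * lightConst / (freq^2 * dist^2).
 * light_const is a placeholder for Avrora's constant; all units are
 * assumed SI (watts, hertz, meters) for this sketch. */
double rx_power(double tx_power_w, double light_const,
                double freq_hz, double dist_m) {
    return tx_power_w * light_const / (freq_hz * freq_hz * dist_m * dist_m);
}

int packet_received(double rx_w) {
    return rx_w > RX_THRESHOLD_W;
}
```

Since received power falls off as the square of both frequency and distance, doubling either cuts the received power by a factor of four, which is why the higher 868 MHz band yields a shorter range.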




As Avrora’s network simulator is much more complex than the SOS source simulator, we first made sure our program works under the SOS simulator for link probabilities in the range of 80 – 100%. Afterwards, we began testing in Avrora, which is difficult to debug in because of its lack of available output. Unfortunately this tool must be used, as the free space radio model is not available in SOS. We decided to use the hardware UART in order to give our simulations a small amount of readable output. Only one node is able to communicate with the UART, so we selected the root node to utilize this output channel. We then use ‘sossrv’, a tool provided by the SOS architecture that communicates with Avrora over a TCP channel. In our simulations we use the default sossrv output of a set of integers; however, this can be changed by simply modifying the sossrv.c file.


After every period T, sossrv will output five pieces of data: (1) The packet version number, (2) The maximum temperature of the system, (3) The node number where the maximum temperature occurred, (4) The temperature totals (the sum of all the temperatures at each node), and (5) The number of nodes contributing to the temperature totals.  This output gives the user a narrow view into the network while at the same time displaying the final results of maximum temperature and average temperature (calculated using temperature totals divided by the number of nodes contributing to the temperature totals). 




We tested our algorithm in Avrora using ten nodes which were randomly spaced apart.  The result was a perfectly working system with a tree of depth two.  It is impossible to know the actual network link topology as only the root node can output data over the UART, however the root node did show the correct data including average temperature, max temperature, and max temperature node number. 


In order to fully test the capability of our algorithm we decided to create a topology where we can show that our tree can be dynamically reformed.  There are four nodes in this tree as shown in the diagram below.



Node 2 and Node 3 are within transmission range of Node 1, and Node 4 is in the transmission range of Nodes 2 and 3. Therefore there are two possible tree topologies. In both cases Node 2 and Node 3 are children of Node 1; however, it is impossible to know whether Node 4 will be a child of Node 2 or Node 3 (shown with the dotted lines). Therefore, for part of the simulation we kill Node 2, and for another part we kill Node 3. In particular, Node 2 is killed for the third and fourth iterations and Node 3 is killed for the seventh and eighth iterations. The goal of this simulation is to show that even if the parent of Node 4 is killed, Node 4 should be able to request help, obtain another parent, and continue to send its data towards the root.


For the sake of discussion, assume that Node 2 is the parent of Node 4. After Node 2 is killed in the simulation, Node 4, unaware, tries three times to send its data to Node 2. After this point it sends a HELP packet, which is answered by Node 3. The new tree is formed with Node 1 as the parent of Node 3 and Node 3 as the parent of Node 4.


At this point our algorithm is not capable of adding new nodes, as we use version numbers to keep track of the current iteration. This is useful if for some reason a node receives data with an older version number: it discards the packet to preserve the integrity of the current readings. The ability to add nodes to the network would only require a small change in the HELP packet transmissions. Imagine two nodes, Node A and Node B, where Node B wishes to join the network at Node A. Node B broadcasts a HELP packet that is received by Node A. Node A sends a HELP_ACK packet that includes the current version number. At this point Node B can update its version number and has successfully joined the network.


Emstar Implementation


Emstar is a Linux-based sensor network simulator. The algorithm is programmed as a single process that gets spawned on each node, and the nodes use link devices to communicate with one another. The process handles two kinds of events: timer events and packet events. Upon the firing of a timer or the receipt of a packet, different specified functions get called.


We chose the Emstar implementation to showcase the algorithm’s scalability to N and how that affects the measurement period T. Here we didn’t implement “help” in Emstar. Unfortunately, problems with collisions and the mote MAC layer in Emstar kept us from being able to showcase all that we wanted. In order to show scalability here, either a fast 802.11 radio would need to be used or the timing of the measurement periods would have to be on the order of ten minutes.

Coding in Emstar


Emstar is very well programmed but amazingly poorly documented. We had trouble finding even a single file with a line of comments inside! The emstar_users listserv was very useful, though, as people were very responsive.


A layered protocol is very easy to express in Emstar. Each layer would be a different process, and they communicate on the link or packet devices. Emstar provides a mote level MAC, link and physical layer, and it simulates these for you as well in Emsim. These are easily interchangeable, as long as your application does not directly depend on the time it takes to send packets, which of course is not the case in this homework.


The radio range is not something we are able to set directly, but there are radio models and radio power levels for the different radios that can be set. The topology can easily be defined as random, but to make a grid topology of an arbitrary number of nodes we would have to write code to calculate node positions and print them to a file, which will then be the input to Emstar. Emview is a really nice tool for visualization, and it was easy to incorporate. The other good thing about Emstar is the automatic maintenance of state.


We believe that if we were more familiar with Linux, /dev, and the structure of Emstar, debugging this code would have been easier. Given our short timeline, we had to resort to logging debug statements to the screen, which of course makes tricky bugs more difficult to find.



In the Emstar code, the length of the discovery phase and T are set at compile time. These must be large enough for the number of nodes N. With collisions disabled, for example, a 20-second setup and a 5-second measurement interval are both more than enough for 50 nodes in the network. On the other hand, a two-minute discovery phase and 30-second intervals are not even large enough for 8 nodes once collisions are introduced.


The temperatures are read from a single file shared by all nodes. The nodes each index into a different place in the file.


The latency always hits the upper bound because nodes wait for the timer to expire instead of sending their measurements as soon as all the children have reported. It certainly doesn’t have to be this way; we just ran out of time. The measurement periods can overlap in Emstar, because state is well maintained by the Emstar simulator.



The tests available for simulation of this algorithm fall into two categories, collision-free and collisions. There is a sim file with a grid of 16 nodes and no collisions. Below you can see the two phases of that simulation, where first the leaf nodes are sending their measurements to their parents, and then these parents are sending the aggregated measurements to their own parents. The heavy lines represent packets being transmitted, and the thin lines represent a parent-child connection. Any random topology can be run with the collision-free code, and the sim file for a random topology is available with the collision sim files.


The tests available for collisions include the random topology which takes two arguments, first the total number of nodes and second the dimensions of the space. There are two other topologies, one with 16 nodes on a 48x48 metre space and the other with 16 nodes on a 76x76 metre space. The second simulation works better than the first.



Above you see the first round of measurements. Note that all these leaves have their own measurement only, and their count is 1. Below is the second round, when the middle levels of nodes then forward the measurements to the root. At this point the root node sees 16 nodes reporting, and has the max and average.




In our analysis, we are interested in what values of T we can achieve for a given deployment of N nodes, with distance d between nodes and radio range R.


We would like to make a few comments before we discuss each of the following sections in detail. The latency and T are related: obviously the latency cannot consistently be longer than T, or else the data will cease to be useful. Another observation is that the relationship between the internode distance d and the radio range R is important. The density of the network is inversely related to d but directly related to R, yet the relation is not proportional in either case. When we discuss dense and sparse networks below, we are in a sense abstracting away from d and R.

Traffic Load


In a network of synchronized motes who all want to get their data to one user node, one thing we must be careful about is traffic load. Every node takes a measurement at the same time, and you could imagine the traffic problems we then cause by leaves sending their packets immediately.


There are two issues to deal with here. First, you want to minimize the number of packets that get sent, so that congestion on the links is minimal. We address that in our algorithm by sending O(N) packets per measurement interval: each node aggregates the measurements from its entire subtree into a single message to its parent.


Once there is traffic on the wireless channel, the next issue is dealing with collisions. The MAC algorithm is there to handle this problem, but this isn’t always reliable enough. Especially for motes doing synchronized measurements, some folks in the CS and EE departments have told us that the MAC layer has not been solid enough to handle this. Transport layer protocols can help by sending acknowledgements and resending packets after timeouts, and these protocols must be carefully defined so that many nodes trying to get through don’t simply continue to collide with one another over and over again, especially in the slow radio environment of a network of motes.


See more discussion on traffic load in the scalability section below.



Latency is bounded by the timer at a node that waits for its children’s measurements during each iteration. If this timer is set at T_wait, then the maximum latency is the maximum distance from the root node to a leaf node times this T_wait value. The higher this T_wait value is set, the more time a node’s children have to aggregate and send up their measurements, and the more complete and accurate the aggregation will be (see the accuracy section below). In a good situation, latency will be lower than this bound because nodes will send their packet up as soon as they have received all the aggregated data from their children. This depends on how long it takes for a number of children to send up their aggregated measurements, which in turn depends on the traffic flow and the transport and MAC protocol (as described above in the traffic section).


The maximum distance from the root node to a leaf node is O(log N) for N nodes. For dense networks, parents will have a lot of children and the overall depth of the tree will be small. The tradeoff here is that when parents have lots of children, T_wait needs to be longer to ensure that the node can receive all its children’s packets. This again depends on the MAC algorithm.


To look at an example, slotted ALOHA has an upper bound on throughput of 1/e, or about 37%. This means that the number of packet transmission slots a parent should set in its T_wait should be roughly three times the number of children it has. This doesn’t even take into consideration the fact that our tree is not a single shared medium, so in reality the T_wait will have to be even longer to ensure no collisions between layers in the tree.


The measurement period T should be longer than the average latency. It doesn’t have to be longer than the maximum latency, though. Our algorithm is robust to the case where some measurement packets haven’t yet reached the root, and leaf nodes begin sending a new measurement.



Scalability is an issue that involves both the size and density of the network.


The size of our network is fairly scalable. Maintaining large trees is difficult, but we have kept the maintenance as localized as possible. The only non-local overhead maintenance messages are those that send the distance to the furthest leaf up the tree to the root node. Locally, the overhead is a broadcast message from the node that needs a new parent plus a response to that node by all potential parent nodes (nodes higher in the tree). Earlier we mentioned that a trade-off of only accepting parents at our depth or higher in the tree was that we cannot receive help from potential parents in separate sub-trees. A positive side-effect comes in here, in that we won’t get messages filling the wireless channel from every node that hears a help packet.


As for density scaling, our tree discovery is too simple to deal with the problems introduced. If a network is very dense, parent nodes will have many children. This can cause more traffic problems at the measurement intervals when many nodes are trying to send to one node. A good way around this would be to more carefully form the tree. We could tolerate a longer discovery phase if nodes did some intelligent parent-child selection. These algorithms get complex very quickly, though, so the careful design of a tree-building algorithm would be crucial here.




The maximum temperature, its location, and the average temperature are all as accurate as the communications links are stable. Nothing in our algorithm requires uncertain convergence except in the presence of lossy or broken links. In these situations, the root node in the end knows exactly how many nodes of the network participated in the aggregate. In this case, the accuracy and confidence in the result can be estimated.


For the sake of discussion, first let’s assume that the phenomenon is slowly varying over space. For very small numbers of missing nodes, the user node knows the average is fairly accurate. As the number of missing nodes grows, the confidence intervals become much larger, because we could either be missing several leaf nodes or entire subtrees of the network.


On the other hand, if the phenomenon is not slowly varying over space, we have a more interesting problem. There could be a few nodes in crucial areas where the phenomenon is drastically different than in other areas, and thus when those nodes are missing our accuracy decreases rapidly. This algorithm does not collect enough information as it aggregates its packets to be able to deal with this scenario.

Resource Usage


The algorithm implementation in SOS uses minimal resources on the mica2 motes as seen in the table below.



Code Size (bytes)    Memory Usage – State size (bytes)

Another feature of our algorithm is that the network packet size does not depend on the number of nodes in the network, and it doesn’t grow as it travels from the leaves towards the root node. The size of each data packet containing aggregated data from a node is only 9 bytes. Other packets (discovery phase packets, acknowledgements) contain at most 1 byte of data.

Division of Labor


All three of us, James, Rahul, and Laura, spent time together designing the algorithm, and when algorithm problems came about we shared the responsibility of finding new solutions.


Throughout the entire implementation process, the three of us were in close contact, at all times aware of the progress and difficulties the others were experiencing. James and Rahul coded SOS and Laura helped test it. James ramped up on Avrora, and Laura ramped us up on Emstar and Emview. Laura and Rahul coded Emstar.


Although James was more involved in SOS and Laura more involved in Emstar, we tested the code together, as testing and debugging is the best time for getting a feel for how the code really works.