@(#)$Id: README,v 2.0 2005/05/02 22:25:44 huebsch Exp $

Copyright (c) 2004 Regents of the University of California.
All rights reserved.
This file is distributed under the terms in the attached BERKELEY-LICENSE
file. If you do not find these files, copies can be found by writing to:
Computer Science Division, Database Group, University of California,
617 Soda Hall #1776, Berkeley, CA 94720-1776. Attention: Berkeley License

Copyright (c) 2004 Intel Corporation. All rights reserved.
This file is distributed under the terms in the attached INTEL-LICENSE file.
If you do not find these files, copies can be found by writing to:
Intel Research Berkeley, 2150 Shattuck Avenue, Suite 1300,
Berkeley, CA, 94704. Attention: Intel License Inquiry.

PlanetLab Application Control
Written by: Ryan Huebsch

===============
Client Scripts
===============

The client scripts must be customized for your applications. All the client
scripts run either on a PlanetLab node or on any node that can communicate
with the server and the PlanetLab nodes (i.e. it is on the Internet).

The provided scripts can be used for most applications, although some
customization is still REQUIRED! You are also free to write/change these
scripts to meet your exact needs.

There are two main tasks that must be accomplished:
  1) Installing software on PL nodes
  2) Periodic status reports from PL nodes

*****************************
SECTION 1: Installing on PL
*****************************

The server (db server) maintains the list of nodes that need to be
installed. A program should be run periodically that contacts the db server
to see if there are any nodes which should be installed, performs the
necessary installation, then reports the installation status to the server.
The database can support an unlimited number of simultaneous installations
(i.e. multiple instances of the install program running).

The server uses a simple protocol to communicate with the scripts.
Communication occurs through an HTTP interface on the server.
There are two types of transactions:

A) Get the IP of a node needing installation

  - Client -> Server: (HTTP GET OR PUT)
      Get http://server/installStatus.php?appid={APPID}&auth={CODE}&
          getip=TRUE
      [ NO LINE BREAK IN ABOVE URL ]

      * {APPID} - The application ID assigned by the database server
      * {CODE}  - A user-selected PIN-like number. Used for security

  - Server -> Client: (HTTP RESPONSE)
      {IP}#{REMAINING}#\n{COMMENTS}

      The server responds with a TEXT document containing 3 fields using
      "#" as the delimiter

      * {IP}        - The IP of a node needing to be installed. This field
                      is set to zero if there are no nodes waiting to be
                      installed
      * {REMAINING} - The total number of nodes (including the one in this
                      message) requiring installation. This can be useful
                      for deciding the degree of parallelism you want in
                      your installer
      * {COMMENTS}  - Human-readable text. Contains the error message if
                      appropriate

B) Report the status of an installation

  - Client -> Server: (HTTP GET OR PUT)
      Get http://server/installStatus.php?appid={APPID}&auth={CODE}&
          nodeip={IP}&outcome={OUTCOME}&text={TEXT}
      [ NO LINE BREAK IN ABOVE URL ]

      * {APPID}   - The application ID assigned by the database server
      * {CODE}    - A user-selected PIN-like number. Used for security
      * {IP}      - The IP address that the install was attempted on
      * {OUTCOME} - 'success' or 'failure'
      * {TEXT}    - Arbitrary text useful to a human. Preferably 'SUCCESS'
                    if the install succeeded and the error message
                    otherwise

  - Server -> Client: (HTTP RESPONSE)
      0#0#\nFINISHED

      The literal string above corresponds to transaction A's response
      when there are no more IPs to be installed

It is possible to combine both transactions into a single GET/RESPONSE
exchange.
  - Client -> Server: (HTTP GET OR PUT)
      Get http://server/installStatus.php?appid={APPID}&auth={CODE}&
          nodeip={IP}&outcome={OUTCOME}&text={TEXT}&getip=TRUE
      [ NO LINE BREAK IN ABOVE URL ]

  - Server -> Client: (HTTP RESPONSE)
      {IP}#{REMAINING}#\n{COMMENTS}

The provided script statusInstall.sh does this task. Once executed, it will
continue to fetch IPs needing installation, call a user-defined script to
perform the installation, report the outcome, and repeat the process until
the server reports no more IPs requiring installation. You can execute this
script with a CRON job on a non-PlanetLab machine.

In general, installation is performed periodically even if there are no
software updates. I found this useful given the instability of PlanetLab:
slices are occasionally 'dropped' and appear to be unresponsive to the
server. A periodic installation solves this problem and is a simple
solution the server can perform automatically. I use a simple rsync for
installation, so there is only a small bandwidth cost to this method.

The package comes with a basic installation script, appInstall.sh. This
script does two major tasks. First, it will rsync a local directory onto
the PlanetLab machine, and second, it will execute a script on the
PlanetLab machine to set up CRON.

A) RSYNC

The directory you rsync should contain everything. By everything I mean a
copy of the client scripts, your application, and any auxiliary programs
your application needs, such as a JVM. A potential directory tree may look
like:

  /pl-image      - your local directory
    /scripts     - a copy of the client scripts
    /program     - your program
    /java        - a copy of the jvm

B) CRON

Cron is set up to execute the status script periodically. I run it every
fifteen minutes, but every 30-60 minutes should also be fine. Note that the
script will never end unless it crashes. Attempting to start the status
script while another instance is running will cause the new one to
immediately exit.
Only if the existing instance is stuck will it be killed and the new
instance take over. So the cron job specifies how often the script itself
should be checked to see if it crashed. How often the database is contacted
is decided at runtime.

The setup of cron is slightly complicated, so there is a separate script,
cronInstall.sh, that does it. This script *must* be somewhere in the
directory that was rsync'ed, probably the scripts directory.

The cronInstall.sh script does 3 things:
  1) Start crond if it is not running
  2) Add crond to rc.vinit if it is not there
  3) Add a cronjob to run the status script (described later)

To control the operation of the packaged scripts, edit settings.sh. Just
about everything is configurable by changing variables in that script. In
particular:
  - specify the install script (appInstall.sh), cron script (cronInstall.sh)
  - specify the local directory for rsyncing
  - specify the slice name
  - specify the appid and authcode for the db server
  - specify the cron job that should be installed; this is where you can
    modify how often it is run
  - the URL for the db server

*****************************
SECTION 2: Periodic status reports
*****************************

The database server is able to keep track of each node through periodic
status reports. Status reports are pushed to the server, i.e. the client
initiates the transaction. This transaction also allows the database server
to give new instructions to the node, such as telling the node when to
contact the server again, starting/stopping the application, and feeding it
arguments.

  - Client -> Server: (HTTP GET OR PUT)
      Get http://server/reportStatus.php?appid={APPID}&auth={CODE}&
          ip={IP}&status={STATUSCODE}&build={BUILD}&v=2
      [ NO LINE BREAK IN ABOVE URL ]

      * {APPID}      - The application ID assigned by the database server
      * {CODE}       - A user-selected PIN-like number.
                       Used for security
      * {IP}         - The IP address of the node that is reporting
      * {STATUSCODE} - An integer representing the node's current state:
                         1 = ERROR
                         2 = OFFLINE, app not running
                         3 = ONLINE, app is running
      * {BUILD}      - An integer representing the version of software on
                       the node.
      * {NODESTATUS} - An arbitrary text string reported to the server

  - Server -> Client: (HTTP RESPONSE)
      {NEWSTATUS}#{DELAY}#{PARAMS}#{CIDELAY}#\n{COMMENTS}

      * {NEWSTATUS} - An integer representing what the server wants the
                      state to be. It can be one of the above, or
                        4 = RELOAD, stop and start the app
                        5 = HOLD, if the app is running do not stop it, if
                            it is stopped do not start it.
      * {DELAY}     - The number of seconds to wait before changing the
                      status, if a change is needed
      * {PARAMS}    - Arguments that should be passed to the application
                      start script.
      * {CIDELAY}   - The number of seconds until the node should call
                      home (contact the server) again.
      * {COMMENTS}  - Human-readable error messages

The statusCheck.sh script performs the above transaction. You may list two
URLs for the status reports. The reason for this is that our db server is
not connected directly to Internet 2, so the PlanetLab nodes that are on
Internet 2 only will use the backup URL, which points to a machine on
Internet 2 that acts as a proxy.

The script does its work with the help of four helper scripts.

  - appStart.sh - starts the application up in the background. This script
    *should* exit quickly. The provided script will execute yet another
    script which actually does the starting. This script records the PID
    for later use.
  - appStop.sh - stops the application if it is currently running. The
    provided script will call kill with the PID recorded by the start
    script.
  - appBuild.sh - exits with an exit code equal to the current build
    installed on this node. The provided script will read a file called
    build, which contains the build number.
  - appCheck.sh - checks whether the application is running and exits with
    a return code of 2 (offline) or 3 (online). The provided script will
    call ps and determine if the PID recorded by the start script is
    listed.

*** The PID start/check/stop scripts do not work reliably. If you want to
fix them that would be great, otherwise use your own scripts. I use the
provided appStart script, which executes a small script file that simply
starts my JAVA application. I use a small script file since it adds its own
arguments. For the status check, I check if 'java' is listed in ps. For
stopping, I do a killall java. This works for me since I know the only java
application running in my slice is my application. You may need to come up
with other means. To make these custom check/stop commands easy, you can
specify them in the settings file, and those will take precedence over the
PID-based method.

The script runs a main loop forever. On each cycle of the main loop (by
default every 60 seconds) the status and build are checked. If either has
changed, or the number of seconds since it last contacted the database
server exceeds the delay previously retrieved, the server is contacted.

To control the operation of the packaged scripts, edit settings.sh. Just
about everything is configurable by changing variables in that script. In
particular:
  - The location of the start, stop, build, and check scripts
  - The location of the file containing the build number
  - Custom start/stop/check commands
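
If you write your own status script instead of using statusCheck.sh, the
main thing it has to get right is splitting the Section 2 server response
into its fields. The following is only a rough sketch of how that could be
done in plain sh; it is not taken from statusCheck.sh. The function names
(is_uint, parse_status_response) and the psr_* variables are made up for
this example, and it assumes that the "\n" in the documented format is a
real newline, so the first line carries the '#'-delimited fields and
everything after it is {COMMENTS}.

```shell
#!/bin/sh
# Sketch only: split a reportStatus.php response into shell variables.
# Assumption: the "\n" in the documented format is a real newline.

# is_uint VALUE: succeed only if VALUE is a non-empty string of digits.
is_uint() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# parse_status_response TEXT: sets NEWSTATUS, DELAY, PARAMS, CIDELAY and
# COMMENTS. Returns non-zero if the first line is malformed.
parse_status_response() {
    psr_nl='
'
    # First line holds the fields; the remainder (if any) is the comment.
    psr_line=${1%%"$psr_nl"*}
    case $1 in
        *"$psr_nl"*) COMMENTS=${1#*"$psr_nl"} ;;
        *)           COMMENTS= ;;
    esac

    # Require all four '#' terminators before splitting.
    case $psr_line in
        *'#'*'#'*'#'*'#'*) ;;
        *) return 1 ;;
    esac

    # Peel off one field at a time; {PARAMS} may be empty or contain spaces.
    NEWSTATUS=${psr_line%%#*}; psr_line=${psr_line#*#}
    DELAY=${psr_line%%#*};     psr_line=${psr_line#*#}
    PARAMS=${psr_line%%#*};    psr_line=${psr_line#*#}
    CIDELAY=${psr_line%%#*}

    # The three numeric fields must be plain non-negative integers.
    is_uint "$NEWSTATUS" && is_uint "$DELAY" && is_uint "$CIDELAY"
}

# Example with a made-up response (the values are illustrative only).
if parse_status_response "3#60#-p 8080#900#
OK"; then
    echo "new status=$NEWSTATUS delay=$DELAY params='$PARAMS' check-in=$CIDELAY"
fi
```

Rejecting a malformed first line, rather than guessing at the fields, means
a proxy error page or an empty reply is not mistaken for an instruction to
stop or reload the application. The same peel-off approach should also work
for the three-field {IP}#{REMAINING}# response in Section 1.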