Sunday, November 2, 2014

ssh to Windows from an iPad

My main computer runs Windows, and sometimes I'd like to connect to it via ssh from an iPad using iSSH. I do this to get an R console on the iPad. The App Store has an R console app, but it's limited to the packages its author included and can't be extended. The best way to get a full R installation on an iPad is to ssh to a computer that has one.

A computer needs a running sshd (ssh daemon) to accept ssh connections. Windows doesn't ship with an sshd, so you need to install one. Many options are available, but most of them are limited or not free. The best free solution that provides full functionality is OpenSSH, and the best way to get OpenSSH on Windows is through Cygwin.

I won't repeat all the steps here. Instead, I'll point to the resources I found most helpful and flag some potential obstacles. NOTE that most instructions carried out in a Cygwin terminal require you to launch the terminal 'as administrator,' which you can do by right-clicking the executable and choosing Run as administrator.

First go get Cygwin:

Before installing Cygwin, read through the instructions for setting up an sshd with OpenSSH. Cygwin is big and you don't need all of it; you can install just the packages you need (here, the openssh package and its dependencies). The best walkthrough I've found for setting up OpenSSH's sshd is here:

If you install Cygwin, get the right packages, and start the above walkthrough, you're likely to hit an error at the step where you invoke ssh-user-config. It will say something to the effect that your home directory doesn't exist. This happens because Cygwin picks the wrong default home directory unless you've set the correct one in a system-wide Windows environment variable named HOME. The thing is, you probably don't want a system-wide variable for your home folder and would rather use a per-user environment variable, but that isn't good enough for Cygwin. The solution is to regenerate a special file that Cygwin uses to find things like your home directory. This file lives at C:\cygwin64\etc\passwd. Cygwin maps Windows paths onto POSIX-style paths, however, so it refers to this file as /etc/passwd. You could edit the file by hand with a text editor, but the simplest solution is to invoke mkpasswd and regenerate /etc/passwd so it points to the correct home directory. Launch a Cygwin terminal (as administrator) and enter the following:

mkpasswd -l -p "$(cygpath -H)" > /etc/passwd

This tells Cygwin to regenerate /etc/passwd. The -l switch tells mkpasswd to print the local user accounts (yours among them). The -p switch tells it to build home directories from the specified path instead of using Cygwin's default method of finding the home. The bit between the quotes is the argument to -p. We quote it because the path may contain spaces, which would otherwise split it into separate arguments. The $( ... ) syntax is command substitution: it runs the enclosed command in a subshell, a shell within the shell, and substitutes that command's output in its place. Here we use it to run cygpath and hand its output to mkpasswd's -p switch. Are we confused yet? And why can't we just type out the home path ourselves? Cygwin is the bastard child of scandalous trysts between Windows and POSIX, and while it runs on Windows it still needs POSIX-style paths. The cygpath utility converts between Windows and POSIX paths. Its -H switch is a convenience that returns the 'home root,' the directory that holds Windows user profiles (typically C:\Users, printed in POSIX form as /cygdrive/c/Users). mkpasswd then appends each user name to that path. Lastly, the > redirects mkpasswd's output into /etc/passwd, replacing the file's previous contents; it's a standard *nix shell thing. Diligent readers may note that the ssh-host-config command already modified /etc/passwd. Don't worry: the accounts ssh-host-config created are local Windows accounts, so mkpasswd -l should regenerate their entries along with yours.

If none of that worked, there are more resources on this problem here:

Now you can get back to the walkthrough, and you should be able to finish everything normally.

The next step is to try connecting. If you were able to connect to localhost as described in the walkthrough above, you're ready to go. Launch iSSH and create a new configuration. You will need your computer's IP address on the local network. To find it, launch the usual Windows command prompt (not Cygwin) and type ipconfig /all. That lists a lot of network information, but your active network adapter should have a single IPv4 Address entry, and that's what you need. It's four integers separated by periods; on home networks the first one is often 192, but not always. In iSSH, enter the IPv4 address in the Host field, 22 in the Port field, and your Windows user name in the Login field. You should also go ahead and enter your Windows login password in the Password field. It should be optional, but when I left it blank the connection would often 'drop unexpectedly' after I entered my password at the prompt. I never had that problem after saving my password in the configuration.

You will probably now encounter another problem: a connection timeout. Windows Firewall likely blocks port 22. To open port 22, follow these instructions:

Now try connecting again through iSSH.  If everything works you should find yourself at a Cygwin prompt.  Just type R and you're cookin' with gas!

If you royally fucked up any of the steps and feel like you need to nuke your sshd setup and start over, here's an entry at Super User that can help you do that:

Configuration finished.  Have fun!


Wednesday, October 15, 2014

R's quirky nested assignment

R has some quirky features that I think exist to serve its functional paradigm.  One of the oddest is what I know only as nested assignment, as demonstrated in this function for the Collatz sequence:

collatz <- function(x) {
  cseq <- c(x)
  ## x<- inside c() updates x (halved if even, else 3x+1) and c() appends the new value
  while (x>1) cseq <- c(cseq, x<-ifelse(x%%2<1, x/2, 3*x+1))
  cseq
}

Here, inside the call to combine (c), we've also assigned x a new value. What makes this odd is that most general-purpose languages outside the Lisp family use the equal sign = as the assignment operator. Inside a function call, though, k=v usually means "set the argument named k to the value v," and that's exactly what the equal sign means in R's function calls too. But because R uses a little ASCII arrow <- for assignment, it is still free to treat an arrow inside an argument list as an ordinary assignment. The x<- above assigns x in the environment where c() is called (the body of collatz), and the assigned value is what gets passed to c(). Because R is a functional language, most function calls return a value that gets assigned to something. In this sense, assignments within function calls are nested. Kinda weird, but it saves lines of code and probably suits a functional style where one might (tastefully) chain together a couple of (anonymous) functions.
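A few lines you can paste into a fresh R session show how the two symbols behave inside a call. The names y, z, w, k, and ignore.arg are made up for illustration. The last part shows a caveat worth knowing: R evaluates arguments lazily, so an assignment tucked into an argument only happens if the called function actually evaluates that argument. c() always evaluates its arguments, so collatz is safe.

```r
## <- inside a call assigns in the calling environment (here the global one)
## and passes the assigned value on to the function
x <- 1
y <- c(x <- 5, 7)
x  # 5
y  # 5 7

## = inside a call only names the argument (for c(), an element name);
## no variable w is created
z <- c(w = 5, 7)
names(z)  # "w" ""

## Arguments are lazy: if the function never evaluates an argument,
## an assignment inside that argument never runs
ignore.arg <- function(a) 42
k <- 0
ignore.arg(k <- 99)
k  # still 0
```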

The next post will look at the functions behind R's operators and how to use them with sapply and friends. As it turns out, the assignment operator is a function! So is the subset operator, and this enables some really elegant data-management solutions using only R's base functions.
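As a small preview, here is a sketch showing that both operators can be called, and passed around, like ordinary functions. Backticks let you refer to an operator by name. The names v and vecs are illustrative.

```r
## `<-` can be called like any other function
`<-`(v, 10)
v  # 10

## `[` is a function too, so sapply can apply it to each vector in a list;
## the extra argument 2 is passed along as the index
vecs <- list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))
sapply(vecs, `[`, 2)  # 2 5 8

## Replacement functions such as `[<-` return a modified copy
`[<-`(c(1, 2, 3), 2, 99)  # 1 99 3
```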

Sunday, October 5, 2014

Sliding windows in R

The sliding window is conceptually simple but sometimes tricky to implement. Base R sadly lacks a simple off-the-shelf sliding window. The zoo package provides rollapply, but it's an overwhelming function for simple tasks. Fortunately, it's easy to implement your own sliding window with base functions. The two we'll need are rle (run-length encoding) and cumsum (cumulative sum).

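rle finds the lengths of runs of consecutive identical values, hence the name run-length encoding. rle doesn't sort anything itself, so for our purposes the vector must already be sorted. Otherwise the same value can show up in several separate runs. For example, let: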

X <- c(5, 5, 5, 6, 6, 7, 9, 9)

Which has an rle of:

(3, 2, 1, 2)

for values:

(5, 6, 7, 9)
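A quick check in R shows what rle actually returns and why sorting matters. It uses the X above plus a small unsorted vector U (an illustrative name).

```r
X <- c(5, 5, 5, 6, 6, 7, 9, 9)
rle(X)$lengths  # 3 2 1 2
rle(X)$values   # 5 6 7 9

## Unsorted input: rle only merges *consecutive* equal values,
## so the two 5s end up in separate runs
U <- c(5, 6, 5)
rle(U)$values         # 5 6 5
rle(sort(U))$values   # 5 6
rle(sort(U))$lengths  # 2 1
```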

Next we obtain the cumulative sum of our run length encoding.  For the rle of X from the previous example:

csum <- c(0, cumsum(c(3, 2, 1, 2))) ## == c(0, 3, 5, 6, 8)

We need to add the zero to the front of this vector. It isn't part of the usual cumsum, but it stands for the zero cases that come before the first run. Now we can index the original vector X using the cumulative sum: create an integer vector from beginning and ending points in the cumulative sum vector. For example, if the cumulative sum vector is csum and the original vector is X, then we index the cases with values 5 through 7 in X with:

X[(csum[1]+1):csum[4]] ## == (5, 5, 5, 6, 6, 7)

and sliding forward we get values 6, 7, and 9 with:

X[(csum[2]+1):csum[5]] ## == (6, 6, 7, 9, 9)
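The leading zero is what lets the very first window use the same start-plus-one formula as every other window. Here is a quick check with the X above; no.zero and first are illustrative names.

```r
X <- c(5, 5, 5, 6, 6, 7, 9, 9)
no.zero <- cumsum(rle(X)$lengths)  # 3 5 6 8
csum <- c(0, no.zero)              # 0 3 5 6 8

## Window over the first three runs: start one past csum[1], i.e. at index 1
first <- (csum[1] + 1):csum[4]     # 1:6
X[first]                           # 5 5 5 6 6 7

## Without the zero, nothing plays the role of csum[1]: no.zero[0] is a
## zero-length vector, so the same expression stops with an error
tryCatch((no.zero[0] + 1):no.zero[3], error = function(e) conditionMessage(e))
```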

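Now we have a means to slide over a dataset of something like dates. Each window spans a constant number of distinct values (for example, seven days), but the number of cases within the window can vary. Note that a window spans a fixed number of runs, not a fixed range of values. If a value is missing, as 8 is from X, the window stretches over the gap, so a seven-day window covers exactly seven calendar days only if every day has at least one case.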

csum's first element is zero, and each succeeding element is the total count of cases with values less than or equal to that element's run value. Because we add the zero at the beginning, csum is one element longer than the number of run values, and the zero doesn't belong to any run value. The starting index of a window is the total count of cases in all the runs before it, plus one. That's why we add one to the starting value taken from the cumulative sum. For example, the starting index for a window over 6, 7, and 9 is the index of the last case of the previous run value (5) plus one. There are three 5s and they are the first three elements of X, so the index of their last case is three. Add one to get four, the index of the first element of the next run, the 6s:

(csum[2]+1):csum[5] ## == 4:8 == (4, 5, 6, 7, 8)
X[4:8] ## == (6, 6, 7, 9, 9)

Putting this all together, what we need is a list of integer vectors that reference the cases in each iteration of the sliding window. We might even want the window to move forward in steps larger than one. Here's an easy-to-use function:

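slide <- function(X, width, a=1, step.size=1) {
  ## X must be sorted so each distinct value forms exactly one run.
  ## rle() rejects classed vectors such as Date, so pass as.numeric(dates).
  stopifnot(!is.unsorted(X))
  rle.X <- rle(X)
  ## leading 0 = number of cases before the first run
  cum.rle <- c(0, cumsum(rle.X$lengths))
  ## too few runs for even one window (seq() below would fail)
  if (length(cum.rle) - width < a) return(list())
  A <- seq(a, length(cum.rle)-width, step.size)  # first run of each window
  Z <- seq(width+a, length(cum.rle), step.size)  # cum.rle index ending each window
  ## SIMPLIFY=FALSE keeps a list even when all windows have the same length
  slides <- mapply(function(a, z) (cum.rle[a]+1):cum.rle[z], A, Z,
                   SIMPLIFY=FALSE)
  names(slides) <- rle.X$values[A]
  return(slides)
}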

slide returns a list of integer vectors, one per position of the sliding window, that you can use with lapply, sapply, or similar functions. Each element is named after the first run value in its window:

mydframe <- data.frame(x=sample(1:10, 100, replace=TRUE), y=rnorm(100))
mydframe <- mydframe[order(mydframe$x), ]  # slide() needs x sorted
myslides <- slide(mydframe$x, 3)           # each window spans 3 distinct x values
do.stuff.to <- function(x) x               # placeholder: returns the window's rows
myresults <- lapply(myslides, function(N) do.stuff.to(mydframe[N, ]))

Voilà! That's it! Just write your own do.stuff.to function.
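The run-based windows from slide() span a fixed number of distinct values, so a gap (8 is missing from X) stretches a window. Suppose you instead want each window to cover a fixed span of values, such as seven calendar days. One option is to count cases for every value in the range, including zero counts, and build the cumulative sum from those counts. This is a sketch of a different technique from slide(), assuming sorted, integer-valued data. The names width, vals, counts, starts, and windows are illustrative.

```r
## Hypothetical variant: each window covers `width` consecutive values,
## including values with no cases. Assumes X is sorted and integer-valued
## (for Dates, use as.numeric(dates)).
X <- c(5, 5, 5, 6, 6, 7, 9, 9)
width <- 3
vals <- seq(min(X), max(X))                               # 5 6 7 8 9
counts <- tabulate(match(X, vals), nbins = length(vals))  # 3 2 1 0 2
csum <- c(0, cumsum(counts))                              # 0 3 5 6 6 8
starts <- seq_len(max(0, length(vals) - width + 1))       # windows start at 5, 6, 7
windows <- lapply(starts, function(k)
  ## seq_len() gives an empty window cleanly when no cases fall inside,
  ## where (csum[k] + 1):csum[k + width] would count backwards
  csum[k] + seq_len(csum[k + width] - csum[k]))
names(windows) <- vals[starts]
X[windows[["7"]]]  # 7 9 9: the window over 7..9 includes 8's zero cases
```

Unlike slide(X, 3), whose "6" window runs over 6, 7, and 9, this variant's "6" window stops at 8 and holds only 6, 6, and 7.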