Monday, November 02, 2009

A few useful Windows HPC commands

0. Set the default scheduler
set CCP_SCHEDULER=headnode_name

1. clusrun
One can run the following commands from a desktop machine to set up the compute node environments:

clusrun /user:domain\username dir
clusrun /user:domain\username xcopy \\headnode_name\dir .
clusrun /user:domain\username setx PATH "somepath"

2. job
job list /scheduler:headnode_name

3. node
node list /scheduler:headnode_name

4. cluscfg
cluscfg setcreds


Sunday, November 01, 2009

Improve serialize() in R for Windows

R’s serialize() function is very slow on Windows. In my observation, the calls to realloc are the bottleneck. A workaround is to replace realloc with Rm_realloc in $R_HOME/src/main/serialize.c. Specifically, I made the following changes:

1. In the function resize_buffer(...), replace
mb->buf = realloc(mb->buf, newsize);
with
mb->buf = Rm_realloc(mb->buf, newsize);

2. In the function free_mem_buffer(...), replace
free(buf);
with
Rm_free(buf);


The following are short timing results from my Windows 7 desktop:
The original:
> system.time(serialize(matrix(0, 1000, 1000), NULL))
user system elapsed
5.74 4.39 10.15
> system.time(serialize(matrix(0, 2000, 2000), NULL))
user system elapsed
85.40 74.80 161.62

After updating:
> system.time(serialize(matrix(0, 1000, 1000), NULL))
user system elapsed
0.78 0.30 1.10
> system.time(serialize(matrix(0, 2000, 2000), NULL))
user system elapsed
6.21 4.13 10.54

Saturday, October 31, 2009

Compile Rmpi with Windows MPI (HPC Pack)

To compile Rmpi (http://www.stats.uwo.ca/faculty/yu/Rmpi/) on Windows:

1. Install HPC Pack 2008 SDK with SP1 and modify line 314 of mpi.h (in installed_dir\Include) as follows:
//typedef __int64 MPI_Offset;
typedef long long MPI_Offset;

2. Download Rmpi from http://www.stats.uwo.ca/faculty/yu/Rmpi/download/linux/Rmpi_0.5-7.tar.gz

3. Untar the source and modify src/Makevars.win, changing the SDK directories and the library option (-lmsmpi), as follows:
PKG_CFLAGS = -I"C:\Program Files\Microsoft HPC Pack 2008 SDK\Include" -DMPI2 -DWin32
PKG_LIBS = -L"C:\Program Files\Microsoft HPC Pack 2008 SDK\Lib\i386" -lmsmpi

4. Compile and install
R CMD INSTALL Rmpi

5. Build a binary package for redistribution
R CMD build --binary Rmpi

Wednesday, October 28, 2009

Setting up a non-admin SVN repository shared with multiple users

The trick is to use SSH with public-key authentication, as described in

In summary, I set up an SVN repository as follows:
1. Create an SVN root directory to share with others, and create a repository in it:
$ mkdir /path/svnshare
$ svnadmin create /path/svnshare/project

2. Get the public key of each user who will share the SVN repository, and add a line like the following to ~/.ssh/authorized_keys for each user:
command="/path/to/svnserve -t -r /path/svnshare --tunnel-user=[USERID]",no-port-forwarding,no-agent-forwarding,no-X11-forwarding,no-pty [KEY_TYPE] [PUB_KEY] [COMMENT]
Replace each bracketed field ([USERID], [KEY_TYPE], [PUB_KEY], [COMMENT]) with that user’s information.

3. Users can now access the shared SVN repository remotely as follows:
svn co svn+ssh://[MYID]@[SERVER IP or DNS]/project

Note that the repository path after the server IP or DNS name is relative to the svnserve root given by -r (/path/svnshare), so use /project rather than the full path.

Monday, June 29, 2009

More tips for using R with GotoBLAS

After building R with GotoBLAS (see http://jychoi-report-cgl.blogspot.com/2009/04/compile-r-with-gotoblas.html), there are a few things we can do to verify the build.

1. Download an R benchmark script (http://r.research.att.com/benchmarks/R-benchmark-25.R) and run it to check that every step works. (You can also compare its timings with those of a standard R build.)

2. If R hangs when calling the eigen() function, try rebuilding GotoBLAS and R with the same Fortran compiler.

3. If nothing works, one can use ATLAS instead of GotoBLAS.

Friday, April 24, 2009

Compile R with GotoBLAS

I want to share my experience using GotoBLAS as an external multithreaded BLAS library for R. I didn’t run thorough performance tests against other libraries (such as ATLAS), but I did get large performance gains in R with GotoBLAS.

1. GotoBLAS from http://www.tacc.utexas.edu/resources/software
Follow the instructions in 02QuickInstall.txt.

2. CBLAS from http://www.netlib.org/blas/blast-forum/cblas.tgz
This is not required for R, but you may need it to use GSL (GNU Scientific Library).
a. If the architecture is Linux, type
$ ln -s Make.LINUX Make.in
b. In Make.in, modify BLLIB and CBDIR, and add -fPIC -lpthread to the LOADER option.
c. Type make all to build the libraries and run the tests.
d. When it completes, go to lib/LINUX and type
$ ld -melf_x86_64 -shared -soname libgotocblas.so -o libgotocblas.so cblas_LINUX.a

Note: try using -m64 or -m32 if you are working on PowerPC.

3. R from http://cran.r-project.org/
Run configure as follows:
$ export GOTOBLAS_LIB=PATH/TO/GOTOBLAS_LIB
$ mkdir build_goto; cd build_goto;
$ ../configure --prefix=$HOME/usr/R/ --with-blas="-L$GOTOBLAS_LIB -lgotoblas -lpthread" --enable-R-shlib --enable-R-static-lib --enable-BLAS-shlib

Tuesday, April 07, 2009

Create AVI from R plot

1. In R, save the plots as PostScript (or PNG) files, adding a sequence number to each file name. For example,

postscript(sprintf("%04d.eps", i))

2. Use ImageMagick’s convert command-line tool to convert the images from EPS to PNG (skip this step if you already have PNG files):

for f in *.eps; do convert -rotate 90 "$f" "png32:$f.png"; done

You may want to add the following options:

-resize 1280x720 : change image size
-bordercolor white -border 0x0 : add white background

3. Create the AVI with ffmpeg:

ffmpeg -r 15 -sameq -i %04d.eps.png out.avi

You can control the frame rate (-r rate) and the quality (-sameq or -b bitrate).

 

Note:

If you need to speed up the conversion, you can try the bash script parallel.sh from http://pebblesinthesand.wordpress.com/2008/05/22/a-srcipt-for-running-processes-in-parallel-in-bash/

Sunday, February 01, 2009

Summary of the recommendation system survey paper

I’ve found a good survey paper on recommendation systems:

Gediminas Adomavicius, Alexander Tuzhilin, "Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734-749, June, 2005.

A short summary:

-. A short definition of a recommendation system: an extrapolation problem of predicting unknown values (for example, ratings of items a user has not rated yet) from known ones

-. 3 approaches: i) Content-based, ii) Collaborative, and iii) Hybrid

i) Content-based RS (Recommendation System): a user’s profile is computed solely from that user’s own activity history (the items the user rated or liked in the past). Limitations:
a. Feature extraction can be hard in some domains, such as multimedia or images.
b. Overspecialization: the system recommends only items similar to those the user has already rated, so some diversity is required. Possible remedies: randomness (e.g., genetic algorithms) or filtering out outputs that are too similar (or too different).
c. New user problem: a new user has too little history for the system to learn his or her preferences.

Known algorithms: (Naive) Bayesian classifier, Rocchio, winnow, ANN, …

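ii) Collaborative RS: recommendations for a user are computed from the ratings of a group of like-minded people (peers). Limitations:
a. New user problem [83][89]: the same as in content-based RS
b. New item problem: a new item cannot be recommended until enough users have rated it
c. Sparsity: the number of known ratings is usually very small compared with the number to be predicted. A few workaround ideas: use of demographic information, dimensionality reduction, …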

Known algorithms: clustering, Bayesian network, SVD, maximum entropy, …

iii) Hybrid RS: combines content-based and collaborative methods.