Monday, November 02, 2009
A few useful Windows HPC commands
Sunday, November 01, 2009
Improve serialize() in R for Windows
Saturday, October 31, 2009
Compile Rmpi with Windows MPI (HPC pack)
Wednesday, October 28, 2009
Setting up a non-admin SVN repository shared with multiple users
Monday, June 29, 2009
More tips for using R with GotoBLAS
After building R with GotoBLAS (see http://jychoi-report-cgl.blogspot.com/2009/04/compile-r-with-gotoblas.html), here are a few things to do for verification.
1. Download an R benchmark script (http://r.research.att.com/benchmarks/R-benchmark-25.R) and run it to check that every step works. (You can also compare the timings with those from a standard R build.)
2. If R hangs when calling eigen(), try rebuilding GotoBLAS and R with the same Fortran compiler.
3. If nothing works, you can use ATLAS instead of GotoBLAS.
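Before running the benchmark, it can help to confirm that the build really picked up GotoBLAS. The following is a minimal sketch, assuming a Linux system with ldd and a GotoBLAS shared library whose file name contains "libgoto"; the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Report whether a binary or shared library is dynamically linked against
# GotoBLAS. Assumptions: `ldd` is available and the GotoBLAS shared library
# has "libgoto" in its file name (e.g. libgotoblas.so).

# Reads `ldd` output on stdin; succeeds if any line names a libgoto* library.
mentions_gotoblas() {
    grep -q 'libgoto'
}

# Usage: linked_against_gotoblas /path/to/binary-or-library
linked_against_gotoblas() {
    ldd "$1" 2>/dev/null | mentions_gotoblas
}
```

For example, `linked_against_gotoblas "$(R RHOME)/lib/libR.so" && echo "GotoBLAS found"`. Which file actually carries the dependency depends on how R was configured, so libR.so here is only a guess. A statically linked GotoBLAS will not show up in ldd output at all.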
Friday, April 24, 2009
Compile R with GotoBLAS
I want to share my experience of using GotoBLAS as an external multithreaded BLAS library for R. I didn’t run a thorough performance comparison with other libraries (such as ATLAS), but I did get a large performance gain in R with GotoBLAS.
1. GotoBLAS from http://www.tacc.utexas.edu/resources/software
Follow instructions in 02QuickInstall.txt.
2. CBLAS from http://www.netlib.org/blas/blast-forum/cblas.tgz
This is not required for R, but you may need it if you use GSL (GNU Scientific Library).
a. If the architecture is Linux, type
$ ln -s Make.LINUX Make.in
b. In Make.in, modify BLLIB and CBDIR, and add -fPIC -lpthread to the LOADER option.
c. Type make all to build the libraries and run the tests.
d. When the build completes, go to lib/LINUX and type
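$ ld -melf_x86_64 -shared -soname libgotocblas.so -o libgotocblas.so --whole-archive cblas_LINUX.a --no-whole-archive
(--whole-archive makes ld copy every object out of the static archive. Without it, ld pulls in only the members needed to resolve undefined symbols, and here there are none.)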
Note: if you are working on PowerPC, try -m64 or -m32.
3. R from http://cran.r-project.org/
Run configure as follows:
$ export GOTOBLAS_LIB=PATH/TO/GOTOBLAS_LIB
$ mkdir build_goto; cd build_goto;
$ ../configure --prefix=$HOME/usr/R/ --with-blas="-L$GOTOBLAS_LIB -lgotoblas -lpthread" --enable-R-shlib --enable-R-static-lib --enable-BLAS-shlib
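A mistyped GOTOBLAS_LIB is easy to miss, because configure may fall back to another BLAS rather than stop with an error. The following is a small pre-flight check, sketched under the assumption that the library follows the linker's usual naming rule for -lgotoblas (libgotoblas.so or libgotoblas.a). The function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Succeeds if the given directory contains a library that -lgotoblas can
# resolve. Assumption: standard linker naming, i.e. libgotoblas.so (shared)
# or libgotoblas.a (static).
has_gotoblas_lib() {
    local dir="$1"
    [ -e "$dir/libgotoblas.so" ] || [ -e "$dir/libgotoblas.a" ]
}
```

Run it just before configure, for example `has_gotoblas_lib "$GOTOBLAS_LIB" || echo "no libgotoblas in $GOTOBLAS_LIB" >&2`.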
Tuesday, April 07, 2009
Create AVI from R plot
1. In R, save each plot as a PostScript (or PNG) file with a sequence number in the file name. For example,
postscript(sprintf("%04d.eps", i))
2. Use ImageMagick’s convert command-line tool to convert the images from EPS to PNG. (You can skip this step if you already have PNG files.)
for f in *.eps; do convert -rotate 90 "$f" "png32:$f.png"; done
You may want to add the following options:
-resize 1280x720 : change image size
-bordercolor white -border 0x0 : add white background
3. Create AVI by using ffmpeg
ffmpeg -r 15 -sameq -i %04d.eps.png out.avi
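You can control the frame rate (-r rate) and the quality (-sameq or -b bitrate). Newer ffmpeg releases have removed -sameq; there, set the quality with -qscale or -b:v instead.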
Note:
To speed up the conversion, you can try the bash script parallel.sh from http://pebblesinthesand.wordpress.com/2008/05/22/a-srcipt-for-running-processes-in-parallel-in-bash/
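If you would rather not download a separate script, xargs can run several conversions at once. The following is a minimal sketch, not the parallel.sh script linked above. It assumes ImageMagick's convert is on PATH and that find and xargs support -maxdepth, -print0, -0 and -P (the GNU and BSD versions do). The function name and the default of 4 jobs are made up for illustration.

```shell
#!/usr/bin/env bash
# Convert every *.eps file in a directory to PNG, several files at a time.
# Usage: convert_eps_parallel [directory] [number-of-parallel-jobs]
convert_eps_parallel() {
    local dir="${1:-.}" jobs="${2:-4}"
    # -print0 / -0 keep file names containing spaces intact;
    # -I{} runs one convert process per file, -P caps how many run at once.
    find "$dir" -maxdepth 1 -name '*.eps' -print0 |
        xargs -0 -P "$jobs" -I{} convert -rotate 90 {} png32:{}.png
}
```

Calling `convert_eps_parallel . 8` in the plot directory writes 0001.eps.png, 0002.eps.png and so on next to the originals, which matches the file pattern used in the ffmpeg step.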
Sunday, February 01, 2009
Summary of the recommendation system survey paper
I found a good survey paper on recommendation systems:
Gediminas Adomavicius, Alexander Tuzhilin, "Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734-749, June, 2005.
A short summary:
-. A short definition of a recommendation system: an extrapolation problem, in which unknown values are predicted from known ones
-. 3 approaches: i) Content-based, ii) Collaborative, and iii) Hybrid
i) Content-based RS (Recommendation System): a user’s features are computed solely from that user’s own activity history. It can have the following limitations:
a. Feature extraction can be hard in some domains, such as multimedia or images
b. Overspecialization: diversity is required. Remedies include randomness, genetic algorithms, or filtering (removing outputs that are too similar or too different)
c. New user problem: there is no history to consider
Known algorithms: (Naive) Bayesian classifier, Rocchio, Winnow, ANN, …
ii) Collaborative RS: a user’s features are computed from a group of like-minded people or peers. Limitations:
a. New user problem [83][89]: the same as in content-based RS
b. New item problem
c. Sparsity: a few workaround ideas -- use of demographic information, dimension reduction, …
Known algorithms: clustering, Bayesian network, SVD, maximum entropy, …
iii) Hybrid RS: uses both content-based and collaborative methods.
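The "extrapolation" definition at the top of the summary can be written compactly. As I read the survey, it uses a utility function u over a set of users C and a set of items S, with values in an ordered set of ratings R. The function u is known only for the (user, item) pairs that have been rated and has to be extrapolated to the rest, after which each user is recommended the item with the highest estimated utility:

```latex
% C: set of users, S: set of items, R: totally ordered set of ratings
u : C \times S \to R
\qquad
\forall c \in C:\quad s'_c = \arg\max_{s \in S} u(c, s)
```

The three approaches differ in how u(c, s) is estimated for unrated pairs: from items that user c rated before (content-based), from ratings that similar users gave to s (collaborative), or from both (hybrid).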