Monday, June 29, 2009

More tips for using R with GotoBLAS

After building R with GotoBLAS (see http://jychoi-report-cgl.blogspot.com/2009/04/compile-r-with-gotoblas.html), there are a few things we can do to verify the build.

1. Download an R benchmark script (http://r.research.att.com/benchmarks/R-benchmark-25.R) and run it to check that every step works. (You can also compare the performance with that of a normal R build.)

2. If R hangs when calling the eigen() function, try rebuilding GotoBLAS and R with the same Fortran compiler.

3. If nothing works, you can use ATLAS instead of GotoBLAS.
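
For tip 2, a hang is easier to spot if the eigen() call runs under a time limit. The following is a minimal shell sketch, assuming the Rscript from the new build is on the PATH and that GNU coreutils' timeout is available. The function name check_eigen, the 500x500 matrix size, and the 60-second default are arbitrary choices for illustration, not part of R or GotoBLAS.

```shell
# check_eigen [SECONDS]
# Runs a small symmetric eigenvalue problem in R under a time limit.
# Assumption: Rscript and GNU timeout are both on the PATH.
check_eigen() {
    limit=${1:-60}
    timeout "$limit" Rscript -e 'set.seed(1); m <- crossprod(matrix(rnorm(250000), 500)); e <- eigen(m); cat("eigen() finished\n")'
}
```

An exit status of 124 from check_eigen means that timeout killed the run, which points to the hang described in tip 2.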

Friday, April 24, 2009

Compile R with GotoBLAS

I want to share my experience of using GotoBLAS as an external multithreaded BLAS library for R. I didn’t run a thorough performance comparison against other libraries (such as ATLAS), but I did get a large performance gain from GotoBLAS when using R.

1. GotoBLAS from http://www.tacc.utexas.edu/resources/software
Follow instructions in 02QuickInstall.txt.

2. CBLAS from http://www.netlib.org/blas/blast-forum/cblas.tgz
This is not required for R itself, but you may need it for GSL (GNU Scientific Library).
a. If the architecture is Linux, type
$ ln -s Make.LINUX Make.in
b. In Make.in, modify BLLIB and CBDIR, and add -fPIC -lpthread to the LOADER option.
c. Type make all to build the libraries and run the tests.
d. When the build completes, go to lib/LINUX and type
$ ld -melf_x86_64 -shared -soname libgotocblas.so -o libgotocblas.so cblas_LINUX.a

Note: try -m64 or -m32 if you are working on PowerPC.

3. R from http://cran.r-project.org/
Run configure as follows:
$ export GOTOBLAS_LIB=PATH/TO/GOTOBLAS_LIB
$ mkdir build_goto; cd build_goto;
$ ../configure --prefix=$HOME/usr/R/ --with-blas="-L$GOTOBLAS_LIB -lgotoblas -lpthread" --enable-R-shlib --enable-R-static-lib --enable-BLAS-shlib

Tuesday, April 07, 2009

Create AVI from R plot

1. In R, save each plot as a PostScript (or PNG) file with a sequence number in the file name. For example:

postscript(sprintf("%04d.eps", i))

2. Use ImageMagick’s convert command-line tool to convert the images from EPS to PNG. (You can skip this step if you already have PNG files.)

for f in *.eps; do convert -rotate 90 "$f" "png32:$f.png"; done

You may want to add the following options:

-resize 1280x720 : change image size
-bordercolor white -border 0x0 : add white background

3. Create AVI by using ffmpeg

ffmpeg -r 15 -sameq -i %04d.eps.png out.avi

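You can control the frame rate (-r rate) and the quality (-sameq or -b bitrate). Note that newer ffmpeg releases have dropped -sameq; there, -qscale:v is the usual way to set quality.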
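
Before running ffmpeg, it can help to confirm that the frame files form an unbroken sequence, because the %04d pattern is matched number by number and, as far as I know, ffmpeg stops reading at the first missing frame. Here is a minimal shell sketch, assuming the frames are named as in step 2 and numbered from 0001. The function name first_missing_frame is made up for this example.

```shell
# first_missing_frame DIR COUNT
# Prints the name of the first missing frame among 0001.eps.png .. COUNT
# and returns 1, or prints nothing and returns 0 if all frames exist.
# Assumption: numbering starts at 1 and follows the %04d.eps.png pattern.
first_missing_frame() {
    dir=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        name=$(printf '%04d.eps.png' "$i")
        if [ ! -e "$dir/$name" ]; then
            echo "$name"
            return 1
        fi
        i=$((i + 1))
    done
    return 0
}
```

For example, first_missing_frame . 300 && ffmpeg -r 15 -sameq -i %04d.eps.png out.avi only starts the encoding when all 300 frames are present.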

 

Note:

If you need to speed up the conversion, you can try the bash script parallel.sh from http://pebblesinthesand.wordpress.com/2008/05/22/a-srcipt-for-running-processes-in-parallel-in-bash/
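
As an alternative to that script, the conversion loop from step 2 can be run in parallel with xargs. This is only a sketch, assuming a find that supports -print0 and an xargs that supports -0 and -P (both GNU and BSD versions do). The function name convert_all and the default of 4 parallel jobs are my own choices.

```shell
# convert_all [JOBS]
# Converts every .eps file in the current directory to PNG, running up to
# JOBS convert processes at once (default: 4).
# Assumption: ImageMagick's convert is on the PATH.
convert_all() {
    njobs=${1:-4}
    find . -maxdepth 1 -name '*.eps' -print0 |
        xargs -0 -P "$njobs" -I{} convert -rotate 90 {} png32:{}.png
}
```

Run it from the directory that holds the .eps files, for example convert_all 8. The output names are the same as in step 2, so the ffmpeg pattern %04d.eps.png still applies.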

Sunday, February 01, 2009

Summary of the recommendation system survey paper

I’ve found a good survey paper on recommendation systems:

Gediminas Adomavicius, Alexander Tuzhilin, "Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734-749, June, 2005.

A short summary:

-. A short definition of recommendation system: a problem of extrapolation for predicting unknown values

-. 3 approaches: i) Content-based, ii) Collaborative, and iii) Hybrid

i) Content-based RS (Recommendation System): a user’s features are computed solely from that user’s own activity history. It can have the following limitations:
a. Feature extraction can be hard in some domains, such as multimedia or images
b. Overspecialization: diversity is required. Remedies include randomness, genetic algorithms, or filtering the output (removing items that are too similar or too different)
c. New user problem: No information to consider

Known algorithms: (Naive) Bayesian classifier, Rocchio, winnow, ANN, …

ii) Collaborative RS: a user’s features are computed from a group of like-minded people or peers. Limitations:
a. New user problem [83][89]: the same as in content-based RS
b. New item problem
c. Sparsity: a few workaround ideas -- use of demographic information, dimension reduction, …

Known algorithms: clustering, Bayesian network, SVD, maximum entropy, …

iii) Hybrid RS: combines the content-based and collaborative approaches.
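
The "extrapolation" definition above can be written down compactly. The following is a sketch in notation along the lines of the survey, as far as I recall it: C is the set of users, S the set of items, R an ordered set of ratings, and u a utility function that is known only for the (user, item) pairs that have already been rated.

```latex
u : C \times S \to R, \qquad
s'_c = \arg\max_{s \in S} u(c, s) \quad \text{for each } c \in C
```

In these terms, a content-based system estimates the unknown u(c, s) from the ratings that the same user c gave to items similar to s, while a collaborative system estimates it from the ratings that users similar to c gave to the same item s.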

Sunday, December 21, 2008

Flash 3D Engine: Sandy3D Vs. Papervision3D

Inspired by an article comparing the performance of Away3D and Papervision3D, I wanted to run a simple performance comparison of Sandy3D (3.1 AS3) and Papervision3D (2.0, revision 849, code name Great White) in rendering 1,000 objects.

As a result, Papervision3D was roughly 2 to 3 times faster than Sandy3D, or even more. My benchmark implementations for both Sandy3D and Papervision3D, along with the source code, are available.

Optimization advice for Sandy can be found here.

How to Parallelize

I’ve found a very good introduction to parallelizing applications at http://www.cs.princeton.edu/courses/archive/spr08/cos598A/parallelization_course.pdf

In particular, a few tools are very helpful for analyzing the code at an early stage:

1. gprof: a profiler. It helps you decide which parts of the code to look at. For multithreaded programs, consider the pthread wrapper available at http://sam.zoy.org/writings/programming/gprof.html

2. helgrind: a Valgrind tool for detecting race conditions
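
A typical session with the two tools might look like the following sketch. It assumes a program that builds from a single C source file with gcc, and that gprof and valgrind are installed. The function names and the output names app_prof, app_dbg, and profile.txt are made up for this example.

```shell
# profile_with_gprof SRC.c [ARGS...]
# Builds SRC.c with -pg, runs it once to produce gmon.out, and writes the
# gprof report (flat profile and call graph) to profile.txt.
profile_with_gprof() {
    src=$1
    shift
    gcc -pg -g -o ./app_prof "$src" &&
        ./app_prof "$@" &&
        gprof ./app_prof gmon.out > profile.txt
}

# check_races SRC.c [ARGS...]
# Builds SRC.c with debug info and pthread support, then runs the program
# under helgrind, which reports possible data races on stderr.
check_races() {
    src=$1
    shift
    gcc -g -pthread -o ./app_dbg "$src" &&
        valgrind --tool=helgrind ./app_dbg "$@"
}
```

The profile shows where the time goes, so the hot spots it lists are the first candidates for parallelization. helgrind is then useful once threads have been introduced.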

Tuesday, December 09, 2008

GCC 4.3.2 Compilation and OpenMP

 

In order to try OpenMP 3.0, I had been trying to install gcc-4.3.2 on an x86_64 Linux box without any luck. (Note that GCC 4.3 implements OpenMP 2.5; OpenMP 3.0 support arrived in GCC 4.4.) I got the following error while building gcc-4.3.2:

/usr/bin/ld: crti.o: No such file: No such file or directory

It turns out that the default multilib build also compiles 32-bit libraries, which requires a 32-bit libc that I can’t install since I’m not a superuser. The workaround is to disable this feature with the --disable-multilib option, as follows:

./configure --disable-multilib
make

After installation, if you get the following error when compiling an OpenMP program:

gcc: libgomp.spec: No such file or directory

find libgomp.spec under /path/to/gcc/lib64 and create a symbolic link to it under /path/to/gcc/lib/gcc/x86_64-unknown-linux-gnu/4.3.2 (see [2]).
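
The linking step can be scripted as in the sketch below. It assumes the directory layout described above, with the gcc installation prefix passed as the only argument. The function name link_libgomp_spec is made up for this example.

```shell
# link_libgomp_spec PREFIX
# PREFIX is the directory gcc was installed into (/path/to/gcc above).
# Assumption: the target triplet and version match the paths in this post.
link_libgomp_spec() {
    prefix=$1
    target_dir="$prefix/lib/gcc/x86_64-unknown-linux-gnu/4.3.2"
    if [ ! -f "$prefix/lib64/libgomp.spec" ]; then
        echo "libgomp.spec not found under $prefix/lib64" >&2
        return 1
    fi
    ln -s "$prefix/lib64/libgomp.spec" "$target_dir/libgomp.spec"
}
```

For example: link_libgomp_spec /path/to/gcc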

Reference:
[1] gcc-help mailing list
[2] OpenMP in 30 Minutes

Tuesday, November 04, 2008

Collective Collaborative Tagging System (Abstract)

Many collaborative tagging sites currently exist on the Internet, but there is a need for a service that integrates data from multiple sites into a large, unified set of collaborative data, from which users can get more accurate and richer information than from any single site. In our paper, we propose a collective collaborative tagging (CCT) service architecture in which both service providers and individual users can merge folksonomy data (in the form of keyword tags) stored in different sources to build a larger, unified repository. We also examine a range of algorithms that can be applied to different problems in folksonomy analysis and information discovery. These algorithms address several common problems for online systems: searching, getting recommendations, finding communities of similar users, and finding interesting new information through trends. Our contributions are a) to systematically examine how publicly available algorithms apply to tag-based folksonomies, and b) to propose a service architecture that can provide these algorithms as online capabilities.