Sunday, May 30, 2010

The Longest Increasing Subsequence problem

The Longest Increasing Subsequence problem can be solved by these methods:

1) Sort the sequence L into L_sorted, then find LCS(L, L_sorted). Since sort takes O(NlogN) time and LCS takes O(N^2) time, total it takes O(N^2) time.
2) There is a DP method [2]. It's O(N^2) and can be improved to O(NlogN) by using binary search.
3) Find the longest path in a directed acyclic graph. The source provides no detail on this solution. But it should be like building a tree with internal links (a DAG). It won't be easy.
4) Patience sort [3]. This is a card game and easy to understand by taking a deck of cards to try out oneself. Similar to the DP method, this is also O(N^2) and can be improved to O(NlogN) by using binary search. I prefer this method, since it's easy to find both the length and one of the LIS sequences.

Patience sort is an interesting sort [4]. This is one case where greedy strategy works optimally. Here are some excerpts from [4] about the history of patience sort:

The name "patience sorting" comes from Mallows (1973), who credits A.S.C. Ross for its invention for the motivation of sorting a deck of cards. Mallow's analysis was done in 1960 but not published until a long time later. Robert Floyd discovered this
independently and exchanged idea on it with Donald Knuth in private letters. Hammersley (1972) independently recognized its use as an algorithm for computing the length of LIS.

A related problem of LIS is LCIS: Longest Continuous Increasing Subsequence. This is much easier. A one-pass traversal of the sequence will do. we just keep 2 variables: the length of the current LCIS, and the ending location of it. At the end we know both the length and the location.

[1] http://en.wikipedia.org/wiki/Longest_increasing_subsequence
[2] http://blog.programfan.com/article.asp?id=13086
[3] http://en.wikipedia.org/wiki/Patience_sort
[4] Longest Increasing Subsequences: From Patience Sorting to the Baik-Dieft-Johansson Theorem. Caroline Gates, D. Aldous, Bull. Amer. Math. Soc., 36:413-32 (1999)

Tuesday, May 25, 2010

ASP upload file size upper limit

When use multipart/form-data type to upload file in ASP, there is a problem of file size limit. On windows server 2003, this limit is about 200KB (204800 bytes). This is not enough since that's really a small file size in today's standards.

The solution on windows server 2003 (IIS 6.0) is to set a bigger value on variable AspMaxRequestEntityAllowed, which is in systemroot\System32\Inetsrv\Metabase.xml. On earlier versions of IIS, the value of AspMaxRequestEntityAllowed can be set in the registry.

Now since big file upload can take long, it may need to set ASP ScriptTimeout value bigger than default (90 seconds). Can do this by "<%Server.ScriptTimeout[=NumSeconds]%>" [3], or in IIS6.0 can set this in IIS manager [4].

References:
[1] What is the limit on Form / POST parameters?
[2] Description of the MaxClientRequestBuffer Registry Value
[3] ASP ScriptTimeout Property
[4] Setting the ASP Script Timeout (IIS 6.0)

Thursday, May 20, 2010

Disk scheduling algorithms

Several common disk scheduling algorithms:

First Come First Served (FCFS)
Shortest Seek Time First (SSTF)
SCAN (Elevator algorithm)
Circular SCAN (C-SCAN)
LOOK
Circular LOOK (C-LOOK)

References:
[1] Disk scheduling
[2] Wiki: elevator algorithm
[3] Example of calculation

Lecture by Vladimir Vapnik

Lecture on Empirical Inference by Vladimir Vapnik.

Coin tossing

Coin tossing is a game form ancient time. Yet there are interesting issues related and research going on.

One example is to simulate an unbiased coin with a biased one. von Neumann gave a simple method: HT is counted as Head, TH is counted as Tail, HH and TT are discarded. This can have extension to be more efficient in the sense of needed less number of tosses per decision though, e.g., HHTT can be counted as Head, TTHH can be counted as Tail as well [2].

Search "biased coin toss" would give some interesting research papers on relevant issues. Some examples are:

[1] Optimal random number generation from a biased coin (2005)
[2] Tossing a Biased Coin
[3] Tree algorithm for unbiased coin tossing with a biased coin (1984)
[4] "Topic 5: random generation" from CSE 103: PROBABILITY AND STATISTICS -- Readings

Wednesday, May 12, 2010

OLE with Perl

Perl can do many things beyond normal expectation. One example is with OLE objects. If you have MS Office installed on your computer, you can try the following code. It creates a word document and saves as C:\test.doc. You will see the Word document automatically opening up, adding lines and closing down. This demonstrates the capability of Perl to manipulate MS Office files. Application is such as automatic business report generation.

#!\usr\bin\perl -w
# http://www.adp-gmbh.ch/perl/word.html
# http://www.xav.com/perl/faq/Windows/ActivePerl-Winfaq12.html
# http://www.ngbdigital.com/perl_ole_word.html

use warnings;
use strict;

use Win32::OLE;

my $word = CreateObject Win32::OLE 'Word.Application' or die $!;
$word->{'Visible'} = 1;

my $filename = "C:\\test.doc";
my $document = $word->Documents->Add;

my $selection = $word->Selection;

$selection -> TypeText("Hello HomeTom");
$selection -> TypeParagraph;
$selection -> TypeText("How are you doing today?");
$selection -> TypeParagraph;

$selection -> TypeText("Great. How about you?");
$selection -> {'Style'} = "Heading 1";

$selection -> TypeParagraph;

my $heading_1 = $document->Styles("Heading 1");
my $heading_1_font = $heading_1 -> Font;

$heading_1_font -> {Name} = "Bookmann";
$heading_1_font -> {Size} = 20;
$heading_1_font -> {Bold} = 1;

# Save As
$word->ActiveDocument->SaveAs({FileName => $filename});

$selection -> Typeparagraph;
$selection -> TypeText("Now save and exit after 3 seconds");

sleep 3;

$word->Documents->Close;
$word->Quit;

1;

Monday, May 3, 2010

Min edge coverage problem

Problem: Given a directed graph, find a minimal set of vertices which touch all edges
within a graph.

This is a NP-complete problem. No polynomial time solution exists.

One solution is to get powerset of all vertices, then for each vertice subset, check if it covers all edges in the graph. Return the subset with the smallest size.

min_size = Infi;
set_index = -1;
Generate all possible vertex set S;
for all set si in S
if union of edges covered by each vertex in set si == E
if (si.size() < min_size)
min_size=si.size();
set_index = i;
return i;

This solution is from Careercup Top 150 Questions.

Sunday, May 2, 2010

Architecture of high-throughput, scalable web application

The points here are taken from this link (In Chinese).

A. Some notes from the book Building Scalable Web Sites (ISBN 0596102356, 2006, 352 pages).

1. Scale up a web application
- vertically scale up: increase setting of single machine (memory, CPU).
- horizontally scale up: add more machines
2. Redundancy
- Back up: hot back up (online backup), cold back up (offline backup).
3. Load balancing
- session/session-less load balancing
- hardware/software load balancing
4. Cache

B. Article by Ni Haitao.

1. Use static html: convert dynamic content to static html.
2. Separate image/graphics server from application server.
3. Use database cluster instead of single database.
4. Use Cache. E.g.
Apache - mod_proxy, squid;
Linux - memcached;
PHP - Pear cache, eaccelerator, Apc, XCache.
5. Use mirror site.
6. Load balancing.
- hardware 4-layer exchange
- software 4-layer exchange
- 7-layer exchange

Blog Archive

Followers