HomeTom - CS: 2010

Wednesday, December 29, 2010

A Subversion Cheat Sheet

SVN can use a GUI interface such as TortoiseSVN, or a command line version such as slik SVN.

Setup ssh key for svn so no need to enter password each time: HOWTO: set up ssh keys

A short usage note for Subversion.

Often used svn commands are:

Create a Repository

UNIX: svnadmin create /path/to/repository
Windows: svnadmin create d:/path_to_repository

Checking Out a Project - svn checkout

UNIX: svn checkout file:///repository_name/project/trunk project
Windows: svn checkout file:///d:/repository_name/project/trunk project
Network: svn checkout http://host_name/svn_dir/repository_name/project/trunk project

Getting a List of Projects - svn list

UNIX: svn list --verbose file:///repository_name/project
Network: svn list --verbose http://host_name/svn_dir/repository_name/project

Reviewing Changes - svn status

UNIX: svn status

Adding New Files and Directories - svn add

UNIX: svn add file_or_dir_name

Deleting Files and Directories - svn delete

UNIX: svn delete file_or_dir_name
Network: svn delete -m "Deleting project dir" http://localhost/svn_dir/repository/project_dir

Committing Changes - svn commit

Network: svn commit -m "Saving recent changes" http://localhost/svn_dir/repository/project_dir

Updating Your Local Files - svn update

Network: svn update

Get revision number - svn info

svn info | grep Rev

Check log for the 10 most recent entries:

svn log --limit 10
or
svn log -l 10

MSSQL extended property

MSSQL extended property is used to store metadata information on database objects (tables, views, columns, indexes etc.).

- Extended Property is NOT case sensitive.
- MSDN: add, drop, update, list extended properties:
  http://msdn.microsoft.com/en-us/library/ms180047.aspx

- list all extended properties:
  SELECT * FROM sys.extended_properties;
- list all extended properties for columns in all tables:
  SELECT major_id, minor_id, t.name AS [Table Name], c.name AS [Column Name], value AS [Extended Property]
  FROM sys.extended_properties AS ep
  INNER JOIN sys.tables AS t ON ep.major_id = t.object_id 
  INNER JOIN sys.columns AS c ON ep.major_id = c.object_id AND ep.minor_id = c.column_id
  WHERE class = 1;
- Show all the E.P. of a specific columns in a specific table:
  SELECT objtype, objname, name, value
  FROM fn_listextendedproperty (NULL, 'schema', 'dbo', 'table', 'T1', 'column', 'id');
  
Add/drop/update/list the Extended Property of a column:
  
- Show an E.P. of a columns in a table:
  SELECT objtype, objname, name, value
  FROM fn_listextendedproperty ('[E.P. name]', 'schema', 'dbo', 'table', 'T1', 'column', 'id');

- Add an E.P. of a column in a table (the IF part is optional check):

  IF NOT Exists (SELECT * FROM fn_listextendedproperty(
    'Summary', 'schema', 'dbo', 'table', 'T1', 'column', 'id'
  ))
  EXEC sp_addextendedproperty 
    @name = 'caption' 
    ,@value = 'Employee ID' 
    ,@level0type = 'schema', @level0name = dbo
    ,@level1type = 'table', @level1name = 'T1'
    ,@level2type = 'column', @level2name = id;
  GO

- Drop an E.P. of a column in a table (the IF part is optional check):

  IF Exists (SELECT * FROM fn_listextendedproperty(
    'Summary', 'schema', 'dbo', 'table', 'T1', 'column', 'id'
  ))
  EXEC sp_dropextendedproperty 
    @name = 'caption' 
    ,@level0type = 'schema', @level0name = dbo
    ,@level1type = 'table', @level1name = 'T1'
    ,@level2type = 'column', @level2name = id;
  GO

- Update an E.P. of a column in a table:
  EXEC sp_updateextendedproperty 
    @name = N'Caption'
    ,@value = 'Employee ID must be unique.'
    ,@level0type = N'Schema', @level0name = dbo
    ,@level1type = N'Table',  @level1name = T1
    ,@level2type = N'Column', @level2name = id;
  GO

Add/drop/update/list the Extended Property of a table:

- The above shows how to add/drop/update/list the E.P. of a column.
  To do the same thing for a table, just ignore the last 2 items for
  level2type/level2name. For the IF part, use NULL for the last two items. E.g.:
  IF Exists (SELECT * FROM fn_listextendedproperty(
    'Summary', 'schema', 'dbo', 'table', 'T1', NULL, NULL
  ))
  Examples are given below.

- Drop an extended property of a table:  
  IF Exists (SELECT * FROM fn_listextendedproperty(
    'Summary', 'schema', 'dbo', 'table', 'T1', null, null
  ))
  EXEC sp_dropextendedproperty 
    @name = 'Summary' 
    ,@level0type = 'schema' ,@level0name = dbo
    ,@level1type = 'table' ,@level1name = 'T1';
  GO
  
- Add an extended property to a table:  
  IF NOT Exists (SELECT * FROM fn_listextendedproperty(
    'Summary', 'schema', 'dbo', 'table', 'T1', null, null
  ))
  EXEC sp_addextendedproperty 
    @name = 'Summary' 
    ,@value = 'T1 table''s summary' 
    ,@level0type = 'schema' ,@level0name = dbo
    ,@level1type = 'table' ,@level1name = 'T1';
  GO  

Add/drop/update/list the Extended Property of a view or view's column:

- Same as table, except that use "view" instead of "table" as level1type.
  
Note:

- Since single quote "'" is used as delimiter, it should escaped
  by "''" if it is used in the value.
- Note that string quoted by "'" is allowed to contain new line character.
  So if a value spans multiple lines, it won't be a problem.

Perl OOP and reference

Perl can do OOP for sure. I reviewed relevant material for recent work. Use "Package" to define a class. What's interesting is the use of "bless" keyword. An object is used in the syntax of a C++ object, say you use "->" to retrieve its member variable or function.

The use of reference facilitates a lot of programming features. In Perl using a back slash "\" in front of a variable (scalar, array or hash) turns it into a reference. This can be passed to subroutines, to be used recursively etc. To cast it back and refer to the object, use $/@{}/%{} for scalar/array/hash respectively.

References:
- Object Oriented Programming in PERL

Friday, December 17, 2010

Perl - count number of occurrences

http://www.chengfu.net/2005/10/count-occurrences-perl/

The author wants to find out what's the fastest way of counting occurrences of a string in a text. e.g. '-' in “1-4-7-8-37-5-7-8-3-42″. He gives 5 ways of doing this:

1. my $size = (scalar(@{[$string =~ /-/g]}) + 1);
2. my $size = scalar(split /-/, $string);
3. my $size = (($string =~ s/-//g) + 1);
4. my $size = (($string =~ tr/-//) + 1);
5. my $size = 1; $size++ while $string =~ /-/g;

He found the speed to be 4, 3, 5, 1, 2 in the order of fastest to slowest on his machine.

Doodling in math class

Doodling in math class: http://vihart.com/doodling/

Interesting.

Tuesday, December 14, 2010

Perlpod

Perlpod: The Plain Old Document format.

Description about Perlpod from the above source:

Pod is a simple-to-use markup language used for writing documentation for Perl, Perl programs, and Perl modules.
Translators are available for converting Pod to various formats like plain text, HTML, man pages, and more.
Pod markup consists of three basic kinds of paragraphs: ordinary, verbatim, and command.

Monday, December 13, 2010

Setup email forwarding in linux/unix

Email forwarding can be setup in an email client such as Thunderbird.

Under linux/unix, it can done by admin or by user himself, as explained here.

To do it by the user himself, he only needs to create a file .forward in his account root, and enter the forward email addresses separated by comma or new line. The .forward file should have permission 644.

An example .forward file is:

"|/usr/local/bin/procmail"
example@gmail.com

Saturday, December 11, 2010

Java becomes closed source

December 9, 2010: The ASF Resigns From the JCP Executive Committee.

It seems that now Java to Oracle is as C# to Microsoft. Java becomes proprietary language of Oracle. But, C/C++/PHP/Perl/Python/Ruby are still open source.

Since Oracle also owns MySQL, don't know what will happen to MySQL. PostgreSQL is still open source though.

Wednesday, November 17, 2010

Silicon valley

Here is an introduction to silicon valley.

Friday, October 1, 2010

SQL Injection Attack

One of the servers is hacked. The symptom is that javascript code were inserted into some database tables. When users visit the site, the javascript code would invoke remote site scripts, display a faked virus scan and report (might be a dynamic gif image), and ask the user to download and install a virus removal software. Once the user installs the software, his computer will be infected.

A script is written to search and clean the database based on signature (characteristic substring) in the javascript code. SQL Query log is added to record all the executed queries. Nothing was found. The problem was finally identified by checking the web server visit log. There are visit requests where SQL commands such as UPDATE used in query string parameter value. It takes advantage of the fact that two consecutive SQL statements can be executed one after the other. This SQL injection attack achieved its goal. Checking the request parameter before executing the SQL can catch such problems. Use of stored procedure and avoid this vulnerability.

A perl script is written to analyze the visit log files and extract these attack requests. The attacked page and attacker's IP is found. Whois service is used to locate where the attacker's IP is from. Patches were made to the attacked page and the site.

Below is the perl script to analyze web server visit log. It extract queries using the 'update' command. Actually it's found that command 'select' is also used in other queries. Can change the script to extract those as well.

#!/usr/bin/perl

#
# @Author: Tom
# @Created on: 9/30/2010
# @Last modified: 9/30/2010
# @Usage example: perl Analyzer.pl < ex100920.log > 100920.txt
# All lines containing the substring "1091+update" are extracted.
#

use strict;

my @fields;
my $fields_ct;
my $line_ct = 0;
my $attack_line_ct = 0;
my @attacked_Pages;
my @attack_IPs;
my $v;
my @vs;
my $i;
my $prefix;

my @ascii_table = (
"NUL",
"SOH",
"STX",
"ETX",
"EOT",
"ENQ",
"ACK",
"BEL",
"BS",
"HT",
"LF",
"VT",
"FF",
"CR",
"SO",
"SI",
"DLE",
"DC1",
"DC2",
"DC3",
"DC4",
"NAK",
"SYN",
"ETB",
"CAN",
"EM",
"SUB",
"ESC",
"FS",
"GS",
"RS",
"US",
" ",
"!",
"\"",
"#",
"\$",
"%",
"&",
"'",
"(",
")",
"*",
"+",
",",
"-",
".",
"/",
"0",
"1",
"2",
"3",
"4",
"5",
"6",
"7",
"8",
"9",
":",
";",
"<",
"=",
">",
"?",
"@",
"A",
"B",
"C",
"D",
"E",
"F",
"G",
"H",
"I",
"J",
"K",
"L",
"M",
"N",
"O",
"P",
"Q",
"R",
"S",
"T",
"U",
"V",
"W",
"X",
"Y",
"Z",
"[",
"\\",
"]",
"^",
"_",
"`",
"a",
"b",
"c",
"d",
"e",
"f",
"g",
"h",
"i",
"j",
"k",
"l",
"m",
"n",
"o",
"p",
"q",
"r",
"s",
"t",
"u",
"v",
"w",
"x",
"y",
"z",
"{",
"|",
"}",
"~",
"DEL",
);

while(<>) {
  $line_ct ++;
  if (/^\s+$/) { next; } # ignore empty line.
  if (/id=1091\+update/) {} else { next; } # ignore non-attack lines.
  $attack_line_ct ++;

  print $line_ct . ": \n" ;
  print $_ . "\n"; 
  print "Fields dump: \n";
  @fields = split(' ', $_);
  $fields_ct = @fields;
  for ($i = 0; $i < $fields_ct; $i ++) {
    print "$i: ";
    $v = $fields[$i];
    if ($i == 6) {
      $v =~ s/\%2B/+/g;
      #print "$v\n";
      @vs = split('\+', $v);
      foreach my $u (@vs) {
        if ($u =~ /^varchar\(8000\)/) {
          print "$u";
        } elsif ($u =~ /(cast\()(char\()((\d)+)\)/) {
          print "cast(" . $ascii_table[$3]; 
        } elsif ($u =~ /(char\()((\d)+)\)/) {
          print $ascii_table[$2]; 
        } else {
          print "$u";
        }
        #print "+";
        print " ";
      }
      print "\n";
    } elsif ($i == 5) {
      if (Page_Exists($v) == 0) {
        push(@attacked_Pages, $v);
      }
      print "$v\n";
    } elsif ($i == 9) {
      if (IP_Exists($v) == 0) {
        push(@attack_IPs, $v);
      }
      print "$v\n";
    } else {
      print "$v\n";
    }
  }
  print "\n";
}

print "$attack_line_ct attack requests found.\n";

my $Page_ct = @attacked_Pages;
print "$Page_ct attacked Page(s) found: \n";
foreach my $p (@attacked_Pages) {
  print "$p\n";
}

my $IP_ct = @attack_IPs;
print "$IP_ct attacking IP(s) found: \n";
foreach my $p (@attack_IPs) {
  print "$p\n";
}


sub Page_Exists() {
  my ($ip) = @_;
  foreach my $p (@attacked_Pages) {
    if ($ip eq $p) { return 1; }
  }
  return 0;
}

sub IP_Exists() {
  my ($ip) = @_;
  foreach my $p (@attack_IPs) {
    if ($ip eq $p) { return 1; }
  }
  return 0;
}

For cleaning the database, an ASP script is written.


'
' Remove injected code from database tables.
' Use signature and full_signature to specify the injected code.
' Use request string "doClean=y" to do cleaning.
' If not use doClean, will only show infected rows.
' Tom 9-24-2010
'
Dim signature, full_signature
signature = "</script>" 
full_signature = "<script src=http://.../...js></script>"

Dim doClean
doClean = false
if request("doClean") <> "" then doClean = true

Response.Write("<p>Use doClean request parameter to do cleaning. ")
Response.Write("doClean = " & doClean & "</p>")
call getTables()

function getTables()
    'Response.Write("test()<br>")
    Dim db, rs, sql, count, tbl, infectedTblCount
    infectedTblCount = 0

    sql = "select table_name as Name from INFORMATION_SCHEMA.Tables where TABLE_TYPE ='BASE TABLE'"
    set db = Connect()
    set rs = ExecuteRS(db, sql)
    
    count = 1
    do while not rs.eof
        tbl = rs("Name")
            Response.Write("Table " & count & ". " & tbl & "<br>")
            if getTableColumns(tbl) > 0 then infectedTblCount = infectedTblCount + 1
            count = count + 1
        rs.MoveNext()
    loop
    
    Response.Write("<p>" & infectedTblCount & " tables are infected.</p>")

    call rs.close()
    set rs = nothing
    call db.close()
    set db = nothing    
end function


function getTableColumns(tbl)
    Dim db, rs, sql, col, str, chk

    sql = "select column_name as Name from INFORMATION_SCHEMA.COLUMNS where TABLE_name ='" & tbl & "'"
    set db = Connect()
    set rs = ExecuteRS(db, sql)
    
    count = 0 ' count of infected columns.
    
    do while not rs.eof
        col = rs("Name")
        chk = checkTblColumn(tbl, col)
        str = str & "<li>" & col & chk & "</li>"
        if len(chk) > 0 then 
            count = count + 1
            if doClean then str = str & cleanTblColumn(tbl, col)
        end if
        rs.MoveNext()
    loop
    str = "<ol>" & str & "</ol>"
    Response.Write(str)
    
    getTableColumns = count

    call rs.close()
    set rs = nothing
    call db.close()
    set db = nothing
end function


function checkTblColumn(tbl, col)
    Dim db, rs, sql, val, str

    sql = "select " & col & " as Name from " & tbl
    set db = Connect()
    set rs = ExecuteRS(db, sql)
    
    count = 0   
    str = ""
    do while not rs.eof
        val = rs("Name")
        if InStr(1, val, signature) > 0 then
            str = str & ("<li>Infected row: " & encodeStr(val) & "</li>")
            count = count + 1
        end if
        rs.MoveNext()
    loop
    if count > 0 then
        str = "<font color='red'>" & count & " rows infected</font>" & str
        str = "<ol>" & str & "</ol>"
        'Response.Write(str)
        checkTblColumn = str
    else
        checkTblColumn = ""
    end if

    call rs.close()
    set rs = nothing
    call db.close()
    set db = nothing
end function


function cleanTblColumn(tbl, col)
    Dim db, rs, sql, val, str

    set db = Connect()
    set rs = Server.CreateObject("ADODB.Recordset")
    rs.Open tbl, db, 1, 2, adCmdTableDirect
    
    count = 0   
    str = ""
    do while not rs.eof
        val = rs(col)
        if InStr(1, val, signature) > 0 then
            rs(col) = replace(val, full_signature, "")
            rs.Update()
            count = count + 1
        end if
        rs.MoveNext()
    loop
    if count > 0 then
        str = "<font color='red'>CLEANED " & count & " rows infected</font>" & str
        cleanTblColumn = str
    else
        cleanTblColumn = ""
    end if

    call rs.close()
    set rs = nothing
    call db.close()
    set db = nothing
end function


function encodeStr(s)
    s = replace(s, "<", "&lt;")
    s = replace(s, ">", "&gt;")
    encodeStr = s
end function

BTW, this is a wiki article on SQL injection. Search on google for SQL injection would bring up much more articles. Good and important to know for web developers, for security concern.

Thursday, September 16, 2010

Book: Apache Jakarta and Beyond

Book: Apache Jakarta and Beyond - A Java programmer's introduction. By Larne Pekowsky. ISBN 0-321-23771-4 QA76.73.J38P44 2004. 2005.

This book introduces a series of Jakarta tools to be used by Java programmers. Including:


- Ant
- Eclipse
- Testing with JUnit
- Testing web sites with HTTPUnit
- Further web testing with Jakarta Cactus
- Stress Testing with Jakarta JMeter
- Simplifying Bean Development with BeanUtils
- Traversing Hierarchical Data with JXPath
- Database tools: 
  Hsqldb, DBCP, OJB
- Logging
  - Java.util.logging
  - Log4j
- Configuring program options
  - Jakarta CLI (Command-Line Interface)
  - Jakarta Digester (XML-based: object stack, element matching patterns, processing rules)
- Working with Text 1: Regular Expressions
- Working with Text 2: Searching  
- Creating office documents with POI
- Scripting
- Tomcat
- The standard tag library
- Struts: application toolkit/web application framework
- Cocoon: provides a complete XML-based publishing suite, for the generation, manipulation and rendering of XML.

Wednesday, August 25, 2010

Design Patterns

Design Patterns - Elements of Reusable Object-Oriented Software. This is a classical book. The authors won ACM 2010 SIGSOFT outstanding research award for their contribution to software engineering for. The four authors are classed the GoF (Gang of Four). The design patterns in their book is called the GoF patterns.

Design patterns are solutions abstracted from repeatedly occurring design problems and can be reused in similar situations. Each has a pattern name, associated problem, solution and consequence.

Some design patterns are bounded to languages features, so is easier to implement in some languages than the others. For example, the Template pattern is easy to do in C++ and Java, since C++ provides template and Java provides generics.

In this book, designed patterns are divided into 3 categories based on purpose. The following notes are extracted from the book.

A. Creational

Class:

1. Factory method
Define an interface for creating an object, but let subclasses decide which class to instantiate. It lets a class defer instantiation to subclasses.

Object:

2. Abstract Factory
Provide an interface for creating families of related or dependent objects w/o specifying their concrete classes.

Isn't this the interface concept in C++/Java? Obviously it's related to polymorphism.

3. Builder
Separate the construction of a complex object from its representation so that the same construction process can create different representations.

4. Prototype
Specify the kinds of objects to create using a prototypical instance, and create new objects by copying this prototype.

5. Singleton
Ensure a class only has one instance, and provide a global point of access to it.

B. Structural

Class:

6. Adapter (class)
Convert the interface of a class into another interface clients expect. It lets classes work together that couldn't otherwise because of incompatible interfaces.

Object:

7. Adapter (object)

8. Bridge
Decouple an abstraction from its implementation so that the two can vary independently.

This is the abstraction and encapsulation principles of OOP.

9. Composite
Compose objects into tree structures to represent part-whole hierarchies. It lets clients treat individual objects and compositions of objects uniformly.

OOP uses inheritance for IS-A relationship, and uses composition for HAS-A relationship.

10. Decorator
Attach additional responsibilities to an object dynamically. It provides a flexible alternative to subclassing for extending functionality.

11. Facade
Provide a unified interface to a set of interfaces in a subsystem. Facade defines a higher-level interface that makes the subsystem easier to use.

12. Flyweight
Use sharing to support large numbers of fine-grained objects efficiently.

13. Proxy
Provide a surrogate or placeholder for another object to control access to it.

C. Behavioral

Class:

14. Interpreter
Given a language, define a representation for its grammar along with an interpreter that uses the representation to interpret sentences in the language.

Sounds related to compiler/interpreter.

15. Template method
Define the skeleton of an algorithm in an operation, deferring some steps to subclasses. Template method lets subclasses redefine certain steps of an algorithm w/o changing the algorithm's structure.

Object:

16. Chain of Responsibility
Avoid coupling the sender of a request to its receiver by giving more than one object a chance to handle the request. Chain the receiving objects and pass the request along the chain until an object handles it.

One example to this is to catch a series of exceptions in C++/Java.

17. Command
Encapsulate a request as an object, thereby letting you parameterize clients with different requests, queue or log requests, and support undoable operations.

18. Iterator
Provide a way to access the elements of an aggregate object sequentially w/o exposing its underlying representation.

This occurs abundantly in C++/Java.

19. Mediator
Define an object that encapsulates how a set of objects interact. It promotes loose coupling by keeping objects from referring to each other explicitly, and it lets you vary their interaction independently.

20. Memento
W/o violating encapsulation, capture and externalize an object's internal state so that the object can be restored to this state later.

21. Observer
Define a one-to-many dependency between objects so that when one object changes state, all its dependents are notified and updated automatically.

For example, Java uses wait(), notify() and notifyAll() methods for threads communication.

22. State
Allow an object to alter its behavior when its internal state changes. It will appear to change it class.

One example for this is the workflow state management as in my work.

23. Strategy
Define a family of algorithms, encapsulate each one, and make them interchangeable. It lets the algorithm vary independently from clients that use it.

24. Visitor
Represent an operation to be performed on the elements of an object structure. Visitor lets you define a new operation w/o changing the classes of the elements on which it operates.

The famous MVC is not included in list because it's a combination of multiple patterns. Use Smalltalk MVC as an example. The VC relationship uses the Strategy pattern. MVC also uses Factory method to specify the default controller class for a view, and Decorator pattern to add scrolling to a view.

A new comer at OOD can start with the simplest and most common patterns:
Creational: Abstract Factory, Factory
Structural: Adapter, Composite, Decorator
Behavioral: Observer, Strategy, Template(! Yeah, this is behavioral, not structural)

Seems like I already had experience with at least these design patterns:
Creational: Singleton, Factory
Structural: Adapter, Bridge, Composite
Behavioral: Interpreter, Template, Chain of Responsibility, Iterator, Observer, State

[1] Amazon: Design patterns. By Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides. QA 76.64 .D47 1994.
[2] Wiki - design patterns. Short but comprehensive list of design patterns from different sources.
[3] Design pattern implementation in C# and VB.NET. Good link with real code examples. E.g., load balancer using the Singleton pattern.

Tuesday, August 24, 2010

Tips on IMS development

Here are some tips on information management system web application development from my work. Most are platform independent and can be easily applied to different platforms and technologies. They can also be applied to non-web applications, and are actually general design tips.

Menu. The menu can be dynamically generated from database. This allows flexible configuration and update. If there are multiple roles, each can be configured to draw from the pool of menu items. For this first get a javascript menu template, then create it dynamically. For better performance and void regenerate it each time, such menu can be generated once at creation time, then be stored for later use.
Roles and permissions. There can be many roles. The permission can be configured this way: for each item that needs permission control, say we need read/write/delete, then can use a string such as "742172". Here each digit stands for the permission setting of one role. The value of a digit is the combined sum of 4/2/1, which stands for read/write/delete, similar to unix.
Workflow. A table can be used for this, which records stage and stage transition settings. Each role may have different permissions at different stages. Such a table can easily store these information.
Email. If there are emails to be sent, an admin interface can be provided to configure the subject, body and other fields of emails. For specific information, provide some macros that the administrator can use, which will be replaced by program later.
HTML Form. Can create a self-defined framework, instead of using that of such as .NET's control. This allows more flexibility, and can be more efficient since .NET's control may keep more information than necessary. To store the state/parameters, such as when you need to ask for confirmation of user, use hidden HTML control.
HTML Form fields. The form fields settings can be managed in the database. So there is no need to go to code for update. Everything can be configured from database. Like label management below, adding a new field can automatically insert itself to a table in database with default settings. Given each form and each form field (if needed) a class name, and control display properties from CSS.
Label management. An admin interface can be provided for this. When a tag is put on a page, it can automatically check if it's in database already, if not then insert itself in as a unique tag name. This allows domain experts to change it, and frees programmers from such ad-hoc, trivial but time-consuming and distracting chores.
The use of coding. Any systematic categorizing paradigms can be implemented by coding for ease of control and ease of versioning. Say we can use the starting code of 001 to stand for version 1, 002 to stand for version 2, etc. This can apply to many things and is highly useful.
Use classes as much as possible, decouple business logic from interface files.

Sunday, August 22, 2010

Computer Systems: A Programmer’s Perspective

Computer Systems: A Programmer’s Perspective. 1st edition (August 23, 2002). 2nd edition (February 14, 2010). By R. E. Bryant, D. R. O’Hallaron.

This books introduces computer systems in a broad way, including operating system, architecture, compiler, network and more. It is a good reference for system programmers to improve their understanding on computer systems and improve programming skills.

Chapter 1. Introduction

- Compilation
- Processors Read and Interpret Instructions Stored in Memory
  Hardware Organization of a System.
- Cache
  - cache hierarchy (registers, on-chip L1 cache, off-chip L2 cache, memory, local disk, remote storage)
- OS: process, thread, virtual memory, files
- Inter system communication by network.

Part I Program Structure and Execution

Chapter 2 Representing and Manipulating Information
Chapter 3 Machine-Level Representation of C Programs
Chapter 4 Processor Architecture 

Chapter 5 Optimizing Program Performance

5.1 Capabilities and Limitations of Optimizing Compilers
5.2 Expressing Program Performance
5.3 Program Example
5.4 Eliminating Loop Inefficiencies
5.5 Reducing Procedure Calls
5.6 Eliminating Unneeded Memory References
5.7 Understanding Modern Processors
5.7.1 Overall Operation
5.7.2 Functional Unit Performance
5.7.3 A Closer Look at Processor Operation
- Translating Instructions into Operations
- Processing of Operations by the Execution Unit
- Scheduling of Operations with Unlimited Resources
- Scheduling of Operations with Resource Constraints
5.8 Reducing Loop Overhead
5.9 Converting to Pointer Code
5.10 Enhancing Parallelism
5.10.1 Loop Splitting
5.10.2 Register Spilling
5.10.3 Limits to Parallelism
5.11 Putting it Together: Summary of Results for Optimizing Combining Code
5.11.1 Floating-Point Performance Anomaly
5.11.2 Changing Platforms
5.12 Branch Prediction and Misprediction Penalties
5.13 Understanding Memory Performance
5.13.1 Load Latency
5.13.2 Store Latency
5.14 Life in the Real World: Performance Improvement Techniques
5.15 Identifying and Eliminating Performance Bottlenecks
5.15.1 Program Profiling
5.15.2 Using a Profiler to Guide Optimization
5.15.3 Amdahl’s Law
5.16 Summary

Chapter 6 The Memory Hierarchy

- A memory system is a hierarchy of storage devices with different capacities, costs, and access times.
- SRAM, DRAM, disk. 
- Storage technologies
  - Random-access memory
    - SRAM - bistable memory cell.
    - DRAM - capacitors.
    - Nonvolatile memory: ROM, PROM, EPROM, EEPROM.
    - Accessing main memory.
  - Disk storage
    - geometry, capacity, operation (seek time, rotational latency, transfer time),
      logical disk blocks, accessing disks
  - Storage technology trends.
    - price/performance trade-off
- Locality: temporal/spatial locality.
  - Locality of references to program data.
  - Locality of Instruction Fetches.
  - Summary
- The memory hierarchy
  + Different storage technologies have widely different access times.
  + Well-written programs tend to exhibit good locality.
  - Caching in the Memory Hierarchy
    - Cache hits
    - Cache misses
    - Cache Management
  - Summary of Memory Hierarchy Concepts
    - Exploiting temporal/spatial locality
- Cache Memories
  - CPU registers, L1 cache, L2 cache, main DRAM memory, and disk storage
- Writing Cache-friendly Code
  + Make the common case go fast.
  + Minimize the number of cache misses in each inner loop.
- Putting it Together: The Impact of Caches on Program Performance
  - The Memory Mountain
  - Rearranging Loops to Increase Spatial Locality
  - Using Blocking to Increase Temporal Locality
- Summary

Part II Running Programs on a System 

Chapter 7 Linking
Chapter 8 Exceptional Control Flow
Chapter 9 Measuring Program Execution Time

Chapter 10 Virtual Memory

10.1 Physical and Virtual Addressing
10.2 Address Spaces
10.3 VM as a Tool for Caching
+ Set of virtual pages: Unallocated/Cached/Uncached
10.3.1 DRAM Cache Organization
10.3.2 Page Tables
10.3.3 Page Hits
10.3.4 Page Faults
10.3.5 Allocating Pages
10.3.6 Locality to the Rescue Again
10.4 VM as a Tool for Memory Management
10.4.1 Simplifying Linking
10.4.2 Simplifying Sharing
10.4.3 Simplifying Memory Allocation
10.4.4 Simplifying Loading
10.5 VM as a Tool for Memory Protection
10.6 Address Translation
10.6.1 Integrating Caches and VM
10.6.2 Speeding up Address Translation with a TLB
10.6.3 Multi-level Page Tables
10.6.4 Putting it Together: End-to-end Address Translation
10.7 Case Study: The Pentium/Linux Memory System
10.8 Memory Mapping
10.8.1 Shared Objects Revisited
10.8.2 The fork Function Revisited
10.8.3 The execve Function Revisited
10.8.4 User-level Memory Mapping with the mmap Function
10.9 Dynamic Memory Allocation
10.9.1 The malloc and free Functions
10.9.2 Why Dynamic Memory Allocation?
10.9.3 Allocator Requirements and Goals
- Requirements:
  - Handling arbitrary request sequences
  - Making immediate responses to requests
  - Using only the heap.
  - Aligning blocks (alignment requirement).
  - Not modifying allocated blocks.
- Goals:
  - Goal 1: Maximizing throughput.
  - Goal 2: Maximizing memory utilization.
10.9.4 Fragmentation
- internal fragmentation: 
  occurs when an allocated block is larger than the payload.
  e.g. to satisfy memory alignment.
- external fragmentation: 
  there is enough aggregate free memory to satisfy an 
  allocate request, but no single free block is large enough
- External fragmentation is much more difficult to quantify than internal
  fragmentation
10.9.5 Implementation Issues
10.9.6 Implicit Free Lists
10.9.7 Placing Allocated Blocks
10.9.8 Splitting Free Blocks
10.9.9 Getting Additional Heap Memory
10.9.10 Coalescing Free Blocks
10.9.11 Coalescing with Boundary Tags
10.9.12 Putting it Together: Implementing a Simple Allocator
10.9.13 Explicit Free Lists
10.9.14 Segregated Free Lists
10.10 Garbage Collection
10.10.1 Garbage Collector Basics
- A garbage collector views memory as a directed reachability graph.
- The nodes of the graph are partitioned into 1) root nodes and 2) heap nodes.
10.10.2 Mark&Sweep Garbage Collectors
10.10.3 Conservative Mark&Sweep for C Programs
10.11 Common Memory-related Bugs in C Programs
10.11.1 Dereferencing Bad Pointers
10.11.2 Reading Uninitialized Memory
10.11.3 Allowing Stack Buffer Overflows
10.11.4 Assuming that Pointers and the Objects they Point to Are the Same Size
10.11.5 Making Off-by-one Errors
10.11.6 Referencing a Pointer Instead of the Object it Points to
10.11.7 Misunderstanding Pointer Arithmetic
10.11.8 Referencing Non-existent Variables
10.11.9 Referencing Data in Free Heap Blocks
10.11.10 Introducing Memory Leaks
10.12 Summary

Part III Interaction and Communication Between Programs

Chapter 11 Concurrent Programming with Threads

- A thread is a unit of execution, associated with a process, with its own 
  thread ID, stack, stack pointer, program counter, condition codes, and 
  general-purpose registers.
11.1 Basic Thread Concepts
- process context: 1) program context, 2) kernel context.
11.2 Thread Control
- Pthreads defines about 60 functions
11.2.1 Creating Threads
11.2.2 Terminating Threads
11.2.3 Reaping Terminated Threads
11.2.4 Detaching Threads
11.3 Shared Variables in Threaded Programs
11.3.1 Threads Memory Model
11.3.2 Mapping Variables to Memory
11.3.3 Shared Variables
11.4 Synchronizing Threads with Semaphores
11.4.1 Sequential Consistency
11.4.2 Progress Graphs
11.4.3 Protecting Shared Variables with Semaphores
11.4.4 Posix Semaphores
11.4.5 SignalingWith Semaphores
11.5 Synchronizing Threads with Mutex and Condition Variables
11.5.1 Mutex Variables
11.5.2 Condition Variables
11.5.3 Barrier Synchronization
11.5.4 Timeout Waiting
11.6 Thread-safe and Reentrant Functions
- A function is thread-safe if and only if it will always produce correct results 
  when called repeatedly within multiple concurrent threads.
- Four (non-disjoint) classes of thread-unsafe functions:
  1) Failing to protect shared variables.
  2) Relying on state across multiple function invocations.
  3) Returning a pointer to a static variable.
  4) Calling thread-unsafe functions.
11.6.1 Reentrant Functions
- Property: do not reference any shared data when called by multiple threads.
- Reentrant functions are typically more efficient than non-reentrant
  thread-safe functions because they require no synchronization operations.
- Reentrant functions is a subset of thread-safe functions.
11.6.2 Thread-safe Library Functions
11.7 Other Synchronization Errors
11.7.1 Races
11.7.2 Deadlocks
11.8 Summary

Chapter 12 Network Programming

12.1 Client-Server Programming Model
- The fundamental operation in the client-server model is the transaction.
12.2 Networks
12.3 The Global IP Internet
12.3.1 IP Addresses
12.3.2 Internet Domain Names
12.3.3 Internet Connections
12.4 Unix file I/O
12.4.1 The read and write Functions
12.4.2 Robust File I/OWith the readn and writen Functions
12.4.3 Robust Input of Text Lines Using the readline Function
12.4.4 The stat Function
12.4.5 The dup2 Function
12.4.6 The close Function
12.4.7 Other Unix I/O Functions
12.4.8 Unix I/O vs. Standard I/O
12.5 The Sockets Interface
12.5.1 Socket Address Structures
12.5.2 The socket Function
12.5.3 The connect Function
12.5.4 The bind Function
12.5.5 The listen Function
12.5.6 The accept Function
12.5.7 Example Echo Client and Server
12.6 Concurrent Servers
12.6.1 Concurrent Servers Based on Processes
12.6.2 Concurrent Servers Based on Threads
12.7 Web Servers
12.7.3 HTTP Transactions
12.8 Putting it Together: The TINY Web Server
12.9 Summary

Appendix A Error handling
Appendix B Solutions to Practice Problems

Friday, August 20, 2010

memcached

Memcached - distributed key-value pair memory caching system. Very easy to use.

Download:
- http://memcached.org/: C source code.
- NorthScale Membase server - based on memcached: server for various platforms. Clients can be C, .NET, Java, Perl, PHP, Ruby.

Wednesday, August 18, 2010

JEE

It's been some time since I used J2EE in class. Well it's called JEE nowadays. Today reviewed JEE tools.

Tutorials: 
- Spring Tutorial. (Took about 5 hours to finish, spreaded into 3 days)
- Hibernate Tutorial.
- Maven Tutorial. (Took half an hour to finish)
- Another Maven Tutorial.
- Oracle's JEE 5 tutorial

The environment is:
- OS: Windows XP
- Java: 1.60
- Ant: 1.7 (http://ant.apache.org/manual/index.html)
  [Need to setup environment variables ANT_HOME and Java_HOME]
- Apache Tomcat: 6.0.29 (http://tomcat.apache.org/download-60.cgi#6.0.29)
  Install as a service. (32-bit/64-bit Windows Service Installer). Use port 8080.
  Local site: http://localhost:8080
  Homepage root is: $CATALINA_HOME/webapps/ROOT/index.html
  $CATALINA_HOME is installation site of Tomcat.
- Spring: 2.5.0
  Most recent version is at http://www.springsource.org/download
  But I need version 2.5.0, which can be downloaded from 
  https://olex.openlogic.com/packages/spring
  This contains many other utilities. The most recent version package does not.
- Maven: 2.2.1. Obtained here. [Need to set up M2_HOME]
- Eclipse: Java EE IDE for web developers. (http://www.eclipse.org/downloads/)

A little note:
- Step 5.5 code for JdbcProductDaoTests.java should include the following two imports to function:
  import org.springframework.test.AbstractTransactionalDataSourceSpringContextTests;
  import springapp.domain.Product;
  Here the first imported class is in library spring-test.jar. 
  If eclipse can't find this then manually add it to properties->Java Build Path->Libraries.

Some concepts:
- SSH: Struts, Spring, Hibernate.
- EJB3.0+struts2+iBATIS
- jsp+Servlet+jdbc is the basics for all Java frameworks.
- jQuery: light-weighted javascript framework.
- Dojo, extjs: javascript frameworks. Used with JSF.
- SOA: Service Oriented Application
- Apache lucene: text search engine library written in Java
- Nutch: Open-source web-search software, built on Lucene Java.
- Webwork: Java web-application development framework.
- NoSQL
- Local search. e.g. foursquare

Some thoughts so far:
- A lot of work for simple application in Spring.
- JEE tools apply software engineering concepts, that may be the reason 
  it's said to be suitable for large web application development. On the 
  other hand this makes it heavy-weighted. Setup environment and getting 
  familiar with the tools cost lots of time. On this, .NET, PHP, ASP are
  easier to catch up.
- Eclipse eats big memory (over 200MB). 
- Maven is like Ant.

Thursday, August 12, 2010

CGI in C

Write CGI in C was a primitive idea. It's nonetheless interesting to know how it works.

It can be done by renaming the compiled executable with the suffix .cgi (this suffix needs be setup in Apache config file). Then put this into the server's cgi-bin folder. Visit http://server/cgi-bin/*.cgi should do.

See this short introduction on Getting Started with CGI Programming in C.

Thursday, July 29, 2010

Summary on web app layers and languages

Web Languages: Decoded (Chinese version)

This article is a summary on web application layers and languages. Not totally accurate in some places. But is a good summary anyways.

Tuesday, July 27, 2010

Drupal Customization - Add New tab to a page

In a Drupal content page, when the user is logged in, on the top of the page there are "View" and "Edit" link buttons for the user to view and edit the content of the page. Now we want to add a "New" link button to allow the user create a new page of this type. This can be done this way.

In "themes -> [theme folder] -> page.tpl.php", we have these two lines:

<?php if ($title): print '<h2'. ($tabs ? ' class="with-tabs"' : '') .'>'. $title .'</h2>'; endif; ?>
<?php if ($tabs): print '<ul class="tabs primary">'. $tabs . '</ul></div>'; endif; ?>

The first line is for the title. The second line is for the tabs "View" and "Edit". So we change the second line to be:

<?php if ($tabs): print '<ul class="tabs primary">'. $tabs . print_custom_tab($node) . '</ul></div>'; endif; ?>

And define the print_custom_tab() function at the end of the page:

<?php
function print_custom_tab($node) {
  // Use this to limit the "New" tab to certain page types.
  //if (substr($node->type, 0) == "[page type name]") 
  {
    return "<li><a href='?q=node/add/" . str_replace("_", "-", $node->type) . "'>New</a></li>";
  }
}
?>

From the $node we can also get $node->uid and other variables associated with node, and can do many other things this way.

Alternatively or in a cleaner way, there is an Add another module that do this. It needs to be configured on what page type and when to show. The configuration is done in "Administer -> Site configuration -> Add another".

Monday, July 26, 2010

C# read Excel, get Access schema

///
/// Reference: How To Open and Read an Excel Spreadsheet into a ListView in .NET
///
public static void readExcel(string fullpath) {
  Excel.Application excelObj = new Excel.Application();
  if (excelObj == null) {
    MessageBox.Show("Error: Excel cannot be started.");
    return;
  }

  excelObj.Visible = false; 

  Excel.Workbook theWorkbook = excelObj.Workbooks.Open(fullpath, 0, true, 5, "", "",
    true, Excel.XlPlatform.xlWindows, "\t", false, false, 0, false, false, false);
  // get the collection of sheets in the workbook
  Excel.Sheets sheets = theWorkbook.Worksheets;
  // get the first and only worksheet from the collection of worksheets
  Excel.Worksheet worksheet = (Excel.Worksheet)sheets.get_Item(1);
  
  // getExcelSize(worksheet); // get row/col: ws.Rows.Count, ws.Columns.Count

  // loop through 10 rows of the spreadsheet and place each row in the list view
  for (int i = 1; i <= 10; i++)
  {
    Excel.Range range = worksheet.get_Range("A"+i.ToString(), "FB" + i.ToString());
    System.Array myvalues = (System.Array)range.Cells.Value2;
    string str = ConvertArrayToString(myvalues);
    MessageBox.Show(str);
  }
}


///
/// Reference: How to read an Excel file with OleDb and a simple SQL query? 
/// To use this, the Excel version should be 8.0 or compatible.
/// 
/// The link http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=107963&SiteID=1
/// says that to use this method, it takes a little setup in your Excel document. 
/// Basically, you need to define "named objects" in Excel that are synonymous to 
/// tables in a database. The first row of the named object are the column headers. 
/// To set up a named object, first select the range of cells (your "table," with 
/// the first row being the column headers), then go to menu Insert->Names->Define. 
/// Name your object and press "Add." Now you have an object which can be read by 
/// ADO.NET.
///
public static void readExcel2(string fullpath) {
  String sConnectionString = 
    "Provider=Microsoft.Jet.OLEDB.4.0;" +
    "Data Source=" + fullpath + ";" + "Extended Properties=\"Excel 8.0;HDR=NO;IMEX=1\"";
    MessageBox.Show(sConnectionString);

  try 
  {
    OleDbConnection objConn = new OleDbConnection(sConnectionString);
    objConn.Open();

    string sheetName = filename.Substring(0, filename.IndexOf(".xls"));
    MessageBox.Show("sheetName: " + sheetName);

    //OleDbCommand objCmdSelect =new OleDbCommand("SELECT * FROM [Sheet1$]", objConn);
    OleDbCommand objCmdSelect =new OleDbCommand("SELECT * FROM [" + sheetName + "$]", objConn);
    OleDbDataAdapter objAdapter1 = new OleDbDataAdapter();
    objAdapter1.SelectCommand = objCmdSelect;
    DataSet objDataset1 = new DataSet();
    objAdapter1.Fill(objDataset1);

    string str = objAdapter1.ToString();
    MessageBox.Show(str);

    objConn.Close();
  } 
  catch (Exception e) {
    MessageBox.Show("error: " + e.Message);
  }
}

///
/// Get Access Schema.
///
public void getSchema(string path) 
{
  int i, j;
  string result = "";
  DataTable userTables = null;
  DataTable userTable = null;
  System.Windows.Forms.TreeNode node = null;
  System.Windows.Forms.TreeNode[] nc;

  try 
  {
    OleDbConnection conn = new OleDbConnection();
    conn.ConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + path;
    conn.Open();

    userTables = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, 
                   new object[] {null, null, null, "TABLE"});

    this.treeViewDBSchema.Nodes.Clear();
    // Add list of table names to listBox
    for (i=0; i < userTables.Rows.Count; i++) 
    {
        result += userTables.Rows[i][2].ToString() + " { ";
        userTable = conn.GetOleDbSchemaTable(OleDbSchemaGuid.Columns,
            new object[] {null, null, userTables.Rows[i][2].ToString(), null});

        nc = new TreeNode[userTable.Rows.Count];
        for (j = 0; j < userTable.Rows.Count - 1; j ++) 
        {
            result += userTable.Rows[j][3].ToString() + ", "; // [2].ToString();
            nc[j] = new TreeNode("Field: " + userTable.Rows[j][3].ToString());
            nc[j].Tag = userTable.Rows[j][3].ToString();
        }
        // for the last item.
        result += userTable.Rows[j][3].ToString(); // [2].ToString();
        nc[j] = new TreeNode("Field: " + userTable.Rows[j][3].ToString());
        nc[j].Tag = userTable.Rows[j][3].ToString();

        result += " }\n";
        node = new TreeNode("Table: " + userTables.Rows[i][2].ToString(), nc);
        node.Tag = userTables.Rows[i][2].ToString();
        this.treeViewDBSchema.Nodes.Add(node);
    }
    conn.Close();
    //MessageBox.Show(this, result);
    this.frmMain.setOutput(result);
  } 
  catch (Exception ex) 
  {
    MessageBox.Show(this, "Error: " + ex.Message);
  } 
}

C# work with local IP address

/// 
/// Ref: How To Get IP Address Of A Machine
/// Ref: WMI or How to change my IP address
/// using System.Management;
///

public string getLocalIP() 
{
  string localIP = "";
  string strHostName = Dns.GetHostName();

  IPHostEntry ipEntry = Dns.GetHostByName (strHostName);
  IPAddress [] addr = ipEntry.AddressList;

  /* what if more than one instance exists? */
  for (int i = 0; i < addr.Length; i++)  
  {
    localIP = addr[i].ToString();
  }

  return localIP;
}


public bool setLocalIP(string newIP) {
  try 
  {
    ManagementBaseObject inPar = null;
    ManagementBaseObject outPar = null;
    ManagementClass mc = new ManagementClass("Win32_NetworkAdapterConfiguration");
    ManagementObjectCollection moc = mc.GetInstances();
    foreach (ManagementObject mo in moc) 
    {
      if (!(bool) mo["IPEnabled"]) continue;
      inPar = mo.GetMethodParameters("EnableStatic");
      inPar["IPAddress"] = new string[] {newIP};
      inPar["SubnetMask"] = new string[] {subnetMask};
      outPar = mo.InvokeMethod("EnableStatic", inPar, null);
      break;
    }
  
    this.currentIP = newIP;
    return true;
  } 
  catch (Exception e) 
  {
    this.setErrMsg(e.Message + "\nsource: " + e.Source);
    return false;
  }
}


static void SwitchToDHCP()
{
  ManagementBaseObject inPar = null;
  ManagementBaseObject outPar = null;
  ManagementClass mc = new ManagementClass("Win32_NetworkAdapterConfiguration");
  ManagementObjectCollection moc = mc.GetInstances();
  foreach( ManagementObject mo in moc )
  {
    if( ! (bool) mo["IPEnabled"] )
      continue;

    inPar = mo.GetMethodParameters("EnableDHCP");
    outPar = mo.InvokeMethod( "EnableDHCP", inPar, null );
    break;
  }
}


static void SwitchToStatic()
{
  string newIP;
  string subnetMask = "255.255.255.0";
  newIP = (curIP.Equals("192.168.168.209"))?"192.168.168.204":"192.168.168.209";

  ManagementBaseObject inPar = null;
  ManagementBaseObject outPar = null;
  ManagementClass mc = new ManagementClass("Win32_NetworkAdapterConfiguration");
  ManagementObjectCollection moc = mc.GetInstances();
  foreach( ManagementObject mo in moc )
  {
    if( ! (bool) mo[ "IPEnabled" ] )
      continue;

    inPar = mo.GetMethodParameters( "EnableStatic" );
    inPar["IPAddress"] = new string[] { newIP };
    inPar["SubnetMask"] = new string[] { subnetMask };
    outPar = mo.InvokeMethod( "EnableStatic", inPar, null );
    break;
  }
}


static void ReportIP()
{
  Console.WriteLine( "****** Current IP addresses:" );
  ManagementClass mc = new ManagementClass("Win32_NetworkAdapterConfiguration");
  ManagementObjectCollection moc = mc.GetInstances();
  foreach( ManagementObject mo in moc )
  {
    if( ! (bool) mo[ "IPEnabled" ] )
      continue;

    Console.WriteLine( "{0}\n SVC: '{1}' MAC: [{2}]", (string) mo["Caption"],
      (string) mo["ServiceName"], (string) mo["MACAddress"] );

    string[] addresses = (string[]) mo[ "IPAddress" ];
    string[] subnets = (string[]) mo[ "IPSubnet" ];

    Console.WriteLine( " Addresses :" );
    foreach(string sad in addresses)
      Console.WriteLine( "\t'{0}'", sad );

    Console.WriteLine( " Subnets :" );
    foreach(string sub in subnets )
      Console.WriteLine( "\t'{0}'", sub );

    curIP = addresses[0];
  }
}

C# GET/POST request, login, download, axWebBrowser

///
/// Example of HTTP Get request.
///
private string requestURL(string url, int timeout) 
{
  HttpWebRequest wr;
  HttpWebResponse resp = null;
  Stream stream;
  StreamReader reader;
  string strResponse = "";

  try {
    wr = (HttpWebRequest) WebRequest.Create(url);
    wr.Timeout = timeout; // in milliseconds.
    resp = (HttpWebResponse) wr.GetResponse();
    stream = resp.GetResponseStream();
    reader = new StreamReader(stream);
    try { strResponse = reader.ReadToEnd(); } 
    finally { reader.Close(); }
    resp = null;
  }
  catch (Exception ex) { 
    this.output_response("::" + ex.Message); 
  }
  finally { 
    if (resp != null) resp.Close();
  }
  return strResponse;
}


///
/// Example of HTTP Post request.
/// Call the following function: e.g. 
/// string html = HttpPost("http://abcde.com", "a=1&b=2");
///
private string HttpPost(string URI, string Parameters) 
{
  WebRequest req = WebRequest.Create(URI);
  //req.Proxy = new System.Net.WebProxy(ProxyString, true);

  req.ContentType = "application/x-www-form-urlencoded";
  req.Method = "POST";

  byte [] bytes = System.Text.Encoding.ASCII.GetBytes(Parameters);
  req.ContentLength = bytes.Length;

  Stream os = req.GetRequestStream ();
  os.Write (bytes, 0, bytes.Length); 
  os.Close ();

  WebResponse resp = req.GetResponse();
  if (resp== null) return null;

  StreamReader sr = new StreamReader(resp.GetResponseStream());
  return sr.ReadToEnd().Trim();
}


///
/// Example of using HTTP Post to log into a site.
///
private void login() {
  HTMLDocument myDoc = new HTMLDocumentClass();
  myDoc = (HTMLDocument) axWebBrowser1.Document;

  try 
  {
    HTMLInputElement oUser = null, oPass = null;

    oUser = (HTMLInputElement) myDoc.all.item("username", 0);
    oPass = (HTMLInputElement) myDoc.all.item("password", 0);

    if (oUser == null || oPass == null) return;

    oUser.value = "username_value";
    oPass.value = "password_value";

    HTMLFormElement frm = (HTMLFormElement) myDoc.all.item("login", 0);
    frm.submit();

    this.Task = 1; 
  } 
  catch (Exception ex) 
  {
    this.showInfo("test() error: " + ex.Message);
  }
}


///
/// Example of using HTTP Post to download an Excel file.
///
private void getExcel() {
  HttpWebResponse resp = null;
  Stream stream;
  string filename = this.downloadFolder + "/download.xls";

  try 
  {
    wr = (HttpWebRequest) WebRequest.Create(this.excel_url);
    wr.Method = "POST";
    wr.ContentType = "application/x-www-form-urlencoded";
    wr.ContentLength = byteArray.Length;
    wr.Timeout = 100000; // in ms. Set this bigger than the site timeout value.
    MyUtil.appendLog("wr.contentlength: " + wr.ContentLength);

    wr.CookieContainer = new CookieContainer();
    wr.CookieContainer.SetCookies(new Uri(this.excel_url) , 
        ((mshtml.HTMLDocumentClass) this.axWebBrowser1.Document).cookie.ToString());
    MyUtil.appendLog("cookie: " + 
        ((mshtml.HTMLDocumentClass) this.axWebBrowser1.Document).cookie);

    // Send post request
    Stream dataStream = wr.GetRequestStream();
    dataStream.Write(byteArray, 0, byteArray.Length);
    dataStream.Close();
    resp = (HttpWebResponse) wr.GetResponse();
    resp.Cookies = wr.CookieContainer.GetCookies(resp.ResponseUri);
    MyUtil.appendLog("resp status: " + resp.StatusDescription);

    stream = resp.GetResponseStream();

    //MyUtil.appendLog("resp header: " + resp.ContentType );
    // Header: usually 'txt/html'. here is 'application/vnd.ms-excel'
    StreamReader reader = new StreamReader(stream);
    try 
    {
      MyUtil.saveFile(filename, reader.ReadToEnd());
      this.downloadSucceed = true;
    } 
    catch (Exception ex) 
    { 
      MyUtil.FileDelete(filename);
      this.downloadSucceed = false;
    }
    finally 
    {
      reader.Close();
    }
  } 
  catch (Exception ex) 
  {
    this.downloadSucceed = false;
    MyUtil.FileDelete(filename);
  } 
  finally 
  {
    if (resp != null) { resp.Close(); }
  }
}

///
/// Another way of navigating website: use the axWebBrowser control.
///
this.axWebBrowser1.Navigate(url);

[1] The user32 SendInput windows API. The SendInput function synthesizes 
    keystrokes, mouse motions, and button clicks to the currently active window. 
[2] Capturing binary download via code through axwebbrowser1.

Saturday, July 24, 2010

Per capita revenue of high tech companies

Here is an article on per capita revenue of some high tech companies (In Chinese).

Here's the list of world's largest public and private business's gross annual revenue, number of employees and other information.

It is interesting to see the data.

Tuesday, July 20, 2010

Sams Teach Yourself TCP/IP in 24 Hours

Sams Teach Yourself TCP/IP in 24 Hours (4th Edition) - by Joe Casad, 2008, 456 pages. ISBN-10: 0672329964.

I caught this book in sight at Barns and Noble. It is a short book to read, to the point and clear on basic concepts. Short is good.


I. TCP/IP basics

II. TCP/IP Protocol system

2. Correspondence of TCP/IP model and OSI model.
  TCP/IP                      OSI         Relevant protocols
  ----------------------------------------------------------------------------------
  Application layer           A,P,S          
  Transport layer             T           TCP/UDP
  Internet layer              N           IP, ARP, RARP, ICMP, router(RIP, OSPF etc.)
  Network access layer        D,P         FTS, FDDI, PPP, 802.11, 802.3, 802.16

  note: 802.3 - ethernet, 802.11 - wireless, 802.16 - winmax

3. Network access layer
  - physical addressing.
  - LAN - ethernet.
    CSMA/CS (Carrier Sense Multiple Access with Collision Detect)
    ethernet frame: preamble|dest addr|sr addr|length|data|FCS(CRC)

4. Internet layer
  - IP, IP header
  - IP addressing: class A(8/24), B(16/16), C(24/8), D, E
  - ARP (Address Resolution Protocol)
  - RARP (Reverse ARP)
  - ICMP
  - BGP (Border Gateway Protocol), RIP (Routing Information Protocol)

5. Subnetting & CIDR (Classless Inter-Domain Routing)
  - subnet mask/host ID
  - split and combine networks.

6. Transport layer
  - TCP/UDP
  - ports, sockets
  - multiplexing, demultiplexing.
  - TCP: stream, resequencing, flow control, precedence & security, graceful close.

7. Application layer
  - use socket/port to communicate with transport layer.
  - target of multiplexing/de-multiplexing

III. Networking with TCP/IP

8. Routing
  - routing table
  - IP forwarding
  - Dynamic routing algorithm
    - DV (Distance vector). e.g. RIP
    - LS (Link state). e.g. OSPF (Open Shortest Path First)
      LS is more popular than DV now.
  - Complex network routing
    1) core router - backbone network (GGP)
    2) exterior router - on border of autonomous systems (EGP: e.g. BGP)
    3) interior router - within autonomous system (IGP: OSPF/RIP)
  - classless routing: CIDR.
  - OSPF: implemented as routed on unix/linux, builds SPT (Shortest Path Tree)

10. Firewall
  - DMZ (Demilitarized Zone)
  - rules
  - proxy service
  - reverse proxy

11. Name resolution
  - DNS (Domain Name Server)

IV. TCP/IP utilities.

V. TCP/IP and the internet

18. Email
  - outgoing email: SMTP
  - incoming email: POP3/IMAP

19. Streaming and casting
  - stack:
    RTP
    UDP
    Internet layer
    Network Access layer

VI. Advanced topics

C Traps and Pitfalls

C Traps and Pitfalls - by Andrew Koenig, 1989, 160 pages. ISBN-10: 0201179288.

Andrew Koenig wrote a small reference manual on C programming based on his experience when he worked at AT&T lab. Since it received wide acceptance, he added more material and resulted in this book. It's highly recommended to any serious C programmers. Some contents are kind of outdated, such as those about problems of early day compilers. But many points are still ubiquitously applicable.


- Introduction

- Chapter 1. Lexical pitfalls
  1.3 greedy principle in evaluating statement.
      e.g. a --- b is: a-- - b, 
           y = x/*p is: y=x plus the start of a comment! 
                        Should write as y = x / *p or y = x / (*p).
           a +++++ b is: a++ ++ + b. This has compile error since a++ is not l-value.
      Be careful, if a number starts with 0, it's an octal number.

- Chapter 2. Syntax pitfalls
  2.1 Function declaration.
      This casts address 0 as the pointer to function void f() and call it.
      (* (void (*)()) 0)(); 
  2.2 Operator precedence.
      if (a == 'a' || b = 'b' || c == 'c') {} is equivalent to:
      if ((a == 'a' || b) = ('b' || c == 'c')) {}
  2.6 dangling else.
      if (a) 
        if (b) b ++;
      else { c ++; }
      is equivalent to:
      if (a) {
        if (b) b ++;
        else { c ++; }
      }

Chapter 3. Semantic pitfalls
  3.6 off-by-1 error - is the most prevalent.
      One solution in C: 0-based C array, using asymmetric boundary.
      e.g. x >= 16 && x < 38. 38-16 is exactly the size of the range.
  3.9 Check overflow.
      method 1: if ((unsigned) a + (unsigned) b > INT_MAX) // overflowed
      method 2: if (a > INT_MAX - b) // overflowed

Chapter 4. Linking
  lint.

Chapter 5. Library functions
  5.1 int getchar() - return types does not match function, can cause problem.
  5.2 I/O buffer. 
      setbuf(stdout, (char *) 0); // no buffer
  5.4 errno
  5.5 signal, longjmp

Chapter 6. Preprocessor
  - macro
  - use of -- or ++ can cause unexpected side effects in macro.
  6.3 assert
      #define assert(e) \
              ((void) ((e) || _assert_error(__FILE__, __LINE__)))

Chapter 7. Portability
  7.5 >> and divide by 2. >> is faster and works correctly when var is positive.
      e.g. replace "mid = (left + right) / 2" with "mid = (left + right) >> 1".
  7.11 literal string can represent an array.
      e.g. "0123456789"[n%10] is equivalent to: a[] = "0123456789"; a[n%10];
           This is used in systems where 0-9 may not be consecutive.

Chapter 8. Suggestions and answers.
  - Optimize binary search: 
    1) use >> instead of 1/2, 2) use pointer instead of array indexing.
  - Little/big ending.
  - getchar(), putchar() have both macro and function definitions. macro is faster.
  - atol().

Appendix A
  - printf. %p, %n, %06d is equivalent to %.6d in most cases.
  - varargs
  - stdarg

Author's suggestions on using C++:
  - avoid pointer
  - use library
  - use class

Wednesday, July 7, 2010

Smale's Sphere Eversion Paradox

Interesting video demonstrates the process and explains why.

Tuesday, July 6, 2010

Some algorithm/system resources

- A Collection of Dice Problems. For probability issues.
- Lock free algorithms.
- Sorting and searching
- The Art of Debugging with GDB, DDD, and Eclipse. For debugging multi-threaded applications.
- A Memory Allocator. About implementation of malloc() etc.
- Some Storage Management Techniques for Container Classes. C++ stuff.
- Beej's Guide to Network Programming Using Internet Sockets. One page guide.
- Some interview questions.
- Technical Interview Blog (Java, 2007)

Friday, July 2, 2010

Dynamic Programming

Some links to DP:

- Dynamic Programming (in Chinese)
- Dynamic Programming Practice Problems
- Book chapter on Dynamic Programming

Friday, June 18, 2010

Adding PHP snippets through the Drupal user interface

Enable PHP input filter: Go to Administer -> Site building -> Modules, and check PHP Filter under core - optional. Now when create new content, a PHP option will appear under input formats - select this to paste a PHP code snippet. This opens a powerful door to creating dynamic content on a Drupal site.

See Drupal: A beginner's guide to using snippets.

Often a PHP snippet is written inside a drupal module block. What if you want to access Drupal database from an external PHP file? Below is an example. See here for reference.

<?php
// Root path of drupal site, can be obtained using getcwd().
$path = "/web/1/www.hawaii.edu/drupalc/";
chdir($path);

// include needed files
include_once('includes/bootstrap.inc');
include_once('includes/database.inc');
//include_once('includes/database.mysql.inc'); // Disable this for drupal 6.

// Launch drupal start: configuration and database bootstrap
conf_init();
drupal_bootstrap(DRUPAL_BOOTSTRAP_CONFIGURATION);
drupal_bootstrap(DRUPAL_BOOTSTRAP_DATABASE);

// Page start. Output page as an Excel file download.
header("Content-type: application/vnd.ms-excel");
header("Content-Disposition: attachment; filename=\"excel_download.xls\"");
header("Pragma: no-cache");
header("Expires: 0");

// table header
echo "<table border='1'>";
db_query("SELECT * FROM users"); // access to database.
// more processing ...
echo "</table>";

?>

Friday, June 4, 2010

C#.Net recover password by email

In C#.Net 2005, the asp:PasswordRecovery control by default use User Name to recover password. But sometimes the user may forget user name, and User Email is used instead. For this we can do it this way (from here):

In aspx page:
<asp:PasswordRecovery Id="PasswordRecovery1" runat="server" OnVerifyingUser="PasswordRecovery1_VerifyingUser">
</asp:PasswordRecovery>

In Code behind:
protected void PasswordRecovery1_VerifyingUser(object sender, LoginCancelEventArgs e) {
PasswordRecovery1.UserName = Membership.GetUserNameByEmail(PasswordRecovery1.Us erName);
}

Thursday, June 3, 2010

Javascript resources

- Unbelievably easy javascript for sortable table header columns: Available here.
- Calendar date picker: datepickercontrol. Available here.
- Slider: Easy slider.

Wednesday, June 2, 2010

The Longest Palindrome problem

See here for a linear solution.

Another blog for linear solution.

Another linear solution is to use suffix tree.

More Effective C++

Basics

Pointers and References.
- A reference must be initialized. There is no null reference.
- result of the following is undefined:
char *pc = 0; // set pointer to null
char& rc = *pc; // make reference refer to
- Reference is more efficient, b/c there is no need to test its validity.
- Pointers should generally be tested against NULL first.
- A pointer can be reassigned, but a reference does not change.
- certain operators need to use reference, e.g., [].

Prefer C++-style casts. (over C-style cast: (type) expression)
- static_cast (similar to C-style cast in fxn): static_cast(expression)
- const_cast: cast away the constness or volatileness of an expression. enforced by compiler
- dynamic_cast: perform safe casts down or across an inheritance hierarchy. Failed casts are indicated by a null pointer (when casting pointers) or an exception (when casting references).
- reinterpret_cast: result is implementation-defined. rarely portable.

Never treat arrays polymorphically.
- not able to distinguish between base type and derived type for correct polymorphism.

Avoid gratuitous default constructors.

Operators

Be wary of user-defined conversion functions.
- Single-argument constructors
- Implicit type conversion operators (better avoid)

Distinguish between prefix and postfix forms of increment and decrement operators.
- E.g. Class UPInt. prefix ++ returns reference, postfix ++ returns const object.
- i ++++ is inhibited.
- i ++ is less efficient because it creates a temporary copy of its return value.

Never overload &&, ||, or ,.
- otherwise it'll lose short-circuit semantics, and sequence of evaluation becomes uncertain.

Understand the different meanings of new and delete.
- new operator: 1) allocates memory, 2) calls constructor
- operator new: does memory allocation only. knows nothing about constructor
- placement new
- Deletion and Memory Deallocation
- Array

Exceptions

Use destructors to prevent resource leaks.
- pointer operation may lead to memory leak if exception is thrown
- a possible solution is to use local object instead of pointer, with help of smart_ptr, or STL auto_ptr.

Prevent resource leaks in constructors.
- destructor deletes only FULLY constructed objects. So if an exception is thrown in constructor, destructor won't be called.

Prevent exceptions from leaving destructors.
- if control leaves a destructor due to an exception while another exception is active, C++ terminates the program.
- stack unwinding.

Understand how throwing an exception differs from passing a parameter or calling a virtual function. (more reading needed)

Catch exceptions by reference.
- four standard exceptions: 1) bad_alloc (thrown when operator new (see Item 8) can't satisfy a memory request), 2) bad_cast (thrown when a dynamic_cast to a reference fails; see Item 2), 3) bad_typeid (thrown when dynamic_cast is applied to a null pointer), and 4) bad_exception (available for unexpected exceptions).

Use exception specifications judiciously. (more reading needed)

Understand the costs of exception handling.

Efficiency

Remember the 80-20 rule.

Consider using lazy evaluation. (more reading needed)

Amortize the cost of expected computations.
- over eager evaluation, caching
- prefetching

Understand the origin of temporary objects.

Facilitate the return value optimization.

Overload to avoid implicit type conversions.

Consider using op= instead of stand-alone op.

Consider alternative libraries.

Understand the costs of virtual functions, multiple inheritance, virtual base classes, and RTTI.
- virtual table (vtable)
- RTTI: runtime type identification

Techniques

Virtualizing constructors and non-member functions.

Limiting the number of objects of a class.

Requiring or prohibiting heap-based objects.

Smart pointers.

Reference counting.
- a simple form of garbage collection.

Proxy classes.

Making functions virtual with respect to more than one object.

Miscellany

Program in the future tense.

Make non-leaf classes abstract.

Understand how to combine C++ and C in the same program.
- name mangling
- initialization of statics
- dynamic memory allocation
- data structure compatibility

Familiarize yourself with the language standard.

Recommended Reading

An auto_ptr Implementation

Tuesday, June 1, 2010

Effective C++

Scott Meyers' Effective C++ (1997)

Shifting from C to C++

Prefer const and inline to #define
- Value by #define does not go to symbol table, but processed by preprocessor.
- For const pointer, needs to be like const char * const a = "zzz";
- Class-specific constants: in class declaration: "static const int a;", need to define outside class.

Prefer <iostream> to <stdio.h>
- stdio.h is not type-safe and not extensible.
- e.g. friend ostream& operator<<(ostream& s, const Rational& r);
- iostream is in std (preferred), iostream.h is in global range.

Prefer new/delete to malloc/free
- Reason: malloc/delete do not know constructor/destructor

Prefer C++ style comments (// ...) over C'(/* ... */)
- Note some preprocessors only recognize /* ... */

Memory management

Use the same form in corresponding uses of new and delete.
- Example:
string *stringPtr1 = new string;
string *stringPtr2 = new string[100];
...
delete stringPtr1; // delete an object
delete [] stringPtr2; // delete an array of

Use delete on pointer members in destructors
- Delete a NULL pointer does no harm (free a NULL pointer causes error though)
- One way to avoid using delete is to use smart pointers (e.g. auto_ptr in STL)

Be prepared for out-of-memory conditions (more reading needed..)
- When new fails, it throws an exception std::bad_alloc (in old compilers, it may return NULL)
- assert is a macro. It does not work when NDEBUG is defined.

Adhere to convention when writing operator new and operator delete. (more reading needed..)

Avoid hiding the "normal" form of new.
- Declare a function called "operator new" inside the class would block access to the "normal" form of new.
- Two solutions: 1) overload operator new, 2) provide default value for additional parameters.
- Example:

class X {
public:
  void f();
  static void * operator new(size_t size, new_handler p);
  static void * operator new(size_t size)
  { return ::operator new(size); }
};
X *px1 = new (specialErrorHandler) X;  // calls X::operator new(size_t, new_handler)
X* px2 = new X;                        // calls X::operator new(size_t)

Write operator delete if you write operator new. (new/delete should be paired) (more reading needed..)

Constructors, Destructors and Assignment Operators

Declare a copy constructor and an assignment operator for classes with dynamically allocated memory.

Prefer initialization to assignment in constructors. (more reading needed..)

List members in an initialization list in the order in which they are declared.
- Otherwise there is overhead for compiler to track information.

Make sure base classes have virtual destructors.
- Base class virtual destructor should be used when base class has virtual fxns.
- No virtual fxns in a base class often means it's not suitable to be a base class.
- When need an abstract class, it may be convenient to declare destructor as pure virtual destructor, but then a definition of it should also be defined.

Have operator= return a reference to *this. (more reading needed..)
- So as to be able to chain assignments together (assignment is right associative)

Assign to all data members in operator=. (more reading needed..)

Check for assignment to self in operator=. (more reading needed..)

Classes and Functions: Design and Declaration

Strive for class interfaces that are complete and minimal.

Differentiate among member functions, non-member functions, and friend functions.
- Member fxns can be virtual, non-member fxns can't.
- Operator>> and operator<< are never members.
- Only non-member functions get type conversions on their left-most argument.
- Everything else should be a member function.

Avoid data members in the public interface.
- Use access/inline functions instead of data members.

Use const whenever possible. (more reading needed)
- const value and const pointer
- mutable
- const_cast()

Prefer pass-by-reference to pass-by-value.
- The meaning of passing an object by value is defined by the copy constructor of that object's class. This can be expensive.
- Bad use: Student returnStudent(Student s) { return s; }
- Good use: const Student& returnStudent(const Student& s) { return s; }
- Slicing problem. (Pass by base class type will cut off derived class members.)
- Aliasing.
- Reference is implemented by pointer.

Don't try to return a reference when you must return an object.
- E.g., operator* should return an object instead of reference.

Choose carefully between function overloading and parameter defaulting.
- 2 questions: 1) is there a value you can use for a default? 2) how many algorithms do you want to use?

Avoid overloading on a pointer and a numerical type.

Guard against potential ambiguity. (more reading needed)

Explicitly disallow use of implicitly generated member functions you don't want.
- E.g. operator=

Partition the global namespace. (more reading needed)

Classes and Functions: Implementation

Avoid returning "handles" to internal data.

Avoid member functions that return non-const pointers or references to members less accessible than themselves.

Never return a reference to a local object or to a dereferenced pointer initialized by new within the function.
- e.g. this is bad because the difficulty of applying delete:
inline const Rational& operator*(const Rational& lhs, const Rational& rhs)
{ Rational *result = new Rational(lhs.n * rhs.n, lhs.d * rhs.d); return *result; }

Postpone variable definitions as long as possible.
- instead of "string encrypted; encrypted = password;", use "string encrypted = password;", this avoids calling default constructor on string.

Item 33: Use inlining judiciously.
- inline function used extensively may increase code size a lot.
- inline, like "register", is only a hint to compiler.
- uninlined inline function, in old rule, is treated as static and included into every translation unit.
- library needs careful consideration on inline function, inline functions make it impossible to provide binary upgrades to the inline functions in a library - all clients have to recompile.
- inline function should avoid using static variable.
- most debuggers have problem with inline function.

Minimize compilation dependencies between files.
- should do: replacement of dependencies on class definitions with dependencies on class declarations
- Avoid using objects when object references and pointers will do.
- Use class declarations instead of class definitions whenever you can.
- Don't #include header files in your header files unless your headers won't compile without them.
- Handle/body class, envelope/letter class.

Inheritance and Object-Oriented Design

Make sure public inheritance models "isa."
- isa == public inheritance
- Two other common inter-class relationships are "has-a" and "is-implemented-in-terms-of."

Differentiate between inheritance of interface and inheritance of implementation. (more reading needed)

Never redefine an inherited nonvirtual function.
- nonvirtual - statically bound
- virtual - dynamically bound

Never redefine an inherited default parameter value.
- default parameters are statically bound!

Avoid casts down the inheritance hierarchy.
- down cast: from a base class pointer to a derived class pointer
- down cast leads to a maintenance nightmare
- safe downcasting: by using dynamic_cast

Model "has-a" or "is-implemented-in-terms-of" through layering.
- difference between isa and is-implemented-in-terms-of

Differentiate between inheritance and templates. (more reading needed)

Use private inheritance judiciously. (more reading needed)
- compilers will generally not convert a derived class object into a base class object if the inheritance relationship between the classes is private.
- members inherited from a private base class become private members of the derived class, even if they were protected or public in the base class.
- private inheritance means is-implemented-in-terms-of.
- use layering whenever you can, use private inheritance whenever you must.
- template-induced code bloat. It is not a good thing.

Use multiple inheritance (MI) judiciously. (more reading needed)
- MI leads to many problems, one is ambiguity (diamond inheritance).

Say what you mean; understand what you're saying.

Miscellaneous

Know what functions C++ silently writes and calls.
- default constructor, destructor, assignment/copy constructor, address-of operator, const address-of operator.

Prefer compile-time and link-time errors to runtime errors.

Ensure that non-local static objects are initialized before they're used. (more reading needed)

Pay attention to compiler warnings.

Familiarize yourself with the standard library.

Improve your understanding of C++.

Sunday, May 30, 2010

The Longest Increasing Subsequence problem

The Longest Increasing Subsequence problem can be solved by these methods:

1) Sort the sequence L into L_sorted, then find LCS(L, L_sorted). Since sort takes O(NlogN) time and LCS takes O(N^2) time, total it takes O(N^2) time.
2) There is a DP method [2]. It's O(N^2) and can be improved to O(NlogN) by using binary search.
3) Find the longest path in a directed acyclic graph. The source provides no detail on this solution. But it should be like building a tree with internal links (a DAG). It won't be easy.
4) Patience sort [3]. This is a card game and easy to understand by taking a deck of cards to try out oneself. Similar to the DP method, this is also O(N^2) and can be improved to O(NlogN) by using binary search. I prefer this method, since it's easy to find both the length and one of the LIS sequences.

Patience sort is an interesting sort [4]. This is one case where greedy strategy works optimally. Here are some excerpts from [4] about the history of patience sort:

The name "patience sorting" comes from Mallows (1973), who credits A.S.C. Ross for its invention for the motivation of sorting a deck of cards. Mallow's analysis was done in 1960 but not published until a long time later. Robert Floyd discovered this
independently and exchanged idea on it with Donald Knuth in private letters. Hammersley (1972) independently recognized its use as an algorithm for computing the length of LIS.

A related problem of LIS is LCIS: Longest Continuous Increasing Subsequence. This is much easier. A one-pass traversal of the sequence will do. we just keep 2 variables: the length of the current LCIS, and the ending location of it. At the end we know both the length and the location.

[1] http://en.wikipedia.org/wiki/Longest_increasing_subsequence
[2] http://blog.programfan.com/article.asp?id=13086
[3] http://en.wikipedia.org/wiki/Patience_sort
[4] Longest Increasing Subsequences: From Patience Sorting to the Baik-Dieft-Johansson Theorem. Caroline Gates, D. Aldous, Bull. Amer. Math. Soc., 36:413-32 (1999)

Tuesday, May 25, 2010

ASP upload file size upper limit

When use multipart/form-data type to upload file in ASP, there is a problem of file size limit. On windows server 2003, this limit is about 200KB (204800 bytes). This is not enough since that's really a small file size in today's standards.

The solution on windows server 2003 (IIS 6.0) is to set a bigger value on variable AspMaxRequestEntityAllowed, which is in systemroot\System32\Inetsrv\Metabase.xml. On earlier versions of IIS, the value of AspMaxRequestEntityAllowed can be set in the registry.

Now since big file upload can take long, it may need to set ASP ScriptTimeout value bigger than default (90 seconds). Can do this by "<%Server.ScriptTimeout[=NumSeconds]%>" [3], or in IIS6.0 can set this in IIS manager [4].

References:
[1] What is the limit on Form / POST parameters?
[2] Description of the MaxClientRequestBuffer Registry Value
[3] ASP ScriptTimeout Property
[4] Setting the ASP Script Timeout (IIS 6.0)

Friday, May 21, 2010

10 Papers Every Programmer Should Read

Interesting.

- 10 Papers Every Programmer Should Read (At Least Twice)
- 10 Papers Every Software Architect Should Read (At Least Twice)

Thursday, May 20, 2010

Disk scheduling algorithms

Several common disk scheduling algorithms:

First Come First Served (FCFS)
Shortest Seek Time First (SSTF)
SCAN (Elevator algorithm)
Circular SCAN (C-SCAN)
LOOK
Circular LOOK (C-LOOK)

References:
[1] Disk scheduling
[2] Wiki: elevator algorithm
[3] Example of calculation

Lecture by Vladimir Vapnik

Lecture on Empirical Inference by Vladimir Vapnik.

Coin tossing

Coin tossing is a game form ancient time. Yet there are interesting issues related and research going on.

One example is to simulate an unbiased coin with a biased one. von Neumann gave a simple method: HT is counted as Head, TH is counted as Tail, HH and TT are discarded. This can have extension to be more efficient in the sense of needed less number of tosses per decision though, e.g., HHTT can be counted as Head, TTHH can be counted as Tail as well [2].

Search "biased coin toss" would give some interesting research papers on relevant issues. Some examples are:

[1] Optimal random number generation from a biased coin (2005)
[2] Tossing a Biased Coin
[3] Tree algorithm for unbiased coin tossing with a biased coin (1984)
[4] "Topic 5: random generation" from CSE 103: PROBABILITY AND STATISTICS -- Readings

Wednesday, May 12, 2010

OLE with Perl

Perl can do many things beyond normal expectation. One example is with OLE objects. If you have MS Office installed on your computer, you can try the following code. It creates a word document and saves as C:\test.doc. You will see the Word document automatically opening up, adding lines and closing down. This demonstrates the capability of Perl to manipulate MS Office files. Application is such as automatic business report generation.

#!\usr\bin\perl -w
# http://www.adp-gmbh.ch/perl/word.html
# http://www.xav.com/perl/faq/Windows/ActivePerl-Winfaq12.html
# http://www.ngbdigital.com/perl_ole_word.html

use warnings;
use strict;

use Win32::OLE;

my $word = CreateObject Win32::OLE 'Word.Application' or die $!;
$word->{'Visible'} = 1;

my $filename = "C:\\test.doc";
my $document = $word->Documents->Add;

my $selection = $word->Selection;

$selection -> TypeText("Hello HomeTom");
$selection -> TypeParagraph;
$selection -> TypeText("How are you doing today?");
$selection -> TypeParagraph;

$selection -> TypeText("Great. How about you?");
$selection -> {'Style'} = "Heading 1";

$selection -> TypeParagraph;

my $heading_1 = $document->Styles("Heading 1");
my $heading_1_font = $heading_1 -> Font;

$heading_1_font -> {Name} = "Bookmann";
$heading_1_font -> {Size} = 20;
$heading_1_font -> {Bold} = 1;

# Save As
$word->ActiveDocument->SaveAs({FileName => $filename});

$selection -> Typeparagraph;
$selection -> TypeText("Now save and exit after 3 seconds");

sleep 3;

$word->Documents->Close;
$word->Quit;

1;

Monday, May 3, 2010

Min edge coverage problem

Problem: Given a directed graph, find a minimal set of vertices which touch all edges
within a graph.

This is a NP-complete problem. No polynomial time solution exists.

One solution is to get powerset of all vertices, then for each vertice subset, check if it covers all edges in the graph. Return the subset with the smallest size.


min_size = Infi;
set_index = -1;
Generate all possible vertex set S;
for all set si in S
  if union of edges covered by each vertex in set si == E
    if (si.size() < min_size)
      min_size=si.size();
      set_index = i;
return i;

This solution is from Careercup Top 150 Questions.

Sunday, May 2, 2010

Architecture of high-throughput, scalable web application

The points here are taken from this link (In Chinese).


A. Some notes from the book Building Scalable Web Sites (ISBN 0596102356, 2006, 352 pages).

1. Scale up a web application
   - vertically scale up: increase setting of single machine (memory, CPU).
   - horizontally scale up: add more machines
2. Redundancy
   - Back up: hot back up (online backup), cold back up (offline backup).
3. Load balancing
   - session/session-less load balancing
   - hardware/software load balancing
4. Cache

B. Article by Ni Haitao.

1. Use static html: convert dynamic content to static html.
2. Separate image/graphics server from application server.
3. Use database cluster instead of single database.
4. Use Cache. E.g.
   Apache - mod_proxy, squid; 
   Linux - memcached; 
   PHP - Pear cache, eaccelerator, Apc, XCache.
5. Use mirror site.
6. Load balancing.
   - hardware 4-layer exchange
   - software 4-layer exchange
   - 7-layer exchange