Wednesday, December 23, 2009

CVS

Quick start to CVS on linux.

CVSNT/CVS 2.x client/server downloads for windows, linux and Mac OS X:
http://www.march-hare.com/cvspro/

WinCVS client also can be separately downloaded from WinCVS download.



Set up CVSNT 2.5.04 on WindowsXP or Windows 2003 server:
(The following steps are extracted from here)

1. Preparation
1) Prepare a NTFS partition for CVSNT. [this is not necessary]
2) In folder's Tools -> Folder Options -> View -> Advanced Settings,
Cancel "Use Simple File Sharing (recommended)". [this is not necessary]
3) Create two folders:
- D:\Temp, set its security settting to "fully control" by all users.
- D:\CVSRepo. This will be where repositories are stored in.
2. Installation of CVSNT. Use default settings.
3. Configure
1) First create two windows system accounts, say CVS_Mgr and CVS_Usr.
2) Open CVS control panel, in tab "About", Click on "Stop" to stop CVSNT service.
3) In "server settings" tab, choose "Temporary" folder to the previous created D:\Temp.
Choose "CVS_Mgr" or "CVS_Usr" for "Run as user".
4) In "Repository configuration" tab, add new repository,
set "Location" to "D:\CVSRepo", and set "Name" to "\CVSRepo".
Leave "Default repository" unchecked and the rest 3 checkboxes checked.
5) Now in CVS control panel, in tab "About", Click on "Start" to start CVSNT service.
6) Add users. Do this from command line in a DOS windows.
Let CVS_Mgr belong to the Administrators group.
In command line, type these:
set cvsroot=:sspi:<computer name>:/CVSRepo
cvs passwd ­-a CVS_Mgr
cvs passwd ­-a CVS_Usr
It will prompt for the passwords of these 2 users.
Now the setup is done.
7) Use the following to test it:
In a DOS window, type
set cvsroot=:pserver:CVS_Mgr@<computer name>:/CVSRepo
cvs login
cvs ls
This will list the current files in the repository.
Now one can connect to the CVSNT server from CVS clients on other machines.

Saturday, December 12, 2009

Bloom Filter

One problem is to determine if a given value x exists in a set S. When the size of the set S is huge, there can be computational cost and storage difficulty.

Bloom filter solves this problem by a way similar to this:
1) preprocess the set: for every value w in the set S, compute a feature vector (v_w1, v_w2, .. v_wn). In the entire vector space, set these as 1, set the rest as 0.
2) compute the feature vector for the given value x (v_x1, v_x2, ..., v_xn). Now if all of v_x1, v_x2, ..., v_xn are 1, then it's possible the value x is in the set S.

The issue is possible false-positive: x is not in the set, but its feature vector are all 1s. To solve the issue, the key is to choose good vector computation function(s). Usually the probability of false positive is 0.0001 or below.

The advantage, on the other hand, is that there is NO false-negative: if computation shows the value x does not exist, then this is 100% correct. Therefore bloom filter is very reliable and efficient in ruling out a value that does not exist.

See this page for a more detailed explanation for bloom filter, and a piece of comment.
More topics of the application of math from this author are here.

Read/Write Excel on Windows & Linux

On windows read/write Excel is easy. Set up ODBC for read and write connections on source and destination Excel files. For some reason Microsoft does not allow delete records in Excel through the ODBC interface (actually might be in any other interfaces), so delete can be done by replacing the current Excel with a blank Excel at the File System level.

In linux, there are modules created by people in PHP to do this, such as PHPExcel. So you download the library and use it. In Perl, the modules Spreadsheet::ParseExcel and Spreadsheet:WriteExcel are written by Takanori Kawai and John McNamara in 2000 and tincluded into CPAN. If your Perl installation does not come with this, install them (e.g., using PPM on windows or CPAN on linux). Some example code are here. This code works for Excel up to version 2000. I didn't try on more recent versions. On windows Perl also can manipulate Excel using the Win32::OLE package which is recommended.

To write Excel file, there is a shortcut to output to a *.csv file. A *.csv file uses comma as delimiter and is opened by Excel, then you can save it as a true Excel file. To escape comma in a *.csv file, quote the entry by double quotes, e.g., "a,b". To include a double quote in an quoted entry, replace it with two double quotes, e.g., "a""b".

Thursday, December 10, 2009

Network tools

To find out what server (IIS/Apache etc.) a site is running:
http://uptime.netcraft.com/up/graph
or: http://whois.domaintools.com/thedomainname.com

Many tools for analyzing web stuff:
http://www.dnsstuff.com/tools/

Saturday, December 5, 2009

DOS batch file to backup files

Batch file backup_mystuff.bat:

echo off
REM
REM This script backs up files.
REM Files listed in %exclude_list% are excluded.
REM
REM Reference: http://www.ahuka.com/backup/backup3.html
REM By: XC
REM Created on: 12/5/2009
REM Last modified: 12/5/2009
REM
echo.
echo == Back Up Files ==
echo.

REM
REM Use this, and !var! after assignment.
REM Otherwise the input value will be buffered until next execution.
REM
SETLOCAL ENABLEDELAYEDEXPANSION

set "dstDir=mystuff_%date:~10,4%-%date:~4,2%-%date:~7,2%"
set srcDir=D:\wwwroot\mystuff
set exclude_list=mystuff_exclude_list.txt

if EXIST %dstDir% (
(set USRINPUT=)
set /P USRINPUT=Delete existing directory %dstDir% [y/n]:
REM echo Your input was: !USRINPUT!

if "!USRINPUT!" == "y" goto:go_on
if "!USRINPUT!" == "Y" goto:go_on
REM If input is not 'y' or 'Y', exit.
goto:EOF

:go_on
echo Delete existing directory %dstDir%...
rmdir /S /Q %dstDir%
)

echo.Copy files from %srcDir% to %dstDir%, please wait...
echo.

mkdir %str%
xcopy /E /V /Y /Q /EXCLUDE:%exclude_list% %srcDir%\* %dstDir%\.
echo.

And example mystuff_exclude_list.txt file:

D:\wwwroot\mystuff\download
D:\wwwroot\mystuff\aspnet_client

Monday, November 30, 2009

A Perl script to count LOC (Lines Of Code)


#!/usr/bin/perl

#
# This script counts number of lines in files of specified type.
# The counting is done recursively into subdirectory.
# @Usage: perl getLOC.pl [dir|file]
# If no argument is provided, start from current dir ".".
# @By: XC
# @Created on: 11/29/2009
#

print "\n- Line Counter -\n\n";

# specify file type(s) here. Use "[]" to escape ".".
my @types = ("[.]c", "[.]h");
#my @types = ("[.]cs"); #, "[.]aspx");
#my @types = ("[.]asp");

my $dirname = ".";
my $total_loc = 0;

# Get input directory name.
my $argc = $#ARGV + 1;
if ($argc > 0) {
$dirname = $ARGV[0];
}

# Process the starting directory or file.
if (-d $dirname) {
processDIR($dirname);
} else {
if (inTypesArray($dirname)) { countLOC($dirname); }
}

# Output total line count.
print "\n[$dirname] Total Lines: $total_loc\n";

1;


#
# Recursively process directory.
#
sub processDIR() {
my ($dirname) = @_;
my $file;

opendir(DIR, $dirname) or die "can't opendir $dirname: $!";
# Exclude "." and "..".
my @files = grep { !/^\.{1,2}$/ } readdir (DIR);
closedir(DIR);
#sort @files;

foreach (@files) {
$file = "$dirname/$_";

if (-d $file) {
processDIR($file); # Is directory. Recursion.
}
elsif (inTypesArray($file)) {
countLOC($file);
}
}
}


#
# Determine if this file is of specified type.
#
sub inTypesArray() {
my ($f) = @_;
my $t;
foreach $t (@types) {
if ($f =~ /$t$/) { return 1; }
}
return 0;
}


#
# Count number of lines in the file.
#
sub countLOC() {
my ($file) = @_;
my $loc = 0;
my @lines;
my $line_ct;

open(FILE, "$file");
while(<FILE>) {
@lines = split(/\r/, $_);
$line_ct = @lines;
#$loc ++;
$loc += $line_ct;
}
close(FILE);
print "[$file] Lines: $loc\n";

$total_loc += $loc;
}

Thursday, November 12, 2009

Latex and paper preparation

IEEE - Author Digital Tool Box
Here there are templates in both Word and Latex formats.

Latex tips:
- Use \include to include a section would cause the included content to start on a new page. To avoid starting on a new page, use \input instead of \include.
- To install new x.sty file on linux: copy the file to /usr/share/texmf/tex/latex/x/x.sty, then run texhash. Sometimes one may also run texconfig. On Windows there is a GUI tool for updating the hash.

Wednesday, October 28, 2009

Linux admin notes

10/28/2009

- Install software on Unix/Linux
- ./configure, make, su, make install
- Redhat: rpm. Debian: apt-get. Mandrake: urpm. Fedora: yum. Slackware: . Gentoo: Emerge.
- Install binary (.BIN, .SH):
- Install package:

Thursday, September 17, 2009

Linkers and Loaders


== Linkers and Loaders ==
By John R. Levine. 2000. ISBN 1-55860-496-0

- linkers and loaders are part of the software toolkit for almost
as long as there have been computers.
- This book is for:
students, programmers, computer language designers and developers.
- All the linker writers in the world could probably fit in one room,
and half of them would already have this book because they reviewed
the manuscript.

Chapter 1. Linking and Loading

1.1 What do linkers and loaders do?
- Basic job of linker/loader: binds more abstract names to more
concrete names. (name management, address binding)

1.2 Address binding: A historical perspective
- Linker and Loader divides the job: Linker do part of address binding,
assign relative addresses. Loader do final step of assigning actual addresses.

1.3 Linking and Loading
- linker does 1) symbol resolution, loader does 2) program loading.
Either can do 3) relocation.
- There are linking loaders that do all 3 functions
- Both patch object code

- Two-pass linking: linking is fundamentally a 2-pass process:
step 1) collecting info, step 2) linking
object files + shared lib + normal lib + linker control files + cmd line args -->
(linker) -->
Debug symbol file + Executable file + link/load map
- Object code libraries
- Relocation and code modification

1.4 Compiler drivers
- assembly code --> object code --> link object code and library together
- Linker command languages. Ways of passing commands to a linker:
1) command line, 2) intermixed with obj files,
3) embedded in obj files, 4) separate config language.

1.5 Linking: A true-life example

Chapter 2. Architecture issues

- Architecture: 1) hardward (program addressing, instruction formats), 2) OS.

2.1 ABI (Application Binary Interfaces)
- procedure call etc.

2.2 Memory addresses
- Byte order & alignment
IBM/Motorola: big endian
Intel/DEC: little endian
- misalignment: fault, or loss of performance
- register. size: program address

2.3 Address formation
- clean: 360/370/390
- simple: SPARC (RISC): v8 (32-bit), v9 (64-bit)
similar to other RISC arch: MIPS, Alpha
- irregular: x86

2.4 Instruction formats
- opcode operand
- direct/register addressing, base/indexed addressing
- fixed/variable length instruction
SPARC: all 4 bytes
370: 2/4/6 bytes
x86: 1-14 bytes

2.5 Procedure calls and addressibility
- abandon direct addressing for shorter instructions at the cost of
more complicated programming.
- bootstrapping for non-direct addressing
- procedure calls
- stack frame
arguments/local variables - on stack
local/global static variables - on heap

Scalable System Design

  • Access rights for different roles

  • - one header file for each role v.s. one header file (with lots of if/else) for all roles
    - assign ID to each page, assign ID list to each role

  • Slicing system to layers

  • - DB access, business logic, HTML, CSS
    - uniqueness
    - reusability
    - accumulation of code/solution

  • System analysis

  • - Client requirements analysis
    - technical requirements analysis
    - risk points, risk control
    - configuration, separate control points from code
    - data dictionary, allow user to adjust this themselves

  • Communicating to clients

  • - guide/educate clients

  • Team building

  • - source control
    - no new assignment before end of current assignment period even if current assignment is done before schedule
    - grow together
    - mutual tech evaluation/no boss participation

    Friday, August 21, 2009

    Manipulate PDF in .NET

    To convert PDF into image, we have the GFL SDK/GFLAx library as discussed in an earlier post.

    Now the problem is how to convert image into PDF, or how to draw onto PDF. One needs to rely on 3rd party module for this function.

    PDFSharp (http://www.pdfsharp.net/) works well for this. It can create new PDF file, or draw text and image onto existing PDF files. It even can generate barcode image (but guess it can't do recognization). The current version is 1.3, providing both source and assembly download at sourceforge.net. The source code can be used in one's own application, unless is for commercial purpose and needs support. It is written from scratch in C#. The only limit is that it requires .NET version 2.0 or above. To use this in .NET 1.1 or from other framework such as J2EE/LAMP, I think one can do something like a web service call.

    Some people say in web applications PDFSharp cannot run under medium security level. I didn't have this problem, probably because I'm running it on a trusted server, so there is no security restriction.

    One last word: it seems that PDF is frequently used in business applications. Now we have these open source projects that allows PDF convertion to and from other formats. Good to have these.

    Friday, July 24, 2009

    C# DataGrid custom paging

    In .NET 1.1 (I don't know the later version at this time), datagrid paging navigation is troublesome since it does not include link to the first and last page, and does not show the total number of pages. The pager row can contain no other information, which is also inconvenient when you want to put something there, e.g., a checkbox says "select all" for all checkboxes on the rows in the datagrid.

    This is a way of imitating the paging manually, and allows the freedom of specifying "First, Prev, Next, Last" links, as well as total pages and other information on the pager row.

    The "Prev, ..., 11, 12, ..., Next" part can be done with the following function. Then you can add "First" and "Last" as link buttons. You can then provide the page count as a Label. Alternatively, the following example also adds the page count as a simulation of the link button "Last".

    In .aspx.cs page put:

    /// <summary>
    /// This function writes a customized navigation bar for datagrid.
    ///
    /// Note that lblNext and lblPrev can be replaced with image icons.
    ///
    /// Example of Calling this function:
    /// writeDataGridNavBar(
    /// this.DataGrid1.PageCount,
    /// this.DataGrid1.CurrentPageIndex,
    /// this.DataGrid1.PagerStyle.PageButtonCount,
    /// 1
    /// );
    ///
    /// The last parameter "1" here is obtained this way:
    /// Check the aspx page with default paging, look at the links to "1, 2, ...",
    /// javascript:__doPostBack('DataGrid1$_ctl1$_ctl4','')
    /// ^
    /// Pass this number as the last parameter.
    ///
    /// The optional ctrl_LastPage in the code below is similarly obtained from the
    /// "Last" link button.
    ///
    /// </summary>
    /// <param name="totalPage">DataGrid.PageCount</param>
    /// <param name="currentPage">DataGrid.CurrentPageIndex</param>
    /// <param name="pageButtonCount">DataGrid.PagerStyle.PageButtonCount</param>
    /// <param name="ctl_val"></param>
    /// <returns></returns>
    ///
    /// @Author: HomeTom
    /// @Date: 7/24/2009
    ///
    public string writeDataGridNavBar(
    int PageCount, int CurrentPageIndex, int PageButtonCount, int ctl_val)
    {
    string lblNext = "Next";
    string lblPrev = "Prev";
    string ctrl = "javascript:__doPostBack('DataGrid1$_ctl" + ctl_val + "$_ctl";
    // Optional. Use this only when use the "Last" link button.
    string ctrl_LastPage = "javascript:__doPostBack('btnLastPage','')";
    string s = "";
    int i, j, tmp;

    int startPage =
    ((int) (Math.Floor((CurrentPageIndex * 1.0)/PageButtonCount)) * PageButtonCount);

    tmp = PageCount - PageButtonCount;
    if (tmp > 0 && tmp < startPage) { startPage = tmp; }

    if (CurrentPageIndex == 0) { s += lblPrev + " "; }
    else
    {
    j = CurrentPageIndex - startPage;
    if (startPage == 0) j -= 1;
    s += "<a href=\"" + ctrl + j + "', '')\">" + lblPrev + "</a> ";
    }

    if (startPage > 0) { s += "<a href=\"" + ctrl + "0','')\">...</a> "; }

    for (i = 0; i < PageButtonCount && (i + startPage) < PageCount; i ++)
    {
    tmp = startPage + i + 1;
    j = (startPage == 0) ? i : (i + 1);
    if (tmp == CurrentPageIndex + 1) { s += tmp + " "; }
    else { s += "<a href=\"" + ctrl + j + "','')\">" + tmp + "</a> "; }
    }
    if (startPage + PageButtonCount < PageCount - 1) {
    j = (startPage == 0) ? PageButtonCount : (PageButtonCount + 1);
    s += "<a href=\"" + ctrl + j + "','')\">...</a> ";
    }
    if (startPage + PageButtonCount < PageCount)
    {
    s += "<a href=\"" + ctrl_LastPage + "\">" + PageCount + "</a> ";
    }

    if (CurrentPageIndex >= PageCount - 1) { s += lblNext; }
    else
    {
    j = CurrentPageIndex - startPage + 1;
    if (startPage > 0) j += 1;
    s += "<a href=\"" + ctrl + j + "','')\">" + lblNext + "</a>";
    }

    return s;
    }

    Note that if don't want to use the page count link at the end then replace the red region with the following code:

    if (startPage + PageButtonCount < PageCount) {
    j = (startPage == 0) ? PageButtonCount : (PageButtonCount + 1);
    s += "<a href=\"" + ctrl + j + "','')\">...</a> ";
    }

    In .aspx page put:

    <asp:linkbutton id="btnFirstPage" onclick="DataGrid1_CustomPaging_First" Runat="server">First</asp:linkbutton>
    <asp:label id="lblPageNavBar" Runat="server"></asp:label>
    <asp:linkbutton id="btnLastPage" onclick="DataGrid1_CustomPaging_Last" Runat="server">Last</asp:linkbutton>


    Now set DataGrid1.PagerStyle.visible to False.
    Leave DataGrid1.AllowPaging as True, and DataGrid1.AllowCustomPaging as False.

    That's it! You are simulating datagrid's default paging with the freedom of customizing the style!

    --Appendix on 8/20/2009

    Now it's clear that this does not work in later versions of .NET. First the links' format is not DataGrid1$_ctl1$_ctl4, but like DataGrid1$ctl01$ctl04. More importantly, if set the visible property of the link buttons and datagrid pager to false, then __doPostBack won't work for these controls, as they are cleaned from the output. Therefore the above scheme works only for .NET 1.1 :(

    In later versions of .NET there is a pager template that can be used to format paging. Hope that's flexible enough.

    Another thought is that, the use of datagrid and dataview in later versions of .NET, are mostly for the ease of paging and sorting. Think carefully, I don't see anything else that datagrid and dataview can do special. If these can be handled from scratch, then there is no need to use these cumbersome controls. Actually I would prefer such a build-from-scratch approach, it avoids the burden of version incompatibility and allows the largest flexibility. Once the template is done, it can be used all the time without any more learning curve.

    Types of parameters in C#

    There are 4 types of parameter passing in C#:
    1) Value: pass by value.
    2) Out: like pointer in C/C++, allow return value. Don't have to be initialized first.
    3) Ref: like reference in C++, allow return value. Must be initialized first.
    4) Params: for variable length parameter list.

    More detailed explanation and examples:
    http://www.csharphelp.com/archives/archive225.html

    Friday, July 3, 2009

    Statistics and Analysis of IT technique usage in different area

    Statistics and Analysis of IT technique usage in different area (in Chinese)

    Summary:
    C/C++ is strong in system level software.
    .NET is strong in enterprise applications.
    Java is strong on web server side. Php is good too.
    Ajax/JavaScript is strong for RIA (Rich Interface Application). Flex is 1/10, and Silverlight is 1/100.

    Wednesday, July 1, 2009

    Use ADODB.RecordSet in C#

    Usually one can use System.Data.SqlClient to deal with operations on MSSQL server. The corresponding objects are SqlConnection, SqlCommand, SqlDataReader etc.

    There is one scenario like this: the user needs to read something from each row in a table, and use the obtained information to update that row. One needs to cycle through all the records, and then use "UPDATE" query for the purpose. If SqlDataReader is used, the efficiency will be slow. SqlDataReader is read only, and uses the current connection exclusively as well. The user needs to open a second connection, and use the second connection to do the update based on a key.

    In ADODB.RecordSet, one can use RecordSet.Update() function to directly update the current row and write back to database. No second connection is needed, and no key-based query is needed. This is much more efficient. A similar mechanism in SqlClient may exist, but I don't know yet at this time.

    The following is an example using ADODB.RecordSet in C#. Note one needs to add ADODB to the references list of the current C# project first.

    Parameters used by RecordSet can be found at http://www.w3schools.com/ADO/met_rs_open.asp.
    private void updateTbl_ADODB() {
        ADODB.Connection conn = new ADODB.ConnectionClass();
        ADODB.Recordset rs = new ADODB.RecordsetClass();
    
        try {
            string strConn = 
                "Provider=SQLOLEDB;Initial Catalog=[database];Data Source=[server];";
            conn.Open(strConn, [user], [pwd], 0);
    
            string sql = "SELECT * FROM tbl";
            rs.Open(sql, conn, ADODB.CursorTypeEnum.adOpenForwardOnly, 
                    ADODB.LockTypeEnum.adLockOptimistic, -1);
    
            while (rs.EOF == false)  {
                rs.Update("field_name", [new_value]);
                rs.MoveNext();
            }
    
        } catch (Exception e) {
            MessageBox.Show(this, "Error" + e.Message);
        } finally {
            conn.Close();
        }
    }

    Wednesday, April 29, 2009

    More Programming Pearls - Reading notes

    ==## Part I: Programming techniques ##==

    ==Column 1== Profilers

    - Profilers: 1) line-count, 2) procedure-time
    - e.g. Optimize prime testing
    * test until square root
    * rule out 2, 3, 5
    * use multiplication instead of division
    * test only previous primes
    * simple sieve of Evatosthenes
    (complexity: time - O((nlogn)(loglogn)), space - O(n)).
    (Complexity with optimizations such as wheel factorization: time - O(n), space -
    O(n1 / 2loglogn / logn)

    - A specialized profiler
    - Building profiler: 1) insert counter to source code, 2) run it, 3) collect counter output.

    ==Column 2== Associative arrays

    - i.e., Associative array in AWK, or Hash in perl.
    - e.g., A finite state machine simulator
    - e.g., Topological sorting

    ==Column 3== Confession of a coder

    - Scaffolding for testing and debugging
    - e.g. Binary search
    - e.g. k-th smallest selection
    - e.g. A subroutine library

    ==Column 4== Self-describing data

    - Name-value pairs
    - Provenances in programming. (metadata, e.g., history of the code)
    - e.g. A sorting lab

    ==## Part II: Tricks of the trade ##==

    ==Column 5== Cutting the gordian knot (finding a clever solution to a complex problem)

    - Gordius tied the knot. Asia was the promised prize to anyone who can untie it. Alexander the Great approached it in 333 B.C. He drew his sword and slashed through the knot. Asia was soon his.
    - Mail sorting: instead of let a clerk do the sorting, just let the mailman drop to different mailbox.
    - Data transmission: instead of an automobile courier, use a pigeon.
    - Random sample: draw from a box.

    ==Column 6== Bumper-sticker computer science

    - pi seconds is a nano century
    - Plagiarism is the sincerest form of flattery
    - Coding
    * If you can't write it down in English, you can't code it.
    - User Interface
    - Debugging
    - Performance
    - Documentation
    - Managing software
    - Miscellaneous rules

    ==Column 7== The envelope is back (Back of envelop calculation)

    - Order of magnitude.
    - Little's Law (Law of system flow and congestion): the average of things in the system is the product of the average rate at which things leave the system and the average time each one spends in the system.

    ==Column 8== The furbelow memorandum

    - Larger programmer stuff team leads to larger cost than expected
    - Fred Brooke's Mythical Man Month (1975).

    ==## Part III: I/O Fit for Humans ##==

    ==Column 9== Little language

    - The PIC language

    ==Column 10== Document design

    - Tables
    - 3 design principles: interaction, consistency, minimalism
    - Figures
    - Text
    - Medium

    ==Column 11== Graphic output

    ==Column 12== A survey of surveys

    ==## Part IV: Algorithms ##==

    ==Column 13== Random sampling

    - A flawed algorithm
    - * Floyd's algorithm
    - * Random permutations

    ==Column 14== Birth of a cruncher

    - The problem
    - Newton iteration
    - Decide starting point
    - Coding

    ==Column 15== k-th smallest selection

    - Problem
    - Program.
    * O(n log(n)) algorithm
    * An O(n) algorithm by L.A.R. Hoare: by quick-sort
    - Analysis
    - Principles

    Thursday, April 16, 2009

    Programming Pearls - Reading notes

    ==## Part I ##==Preliminaries

    ==Column 1== Cracking the Oyster

    The major point is find the optimal solution of a problem, instead of going straight with a rash solution.
    The coding example is soring with bit vector (aka bitmap).

    ==Column 2== Aha! Algorithms

    - binary search
    * search
    * find bug by setting checking points in a binary search pattern
    * find missing element in an integer range
    * finding root for equation: bisection method in numerical analysis

    - the power of primitives
    Problem: rotating an array ab to ba.
    Solutions:
    * copying. but this is space inefficient
    * juggling
    * recursive swapping
    * define primitive action reverse(): a b -> a^r b -> a^r b^r -> b a

    - sorting
    Problem: find anagrams in a dictionary.
    Solution: get signature for each word.

    ==Column 3== Data structures programs

    - Use of array
    - Structuring data
    - Powerful tools for specialized data

    Don't write a big program when a little one will do.

    ==Column 4== Writing correct programs

    - binary search - hard to get right
    - program verification, invariant

    ==Column 5== A small matter of programming

    - Use assertion for correctness
    - Scaffolding
    - Automated testing
    - debugging
    - Timing

    ==## Part II ##==Performance

    ==Column 6== Perspective on Performance

    - A case study: Andrew Appel's many-body simulation program
    - Work at many level to achieve performance improvement:
    * problem definition
    * system structure
    * algorithms and data structures
    * code tuning
    * hardware

    ==Column 7==The back of the envelop

    - Calculation by reasonable estimation
    - Quick check: Test by dimension
    - Rules of thumb: e.g., 1) Rule of 72 (for exponential increase), 2) pi seconds is a nanocentury.
    - Performance estimates, and little experiments
    - Safety factors: compensate ignorance with extra safe factors
    - Little's Law: queue size = consumption rate * average wait time

    ==Column 8==Algorithm design techniques

    - Problem: range of array for max sum
    - Solutions:
    * cubic
    * quadratic
    * Divide and conquer (n log(n))
    * scanning (linear)

    ==Column 9==Code tuning

    - Prevent premature optimization
    - Optimization should be made on the bottle-neck part - profiling the program

    ==Column 10==Squeezing space

    - The key is simplicity
    - Example: sparse matrix representation of grid.
    - When simplicity is not sufficient, there are skills to better utilize space:
    * recompute
    * sparse data structure
    * data compression
    * allocation policies
    * garbage collection

    ==## Part III ##==The Product

    ==Column 11==Sorting

    - Insertion sort
    - Quick sort

    ==Column 12== A sample problem

    - Problem: sampling: select m from n integers
    * by selection
    * by shuffling
    - Principles: understand the problem, specify an abstraction, explore design space, implement, retrospect.

    ==Column 13== Searching

    - Problem: store a set of integers (w/o associated data).
    - linear structure
    - binary search trees: STL, BST, BST*, Bins, Bins*, BitVec
    - structures for integers

    ==Column 14== Heaps

    - Heap, Priority Queue, Heap sort
    Comment: the material here can be found in any data structure and algorithm textbook, nothing new.

    == Column 15== Strings of pearls

    - We are surrounded by strings
    - Words. 1) Map, Set, 2) Hash (no worst case guarantee, no order information)
    - Phrases.
    * the longest substring problem - solved by suffix array
    - Generating sentences

    ==Appendix 1== A catalog of algorithms

    - Sorting
    - Searching
    - Other Set algorithms
    - Algorithms on Strings
    - Vector and Matrix algorithms
    - Random objects
    - Numerical algorithms

    ==Appendix 4== Rules for code tuning

    - Space-for-time rules
    - Time-for-space rules
    - Loop rules
    - Logic rules
    - Procedure rules
    - Expression rules

    Blog Archive

    Followers