Open source project on large scale online machine learning: The Vowpal Wabbit Open Source Project
Another website on AI and social science.
Wednesday, August 5, 2009
Friday, July 24, 2009
C# DataGrid custom paging
In .NET 1.1 (I don't know the later version at this time), datagrid paging navigation is troublesome since it does not include link to the first and last page, and does not show the total number of pages. The pager row can contain no other information, which is also inconvenient when you want to put something there, e.g., a checkbox says "select all" for all checkboxes on the rows in the datagrid.
This is a way of imitating the paging manually, and allows the freedom of specifying "First, Prev, Next, Last" links, as well as total pages and other information on the pager row.
The "Prev, ..., 11, 12, ..., Next" part can be done with the following function. Then you can add "First" and "Last" as link buttons. You can then provide the page count as a Label. Alternatively, the following example also adds the page count as a simulation of the link button "Last".
In .aspx.cs page put:
Note that if don't want to use the page count link at the end then replace the red region with the following code:
In .aspx page put:
<asp:linkbutton id="btnFirstPage" onclick="DataGrid1_CustomPaging_First" Runat="server">First</asp:linkbutton>
<asp:label id="lblPageNavBar" Runat="server"></asp:label>
<asp:linkbutton id="btnLastPage" onclick="DataGrid1_CustomPaging_Last" Runat="server">Last</asp:linkbutton>
Now set DataGrid1.PagerStyle.visible to False.
Leave DataGrid1.AllowPaging as True, and DataGrid1.AllowCustomPaging as False.
That's it! You are simulating datagrid's default paging with the freedom of customizing the style!
--Appendix on 8/20/2009
Now it's clear that this does not work in later versions of .NET. First the links' format is not DataGrid1$_ctl1$_ctl4, but like DataGrid1$ctl01$ctl04. More importantly, if set the visible property of the link buttons and datagrid pager to false, then __doPostBack won't work for these controls, as they are cleaned from the output. Therefore the above scheme works only for .NET 1.1 :(
In later versions of .NET there is a pager template that can be used to format paging. Hope that's flexible enough.
Another thought is that, the use of datagrid and dataview in later versions of .NET, are mostly for the ease of paging and sorting. Think carefully, I don't see anything else that datagrid and dataview can do special. If these can be handled from scratch, then there is no need to use these cumbersome controls. Actually I would prefer such a build-from-scratch approach, it avoids the burden of version incompatibility and allows the largest flexibility. Once the template is done, it can be used all the time without any more learning curve.
This is a way of imitating the paging manually, and allows the freedom of specifying "First, Prev, Next, Last" links, as well as total pages and other information on the pager row.
The "Prev, ..., 11, 12, ..., Next" part can be done with the following function. Then you can add "First" and "Last" as link buttons. You can then provide the page count as a Label. Alternatively, the following example also adds the page count as a simulation of the link button "Last".
In .aspx.cs page put:
/// <summary>
/// This function writes a customized navigation bar for datagrid.
///
/// Note that lblNext and lblPrev can be replaced with image icons.
///
/// Example of Calling this function:
/// writeDataGridNavBar(
/// this.DataGrid1.PageCount,
/// this.DataGrid1.CurrentPageIndex,
/// this.DataGrid1.PagerStyle.PageButtonCount,
/// 1
/// );
///
/// The last parameter "1" here is obtained this way:
/// Check the aspx page with default paging, look at the links to "1, 2, ...",
/// javascript:__doPostBack('DataGrid1$_ctl1$_ctl4','')
/// ^
/// Pass this number as the last parameter.
///
/// The optional ctrl_LastPage in the code below is similarly obtained from the
/// "Last" link button.
///
/// </summary>
/// <param name="totalPage">DataGrid.PageCount</param>
/// <param name="currentPage">DataGrid.CurrentPageIndex</param>
/// <param name="pageButtonCount">DataGrid.PagerStyle.PageButtonCount</param>
/// <param name="ctl_val"></param>
/// <returns></returns>
///
/// @Author: HomeTom
/// @Date: 7/24/2009
///
public string writeDataGridNavBar(
int PageCount, int CurrentPageIndex, int PageButtonCount, int ctl_val)
{
string lblNext = "Next";
string lblPrev = "Prev";
string ctrl = "javascript:__doPostBack('DataGrid1$_ctl" + ctl_val + "$_ctl";
// Optional. Use this only when use the "Last" link button.
string ctrl_LastPage = "javascript:__doPostBack('btnLastPage','')";
string s = "";
int i, j, tmp;
int startPage =
((int) (Math.Floor((CurrentPageIndex * 1.0)/PageButtonCount)) * PageButtonCount);
tmp = PageCount - PageButtonCount;
if (tmp > 0 && tmp < startPage) { startPage = tmp; }
if (CurrentPageIndex == 0) { s += lblPrev + " "; }
else
{
j = CurrentPageIndex - startPage;
if (startPage == 0) j -= 1;
s += "<a href=\"" + ctrl + j + "', '')\">" + lblPrev + "</a> ";
}
if (startPage > 0) { s += "<a href=\"" + ctrl + "0','')\">...</a> "; }
for (i = 0; i < PageButtonCount && (i + startPage) < PageCount; i ++)
{
tmp = startPage + i + 1;
j = (startPage == 0) ? i : (i + 1);
if (tmp == CurrentPageIndex + 1) { s += tmp + " "; }
else { s += "<a href=\"" + ctrl + j + "','')\">" + tmp + "</a> "; }
}
if (startPage + PageButtonCount < PageCount - 1) {
j = (startPage == 0) ? PageButtonCount : (PageButtonCount + 1);
s += "<a href=\"" + ctrl + j + "','')\">...</a> ";
}
if (startPage + PageButtonCount < PageCount)
{
s += "<a href=\"" + ctrl_LastPage + "\">" + PageCount + "</a> ";
}
if (CurrentPageIndex >= PageCount - 1) { s += lblNext; }
else
{
j = CurrentPageIndex - startPage + 1;
if (startPage > 0) j += 1;
s += "<a href=\"" + ctrl + j + "','')\">" + lblNext + "</a>";
}
return s;
}
Note that if don't want to use the page count link at the end then replace the red region with the following code:
if (startPage + PageButtonCount < PageCount) {
j = (startPage == 0) ? PageButtonCount : (PageButtonCount + 1);
s += "<a href=\"" + ctrl + j + "','')\">...</a> ";
}
In .aspx page put:
<asp:linkbutton id="btnFirstPage" onclick="DataGrid1_CustomPaging_First" Runat="server">First</asp:linkbutton>
<asp:label id="lblPageNavBar" Runat="server"></asp:label>
<asp:linkbutton id="btnLastPage" onclick="DataGrid1_CustomPaging_Last" Runat="server">Last</asp:linkbutton>
Now set DataGrid1.PagerStyle.visible to False.
Leave DataGrid1.AllowPaging as True, and DataGrid1.AllowCustomPaging as False.
That's it! You are simulating datagrid's default paging with the freedom of customizing the style!
--Appendix on 8/20/2009
Now it's clear that this does not work in later versions of .NET. First the links' format is not DataGrid1$_ctl1$_ctl4, but like DataGrid1$ctl01$ctl04. More importantly, if set the visible property of the link buttons and datagrid pager to false, then __doPostBack won't work for these controls, as they are cleaned from the output. Therefore the above scheme works only for .NET 1.1 :(
In later versions of .NET there is a pager template that can be used to format paging. Hope that's flexible enough.
Another thought is that, the use of datagrid and dataview in later versions of .NET, are mostly for the ease of paging and sorting. Think carefully, I don't see anything else that datagrid and dataview can do special. If these can be handled from scratch, then there is no need to use these cumbersome controls. Actually I would prefer such a build-from-scratch approach, it avoids the burden of version incompatibility and allows the largest flexibility. Once the template is done, it can be used all the time without any more learning curve.
Types of parameters in C#
There are 4 types of parameter passing in C#:
1) Value: pass by value.
2) Out: like pointer in C/C++, allow return value. Don't have to be initialized first.
3) Ref: like reference in C++, allow return value. Must be initialized first.
4) Params: for variable length parameter list.
More detailed explanation and examples:
http://www.csharphelp.com/archives/archive225.html
1) Value: pass by value.
2) Out: like pointer in C/C++, allow return value. Don't have to be initialized first.
3) Ref: like reference in C++, allow return value. Must be initialized first.
4) Params: for variable length parameter list.
More detailed explanation and examples:
http://www.csharphelp.com/archives/archive225.html
Friday, July 3, 2009
Statistics and Analysis of IT technique usage in different area
Statistics and Analysis of IT technique usage in different area (in Chinese)
Summary:
C/C++ is strong in system level software.
.NET is strong in enterprise applications.
Java is strong on web server side. Php is good too.
Ajax/JavaScript is strong for RIA (Rich Interface Application). Flex is 1/10, and Silverlight is 1/100.
Summary:
C/C++ is strong in system level software.
.NET is strong in enterprise applications.
Java is strong on web server side. Php is good too.
Ajax/JavaScript is strong for RIA (Rich Interface Application). Flex is 1/10, and Silverlight is 1/100.
Wednesday, July 1, 2009
Use ADODB.RecordSet in C#
Usually one can use System.Data.SqlClient to deal with operations on MSSQL server. The corresponding objects are SqlConnection, SqlCommand, SqlDataReader etc.
There is one scenario like this: the user needs to read something from each row in a table, and use the obtained information to update that row. One needs to cycle through all the records, and then use "UPDATE" query for the purpose. If SqlDataReader is used, the efficiency will be slow. SqlDataReader is read only, and uses the current connection exclusively as well. The user needs to open a second connection, and use the second connection to do the update based on a key.
In ADODB.RecordSet, one can use RecordSet.Update() function to directly update the current row and write back to database. No second connection is needed, and no key-based query is needed. This is much more efficient. A similar mechanism in SqlClient may exist, but I don't know yet at this time.
The following is an example using ADODB.RecordSet in C#. Note one needs to add ADODB to the references list of the current C# project first.
Parameters used by RecordSet can be found at http://www.w3schools.com/ADO/met_rs_open.asp.
There is one scenario like this: the user needs to read something from each row in a table, and use the obtained information to update that row. One needs to cycle through all the records, and then use "UPDATE" query for the purpose. If SqlDataReader is used, the efficiency will be slow. SqlDataReader is read only, and uses the current connection exclusively as well. The user needs to open a second connection, and use the second connection to do the update based on a key.
In ADODB.RecordSet, one can use RecordSet.Update() function to directly update the current row and write back to database. No second connection is needed, and no key-based query is needed. This is much more efficient. A similar mechanism in SqlClient may exist, but I don't know yet at this time.
The following is an example using ADODB.RecordSet in C#. Note one needs to add ADODB to the references list of the current C# project first.
Parameters used by RecordSet can be found at http://www.w3schools.com/ADO/met_rs_open.asp.
private void updateTbl_ADODB() {
ADODB.Connection conn = new ADODB.ConnectionClass();
ADODB.Recordset rs = new ADODB.RecordsetClass();
try {
string strConn =
"Provider=SQLOLEDB;Initial Catalog=[database];Data Source=[server];";
conn.Open(strConn, [user], [pwd], 0);
string sql = "SELECT * FROM tbl";
rs.Open(sql, conn, ADODB.CursorTypeEnum.adOpenForwardOnly,
ADODB.LockTypeEnum.adLockOptimistic, -1);
while (rs.EOF == false) {
rs.Update("field_name", [new_value]);
rs.MoveNext();
}
} catch (Exception e) {
MessageBox.Show(this, "Error" + e.Message);
} finally {
conn.Close();
}
}
Wednesday, April 29, 2009
More Programming Pearls - Reading notes
==## Part I: Programming techniques ##==
==Column 1== Profilers
- Profilers: 1) line-count, 2) procedure-time
- e.g. Optimize prime testing
* test until square root
* rule out 2, 3, 5
* use multiplication instead of division
* test only previous primes
* simple sieve of Evatosthenes
(complexity: time - O((nlogn)(loglogn)), space - O(n)).
(Complexity with optimizations such as wheel factorization: time - O(n), space - O(n1 / 2loglogn / logn)
- A specialized profiler
- Building profiler: 1) insert counter to source code, 2) run it, 3) collect counter output.
==Column 2== Associative arrays
- i.e., Associative array in AWK, or Hash in perl.
- e.g., A finite state machine simulator
- e.g., Topological sorting
==Column 3== Confession of a coder
- Scaffolding for testing and debugging
- e.g. Binary search
- e.g. k-th smallest selection
- e.g. A subroutine library
==Column 4== Self-describing data
- Name-value pairs
- Provenances in programming. (metadata, e.g., history of the code)
- e.g. A sorting lab
==## Part II: Tricks of the trade ##==
==Column 5== Cutting the gordian knot (finding a clever solution to a complex problem)
- Gordius tied the knot. Asia was the promised prize to anyone who can untie it. Alexander the Great approached it in 333 B.C. He drew his sword and slashed through the knot. Asia was soon his.
- Mail sorting: instead of let a clerk do the sorting, just let the mailman drop to different mailbox.
- Data transmission: instead of an automobile courier, use a pigeon.
- Random sample: draw from a box.
==Column 6== Bumper-sticker computer science
- pi seconds is a nano century
- Plagiarism is the sincerest form of flattery
- Coding
* If you can't write it down in English, you can't code it.
- User Interface
- Debugging
- Performance
- Documentation
- Managing software
- Miscellaneous rules
==Column 7== The envelope is back (Back of envelop calculation)
- Order of magnitude.
- Little's Law (Law of system flow and congestion): the average of things in the system is the product of the average rate at which things leave the system and the average time each one spends in the system.
==Column 8== The furbelow memorandum
- Larger programmer stuff team leads to larger cost than expected
- Fred Brooke's Mythical Man Month (1975).
==## Part III: I/O Fit for Humans ##==
==Column 9== Little language
- The PIC language
==Column 10== Document design
- Tables
- 3 design principles: interaction, consistency, minimalism
- Figures
- Text
- Medium
==Column 11== Graphic output
==Column 12== A survey of surveys
==## Part IV: Algorithms ##==
==Column 13== Random sampling
- A flawed algorithm
- * Floyd's algorithm
- * Random permutations
==Column 14== Birth of a cruncher
- The problem
- Newton iteration
- Decide starting point
- Coding
==Column 15== k-th smallest selection
- Problem
- Program.
* O(n log(n)) algorithm
* An O(n) algorithm by L.A.R. Hoare: by quick-sort
- Analysis
- Principles
==Column 1== Profilers
- Profilers: 1) line-count, 2) procedure-time
- e.g. Optimize prime testing
* test until square root
* rule out 2, 3, 5
* use multiplication instead of division
* test only previous primes
* simple sieve of Evatosthenes
(complexity: time - O((nlogn)(loglogn)), space - O(n)).
(Complexity with optimizations such as wheel factorization: time - O(n), space - O(n1 / 2loglogn / logn)
- A specialized profiler
- Building profiler: 1) insert counter to source code, 2) run it, 3) collect counter output.
==Column 2== Associative arrays
- i.e., Associative array in AWK, or Hash in perl.
- e.g., A finite state machine simulator
- e.g., Topological sorting
==Column 3== Confession of a coder
- Scaffolding for testing and debugging
- e.g. Binary search
- e.g. k-th smallest selection
- e.g. A subroutine library
==Column 4== Self-describing data
- Name-value pairs
- Provenances in programming. (metadata, e.g., history of the code)
- e.g. A sorting lab
==## Part II: Tricks of the trade ##==
==Column 5== Cutting the gordian knot (finding a clever solution to a complex problem)
- Gordius tied the knot. Asia was the promised prize to anyone who can untie it. Alexander the Great approached it in 333 B.C. He drew his sword and slashed through the knot. Asia was soon his.
- Mail sorting: instead of let a clerk do the sorting, just let the mailman drop to different mailbox.
- Data transmission: instead of an automobile courier, use a pigeon.
- Random sample: draw from a box.
==Column 6== Bumper-sticker computer science
- pi seconds is a nano century
- Plagiarism is the sincerest form of flattery
- Coding
* If you can't write it down in English, you can't code it.
- User Interface
- Debugging
- Performance
- Documentation
- Managing software
- Miscellaneous rules
==Column 7== The envelope is back (Back of envelop calculation)
- Order of magnitude.
- Little's Law (Law of system flow and congestion): the average of things in the system is the product of the average rate at which things leave the system and the average time each one spends in the system.
==Column 8== The furbelow memorandum
- Larger programmer stuff team leads to larger cost than expected
- Fred Brooke's Mythical Man Month (1975).
==## Part III: I/O Fit for Humans ##==
==Column 9== Little language
- The PIC language
==Column 10== Document design
- Tables
- 3 design principles: interaction, consistency, minimalism
- Figures
- Text
- Medium
==Column 11== Graphic output
==Column 12== A survey of surveys
==## Part IV: Algorithms ##==
==Column 13== Random sampling
- A flawed algorithm
- * Floyd's algorithm
- * Random permutations
==Column 14== Birth of a cruncher
- The problem
- Newton iteration
- Decide starting point
- Coding
==Column 15== k-th smallest selection
- Problem
- Program.
* O(n log(n)) algorithm
* An O(n) algorithm by L.A.R. Hoare: by quick-sort
- Analysis
- Principles
Thursday, April 16, 2009
Programming Pearls - Reading notes
==## Part I ##==Preliminaries
==Column 1== Cracking the Oyster
The major point is find the optimal solution of a problem, instead of going straight with a rash solution.
The coding example is soring with bit vector (aka bitmap).
==Column 2== Aha! Algorithms
- binary search
* search
* find bug by setting checking points in a binary search pattern
* find missing element in an integer range
* finding root for equation: bisection method in numerical analysis
- the power of primitives
Problem: rotating an array ab to ba.
Solutions:
* copying. but this is space inefficient
* juggling
* recursive swapping
* define primitive action reverse(): a b -> a^r b -> a^r b^r -> b a
- sorting
Problem: find anagrams in a dictionary.
Solution: get signature for each word.
==Column 3== Data structures programs
- Use of array
- Structuring data
- Powerful tools for specialized data
Don't write a big program when a little one will do.
==Column 4== Writing correct programs
- binary search - hard to get right
- program verification, invariant
==Column 5== A small matter of programming
- Use assertion for correctness
- Scaffolding
- Automated testing
- debugging
- Timing
==## Part II ##==Performance
==Column 6== Perspective on Performance
- A case study: Andrew Appel's many-body simulation program
- Work at many level to achieve performance improvement:
* problem definition
* system structure
* algorithms and data structures
* code tuning
* hardware
==Column 7==The back of the envelop
- Calculation by reasonable estimation
- Quick check: Test by dimension
- Rules of thumb: e.g., 1) Rule of 72 (for exponential increase), 2) pi seconds is a nanocentury.
- Performance estimates, and little experiments
- Safety factors: compensate ignorance with extra safe factors
- Little's Law: queue size = consumption rate * average wait time
==Column 8==Algorithm design techniques
- Problem: range of array for max sum
- Solutions:
* cubic
* quadratic
* Divide and conquer (n log(n))
* scanning (linear)
==Column 9==Code tuning
- Prevent premature optimization
- Optimization should be made on the bottle-neck part - profiling the program
==Column 10==Squeezing space
- The key is simplicity
- Example: sparse matrix representation of grid.
- When simplicity is not sufficient, there are skills to better utilize space:
* recompute
* sparse data structure
* data compression
* allocation policies
* garbage collection
==## Part III ##==The Product
==Column 11==Sorting
- Insertion sort
- Quick sort
==Column 12== A sample problem
- Problem: sampling: select m from n integers
* by selection
* by shuffling
- Principles: understand the problem, specify an abstraction, explore design space, implement, retrospect.
==Column 13== Searching
- Problem: store a set of integers (w/o associated data).
- linear structure
- binary search trees: STL, BST, BST*, Bins, Bins*, BitVec
- structures for integers
==Column 14== Heaps
- Heap, Priority Queue, Heap sort
Comment: the material here can be found in any data structure and algorithm textbook, nothing new.
== Column 15== Strings of pearls
- We are surrounded by strings
- Words. 1) Map, Set, 2) Hash (no worst case guarantee, no order information)
- Phrases.
* the longest substring problem - solved by suffix array
- Generating sentences
==Appendix 1== A catalog of algorithms
- Sorting
- Searching
- Other Set algorithms
- Algorithms on Strings
- Vector and Matrix algorithms
- Random objects
- Numerical algorithms
==Appendix 4== Rules for code tuning
- Space-for-time rules
- Time-for-space rules
- Loop rules
- Logic rules
- Procedure rules
- Expression rules
==Column 1== Cracking the Oyster
The major point is find the optimal solution of a problem, instead of going straight with a rash solution.
The coding example is soring with bit vector (aka bitmap).
==Column 2== Aha! Algorithms
- binary search
* search
* find bug by setting checking points in a binary search pattern
* find missing element in an integer range
* finding root for equation: bisection method in numerical analysis
- the power of primitives
Problem: rotating an array ab to ba.
Solutions:
* copying. but this is space inefficient
* juggling
* recursive swapping
* define primitive action reverse(): a b -> a^r b -> a^r b^r -> b a
- sorting
Problem: find anagrams in a dictionary.
Solution: get signature for each word.
==Column 3== Data structures programs
- Use of array
- Structuring data
- Powerful tools for specialized data
Don't write a big program when a little one will do.
==Column 4== Writing correct programs
- binary search - hard to get right
- program verification, invariant
==Column 5== A small matter of programming
- Use assertion for correctness
- Scaffolding
- Automated testing
- debugging
- Timing
==## Part II ##==Performance
==Column 6== Perspective on Performance
- A case study: Andrew Appel's many-body simulation program
- Work at many level to achieve performance improvement:
* problem definition
* system structure
* algorithms and data structures
* code tuning
* hardware
==Column 7==The back of the envelop
- Calculation by reasonable estimation
- Quick check: Test by dimension
- Rules of thumb: e.g., 1) Rule of 72 (for exponential increase), 2) pi seconds is a nanocentury.
- Performance estimates, and little experiments
- Safety factors: compensate ignorance with extra safe factors
- Little's Law: queue size = consumption rate * average wait time
==Column 8==Algorithm design techniques
- Problem: range of array for max sum
- Solutions:
* cubic
* quadratic
* Divide and conquer (n log(n))
* scanning (linear)
==Column 9==Code tuning
- Prevent premature optimization
- Optimization should be made on the bottle-neck part - profiling the program
==Column 10==Squeezing space
- The key is simplicity
- Example: sparse matrix representation of grid.
- When simplicity is not sufficient, there are skills to better utilize space:
* recompute
* sparse data structure
* data compression
* allocation policies
* garbage collection
==## Part III ##==The Product
==Column 11==Sorting
- Insertion sort
- Quick sort
==Column 12== A sample problem
- Problem: sampling: select m from n integers
* by selection
* by shuffling
- Principles: understand the problem, specify an abstraction, explore design space, implement, retrospect.
==Column 13== Searching
- Problem: store a set of integers (w/o associated data).
- linear structure
- binary search trees: STL, BST, BST*, Bins, Bins*, BitVec
- structures for integers
==Column 14== Heaps
- Heap, Priority Queue, Heap sort
Comment: the material here can be found in any data structure and algorithm textbook, nothing new.
== Column 15== Strings of pearls
- We are surrounded by strings
- Words. 1) Map, Set, 2) Hash (no worst case guarantee, no order information)
- Phrases.
* the longest substring problem - solved by suffix array
- Generating sentences
==Appendix 1== A catalog of algorithms
- Sorting
- Searching
- Other Set algorithms
- Algorithms on Strings
- Vector and Matrix algorithms
- Random objects
- Numerical algorithms
==Appendix 4== Rules for code tuning
- Space-for-time rules
- Time-for-space rules
- Loop rules
- Logic rules
- Procedure rules
- Expression rules
Subscribe to:
Posts (Atom)