Wednesday, July 30, 2014

Flash an Android Phone

A friend got an old Android phone to use for Android development testing. It was a Nexus S 4G running Android build GWK74 (2.3.7 Gingerbread). I spent some time investigating how to flash it with a newer version. It worked, and here are my notes.

In this case, when booted into fastboot mode, the phone complained "Fastboot Mode No Boot or Recovery IMG", so the images had to be downloaded.

Also, during the process Windows needs 2 device drivers to detect the phone:  1) when Android is running, the Google USB driver for the Nexus S 4G, which comes with ADT (in my case, it's in D:\Android\android-sdk\extras\google\usb_driver).  2) when the phone is in fastboot mode, another driver called "Android Bootloader Interface driver". After a lot of searching, I found that this is available with pdaNet [1].

Next, download the images from Factory Images for Nexus Devices [2].  For the Nexus S 4G, the corresponding section is at the end of the page: "Factory Images sojus for Nexus S 4G (d720)". 3 versions are available: 2.3.7 (GWK74), 4.0.4 (IMM76D) and 4.1.1 (JRO03R). The phone previously had 2.3.7 (GWK74) and we wanted a more modern version, so I chose 4.1.1 (JRO03R).

Now, for the actual steps to flash the phone, see [3]. Note that the commands in step 10 are the same as those in flash-all.sh and flash-all.bat of the package in [2]. The steps in [3] are copied below. In an established environment (no need to set up any drivers), the steps involved are 5, 6, 7, 9, 10, 11.

1. Install the Android SDK and Eclipse.
    Eclipse probably isn’t necessary but it is nice to have.
2. Launch the Android SDK Manager.
3. In the Android SDK manager verify that the Google USB Driver is installed and up to date.
4. Connect the target phone to the PC using USB. Make sure USB Debugging is enabled on the phone.
5. From the command line, display all adb devices
    Use the command: adb devices

    This command is in the Android SDK platform-tools folder. I have this added to my Path. ADB should list out all attached Android devices.
    If no devices are listed, make sure you have the Google Android USB driver installed.
    Windows Device Manager should show a device of “Android Composite ADB Interface”.

6. Use ADB to reboot the phone into Fastboot mode
    Use the command: adb reboot bootloader
    This command is in the Android SDK platform-tools folder.

7. Verify that fastboot can access the phone
    Use the command: fastboot devices
    This should list out all attached Android devices in Fastboot mode. Notice that the device is no longer visible to ADB. “adb devices” no longer lists the device. However, “fastboot devices” should.

8. Install the Android Bootloader Interface driver if needed
    If "fastboot devices" lists the phone, this step is not necessary. If it does not list the phone,
    you may need to install the Android Bootloader Interface driver. This is available with pdaNet [1].

9. Download and expand the device images to use.
    For this phone, Google publishes the standard images at https://developers.google.com/android/nexus/images#sojusgwk74

10. Execute the commands from flash-all.sh to flash the device.
    Change to the folder with the expanded images and execute the fastboot commands from the flash-all.sh file.


    fastboot flash bootloader bootloader-crespo4g-d720sprlc1.img
    fastboot reboot-bootloader
    fastboot flash radio radio-crespo4g-d720sprlf2.img
    fastboot reboot-bootloader
    fastboot -w update image-sojus-jro03r.zip



Note that the last command, "fastboot -w update image-sojus-jro03r.zip", may fail, possibly because fastboot cannot decode the zip file containing the images. In that case do it manually: first uncompress the zip file, which contains the images, then at the DOS prompt use these commands (refer to the end of [4]), which I call step 11:

11. Flash the images.
    fastboot flash recovery recovery.img
    fastboot flash boot boot.img
    fastboot flash userdata userdata.img 
    fastboot flash system system.img

12. Recovery 
    On the bootloader screen, use the volume buttons to choose "Recovery", then press the power button to confirm.
    It takes a few minutes to recover from the flashed images, then the phone automatically boots into the new Android 4.1.1 system.



The entire process can be summarized into 3 steps:

1) setup device drivers so computer can detect the phone in both normal and fastboot modes.
2) download image package for your device.
3) use Android SDK tools adb and fastboot to do the flash.
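
Step 3) can be scripted. Below is a minimal Python sketch that only builds the command list and prints it (a dry run by default). The helper names build_flash_commands and run are my own invention for illustration; the image file names are the ones from the JRO03R package above. A real script would also have to wait for the phone to come back after each reboot, which this sketch does not do.

```python
import subprocess

def build_flash_commands(bootloader, radio, images):
    """Return the adb/fastboot commands from the steps above, in order."""
    cmds = [
        ["adb", "reboot", "bootloader"],
        ["fastboot", "devices"],
        ["fastboot", "flash", "bootloader", bootloader],
        ["fastboot", "reboot-bootloader"],
        ["fastboot", "flash", "radio", radio],
        ["fastboot", "reboot-bootloader"],
    ]
    # Manual fallback for "fastboot -w update": flash each partition image.
    for partition, image in images:
        cmds.append(["fastboot", "flash", partition, image])
    return cmds

def run(cmds, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)

commands = build_flash_commands(
    "bootloader-crespo4g-d720sprlc1.img",
    "radio-crespo4g-d720sprlf2.img",
    [("recovery", "recovery.img"), ("boot", "boot.img"),
     ("userdata", "userdata.img"), ("system", "system.img")],
)
run(commands)  # dry run: prints the commands without touching the phone
```

Calling run(commands, dry_run=False) would actually execute them, assuming adb and fastboot are on the Path.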


References:

[1] pdaNet
[2] Factory Images for Nexus Devices
[3] Flash Nexus S 4G
[4] http://wiki.cyanogenmod.org/w/Doc:_fastboot_intro


Monday, July 28, 2014

Google form and Sign in with linkedin

== Google form ==

I am now working with some friends on a registration function.

I just learned that Google Forms is a convenient tool for creating simple registration forms. You can create one from either Google Forms or Google Drive. The form is linked to an online spreadsheet, where all records are stored.

The submission of a form is much better if it 1) sends a confirmation email and 2) includes a link to edit the submission.  It's also good to 3) have a dashboard that displays the submitted information, since you often want an interface independent of the spreadsheet.  These can all be done with Google's APIs (Apps Script for 1 and 2, the Visualization API query language for 3).

1) and 2) require writing a JavaScript function.

See Email confirmations from Google Forms for how to set up a script triggered by form submission action.

This is my code to include edit link:

function myFunction(e) {
  if (typeof e == 'undefined') {
    Logger.log("e is undefined");
    return;
  }

  //var userName = e.values[1];
  //var userEmail = e.values[2];
  var userName = e.namedValues["Name"][0]; // From a field whose name is "Name".
  var userEmail = e.namedValues["Email"][0]; // From a field whose name is "Email".
  if (userEmail == '') return;
 
  var subject = "Form Submitted";
 
  var form = FormApp.openById('[form id]');
  var formResponses = form.getResponses(); // All responses/rows in spreadsheet.
  var formResponse = formResponses[formResponses.length-1]; // Get the just submitted item - last row.
  //Logger.log("formResponses.length = " + formResponses.length);
 
  var message = "Thank you, " + userName + " for finishing the survey.\n\n";
  message += "You can see the current list at [dash board page link]\n\n";
  message += "You can edit your information at: " + formResponse.getEditResponseUrl() + "\n\n";
  message += "Have a good day.";
  MailApp.sendEmail (userEmail, subject, message);
}


Note that in the code above, the "form id" must be the id of the form, not of the spreadsheet. The script itself is attached to the spreadsheet.

Here is another piece of code that works equally well, but should be embedded in the form, and not the spreadsheet. This code is better in that it does not need to specify any form id. I prefer this one.

function onFormSubmit(e) {
  if (typeof e == 'undefined') {
    Logger.log("e is undefined");
    return;
  }
 
  var itemResponses = e.response.getItemResponses();
 
  /*
   for (var i = 0; i < itemResponses.length; i++) {
     var itemResponse = itemResponses[i];
     Logger.log('Response #%s to the question "%s" was "%s"',
         (i + 1).toString(),
         itemResponse.getItem().getTitle(),
         itemResponse.getResponse());
   }
  */
 
  var subject = "Form Submitted";
  var userName = itemResponses[0].getResponse();
  var userEmail = itemResponses[4].getResponse();
  var message = "Thank you, " + userName + " for finishing the survey.\n\n";
  message += "You can edit your information at: " + e.response.getEditResponseUrl() + "\n\n";
  message += "Have a good day.";
 
  MailApp.sendEmail (userEmail, subject, message);
}


3) Displays submitted information not using the spreadsheet.

See Query a Google Spreadsheet like a Database with Google Visualization API Query Language. This shows how to display a table containing selected spreadsheet columns. The link looks like the one below (replace form id and group id with your values):

https://docs.google.com/spreadsheets/d/[form id]/gviz/tq?tqx=out:html&tq&gid=[group id]

If you want to display only selected columns, e.g., columns A and B, you can specify this with the tq parameter: tq=SELECT+A,B, so the link becomes:

https://docs.google.com/spreadsheets/d/[form id]/gviz/tq?tqx=out:html&tq=SELECT+A,B&gid=[group id]
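
To avoid typing such links by hand, they can be assembled in code. This is a small Python sketch; the function name gviz_url and the placeholder values SHEET_ID and 0 are made up for illustration and stand in for the form id and group id above.

```python
from urllib.parse import urlencode

def gviz_url(sheet_id, gid, columns=None):
    """Build the gviz/tq link; columns like ["A", "B"] become SELECT A,B."""
    query = "SELECT " + ",".join(columns) if columns else ""
    params = {"tqx": "out:html", "tq": query, "gid": gid}
    # safe=":," keeps the colon and commas readable, as in the links above;
    # the space after SELECT is encoded as "+".
    return ("https://docs.google.com/spreadsheets/d/%s/gviz/tq?" % sheet_id
            + urlencode(params, safe=":,"))

print(gviz_url("SHEET_ID", "0", ["A", "B"]))
```

With no columns given, the tq parameter is left empty, which corresponds to the first link above.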

One concern with giving people this link is security: they can modify the value of tq to see all fields. This is easy to overcome: set up a PHP page that reads in the contents and displays them, so the URL is hidden. It is also really easy to set up; just 1 line of PHP code is needed:

<?php
echo file_get_contents("https://docs.google.com/spreadsheets/d/[form id]/gviz/tq?tqx=out:html&tq=SELECT+A,B&gid=[group id]");
?>


== Reliability issue ==

Well, it seems the scripts in Google Forms are not always reliably triggered. The above code stopped working for no apparent reason. Searching online for "google form script trigger not reliable" shows that many other people have had similar experiences. Free lunch is not always tasty.

 
== Sign in with Linkedin ==

Say you want people to fill in the above form, but not everyone, only those who are registered with LinkedIn. What you do is set up a page that requires LinkedIn authentication, then forward people to the above link. For details, see [1][2][3][4].

I followed the example code in [4]. The code to set up such a page is in the appendix.
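
The part of the OAuth 2 flow that is easiest to get wrong is the "state" value, which protects the callback against CSRF. Here is a minimal Python sketch of just that part, assuming the session is a plain dict; the function names are hypothetical, and only the authorization URL is taken from the PHP code in the appendix.

```python
import hmac
import secrets
from urllib.parse import urlencode

AUTH_URL = "https://www.linkedin.com/uas/oauth2/authorization"

def start_authorization(session, client_id, redirect_uri, scope=""):
    """Store a random state in the session and return the redirect URL."""
    session["state"] = secrets.token_urlsafe(32)
    params = {
        "response_type": "code",
        "client_id": client_id,
        "scope": scope,
        "state": session["state"],
        "redirect_uri": redirect_uri,
    }
    return AUTH_URL + "?" + urlencode(params)

def state_is_valid(session, returned_state):
    """Accept the callback only if the state matches the stored one."""
    expected = session.get("state", "")
    received = returned_state or ""
    # compare_digest avoids leaking information through timing differences.
    return bool(expected) and hmac.compare_digest(
        expected.encode("utf-8"), received.encode("utf-8"))
```

The user is redirected to the URL returned by start_authorization, and the callback page calls state_is_valid with the state query parameter before exchanging the code for a token.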

Note that the Google form itself is not protected by the session. So anyone who knows the URL of the form will be able to fill it in. I have not studied ways to prevent this. There may not be a way, since it's not a full-fledged website anyway. One can change the settings to make a Google form private/public or accessible to only some people; that's what you can do if you don't want it public.


References:

[1] Sign In With LinkedIn
[2] Linkedin authentication documentation - Important. [3] below is linked from here.
[3] Linkedin developer network - Register here to get a linkedin application account. Important.
[4] Linkedin authentication code sample in PHP - Useful


Appendix. Authentication with Linkedin.

<?php
// Change these 5 fields.
define('API_KEY',      '...');
define('API_SECRET',   '...');
define('REDIRECT_URI', 'http://...');
define('SCOPE',        ''); //r_fullprofile r_emailaddress rw_nus');
$reg_url = "https://docs.google.com/forms/d/[form id]/viewform?c=0&w=1&usp=mail_form_link";

// You'll probably use a database
session_name('linkedin');
session_start();

// OAuth 2 Control Flow
if (isset($_GET['error'])) {
    // LinkedIn returned an error
    //print $_GET['error'] . ': ' . $_GET['error_description'];
    //exit;
} elseif (isset($_GET['code'])) {
    // User authorized your application
    if ($_SESSION['state'] == $_GET['state']) {
        // Get token so you can make API calls
        getAccessToken();
    } else {
        // CSRF attack? Or did you mix up your states?
        exit;
    }
} elseif (isset($_GET['logout'])) {
    $_SESSION = array();
} elseif (isset($_GET['login'])) {
    if ((empty($_SESSION['expires_at'])) || (time() > $_SESSION['expires_at'])) {
        // Token has expired, clear the state
        $_SESSION = array();
    }
    if (empty($_SESSION['access_token'])) {
        // Start authorization process
        getAuthorizationCode();
    }
    else {
        print "?";
    }
}

$user = fetch('GET', '/v1/people/~:(firstName,lastName)');
if ($user->firstName == '' && $user->lastName == '') {
    print "Please <a href='" . $_SERVER['PHP_SELF'] . "?login=1'>log into linkedin</a> before registration.<br/>";
    exit;
} else {
    header("Location: $reg_url");
    //echo file_get_contents($reg_url);
    //exit;
    //print "Hello $user->firstName $user->lastName. Click here to go to <a href='$reg_url'>registration form</a>.";
    //print "<br/><a href='" . $_SERVER['PHP_SELF'] . "?logout=1'>logout</a>";
    exit;
}


function getAuthorizationCode() {
    $_SESSION['state'] = uniqid('', true); // unique long string.
    $params = array('response_type' => 'code',
                    'client_id' => API_KEY,
                    'scope' => SCOPE,
                    'state' => $_SESSION['state'],
                    'redirect_uri' => REDIRECT_URI,
              );

    // Authentication request
    $url = 'https://www.linkedin.com/uas/oauth2/authorization?' . http_build_query($params);
    
    // Needed to identify request when it returns to us
    $_SESSION['state'] = $params['state'];

    // Redirect user to authenticate
    header("Location: $url");
    exit;
}
    
function getAccessToken() {
    $params = array('grant_type' => 'authorization_code',
                    'client_id' => API_KEY,
                    'client_secret' => API_SECRET,
                    'code' => $_GET['code'],
                    'redirect_uri' => REDIRECT_URI,
              );
    
    // Access Token request
    $url = 'https://www.linkedin.com/uas/oauth2/accessToken?' . http_build_query($params);
    
    // Tell streams to make a POST request
    $context = stream_context_create(
                    array('http' =>
                        array('method' => 'POST',
                        )
                    )
                );

    // Retrieve access token information
    $response = file_get_contents($url, false, $context);

    // Native PHP object, please
    $token = json_decode($response);

    // Store access token and expiration time
    $_SESSION['access_token'] = $token->access_token; // guard this!
    $_SESSION['expires_in']   = $token->expires_in; // relative time (in seconds)
    $_SESSION['expires_at']   = time() + $_SESSION['expires_in']; // absolute time
    
    return true;
}

function fetch($method, $resource, $body = '') {
    $params = array('oauth2_access_token' => $_SESSION['access_token'],
                    'format' => 'json',
              );
    
    // Need to use HTTPS
    $url = 'https://api.linkedin.com' . $resource . '?' . http_build_query($params);
    // Tell streams to make a (GET, POST, PUT, or DELETE) request
    $context = stream_context_create(
                    array('http' =>
                        array('method' => $method,
                        )
                    )
                );

    // Hocus Pocus
    $response = file_get_contents($url, false, $context);

    // Native PHP object, please
    return json_decode($response);
}

?>


Saturday, July 26, 2014

Platform independent GUI development

For platform independent GUI development, some choices are Qt, GTK, Java.

[1] GUI development in Perl: wxPerl, Perl/Tk, Perl/Qt and Perl/KDE, gtk2-perl
[2] GTK2 tutorial
[3] Qt: best cross platform GUI Applications
[4] Comparison of GUI development tools for Linux
[5] Qt Project - download installation package and more
[6] Java: Lesson: modifying the look and feel

Saturday, July 19, 2014

IT - The Big Picture

== Importance of Daemon/Service and Socket Programming ==

Linux/Unix daemons and Windows services are important. They reside in memory as long-running processes and often start up automatically when the machine boots.

Almost any piece of server software, whether a database or a web/file/mail/etc. server, is a Linux/Unix daemon or Windows service. NoSQL applications such as Hadoop, MongoDB and memcached are all daemons/services or make use of them.

For this reason, it is good to write templates for a Linux/Unix daemon and a Windows service. When a relevant utility is needed, we can quickly add the functionality to the daemon/service template and make a product.

Socket programming is another technology tightly coupled with daemons/services. This comes from the need for communication between a daemon/service and its clients.  Protocol and port are the two key players: different servers use different protocols and listen on different ports.

Most enterprise IT and software jobs today are on information systems, which used to be C/S (client/server) and are now B/S (browser/server). Most of them use databases, process data, implement business logic, and have a back end and a front end. The back end uses a database and web server; the front end uses web programming languages. Database + Web/Mobile UI + business logic is the mainstream of IT and software jobs today. Technologies outside this realm feel exotic to many programmers.
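
To make the protocol/port idea concrete, here is a minimal sketch in Python of a service and its client. The "protocol" is a toy one I made up (the client sends one line, the server echoes it back), and the server runs in a background thread rather than as a real daemon.

```python
import socket
import socketserver
import threading

class EchoHandler(socketserver.StreamRequestHandler):
    """Toy protocol: read one line from the client and send it back."""
    def handle(self):
        line = self.rfile.readline()
        self.wfile.write(line)

def start_server(host="127.0.0.1", port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = socketserver.ThreadingTCPServer((host, port), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def ask(address, message):
    """Client side: connect, send one line, return the reply."""
    with socket.create_connection(address, timeout=5) as conn:
        conn.sendall(message.encode() + b"\n")
        return conn.makefile("rb").readline().decode().rstrip("\n")

server = start_server()
print(ask(server.server_address, "hello"))  # prints: hello
server.shutdown()
server.server_close()
```

A real service would replace the echo with its own request handling and listen on a fixed, documented port.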

However, when it comes to creating new software tools, Daemon/Service and Socket programming come into play.


== General Technologies In Wide Use ==

Operating System and Programming Languages are the basics of everything.

Database and Network are another duo on the top list, which we kind of already mentioned.

Compiler techniques are used widely.  If a new domain language is needed for a new piece of software, then a compiler/interpreter is written, using LR or LL parsing, or something simpler for specific domains.  Parsing, AST and JIT techniques are used by editors for text highlighting and IntelliSense, or for building the project tree as in Eclipse. Static analysis is used for cross referencing, debugging and code refactoring.
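
As a small illustration of the LL approach, here is a sketch of a recursive descent parser in Python for arithmetic expressions. It builds a nested-tuple AST and then evaluates it. The grammar and the function names are my own choices for the example, not taken from any particular tool.

```python
import operator
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def tokenize(text):
    """Split an arithmetic expression into number and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError("unexpected character at %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse(tokens):
    """LL(1) recursive descent; returns a nested-tuple AST."""
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else None
    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]
    def expr():          # expr := term (('+' | '-') term)*
        node = term()
        while peek() in ("+", "-"):
            node = (advance(), node, term())
        return node
    def term():          # term := factor (('*' | '/') factor)*
        node = factor()
        while peek() in ("*", "/"):
            node = (advance(), node, factor())
        return node
    def factor():        # factor := NUMBER | '(' expr ')'
        tok = advance() if peek() is not None else None
        if tok == "(":
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok is None or not tok.isdigit():
            raise SyntaxError("expected a number")
        return int(tok)
    tree = expr()
    if peek() is not None:
        raise SyntaxError("unexpected token %r" % peek())
    return tree

def evaluate(node):
    """Walk the AST and compute its value."""
    if isinstance(node, int):
        return node
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))
```

For example, parse(tokenize("1+2*3")) gives ("+", 1, ("*", 2, 3)), which shows how operator precedence falls out of the grammar.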

Security is an eternal topic. Software engineering is about the practice.

Web, Cloud and Mobile are the applications of these, on which Social media is based. They comprise the notion of high tech for the public. Web and Game are two public industries that make use of a wide variety of computer technologies.


== Domain Technologies ==

Artificial intelligence. In a broader sense, AI also includes Data Mining, Machine Learning, NLP etc. These are often intensive in probability and statistics, parsing theory, and mathematics.

Multimedia applications need Computer graphics, Computer vision, Image Processing and Visualization.

Informatics for different domains, such as Medical informatics, Bioinformatics, GIS etc. These are more like field applications of CS techniques, but some can be very intense.

HPC and scientific computing are often for government and academia projects.

Robotics/Vision generally need support from the government or large corporation.

Embedded devices and programming is another broad field, since so many devices use chips internally.

HCI is softer and often correlates with psychology and cognitive science. It plays an important role in the better adoption of technology.

Operations research is more on the mathematics side.


== Fundamentals ==

At the base of the fundamentals are math, discrete math, numerical analysis, computation and automata theory, probability and statistics, and algorithms and data structures.

Then, on the software side, it is Operating system and Programming languages design. On the hardware side, it is Computer Architecture, micro processors, memory, and other device physics in Electrical Engineering.


== Today's Hot Topics == 

It was the PC in the 1980s, the Internet in the 1990s, and Search in the 2000s.

Today, the hypes are: Big data (Cloud, NoSQL, Hadoop, DM, ML, NLP), Mobile, Social media.

Of these, Big data builds the platform through large corporations. Mobile and Social media are the interface to the general public, where business profits are made.


== Conclusion ==

So, based on this big picture, one may ask oneself: where am I? What do I know, and what do I want to know next, based on job prospects or personal interest?


Write a crawler in Perl

I am now improving a web crawler in Perl, based on a script I wrote several years ago. This can be useful if there is a need to download a bundle of files from a website.

The crawl and storage parts are there now. Some general considerations are below. These considerations, of course, are independent of implementation language.

- Crawl
  - keeps links crawled, in queue or hash
  - get header: status code (200, 404 etc.), content-size, content-type, modification time
  - broken links
  - dynamic page
  - special chars in url if create local folder using url path
  - relative url - handled by relevant lib.
  - wait interval between requests
  - robots.txt [11]
  - mime types
  - referrer
  - html parsing
  - header 302 redirect, and redirect level
  - javascript submit/redirect
  - non-standard tags

- Storage
  - use web structure, or flat (need to resolve file name conflicts)
  - store progress: in file or database: link_queue, non_link_queue, current pointer in link_queue.

- Data Analysis and mining
  - text mining
  - reverse index
  - NLP etc.

- Rank Analysis
  - web link graph
  - page rank
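
Since these considerations are independent of the implementation language, the core crawl loop can be sketched in Python. This is only a minimal sketch, not my Perl script: it keeps a queue and a set of seen links, resolves relative URLs, stays on one host and records broken links. The fetch function is passed in as a parameter (an assumption made so the loop can be tried without network access); request intervals, robots.txt, redirects and storage are left out.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse

class LinkParser(HTMLParser):
    """Collect href values of <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl within the start URL's host.

    fetch(url) must return the page HTML as a string, or None for a
    broken link. Returns (pages visited in order, broken links).
    """
    host = urlparse(start_url).netloc
    queue, seen = deque([start_url]), {start_url}
    visited, broken = [], []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            broken.append(url)
            continue
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative URLs and drop "#fragment" parts.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited, broken
```

For a quick try, fetch can be the get method of a dict that maps URLs to HTML strings; a real fetch would issue HTTP requests and wait between them.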


== Compile Perl to executable ==

Seems PAR is a good choice [1][2].


== Other Perl Crawlers ==

[5] is a good introduction on the modules to use to write a crawler in Perl.
[4] is a simple one. [6] seems more involved.

[10] is a good introduction to general principles of web crawler.


== GUI with Perl/Tk ==

With Tk it's easy to make event driven GUI interface [7][8][9].

Tried to install Tk. Type:

sudo perl -MCPAN -e shell
> install Tk

But an error prevents the installation from finishing:

t/wm-tcl.t ................... 119/315 
#   Failed test 'attempting to resize a gridded toplevel to a value bigger'
#   at t/wm-tcl.t line 1153.
#          got: '4'
#     expected: '6'

#   Failed test at t/wm-tcl.t line 1155.
#          got: '4'
#     expected: '5'
t/wm-tcl.t ................... 312/315 # Looks like you failed 2 tests of 315.
t/wm-tcl.t ................... Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/315 subtests 
(less 43 skipped subtests: 270 okay)
(31 TODO tests unexpectedly succeeded)

Test Summary Report
-------------------
t/listbox.t                (Wstat: 0 Tests: 537 Failed: 0)
  TODO passed:   320, 322, 328, 502
t/text.t                   (Wstat: 0 Tests: 415 Failed: 0)
  TODO passed:   121
t/wm-tcl.t                 (Wstat: 512 Tests: 315 Failed: 2)
  Failed tests:  160-161
  TODO passed:   64, 86-87, 154-157, 164-165, 171-176, 221-224
                237-239, 264-265, 275-276, 280-283, 300
  Non-zero exit status: 2
t/zzScrolled.t             (Wstat: 0 Tests: 94 Failed: 0)
  TODO passed:   52, 66, 80, 94
Files=74, Tests=4348, 55 wallclock secs ( 0.84 usr  0.25 sys + 14.66 cusr  1.57 csys = 17.32 CPU)
Result: FAIL
Failed 1/74 test programs. 2/4348 subtests failed.
make: *** [test_dynamic] Error 255
  SREZIC/Tk-804.032.tar.gz
  /usr/bin/make test -- NOT OK
//hint// to see the cpan-testers results for installing this module, try:
  reports SREZIC/Tk-804.032.tar.gz
Running make install
  make test had returned bad status, won't install without force
Failed during this command:
 SREZIC/Tk-804.032.tar.gz                     : make_test NO

Only 2 tests out of many failed, on a resize issue, which shouldn't be serious. So I used the force option to install and it worked:

sudo cpan -fi Tk


References:

[1] Create self-contained Perl executables, Part II
[2] PAR: Perl Archiving Toolkit
[3] Compiling or packaging an executable from perl code on windows

[4] Web scraping with modern perl
[5] Web crawling with Perl
[6] spider.pl - Example Perl program to spider web servers

[7] Tk::UserGuide
[8] Learning Perl/Tk: Graphical User Interfaces with Perl
[9] Book: Mastering Perl/Tk

[10] Wiki: web crawler
[11] Robots Exclusion Standard


Monday, July 14, 2014

Supporting PHP, ASP, JSP/Servlet and Tomcat in Perl Web Server

Last time we implemented a small but functional HTTP web server in Perl, which works like Apache by serving static content. When that was done, it became instantly clear how and why an HTTP web server, such as Apache, works that way.  It also became somewhat clear how Apache uses extensions to work with non-static content, such as PHP, JSP, ASP etc.

For example, when a PHP file test.php is requested, the Perl web server just calls something like this:
my $output = qx(php test.php);
then sends the captured output back to the client. The basic principle is as simple as that.
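
The same principle, sketched in Python rather than Perl: run the external interpreter, capture its standard output, and use that as the response body. The function name run_script is hypothetical, and the Python interpreter stands in for php so the example runs without PHP installed. A real server would also need to pass request data (query string, POST body) to the script, which is not shown.

```python
import subprocess
import sys

def run_script(command):
    """Run an external interpreter and return what it wrote to stdout."""
    result = subprocess.run(command, capture_output=True, text=True, timeout=30)
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip() or "script failed")
    return result.stdout

# For PHP this would be run_script(["php", "test.php"]); the Python
# interpreter stands in here so the example runs without PHP installed.
body = run_script([sys.executable, "-c", "print('<p>hello</p>')"])
print(body)
```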

Now I'm looking at JEE, which uses Tomcat application server and a connector module in the middle to work with Apache.  I'm thinking I should be able to extend my Perl web server to work with  Tomcat, and thus the JSP/Servlet/JEE stack as well.

This is easy to do: in a config file, tell the Perl web server which paths should be mapped to Tomcat. Then, when a request comes in for such a path, open a TCP client connection, pass the request URL on to Tomcat (which basically means sending a request to "http://localhost:8080/path"), receive the response from the Tomcat server, and send it back to the Perl web server's client.
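
The path mapping itself is just a lookup. Here is a minimal Python sketch of it, assuming the config is a dict from path prefix to backend base URL; the function name backend_for and the /examples entry are hypothetical.

```python
def backend_for(path, routes):
    """Return the backend URL for a request path, or None to serve it locally.

    routes maps a path prefix to a backend base URL; the longest matching
    prefix wins.
    """
    best = None
    for prefix in routes:
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return routes[best] + path if best is not None else None

ROUTES = {"/examples": "http://localhost:8080"}  # hypothetical config entry

print(backend_for("/examples/index.jsp", ROUTES))
print(backend_for("/index.html", ROUTES))
```

A request whose path maps to a backend URL is forwarded there; anything that maps to None is served as static content as before.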

Actually, this can be easily verified by using telnet. Type:
telnet localhost 8080
This will establish a connection session to Tomcat server. Next type this request:
GET /
The Tomcat server will send the index page back, and close the connection.

Basically, what the Perl web server should do is exactly the same. To implement this, we need to be able to code a web client in Perl.  We can either build it from scratch in socket programming, or use the LWP module, as in reference [1]. 
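
For comparison, the "from scratch in socket programming" option looks roughly like this in Python. It is a minimal sketch that does what the telnet session does: send a GET request, read until the server closes the connection, and split the reply into status line, headers and body. The function names are hypothetical, and HTTP/1.0 is used so that the server closes the connection when it is done.

```python
import socket

def build_request(host, path="/"):
    """Minimal HTTP/1.0 request; the server closes the connection when done."""
    return ("GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)).encode("ascii")

def parse_response(raw):
    """Split raw response bytes into (status line, header lines, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    return lines[0], lines[1:], body

def http_get(host, port, path="/"):
    """Connect, send the request, and read until the server closes."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_request(host, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return parse_response(b"".join(chunks))
```

With Tomcat running, http_get("localhost", 8080) should return the index page, which the web server can then relay to its own client.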

To work with ASP or ASP.NET, the Perl web server can work as a proxy, passing the request to an internal IIS web server and sending back the response.

This way, the Perl web server in principle can work with any other web technologies.

== Create Web Browser with a GUI ==

Chapter 7 of [1] is on graphical examples in Perl/Tk. This basically demonstrates how to implement your own web browser with a GUI (not just command line interface), similar to firefox or any other popular web browsers.  And if we can do it in Perl/Tk, we can also create the GUI interface in Java or C/C++. Following this way, we can reinvent the entire wheel of the internet world [2][3]. [3] talks about the libwww module of Python, which was written by Tim Berners-Lee, and contains many functions needed by a web application including a browser.

One of the most difficult parts of this is the amount of work involved in the HTML/JavaScript/CSS parser and renderer. In 1998 Netscape decided to create a new one from scratch; they failed after 3 years.  Many current browsers are built on a rendering engine: WebKit [6][7] for Safari and Chrome (before 28), Blink for Opera and Chrome (28+) [11][12], Gecko for Firefox [10], and Trident for IE [8][9].

A list of browser HTML rendering engines can be found in [4][5], including Amaya, Blink, Gecko, KHTML, Presto, Tasman, Trident and WebKit.


References:

[1] Web Client Programming in Perl
[2] Where should you start Coding a Web Browser?
[3] W3C Blog: Build Your Own Browser
[4] Comparison of layout engines (HTML)
[5] Web browser engine
[6] The WebKit Open Source Project
[7] Wiki: WebKit
[8] Wiki: Trident (layout engine)
[9] Internet Explorer Architecture
[10] Wiki: Gecko
[11] Wiki: Blink (layout engine)
[12] The Chromium Projects: Blink


Friday, July 11, 2014

Setup Ubuntu in VirtualBox

I have now set up Ubuntu in VirtualBox on my Windows machine.


== Download and Install Ubuntu ==

Get Ubuntu Desktop from http://www.ubuntu.com/ and install.
Also install VirtualBox Guest Additions after this.

-- X sessions and Command startx --
Note: it seems that if Ubuntu desktop is installed in VMWare, it boots into the command line. Use this to start the desktop: startx

There can be multiple X sessions going simultaneously, and can be accessed using combination keys Ctrl-Alt-X, where X is F1, F2, ... F7. The default seems to be Ctrl-Alt-F7. If you start another X session inside this, then it's Ctrl-Alt-F1. An X session, when not fully loaded, can be killed by Ctrl-Z. See more: man startx.

The two possible problems below are both related to Ubuntu in VM only. The first happened to me.

-- Possible login issue --
It's possible that after you reboot the machine you won't be able to log in: each time, it returns to the login screen after a few seconds. This might be because your ~/.Xauthority file is owned by root:root. To fix it, press Ctrl-Alt-F1 to enter session 1, log in at the console there, and type: sudo chown me:me ~/.Xauthority, where "me" is your account name. See [7].

-- 3D acceleration --
3D acceleration by the GPU guarantees faster speed; otherwise 3D acceleration is done in software and is slow. See [8].


== Install LAMP ==

-- Apache2 --
sudo apt-get install apache2

-- PHP --
sudo apt-get install php5

-- PHP Client (not necessary, this is console client) --
sudo apt-get install php5-cli

-- restart apache (not necessary, since it's already restarted) --
Use either of the 2 commands below:
sudo /etc/init.d/apache2 restart
sudo service apache2 restart

-- MySQL --
sudo apt-get install mysql-server libapache2-mod-auth-mysql php5-mysql

Then optionally do the following:
-- activate mysql --
sudo mysql_install_db
-- change root password, and other security settings --
sudo /usr/bin/mysql_secure_installation


== Verify LAMP Installation ==

Now, the Apache root is: /var/www/html/. There is already an index.html file there.
This can be visited as http://localhost

Create a php file phptest.php under the root, with content: <?php phpinfo(); ?>
This can be visited as http://localhost/phptest.php

Visit mysql from command line:
> mysql -u root -p


== Network Setup ==

You want to be able to access the guest's web server from the host, which makes it more useful.

In VMWare Player, with networking set to NAT you can access the guest from host using guest's IP:

http://[guest ip]/

However, it seems VirtualBox does not allow this.

The recommendation for VirtualBox is to use Bridged Network, so the host can access the guest. However, the guest loses the ability to access the Internet.

One way around this is to use NAT but enable port forwarding, so the host will route such requests to the guest.

To do this (you can change setting with guest running), go to Settings -> Adapter 1 (NAT), click on button "Port Forwarding", and enter this:

Name: Rule 1 (default)
Protocol: TCP
Host IP: (leave blank)
Host Port: 81
Guest IP:  (leave blank, or use the IP found by ifconfig)
Guest Port: 80

Then in the host browser, type http://localhost:81 and you will see the guest's 10.0.2.15:80 (which is not accessible if visited directly from the host).
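
To check from the host whether the forwarding rule works without opening a browser, a few lines of Python are enough. This is a minimal sketch; port_is_open is a hypothetical helper name, and port 81 is the host port from the rule above.

```python
import socket

def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(port_is_open("localhost", 81))
```

This only shows that something accepts connections on the port; it does not prove that the answer comes from the guest's Apache.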

See:
- https://www.virtualbox.org/manual/ch06.html#natforward
- http://askubuntu.com/questions/353235/how-to-visit-a-site-hosted-by-my-virtual-machine


-- Note 1 --

For multiple machines hosted on VirtualBox, if they are all started, it seems they will have the same IP (found by ifconfig). I don't know how to reset their IP address (like statically), yet. But with Port Forwarding you can visit webservers of all guests from host. For example, forward the second guest's port 80 to port 82 of host.

What's more, guests can see each other. For instance, using the example above, guest 1 can see guest 2 by visiting http://[host IP]:81, and guest 2 can see guest 1 by visiting http://[host IP]:82.

-- Note 2 --

For VMWare, Port Forwarding can be setup for NAT network by using the "virtual network editor".  However, the "virtual network editor" is available only to VMWare Workstation, and not available to VMWare Player.

To add the editor to VMWare Player, you can follow instructions here:
- http://www.eightforums.com/virtualization/5137-how-add-virtual-network-editor-vmware-player.html


== Install JEE Environment ==

-- Java --
See section Other Utilities below.

-- Tomcat --
To install, type:
sudo apt-get install tomcat7
Then should be able to visit http://localhost:8080.

Optionally you can install tomcat7-admin:
sudo apt-get install tomcat7-admin

Alternatively, you can download source from tomcat homepage and build it.

-- Eclipse --
Download from eclipse.org/downloads, extract it, then move it to a place, e.g., /home/me/Eclipse/eclipse.
And create a soft link on desktop: ln -s /home/me/Eclipse/eclipse/eclipse ~/Desktop/eclipse


== Install Mono.NET ==

See [2][3][4] for more information on Mono. To install type:
sudo apt-get install mono-complete
Install gtk-sharp2 for GTK+ support:
sudo apt-get install gtk-sharp2
Then type this to show version and enter C# shell:
csharp --version
Type this to show the Mono compiler version; use mcs or gmcs, the counterparts of "csc" under Windows:
mcs --version
To compile a csharp file hello.cs [4], type:
mcs hello.cs
To compile a gtk-csharp file hello_gtk.cs [4], type:
mcs hello_gtk.cs -pkg:gtk-sharp-2.0
To compile a winform file hello_winform.cs [4], type (note it's gmcs, not mcs!):
gmcs hello_winform.cs -pkg:dotnet
To run aspx, install xsp2:
sudo apt-get install mono-xsp2 
This will report:
 * You have an incomplete /etc/xsp2/debian.webapp
 * To fix it, you need to install at least one package for xsp2 (like asp.net2-examples)
This tells us the xsp2 server configuration is in /etc/xsp2. To fix it (note the package is actually asp.net-examples):
sudo apt-get install asp.net-examples
Now start xsp2 server. It uses port 8080 by default, but this is in conflict with Tomcat, so use a different port, say 8082. Run the following command from the same folder as hello.aspx:
xsp2 --port 8082
Then visit http://localhost:8082/hello.aspx, it works!
To run xsp2 as a daemon, refer to [10]. You can also run this for more help:
xsp2 --help

Or, to install the runtime only, type:
sudo apt-get install mono-runtime


== Install memcached ==

sudo apt-get install memcached

Now memcached should have been automatically started under user memcache. This can be verified by one of the following 3 ways:
1) use System Monitor to view all processes to find out.
2) use command: top -u memcache
3) use command: ps -aux | grep memcached. You can see the command of the process is:
    /usr/bin/memcached -m 64 -p 11211 -u memcache -l 127.0.0.1

-- start/stop/restart/status --
sudo service memcached start/stop/restart/status

-- access in console --
telnet localhost 11211
stats


== Other Utilities ==

-- Java --
Typing "java" or "javac" will tell you it's not installed, and suggest the packages to install. I chose this:
sudo apt-get install openjdk-7-jdk

After installation, see version: javac -version, java -version

-- Perl --
Type "perl -v". It's installed.

-- Python --
Type "python -V". It's installed.

Try the mandelbrot drawing python script. It reports numpy not installed.
Install NumPy and SciPy (packages for scientific computing with Python):
sudo apt-get install python-numpy
sudo apt-get install python-scipy

Then run the mandelbrot.py file:
python mandelbrot.py
Compile it:
python -m py_compile mandelbrot.py
And run the compiled file:
python mandelbrot.pyc

-- Gtk+ --

See [5][6] for more information. To install GTK+ 3.0 type:
sudo apt-get install libgtk-3-dev

-- Make --
Type "make -v", it's installed.

-- Gcc --
Type "gcc -v", it's installed.

-- G++ --
Type "g++ -v", it's installed.

-- Git --
Type "git", it's not installed. To install type:
sudo apt-get install git

-- Svn --
Not installed. To install type:
sudo apt-get install subversion

== Uninstall a package using apt-get ==

The commands below are from [11].

apt-get remove packagename
This will remove the binaries, but not the configuration or data files of the package packagename. It will also leave dependencies installed with it on installation time untouched.

apt-get purge packagename, or
apt-get remove --purge packagename
This will remove about everything regarding the package packagename, but not the dependencies installed with it on installation. Both commands are equivalent.

apt-get autoremove
removes orphaned packages, i.e. installed packages that used to be installed as a dependency, but aren't any longer. Use this after removing a package which had installed dependencies you're no longer interested in.

Or could try this:
dpkg --purge --force-depends application


== Mount a VirtualBox shared folder ==

See [12]. Assume host is windows, guest is Ubuntu.

1) First you need to install Guest Additions for VirtualBox.

2) Next, open Devices -> Shared Folders Settings, and choose a folder on the host to share, say name it "shared_folder".

3) In Ubuntu guest, type these commands:

sudo mkdir /mnt/host/
sudo mount -t vboxsf shared_folder /mnt/host/

That's it.

Finally, you can create a soft link to the shared folder on desktop as shortcut:

ln -s /mnt/host  ~/Desktop/host


References:

[1] How To Install Linux, Apache, MySQL, PHP (LAMP) stack on Ubuntu
[2] AWS .NET SDK on Mono/Linux
[3] mono-project.com
[4] Mono basics
[5] How do I install GTK+ 3.0
[6] The GTK+ project
[7] Can't login into Ubuntu VM
[8] 3D Acceleration with Ubuntu Guests
[9] How To Install Apache Tomcat on Ubuntu 12.04
[10] How to install Mono XSP as a daemon on Debian?
[11] What is the correct way to completely remove an application?
[12] How to mount a VirtualBox shared folder?

Thursday, July 10, 2014

Design of a tinyurl service

A tinyurl service compresses a long url into a short one. This saves space, and is useful for scenarios such as twitter or weibo, where each character counts. The downside is possible use for spam. Another concern is the longevity of a short url.

== Single Server ==

What a tinyurl service does is essentially: long_url <==> short_url. The conversion goes both ways, i.e., it is a two-way function. A site registers its url at the service, and then provides the short url to visitors. The visitors visit the short url, which then redirects them to the original (mostly longer) url.

A common first thought is to hash the long url into a short one. However this is not correct, because a hash is a one-way function: the long url cannot be recovered from the hash value alone.

Many ways can be used to do the compression. One of the simplest yet effective ways is to have a database table set up this way:

Table T_Url_Conversion (
    ID : int PRIMARY_KEY AUTO_INC,
    Original_url : varchar,
    Short_url : varchar
)

Then the auto-incremental primary key ID is used to do the conversion: (ID, 10) <==> (short_url, BASE). Whenever you insert a new original_url, the query can return the newly inserted ID, which is used to derive the short_url; save this short_url and send it to the client.

Here the BASE can be 62 for a-zA-Z0-9, or 36 for a-z0-9. So this is a conversion of numbers between base-10 and base-62 or base-36. With base-62 and a short_url length of 6 characters, that's a space of 62^6 =~ 57 billion short urls to use.
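
The ID <==> short_url conversion described above can be sketched in Python as below. This is only a minimal illustration, not the code of any real service: the function names encode/decode and the alphabet order (digits, then lowercase, then uppercase) are my own assumptions.

```python
import string

# Assumed alphabet order: 0-9, a-z, A-Z (any fixed order of the 62 characters works).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def encode(n):
    """Convert a non-negative integer ID (base-10) to a base-62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    # digits were produced least-significant first, so reverse them
    return "".join(reversed(chars))


def decode(s):
    """Convert a base-62 string back to the integer ID."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n


print(encode(125))          # 125 = 2*62 + 1
print(decode(encode(125)))  # round trip gives back the ID
print(62 ** 6)              # size of the 6-character space
```

With this alphabet, the short_url is looked up by decoding it to the ID and reading the row with that primary key, so no index on the short_url column is strictly needed.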

One concern is that some short url strings will be reserved, for at least 2 reasons: 1) urls reserved by certain clients for branding purposes, 2) dirty words that clients want to avoid. Such reserved strings can be stored in another table and checked whenever a new short_url is generated.
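
A possible shape of that check is sketched below. The RESERVED set stands in for the second table, and its entries, the helper names, and the case-insensitive comparison are all assumptions made for illustration.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

# Hypothetical reserved strings; in the design above these would live in a table.
RESERVED = {"ibm", "nike"}


def encode(n):
    """Base-62 encode a non-negative integer ID."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n > 0:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))


def is_allowed(short_url, reserved=RESERVED):
    """Assumed policy: compare case-insensitively, so 'IBM' is blocked too."""
    return short_url.lower() not in reserved


def first_allowed(candidate_ids, reserved=RESERVED):
    """Return (id, short_url) for the first candidate whose short_url is not reserved."""
    for candidate in candidate_ids:
        short_url = encode(candidate)
        if is_allowed(short_url, reserved):
            return candidate, short_url
    return None


# ID 69896 encodes to "ibm", which is reserved, so the next ID is used instead.
print(first_allowed([69896, 69897]))
```

A skipped ID simply leaves an unused row (or a gap) in the table, which is harmless given the size of the short_url space.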

Following this design, one can easily set up a website providing a short url service. This idea is not new; other people have already come up with it [1][2][3].

== Multiple Servers ==

[1] also discussed the case of distributed design. However the scheme to do the assignment is not clear: when receiving a long url it can be hashed to a value and choose a target storage server according to this value.  This may work if the fictitious universal hashing function can evenly dispatch the requests to different servers.  But a more serious issue is, when receiving a short url, how can you know which server it belongs to?

I think one solution can be like this:

-- Solution I --
Say there are 3 servers, and a load-balancing machine that does the assignment in a round-robin fashion: insertion requests 3k, 3k+1, 3k+2 are assigned to servers 1, 2 and 3 respectively. On the three servers the IDs start at 1, 2 and 3 respectively, and all increment by 3. This way, the first server stores IDs 1, 4, 7, ..., the second server stores IDs 2, 5, 8, ..., and the third server stores IDs 3, 6, 9, ... So when you receive a short url and convert it back to N, you can go to server N%3 (where a remainder of 0 means server 3).
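
Solution I can be simulated with a few lines of Python. The Shard class and the function names below are hypothetical and the "servers" are just in-memory dictionaries; the point is only to show the interleaved IDs and the modulo lookup (written 0-based here, so ID N lives on shard (N-1)%3).

```python
SERVER_COUNT = 3


class Shard(object):
    """One storage server handing out IDs start, start+step, start+2*step, ..."""

    def __init__(self, start, step):
        self.next_id = start
        self.step = step
        self.rows = {}  # ID -> long url

    def insert(self, long_url):
        new_id = self.next_id
        self.rows[new_id] = long_url
        self.next_id += self.step
        return new_id


# Shard i (0-based) starts at ID i+1 and increments by SERVER_COUNT.
shards = [Shard(start=i + 1, step=SERVER_COUNT) for i in range(SERVER_COUNT)]


def insert_round_robin(request_no, long_url):
    """Load balancer: request number k goes to shard k % SERVER_COUNT."""
    return shards[request_no % SERVER_COUNT].insert(long_url)


def shard_index_for(id_):
    """0-based index of the shard that owns a given ID."""
    return (id_ - 1) % SERVER_COUNT


def lookup(id_):
    return shards[shard_index_for(id_)].rows.get(id_)


ids = [insert_round_robin(k, "http://example.com/%d" % k) for k in range(6)]
print(ids)        # IDs handed out across the three shards
print(lookup(5))  # found on the shard that owns ID 5
```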

The potential shortcoming of this method, is when you need to add a fourth server, then the split method is broken. A possible solution can be dividing 3k to 6k' and 6k' + 3. Here is another solution that can be better:

-- Solution II --
Since we already know that by only using 6 characters, the short_url space is about 57 billion, so this distributed scenario comes up only because of heavy load, not for lack of short_url space. So we may have the luxury to use an extra character to specify which server to go: say the short_url is abc, then we append a last digit to specify the server to go. For example, if it goes to server 2, then we make the short_url abc2. This way, the load-balancer continues with the round-robin modulo assignment algorithm when new servers are added, and short_urls can easily find their way. Also, now each server does not have to increment their ID by server_count now, and can use continuous ID.  Seems now everyone can live happily ever after.

It is further easy to see that, using this method, with one appended digit, one can have up to 62 servers on a base-62 system. This should be enough for such a simple service.
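
Solution II can be sketched as follows, reusing the same base-62 idea. The names make_short_url and parse_short_url are made up for this illustration, and the alphabet order is again an assumption.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode(n):
    """Base-62 encode a non-negative integer ID."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n > 0:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))


def decode(s):
    """Base-62 decode a string back to the integer ID."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n


def make_short_url(local_id, server_no):
    """Append one base-62 character that names the server (0..61)."""
    if not 0 <= server_no < 62:
        raise ValueError("one character can address at most 62 servers")
    return encode(local_id) + ALPHABET[server_no]


def parse_short_url(short_url):
    """Split a short url into (server_no, local_id)."""
    return ALPHABET.index(short_url[-1]), decode(short_url[:-1])


# The example from the text: "abc" stored on server 2 becomes "abc2".
print(make_short_url(decode("abc"), 2))
print(parse_short_url("abc2"))
```

Because the server number travels inside the short url, each server can keep its own continuous local ID sequence, and adding a server does not disturb existing urls.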

It is also easy to see that the initial long_url assignment approach of the load-balancer is independent of the short_url assignment method. The long_url assignment can now actually use a universal-hashing method, but a round-robin method still works well and is indeed simpler, needing just a counter, and is fault-tolerant: losing the counter for a while does not actually matter.

References:
[1] System Design for Big Data [tinyurl]
[2] URL Shortening: Hashes In Practice, 21 Aug 2007
[3] How to code a URL shortener?


Memcached

Today I started to look at Memcached.

Memcached [1][7] is a distributed in-memory key/value cache system. It is open source under the BSD license. The first version was written in Perl by Brad Fitzpatrick in May 2003 for his website LiveJournal (as one can notice, this site runs very fast). It was then rewritten in C by Anatoly Vorobey. It uses a client-server architecture. The servers don't talk to each other; the client hashes the key and chooses a target server to store the data. It was designed for unix/linux, but has been ported to windows too. That's all about it.
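
The client-side server selection can be illustrated with a small sketch. This is not the code of any particular memcached client: the server addresses are hypothetical, and the simple "CRC32 modulo number of servers" rule is an assumption chosen for brevity (some clients use consistent hashing instead, so that adding a server remaps fewer keys).

```python
import zlib

# Hypothetical server list; the servers themselves never talk to each other.
SERVERS = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]


def pick_server(key, servers=SERVERS):
    """Hash the key on the client and map it to one server."""
    h = zlib.crc32(key.encode("utf-8")) & 0xffffffff  # force an unsigned value
    return servers[h % len(servers)]


for key in ("mykey1", "mykey2", "user:42"):
    print(key, "->", pick_server(key))
```

Since every client applies the same function to the same server list, they all agree on where a key lives without any coordination between servers.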

Memcached homepage is at [1]. Here you can understand what it's about, and download it. There is a small wiki [6] that contains basic information about it. The current version is 1.4.20 as of July 2014.

== Install ==

1) To install from package (recommended, especially if you are deploying to multiple servers):

Ubuntu & Debian: apt-get install memcached
Redhat/Fedora: yum install memcached
FreeBSD/(Mac ?): portmaster databases/memcached

2) Although less recommended, you can install from source too:

First you will need libevent as pre-requisite:
Ubuntu: apt-get install libevent-dev
Redhat/Fedora: yum install libevent-devel

Then install memcached:
wget http://memcached.org/latest
tar -zxvf memcached-1.x.x.tar.gz
cd memcached-1.x.x
./configure [--prefix=/usr/local/memcached]
make && make test
sudo make install


== Run from console ==

memcached [-m 64] -p 11211

Here -m specifies the memory allocated, unit is MB. -p is the port used, and 11211 is the default.

-- Test --

You can test memcached using the telnet interface: telnet localhost 11211
Then these commands can be used [2][3][4]:

get [key]
set [key flag timeout size]
add [key flag timeout size]
replace [key flag timeout size]
append [key flag timeout size]
prepend [key flag timeout size]
incr [key int_value]
decr [key int_value]
delete [key]
flush_all [ |timeout]
stats [ |slabs|malloc|items|detail|sizes|reset]
version
verbosity
quit

An example console session is:

telnet localhost 11211
Trying ::1...
Connected to localhost.
Escape character is '^]'.
version
VERSION 1.4.20  
verbosity 10   # this makes the server side echo the telnet client input.
OK
stats
...                  
stats slabs
...
stats items
...
add mykey1 0 3600 5  # "mykey1" is the key, "0" is a 32-bit unsigned int flag, "3600" is the expiration (seconds), "5" is the data size.
12345            # this is the data you store at key "mykey1".
STORED      
get mykey1
VALUE mykey1 0 5
12345
END
quit
Connection closed by foreign host.
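
The same text protocol can be spoken from a script instead of telnet. Below is a minimal sketch: build_store and parse_get are helper names I made up, they only format and parse the lines shown in the session above, and roundtrip() assumes a memcached server is listening on localhost:11211 and that each reply arrives in a single recv() call, which a real client should not rely on.

```python
import socket


def build_store(command, key, flags, exptime, data):
    """Build a storage command (set/add/replace/append/prepend) as bytes."""
    header = "%s %s %d %d %d\r\n" % (command, key, flags, exptime, len(data))
    return header.encode("ascii") + data + b"\r\n"


def parse_get(response):
    """Parse a single-key 'get' reply; return the data bytes, or None on a miss."""
    if response.startswith(b"END"):
        return None
    header, rest = response.split(b"\r\n", 1)
    # header looks like: VALUE <key> <flags> <bytes>
    size = int(header.split()[3])
    return rest[:size]


def roundtrip(host="localhost", port=11211):
    """Store and read back one key on a live server (not called here)."""
    sock = socket.create_connection((host, port), timeout=2)
    try:
        sock.sendall(build_store("set", "mykey1", 0, 3600, b"12345"))
        print(sock.recv(1024))             # expect a STORED line
        sock.sendall(b"get mykey1\r\n")
        print(parse_get(sock.recv(1024)))  # expect the stored data
    finally:
        sock.close()


print(build_store("add", "mykey1", 0, 3600, b"12345"))
print(parse_get(b"VALUE mykey1 0 5\r\n12345\r\nEND\r\n"))
```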

== Run as daemon ==

memcached -d

Or:
sudo service memcached stop
sudo service memcached start
sudo service memcached restart


Or:
sudo /etc/init.d/memcached start
sudo /etc/init.d/memcached stop
sudo /etc/init.d/memcached restart

Now if you run the "top" command, you can see "memcached" in the list of processes. You can run multiple instances of memcached using different ports. For more details, refer to [5] etc.

== Configuration of server, client and cluster ==

See [6].

== More things about using memcached, maintenance and development ==

See [6].

== Source code ==

The source code is available by instruction at [8] on its homepage [1]. A count of LOC on the current version is:

[./memcached/assoc.c] Lines: 293
[./memcached/assoc.h] Lines: 9
[./memcached/cache.c] Lines: 148
[./memcached/cache.h] Lines: 116
[./memcached/daemon.c] Lines: 89
[./memcached/globals.c] Lines: 25
[./memcached/hash.c] Lines: 21
[./memcached/hash.h] Lines: 14
[./memcached/items.c] Lines: 936
[./memcached/items.h] Lines: 37
[./memcached/jenkins_hash.c] Lines: 431
[./memcached/jenkins_hash.h] Lines: 15
[./memcached/memcached.c] Lines: 5646
[./memcached/memcached.h] Lines: 610
[./memcached/murmur3_hash.c] Lines: 124
[./memcached/murmur3_hash.h] Lines: 19
[./memcached/protocol_binary.h] Lines: 470
[./memcached/sasl_defs.c] Lines: 190
[./memcached/sasl_defs.h] Lines: 31
[./memcached/sizes.c] Lines: 29
[./memcached/slabs.c] Lines: 882
[./memcached/slabs.h] Lines: 49
[./memcached/solaris_priv.c] Lines: 44
[./memcached/stats.c] Lines: 375
[./memcached/stats.h] Lines: 8
[./memcached/testapp.c] Lines: 1967
[./memcached/thread.c] Lines: 854
[./memcached/timedrun.c] Lines: 102
[./memcached/trace.h] Lines: 71
[./memcached/util.c] Lines: 144
[./memcached/util.h] Lines: 33

[.] Total Lines: 13782


== Windows version ==

Memcached was designed for unix/linux. However, a windows version is also available thanks to community support [9][10]. [11] is an example that seems to be no longer available. One major vendor seems to be NorthScale Labs: they first provided memcached for windows as a stand-alone application, then combined it into their NoSQL product Membase, and finally into Couchbase [12].


References:

[1] http://memcached.org
[2] Memcached telnet command summary
[3] Memcache Telnet Interface
[4] github: memcached / doc / protocol.txt 
[5] stackoverflow: stop and restart memcached server
[6] Memcached wiki 
[7] Wiki: memcached
[8] Obtain memcached source 
[9] memcached 1.4.4 Windows 32-bit binary now available!
[10] Memcached on Windows (x64)
[11] Installing Memcache on Windows
[12] http://www.couchbase.com/


What JavaScript can do in browser for game programming

This is what javascript can do in browser for game programming now:

http://www.babylonjs.com/

This is from Microsoft developers, and features an engine for many 3D effects that used to be available in desktop applications only. Note that you will need a browser that supports WebGL, such as Chrome.

WebGL [1][2] makes use of GPU (Graphics Processing Unit) to render 2D and 3D graphics in browser.

References:
[1] Wiki: WebGL
[2] Compatibility table for support of WebGL in desktop and mobile browsers


Tuesday, July 8, 2014

The Mandelbrot Set and Fractals

The Mandelbrot set and associated Julia set are examples of beautiful fractal graphics derived from mathematics. Here is Wolfram Mathworld's introduction to the Mandelbrot set. Some relevant books are:

[1] Amazon: The Fractal Geometry of Nature, by Benoit B. Mandelbrot, 1982.

This book, however, does not have good reviews, as the writing is said to be not so good. A review on Amazon is: "It is not an easily readable book. 1. It is not well-organized 2. It does not cover necessary things in detail 3. Frustratingly long in some parts." Books on this topic that Amazon reviewers consider better:

[2] Feder, Fractals; Turcotte, Fractals and Chaos in Geology and Geophysics.
[3] The Science of Fractal Images, edited by Peitgen and Saupe. The math is clear; the algorithms are plainly stated for the PC enthusiast with some simple programming skills; and the color plates are astounding.




The first Mandelbrot set image was created in 1978. The Python program below from [4] will draw such an image. Note to run the Python program you need Numpy, which stands for Numerical Python, and is a scientific computation module of Python. You can download the win32 installation package from [6], or amd64 version from [7]. On my system, I have Python 2.7.1 and use Numpy 1.8.1. 

#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Copyright (c) 2013, P. Lutus http://arachnoid.com
# Released under the GPL http://www.gnu.org/licenses/gpl.html

from numpy import arange

# dimensional parameters for the set

yl = -1.2
yh = 1.2
ys = .05

xl = -2.05
xh = 0.55
xs = 0.03

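def mandelbrot(c):
  # iterate z -> z*z + c starting from z = 0
  z = 0
  for n in range(10):
    z = z*z + c
    # once |z| exceeds 2 the orbit escapes, so c is outside the set
    if(abs(z) > 2):
      return '.'
  # still bounded after 10 iterations: treat c as inside the set
  return '*'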
 
s = ''
for y in arange(yl,yh,ys):
  for x in arange(xl,xh,xs):
    s += mandelbrot(complex(x,y))
  s += '\n'
 
print(s)


[5] contains an example program to draw the Mandelbrot set image in iPhone.

[8] is about doing the drawing in JavaScript. It includes the formula of Linas Vepstas algorithm to draw a smoothed-color version of the image:

\mu = n + 1 - \frac{1}{\log 2}\log \log |P_c^n(0)|
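
A direct Python transcription of this formula might look like the sketch below. The function name, the iteration limit and the bailout radius of 1000 are my own choices (a bailout larger than 2 is commonly used so that the smoothed value varies more evenly), not values taken from [8].

```python
import math


def smooth_iteration_count(c, max_iter=100, bailout=1000.0):
    """Return mu = n + 1 - log(log|z_n|)/log 2, or None if c does not escape."""
    z = 0j
    for n in range(1, max_iter + 1):
        z = z * z + c  # z is now the n-th iterate P_c^n(0)
        if abs(z) > bailout:
            return n + 1 - math.log(math.log(abs(z))) / math.log(2)
    return None  # treated as inside the set


print(smooth_iteration_count(complex(2, 2)))  # escapes quickly
print(smooth_iteration_count(0j))             # never escapes
```

The fractional value mu can then be mapped to a color gradient instead of the integer n, which removes the visible bands between iteration counts.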


References:

[4] Mandelbrot Set - An exploration in pure mathematics
[5] A Mandelbrot Set Visualization on the iPhone
[6] Sourceforge: Numerical Python
[7] Unofficial Windows Binaries for Python Extension Packages
[8] Visualizing the Mandelbrot set with JavaScript


A Game of M.C. Escher style

Monument Valley - a very nice game built on geometries like those in the paintings of M.C. Escher. The only shortcoming, as many reviewers complain, is that the game is too short: only 10 levels, which can be finished in 1 hour. I guess they need to put in more levels later.

It proves again that fine combination of computer programming with arts will make a good product.

Youtube: Monument Valley - Gameplay Walkthrough ALL Levels 1 - 10 (1080p) (59m 36s)



Estimate size of MSSQL Table with indices

How to estimate the size of a MSSQL table with indices?

It's not enough to just multiply the number of rows by the size of each row, because there are other entities involved, mostly the index. Relevant concepts are clustered/non-clustered index, unique/non-unique index, fill factor of an index, fixed/variable-length columns, page size (8192 bytes, i.e. 8 KB), Null bitmap etc.

For MSSQL 2008, see MSDN articles [1][2][3]. For other versions of MSSQL, relevant links are in articles [1][2].

[1] Estimating the Size of a Clustered Index (MSSQL 2008)
[2] Estimating the Size of a Nonclustered Index (MSSQL 2008)
[3] Estimating the Size of a Table with a Clustered Index (MSSQL 2000)

Note that "Step 1. Calculate the Space Used to Store Data in the Leaf Level" in [1] is basically copied from [3]; the only difference in [1] is the addition of "3. If the clustered index is nonunique, account for the uniqueifier column:". This step actually calculates the combined size of both data and index in the leaf level nodes. In that sense the title of article [1] is inaccurate.
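
To give a feel for the leaf-level calculation, here is a rough Python sketch. It is only my reading of the procedure in [1], not an authoritative implementation: the constants (8096 usable bytes per page, 2 bytes per row for the slot array, 4 bytes of row header) and the formulas for the Null bitmap and variable-length overhead are assumptions to be checked against the article, the uniqueifier is ignored, and non-leaf index pages are not counted.

```python
import math

PAGE_SIZE = 8192        # bytes per page
USABLE_PER_PAGE = 8096  # assumed usable bytes per page after the page header
ROW_SLOT = 2            # assumed slot-array entry per row
ROW_HEADER = 4          # assumed row header overhead


def estimate_leaf_bytes(num_rows, num_cols, fixed_data_size,
                        num_variable_cols=0, max_var_size=0, fill_factor=100):
    """Estimate bytes used by the leaf level (data pages) of a clustered index."""
    null_bitmap = 2 + (num_cols + 7) // 8
    variable_data_size = 0
    if num_variable_cols > 0:
        variable_data_size = 2 + num_variable_cols * 2 + max_var_size
    row_size = fixed_data_size + variable_data_size + null_bitmap + ROW_HEADER
    rows_per_page = USABLE_PER_PAGE // (row_size + ROW_SLOT)
    # rows' worth of space left empty on each page because of the fill factor
    free_rows_per_page = (USABLE_PER_PAGE * (100 - fill_factor) // 100) // (row_size + ROW_SLOT)
    num_leaf_pages = int(math.ceil(float(num_rows) / (rows_per_page - free_rows_per_page)))
    return PAGE_SIZE * num_leaf_pages


# Hypothetical table: 1 million rows, 4 fixed-length columns totalling 20 bytes.
print(estimate_leaf_bytes(1000000, num_cols=4, fixed_data_size=20))
print(estimate_leaf_bytes(1000000, num_cols=4, fixed_data_size=20, fill_factor=50))
```

Comparing such an estimate with the measured sizes returned by the query below is a quick sanity check on both.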

A query that directly reads data and index information from the database is below (from here):

with pages as (
    SELECT object_id, SUM (reserved_page_count) as reserved_pages, SUM (used_page_count) as used_pages,
            SUM (case 
                    when (index_id < 2) then (in_row_data_page_count + lob_used_page_count + row_overflow_used_page_count)
                    else lob_used_page_count + row_overflow_used_page_count
                 end) as pages
    FROM sys.dm_db_partition_stats
    group by object_id
), extra as (
    SELECT p.object_id, sum(reserved_page_count) as reserved_pages, sum(used_page_count) as used_pages
    FROM sys.dm_db_partition_stats p, sys.internal_tables it
    WHERE it.internal_type IN (202,204,211,212,213,214,215,216) AND p.object_id = it.object_id
    group by p.object_id
)
SELECT object_schema_name(p.object_id) + '.' + object_name(p.object_id) as TableName,  
       (p.reserved_pages + isnull(e.reserved_pages, 0)) * 8 as reserved_kb,
        pages * 8 as data_kb,
        (CASE WHEN p.used_pages + isnull(e.used_pages, 0) > pages 
              THEN (p.used_pages + isnull(e.used_pages, 0) - pages) ELSE 0 END) * 8 as index_kb,
        (CASE WHEN p.reserved_pages + isnull(e.reserved_pages, 0) > p.used_pages + isnull(e.used_pages, 0) 
         THEN (p.reserved_pages + isnull(e.reserved_pages, 0) - (p.used_pages + isnull(e.used_pages, 0))) else 0 end) * 8 as unused_kb
from pages p
left outer join extra e on p.object_id = e.object_id
