Wednesday, August 25, 2010

Design Patterns

Design Patterns - Elements of Reusable Object-Oriented Software. This is a classical book. The authors won ACM 2010 SIGSOFT outstanding research award for their contribution to software engineering for. The four authors are classed the GoF (Gang of Four). The design patterns in their book is called the GoF patterns.

Design patterns are solutions abstracted from repeatedly occurring design problems and can be reused in similar situations. Each has a pattern name, associated problem, solution and consequence.

Some design patterns are bounded to languages features, so is easier to implement in some languages than the others. For example, the Template pattern is easy to do in C++ and Java, since C++ provides template and Java provides generics.

In this book, designed patterns are divided into 3 categories based on purpose. The following notes are extracted from the book.

A. Creational

Class:

1. Factory method
Define an interface for creating an object, but let subclasses decide which class to instantiate. It lets a class defer instantiation to subclasses.

Object:

2. Abstract Factory
Provide an interface for creating families of related or dependent objects w/o specifying their concrete classes.

Isn't this the interface concept in C++/Java? Obviously it's related to polymorphism.

3. Builder
Separate the construction of a complex object from its representation so that the same construction process can create different representations.

4. Prototype
Specify the kinds of objects to create using a prototypical instance, and create new objects by copying this prototype.

5. Singleton
Ensure a class only has one instance, and provide a global point of access to it.

B. Structural

Class:

6. Adapter (class)
Convert the interface of a class into another interface clients expect. It lets classes work together that couldn't otherwise because of incompatible interfaces.

Object:

7. Adapter (object)

8. Bridge
Decouple an abstraction from its implementation so that the two can vary independently.

This is the abstraction and encapsulation principles of OOP.

9. Composite
Compose objects into tree structures to represent part-whole hierarchies. It lets clients treat individual objects and compositions of objects uniformly.

OOP uses inheritance for IS-A relationship, and uses composition for HAS-A relationship.

10. Decorator
Attach additional responsibilities to an object dynamically. It provides a flexible alternative to subclassing for extending functionality.

11. Facade
Provide a unified interface to a set of interfaces in a subsystem. Facade defines a higher-level interface that makes the subsystem easier to use.

12. Flyweight
Use sharing to support large numbers of fine-grained objects efficiently.

13. Proxy
Provide a surrogate or placeholder for another object to control access to it.

C. Behavioral

Class:

14. Interpreter
Given a language, define a representation for its grammar along with an interpreter that uses the representation to interpret sentences in the language.

Sounds related to compiler/interpreter.

15. Template method
Define the skeleton of an algorithm in an operation, deferring some steps to subclasses. Template method lets subclasses redefine certain steps of an algorithm w/o changing the algorithm's structure.

Object:

16. Chain of Responsibility
Avoid coupling the sender of a request to its receiver by giving more than one object a chance to handle the request. Chain the receiving objects and pass the request along the chain until an object handles it.

One example to this is to catch a series of exceptions in C++/Java.

17. Command
Encapsulate a request as an object, thereby letting you parameterize clients with different requests, queue or log requests, and support undoable operations.

18. Iterator
Provide a way to access the elements of an aggregate object sequentially w/o exposing its underlying representation.

This occurs abundantly in C++/Java.

19. Mediator
Define an object that encapsulates how a set of objects interact. It promotes loose coupling by keeping objects from referring to each other explicitly, and it lets you vary their interaction independently.

20. Memento
W/o violating encapsulation, capture and externalize an object's internal state so that the object can be restored to this state later.

21. Observer
Define a one-to-many dependency between objects so that when one object changes state, all its dependents are notified and updated automatically.

For example, Java uses wait(), notify() and notifyAll() methods for threads communication.

22. State
Allow an object to alter its behavior when its internal state changes. It will appear to change it class.

One example for this is the workflow state management as in my work.

23. Strategy
Define a family of algorithms, encapsulate each one, and make them interchangeable. It lets the algorithm vary independently from clients that use it.

24. Visitor
Represent an operation to be performed on the elements of an object structure. Visitor lets you define a new operation w/o changing the classes of the elements on which it operates.

The famous MVC is not included in list because it's a combination of multiple patterns. Use Smalltalk MVC as an example. The VC relationship uses the Strategy pattern. MVC also uses Factory method to specify the default controller class for a view, and Decorator pattern to add scrolling to a view.

A new comer at OOD can start with the simplest and most common patterns:
Creational: Abstract Factory, Factory
Structural: Adapter, Composite, Decorator
Behavioral: Observer, Strategy, Template(! Yeah, this is behavioral, not structural)

Seems like I already had experience with at least these design patterns:
Creational: Singleton, Factory
Structural: Adapter, Bridge, Composite
Behavioral: Interpreter, Template, Chain of Responsibility, Iterator, Observer, State

[1] Amazon: Design patterns. By Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides. QA 76.64 .D47 1994.
[2] Wiki - design patterns. Short but comprehensive list of design patterns from different sources.
[3] Design pattern implementation in C# and VB.NET. Good link with real code examples. E.g., load balancer using the Singleton pattern.

Tuesday, August 24, 2010

Tips on IMS development

Here are some tips on information management system web application development from my work. Most are platform independent and can be easily applied to different platforms and technologies. They can also be applied to non-web applications, and are actually general design tips.

  • Menu. The menu can be dynamically generated from database. This allows flexible configuration and update. If there are multiple roles, each can be configured to draw from the pool of menu items. For this first get a javascript menu template, then create it dynamically. For better performance and void regenerate it each time, such menu can be generated once at creation time, then be stored for later use.

  • Roles and permissions. There can be many roles. The permission can be configured this way: for each item that needs permission control, say we need read/write/delete, then can use a string such as "742172". Here each digit stands for the permission setting of one role. The value of a digit is the combined sum of 4/2/1, which stands for read/write/delete, similar to unix.

  • Workflow. A table can be used for this, which records stage and stage transition settings. Each role may have different permissions at different stages. Such a table can easily store these information.

  • Email. If there are emails to be sent, an admin interface can be provided to configure the subject, body and other fields of emails. For specific information, provide some macros that the administrator can use, which will be replaced by program later.

  • HTML Form. Can create a self-defined framework, instead of using that of such as .NET's control. This allows more flexibility, and can be more efficient since .NET's control may keep more information than necessary. To store the state/parameters, such as when you need to ask for confirmation of user, use hidden HTML control.

  • HTML Form fields. The form fields settings can be managed in the database. So there is no need to go to code for update. Everything can be configured from database. Like label management below, adding a new field can automatically insert itself to a table in database with default settings. Given each form and each form field (if needed) a class name, and control display properties from CSS.

  • Label management. An admin interface can be provided for this. When a tag is put on a page, it can automatically check if it's in database already, if not then insert itself in as a unique tag name. This allows domain experts to change it, and frees programmers from such ad-hoc, trivial but time-consuming and distracting chores.

  • The use of coding. Any systematic categorizing paradigms can be implemented by coding for ease of control and ease of versioning. Say we can use the starting code of 001 to stand for version 1, 002 to stand for version 2, etc. This can apply to many things and is highly useful.

  • Use classes as much as possible, decouple business logic from interface files.


Sunday, August 22, 2010

Computer Systems: A Programmer’s Perspective

Computer Systems: A Programmer’s Perspective. 1st edition (August 23, 2002). 2nd edition (February 14, 2010). By R. E. Bryant, D. R. O’Hallaron.

This books introduces computer systems in a broad way, including operating system, architecture, compiler, network and more. It is a good reference for system programmers to improve their understanding on computer systems and improve programming skills.
Chapter 1. Introduction

- Compilation
- Processors Read and Interpret Instructions Stored in Memory
Hardware Organization of a System.
- Cache
- cache hierarchy (registers, on-chip L1 cache, off-chip L2 cache, memory, local disk, remote storage)
- OS: process, thread, virtual memory, files
- Inter system communication by network.

Part I Program Structure and Execution

Chapter 2 Representing and Manipulating Information
Chapter 3 Machine-Level Representation of C Programs
Chapter 4 Processor Architecture

Chapter 5 Optimizing Program Performance

5.1 Capabilities and Limitations of Optimizing Compilers
5.2 Expressing Program Performance
5.3 Program Example
5.4 Eliminating Loop Inefficiencies
5.5 Reducing Procedure Calls
5.6 Eliminating Unneeded Memory References
5.7 Understanding Modern Processors
5.7.1 Overall Operation
5.7.2 Functional Unit Performance
5.7.3 A Closer Look at Processor Operation
- Translating Instructions into Operations
- Processing of Operations by the Execution Unit
- Scheduling of Operations with Unlimited Resources
- Scheduling of Operations with Resource Constraints
5.8 Reducing Loop Overhead
5.9 Converting to Pointer Code
5.10 Enhancing Parallelism
5.10.1 Loop Splitting
5.10.2 Register Spilling
5.10.3 Limits to Parallelism
5.11 Putting it Together: Summary of Results for Optimizing Combining Code
5.11.1 Floating-Point Performance Anomaly
5.11.2 Changing Platforms
5.12 Branch Prediction and Misprediction Penalties
5.13 Understanding Memory Performance
5.13.1 Load Latency
5.13.2 Store Latency
5.14 Life in the Real World: Performance Improvement Techniques
5.15 Identifying and Eliminating Performance Bottlenecks
5.15.1 Program Profiling
5.15.2 Using a Profiler to Guide Optimization
5.15.3 Amdahl’s Law
5.16 Summary

Chapter 6 The Memory Hierarchy

- A memory system is a hierarchy of storage devices with different capacities, costs, and access times.
- SRAM, DRAM, disk.
- Storage technologies
- Random-access memory
- SRAM - bistable memory cell.
- DRAM - capacitors.
- Nonvolatile memory: ROM, PROM, EPROM, EEPROM.
- Accessing main memory.
- Disk storage
- geometry, capacity, operation (seek time, rotational latency, transfer time),
logical disk blocks, accessing disks
- Storage technology trends.
- price/performance trade-off
- Locality: temporal/spatial locality.
- Locality of references to program data.
- Locality of Instruction Fetches.
- Summary
- The memory hierarchy
+ Different storage technologies have widely different access times.
+ Well-written programs tend to exhibit good locality.
- Caching in the Memory Hierarchy
- Cache hits
- Cache misses
- Cache Management
- Summary of Memory Hierarchy Concepts
- Exploiting temporal/spatial locality
- Cache Memories
- CPU registers, L1 cache, L2 cache, main DRAM memory, and disk storage
- Writing Cache-friendly Code
+ Make the common case go fast.
+ Minimize the number of cache misses in each inner loop.
- Putting it Together: The Impact of Caches on Program Performance
- The Memory Mountain
- Rearranging Loops to Increase Spatial Locality
- Using Blocking to Increase Temporal Locality
- Summary

Part II Running Programs on a System

Chapter 7 Linking
Chapter 8 Exceptional Control Flow
Chapter 9 Measuring Program Execution Time

Chapter 10 Virtual Memory

10.1 Physical and Virtual Addressing
10.2 Address Spaces
10.3 VM as a Tool for Caching
+ Set of virtual pages: Unallocated/Cached/Uncached
10.3.1 DRAM Cache Organization
10.3.2 Page Tables
10.3.3 Page Hits
10.3.4 Page Faults
10.3.5 Allocating Pages
10.3.6 Locality to the Rescue Again
10.4 VM as a Tool for Memory Management
10.4.1 Simplifying Linking
10.4.2 Simplifying Sharing
10.4.3 Simplifying Memory Allocation
10.4.4 Simplifying Loading
10.5 VM as a Tool for Memory Protection
10.6 Address Translation
10.6.1 Integrating Caches and VM
10.6.2 Speeding up Address Translation with a TLB
10.6.3 Multi-level Page Tables
10.6.4 Putting it Together: End-to-end Address Translation
10.7 Case Study: The Pentium/Linux Memory System
10.8 Memory Mapping
10.8.1 Shared Objects Revisited
10.8.2 The fork Function Revisited
10.8.3 The execve Function Revisited
10.8.4 User-level Memory Mapping with the mmap Function
10.9 Dynamic Memory Allocation
10.9.1 The malloc and free Functions
10.9.2 Why Dynamic Memory Allocation?
10.9.3 Allocator Requirements and Goals
- Requirements:
- Handling arbitrary request sequences
- Making immediate responses to requests
- Using only the heap.
- Aligning blocks (alignment requirement).
- Not modifying allocated blocks.
- Goals:
- Goal 1: Maximizing throughput.
- Goal 2: Maximizing memory utilization.
10.9.4 Fragmentation
- internal fragmentation:
occurs when an allocated block is larger than the payload.
e.g. to satisfy memory alignment.
- external fragmentation:
there is enough aggregate free memory to satisfy an
allocate request, but no single free block is large enough
- External fragmentation is much more difficult to quantify than internal
fragmentation
10.9.5 Implementation Issues
10.9.6 Implicit Free Lists
10.9.7 Placing Allocated Blocks
10.9.8 Splitting Free Blocks
10.9.9 Getting Additional Heap Memory
10.9.10 Coalescing Free Blocks
10.9.11 Coalescing with Boundary Tags
10.9.12 Putting it Together: Implementing a Simple Allocator
10.9.13 Explicit Free Lists
10.9.14 Segregated Free Lists
10.10 Garbage Collection
10.10.1 Garbage Collector Basics
- A garbage collector views memory as a directed reachability graph.
- The nodes of the graph are partitioned into 1) root nodes and 2) heap nodes.
10.10.2 Mark&Sweep Garbage Collectors
10.10.3 Conservative Mark&Sweep for C Programs
10.11 Common Memory-related Bugs in C Programs
10.11.1 Dereferencing Bad Pointers
10.11.2 Reading Uninitialized Memory
10.11.3 Allowing Stack Buffer Overflows
10.11.4 Assuming that Pointers and the Objects they Point to Are the Same Size
10.11.5 Making Off-by-one Errors
10.11.6 Referencing a Pointer Instead of the Object it Points to
10.11.7 Misunderstanding Pointer Arithmetic
10.11.8 Referencing Non-existent Variables
10.11.9 Referencing Data in Free Heap Blocks
10.11.10 Introducing Memory Leaks
10.12 Summary

Part III Interaction and Communication Between Programs

Chapter 11 Concurrent Programming with Threads

- A thread is a unit of execution, associated with a process, with its own
thread ID, stack, stack pointer, program counter, condition codes, and
general-purpose registers.
11.1 Basic Thread Concepts
- process context: 1) program context, 2) kernel context.
11.2 Thread Control
- Pthreads defines about 60 functions
11.2.1 Creating Threads
11.2.2 Terminating Threads
11.2.3 Reaping Terminated Threads
11.2.4 Detaching Threads
11.3 Shared Variables in Threaded Programs
11.3.1 Threads Memory Model
11.3.2 Mapping Variables to Memory
11.3.3 Shared Variables
11.4 Synchronizing Threads with Semaphores
11.4.1 Sequential Consistency
11.4.2 Progress Graphs
11.4.3 Protecting Shared Variables with Semaphores
11.4.4 Posix Semaphores
11.4.5 SignalingWith Semaphores
11.5 Synchronizing Threads with Mutex and Condition Variables
11.5.1 Mutex Variables
11.5.2 Condition Variables
11.5.3 Barrier Synchronization
11.5.4 Timeout Waiting
11.6 Thread-safe and Reentrant Functions
- A function is thread-safe if and only if it will always produce correct results
when called repeatedly within multiple concurrent threads.
- Four (non-disjoint) classes of thread-unsafe functions:
1) Failing to protect shared variables.
2) Relying on state across multiple function invocations.
3) Returning a pointer to a static variable.
4) Calling thread-unsafe functions.
11.6.1 Reentrant Functions
- Property: do not reference any shared data when called by multiple threads.
- Reentrant functions are typically more efficient than non-reentrant
thread-safe functions because they require no synchronization operations.
- Reentrant functions is a subset of thread-safe functions.
11.6.2 Thread-safe Library Functions
11.7 Other Synchronization Errors
11.7.1 Races
11.7.2 Deadlocks
11.8 Summary

Chapter 12 Network Programming

12.1 Client-Server Programming Model
- The fundamental operation in the client-server model is the transaction.
12.2 Networks
12.3 The Global IP Internet
12.3.1 IP Addresses
12.3.2 Internet Domain Names
12.3.3 Internet Connections
12.4 Unix file I/O
12.4.1 The read and write Functions
12.4.2 Robust File I/OWith the readn and writen Functions
12.4.3 Robust Input of Text Lines Using the readline Function
12.4.4 The stat Function
12.4.5 The dup2 Function
12.4.6 The close Function
12.4.7 Other Unix I/O Functions
12.4.8 Unix I/O vs. Standard I/O
12.5 The Sockets Interface
12.5.1 Socket Address Structures
12.5.2 The socket Function
12.5.3 The connect Function
12.5.4 The bind Function
12.5.5 The listen Function
12.5.6 The accept Function
12.5.7 Example Echo Client and Server
12.6 Concurrent Servers
12.6.1 Concurrent Servers Based on Processes
12.6.2 Concurrent Servers Based on Threads
12.7 Web Servers
12.7.3 HTTP Transactions
12.8 Putting it Together: The TINY Web Server
12.9 Summary

Appendix A Error handling
Appendix B Solutions to Practice Problems

Friday, August 20, 2010

memcached

Memcached - distributed key-value pair memory caching system. Very easy to use.

Download:
- http://memcached.org/: C source code.
- NorthScale Membase server - based on memcached: server for various platforms. Clients can be C, .NET, Java, Perl, PHP, Ruby.

Wednesday, August 18, 2010

JEE

It's been some time since I used J2EE in class. Well it's called JEE nowadays. Today reviewed JEE tools.
Tutorials: 
- Spring Tutorial. (Took about 5 hours to finish, spreaded into 3 days)
- Hibernate Tutorial.
- Maven Tutorial. (Took half an hour to finish)
- Another Maven Tutorial.
- Oracle's JEE 5 tutorial

The environment is:
- OS: Windows XP
- Java: 1.60
- Ant: 1.7 (http://ant.apache.org/manual/index.html)
[Need to setup environment variables ANT_HOME and Java_HOME]
- Apache Tomcat: 6.0.29 (http://tomcat.apache.org/download-60.cgi#6.0.29)
Install as a service. (32-bit/64-bit Windows Service Installer). Use port 8080.
Local site: http://localhost:8080
Homepage root is: $CATALINA_HOME/webapps/ROOT/index.html
$CATALINA_HOME is installation site of Tomcat.
- Spring: 2.5.0
Most recent version is at http://www.springsource.org/download
But I need version 2.5.0, which can be downloaded from
https://olex.openlogic.com/packages/spring
This contains many other utilities. The most recent version package does not.
- Maven: 2.2.1. Obtained here. [Need to set up M2_HOME]
- Eclipse: Java EE IDE for web developers. (http://www.eclipse.org/downloads/)

A little note:
- Step 5.5 code for JdbcProductDaoTests.java should include the following two imports to function:
import org.springframework.test.AbstractTransactionalDataSourceSpringContextTests;
import springapp.domain.Product;
Here the first imported class is in library spring-test.jar.
If eclipse can't find this then manually add it to properties->Java Build Path->Libraries.

Some concepts:
- SSH: Struts, Spring, Hibernate.
- EJB3.0+struts2+iBATIS
- jsp+Servlet+jdbc is the basics for all Java frameworks.
- jQuery: light-weighted javascript framework.
- Dojo, extjs: javascript frameworks. Used with JSF.
- SOA: Service Oriented Application
- Apache lucene: text search engine library written in Java
- Nutch: Open-source web-search software, built on Lucene Java.
- Webwork: Java web-application development framework.
- NoSQL
- Local search. e.g. foursquare

Some thoughts so far:
- A lot of work for simple application in Spring.
- JEE tools apply software engineering concepts, that may be the reason
it's said to be suitable for large web application development. On the
other hand this makes it heavy-weighted. Setup environment and getting
familiar with the tools cost lots of time. On this, .NET, PHP, ASP are
easier to catch up.
- Eclipse eats big memory (over 200MB).
- Maven is like Ant.

Thursday, August 12, 2010

CGI in C

Write CGI in C was a primitive idea. It's nonetheless interesting to know how it works.

It can be done by renaming the compiled executable with the suffix .cgi (this suffix needs be setup in Apache config file). Then put this into the server's cgi-bin folder. Visit http://server/cgi-bin/*.cgi should do.

See this short introduction on Getting Started with CGI Programming in C.

Blog Archive

Followers