BobbyTang Blog

Share more, Gain more.

Frontend JavaScript Framework Alternatives


MV* framework

  1. Backbone.js
  2. AngularJS
  3. Ember.js
  4. KnockoutJS
  5. Dojo
  6. Spine.js

MVC extension framework

  7. MarionetteJS
  8. Thorax

template

  1. Handlebars
  2. Mustache
  3. JST
  4. ECO

build automation

  1. Grunt

module loader

  1. Requirejs
  2. Browserify

package manager

  1. Bower

workflow automation

  1. Yeoman
  2. Brunch
  3. Catero

code analysis

  1. JSLint
  2. JSHint

unit testing

  1. Jasmine
  2. QUnit
  3. Mocha

Reference:

  1. http://todomvc.com/

Install Ruby&Vim Command Without Root


This post writes down the steps for installing the ruby and vim command-line tools without root access. ncurses is a prerequisite for both ruby and vim; also, install ruby first in order to enable vim's rubyinterp.

  1. download ncurses
  2. tar xvf ncurses-5.9.tar.gz
  3. sh ./configure --prefix=/home/bobby/tools/ncurses ##--with-shared
  4. make
  5. make install
  6. download ruby
  7. sh ./configure --prefix=/home/bobby/tools/ruby1.9 --with-tlib=ncurses
  8. make
  9. make install
  10. download vim
  11. bzip2 -cd vimxxx.tar.bz2 | tar xvf -
  12. edit .bash_profile: set CPPFLAGS="-I/home/bobby/tools/ncurses/include" LDFLAGS="-L/home/bobby/tools/ncurses/lib"; export CPPFLAGS LDFLAGS
  13. sh ./configure --prefix=/home/bobby/tools/vim7.4 --disable-selinux --enable-gui=no --without-x --disable-gpm --disable-nls --with-tlib=ncurses --enable-multibyte --enable-rubyinterp --enable-perlinterp --enable-pythoninterp
  14. make
  15. make install
  16. edit ~/.bashrc alias vim="/home/bobby/tools/vim7.4/bin/vim"
  17. edit ~/.vimrc and add: syntax on, set nocompatible, set backspace=2
  18. set VIMRUNTIME so that vim can find its runtime (syntax) files: edit .bash_profile and add export VIMRUNTIME=/home/bobby/tools/preinstall/vim74/runtime

Java Performance - OS Monitoring


Bottom Up Approach

The bottom-up approach begins at the lowest level of the software stack, at the CPU level, looking at statistics such as CPU cache misses and inefficient use of CPU instructions, and then works up the software stack to the constructs or idioms used by the application.

Choosing the Right CPU Architecture

One of the major design points behind the SPARC T-series processors is to address CPU cache misses by introducing multiple hardware threads per core.

CPU Utilization

A system with a single CPU socket with a quad core processor with hyperthreading disabled will show four CPUs in the GNOME System Monitor and report four virtual processors using the Java API Runtime.availableProcessors().
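
As a quick check, the count the JVM sees can be printed directly; a minimal sketch:

public class CpuCount {
    public static void main(String[] args) {
        // On the quad-core, hyperthreading-disabled system described above,
        // this prints 4.
        System.out.println(Runtime.getRuntime().availableProcessors());
    }
}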

xosview
vmstat
mpstat
top 

Which Java thread is consuming CPU?

jstack
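
jstack prints the stack traces of every thread in a target JVM. As an in-process alternative, a minimal sketch using the standard ThreadMXBean reports accumulated CPU time per thread (assuming the platform supports thread CPU time measurement), so the hottest threads can be matched against a jstack dump by name:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadCpu {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);
            if (info != null) {
                // getThreadCpuTime returns nanoseconds (-1 if unsupported)
                System.out.printf("%-40s cpu=%dms%n",
                        info.getThreadName(), mx.getThreadCpuTime(id) / 1000000);
            }
        }
    }
}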

Monitoring Linux CPU Scheduler Run Queue

vmstat

Memory Utilization

top
/proc/meminfo

However, the following vmstat output from a Linux system illustrates a system that is experiencing swapping (p. 36).

Monitoring Lock Contention on Linux

pidstat -w -I -p 9391 5

The cost of a voluntary context switch at a processor clock cycle level is an expensive operation, generally upwards of about 80,000 clock cycles.

Hence, 3500 context switches divided by 2, the number of virtual processors, = 1750 per virtual processor. 1750 * 80,000 = 140,000,000 clock cycles. The number of clock cycles in 1 second on a 3.0GHz processor is 3,000,000,000. Thus, the percentage of clock cycles wasted on context switches is 140,000,000/3,000,000,000 = about 4.7%.

Applying the general guideline that 3% to 5% of clock cycles spent in voluntary context switches indicates lock contention, this Java application may be suffering from lock contention.

Quick Lock Contention Monitoring

Isolating Hot Locks

A common practice to find contended locks in a Java application has been to periodically take thread dumps and look for threads that tend to be blocked on the same lock across several thread dumps.
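
The same idea can be scripted in-process; a minimal sketch that takes one sampling pass with the standard ThreadMXBean and reports threads blocked on a monitor (run it periodically to approximate repeated thread dumps):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class BlockedThreads {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // dumpAllThreads(false, false): no monitor/synchronizer detail needed
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info.getThreadState() == Thread.State.BLOCKED) {
                System.out.printf("%s blocked on %s held by %s%n",
                        info.getThreadName(), info.getLockName(),
                        info.getLockOwnerName());
            }
        }
    }
}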

Monitoring Involuntary Context Switches

In contrast to voluntary context switching, where an executing thread voluntarily takes itself off the CPU, involuntary thread context switches occur when a thread is taken off the CPU because its time quantum expired or it was preempted by a higher-priority thread.

Involuntary context switches can also be monitored on Linux using pidstat -w. High involuntary context switches are an indication there are more threads ready to run than there are virtual processors available to run them. As a result it is common to observe a high run queue depth in vmstat, high CPU utilization, and a high number of migrations (migrations are the next topic in this section) in conjunction with a large number of involuntary context switches.

On Linux, creation of processor sets and assigning applications to those processor sets can be accomplished using the Linux taskset command.

Monitoring Thread Migrations

As a general guideline, Java applications scaling across multiple cores or virtual processors and observing migrations greater than 500 per second could benefit from binding Java applications to processor sets.

Network I/O Utilization

netstat -i 
nicstat

Disk I/O Utilization

iostat -xm

One of the challenges with monitoring disk I/O utilization is identifying which files are being read or written to and which application is the source of the disk activity.

At the application level, any strategy that minimizes disk activity will help, such as reducing the number of read and write operations by using buffered input and output streams, or integrating a caching data structure into the application to reduce or eliminate disk interaction.
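
As a sketch of the buffered-streams point (file names are hypothetical), wrapping the raw streams turns many one-byte operations into a few large reads and writes against the disk:

import java.io.*;

public class BufferedCopy {
    public static void main(String[] args) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream("in.dat"));
             OutputStream out = new BufferedOutputStream(new FileOutputStream("out.dat"))) {
            int b;
            while ((b = in.read()) != -1) {
                out.write(b); // served from the buffer, not a per-byte disk access
            }
        }
    }
}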

Additional Command Line Tools

sar

Monitoring CPU Utilization on SPARC T-Series Systems

Python Environment Setup


Python is a scripting language; the main idea in learning Python is to use it to write quick, hands-on automation scripts in a Linux environment instead of shell scripts.

Here are the steps to configure PyDev, an Eclipse plugin, inside the Eclipse environment.

  1. Access PyDev webpage: http://pydev.org/manual_101_install.html
  2. Download PyDev certificate
  3. cd %JAVA_HOME%/bin
  4. Run command: keytool.exe -import -file C:/download/pydev_certificate.cer -keystore %JAVA_HOME%/jre/lib/security/cacerts
  5. Input default JDK cacerts password: changeit
  6. Certificate installment completed
  7. Download the Eclipse plugin from the Marketplace, or manually download it and extract it into the dropins directory like: (dropins/pydev/eclipse/…)
  8. Edit eclipse.ini and set the -vm entry; the JDK must be 1.7
  9. Access the Jython webpage: https://wiki.python.org/jython/InstallationInstructions
  10. Download Jython 2.5.4rc1 – Traditional Installer at link: http://jython.org/downloads.html
  11. Run command: java -jar jython_installer-2.5.2.jar --console
  12. Configure the Jython interpreter in the PyDev Eclipse plugin

Operating System Concepts - IO System


I/O Hardware

The device communicates with the machine via a connection point, or port — for example, a serial port.

If devices use a common set of wires, the connection is called a bus. A bus is a set of wires and a rigidly defined protocol that specifies a set of messages that can be sent on the wires.

This figure shows a PCI bus (the common PC system bus) that connects the processor–memory subsystem to the fast devices and an expansion bus that connects relatively slow devices, such as the keyboard and serial and USB ports.

A controller is a collection of electronics that can operate a port, a bus, or a device. A serial-port controller is a simple device controller. It is a single chip (or portion of a chip) in the computer that controls the signals on the wires of a serial port.

By contrast, a SCSI bus controller is not simple. Because the SCSI protocol is complex, the SCSI bus controller is often implemented as a separate circuit board (or a host adapter) that plugs into the computer. It typically contains a processor, microcode, and some private memory to enable it to process the SCSI protocol messages.

Polling & Interrupts

In many computer architectures, three CPU-instruction cycles are sufficient to poll a device: read a device register, logical–and to extract a status bit, and branch if not zero. But polling becomes inefficient when it is attempted repeatedly yet rarely finds a device to be ready for service, while other useful CPU processing remains undone.

The hardware mechanism that enables a device to notify the CPU is called an interrupt.

The basic interrupt mechanism works as follows. The CPU hardware has a wire called the interrupt-request line that the CPU senses after executing every instruction. When the CPU detects that a controller has asserted a signal on the interrupt-request line, the CPU performs a state save and jumps to the interrupt-handler routine at a fixed address in memory.

Another example is found in the implementation of system calls. Usually, a program uses library calls to issue system calls. The library routines check the arguments given by the application, build a data structure to convey the arguments to the kernel, and then execute a special instruction called a software interrupt, or trap.

Direct Memory Access

For a device that does large transfers, such as a disk drive, it seems wasteful to use an expensive general-purpose processor to watch status bits and to feed data into a controller register one byte at a time—a process termed programmed I/O (PIO). Many computers avoid burdening the main CPU with PIO by offloading some of this work to a special-purpose processor called a direct-memory-access (DMA) controller.

Blocking and Nonblocking I/O

An alternative to a nonblocking system call is an asynchronous system call. An asynchronous call returns immediately, without waiting for the I/O to complete.

The difference between nonblocking and asynchronous system calls is that a nonblocking read() returns immediately with whatever data are available—the full number of bytes requested, fewer, or none at all. An asynchronous read() call requests a transfer that will be performed in its entirety but will complete at some future time.
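
In Java NIO.2 terms, a minimal sketch of an asynchronous read (the file name is hypothetical): read() returns immediately, and the Future completes only when the whole transfer has been performed.

import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Future;

public class AsyncRead {
    public static void main(String[] args) throws Exception {
        try (AsynchronousFileChannel ch = AsynchronousFileChannel.open(
                Paths.get("in.dat"), StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(4096);
            Future<Integer> result = ch.read(buf, 0); // returns immediately
            // ... other useful work can happen here ...
            System.out.println("bytes read: " + result.get()); // blocks until complete
        }
    }
}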

Buffering

A buffer is a memory area that stores data being transferred between two devices or between a device and an application.

Buffering is done for three reasons:

  1. To cope with a speed mismatch between the producer and consumer of a data stream.
  2. To provide adaptation for devices that have different data-transfer sizes.
  3. To support copy semantics for application I/O.

With copy semantics, the version of the data written to disk is guaranteed to be the version at the time of the application system call, independent of any subsequent changes in the application’s buffer. A simple way in which the operating system can guarantee copy semantics is for the write() system call to copy the application data into a kernel buffer before returning control to the application. The disk write is performed from the kernel buffer, so that subsequent changes to the application buffer have no effect.

Caching

A cache is a region of fast memory that holds copies of data.

When the kernel receives a file I/O request, the kernel first accesses the buffer cache to see whether that region of the file is already available in main memory. If it is, a physical disk I/O can be avoided or deferred.

Transforming I/O Requests to Hardware Operations

The figure suggests that an I/O operation requires a great many steps that together consume a tremendous number of CPU cycles.

  1. A process issues a blocking read() system call to a file descriptor of a file that has been opened previously.
  2. The system-call code in the kernel checks the parameters for correctness. In the case of input, if the data are already available in the buffer cache, the data are returned to the process, and the I/O request is completed.
  3. Otherwise, a physical I/O must be performed. The process is removed from the run queue and is placed on the wait queue for the device, and the I/O request is scheduled. Eventually, the I/O subsystem sends the request to the device driver. Depending on the operating system, the request is sent via a subroutine call or an in-kernel message.
  4. The device driver allocates kernel buffer space to receive the data and schedules the I/O. Eventually, the driver sends commands to the device controller by writing into the device-control registers.
  5. The device controller operates the device hardware to perform the data transfer.
  6. The driver may poll for status and data, or it may have set up a DMA transfer into kernel memory. We assume that the transfer is managed by a DMA controller, which generates an interrupt when the transfer completes.
  7. The correct interrupt handler receives the interrupt via the interrupt- vector table, stores any necessary data, signals the device driver, and returns from the interrupt.
  8. The device driver receives the signal, determines which I/O request has completed, determines the request’s status, and signals the kernel I/O subsystem that the request has been completed.
  9. The kernel transfers data or return codes to the address space of the requesting process and moves the process from the wait queue back to the ready queue.
  10. Moving the process to the ready queue unblocks the process. When the scheduler assigns the process to the CPU, the process resumes execution at the completion of the system call.

Operating System Concepts - VirtualMemory


Demand Paging

Loading the entire program into memory results in loading the executable code for all options, regardless of whether an option is ultimately selected by the user or not. An alternative strategy is to load pages only as they are needed. This technique is known as demand paging and is commonly used in virtual memory systems.
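
Memory-mapped file I/O makes demand paging visible at the application level; a minimal Java sketch (the file name is hypothetical): map() by itself loads nothing, and a page is brought into memory only when the buffer is actually touched.

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedRead {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile file = new RandomAccessFile("big.dat", "r");
             FileChannel ch = file.getChannel()) {
            // Mapping reserves the address range without reading the file.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // First access to this page triggers a page fault that loads it.
            System.out.println("first byte: " + buf.get(0));
        }
    }
}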

Page Fault

A page fault causes the following sequence to occur:

  1. Trap to the operating system.
  2. Save the user registers and process state.
  3. Determine that the interrupt was a page fault.
  4. Check that the page reference was legal and determine the location of the page on the disk.
  5. Issue a read from the disk to a free frame:
    1. Wait in a queue for this device until the read request is serviced.
    2. Wait for the device seek and/or latency time.
    3. Begin the transfer of the page to a free frame.
  6. While waiting, allocate the CPU to some other user (CPU scheduling, optional).
  7. Receive an interrupt from the disk I/O subsystem (I/O completed).
  8. Save the registers and process state for the other user (if step 6 is executed).
  9. Determine that the interrupt was from the disk.
  10. Correct the page table and other tables to show that the desired page is now in memory.
  11. Wait for the CPU to be allocated to this process again.
  12. Restore the user registers, process state, and new page table, and then resume the interrupted instruction.

Copy-on-Write

Considering that many child processes invoke the exec() system call immediately after creation, the copying of the parent’s address space may be unnecessary.

Instead, we can use a technique known as copy-on-write, which works by allowing the parent and child processes initially to share the same pages. These shared pages are marked as copy-on-write pages, meaning that if either process writes to a shared page, a copy of the shared page is created.

Page Replacement

  1. Find the location of the desired page on the disk.
  2. Find a free frame:
    1. If there is a free frame, use it.
    2. If there is no free frame, use a page-replacement algorithm to select a victim frame.
    3. Write the victim frame to the disk; change the page and frame tables accordingly.
  3. Read the desired page into the newly freed frame; change the page and frame tables.
  4. Restart the user process.

Page Replacement Algorithm

  • FIFO Page Replacement
  • Optimal Page Replacement
  • LRU Page Replacement (see the sketch after this list)
  • LRU-Approximation Page Replacement
  • Additional-Reference-Bits Algorithm
  • Second-Chance Algorithm
    1. Enhanced Second-Chance Algorithm
  • Counting-Based Page Replacement
    1. least-frequently-used (LFU) page-replacement algorithm
    2. most-frequently-used (MFU) page-replacement algorithm
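
As an illustration of the LRU policy above, a minimal Java sketch: a LinkedHashMap in access order with a fixed capacity (a hypothetical 3 frames) evicts the least-recently-used page automatically.

import java.util.LinkedHashMap;
import java.util.Map;

public class LruFrames extends LinkedHashMap<Integer, Integer> {
    private final int frames;

    LruFrames(int frames) {
        super(16, 0.75f, true); // true = access order, the LRU ingredient
        this.frames = frames;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
        return size() > frames; // evict the least-recently-used page
    }

    public static void main(String[] args) {
        LruFrames memory = new LruFrames(3);
        for (int page : new int[] {7, 0, 1, 2, 0, 3, 0, 4}) {
            memory.put(page, page); // reference the page
            System.out.println("after " + page + ": " + memory.keySet());
        }
    }
}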

Raw I/O

Some operating systems give special programs the ability to use a disk partition as a large sequential array of logical blocks, without any file-system data structures. This array is sometimes called the raw disk, and I/O to this array is termed raw I/O.

Raw I/O bypasses all the file-system services, such as file I/O demand paging, file locking, prefetching, space allocation, file names, and directories.

Thrashing

In fact, look at any process that does not have “enough” frames. If the process does not have the number of frames it needs to support pages in active use, it will quickly page-fault. At this point, it must replace some page. However, since all its pages are in active use, it must replace a page that will be needed again right away. Consequently, it quickly faults again, and again, and again, replacing pages that it must bring back in immediately.

This high paging activity is called thrashing. A process is thrashing if it is spending more time paging than executing.

Operating System Concepts - MainMemory


Contiguous Memory Allocation

Problem:

Both the first-fit and best-fit strategies for memory allocation suffer from external fragmentation.

The general approach to avoiding this problem is to break the physical memory into fixed-sized blocks and allocate memory in units based on block size. With this approach, the memory allocated to a process may be slightly larger than the requested memory. The difference between these two numbers is internal fragmentation — unused memory that is internal to a partition.
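
A minimal sketch of the internal-fragmentation arithmetic, with a hypothetical 4 KB block size and a 13 KB request:

public class InternalFragmentation {
    public static void main(String[] args) {
        int blockSize = 4096;                 // fixed-size allocation unit
        int request = 13 * 1024;              // bytes actually needed
        int blocks = (request + blockSize - 1) / blockSize; // round up: 4 blocks
        int wasted = blocks * blockSize - request;          // 16 KB - 13 KB = 3 KB
        System.out.println("internal fragmentation: " + wasted + " bytes");
    }
}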

Solution:

One solution to the problem of external fragmentation is compaction. The goal is to shuffle the memory contents so as to place all free memory together in one large block. Compaction is possible only if relocation is dynamic and is done at execution time.

Another possible solution to the external-fragmentation problem is to permit the logical address space of the processes to be noncontiguous, thus allowing a process to be allocated physical memory wherever such memory is available.

Page and Segmentation

Paging is a memory-management scheme that permits the physical address space of a process to be noncontiguous. Paging avoids external fragmentation and the need for compaction.

The translation look-aside buffer (TLB) is associative, high-speed memory. Each entry in the TLB consists of two parts: a key (or tag) and a value. When the associative memory is presented with an item, the item is compared with all keys simultaneously. If the item is found, the corresponding value field is returned.

If the page number is not in the TLB (known as a TLB miss), a memory reference to the page table must be made. When the frame number is obtained, we can use it to access memory (Figure 8.11). In addition, we add the page number and frame number to the TLB, so that they will be found quickly on the next reference.
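
A minimal Java sketch of that lookup path, assuming 32-bit logical addresses and 4 KB pages (the hash map stands in for the associative hardware):

import java.util.HashMap;
import java.util.Map;

public class TlbSketch {
    static final int OFFSET_BITS = 12;                 // 4 KB pages (2^12)
    static final Map<Integer, Integer> tlb = new HashMap<Integer, Integer>();
    static final int[] pageTable = new int[1 << 20];   // 2^32 / 2^12 = 1M entries

    static int translate(int logicalAddress) {
        int pageNumber = logicalAddress >>> OFFSET_BITS;
        int offset = logicalAddress & ((1 << OFFSET_BITS) - 1);
        Integer frame = tlb.get(pageNumber);            // compared against all keys
        if (frame == null) {                            // TLB miss: walk the page table
            frame = pageTable[pageNumber];
            tlb.put(pageNumber, frame);                 // cache for the next reference
        }
        return (frame << OFFSET_BITS) | offset;         // frame number + offset
    }
}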

Segmentation is a memory-management scheme that supports this user view of memory.

Page Table

  • hierarchical paging

Problem&Solution:

For example, consider a system with a 32-bit logical address space. If the page size in such a system is 4 KB (2^12), then a page table may consist of up to 1 million entries (2^32/2^12). Assuming that each entry consists of 4 bytes, each process may need up to 4 MB of physical address space for the page table alone. Clearly, we would not want to allocate the page table contiguously in main memory. One simple solution to this problem is to divide the page table into smaller pieces. We can accomplish this division in several ways.

$ getconf PAGESIZE
4096
  • hashed page tables

Problem&Solution:

A common approach for handling address spaces larger than 32 bits is to use a hashed page table, with the hash value being the virtual page number.

  • inverted page tables

Problem&Solution:

Usually, each process has an associated page table. One of the drawbacks of this method is that each page table may consist of millions of entries. These tables may consume large amounts of physical memory just to keep track of how other physical memory is being used.

Drawbacks:

Although this scheme decreases the amount of memory needed to store each page table, it increases the amount of time needed to search the table when a page reference occurs.

Systems that use inverted page tables have difficulty implementing shared memory. Shared memory is usually implemented as multiple virtual addresses (one for each process sharing the memory) that are mapped to one physical address. This standard method cannot be used with inverted page tables; because there is only one virtual page entry for every physical page, one physical page cannot have two (or more) shared virtual addresses.

Operating System Concepts - CPU


CPU Scheduling

CPU scheduling is the task of selecting a waiting process from the ready queue and allocating the CPU to it. The CPU is allocated to the selected process by the dispatcher.

Scheduling Criteria

  • CPU utilization
  • Throughput
  • Turnaround time

The interval from the time of submission of a process to the time of completion is the turnaround time. Turnaround time is the sum of the periods spent waiting to get into memory, waiting in the ready queue, executing on the CPU, and doing I/O.

  • Waiting time
  • Response time

Scheduling Algorithm

  • First-Come, First-Served Scheduling
  • Shortest-Job-First Scheduling
  • Priority Scheduling

A major problem with priority scheduling algorithms is indefinite blocking, or starvation.

A solution to the problem of indefinite blockage of low-priority processes is aging.

  • Round-Robin Scheduling (see the sketch after this list)
  • Multilevel Queue Scheduling
  • Multilevel Feedback Queue Scheduling
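
A minimal sketch of Round-Robin scheduling (hypothetical burst times, 4ms time quantum): each task runs for at most one quantum, then goes back to the tail of the ready queue.

import java.util.ArrayDeque;
import java.util.Queue;

public class RoundRobinSketch {
    static class Task {
        final String name;
        int remaining; // remaining CPU burst in ms
        Task(String name, int remaining) { this.name = name; this.remaining = remaining; }
    }

    public static void main(String[] args) {
        final int quantum = 4;
        Queue<Task> readyQueue = new ArrayDeque<Task>();
        readyQueue.add(new Task("P1", 10));
        readyQueue.add(new Task("P2", 5));
        readyQueue.add(new Task("P3", 8));

        int clock = 0;
        while (!readyQueue.isEmpty()) {
            Task t = readyQueue.poll();              // dispatcher picks the head
            int slice = Math.min(quantum, t.remaining);
            clock += slice;
            t.remaining -= slice;
            System.out.println(t.name + " ran " + slice + "ms, clock=" + clock + "ms");
            if (t.remaining > 0) {
                readyQueue.add(t);                   // quantum expired: requeue at tail
            }
        }
    }
}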

Multiple-Processor/Multicore Scheduling

  • Processor Affinity

Because of the high cost of invalidating and repopulating caches, most SMP systems try to avoid migration of processes from one processor to another and instead attempt to keep a process running on the same processor. This is known as processor affinity.

Some systems — such as Linux — also provide system calls that support hard affinity, thereby allowing a process to specify that it is not to migrate to other processors.

The main-memory architecture of a system can affect processor-affinity issues. Recall non-uniform memory access (NUMA), mentioned in a previous post, in which a CPU has faster access to some parts of main memory than to other parts.

  • Load Balancing

Load balancing attempts to keep the workload evenly distributed across all processors in an SMP system. It is important to note that load balancing is typically only necessary on systems where each processor has its own private queue of eligible processes to execute.

There are two general approaches to load balancing: push migration and pull migration. Linux runs its load-balancing algorithm every 200 milliseconds (push migration) or whenever the run queue for a processor is empty (pull migration).

  • Memory Stall

Researchers have discovered that when a processor accesses memory, it spends a significant amount of time waiting for the data to become available. This situation, known as a memory stall, may occur for various reasons, such as a cache miss (accessing data that are not in cache memory).

  • Virtualization

The virtualization software presents one or more virtual CPUs to each of the virtual machines running on the system and then schedules the use of the physical CPUs among the virtual machines.