Sc compiling opt




Optimization by Compiling

The JVM is built with the Intel compiler and processor-specific options to tune performance for Intel CPUs. The optimized build shows many improvements over the official binaries. For instance, Table 2.1 shows that young GC performance improved by roughly 20% to 24%, depending on the benchmark.

Table 2.1

Benchmark     Build     Run 1     Run 2     Run 3     Average    Improvement
SPECjbb2005   Official  117.496   116.792   117.584   117.2907
SPECjbb2005   Taobao    94.3242   97.2346   93.2691   94.9426    23.54%
GCBench       Official  1.31511   1.35022   1.37998   1.348403
GCBench       Taobao    1.07705   1.11266   1.06704   1.085583   24.21%
SPECjvm2008   Official  12.0404   11.2786   10.5841   11.3010
SPECjvm2008   Taobao    9.26237   9.73316   9.34638   9.44730    19.62%

TAOBAOJDK-004: Improvement on JNI invocation on X86_64

Rearrange the compiled-to-native wrapper code to straighten its instruction branches. This reduces the overhead of JNI invocation by avoiding CPU pipeline stalls.

[source code] by changren

TAOBAOJDK-005: Remove unused DTrace hooks in Taobao-specific build

[source code] by changren

TAOBAOJDK-016: Add internal support for CRC32 and CRC32C

Add a JVM intrinsic for CRC32, and one for CRC32C when the SSE4.2 crc32 instruction is supported.
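As a reference point for what such intrinsics compute, here is a minimal sketch in plain Java. It is not the Taobao patch: the class name CrcSketch and its methods are hypothetical, CRC32 goes through the standard java.util.zip.CRC32 class, and CRC32C is computed bit by bit in software using the reflected Castagnoli polynomial. An SSE4.2-based intrinsic would be expected to return the same CRC32C value, only faster.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper class for illustration; not part of the Taobao JDK sources.
final class CrcSketch {
    // Reflected form of the Castagnoli polynomial used by CRC32C.
    private static final int CRC32C_POLY = 0x82F63B78;

    // Bit-at-a-time software CRC32C (slow, but easy to check by hand).
    static long crc32c(byte[] data) {
        int crc = 0xFFFFFFFF;
        for (byte b : data) {
            crc ^= (b & 0xFF);
            for (int i = 0; i < 8; i++) {
                // If the low bit is set, shift and XOR in the polynomial; otherwise just shift.
                crc = (crc >>> 1) ^ (CRC32C_POLY & -(crc & 1));
            }
        }
        return (~crc) & 0xFFFFFFFFL;
    }

    // CRC32 through the standard library class.
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("CRC32  = %08x%n", crc32(check));   // standard check value cbf43926
        System.out.printf("CRC32C = %08x%n", crc32c(check));  // standard check value e3069283
    }
}
```

The string "123456789" is the conventional check input for CRC algorithms, so any intrinsic or Java interface can be validated against the same two values.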

[source code] by changren, kungu.mjh

TAOBAOJDK-017: Improvement on ParNew work-stealing policy

In ParNew GC, a GC thread steals tasks from other threads after finishing its own. The original stealing policy performs poorly when the last working GC thread is processing large objects, because the idle threads keep trying to steal without finding anything. This is solved by having a thread give up stealing after a number of failed attempts.
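The give-up rule can be sketched as follows. This is a simplified, single-threaded illustration, not the HotSpot code: the class StealSketch, the method stealOrGiveUp, the random victim selection, and the failure limit are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of "stop stealing after N consecutive failures".
final class StealSketch {
    // Assumed limit; the value used in the actual patch is not stated in the text.
    static final int MAX_FAILED_STEALS = 4;

    // Returns a stolen task, or null once maxFailures attempts in a row found an empty victim.
    static Integer stealOrGiveUp(List<Deque<Integer>> queues, int self,
                                 int maxFailures, Random rnd) {
        if (queues.size() < 2) {
            return null; // nobody to steal from
        }
        int failures = 0;
        while (failures < maxFailures) {
            // Pick a random victim other than ourselves.
            int victim = rnd.nextInt(queues.size() - 1);
            if (victim >= self) {
                victim++;
            }
            Integer task = queues.get(victim).pollLast();
            if (task != null) {
                return task; // successful steal
            }
            failures++; // victim queue was empty
        }
        return null; // give up instead of spinning on empty queues
    }

    public static void main(String[] args) {
        List<Deque<Integer>> queues = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            queues.add(new ArrayDeque<>());
        }
        // All queues are empty, so worker 0 gives up after MAX_FAILED_STEALS attempts.
        System.out.println(stealOrGiveUp(queues, 0, MAX_FAILED_STEALS, new Random()));
    }
}
```

In a real collector the queues would be concurrent, and giving up would feed into the GC termination protocol. The sketch only shows the bounded-retry decision.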

[source code] by chengtao, changren

TAOBAOJDK-021: Java interface for Crc32

Java interface for the Crc32 intrinsic; see also TAOBAOJDK-016.

[source code] by kungu.mjh

TAOBAOJDK-022: Java interface for Crc32C

Java interface for the Crc32C intrinsic; see also TAOBAOJDK-016.

[source code] by kungu.mjh

TAOBAOJDK-023: Improvement on JNI invocation on X86_32

Port TAOBAOJDK-004 to the 32-bit x86 VM.

[source code] by sajia

TAOBAOJDK-024: Improvement of card-table processing

During GC, the JVM scans the card table one byte at a time, even though most of the card table's bytes are clean. Scanning the card table one quad word (8 bytes) at a time lets the GC skip eight clean cards with a single comparison, which improves GC performance. The improvement is more noticeable on large Java heaps.
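The idea can be sketched in Java over a plain byte array. This is only an illustration, not the HotSpot change: the class CardScanSketch and its method are hypothetical, and the clean-card encoding (every bit set, so that eight clean cards read as the long value -1) is an assumption of the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration of scanning a card table 8 bytes at a time.
final class CardScanSketch {
    // Assumed encoding: a clean card has all bits set.
    static final byte CLEAN = (byte) 0xFF;
    // Eight clean cards read as one long.
    static final long CLEAN_WORD = -1L;

    // Counts non-clean cards, skipping fully clean 8-byte groups with one comparison.
    static int countDirtyCards(byte[] cards) {
        ByteBuffer buf = ByteBuffer.wrap(cards).order(ByteOrder.nativeOrder());
        int dirty = 0;
        int i = 0;
        int wordEnd = cards.length - (cards.length % 8);
        for (; i < wordEnd; i += 8) {
            if (buf.getLong(i) == CLEAN_WORD) {
                continue; // all eight cards are clean
            }
            // At least one card in this group is not clean: fall back to per-byte checks.
            for (int j = i; j < i + 8; j++) {
                if (cards[j] != CLEAN) {
                    dirty++;
                }
            }
        }
        // Tail that does not fill a whole quad word.
        for (; i < cards.length; i++) {
            if (cards[i] != CLEAN) {
                dirty++;
            }
        }
        return dirty;
    }

    public static void main(String[] args) {
        byte[] cards = new byte[20];
        java.util.Arrays.fill(cards, CLEAN);
        cards[3] = 0;   // mark two cards as not clean
        cards[17] = 0;
        System.out.println(countDirtyCards(cards)); // prints 2
    }
}
```

Because most groups are fully clean, the inner per-byte loop runs rarely, which is where the saving comes from on large heaps with large card tables.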

[source code] by chengtao, reference