Forum stats game

For all your silly time-killing forum games.

Moderators: jestingrabbit, Moderators General, Prelates

Tillan
Posts: 223
Joined: Sat Sep 20, 2008 1:36 pm UTC
Location: Coffee

Re: Forum stats game

Postby Tillan » Mon Jun 15, 2009 9:44 pm UTC

The Pentium 4 processor, utilizing the Intel NetBurst micro-architecture, is a complete processor redesign
that delivers new technologies and capabilities while advancing many of the innovative features, such as
“out-of-order speculative execution” and “super-scalar execution,” introduced on prior Intel® microarchitecture
generations. Many of these innovations and advances were made possible with the
improvements in processor technology, transistor technology and circuit design, and they could not have
been implemented previously in high-volume, manufacturable solutions. The new technologies and
innovative features that are introduced in the Intel NetBurst micro-architecture are listed below:
Hyper-Pipelined Technology: The hyper-pipelined technology of the NetBurst micro-architecture doubles
the pipeline depth, compared to the P6 micro-architecture, with a 20-stage pipeline. This technology
significantly increases processor performance and frequency scalability of the base micro-architecture.
400-MHz System Bus: Through a physical signaling scheme of quad pumping the data transfers over a 100-
MHz clocked system bus and a buffering scheme allowing for sustained 400-MHz data transfers, the Pentium
4 processor supports the industry’s highest performance desktop system bus delivering a data rate of 3.2
Giga-Bytes per second (GB/s) in and out of the processor. This compares to 1.06 GB/s delivered on the
Pentium III processor’s 133-MHz system bus.
Advanced Dynamic Execution: The Advanced Dynamic Execution engine is a very deep, out-of-order
speculative execution engine that keeps the execution units busy. It does so by providing a very large
window of instructions from which the execution units can choose in order to get around stalls due to
instructions that are not ready to execute based on some unmet dependency (such as waiting for data to be
loaded from main memory). The NetBurst micro-architecture can have up to 126 instructions in this window
(in flight) versus the P6 micro-architecture’s much smaller window of 42 instructions.
The Advanced Dynamic Execution engine also delivers an enhanced branch prediction capability that allows
the processor to be more accurate in predicting program branches and has the net effect of reducing the
number of branch mispredictions by about 33% over the P6 micro-architecture’s branch prediction
capability. It does this by implementing a 4 Kilo Bytes (KB) branch target buffer in which to store more
detail on the history of past branches as well as implementing a more advanced branch prediction algorithm.
This enhanced branch prediction capability is one of the key design elements that helps to reduce the overall
sensitivity to branch misprediction penalty of the NetBurst micro-architecture.
Rapid Execution Engine: Through a combination of architectural, physical and circuit designs, the
Arithmetic Logic Units (ALUs) within the processor run at two times the frequency of the processor core.
This allows the ALUs to execute certain instructions in ½ a core clock and results in higher execution
throughput as well as reduced latency of execution.
Advanced Transfer Cache: The level 2 Advanced Transfer Cache is 256kB in size and delivers a much
higher data throughput channel between the level 2 cache and the processor core. The Advanced Transfer
Cache consists of a 256-bit (32-byte) interface that transfers data on each core clock. As a result, a 1.5-GHz
Pentium 4 processor could deliver a data transfer rate of 48GB/s (32 bytes x 1 (data transfer per clock) x 1.5
GHz = 48GB/s). This compares to a transfer rate of 16GB/s on the Pentium III processor 1 GHz and
contributes to the processor’s ability to keep the high-frequency execution units busy executing instructions
instead of sitting idle. I guess it was about 4 pages before the real jerk showed up.
Execution Trace Cache: The Execution Trace Cache is an innovative way to implement a 1st level
instruction cache. It caches decoded IA-32 instructions (or micro-ops), thus removing the latency associated
with the instruction decoder from the main execution loops. In addition, the Execution Trace Cache stores
these micro-ops in the path of program execution flow, where the results of branches in the code are
integrated into the same cache line. This increases the instruction flow from the cache and makes better use
of the overall cache storage space (12K micro-ops) since the cache no longer stores instructions that are
branched over and never executed. The net result is a means to deliver a high volume of instructions to the
processor’s execution units and a reduction in the overall time required to recover from branches that have
been mispredicted.
Streaming SIMD Extensions 2 (SSE2): With the introduction of the SSE2 extensions, the NetBurst microarchitecture
now extends the SIMD capabilities of Intel® MMX™ technology and the SSE extensions by
Desktop Performance and Optimization for Pentium® 4 Processor
Page 7
adding 144 new instructions that perform 128-bit SIMD integer arithmetic operations and 128-bit SIMD
double-precision floating-point (FP) operations. These new instructions provide programmers with new
abilities to execute a particular program task on Pentium 4 processors with fewer instructions and in less
time. As a result using SSE2 extension can contribute significantly to an overall performance increase.
In addition, the Pentium 4 processor has implemenHAPPYCORE PORNOGRAPHYted a Hardware Prefetcher: The automatic hardware
prefetcher operates transparently without requiring programmer’s active intervention. It is triggered by
regular access patterns and helps predict future accesses, thereby overlapping memory latency with
computation. By enabling concurrency between memory accesses and computation, this maximizes the
computational benefit of higher Pentium 4 processor frequencies
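The bandwidth figures quoted in the feature list above all follow the same pattern: bytes per transfer, times transfers per clock, times clock frequency. A minimal Python sketch of that arithmetic follows. The function name is made up for illustration, and the 8-byte (64-bit) system bus width is an assumption, since the text only gives the resulting data rates.

```python
def peak_bandwidth_gb_s(bytes_per_transfer, transfers_per_clock, clock_ghz):
    """Peak data rate in GB/s (10^9 bytes per second), ignoring all overheads."""
    return bytes_per_transfer * transfers_per_clock * clock_ghz


# Advanced Transfer Cache: 32 bytes x 1 transfer per clock x 1.5 GHz (from the text).
l2_cache_gb_s = peak_bandwidth_gb_s(32, 1, 1.5)

# 400-MHz system bus: 100 MHz clock, quad pumped (4 transfers per clock).
# ASSUMPTION: the bus is 8 bytes (64 bits) wide; the text does not state the width.
p4_bus_gb_s = peak_bandwidth_gb_s(8, 4, 0.100)

# Pentium III 133-MHz system bus, same assumed 8-byte width, 1 transfer per clock.
p3_bus_gb_s = peak_bandwidth_gb_s(8, 1, 0.133)

print(f"L2 cache:       {l2_cache_gb_s:.2f} GB/s")
print(f"Pentium 4 bus:  {p4_bus_gb_s:.2f} GB/s")
print(f"Pentium III bus: {p3_bus_gb_s:.2f} GB/s")
```

Under that assumed bus width, the three results line up with the 48 GB/s, 3.2 GB/s and 1.06 GB/s quoted in the text.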
1.2 Desktop Performance Expectations
The scalability of application performance with higher processor frequencies vary greatly across applications.
This is because different applications have different requirements and are coded differently. Application
code can be divided into the following categories: integer and basic office productivity applications versus
floating-point and multimedia applications. The instructions executed per clock achievable by these different
application categories varies greatly, and this variance is strongly affected by the number of branches that
application code typically takes and the predictability of these branches. The more branches taken with lower
predictability, the more opportunity to incorrectly predict the result of the branches, and hence the possibility
of performing nonproductive work.
Integer and basic office productivity applications, such as word and spreadsheet processing, tend to have
many branches in the code, thus reducing overall IPC capabilities. As a result, the associated branch penalties
and performance on these applications does not generally scale as well with frequency and are more resistant
to improvements in micro-architectural means, such as deeper pipelines. However, significantly raising the
performance level on these types of applications that run in basic, non-multitasking, environments does not
necessarily increase the user’s experience, because the processing power required by these types of basic
applications and environments tends to be satisfied by today’s higher end Pentium III processors.
Floating-point and multimedia applications tend to have branches that are very predictable, and thus naturally
have a higher average IPC capability. As a result, these types of applications generally scale very well with
frequency and are inclined to benefit greatly from deeper pipelines. In addition, the processing power
required by these applications tend to be unbounded: the more performance that is available, the better the
user’s experience.
The Pentium 4 processor shows immediate performance improvements across most existing software
available today, with performance levels varying depending on the application category type and the extent
that an application is optimized for the new micro-architecture.
An increase in frequency with previous micro-architectural generation products, such as the Pentium III
processor, generally did not yield performance increases equal to the frequency increases. The exact
efficiency of performance increase versus frequency (comparing a Pentium 4 processor at 1.5 GHz and a
Pentium III processor at 1GHz ) depends on individual application (see Figure 2), but in general you should
not expect to see a 50% increase in performance with a 50% increase in frequency (i.e. 100% efficiency of
converting frequency increase into performance gain in Figure 2). With a 40-50% increase in frequency, the
Pentium 4 processor was designed to yield in the range of a 20% gain on integer and a 20-70% gain on
floating-point/multi-media performance. (In workloads that include system-level activities, such as disk and
network accesses, the performance results depend less on processor performance. Therefore, the performance
scaling tends to be lower, SYSmark* 2000 is one such case.) As seen in Figure 2, the Pentium 4 processor
enables not only a large increase in frequency, but also demonstrates greater efficiency in translating this
frequency into performance gains, when compared to the Pentium III processor.
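
The "efficiency" described above is the performance gain divided by the frequency increase, so a 50% gain from a 50% frequency increase counts as 100%. A rough Python sketch, using only the percentages quoted in the text; the function name is hypothetical and the pairings of gain and frequency increase are picked for illustration, not taken from Figure 2:

```python
def scaling_efficiency(perf_gain_pct, freq_gain_pct):
    """Fraction of a frequency increase that shows up as a performance gain."""
    if freq_gain_pct <= 0:
        raise ValueError("frequency increase must be positive")
    return perf_gain_pct / freq_gain_pct


# The "100% efficiency" reference case: 50% more performance from 50% more frequency.
ideal = scaling_efficiency(50, 50)

# Integer workloads: about a 20% gain on a 40-50% frequency increase.
integer_low = scaling_efficiency(20, 50)
integer_high = scaling_efficiency(20, 40)

# Floating-point/multimedia, top of the quoted 20-70% range on a 50% increase.
fp_best = scaling_efficiency(70, 50)

for label, value in [("ideal", ideal), ("integer (50% freq)", integer_low),
                     ("integer (40% freq)", integer_high), ("FP best case", fp_best)]:
    print(f"{label}: {value:.0%}")
```

A value above 1.0, as in the floating-point best case, means the gain exceeds the frequency increase alone, which is the sense in which the text credits the new micro-architecture rather than clock speed.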

No copy-pasta quote tags people. Its in the rules.
Now work damnit! No, dont carry on posting here, you're a very busy person. work work work!!

mickyj300x
Posts: 295
Joined: Wed Apr 02, 2008 5:07 am UTC
Location: New Zealand

Re: Forum stats game

Postby mickyj300x » Mon Jun 15, 2009 9:44 pm UTC

The Pentium 4 processor, utilizing the Intel NetBurst micro-architecture, is a complete processor redesign
that delivers new technologies and capabilities while advancing many of the innovative features, such as
“out-of-order speculative execution” and “super-scalar execution,” introduced on prior Intel® microarchitecture
generations. Many of these innovations and advances were made possible with the
improvements in processor technology, transistor technology and circuit design, and they could not have
been implemented previously in high-volume, manufacturable solutions. The new technologies and
innovative features that are introduced in the Intel NetBurst micro-architecture are listed below:
Hyper-Pipelined Technology: The hyper-pipelined technology of the NetBurst micro-architecture doubles
the pipeline depth, compared to the P6 micro-architecture, with a 20-stage pipeline. This technology
significantly increases processor performance and frequency scalability of the base micro-architecture.
400-MHz System Bus: Through a physical signaling scheme of quad pumping the data transfers over a 100-
MHz clocked system bus and a buffering scheme allowing for sustained 400-MHz data transfers, the Pentium
4 processor supports the industry’s highest performance desktop system bus delivering a data rate of 3.2
Giga-Bytes per second (GB/s) in and out of the processor. This compares to 1.06 GB/s delivered on the
Pentium III processor’s 133-MHz system bus.
Advanced Dynamic Execution: The Advanced Dynamic Execution engine is a very deep, out-of-order
speculative execution engine that keeps the execution units busy. It does so by providing a very large
window of instructions from which the execution units can choose in order to get around stalls due to
instructions that are not ready to execute based on some unmet dependency (such as waiting for data to be
loaded from main memory). The NetBurst micro-architecture can have up to 126 instructions in this window
(in flight) versus the P6 micro-architecture’s much smaller window of 42 instructions.
The Advanced Dynamic Execution engine also delivers an enhanced branch prediction capability that allows
the processor to be more accurate in predicting program branches and has the net effect of reducing the
number of branch mispredictions by about 33% over the P6 micro-architecture’s branch prediction
capability. It does this by implementing a 4 Kilo Bytes (KB) branch target buffer in which to store more
detail on the history of past branches as well as implementing a more advanced branch prediction algorithm.
This enhanced branch prediction capability is one of the key design elements that helps to reduce the overall
sensitivity to branch misprediction penalty of the NetBurst micro-architecture.
Rapid Execution Engine: Through a combination of architectural, physical and circuit designs, the
Arithmetic Logic Units (ALUs) within the processor run at two times the frequency of the processor core.
This allows the ALUs to execute certain instructions in ½ a core clock and results in higher execution
throughput as well as reduced latency of execution.
Advanced Transfer Cache: The level 2 Advanced Transfer Cache is 256KB in size and delivers a much
higher data throughput channel between the level 2 cache and the processor core. The Advanced Transfer
Cache consists of a 256-bit (32-byte) interface that transfers data on each core clock. As a result, a 1.5-GHz
Pentium 4 processor could deliver a data transfer rate of 48GB/s (32 bytes x 1 (data transfer per clock) x 1.5
GHz = 48GB/s). This compares to a transfer rate of 16GB/s on the Pentium III processor 1 GHz and
contributes to the processor’s ability to keep the high-frequency execution units busy executing instructions
instead of sitting idle.
Execution Trace Cache: The Execution Trace Cache is an innovative way to implement a 1st level
instruction cache. It caches decoded IA-32 instructions (or micro-ops), thus removing the latency associated
with the instruction decoder from the main execution loops. In addition, the Execution Trace Cache stores
these micro-ops in the path of program execution flow, where the results of branches in the code are
integrated into the same cache line. This increases the instruction flow from the cache and makes better use
of the overall cache storage space (12K micro-ops) since the cache no longer stores instructions that are
branched over and never executed. The net result is a means to deliver a high volume of instructions to the
processor’s execution units and a reduction in the overall time required to recover from branches that have
been mispredicted.
Streaming SIMD Extensions 2 (SSE2): With the introduction of the SSE2 extensions, the NetBurst microarchitecture
now extends the SIMD capabilities of Intel® MMX™ technology and the SSE extensions by
adding 144 new instructions that perform 128-bit SIMD integer arithmetic operations and 128-bit SIMD
double-precision floating-point (FP) operations. These new instructions provide programmers with new
abilities to execute a particular program task on Pentium 4 processors with fewer instructions and in less
time. As a result using SSE2 extension can contribute significantly to an overall performance increase.
In addition, the Pentium 4 processor has implemenHAPPYCORE PORNOGRAPHYted a Hardware Prefetcher: The automatic hardware
prefetcher operates transparently without requiring programmer’s active intervention. It is triggered by
regular access patterns and helps predict future accesses, thereby overlapping memory latency with
computation. By enabling concurrency between memory accesses and computation, this maximizes the
computational benefit of higher Pentium 4 processor frequencies
1.2 Desktop Performance Expectations
The scalability of application performance with higher processor frequencies vary greatly across applications.
This is because different applications have different requirements and are coded differently. Application
code can be divided into the following categories: integer and basic office productivity applications versus
floating-point and multimedia applications. The instructions executed per clock achievable by these different
application categories varies greatly, and this variance is strongly affected by the number of branches that
application code typically takes and the predictability of these branches. The more branches taken with lower
predictability, the more opportunity to incorrectly predict the result of the branches, and hence the possibility
of performing nonproductive work.
Integer and basic office productivity applications, such as word and spreadsheet processing, tend to have
many branches in the code, thus reducing overall IPC capabilities. As a result, the associated branch penalties
and performance on these applications does not generally scale as well with frequency and are more resistant
to improvements in micro-architectural means, such as deeper pipelines. However, significantly raising the
performance level on these types of applications that run in basic, non-multitasking, environments does not
necessarily increase the user’s experience, because the processing power required by these types of basic
applications and environments tends to be satisfied by today’s higher end Pentium III processors.
Floating-point and multimedia applications tend to have branches that are very predictable, and thus naturally
have a higher average IPC capability. As a result, these types of people lose the game
frequency and are inclined to benefit greatly from deeper pipelines. In addition, the processing power
required by these applications tend to be unbounded: the more performance that is available, the better the
user’s experience.
The Pentium 4 processor shows immediate performance improvements across most existing software
available today, with performance levels varying depending on the application category type and the extent
that an application is optimized for the new micro-architecture.
An increase in frequency with previous micro-architectural generation products, such as the Pentium III
processor, generally did not yield performance increases equal to the frequency increases. The exact
efficiency of performance increase versus frequency (comparing a Pentium 4 processor at 1.5 GHz and a
Pentium III processor at 1GHz ) depends on individual application (see Figure 2), but in general you should
not expect to see a 50% increase in performance with a 50% increase in frequency (i.e. 100% efficiency of
converting frequency increase into performance gain in Figure 2). With a 40-50% increase in frequency, the
Pentium 4 processor was designed to yield in the range of a 20% gain on integer and a 20-70% gain on
floating-point/multi-media performance. (In workloads that include system-level activities, such as disk and
network accesses, the performance results depend less on processor performance. Therefore, the performance
scaling tends to be lower, SYSmark* 2000 is one such case.) As seen in Figure 2, the Pentium 4 processor
enables not only a large increase in frequency, but also demonstrates greater efficiency in translating this
frequency into performance gains, when compared to the Pentium III processor.

No copy-pasta quote tags people. Its in the rules.

Tillan
Posts: 223
Joined: Sat Sep 20, 2008 1:36 pm UTC
Location: Coffee

Re: Forum stats game

Postby Tillan » Mon Jun 15, 2009 9:48 pm UTC

The Pentium 4 processor, utilizing the Intel NetBurst micro-architecture, is a complete processor redesign
that delivers new technologies and capabilities while advancing many of the innovative features, such as
“out-of-order speculative execution” and “super-scalar execution,” introduced on prior Intel® microarchitecture
generations. Many of these innovations and advances were made possible with the
improvements in processor technology, transistor technology and circuit design, and they could not have
been implemented previously in high-volume, manufacturable solutions. The new technologies and
innovative features that are introduced in the Intel NetBurst micro-architecture are listed below:
Hyper-Pipelined Technology: The hyper-pipelined technology of the NetBurst micro-architecture doubles
the pipeline depth, compared to the P6 micro-architecture, with a 20-stage pipeline. This technology
significantly increases processor performance and frequency scalability of the base micro-architecture.
400-MHz System Bus: Through a physical signaling scheme of quad pumping the data transfers over a 100-
MHz clocked system bus and a buffering scheme allowing for sustained 400-MHz data transfers, the Pentium
4 processor supports the industry’s highest performance desktop system bus delivering a data rate of 3.2
Giga-Bytes per second (GB/s) in and out of the processor. This compares to 1.06 GB/s delivered on the
Pentium III processor’s 133-MHz system bus.
Advanced Dynamic Execution: The Advanced Dynamic Execution engine is a very deep, out-of-order
speculative execution engine that keeps the execution units busy. It does so by providing a very large
window of instructions from which the execution units can choose in order to get around stalls due to
instructions that are not ready to execute based on some unmet dependency (such as waiting for data to be
loaded from main memory). The NetBurst micro-architecture can have up to 126 instructions in this window
(in flight) versus the P6 micro-architecture’s much smaller window of 42 instructions.
The Advanced Dynamic Execution engine also delivers an enhanced branch prediction capability that allows
the processor to be more accurate in predicting program branches and has the net effect of reducing the
number of branch mispredictions by about 33% over the P6 micro-architecture’s branch prediction
capability. It does this by implementing a 4 Kilo Bytes (KB) branch target buffer in which to store more
detail on the history of past branches as well as implementing a more advanced branch prediction algorithm.
This enhanced branch prediction capability is one of the key design elements that helps to reduce the overall
sensitivity to branch misprediction penalty of the NetBurst micro-architecture.
Rapid Execution Engine: Through a combination of architectural, physical and circuit designs, the
Arithmetic Logic Units (ALUs) within the processor run at two times the frequency of the processor core.
This allows the ALUs to execute certain instructions in ½ a core clock and results in higher execution
throughput as well as reduced latency of execution.
Advanced Transfer Cache: The level 2 Advanced Transfer Cache is 256KB in size and delivers a much
higher data throughput channel between the level 2 cache and the processor core. The Advanced Transfer
Cache consists of a 256-bit (32-byte) interface that transfers data on each core clock. As a result, a 1.5-GHz
Pentium 4 processor could deliver a data transfer rate of 48GB/s (32 bytes x 1 (data transfer per clock) x 1.5
GHz = 48GB/s). This compares to a transfer rate of 16GB/s on the Pentium III processor 1 GHz and
contributes to the processor’s ability to keep the high-frequency execution units busy executing instructions
instead of sitting idle.
Execution Trace Cache: The Execution Trace Cache is an innovative way to implement a 1st level
instruction cache. It caches decoded IA-32 instructions (or micro-ops), thus removing the latency associated
with the instruction decoder from the main execution loops. In addition, the Execution Trace Cache stores
these micro-ops in the path of program execution flow, where the results of branches in the code are
integrated into the same cache line. This increases the instruction flow from the cache and makes better use
of the overall cache storage space (12K micro-ops) since the cache no longer stores instructions that are
branched over and never executed. The net result is a means to deliver a high volume of instructions to the
processor’s execution units and a reduction in the overall time required to recover from branches that have
been mispredicted.
Streaming SIMD Extensions 2 (SSE2): With the introduction of the SSE2 extensions, the NetBurst microarchitecture
now extends the SIMD capabilities of Intel® MMX™ technology and the SSE extensions by
adding 144 new instructions that perform 128-bit SIMD integer arithmetic operations and 128-bit SIMD
double-precision floating-point (FP) operations. These new instructions provide programmers with new
abilities to execute a particular program task on Pentium 4 processors with fewer instructions and in less
time. As a result using SSE2 extension can contribute significantly to an overall performance increase.
In addition, the Pentium 4 processor has implemenHAPPYCORE PORNOGRAPHYted a Hardware Prefetcher: The automatic hardware
prefetcher operates transparently without requiring programmer’s active intervention. It is triggered by
regular access patterns and helps predict future accesses, thereby overlapping memory latency with
computation. By enabling concurrency between memory accesses and computation, this maximizes the
computational benefit of higher Pentium 4 processor frequencies
1.2 Desktop Performance Expectations
The scalability of application performance with higher processor frequencies vary greatly across applications.
This is because different applications have different requirements and are coded differently. Application
code can be divided into the following categories: integer and basic office productivity applications versus
floating-point and multimedia applications. The instructions executed per clock achievable by these different
application categories varies greatly, and this variance is strongly affected by the number of branches that
application code typically takes and the predictability of these branches. The more branches taken with lower
predictability, the more opportunity to incorrectly predict the result of the branches, and hence the possibility
of performing nonproductive work.
Integer and basic office productivity applications, such as word and spreadsheet processing, tend to have
many branches in the code, thus reducing overall IPC capabilities. As a result, the associated branch penalties
and performance on these applications does not generally scale as well with frequency and are more resistant
to improvements in micro-architectural means, such as deeper pipelines. However, significantly raising the
performance level on these types of applications that run in basic, non-multitasking, environments does not
necessarily increase the user’s experience, because the processing power required by these types of basic
applications and environments tends to be satisfied by today’s higher end Pentium III processors.
Floating-point and multimedia applications tend to have branches that are very predictable, and thus naturally
have a higher average IPC capability. As a result, these types of people lose the game
frequency and are inclined to benefit greatly from deeper pipelines. In addition, the processing power
required by these applications tend to be unbounded: the more performance that is available, the better the
user’s experience.
The Pentium 4 processor shows immediate performance improvements across most existing software
available today, with performance levels varying depending on the application category type and the extent
that an application is optimized for the new micro-architecture.
An increase in frequency with previous micro-architectural generation products, such as the Pentium III
processor, generally did not yield performance increases equal to the frequency increases. The exact
efficiency of performance increase versus frequency (comparing a Pentium 4 processor at 1.5 GHz and a
Pentium III processor at 1GHz ) depends on individual application (see Figure 2), but in general you should
not expect to see a 50% increase in performance with a 50% increase in frequency (i.e. 100% efficiency of
converting frequency increase into performance gain in Figure 2). With a 40-50% increase in frequency, the
Pentium 4 processor was designed to yield in the range of a 20% gain on integer and a 20-70% gain on
floating-point/multi-media performance. (In workloads that include system-level activities, such as disk and
network accesses, the performance results depend less on processor performance. Therefore, the performance
scaling tends to be lower, SYSmark* 2000 is one such case.) As seen in Figure 2, the Pentium 4 processor
enables not only a large increase in frequency, but also demonstrates greater efficiency in translating this
frequency into performance gains, when compared to the Pentium III processor.

No copy-pasta quote tags people. Its in the rules.
Now work damnit! No, dont carry on posting here, you're a very busy person. work work work!!

poxic
Eloquently Prismatic
Posts: 4751
Joined: Sat Jun 07, 2008 3:28 am UTC
Location: Left coast of Canada

Re: Forum stats game

Postby poxic » Mon Jun 15, 2009 9:50 pm UTC

The Pentium 4 processor, utilizing the Intel NetBurst micro-architecture, is a complete processor redesign
that delivers new technologies and capabilities while advancing many of the innovative features, such as
“out-of-order speculative execution” and “super-scalar execution,” introduced on prior Intel® microarchitecture
generations. Many of these innovations and advances were made possible with the
improvements in processor technology, transistor technology and circuit design, and they could not have
been implemented previously in high-volume, manufacturable solutions. The new technologies and
innovative features that are introduced in the Intel NetBurst micro-architecture are listed below:
Hyper-Pipelined Technology: The hyper-pipelined technology of the NetBurst micro-architecture doubles
the pipeline depth, compared to the P6 micro-architecture, with a 20-stage pipeline. This technology
significantly increases processor performance and frequency scalability of the base micro-architecture.
400-MHz System Bus: Through a physical signaling scheme of quad pumping the data transfers over a 100-
MHz clocked system bus and a buffering scheme allowing for sustained 400-MHz data transfers, the Pentium
4 processor supports the industry’s highest performance desktop system bus delivering a data rate of 3.2
Giga-Bytes per second (GB/s) in and out of the processor. This compares to 1.06 GB/s delivered on the
Pentium III processor’s 133-MHz system bus.
Advanced Dynamic Execution: The Advanced Dynamic Execution engine is a very deep, out-of-order
speculative execution engine that keeps the execution units busy. It does so by providing a very large
window of instructions from which the execution units can choose in order to get around stalls due to
instructions that are not ready to execute based on some unmet dependency (such as waiting for data to be
loaded from main memory). The NetBurst micro-architecture can have up to 126 instructions in this window
(in flight) versus the P6 micro-architecture’s much smaller window of 42 instructions.
The Advanced Dynamic Execution engine also delivers an enhanced branch prediction capability that allows
the processor to be more accurate in predicting program branches and has the net effect of reducing the
number of branch mispredictions by about 33% over the P6 micro-architecture’s branch prediction
capability. It does this by implementing a 4 Kilo Bytes (KB) branch target buffer in which to store more
detail on the history of past branches as well as implementing a more advanced branch prediction algorithm.
This enhanced branch prediction capability is one of the key design elements that helps to reduce the overall
sensitivity to branch misprediction penalty of the NetBurst micro-architecture.
Rapid Execution Engine: Through a combination of architectural, physical and circuit designs, the
Arithmetic Logic Units (ALUs) within the processor run at two times the frequency of the processor core.
This allows the ALUs to execute certain instructions in ½ a core clock and results in higher execution
throughput as well as reduced latency of execution.
Advanced Transfer Cache: The level 2 Advanced Transfer Cache is 256KB in size and delivers a much
higher data throughput channel between the level 2 cache and the processor core. The Advanced Transfer
Cache consists of a 256-bit (32-byte) interface that transfers data on each core clock. As a result, a 1.5-GHz
Pentium 4 processor could deliver a data transfer rate of 48GB/s (32 bytes x 1 (data transfer per clock) x 1.5
GHz = 48GB/s). This compares to a transfer rate of 16GB/s on the Pentium III processor 1 GHz and
contributes to the processor’s ability to keep the high-frequency execution units busy executing instructions
instead of sitting idle.
Execution Trace Cache: The Execution Trace Cache is an innovative way to implement a 1st level
instruction cache. It caches decoded IA-32 instructions (or micro-ops), thus removing the latency associated
with the instruction decoder from the main execution loops. In addition, the Execution Trace Cache stores
these micro-ops in the path of program execution flow, where the results of branches in the code are
integrated into the same cache line. This increases the instruction flow from the cache and makes better use
of the overall cache storage space (12K micro-ops) since the cache no longer stores instructions that are
branched over and never executed. The net result is a means to deliver a high volume of instructions to the
processor’s execution units and a reduction in the overall time required to recover from branches that have
been mispredicted.
Streaming SIMD Extensions 2 (SSE2): With the introduction of the SSE2 extensions, the NetBurst micro-architecture
now extends the SIMD capabilities of Intel® MMX™ technology and the SSE extensions by
Desktop Performance and Optimization for Pentium® 4 Processor
Page 7
adding 144 new instructions that perform 128-bit SIMD integer arithmetic operations and 128-bit SIMD
double-precision floating-point (FP) operations. These new instructions provide programmers with new
abilities to execute a particular program task on Pentium 4 processors with fewer instructions and in less
time. As a result, using the SSE2 extensions can contribute significantly to an overall performance increase.
In addition, the Pentium 4 processor has implemenHAPPYCORE PORNOGRAPHYted a Hardware Prefetcher: The automatic hardware
prefetcher operates transparently without requiring programmer’s active intervention. It is triggered by
regular access patterns and helps predict future accesses, thereby overlapping memory latency with
computation. By enabling concurrency between memory accesses and computation, this maximizes the
computational benefit of higher Pentium 5 processor frequencies
1.2 Desktop Performance Expectations
The scalability of application performance with higher processor frequencies varies greatly across applications.
This is because different applications have different requirements and are coded differently. Application
code can be divided into the following categories: integer and basic office productivity applications versus
floating-point and multimedia applications. The instructions per clock (IPC) achievable by these different
application categories vary greatly, and this variance is strongly affected by the number of branches that
application code typically takes and the predictability of these branches. The more branches taken with lower
predictability, the more opportunity to incorrectly predict the result of the branches, and hence the possibility
of performing nonproductive work.
Integer and basic office productivity applications, such as word and spreadsheet processing, tend to have
many branches in the code, thus reducing overall IPC capabilities. As a result of the associated branch penalties,
performance on these applications generally does not scale as well with frequency and is more resistant
to improvement by micro-architectural means, such as deeper pipelines. However, significantly raising the
performance level on these types of applications that run in basic, non-multitasking, environments does not
necessarily increase the user’s experience, because the processing power required by these types of basic
applications and environments tends to be satisfied by today’s higher end Pentium III processors.
Floating-point and multimedia applications tend to have branches that are very predictable, and thus naturally
have a higher average IPC capability. As a result, these types of people lose the game
frequency and are inclined to benefit greatly from deeper pipelines. In addition, the processing power
required by these applications tends to be unbounded: the more performance that is available, the better the
user’s experience.
The Pentium 4 processor shows immediate performance improvements across most existing software
available today, with performance levels varying depending on the application category type and the extent
that an application is optimized for the new micro-architecture.
An increase in frequency with previous micro-architectural generation products, such as the Pentium III
processor, generally did not yield performance increases equal to the frequency increases. The exact
efficiency of performance increase versus frequency (comparing a Pentium 4 processor at 1.5 GHz and a
Pentium III processor at 1 GHz) depends on the individual application (see Figure 2), but in general you should
not expect to see a 50% increase in performance with a 50% increase in frequency (i.e. 100% efficiency of
converting frequency increase into performance gain in Figure 2). With a 40-50% increase in frequency, the
Pentium 4 processor was designed to yield in the range of a 20% gain on integer and a 20-70% gain on
floating-point/multi-media performance. (In workloads that include system-level activities, such as disk and
network accesses, the performance results depend less on processor performance. Therefore, the performance
scaling tends to be lower; SYSmark* 2000 is one such case.) As seen in Figure 2, the Pentium 4 processor
enables not only a large increase in frequency, but also demonstrates greater efficiency in translating this
frequency into performance gains, when compared to the Pentium III processor.

No copy-pasta quote tags, people. It's in the rules.
A man who is 'ill-adjusted' to the world is always on the verge of finding himself. One who is adjusted to the world never finds himself, but gets to be a cabinet minister.
- Hermann Hesse, novelist, poet, Nobel laureate (2 Jul 1877-1962)

User avatar
Tillan
Posts: 223
Joined: Sat Sep 20, 2008 1:36 pm UTC
Location: Coffee
Contact:

Re: Forum stats game

Postby Tillan » Mon Jun 15, 2009 9:52 pm UTC

The Pentium 4 processor, utilizing the Intel NetBurst micro-architecture, is a complete processor redesign
that delivers new technologies and capabilities while advancing many of the innovative features, such as
“out-of-order speculative execution” and “super-scalar execution,” introduced on prior Intel® microarchitecture
generations. Many of these innovations and advances were made possible with the
improvements in processor technology, transistor technology and circuit design, and they could not have
been implemented previously in high-volume, manufacturable solutions. The new technologies and
innovative features that are introduced in the Intel NetBurst micro-architecture are listed below:
Hyper-Pipelined Technology: The hyper-pipelined technology of the NetBurst micro-architecture doubles
the pipeline depth, compared to the P6 micro-architecture, with a 20-stage pipeline. This technology
significantly increases processor performance and frequency scalability of the base micro-architecture.
400-MHz System Bus: Through a physical signaling scheme of quad pumping the data transfers over a 100-
MHz clocked system bus and a buffering scheme allowing for sustained 400-MHz data transfers, the Pentium
4 processor supports the industry’s highest performance desktop system bus delivering a data rate of 3.2
Giga-Bytes per second (GB/s) in and out of the processor. This compares to 1.06 GB/s delivered on the
Pentium III processor’s 133-MHz system bus.
Advanced Dynamic Execution: The Advanced Dynamic Execution engine is a very deep, out-of-order
speculative execution engine that keeps the execution units busy. It does so by providing a very large
window of instructions from which the execution units can choose in order to get around stalls due to
instructions that are not ready to execute based on some unmet dependency (such as waiting for data to be
loaded from main memory). The NetBurst micro-architecture can have up to 126 instructions in this window
(in flight) versus the P6 micro-architecture’s much smaller window of 42 instructions.
The Advanced Dynamic Execution engine also delivers an enhanced branch prediction capability that allows
the processor to be more accurate in predicting program branches and has the net effect of reducing the
number of branch mispredictions by about 33% over the P6 micro-architecture’s branch prediction
capability. It does this by implementing a 4 Kilo Bytes (KB) branch target buffer in which to store more
detail on the history of past branches as well as implementing a more advanced branch prediction algorithm.
This enhanced branch prediction capability is one of the key design elements that helps to reduce the overall
sensitivity to branch misprediction penalty of the NetBurst micro-architecture.
Rapid Execution Engine: Through a combination of architectural, physical and circuit designs, the
Arithmetic Logic Units (ALUs) within the processor run at two times the frequency of the processor core.
This allows the ALUs to execute certain instructions in ½ a core clock and results in higher execution
throughput as well as reduced latency of execution.
Advanced Transfer Cache: The level 2 Advanced Transfer Cache is 256KB in size and delivers a much
higher data throughput channel between the level 2 cache and the processor core. The Advanced Transfer
Cache consists of a 256-bit (32-byte) interface that transfers data on each core clock. As a result, a 1.5-GHz
Pentium 4 processor could deliver a data transfer rate of 48GB/s (32 bytes x 1 (data transfer per clock) x 1.5
GHz = 48GB/s). This compares to a transfer rate of 16GB/s on the Pentium III processor 1 GHz and
contributes to the processor’s ability to keep the high-frequency execution units busy executing instructions
instead of sitting idle.
Execution Trace Cache: The Execution Trace Cache is an innovative way to implement a 1st level
instruction cache. It caches decoded IA-32 instructions (or micro-ops), thus removing the latency associated
with the instruction decoder from the main execution loops. In addition, the Execution Trace Cache stores
these micro-ops in the path of program execution flow, where the results of branches in the code are
integrated into the same cache line. This increases the instruction flow from the cache and makes better use
of the overall cache storage space (12K micro-ops) since the cache no longer stores instructions that are
branched over and never executed. The net result is a means to deliver a high volume of instructions to the
processor’s execution units and a reduction in the overall time required to recover from branches that have
been mispredicted.
Streaming SIMD Extensions 2 (SSE2): With the introduction of the SSE2 extensions, the NetBurst micro-architecture
now extends the SIMD capabilities of Intel® MMX™ technology and the SSE extensions by
adding 144 new instructions that perform 128-bit SIMD integer arithmetic operations and 128-bit SIMD
double-precision floating-point (FP) operations. These new instructions provide programmers with new
abilities to execute a particular program task on Pentium 4 processors with fewer instructions and in less
time. As a result, using the SSE2 extensions can contribute significantly to an overall performance increase.
In addition, the Pentium 4 processor has implemenHAPPYCORE PORNOGRAPHYted a Hardware Prefetcher: The automatic hardware
prefetcher operates transparently without requiring programmer’s active intervention. It is triggered by
regular access patterns and helps predict future accesses, thereby overlapping memory latency with
computation. By enabling concurrency between memory accesses and computation, this maximizes the
computational benefit of higher Pentium 5 processor frequencies
1.2 Desktop Performance Expectations
The scalability of application performance with higher processor frequencies varies greatly across applications.
This is because different applications have different requirements and are coded differently. Application
code can be divided into the following categories: integer and basic office productivity applications versus
floating-point and multimedia applications. The instructions per clock (IPC) achievable by these different
application categories vary greatly, and this variance is strongly affected by the number of branches that
application code typically takes and the predictability of these branches. The more branches taken with lower
predictability, the more opportunity to incorrectly predict the result of the branches, and hence the possibility
of performing nonproductive work.
Integer and basic office productivity applications, such as word and spreadsheet processing, tend to have
many branches in the code, thus reducing overall IPC capabilities. As a result of the associated branch penalties,
performance on these applications generally does not scale as well with frequency and is more resistant
to improvement by micro-architectural means, such as deeper pipelines. However, significantly raising the
performance level on these types of applications that run in basic, non-multitasking, environments does not
necessarily increase the user’s experience, because the processing power required by these types of basic
applications and environments tends to be satisfied by today’s higher end Pentium III processors.
Floating-point and multimedia applications tend to have branches that are very predictable, and thus naturally
have a higher average IPC capability. As a result, these types of people lose the game
frequency and are inclined to benefit greatly from deeper pipelines. In addition, the processing power
required by these applications tends to be unbounded: the more performance that is available, the better the
user’s experience.
The Pentium 4 processor shows immediate performance improvements across most existing software
available today, with performance levels varying depending on the application category type and the extent
that an application is optimized for the new micro-architecture.
An increase in frequency with previous micro-architectural generation products, such as the Pentium III
processor, generally did not yield performance increases equal to the frequency increases. The exact
efficiency of performance increase versus frequency (comparing a Pentium 4 processor at 1.5 GHz and a
Pentium III processor at 1 GHz) depends on the individual application (see Figure 2), but in general you should
not expect to see a 50% increase in performance with a 50% increase in frequency (i.e. 100% efficiency of
converting frequency increase into performance gain in Figure 2). With a 40-50% increase in frequency, the
Pentium 4 processor was designed to yield in the range of a 20% gain on integer and a 20-70% gain on
floating-point/multi-media performance. (In workloads that include system-level activities, such as disk and
network accesses, the performance results depend less on processor performance. Therefore, the performance
scaling tends to be lower; SYSmark* 2000 is one such case.) As seen in Figure 2, the Pentium 4 processor
enables not only a large increase in frequency, but also demonstrates greater efficiency in translating this
frequency into performance gains, when compared to the Pentium III processor.

No copy-pasta quote tags, people. It's in the rules.
Now work damnit! No, dont carry on posting here, you're a very busy person. work work work!!

User avatar
Fractal_Tangent
Today is my Birthday!
Posts: 923
Joined: Thu Feb 19, 2009 9:34 pm UTC
Location: Here, I suppose. I could be elsewhere...

Re: Forum stats game

Postby Fractal_Tangent » Mon Jun 15, 2009 9:52 pm UTC

The Pentium 4 processor, utilizing the Intel NetBurst micro-architecture, is a complete processor redesign
that delivers new technologies and capabilities while advancing many of the innovative features, such as
“out-of-order speculative execution” and “super-scalar execution,” introduced on prior Intel® microarchitecture
generations. Many of these innovations and advances were made possible with the
improvements in processor technology, transistor technology and circuit design, and they could not have
been implemented previously in high-volume, manufacturable solutions. The new technologies and
innovative features that are introduced in the Intel NetBurst micro-architecture are listed below:
Hyper-Pipelined Technology: The hyper-pipelined technology of the NetBurst micro-architecture doubles
the pipeline depth, compared to the P6 micro-architecture, with a 20-stage pipeline. This technology
significantly increases processor performance and frequency scalability of the base micro-architecture.
400-MHz System Bus: Through a physical signaling scheme of quad pumping the data transfers over a 100-
MHz clocked system bus and a buffering scheme allowing for sustained 400-MHz data transfers, the Pentium
4 processor supports the industry’s highest performance desktop system bus delivering a data rate of 3.2
Giga-Bytes per second (GB/s) in and out of the processor. This compares to 1.06 GB/s delivered on the
Pentium III processor’s 133-MHz system bus.
Advanced Dynamic Execution: The Advanced Dynamic Execution engine is a very deep, out-of-order
speculative execution engine that keeps the execution units busy. It does so by providing a very large
window of instructions from which the execution units can choose in order to get around stalls due to
instructions that are not ready to execute based on some unmet dependency (such as waiting for data to be
loaded from main memory). The NetBurst micro-architecture can have up to 126 instructions in this window
(in flight) versus the P6 micro-architecture’s much smaller window of 42 instructions.
The Advanced Dynamic Execution engine also delivers an enhanced branch prediction capability that allows
the processor to be more accurate in predicting program branches and has the net effect of reducing the
number of branch mispredictions by about 33% over the P6 micro-architecture’s branch prediction
capability. It does this by implementing a 4 Kilo Bytes (KB) branch target buffer in which to store more
detail on the history of past branches as well as implementing a more advanced branch prediction algorithm.
This enhanced branch prediction capability is one of the key design elements that helps to reduce the overall
sensitivity to branch misprediction penalty of the NetBurst micro-architecture.
Rapid Execution Engine: Through a combination of architectural, physical and circuit designs, the
Arithmetic Logic Units (ALUs) within the processor run at two times the frequency of the processor core.
This allows the ALUs to execute certain instructions in ½ a core clock and results in higher execution
throughput as well as reduced latency of execution.
Advanced Transfer Cache: The level 2 Advanced Transfer Cache is 256KB in size and delivers a much
higher data throughput channel between the level 2 cache and the processor core. The Advanced Transfer
Cache consists of a 256-bit (32-byte) interface that transfers data on each core clock. As a result, a 1.5-GHz
Pentium 4 processor could deliver a data transfer rate of 48GB/s (32 bytes x 1 (data transfer per clock) x 1.5
GHz = 48GB/s). This compares to a transfer rate of 16GB/s on the Pentium III processor 1 GHz and
contributes to the processor’s ability to keep the high-frequency execution units busy executing instructions
instead of sitting idle.
Execution Trace Cache: The Execution Trace Cache is an innovative way to implement a 1st level
instruction cache. It caches decoded IA-32 instructions (or micro-ops), thus removing the latency associated
with the instruction decoder from the main execution loops. In addition, the Execution Trace Cache stores
these micro-ops in the path of program execution flow, where the results of branches in the code are
integrated into the same cache line. This increases the instruction flow from the cache and makes better use
of the overall cache storage space (12K micro-ops) since the cache no longer stores instructions that are
branched over and never executed. The net result is a means to deliver a high volume of instructions to the
processor’s execution units and a reduction in the overall time required to recover from branches that have
been mispredicted.
Streaming SIMD Extensions 2 (SSE2): With the introduction of the SSE2 extensions, the NetBurst micro-architecture
now extends the SIMD capabilities of Intel® MMX™ technology and the SSE extensions by
adding 144 new instructions that perform 128-bit SIMD integer arithmetic operations and 128-bit SIMD
double-precision floating-point (FP) operations. These new instructions provide programmers with new
abilities to execute a particular program task on Pentium 4 processors with fewer instructions and in less
time. As a result, using the SSE2 extensions can contribute significantly to an overall performance increase.
In addition, the Pentium 4 processor has implemenHAPPYCORE PORNOGRAPHYted a Hardware Prefetcher: The automatic hardware
prefetcher operates transparently without requiring programmer’s active intervention. It is triggered by
regular access patterns and helps predict future accesses, thereby overlapping memory latency with
computation. By enabling concurrency between memory accesses and computation, this maximizes the
computational benefit of higher Pentium 5 processor frequencies
1.2 Desktop Performance Expectations
The scalability of application performance with higher processor frequencies varies greatly across applications.
This is because different applications have different requirements and are coded differently. Application
code can be divided into the following categories: integer and basic office productivity applications versus
floating-point and multimedia applications. The instructions per clock (IPC) achievable by these different
application categories vary greatly, and this variance is strongly affected by the number of branches that
application code typically takes and the predictability of these branches. The more branches taken with lower
predictability, the more opportunity to incorrectly predict the result of the branches, and hence the possibility
of performing nonproductive work.
Integer and basic office productivity applications, such as word and spreadsheet processing, tend to have
many branches in the code, thus reducing overall IPC capabilities. As a result of the associated branch penalties,
performance on these applications generally does not scale as well with frequency and is more resistant
to improvement by micro-architectural means, such as deeper pipelines. However, significantly raising the
performance level on these types of applications that run in basic, non-multitasking, environments does not
necessarily increase the user’s experience, because the processing power required by these types of basic
applications and environments tends to be satisfied by today’s higher end Pentium III processors.
Floating-point and multimedia applications tend to have branches that are very predictable, and thus naturally
have a higher average IPC capability. As a result, these types of people lose twenty dollars and my self respect
frequency and are inclined to benefit greatly from deeper pipelines. In addition, the processing power
required by these applications tends to be unbounded: the more performance that is available, the better the
user’s experience.
The Pentium 4 processor shows immediate performance improvements across most existing software
available today, with performance levels varying depending on the application category type and the extent
that an application is optimized for the new micro-architecture.
An increase in frequency with previous micro-architectural generation products, such as the Pentium III
processor, generally did not yield performance increases equal to the frequency increases. The exact
efficiency of performance increase versus frequency (comparing a Pentium 4 processor at 1.5 GHz and a
Pentium III processor at 1 GHz) depends on the individual application (see Figure 2), but in general you should
not expect to see a 50% increase in performance with a 50% increase in frequency (i.e. 100% efficiency of
converting frequency increase into performance gain in Figure 2). With a 40-50% increase in frequency, the
Pentium 4 processor was designed to yield in the range of a 20% gain on integer and a 20-70% gain on
floating-point/multi-media performance. (In workloads that include system-level activities, such as disk and
network accesses, the performance results depend less on processor performance. Therefore, the performance
scaling tends to be lower; SYSmark* 2000 is one such case.) As seen in Figure 2, the Pentium 4 processor
enables not only a large increase in frequency, but also demonstrates greater efficiency in translating this
frequency into performance gains, when compared to the Pentium III processor.

No copy-pasta quote tags, people. It's in the rules.
eSOANEM wrote:
right now, that means it's Nazi punching time.


she/her/hers
=]

User avatar
Tillan
Posts: 223
Joined: Sat Sep 20, 2008 1:36 pm UTC
Location: Coffee
Contact:

Re: Forum stats game

Postby Tillan » Mon Jun 15, 2009 9:54 pm UTC

The Pentium 4 processor, utilizing the Intel NetBurst micro-architecture, is a complete processor redesign
that delivers new technologies and capabilities while advancing many of the innovative features, such as
“out-of-order speculative execution” and “super-scalar execution,” introduced on prior Intel® microarchitecture
generations. Many of these innovations and advances were made possible with the
improvements in processor technology, transistor technology and circuit design, and they could not have
been implemented previously in high-volume, manufacturable solutions. The new technologies and
innovative features that are introduced in the Intel NetBurst micro-architecture are listed below:
Hyper-Pipelined Technology: The hyper-pipelined technology of the NetBurst micro-architecture doubles
the pipeline depth, compared to the P6 micro-architecture, with a 20-stage pipeline. This technology
significantly increases processor performance and frequency scalability of the base micro-architecture.
400-MHz System Bus: Through a physical signaling scheme of quad pumping the data transfers over a 100-
MHz clocked system bus and a buffering scheme allowing for sustained 400-MHz data transfers, the Pentium
4 processor supports the industry’s highest performance desktop system bus delivering a data rate of 3.2
Giga-Bytes per second (GB/s) in and out of the processor. This compares to 1.06 GB/s delivered on the
Pentium III processor’s 133-MHz system bus.
Advanced Dynamic Execution: The Advanced Dynamic Execution engine is a very deep, out-of-order
speculative execution engine that keeps the execution units busy. It does so by providing a very large
window of instructions from which the execution units can choose in order to get around stalls due to
instructions that are not ready to execute based on some unmet dependency (such as waiting for data to be
loaded from main memory). The NetBurst micro-architecture can have up to 126 instructions in this window
(in flight) versus the P6 micro-architecture’s much smaller window of 42 instructions.
The Advanced Dynamic Execution engine also delivers an enhanced branch prediction capability that allows
the processor to be more accurate in predicting program branches and has the net effect of reducing the
number of branch mispredictions by about 33% over the P6 micro-architecture’s branch prediction
capability. It does this by implementing a 4 Kilo Bytes (KB) branch target buffer in which to store more
detail on the history of past branches as well as implementing a more advanced branch prediction algorithm.
This enhanced branch prediction capability is one of the key design elements that helps to reduce the overall
sensitivity to branch misprediction penalty of the NetBurst micro-architecture.
Rapid Execution Engine: Through a combination of architectural, physical and circuit designs, the
Arithmetic Logic Units (ALUs) within the processor run at two times the frequency of the processor core.
This allows the ALUs to execute certain instructions in ½ a core clock and results in higher execution
throughput as well as reduced latency of execution.
Advanced Transfer Cache: The level 2 Advanced Transfer Cache is 256KB in size and delivers a much
higher data throughput channel between the level 2 cache and the processor core. The Advanced Transfer
Cache consists of a 256-bit (32-byte) interface that transfers data on each core clock. As a result, a 1.5-GHz
Pentium 4 processor could deliver a data transfer rate of 48GB/s (32 bytes x 1 (data transfer per clock) x 1.5
GHz = 48GB/s). This compares to a transfer rate of 16GB/s on the Pentium III processor 1 GHz and
contributes to the processor’s ability to keep the high-frequency execution units busy executing instructions
instead of sitting idle.
Execution Trace Cache: The Execution Trace Cache is an innovative way to implement a 1st level
instruction cache. It caches decoded IA-32 instructions (or micro-ops), thus removing the latency associated
with the instruction decoder from the main execution loops. In addition, the Execution Trace Cache stores
these micro-ops in the path of program execution flow, where the results of branches in the code are
integrated into the same cache line. This increases the instruction flow from the cache and makes better use
of the overall cache storage space (12K micro-ops) since the cache no longer stores instructions that are
branched over and never executed. The net result is a means to deliver a high volume of instructions to the
processor’s execution units and a reduction in the overall time required to recover from branches that have
been mispredicted.
Streaming SIMD Extensions 2 (SSE2): With the introduction of the SSE2 extensions, the NetBurst micro-architecture
now extends the SIMD capabilities of Intel® MMX™ technology and the SSE extensions by
adding 144 new instructions that perform 128-bit SIMD integer arithmetic operations and 128-bit SIMD
double-precision floating-point (FP) operations. These new instructions provide programmers with new
abilities to execute a particular program task on Pentium 4 processors with fewer instructions and in less
time. As a result, using the SSE2 extensions can contribute significantly to an overall performance increase.
In addition, the Pentium 4 processor has implemenHAPPYCORE PORNOGRAPHYted a Hardware Prefetcher: The automatic hardware
prefetcher operates transparently without requiring programmer’s active intervention. It is triggered by
regular access patterns and helps predict future accesses, thereby overlapping memory latency with
computation. By enabling concurrency between memory accesses and computation, this maximizes the
computational benefit of higher Pentium 5 processor frequencies
1.2 Desktop Performance Expectations
The scalability of application performance with higher processor frequencies varies greatly across applications.
This is because different applications have different requirements and are coded differently. Application
code can be divided into the following categories: integer and basic office productivity applications versus
floating-point and multimedia applications. The instructions executed per clock achievable by these different
application categories varies greatly, and this variance is strongly affected by the number of branches that
application code typically takes and the predictability of these branches. The more branches taken with lower
predictability, the more opportunity to incorrectly predict the result of the branches, and hence the possibility
of performing nonproductive work. I am so easily pleased.
Integer and basic office productivity applications, such as word and spreadsheet processing, tend to have
many branches in the code, thus reducing overall IPC capabilities. As a result, the associated branch penalties
and performance on these applications does not generally scale as well with frequency and are more resistant
to improvements in micro-architectural means, such as deeper pipelines. However, significantly raising the
performance level on these types of applications that run in basic, non-multitasking, environments does not
necessarily increase the user’s experience, because the processing power required by these types of basic
applications and environments tends to be satisfied by today’s higher end Pentium III processors.
Floating-point and multimedia applications tend to have branches that are very predictable, and thus naturally
have a higher average IPC capability. As a result, these types of people lose twenty dollars and my self respect
frequency and are inclined to benefit greatly from deeper pipelines. In addition, the processing power
required by these applications tends to be unbounded: the more performance that is available, the better the
user’s experience.
The Pentium 4 processor shows immediate performance improvements across most existing software
available today, with performance levels varying depending on the application category type and the extent
that an application is optimized for the new micro-architecture.
An increase in frequency with previous micro-architectural generation products, such as the Pentium III
processor, generally did not yield performance increases equal to the frequency increases. The exact
efficiency of performance increase versus frequency (comparing a Pentium 4 processor at 1.5 GHz and a
Pentium III processor at 1 GHz) depends on the individual application (see Figure 2), but in general you should
not expect to see a 50% increase in performance with a 50% increase in frequency (i.e. 100% efficiency of
converting frequency increase into performance gain in Figure 2). With a 40-50% increase in frequency, the
Pentium 4 processor was designed to yield in the range of a 20% gain on integer and a 20-70% gain on
floating-point/multi-media performance. (In workloads that include system-level activities, such as disk and
network accesses, the performance results depend less on processor performance. Therefore, the performance
scaling tends to be lower; SYSmark* 2000 is one such case.) As seen in Figure 2, the Pentium 4 processor
enables not only a large increase in frequency, but also demonstrates greater efficiency in translating this
frequency into performance gains, when compared to the Pentium III processor.

No copy-pasta quote tags, people. It's in the rules.
Now work damnit! No, don't carry on posting here, you're a very busy person. work work work!!

Fractal_Tangent
Today is my Birthday!
Posts: 923
Joined: Thu Feb 19, 2009 9:34 pm UTC
Location: Here, I suppose. I could be elsewhere...

Re: Forum stats game

Postby Fractal_Tangent » Mon Jun 15, 2009 9:55 pm UTC

The Pentium 4 processor, utilizing the Intel NetBurst micro-architecture, is a complete processor redesign
that delivers new technologies and capabilities while advancing many of the innovative features, such as
“out-of-order speculative execution” and “super-scalar execution,” introduced on prior Intel® microarchitecture
generations. Many of these innovations and advances were made possible with the
improvements in processor technology, transistor technology and circuit design, and they could not have
been implemented previously in high-volume, manufacturable solutions. The new technologies and
innovative features that are introduced in the Intel NetBurst micro-architecture are listed below:
Hyper-Pipelined Technology: The hyper-pipelined technology of the NetBurst micro-architecture doubles
the pipeline depth, compared to the P6 micro-architecture, with a 20-stage pipeline. This technology
significantly increases processor performance and frequency scalability of the base micro-architecture.
400-MHz System Bus: Through a physical signaling scheme of quad pumping the data transfers over a 100-
MHz clocked system bus and a buffering scheme allowing for sustained 400-MHz data transfers, the Pentium
4 processor supports the industry’s highest performance desktop system bus delivering a data rate of 3.2
Giga-Bytes per second (GB/s) in and out of the processor. This compares to 1.06 GB/s delivered on the
Pentium III processor’s 133-MHz system bus.
Advanced Dynamic Execution: The Advanced Dynamic Execution engine is a very deep, out-of-order
speculative execution engine that keeps the execution units busy. It does so by providing a very large
window of instructions from which the execution units can choose in order to get around stalls due to
instructions that are not ready to execute based on some unmet dependency (such as waiting for data to be
loaded from main memory). The NetBurst micro-architecture can have up to 126 instructions in this window
(in flight) versus the P6 micro-architecture’s much smaller window of 42 instructions.
The Advanced Dynamic Execution engine also delivers an enhanced branch prediction capability that allows
the processor to be more accurate in predicting program branches and has the net effect of reducing the
number of branch mispredictions by about 33% over the P6 micro-architecture’s branch prediction
capability. It does this by implementing a 4 Kilo Bytes (KB) branch target buffer in which to store more
detail on the history of past branches as well as implementing a more advanced branch prediction algorithm.
This enhanced branch prediction capability is one of the key design elements that helps to reduce the overall
sensitivity to branch misprediction penalty of the NetBurst micro-architecture.
Rapid Execution Engine: Through a combination of architectural, physical and circuit designs, the
Arithmetic Logic Units (ALUs) within the processor run at two times the frequency of the processor core.
This allows the ALUs to execute certain instructions in ½ a core clock and results in higher execution
throughput as well as reduced latency of execution.
Advanced Transfer Cache: The level 2 Advanced Transfer Cache is 256KB in size and delivers a much
higher data throughput channel between the level 2 cache and the processor core. The Advanced Transfer
Cache consists of a 256-bit (32-byte) interface that transfers data on each core clock. As a result, a 1.5-GHz
Pentium 4 processor could deliver a data transfer rate of 48GB/s (32 bytes x 1 (data transfer per clock) x 1.5
GHz = 48GB/s). This compares to a transfer rate of 16GB/s on the Pentium III processor 1 GHz and
contributes to the processor’s ability to keep the high-frequency execution units busy executing instructions
instead of sitting idle.
Execution Trace Cache: The Execution Trace Cache is an innovative way to implement a 1st level
instruction cache. It caches decoded IA-32 instructions (or micro-ops), thus removing the latency associated
with the instruction decoder from the main execution loops. In addition, the Execution Trace Cache stores
these micro-ops in the path of program execution flow, where the results of branches in the code are
integrated into the same cache line. This increases the instruction flow from the cache and makes better use
of the overall cache storage space (12K micro-ops) since the cache no longer stores instructions that are
branched over and never executed. The net result is a means to deliver a high volume of instructions to the
processor’s execution units and a reduction in the overall time required to recover from branches that have
been mispredicted.
Streaming SIMD Extensions 2 (SSE2): With the introduction of the SSE2 extensions, the NetBurst microarchitecture
now extends the SIMD capabilities of Intel® MMX™ technology and the SSE extensions by
adding 144 new instructions that perform 128-bit SIMD integer arithmetic operations and 128-bit SIMD
double-precision floating-point (FP) operations. These new instructions provide programmers with new
abilities to execute a particular program task on Pentium 4 processors with fewer instructions and in less
time. As a result, using the SSE2 extensions can contribute significantly to an overall performance increase.
In addition, the Pentium 4 processor has implemenHAPPYCORE PORNOGRAPHYted a Hardware Prefetcher: The automatic hardware
prefetcher operates transparently without requiring programmer’s active intervention. It is triggered by
regular access patterns and helps predict future accesses, thereby overlapping memory latency with
computation. By enabling concurrency between memory accesses and computation, this maximizes the
computational benefit of higher Pentium 5 processor frequencies
1.2 Desktop Performance Expectations
The scalability of application performance with higher processor frequencies varies greatly across applications.
This is because different applications have different requirements and are coded differently. Application
code can be divided into the following categories: integer and basic office productivity applications versus
floating-point and multimedia applications. The instructions executed per clock achievable by these different
application categories varies greatly, and this variance is strongly affected by the number of branches that
application code typically takes and the predictability of these branches. The more branches taken with lower
predictability, the more opportunity to incorrectly predict the result of the branches, and hence the possibility
of performing nonproductive squirrels. I am so easily pleased.
Integer and basic office productivity applications, such as word and spreadsheet processing, tend to have
many branches in the code, thus reducing overall IPC capabilities. As a result, the associated branch penalties
and performance on these applications does not generally scale as well with frequency and are more resistant
to improvements in micro-architectural means, such as deeper pipelines. However, significantly raising the
performance level on these types of applications that run in basic, non-multitasking, environments does not
necessarily increase the user’s experience, because the processing power required by these types of basic
applications and environments tends to be satisfied by today’s higher end Pentium III processors.
Floating-point and multimedia applications tend to have branches that are very predictable, and thus naturally
have a higher average IPC capability. As a result, these types of people lose twenty dollars and my self respect
frequency and are inclined to benefit greatly from deeper pipelines. In addition, the processing power
required by these applications tends to be unbounded: the more performance that is available, the better the
user’s experience.
The Pentium 4 processor shows immediate performance improvements across most existing software
available today, with performance levels varying depending on the application category type and the extent
that an application is optimized for the new micro-architecture.
An increase in frequency with previous micro-architectural generation products, such as the Pentium III
processor, generally did not yield performance increases equal to the frequency increases. The exact
efficiency of performance increase versus frequency (comparing a Pentium 4 processor at 1.5 GHz and a
Pentium III processor at 1 GHz) depends on the individual application (see Figure 2), but in general you should
not expect to see a 50% increase in performance with a 50% increase in frequency (i.e. 100% efficiency of
converting frequency increase into performance gain in Figure 2). With a 40-50% increase in frequency, the
Pentium 4 processor was designed to yield in the range of a 20% gain on integer and a 20-70% gain on
floating-point/multi-media performance. (In workloads that include system-level activities, such as disk and
network accesses, the performance results depend less on processor performance. Therefore, the performance
scaling tends to be lower; SYSmark* 2000 is one such case.) As seen in Figure 2, the Pentium 4 processor
enables not only a large increase in frequency, but also demonstrates greater efficiency in translating this
frequency into performance gains, when compared to the Pentium III processor.

No copy-pasta quote tags, people. It's in the rules.
eSOANEM wrote:
right now, that means it's Nazi punching time.


she/her/hers
=]

Tillan
Posts: 223
Joined: Sat Sep 20, 2008 1:36 pm UTC
Location: Coffee

Re: Forum stats game

Postby Tillan » Mon Jun 15, 2009 9:56 pm UTC

The Pentium 4 processor, utilizing the Intel NetBurst micro-architecture, is a complete processor redesign
that delivers new technologies and capabilities while advancing many of the innovative features, such as
“out-of-order speculative execution” and “super-scalar execution,” introduced on prior Intel® microarchitecture
generations. Many of these innovations and advances were made possible with the
improvements in processor technology, transistor technology and circuit design, and they could not have
been implemented previously in high-volume, manufacturable solutions. The new technologies and
innovative features that are introduced in the Intel NetBurst micro-architecture are listed below:
Hyper-Pipelined Technology: The hyper-pipelined technology of the NetBurst micro-architecture doubles
the pipeline depth, compared to the P6 micro-architecture, with a 20-stage pipeline. This technology
significantly increases processor performance and frequency scalability of the base micro-architecture.
400-MHz System Bus: Through a physical signaling scheme of quad pumping the data transfers over a 100-
MHz clocked system bus and a buffering scheme allowing for sustained 400-MHz data transfers, the Pentium
4 processor supports the industry’s highest performance desktop system bus delivering a data rate of 3.2
Giga-Bytes per second (GB/s) in and out of the processor. This compares to 1.06 GB/s delivered on the
Pentium III processor’s 133-MHz system bus.
Advanced Dynamic Execution: The Advanced Dynamic Execution engine is a very deep, out-of-order
speculative execution engine that keeps the execution units busy. It does so by providing a very large
window of instructions from which the execution units can choose in order to get around stalls due to
instructions that are not ready to execute based on some unmet dependency (such as waiting for data to be
loaded from main memory). The NetBurst micro-architecture can have up to 126 instructions in this window
(in flight) versus the P6 micro-architecture’s much smaller window of 42 instructions.
The Advanced Dynamic Execution engine also delivers an enhanced branch prediction capability that allows
the processor to be more accurate in predicting program branches and has the net effect of reducing the
number of branch mispredictions by about 33% over the P6 micro-architecture’s branch prediction
capability. It does this by implementing a 4 Kilo Bytes (KB) branch target buffer in which to store more
detail on the history of past branches as well as implementing a more advanced branch prediction algorithm.
This enhanced branch prediction capability is one of the key design elements that helps to reduce the overall
sensitivity to branch misprediction penalty of the NetBurst micro-architecture.
Rapid Execution Engine: Through a combination of architectural, physical and circuit designs, the
Arithmetic Logic Units (ALUs) within the processor run at two times the frequency of the processor core.
This allows the ALUs to execute certain instructions in ½ a core clock and results in higher execution
throughput as well as reduced latency of execution.
Advanced Transfer Cache: The level 2 Advanced Transfer Cache is 256KB in size and delivers a much
higher data throughput channel between the level 2 cache and the processor core. The Advanced Transfer
Cache consists of a 256-bit (32-byte) interface that transfers data on each core clock. As a result, a 1.5-GHz
Pentium 4 processor could deliver a data transfer rate of 48GB/s (32 bytes x 1 (data transfer per clock) x 1.5
GHz = 48GB/s). This compares to a transfer rate of 16GB/s on the Pentium III processor 1 GHz and
contributes to the processor’s ability to keep the high-frequency execution units busy executing instructions
instead of sitting idle.
Execution Trace Cache: The Execution Trace Cache is an innovative way to implement a 1st level
instruction cache. It caches decoded IA-32 instructions (or micro-ops), thus removing the latency associated
with the instruction decoder from the main execution loops. In addition, the Execution Trace Cache stores
these micro-ops in the path of program execution flow, where the results of branches in the code are
integrated into the same cache line. This increases the instruction flow from the cache and makes better use
of the overall cache storage space (12K micro-ops) since the cache no longer stores instructions that are
branched over and never executed. The net result is a means to deliver a high volume of instructions to the
processor’s execution units and a reduction in the overall time required to recover from branches that have
been mispredicted.
Streaming SIMD Extensions 2 (SSE2): With the introduction of the SSE2 extensions, the NetBurst microarchitecture
now extends the SIMD capabilities of Intel® MMX™ technology and the SSE extensions by
adding 144 new instructions that perform 128-bit SIMD integer arithmetic operations and 128-bit SIMD
double-precision floating-point (FP) operations. These new instructions provide programmers with new
abilities to execute a particular program task on Pentium 4 processors with fewer instructions and in less
time. As a result, using the SSE2 extensions can contribute significantly to an overall performance increase.
In addition, the Pentium 4 processor has implemenHAPPYCORE PORNOGRAPHYted a Hardware Prefetcher: The automatic hardware
prefetcher operates transparently without requiring programmer’s active intervention. It is triggered by
regular access patterns and helps predict future accesses, thereby overlapping memory latency with
computation. By enabling concurrency between memory accesses and computation, this maximizes the
computational benefit of higher Pentium 5 processor frequencies
1.2 Desktop Performance Expectations
The scalability of application performance with higher processor frequencies varies greatly across applications.
This is because different applications have different requirements and are coded differently. Application
code can be divided into the following categories: integer and basic office productivity applications versus
floating-point and multimedia applications. The instructions executed per clock achievable by these different
application categories varies greatly, and this variance is strongly affected by the number of branches that
application code typically takes and the predictability of these branches. The more branches taken with lower
predictability, the more opportunity to incorrectly predict the result of the branches, and hence the possibility
of performing nonproductive squirrels. I am so easily pleased.
Integer and basic office productivity applications, such as word and spreadsheet processing, tend to have
many branches in the code, thus reducing overall IPC capabilities. As a result, the associated branch penalties
and performance on these applications does not generally scale as well with frequency and are more resistant
to improvements in micro-architectural means, such as deeper pipelines. However, significantly raising the
performance level on these types of applications that run in basic, non-multitasking, environments does not
necessarily increase the user’s experience, because the processing power required by these types of basic
applications and environments tends to be satisfied by today’s higher end Pentium III processors.
Floating-point and multimedia applications tend to have branches that are very predictable, and thus naturally
have a higher average IPC capability. As a result, these types of people lose twenty dollars and my self respect
frequency and are inclined to benefit greatly from deeper pipelines. In addition, the processing power
required by these applications tends to be unbounded: the more performance that is available, the better the
user’s experience.
The Pentium 4 processor shows immediate performance improvements across most existing software
available today, with performance levels varying depending on the application category type and the extent
that an application is optimized for the new micro-architecture.
An increase in frequency with previous micro-architectural generation products, such as the Pentium III
processor, generally did not yield performance increases equal to the frequency increases. The exact
efficiency of performance increase versus frequency (comparing a Pentium 4 processor at 1.5 GHz and a
Pentium III processor at 1 GHz) depends on the individual application (see Figure 2), but in general you should
not expect to see a 50% increase in performance with a 50% increase in frequency (i.e. 100% efficiency of
converting frequency increase into performance gain in Figure 2). With a 40-50% increase in frequency, the
Pentium 4 processor was designed to yield in the range of a 20% gain on integer and a 20-70% gain on
floating-point/multi-media performance. (In workloads that include system-level activities, such as disk and
network accesses, the performance results depend less on processor performance. Therefore, the performance
scaling tends to be lower; SYSmark* 2000 is one such case.) As seen in Figure 2, the Pentium 4 processor
enables not only a large increase in frequency, but also demonstrates greater efficiency in translating this
frequency into performance gains, when compared to the Pentium III processor.

No copy-pasta quote tags, people. It's in the rules.
Now work damnit! No, don't carry on posting here, you're a very busy person. work work work!!

Fractal_Tangent
Today is my Birthday!
Posts: 923
Joined: Thu Feb 19, 2009 9:34 pm UTC
Location: Here, I suppose. I could be elsewhere...

Re: Forum stats game

Postby Fractal_Tangent » Mon Jun 15, 2009 9:59 pm UTC

The Pentium 4 processor, utilizing the Intel NetBurst micro-architecture, is a complete processor redesign
that delivers new technologies and capabilities while advancing many of the innovative features, such as
“out-of-order speculative execution” and “tiny hamsters fitted into gloves,” introduced on prior Intel® microarchitecture
generations. Many of these innovations and advances were made possible with the
improvements in processor technology, transistor technology and circuit design, and they could not have
been implemented previously in high-volume, manufacturable solutions. The new technologies and
innovative features that are introduced in the Intel NetBurst micro-architecture are listed below:
Hyper-Pipelined Technology: The hyper-pipelined technology of the NetBurst micro-architecture doubles
the pipeline depth, compared to the P6 micro-architecture, with a 20-stage pipeline. This technology
significantly increases processor performance and frequency scalability of the base micro-architecture.
400-MHz System Bus: Through a physical signaling scheme of quad pumping the data transfers over a 100-
MHz clocked system bus and a buffering scheme allowing for sustained 400-MHz data transfers, the Pentium
4 processor supports the industry’s highest performance desktop system bus delivering a data rate of 3.2
Giga-Bytes per second (GB/s) in and out of the processor. This compares to 1.06 GB/s delivered on the
Pentium III processor’s 133-MHz system bus.
Advanced Dynamic Execution: The Advanced Dynamic Execution engine is a very deep, out-of-order
speculative execution engine that keeps the execution units busy. It does so by providing a very large
window of instructions from which the execution units can choose in order to get around stalls due to
instructions that are not ready to execute based on some unmet dependency (such as waiting for data to be
loaded from main memory). The NetBurst micro-architecture can have up to 126 instructions in this window
(in flight) versus the P6 micro-architecture’s much smaller window of 42 instructions.
The Advanced Dynamic Execution engine also delivers an enhanced branch prediction capability that allows
the processor to be more accurate in predicting program branches and has the net effect of reducing the
number of branch mispredictions by about 33% over the P6 micro-architecture’s branch prediction
capability. It does this by implementing a 4 Kilo Bytes (KB) branch target buffer in which to store more
detail on the history of past branches as well as implementing a more advanced branch prediction algorithm.
This enhanced branch prediction capability is one of the key design elements that helps to reduce the overall
sensitivity to branch misprediction penalty of the NetBurst micro-architecture.
Rapid Execution Engine: Through a combination of architectural, physical and circuit designs, the
Arithmetic Logic Units (ALUs) within the processor run at two times the frequency of the processor core.
This allows the ALUs to execute certain instructions in ½ a core clock and results in higher execution
throughput as well as reduced latency of execution.
Advanced Transfer Cache: The level 2 Advanced Transfer Cache is 256KB in size and delivers a much
higher data throughput channel between the level 2 cache and the processor core. The Advanced Transfer
Cache consists of a 256-bit (32-byte) interface that transfers data on each core clock. As a result, a 1.5-GHz
Pentium 4 processor could deliver a data transfer rate of 48GB/s (32 bytes x 1 (data transfer per clock) x 1.5
GHz = 48GB/s). This compares to a transfer rate of 16GB/s on the Pentium III processor 1 GHz and
contributes to the processor’s ability to keep the high-frequency execution units busy executing instructions
instead of sitting idle.
Execution Trace Cache: The Execution Trace Cache is an innovative way to implement a 1st level
instruction cache. It caches decoded IA-32 instructions (or micro-ops), thus removing the latency associated
with the instruction decoder from the main execution loops. In addition, the Execution Trace Cache stores
these micro-ops in the path of program execution flow, where the results of branches in the code are
integrated into the same cache line. This increases the instruction flow from the cache and makes better use
of the overall cache storage space (12K micro-ops) since the cache no longer stores instructions that are
branched over and never executed. The net result is a means to deliver a high volume of instructions to the
processor’s execution units and a reduction in the overall time required to recover from branches that have
been mispredicted.
Streaming SIMD Extensions 2 (SSE2): With the introduction of the SSE2 extensions, the NetBurst microarchitecture
now extends the SIMD capabilities of Intel® MMX™ technology and the SSE extensions by
adding 144 new instructions that perform 128-bit SIMD integer arithmetic operations and 128-bit SIMD
double-precision floating-point (FP) operations. These new instructions provide programmers with new
abilities to execute a particular program task on Pentium 4 processors with fewer instructions and in less
time. As a result, using the SSE2 extensions can contribute significantly to an overall performance increase.
In addition, the Pentium 4 processor has implemenHAPPYCORE PORNOGRAPHYted a Hardware Prefetcher: The automatic hardware
prefetcher operates transparently without requiring programmer’s active intervention. It is triggered by
regular access patterns and helps predict future accesses, thereby overlapping memory latency with
computation. By enabling concurrency between memory accesses and computation, this maximizes the
computational benefit of higher Pentium 5 processor frequencies
1.2 Desktop Performance Expectations
The scalability of application performance with higher processor frequencies varies greatly across applications.
This is because different applications have different requirements and are coded differently. Application
code can be divided into the following categories: integer and basic office productivity applications versus
floating-point and multimedia applications. The instructions executed per clock achievable by these different
application categories varies greatly, and this variance is strongly affected by the number of branches that
application code typically takes and the predictability of these branches. The more branches taken with lower
predictability, the more opportunity to incorrectly predict the result of the branches, and hence the possibility
of performing nonproductive squirrels. I am so easily pleased.
Integer and basic office productivity applications, such as word and spreadsheet processing, tend to have
many branches in the code, thus reducing overall IPC capabilities. As a result, the associated branch penalties
and performance on these applications does not generally scale as well with frequency and are more resistant
to improvements in micro-architectural means, such as deeper pipelines. However, significantly raising the
performance level on these types of applications that run in basic, non-multitasking, environments does not
necessarily increase the user’s experience, because the processing power required by these types of basic
applications and environments tends to be satisfied by today’s higher end Pentium III processors.
Floating-point and multimedia applications tend to have branches that are very predictable, and thus naturally
have a higher average IPC capability. As a result, these types of people lose twenty dollars and my self respect
frequency and are inclined to benefit greatly from deeper pipelines. In addition, the processing power
required by these applications tends to be unbounded: the more performance that is available, the better the
user’s experience.
The Pentium 4 processor shows immediate performance improvements across most existing software
available today, with performance levels varying depending on the application category type and the extent
that an application is optimized for the new micro-architecture.
An increase in frequency with previous micro-architectural generation products, such as the Pentium III
processor, generally did not yield performance increases equal to the frequency increases. The exact
efficiency of performance increase versus frequency (comparing a Pentium 4 processor at 1.5 GHz and a
Pentium III processor at 1 GHz) depends on the individual application (see Figure 2), but in general you should
not expect to see a 50% increase in performance with a 50% increase in frequency (i.e. 100% efficiency of
converting frequency increase into performance gain in Figure 2). With a 40-50% increase in frequency, the
Pentium 4 processor was designed to yield in the range of a 20% gain on integer and a 20-70% gain on
floating-point/multi-media performance. (In workloads that include system-level activities, such as disk and
network accesses, the performance results depend less on processor performance. Therefore, the performance
scaling tends to be lower; SYSmark* 2000 is one such case.) As seen in Figure 2, the Pentium 4 processor
enables not only a large increase in frequency, but also demonstrates greater efficiency in translating this
frequency into performance gains, when compared to the Pentium III processor.

No copy-pasta quote tags people. Its in the rules.
eSOANEM wrote:
right now, that means it's Nazi punching time.


she/her/hers
=]

Tillan
Posts: 223
Joined: Sat Sep 20, 2008 1:36 pm UTC
Location: Coffee

Re: Forum stats game

Postby Tillan » Mon Jun 15, 2009 10:07 pm UTC

The Pentium 4 processor, utilizing the Intel NetBurst micro-architecture, is a complete processor redesign
that delivers new technologies and capabilities while advancing many of the innovative features, such as
“out-of-order speculative execution” and “tiny hamsters fitted into gloves,” introduced on prior Intel® microarchitecture
generations. Many of these innovations and advances were made possible with the
improvements in processor technology, transistor technology and circuit design, and they could not have
been implemented previously in high-volume, manufacturable solutions. The new technologies and
innovative features that are introduced in the Intel NetBurst micro-architecture are listed below:
Hyper-Pipelined Technology: The hyper-pipelined technology of the NetBurst micro-architecture doubles
the pipeline depth, compared to the P6 micro-architecture, with a 20-stage pipeline. This technology
significantly increases processor performance and frequency scalability of the base micro-architecture.
400-MHz System Bus: Through a physical signaling scheme of quad pumping the data transfers over a 100-
MHz clocked system bus and a buffering scheme allowing for sustained 400-MHz data transfers, the Pentium
4 processor supports the industry’s highest performance desktop system bus delivering a data rate of 3.2
Giga-Bytes per second (GB/s) in and out of the processor. This compares to 1.06 GB/s delivered on the
Pentium III processor’s 133-MHz system bus.
Advanced Dynamic Execution: The Advanced Dynamic Execution engine is a very deep, out-of-order
speculative execution engine that keeps the execution units busy. It does so by providing a very large
window of instructions from which the execution units can choose in order to get around stalls due to
instructions that are not ready to execute based on some unmet dependency (such as waiting for data to be
loaded from main memory). The NetBurst micro-architecture can have up to 126 instructions in this window
(in flight) versus the P6 micro-architecture’s much smaller window of 42 instructions.
The Advanced Dynamic Execution engine also delivers an enhanced branch prediction capability that allows
the processor to be more accurate in predicting program branches and has the net effect of reducing the
number of branch mispredictions by about 33% over the P6 micro-architecture’s branch prediction
capability. It does this by implementing a 4 Kilo Bytes (KB) branch target buffer in which to store more
detail on the history of past branches as well as implementing a more advanced branch prediction algorithm.
This enhanced branch prediction capability is one of the key design elements that helps to reduce the overall
sensitivity to branch misprediction penalty of the NetBurst micro-architecture.
Rapid Execution Engine: Through a combination of architectural, physical and circuit designs, the
Arithmetic Logic Units (ALUs) within the processor run at two times the frequency of the processor core.
This allows the ALUs to execute certain instructions in ½ a core clock and results in higher execution
throughput as well as reduced latency of execution.
Advanced Transfer Cache: The level 2 Advanced Transfer Cache is 256KB in size and delivers a much
higher data throughput channel between the level 2 cache and the processor core. The Advanced Transfer
Cache consists of a 256-bit (32-byte) interface that transfers data on each core clock. As a result, a 1.5-GHz
Pentium 4 processor could deliver a data transfer rate of 48GB/s (32 bytes x 1 (data transfer per clock) x 1.5
GHz = 48GB/s). This compares to a transfer rate of 16GB/s on the Pentium III processor 1 GHz and
contributes to the processor’s ability to keep the high-frequency execution units busy executing instructions
instead of sitting idle.
Execution Trace Cache: The Execution Trace Cache is an innovative way to implement a 1st level
instruction cache. It caches decoded IA-32 instructions (or micro-ops), thus removing the latency associated
with the instruction decoder from the main execution loops. In addition, the Execution Trace Cache stores
these micro-ops in the path of program execution flow, where the results of branches in the code are
integrated into the same cache line. This increases the instruction flow from the cache and makes better use
of the overall cache storage space (12K micro-ops) since the cache no longer stores instructions that are
branched over and never executed. The net result is a means to deliver a high volume of instructions to the
processor’s execution units and a reduction in the overall time required to recover from branches that have
been mispredicted.
Streaming SIMD Extensions 2 (SSE2): With the introduction of the SSE2 extensions, the NetBurst microarchitecture
now extends the SIMD capabilities of Intel® MMX™ technology and the SSE extensions by
adding 144 new instructions that perform 128-bit SIMD integer arithmetic operations and 128-bit SIMD
double-precision floating-point (FP) operations. These new instructions provide programmers with new
abilities to execute a particular program task on Pentium 4 processors with fewer instructions and in less
time. As a result using SSE2 extension can contribute significantly to an overall performance increase.
In addition, the Pentium 4 processor has implemenHAPPYCORE PORNOGRAPHYted a Hardware Prefetcher: The automatic hardware
prefetcher operates transparently without requiring programmer’s active intervention. It is triggered by
regular access patterns and helps predict future accesses, thereby overlapping memory latency with
computation. By enabling concurrency between memory accesses and computation, this maximizes the
computational benefit of higher Pentium 5 processor frequencies
1.2 Desktop Performance Expectations
The scalability of application performance with higher processor frequencies varies greatly across applications.
This is because different applications have different requirements and are coded differently. Application
code can be divided into the following categories: integer and basic office productivity applications versus
floating-point and multimedia applications. The instructions executed per clock achievable by these different
application categories varies greatly, and this variance is strongly affected by the number of branches that
application code typically takes and the predictability of these branches. The more branches taken with lower
predictability, the more opportunity to incorrectly predict the result of the branches, and hence the possibility
of performing nonproductive squirrels. I am so easily pleased.
Integer and basic office productivity applications, such as word and spreadsheet processing, tend to have
many branches in the code, thus reducing overall IPC capabilities. As a result, the associated branch penalties
and performance on these applications does not generally scale as well with frequency and are more resistant
to improvements in micro-architectural means, such as deeper pipelines. However, significantly raising the
performance level on these types of applications that run in basic, non-multitasking, environments does not
necessarily increase the user’s experience, because the processing power required by these types of basic
applications and environments tends to be satisfied by today’s higher end Pentium III processors.
Floating-point and multimedia applications tend to have branches that are very predictable, and thus naturally
have a higher average IPC capability. As a result, these types of people lose twenty dollars and my self respect
frequency and are inclined to benefit greatly from deeper pipelines. In addition, the processing power
required by these applications tends to be unbounded: the more performance that is available, the better the
user’s experience.
The Pentium 4 processor shows immediate performance improvements across most existing software
available today, with performance levels varying depending on the application category type and the extent
that an application is optimized for the new micro-architecture.
An increase in frequency with previous micro-architectural generation products, such as the Pentium III
processor, generally did not yield performance increases equal to the frequency increases. The exact
efficiency of performance increase versus frequency (comparing a Pentium 4 processor at 1.5 GHz and a
Pentium III processor at 1 GHz) depends on individual application (see Figure 2), but in general you should
not expect to see a 50% increase in performance with a 50% increase in frequency (i.e. 100% efficiency of
converting frequency increase into performance gain in Figure 2). With a 40-50% increase in frequency, the
Pentium 4 processor was designed to yield in the range of a 20% gain on integer and a 20-70% gain on
floating-point/multi-media performance. (In workloads that include system-level activities, such as disk and
network accesses, the performance results depend less on processor performance. Therefore, the performance
scaling tends to be lower; SYSmark* 2000 is one such case.) As seen in Figure 2, the Pentium 4 processor
enables not only a large increase in frequency, but also demonstrates greater efficiency in translating this
frequency into performance gains, when compared to the Pentium III processor.

No copy-pasta quote tags people. Its in the rules.
Now work damnit! No, dont carry on posting here, you're a very busy person. work work work!!

Fractal_Tangent
Today is my Birthday!
Posts: 923
Joined: Thu Feb 19, 2009 9:34 pm UTC
Location: Here, I suppose. I could be elsewhere...

Re: Forum stats game

Postby Fractal_Tangent » Mon Jun 15, 2009 10:11 pm UTC

The Pentium 4 processor, utilizing the Intel NetBurst micro-architecture, is a complete processor redesign
that delivers new technologies and capabilities while advancing many of the innovative features, such as
“out-of-order speculative execution” and “tiny hamsters fitted into gloves,” introduced on prior Intel® microarchitecture
generations. Many of these innovations and advances were made possible with the
improvements in processor technology, transistor technology and circuit design, and they could not have
been implemented previously in high-volume, manufacturable solutions. The new technologies and
innovative features that are introduced in the Intel NetBurst micro-architecture are listed below:
Hyper-Pipelined Technology: The hyper-pipelined technology of the NetBurst micro-architecture doubles
the pipeline depth, compared to the P6 micro-architecture, with a 20-stage pipeline. This technology
significantly increases processor performance and frequency scalability of the base micro-architecture.
400-MHz System Bus: Through a physical signaling scheme of quad pumping the data transfers over a 100-
MHz clocked system bus and a buffering scheme allowing for sustained 400-MHz data transfers, the Pentium
4 processor supports the industry’s highest performance desktop system bus delivering a data rate of 3.2
Giga-Bytes per second (GB/s) in and out of the processor. This compares to 1.06 GB/s delivered on the
Pentium III processor’s 133-MHz system bus.
Advanced Dynamic Execution: The Advanced Dynamic Execution engine is a very deep, out-of-order
speculative execution engine that keeps the execution units busy. It does so by providing a very large
window of instructions from which the execution units can choose in order to get around stalls due to
instructions that are not ready to execute based on some unmet dependency (such as waiting for data to be
loaded from main memory). The NetBurst micro-architecture can have up to 126 instructions in this window
(in flight) versus the P6 micro-architecture’s much smaller window of 42 instructions.
The Advanced Dynamic Execution engine also delivers an enhanced branch prediction capability that allows
the processor to be more accurate in predicting program branches and has the net effect of reducing the
number of branch mispredictions by about 33% over the P6 micro-architecture’s branch prediction
capability. It does this by implementing a 4 Kilo Bytes (KB) branch target buffer in which to store more
detail on the history of past branches as well as implementing a more advanced branch prediction algorithm.
This enhanced branch prediction capability is one of the key design elements that helps to reduce the overall
sensitivity to branch misprediction penalty of the NetBurst micro-architecture.
Rapid Execution Engine: Through a combination of architectural, physical and circuit designs, the
Arithmetic Logic Units (ALUs) within the processor run at two times the frequency of the processor core.
This allows the ALUs to execute certain instructions in ½ a core clock and results in higher execution
throughput as well as reduced latency of execution.
Advanced Transfer Cache: The level 2 Advanced Transfer Cache is 256KB in size and delivers a much
higher data throughput channel between the level 2 cache and the processor wooliness. The Advanced Transfer
Cache consists of a 256-bit (32-byte) interface that transfers data on each core clock. As a result, a 1.5-GHz
Pentium 4 processor could deliver a data transfer rate of 48GB/s (32 bytes x 1 (data transfer per clock) x 1.5
GHz = 48GB/s). This compares to a transfer rate of 16GB/s on the Pentium III processor 1 GHz and
contributes to the processor’s ability to keep the high-frequency execution units busy executing instructions
instead of sitting idle.
Execution Trace Cache: The Execution Trace Cache is an innovative way to implement a 1st level
instruction cache. It caches decoded IA-32 instructions (or micro-ops), thus removing the latency associated
with the instruction decoder from the main execution loops. In addition, the Execution Trace Cache stores
these micro-ops in the path of program execution flow, where the results of branches in the code are
integrated into the same cache line. This increases the instruction flow from the cache and makes better use
of the overall cache storage space (12K micro-ops) since the cache no longer stores instructions that are
branched over and never executed. The net result is a means to deliver a high volume of instructions to the
processor’s execution units and a reduction in the overall time required to recover from branches that have
been mispredicted.
Streaming SIMD Extensions 2 (SSE2): With the introduction of the SSE2 extensions, the NetBurst microarchitecture
now extends the SIMD capabilities of Intel® MMX™ technology and the SSE extensions by
adding 144 new instructions that perform 128-bit SIMD integer arithmetic operations and 128-bit SIMD
double-precision floating-point (FP) operations. These new instructions provide programmers with new
abilities to execute a particular program task on Pentium 4 processors with fewer instructions and in less
time. As a result using SSE2 extension can contribute significantly to an overall performance increase.
In addition, the Pentium 4 processor has implemenHAPPYCORE PORNOGRAPHYted a Hardware Prefetcher: The automatic hardware
prefetcher operates transparently without requiring programmer’s active intervention. It is triggered by
regular access patterns and helps predict future accesses, thereby overlapping memory latency with
computation. By enabling concurrency between memory accesses and computation, this maximizes the
computational benefit of higher Pentium 5 processor frequencies
1.2 Desktop Performance Expectations
The scalability of application performance with higher processor frequencies varies greatly across applications.
This is because different applications have different requirements and are coded differently. Application
code can be divided into the following categories: integer and basic office productivity applications versus
floating-point and multimedia applications. The instructions executed per clock achievable by these different
application categories varies greatly, and this variance is strongly affected by the number of branches that
application code typically takes and the predictability of these branches. The more branches taken with lower
predictability, the more opportunity to incorrectly predict the result of the branches, and hence the possibility
of performing nonproductive squirrels. I am so easily pleased.
Integer and basic office productivity applications, such as word and spreadsheet processing, tend to have
many branches in the code, thus reducing overall IPC capabilities. As a result, the associated branch penalties
and performance on these applications does not generally scale as well with frequency and are more resistant
to improvements in micro-architectural means, such as deeper pipelines. However, significantly raising the
performance level on these types of applications that run in basic, non-multitasking, environments does not
necessarily increase the user’s experience, because the processing power required by these types of basic
applications and environments tends to be satisfied by today’s higher end Pentium III processors.
Floating-point and multimedia applications tend to have branches that are very predictable, and thus naturally
have a higher average IPC capability. As a result, these types of people lose twenty dollars and my self respect
frequency and are inclined to benefit greatly from deeper pipelines. In addition, the processing power
required by these applications tends to be unbounded: the more performance that is available, the better the
user’s experience.
The Pentium 4 processor shows immediate performance improvements across most existing software
available today, with performance levels varying depending on the application category type and the extent
that an application is optimized for the new micro-architecture.
An increase in frequency with previous micro-architectural generation products, such as the Pentium III
processor, generally did not yield performance increases equal to the frequency increases. The exact
efficiency of performance increase versus frequency (comparing a Pentium 4 processor at 1.5 GHz and a
Pentium III processor at 1 GHz) depends on individual application (see Figure 2), but in general you should
not expect to see a 50% increase in performance with a 50% increase in frequency (i.e. 100% efficiency of
converting frequency increase into performance gain in Figure 2). With a 40-50% increase in frequency, the
Pentium 4 processor was designed to yield in the range of a 20% gain on integer and a 20-70% gain on
floating-point/multi-media performance. (In workloads that include system-level activities, such as disk and
network accesses, the performance results depend less on processor performance. Therefore, the performance
scaling tends to be lower; SYSmark* 2000 is one such case.) As seen in Figure 2, the Pentium 4 processor
enables not only a large increase in frequency, but also demonstrates greater efficiency in translating this
frequency into performance gains, when compared to the Pentium III processor.

No copy-pasta quote tags people. Its in the rules.
eSOANEM wrote:
right now, that means it's Nazi punching time.


she/her/hers
=]

Tillan
Posts: 223
Joined: Sat Sep 20, 2008 1:36 pm UTC
Location: Coffee

Re: Forum stats game

Postby Tillan » Mon Jun 15, 2009 10:13 pm UTC

The Pentium 4 processor, utilizing the Intel NetBurst micro-architecture, is a complete processor redesign
that delivers new technologies and capabilities while advancing many of the innovative features, such as
“out-of-order speculative execution” and “tiny hamsters fitted into gloves,” introduced on prior Intel® microarchitecture
generations. Many of these innovations and advances were made possible with the
improvements in processor technology, transistor technology and circuit design, and they could not have
been implemented previously in high-volume, manufacturable solutions. The new technologies and
innovative features that are introduced in the Intel NetBurst micro-architecture are listed below:
Hyper-Pipelined Technology: The hyper-pipelined technology of the NetBurst micro-architecture doubles
the pipeline depth, compared to the P6 micro-architecture, with a 20-stage pipeline. This technology
significantly increases processor performance and frequency scalability of the base micro-architecture.
400-MHz System Bus: Through a physical signaling scheme of quad pumping the data transfers over a 100-
MHz clocked system bus and a buffering scheme allowing for sustained 400-MHz data transfers, the Pentium
4 processor supports the industry’s highest performance desktop system bus delivering a data rate of 3.2
Giga-Bytes per second (GB/s) in and out of the processor. This compares to 1.06 GB/s delivered on the
Pentium III processor’s 133-MHz system bus.
Advanced Dynamic Execution: The Advanced Dynamic Execution engine is a very deep, out-of-order
speculative execution engine that keeps the execution units busy. It does so by providing a very large
window of instructions from which the execution units can choose in order to get around stalls due to
instructions that are not ready to execute based on some unmet dependency (such as waiting for data to be
loaded from main memory). The NetBurst micro-architecture can have up to 126 instructions in this window
(in flight) versus the P6 micro-architecture’s much smaller window of 42 instructions.
The Advanced Dynamic Execution engine also delivers an enhanced branch prediction capability that allows
the processor to be more accurate in predicting program branches and has the net effect of reducing the
number of branch mispredictions by about 33% over the P6 micro-architecture’s branch prediction
capability. It does this by implementing a 4 Kilo Bytes (KB) branch target buffer in which to store more
detail on the history of past branches as well as implementing a more advanced branch prediction algorithm.
This enhanced branch prediction capability is one of the key design elements that helps to reduce the overall
sensitivity to branch misprediction penalty of the NetBurst micro-architecture.
Rapid Execution Engine: Through a combination of architectural, physical and circuit designs, the
Arithmetic Logic Units (ALUs) within the processor run at two times the frequency of the processor core.
This allows the ALUs to execute certain instructions in ½ a core clock and results in higher execution
throughput as well as reduced latency of execution.
Advanced Transfer Cache: The level 2 Advanced Transfer Cache is 256KB in size and delivers a much
higher data throughput channel between the level 2 cache and the processor wooliness. The Advanced Transfer
Cache consists of a 256-bit (32-byte) interface that transfers data on each core clock. As a result, a 1.5-GHz
Pentium 4 processor could deliver a data transfer rate of 48GB/s (32 bytes x 1 (data transfer per clock) x 1.5
GHz = 48GB/s). This compares to a transfer rate of 16GB/s on the Pentium III processor 1 GHz and
contributes to the processor’s ability to keep the high-frequency execution units busy executing instructions
instead of sitting idle.
Execution Trace Cache: The Execution Trace Cache is an innovative way to implement a 1st level
instruction cache. It caches decoded IA-32 instructions (or micro-ops), thus removing the latency associated
with the instruction decoder from the main execution loops. In addition, the Execution Trace Cache stores
these micro-ops in the path of program execution flow, where the results of branches in the code are
integrated into the same cache line. This increases the instruction flow from the cache and makes better use
of the overall cache storage space (12K micro-ops) since the cache no longer stores instructions that are
branched over and never executed. The net result is a means to deliver a high volume of instructions to the
processor’s execution units and a reduction in the overall time required to recover from branches that have
been mispredicted.
Streaming SIMD Extensions 2 (SSE2): With the introduction of the SSE2 extensions, the NetBurst microarchitecture
now extends the SIMD capabilities of Intel® MMX™ technology and the SSE extensions by
adding 144 new instructions that perform 128-bit SIMD integer arithmetic operations and 128-bit SIMD
double-precision floating-point (FP) operations. These new instructions provide programmers with new
abilities to execute a particular program task on Pentium 4 processors with fewer instructions and in less
time. As a result using SSE2 extension can contribute significantly to an overall performance increase.
In addition, the Pentium 4 processor has implemenHAPPYCORE PORNOGRAPHYted a Hardware Prefetcher: The automatic hardware
prefetcher operates transparently without requiring programmer’s active intervention. It is triggered by
regular access patterns and helps predict future accesses, thereby overlapping memory latency with
computation. By enabling concurrency between memory accesses and computation, this maximizes the
computational benefit of higher Pentium 5 processor frequencies
1.2 Desktop Performance Expectations
The scalability of application performance with higher processor frequencies varies greatly across applications.
This is because different applications have different requirements and are coded differently. Application
code can be divided into the following categories: integer and basic office productivity applications versus
floating-point and multimedia applications. The instructions executed per clock achievable by these different
application categories varies greatly, and this variance is strongly affected by the number of branches that
application code typically takes and the predictability of these branches. The more branches taken with lower
predictability, the more opportunity to incorrectly predict the result of the branches, and hence the possibility
of performing nonproductive squirrels. I am so easily pleased.
Integer and basic office productivity applications, such as word and spreadsheet processing, tend to have
many branches in the code, thus reducing overall IPC capabilities. As a result, the associated branch penalties
and performance on these applications does not generally scale as well with frequency and are more resistant
to improvements in micro-architectural means, such as deeper pipelines. However, significantly raising the
performance level on these types of applications that run in basic, non-multitasking, environments does not
necessarily increase the user’s experience, because the processing power required by these types of basic
applications and environments tends to be satisfied by today’s higher end Pentium III processors.
Floating-point and multimedia applications tend to have branches that are very predictable, and thus naturally
have a higher average IPC capability. As a result, these types of people lose twenty dollars and my self respect
frequency and are inclined to benefit greatly from deeper pipelines. In addition, the processing power
required by these applications tends to be unbounded: the more performance that is available, the better the
user’s experience.
The Pentium 4 processor shows immediate performance improvements across most existing software
available today, with performance levels varying depending on the application category type and the extent
that an application is optimized for the new micro-architecture.
An increase in frequency with previous micro-architectural generation products, such as the Pentium III
processor, generally did not yield performance increases equal to the frequency increases. The exact
efficiency of performance increase versus frequency (comparing a Pentium 4 processor at 1.5 GHz and a
Pentium III processor at 1 GHz) depends on individual application (see Figure 2), but in general you should
not expect to see a 50% increase in performance with a 50% increase in frequency (i.e. 100% efficiency of
converting frequency increase into performance gain in Figure 2). With a 40-50% increase in frequency, the
Pentium 4 processor was designed to yield in the range of a 20% gain on integer and a 20-70% gain on
floating-point/multi-media performance. (In workloads that include system-level activities, such as disk and
network accesses, the performance results depend less on processor performance. Therefore, the performance
scaling tends to be lower; SYSmark* 2000 is one such case.) As seen in Figure 2, the Pentium 4 processor
enables not only a large increase in frequency, but also demonstrates greater efficiency in translating this
frequency into performance gains, when compared to the Pentium III processor.

No copy-pasta quote tags people. Its in the rules.
Now work damnit! No, dont carry on posting here, you're a very busy person. work work work!!

Fractal_Tangent
Today is my Birthday!
Posts: 923
Joined: Thu Feb 19, 2009 9:34 pm UTC
Location: Here, I suppose. I could be elsewhere...

Re: Forum stats game

Postby Fractal_Tangent » Mon Jun 15, 2009 10:15 pm UTC

The Pentium 4 processor, utilizing the Intel NetBurst micro-architecture, is a complete processor redesign
that delivers new technologies and capabilities while advancing many of the innovative features, such as
“out-of-order speculative execution” and “tiny hamsters fitted into gloves,” introduced on prior Intel® microarchitecture
generations. Many of these innovations and advances were made possible with the
improvements in processor technology, transistor technology and circuit design, and they could not have
been implemented previously in high-volume, manufacturable solutions. The new technologies and
innovative features that are introduced in the Intel NetBurst micro-architecture are listed below:
Hyper-Pipelined Technology: The hyper-pipelined technology of the NetBurst micro-architecture doubles
the pipeline depth, compared to the P6 micro-architecture, with a 20-stage pipeline. This technology
significantly increases processor performance and frequency scalability of the base micro-architecture.
400-MHz System Bus: Through a physical signaling scheme of quad pumping the data transfers over a 100-
MHz clocked system bus and a buffering scheme allowing for sustained 400-MHz data transfers, the Pentium
4 processor supports the industry’s highest performance desktop system bus delivering a data rate of 3.2
Giga-Bytes per second (GB/s) in and out of the processor. This compares to 1.06 GB/s delivered on the
Pentium III processor’s 133-MHz system bus.
Advanced Dynamic Execution: The Advanced Dynamic Execution engine is a very deep, out-of-order
speculative execution engine that keeps the execution units busy. It does so by providing a very large
window of instructions from which the execution units can choose in order to get around stalls due to
instructions that are not ready to execute based on some unmet dependency (such as waiting for data to be
loaded from main memory). The NetBurst micro-architecture can have up to 126 instructions in this window
(in flight) versus the P6 micro-architecture’s much smaller window of 42 instructions.
The Advanced Dynamic Execution engine also delivers an enhanced branch prediction capability that allows
the processor to be more accurate in predicting program branches and has the net effect of reducing the
number of branch mispredictions by about 33% over the P6 micro-architecture’s branch prediction
capability. It does this by implementing a 4 Kilo Bytes (KB) branch target buffer in which to store more
detail on the history of past branches as well as implementing a more advanced branch prediction algorithm.
This enhanced branch prediction capability is one of the key design elements that helps to reduce the overall
sensitivity to branch misprediction penalty of the NetBurst micro-architecture.
Rapid Execution Engine: Through a combination of architectural, physical and circuit designs, the
Arithmetic Logic Units (ALUs) within the processor run at two times the frequency of the processor core.
This allows the ALUs to execute certain instructions in ½ a core clock and results in higher execution
throughput as well as reduced latency of execution.
Advanced Transfer Cache: The level 2 Advanced Transfer Cache is 256KB in size and delivers a much
higher data throughput channel between the level 2 cache and the processor wooliness. The Advanced Transfer
Cache consists of a 256-bit (32-byte) interface that transfers data on each core clock. As a result, a 1.5-GHz
Pentium 4 processor could deliver a data transfer rate of 48GB/s (32 bytes x 1 (data transfer per clock) x 1.5
GHz = 48GB/s). This compares to a transfer rate of 16GB/s on the Pentium III processor 1 GHz and
contributes to the processor’s ability to keep the high-frequency execution units busy executing instructions
instead of sitting idle.
Execution Trace Cache: The Execution Trace Cache is an innovative way to implement a 1st level
instruction cache. It caches decoded IA-32 instructions (or micro-ops), thus removing the latency associated
with the instruction decoder from the main execution loops. In addition, the Execution Trace Cache stores
these micro-ops in the path of program execution flow, where the results of branches in the code are
integrated into the same cache line. This increases the instruction flow from the cache and makes better use
of the overall cache storage space (12K micro-ops) since the cache no longer stores instructions that are
branched over and never executed. The net result is a means to deliver a high volume of instructions to the
processor’s execution units and a reduction in the overall time required to recover from branches that have
been mispredicted.
Streaming SIMD Extensions 2 (SSE2): With the introduction of the SSE2 extensions, the NetBurst microarchitecture
now extends the SIMD capabilities of Intel® MMX™ technology and the SSE extensions by
Desktop Performance and Optimization for Pentium® 4 Processor
Page 7
adding 144 new instructions that perform 128-bit SIMD integer arithmetic operations and 128-bit SIMD
double-precision floating-point (FP) operations. These new instructions provide programmers with new
abilities to execute a particular program task on Pentium 4 processors with fewer instructions and in less
time. As a result using SSE2 extension can contribute significantly to an overall performance increase.
In addition, the Pentium 4 processor has implemenHAPPYCORE PORNOGRAPHYted a Hardware Prefetcher: The automatic hardware
prefetcher operates transparently without requiring programmer’s active intervention. It is triggered by
regular access patterns and helps predict future accesses, thereby overlapping memory latency with
computation. By enabling concurrency between memory accesses and computation, this maximizes the
computational benefit of higher Pentium 5 processor frequencies
1.2 Desktop Performance Expectations
The scalability of application performance with higher processor frequencies vary greatly across applications.
This is because different applications have different requirements and are coded differently. Application
code can be divided into the following categories: integer and basic office productivity applications versus
floating-point and multimedia applications. The instructions executed per clock achievable by these different
application categories varies greatly, and this variance is strongly affected by the number of branches that
application code typically takes and the predictability of these branches. The more branches taken with lower
predictability, the more opportunity to incorrectly predict the result of the branches, and hence the possibility
of performing nonproductive squirrels. I am so easily pleased.
Integer and basic office productivity applications, such as word and spreadsheet processing, tend to have
many branches in the code, thus reducing overall IPC capabilities. As a result, the associated branch penalties
and performance on these applications does not generally scale as well with frequency and are more resistant
to improvements in micro-architectural means, such as deeper pipelines. However, significantly raising the
performance level on these types of applications that run in basic, non-multitasking, environments does not
necessarily increase the user’s experience, because the processing power required by these types of basic
applications and environments tends to be satisfied by today’s higher end Pentium III processors.
Floating-point and multimedia applications tend to have branches that are very predictable, and thus naturally
have a higher average IPC capability. As a result, these types of people lose twenty dollars and my self respect
frequency and are inclined to benefit greatly from deeper pipelines. In addition, the processing power
required by these applications tend to be unbounded: the more performance that is available, the better the
user’s experience.
The Pentium 4 processor shows immediate performance improvements across most existing software
available today, with performance levels varying depending on the application category type and the extent
that an application is optimized for the new micro-architecture.
An increase in frequency with previous micro-architectural generation products, such as the Pentium III
processor, generally did not yield performance increases equal to the frequency increases. The exact
efficiency of performance increase versus frequency (comparing a Pentium 4 processor at 1.5 GHz and a
Pentium III processor at 1GHz ) depends on individual application (see Figure 2), but in general you should
not expect to see a 50% increase in performance with a 50% increase in frequency (i.e. 100% efficiency of
converting frequency increase into performance gain in Figure 2). With a 40-50% increase in frequency, the
Pentium 4 processor was designed to yield in the range of a 20% gain on integer and a 20-70% gain on
floating-point/multi-media performance. (In workloads that include system-level mangoes, such as disk and
network accesses, the performance results depend less on processor performance. Therefore, the performance
scaling tends to be lower, SYSmark* 2000 is one such case.) As seen in Figure 2, the Pentium 4 processor
enables not only a large increase in frequency, but also demonstrates greater efficiency in translating this
frequency into performance gains, when compared to the Pentium III processor.

No copy-pasta quote tags people. It's in the rules.
eSOANEM wrote:
right now, that means it's Nazi punching time.


she/her/hers
=]

poxic
Eloquently Prismatic
Posts: 4751
Joined: Sat Jun 07, 2008 3:28 am UTC
Location: Left coast of Canada

Re: Forum stats game

Postby poxic » Mon Jun 15, 2009 10:16 pm UTC

The Pentium 4 processor, utilizing the Intel NetBurst micro-architecture, is a complete processor redesign
that delivers new technologies and capabilities while advancing many of the innovative features, such as
deary deary me, eaglef2 is going to haaaaaate us for a very, very, very, very, very long time
“out-of-order speculative execution” and “tiny hamsters fitted into gloves,” introduced on prior Intel® microarchitecture
generations. Many of these innovations and advances were made possible with the
improvements in processor technology, transistor technology and circuit design, and they could not have
been implemented previously in high-volume, manufacturable solutions. The new technologies and
innovative features that are introduced in the Intel NetBurst micro-architecture are listed below:
Hyper-Pipelined Technology: The hyper-pipelined technology of the NetBurst micro-architecture doubles
the pipeline depth, compared to the P6 micro-architecture, with a 20-stage pipeline. This technology
significantly increases processor performance and frequency scalability of the base micro-architecture.
400-MHz System Bus: Through a physical signaling scheme of quad pumping the data transfers over a 100-
MHz clocked system bus and a buffering scheme allowing for sustained 400-MHz data transfers, the Pentium
4 processor supports the industry’s highest performance desktop system bus delivering a data rate of 3.2
Giga-Bytes per second (GB/s) in and out of the processor. This compares to 1.06 GB/s delivered on the
Pentium III processor’s 133-MHz system bus.
Advanced Dynamic Execution: The Advanced Dynamic Execution engine is a very deep, out-of-order
speculative execution engine that keeps the execution units busy. It does so by providing a very large
window of instructions from which the execution units can choose in order to get around stalls due to
instructions that are not ready to execute based on some unmet dependency (such as waiting for data to be
loaded from main memory). The NetBurst micro-architecture can have up to 126 instructions in this window
(in flight) versus the P6 micro-architecture’s much smaller window of 42 instructions.
The Advanced Dynamic Execution engine also delivers an enhanced branch prediction capability that allows
the processor to be more accurate in predicting program branches and has the net effect of reducing the
number of branch mispredictions by about 33% over the P6 micro-architecture’s branch prediction
capability. It does this by implementing a 4 Kilo Bytes (KB) branch target buffer in which to store more
detail on the history of past branches as well as implementing a more advanced branch prediction algorithm.
This enhanced branch prediction capability is one of the key design elements that helps to reduce the overall
sensitivity to branch misprediction penalty of the NetBurst micro-architecture.
Rapid Execution Engine: Through a combination of architectural, physical and circuit designs, the
Arithmetic Logic Units (ALUs) within the processor run at two times the frequency of the processor core.
This allows the ALUs to execute certain instructions in ½ a core clock and results in higher execution
throughput as well as reduced latency of execution.
Advanced Transfer Cache: The level 2 Advanced Transfer Cache is 256KB in size and delivers a much
higher data throughput channel between the level 2 cache and the processor wooliness. The Advanced Transfer
Cache consists of a 256-bit (32-byte) interface that transfers data on each core clock. As a result, a 1.5-GHz
Pentium 4 processor could deliver a data transfer rate of 48GB/s (32 bytes x 1 (data transfer per clock) x 1.5
GHz = 48GB/s). This compares to a transfer rate of 16GB/s on the Pentium III processor 1 GHz and
contributes to the processor’s ability to keep the high-frequency execution units busy executing instructions
instead of sitting idle.
Execution Trace Cache: The Execution Trace Cache is an innovative way to implement a 1st level
instruction cache. It caches decoded IA-32 instructions (or micro-ops), thus removing the latency associated
with the instruction decoder from the main execution loops. In addition, the Execution Trace Cache stores
these micro-ops in the path of program execution flow, where the results of branches in the code are
integrated into the same cache line. This increases the instruction flow from the cache and makes better use
of the overall cache storage space (12K micro-ops) since the cache no longer stores instructions that are
branched over and never executed. The net result is a means to deliver a high volume of instructions to the
processor’s execution units and a reduction in the overall time required to recover from branches that have
been mispredicted.
Streaming SIMD Extensions 2 (SSE2): With the introduction of the SSE2 extensions, the NetBurst microarchitecture
now extends the SIMD capabilities of Intel® MMX™ technology and the SSE extensions by
adding 144 new instructions that perform 128-bit SIMD integer arithmetic operations and 128-bit SIMD
double-precision floating-point (FP) operations. These new instructions provide programmers with new
abilities to execute a particular program task on Pentium 4 processors with fewer instructions and in less
time. As a result using SSE2 extension can contribute significantly to an overall performance increase.
In addition, the Pentium 4 processor has implemenHAPPYCORE PORNOGRAPHYted a Hardware Prefetcher: The automatic hardware
prefetcher operates transparently without requiring programmer’s active intervention. It is triggered by
regular access patterns and helps predict future accesses, thereby overlapping memory latency with
computation. By enabling concurrency between memory accesses and computation, this maximizes the
computational benefit of higher Pentium 5 processor frequencies
1.2 Desktop Performance Expectations
The scalability of application performance with higher processor frequencies vary greatly across applications.
This is because different applications have different requirements and are coded differently. Application
code can be divided into the following categories: integer and basic office productivity applications versus
floating-point and multimedia applications. The instructions executed per clock achievable by these different
application categories varies greatly, and this variance is strongly affected by the number of branches that
application code typically takes and the predictability of these branches. The more branches taken with lower
predictability, the more opportunity to incorrectly predict the result of the branches, and hence the possibility
of performing nonproductive squirrels. I am so easily pleased.
Integer and basic office productivity applications, such as word and spreadsheet processing, tend to have
many branches in the code, thus reducing overall IPC capabilities. As a result, the associated branch penalties
and performance on these applications does not generally scale as well with frequency and are more resistant
to improvements in micro-architectural means, such as deeper pipelines. However, significantly raising the
performance level on these types of applications that run in basic, non-multitasking, environments does not
necessarily increase the user’s experience, because the processing power required by these types of basic
applications and environments tends to be satisfied by today’s higher end Pentium III processors.
Floating-point and multimedia applications tend to have branches that are very predictable, and thus naturally
have a higher average IPC capability. As a result, these types of people lose twenty dollars and my self respect
frequency and are inclined to benefit greatly from deeper pipelines. In addition, the processing power
required by these applications tend to be unbounded: the more performance that is available, the better the
user’s experience.
The Pentium 4 processor shows immediate performance improvements across most existing software
available today, with performance levels varying depending on the application category type and the extent
that an application is optimized for the new micro-architecture.
An increase in frequency with previous micro-architectural generation products, such as the Pentium III
processor, generally did not yield performance increases equal to the frequency increases. The exact
efficiency of performance increase versus frequency (comparing a Pentium 4 processor at 1.5 GHz and a
Pentium III processor at 1GHz ) depends on individual application (see Figure 2), but in general you should
not expect to see a 50% increase in performance with a 50% increase in frequency (i.e. 100% efficiency of
converting frequency increase into performance gain in Figure 2). With a 40-50% increase in frequency, the
Pentium 4 processor was designed to yield in the range of a 20% gain on integer and a 20-70% gain on
floating-point/multi-media performance. (In workloads that include system-level activities, such as disk and
network accesses, the performance results depend less on processor performance. Therefore, the performance
scaling tends to be lower, SYSmark* 2000 is one such case.) As seen in Figure 2, the Pentium 4 processor
enables not only a large increase in frequency, but also demonstrates greater efficiency in translating this
frequency into performance gains, when compared to the Pentium III processor.

No copy-pasta quote tags people. It's in the rules.
A man who is 'ill-adjusted' to the world is always on the verge of finding himself. One who is adjusted to the world never finds himself, but gets to be a cabinet minister.
- Hermann Hesse, novelist, poet, Nobel laureate (2 Jul 1877-1962)

Fractal_Tangent
Today is my Birthday!
Posts: 923
Joined: Thu Feb 19, 2009 9:34 pm UTC
Location: Here, I suppose. I could be elsewhere...

Re: Forum stats game

Postby Fractal_Tangent » Mon Jun 15, 2009 10:18 pm UTC

The Pentium 4 processor, utilizing the Intel NetBurst micro-architecture, is a complete processor redesign
that delivers new technologies and capabilities while advancing many of the innovative features, such as
deary deary me, eaglef2 is going to haaaaaate us for a very, very, very, very, very long time - 'of course they will'
“out-of-order speculative execution” and “tiny hamsters fitted into gloves,” introduced on prior Intel® microarchitecture
generations. Many of these innovations and advances were made possible with the
improvements in processor technology, transistor technology and circuit design, and they could not have
been implemented previously in high-volume, manufacturable solutions. The new technologies and
innovative features that are introduced in the Intel NetBurst micro-architecture are listed below:
Hyper-Pipelined Technology: The hyper-pipelined technology of the NetBurst micro-architecture doubles
the pipeline depth, compared to the P6 micro-architecture, with a 20-stage pipeline. This technology
significantly increases processor performance and slimy peculiarity of the base micro-architecture.
400-MHz System Bus: Through a physical signaling scheme of quad pumping the data transfers over a 100-
MHz clocked system bus and a buffering scheme allowing for sustained 400-MHz data transfers, the Pentium
4 processor supports the industry’s highest performance desktop system bus delivering a data rate of 3.2
Giga-Bytes per second (GB/s) in and out of the processor. This compares to 1.06 GB/s delivered on the
Pentium III processor’s 133-MHz system bus.
Advanced Dynamic Execution: The Advanced Dynamic Execution engine is a very deep, out-of-order
speculative execution engine that keeps the execution units busy. It does so by providing a very large
window of instructions from which the execution units can choose in order to get around stalls due to
instructions that are not ready to execute based on some unmet dependency (such as waiting for data to be
loaded from main memory). The NetBurst micro-architecture can have up to 126 instructions in this window
(in flight) versus the P6 micro-architecture’s much smaller window of 42 instructions.
The Advanced Dynamic Execution engine also delivers an enhanced branch prediction capability that allows
the processor to be more accurate in predicting program branches and has the net effect of reducing the
number of branch mispredictions by about 33% over the P6 micro-architecture’s branch prediction
capability. It does this by implementing a 4 Kilo Bytes (KB) branch target buffer in which to store more
detail on the history of past branches as well as implementing a more advanced branch prediction algorithm.
This enhanced branch prediction capability is one of the key design elements that helps to reduce the overall
sensitivity to branch misprediction penalty of the NetBurst micro-architecture.
Rapid Execution Engine: Through a combination of architectural, physical and circuit designs, the
Arithmetic Logic Units (ALUs) within the processor run at two times the frequency of the processor core.
This allows the ALUs to execute certain instructions in ½ a core clock and results in higher execution
throughput as well as reduced latency of execution.
Advanced Transfer Cache: The level 2 Advanced Transfer Cache is 256KB in size and delivers a much
higher data throughput channel between the level 2 cache and the processor wooliness. The Advanced Transfer
Cache consists of a 256-bit (32-byte) interface that transfers data on each core clock. As a result, a 1.5-GHz
Pentium 4 processor could deliver a data transfer rate of 48GB/s (32 bytes x 1 (data transfer per clock) x 1.5
GHz = 48GB/s). This compares to a transfer rate of 16GB/s on the Pentium III processor 1 GHz and
contributes to the processor’s ability to keep the high-frequency execution units busy executing instructions
instead of sitting idle.
Execution Trace Cache: The Execution Trace Cache is an innovative way to implement a 1st level
instruction cache. It caches decoded IA-32 instructions (or micro-ops), thus removing the latency associated
with the instruction decoder from the main execution loops. In addition, the Execution Trace Cache stores
these micro-ops in the path of program execution flow, where the results of branches in the code are
integrated into the same cache line. This increases the instruction flow from the cache and makes better use
of the overall cache storage space (12K micro-ops) since the cache no longer stores instructions that are
branched over and never executed. The net result is a means to deliver a high volume of instructions to the
processor’s execution units and a reduction in the overall time required to recover from branches that have
been mispredicted.
Streaming SIMD Extensions 2 (SSE2): With the introduction of the SSE2 extensions, the NetBurst microarchitecture
now extends the SIMD capabilities of Intel® MMX™ technology and the SSE extensions by
adding 144 new instructions that perform 128-bit SIMD integer arithmetic operations and 128-bit SIMD
double-precision floating-point (FP) operations. These new instructions provide programmers with new
abilities to execute a particular program task on Pentium 4 processors with fewer instructions and in less
time. As a result using SSE2 extension can contribute significantly to an overall performance increase.
In addition, the Pentium 4 processor has implemenHAPPYCORE PORNOGRAPHYted a Hardware Prefetcher: The automatic hardware
prefetcher operates transparently without requiring programmer’s active intervention. It is triggered by
regular access patterns and helps predict future accesses, thereby overlapping memory latency with
computation. By enabling concurrency between memory accesses and computation, this maximizes the
computational benefit of higher Pentium 5 processor frequencies
1.2 Desktop Performance Expectations
The scalability of application performance with higher processor frequencies vary greatly across applications.
This is because different applications have different requirements and are coded differently. Application
code can be divided into the following categories: integer and basic office productivity applications versus
floating-point and multimedia applications. The instructions executed per clock achievable by these different
application categories varies greatly, and this variance is strongly affected by the number of branches that
application code typically takes and the predictability of these branches. The more branches taken with lower
predictability, the more opportunity to incorrectly predict the result of the branches, and hence the possibility
of performing nonproductive squirrels. I am so easily pleased.
Integer and basic office productivity applications, such as word and spreadsheet processing, tend to have
many branches in the code, thus reducing overall IPC capabilities. As a result, the associated branch penalties
and performance on these applications does not generally scale as well with frequency and are more resistant
to improvements in micro-architectural means, such as deeper pipelines. However, significantly raising the
performance level on these types of applications that run in basic, non-multitasking, environments does not
necessarily increase the user’s experience, because the processing power required by these types of basic
applications and environments tends to be satisfied by today’s higher end Pentium III processors.
Floating-point and multimedia applications tend to have branches that are very predictable, and thus naturally
have a higher average IPC capability. As a result, these types of people lose twenty dollars and my self respect
frequency and are inclined to benefit greatly from deeper pipelines. In addition, the processing power
required by these applications tend to be unbounded: the more performance that is available, the better the
user’s experience.
The Pentium 4 processor shows immediate performance improvements across most existing software
available today, with performance levels varying depending on the application category type and the extent
that an application is optimized for the new micro-architecture.
An increase in frequency with previous micro-architectural generation products, such as the Pentium III
processor, generally did not yield performance increases equal to the frequency increases. The exact
efficiency of performance increase versus frequency (comparing a Pentium 4 processor at 1.5 GHz and a
Pentium III processor at 1GHz ) depends on individual application (see Figure 2), but in general you should
not expect to see a 50% increase in performance with a 50% increase in frequency (i.e. 100% efficiency of
converting frequency increase into performance gain in Figure 2). With a 40-50% increase in frequency, the
Pentium 4 processor was designed to yield in the range of a 20% gain on integer and a 20-70% gain on
floating-point/multi-media performance. (In workloads that include system-level activities, such as disk and
network accesses, the performance results depend less on processor performance. Therefore, the performance
scaling tends to be lower, SYSmark* 2000 is one such case.) As seen in Figure 2, the Pentium 4 processor
enables not only a large increase in frequency, but also demonstrates greater efficiency in translating this
frequency into performance gains, when compared to the Pentium III processor.

No copy-pasta quote tags people. It's in the rules.
eSOANEM wrote:
right now, that means it's Nazi punching time.


she/her/hers
=]

Tillan
Posts: 223
Joined: Sat Sep 20, 2008 1:36 pm UTC
Location: Coffee

Re: Forum stats game

Postby Tillan » Mon Jun 15, 2009 10:25 pm UTC

The Pentium 4 processor, utilizing the Intel NetBurst micro-architecture, is a complete processor redesign
that delivers new technologies and capabilities while advancing many of the innovative features, such as
deary deary me, eaglef2 is going to haaaaaate us for a very, very, very, very, very long time - 'of course they will' but i love it so much
“out-of-order speculative execution” and “tiny hamsters fitted into gloves,” introduced on prior Intel® microarchitecture
generations. Many of these innovations and advances were made possible with the
improvements in processor technology, transistor technology and circuit design, and they could not have
been implemented previously in high-volume, manufacturable solutions. The new technologies and
innovative features that are introduced in the Intel NetBurst micro-architecture are listed below:
Hyper-Pipelined Technology: The hyper-pipelined technology of the NetBurst micro-architecture doubles
the pipeline depth, compared to the P6 micro-architecture, with a 20-stage pipeline. This technology
significantly increases processor performance and slimy peculiarity of the base micro-architecture.
400-MHz System Bus: Through a physical signaling scheme of quad pumping the data transfers over a 100-
MHz clocked system bus and a buffering scheme allowing for sustained 400-MHz data transfers, the Pentium
4 processor supports the industry’s highest performance desktop system bus delivering a data rate of 3.2
Giga-Bytes per second (GB/s) in and out of the processor. This compares to 1.06 GB/s delivered on the
Pentium III processor’s 133-MHz system bus.
Advanced Dynamic Execution: The Advanced Dynamic Execution engine is a very deep, out-of-order
speculative execution engine that keeps the execution units busy. It does so by providing a very large
window of instructions from which the execution units can choose in order to get around stalls due to
instructions that are not ready to execute based on some unmet dependency (such as waiting for data to be
loaded from main memory). The NetBurst micro-architecture can have up to 126 instructions in this window
(in flight) versus the P6 micro-architecture’s much smaller window of 42 instructions.
The Advanced Dynamic Execution engine also delivers an enhanced branch prediction capability that allows
the processor to be more accurate in predicting program branches and has the net effect of reducing the
number of branch mispredictions by about 33% over the P6 micro-architecture’s branch prediction
capability. It does this by implementing a 4 Kilo Bytes (KB) branch target buffer in which to store more
detail on the history of past branches as well as implementing a more advanced branch prediction algorithm.
This enhanced branch prediction capability is one of the key design elements that helps to reduce the overall
sensitivity to branch misprediction penalty of the NetBurst micro-architecture.
Rapid Execution Engine: Through a combination of architectural, physical and circuit designs, the
Arithmetic Logic Units (ALUs) within the processor run at two times the frequency of the processor core.
This allows the ALUs to execute certain instructions in ½ a core clock and results in higher execution
throughput as well as reduced latency of execution.
Advanced Transfer Cache: The level 2 Advanced Transfer Cache is 256KB in size and delivers a much
higher data throughput channel between the level 2 cache and the processor wooliness. The Advanced Transfer
Cache consists of a 256-bit (32-byte) interface that transfers data on each core clock. As a result, a 1.5-GHz
Pentium 4 processor could deliver a data transfer rate of 48GB/s (32 bytes x 1 (data transfer per clock) x 1.5
GHz = 48GB/s). This compares to a transfer rate of 16GB/s on the Pentium III processor 1 GHz and
contributes to the processor’s ability to keep the high-frequency execution units busy executing instructions
instead of sitting idle.
Execution Trace Cache: The Execution Trace Cache is an innovative way to implement a 1st level
instruction cache. It caches decoded IA-32 instructions (or micro-ops), thus removing the latency associated
with the instruction decoder from the main execution loops. In addition, the Execution Trace Cache stores
these micro-ops in the path of program execution flow, where the results of branches in the code are
integrated into the same cache line. This increases the instruction flow from the cache and makes better use
of the overall cache storage space (12K micro-ops) since the cache no longer stores instructions that are
branched over and never executed. The net result is a means to deliver a high volume of instructions to the
processor’s execution units and a reduction in the overall time required to recover from branches that have
been mispredicted.
Streaming SIMD Extensions 2 (SSE2): With the introduction of the SSE2 extensions, the NetBurst microarchitecture
now extends the SIMD capabilities of Intel® MMX™ technology and the SSE extensions by
Desktop Performance and Optimization for Pentium® 4 Processor
Page 7
adding 144 new instructions that perform 128-bit SIMD integer arithmetic operations and 128-bit SIMD
double-precision floating-point (FP) operations. These new instructions provide programmers with new
abilities to execute a particular program task on Pentium 4 processors with fewer instructions and in less
time. As a result, using the SSE2 extensions can contribute significantly to an overall performance increase.
In addition, the Pentium 4 processor has implemenHAPPYCORE PORNOGRAPHYted a Hardware Prefetcher: The automatic hardware
prefetcher operates transparently without requiring programmer’s active intervention. It is triggered by
regular access patterns and helps predict future accesses, thereby overlapping memory latency with
computation. By enabling concurrency between memory accesses and computation, this maximizes the
computational benefit of higher Pentium 5 processor frequencies
1.2 Desktop Performance Expectations
The scalability of application performance with higher processor frequencies varies greatly across applications.
This is because different applications have different requirements and are coded differently. Application
code can be divided into the following categories: integer and basic office productivity applications versus
floating-point and multimedia applications. The instructions executed per clock achievable by these different
application categories varies greatly, and this variance is strongly affected by the number of branches that
application code typically takes and the predictability of these branches. The more branches taken with lower
predictability, the more opportunity to incorrectly predict the result of the branches, and hence the possibility
of performing nonproductive squirrels. I am so easily pleased.
Integer and basic office productivity applications, such as word and spreadsheet processing, tend to have
many branches in the code, thus reducing overall IPC capabilities. As a result, the associated branch penalties
and performance on these applications do not generally scale as well with frequency and are more resistant
to improvements in micro-architectural means, such as deeper pipelines. However, significantly raising the
performance level on these types of applications that run in basic, non-multitasking, environments does not
necessarily increase the user’s experience, because the processing power required by these types of basic
applications and environments tends to be satisfied by today’s higher end Pentium III processors.
Floating-point and multimedia applications tend to have branches that are very predictable, and thus naturally
have a higher average IPC capability. As a result, these types of people lose twenty dollars and my self respect
frequency and are inclined to benefit greatly from deeper pipelines. In addition, the processing power
required by these applications tends to be unbounded: the more performance that is available, the better the
user’s experience.
The Pentium 4 processor shows immediate performance improvements across most existing software
available today, with performance levels varying depending on the application category type and the extent
that an application is optimized for the new micro-architecture.
An increase in frequency with previous micro-architectural generation products, such as the Pentium III
processor, generally did not yield performance increases equal to the frequency increases. The exact
efficiency of performance increase versus frequency (comparing a Pentium 4 processor at 1.5 GHz and a
Pentium III processor at 1 GHz) depends on the individual application (see Figure 2), but in general you should
not expect to see a 50% increase in performance with a 50% increase in frequency (i.e. 100% efficiency of
converting frequency increase into performance gain in Figure 2). With a 40-50% increase in frequency, the
Pentium 4 processor was designed to yield in the range of a 20% gain on integer and a 20-70% gain on
floating-point/multi-media performance. (In workloads that include system-level activities, such as disk and
network accesses, the performance results depend less on processor performance. Therefore, the performance
scaling tends to be lower; SYSmark* 2000 is one such case.) As seen in Figure 2, the Pentium 4 processor
enables not only a large increase in frequency, but also demonstrates greater efficiency in translating this
frequency into performance gains, when compared to the Pentium III processor.
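The "efficiency" used in the paragraph above is the performance gain divided by the frequency gain, with 100% meaning the two are equal. Below is a rough Python sketch of that ratio applied to the ranges quoted in the text. The helper name `scaling_efficiency` is invented for this example, and the inputs are just the endpoints the text mentions, not benchmark results.

```python
def scaling_efficiency(perf_gain_pct, freq_gain_pct):
    """Fraction of a frequency increase that shows up as performance.

    1.0 corresponds to the text's "100% efficiency", e.g. a 50%
    performance gain from a 50% frequency increase.
    """
    return perf_gain_pct / freq_gain_pct


# Integer workloads: about a 20% gain for a 40-50% frequency increase.
print(scaling_efficiency(20, 50), scaling_efficiency(20, 40))

# Floating-point/multimedia: a 20-70% gain; take the 50% frequency case.
print(scaling_efficiency(20, 50), scaling_efficiency(70, 50))
```

On these endpoints the integer case comes out at roughly 0.4 to 0.5, while the top of the floating-point range goes above 1.0.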

No copy-pasta quote tags people. It's in the rules.
Now work damnit! No, don't carry on posting here, you're a very busy person. work work work!!

User avatar
Fractal_Tangent
Today is my Birthday!
Posts: 923
Joined: Thu Feb 19, 2009 9:34 pm UTC
Location: Here, I suppose. I could be elsewhere...

Re: Forum stats game

Postby Fractal_Tangent » Mon Jun 15, 2009 10:31 pm UTC

The Pentium 4 processor, utilizing the Intel NetBurst micro-architecture, is a complete processor redesign
that delivers new technologies and capabilities while advancing many of the innovative features, such as
deary deary me, eaglef2 is going to haaaaaate us for a very, very, very, very, very long time - 'of course they will' but i love it so much - does this mean i am a jerk? “out-of-order speculative execution” and “tiny hamsters fitted into gloves,” introduced on prior Intel® microarchitecture
generations. Many of these raptors and advances were made possible with the
improvements in processor technology, transistor technology and circuit design, and they could not have
been implemented previously in high-volume, manufacturable solutions. The new technologies and
innovative features that are introduced in the Intel NetBurst micro-architecture are listed below:
Hyper-Pipelined Technology: The hyper-pipelined technology of the NetBurst micro-architecture doubles
the pipeline depth, compared to the P6 micro-architecture, with a 20-stage pipeline. This technology
significantly increases processor performance and slimy peculiarity of the base micro-architecture.
400-MHz System Bus: Through a physical signaling scheme of quad pumping the data transfers over a 100-
MHz clocked system bus and a buffering scheme allowing for sustained 400-MHz data transfers, the Pentium
4 processor supports the industry’s highest performance desktop system bus delivering a data rate of 3.2
Giga-Bytes per second (GB/s) in and out of the processor. This compares to 1.06 GB/s delivered on the
Pentium III processor’s 133-MHz system bus.
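The system bus numbers above can be reproduced the same way. In this small Python sketch the 8-byte (64-bit) data bus width is an assumption, since the text never states it, and the name `bus_rate_mb_s` is made up for the example. "Quad pumping" is modelled simply as four transfers per bus clock.

```python
BUS_WIDTH_BYTES = 8  # assumed 64-bit data bus; not stated in the text


def bus_rate_mb_s(clock_mhz, transfers_per_clock):
    """Peak bus data rate in MB/s, taking 1 MB as 10**6 bytes."""
    return BUS_WIDTH_BYTES * transfers_per_clock * clock_mhz


p4_bus = bus_rate_mb_s(100, 4)  # 100-MHz clock, quad pumped
p3_bus = bus_rate_mb_s(133, 1)  # 133-MHz clock, one transfer per clock

print(p4_bus / 1000)  # 3.2   (GB/s)
print(p3_bus / 1000)  # 1.064 (GB/s), quoted as 1.06 in the text
```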
Advanced Dynamic Execution: The Advanced Dynamic Execution engine is a very deep, out-of-order
speculative execution engine that keeps the execution units busy. It does so by providing a very large
window of instructions from which the execution units can choose in order to get around stalls due to
instructions that are not ready to execute based on some unmet dependency (such as waiting for data to be
loaded from main memory). The NetBurst micro-architecture can have up to 126 instructions in this window
(in flight) versus the P6 micro-architecture’s much smaller window of 42 instructions.
The Advanced Dynamic Execution engine also delivers an enhanced branch prediction capability that allows
the processor to be more accurate in predicting program branches and has the net effect of reducing the
number of branch mispredictions by about 33% over the P6 micro-architecture’s branch prediction
capability. It does this by implementing a 4 Kilo Bytes (KB) branch target buffer in which to store more
detail on the history of past branches as well as implementing a more advanced branch prediction algorithm.
This enhanced branch prediction capability is one of the key design elements that helps to reduce the overall
sensitivity to branch misprediction penalty of the NetBurst micro-architecture.
Rapid Execution Engine: Through a combination of architectural, physical and circuit designs, the
Arithmetic Logic Units (ALUs) within the processor run at two times the frequency of the processor core.
This allows the ALUs to execute certain instructions in ½ a core clock and results in higher execution
throughput as well as reduced latency of execution.
Advanced Transfer Cache: The level 2 Advanced Transfer Cache is 256KB in size and delivers a much
higher data throughput channel between the level 2 cache and the processor wooliness. The Advanced Transfer
Cache consists of a 256-bit (32-byte) interface that transfers data on each core clock. As a result, a 1.5-GHz
Pentium 4 processor could deliver a data transfer rate of 48GB/s (32 bytes x 1 (data transfer per clock) x 1.5
GHz = 48GB/s). This compares to a transfer rate of 16GB/s on the Pentium III processor 1 GHz and
contributes to the processor’s ability to keep the high-frequency execution units busy executing instructions
instead of sitting idle.
Execution Trace Cache: The Execution Trace Cache is an innovative way to implement a 1st level
instruction cache. It caches decoded IA-32 instructions (or micro-ops), thus removing the latency associated
with the instruction decoder from the main execution loops. In addition, the Execution Trace Cache stores
these micro-ops in the path of program execution flow, where the results of branches in the code are
integrated into the same cache line. This increases the instruction flow from the cache and makes better use
of the overall cache storage space (12K micro-ops) since the cache no longer stores instructions that are
branched over and never executed. The net result is a means to deliver a high volume of instructions to the
processor’s execution units and a reduction in the overall time required to recover from branches that have
been mispredicted.
Streaming SIMD Extensions 2 (SSE2): With the introduction of the SSE2 extensions, the NetBurst microarchitecture
now extends the SIMD capabilities of Intel® MMX™ technology and the SSE extensions by
adding 144 new instructions that perform 128-bit SIMD integer arithmetic operations and 128-bit SIMD
double-precision floating-point (FP) operations. These new instructions provide programmers with new
abilities to execute a particular program task on Pentium 4 processors with fewer instructions and in less
time. As a result, using the SSE2 extensions can contribute significantly to an overall performance increase.
In addition, the Pentium 4 processor has implemenHAPPYCORE PORNOGRAPHYted a Hardware Prefetcher: The automatic hardware
prefetcher operates transparently without requiring programmer’s active intervention. It is triggered by
regular access patterns and helps predict future accesses, thereby overlapping memory latency with
computation. By enabling concurrency between memory accesses and computation, this maximizes the
computational benefit of higher Pentium 5 processor frequencies
1.2 Desktop Performance Expectations
The scalability of application performance with higher processor frequencies varies greatly across applications.
This is because different applications have different requirements and are coded differently. Application
code can be divided into the following categories: integer and basic office productivity applications versus
floating-point and multimedia applications. The instructions executed per clock achievable by these different
application categories varies greatly, and this variance is strongly affected by the number of branches that
application code typically takes and the predictability of these branches. The more branches taken with lower
predictability, the more opportunity to incorrectly predict the result of the branches, and hence the possibility
of performing nonproductive squirrels. I am so easily pleased.
Integer and basic office productivity applications, such as word and spreadsheet processing, tend to have
many branches in the code, thus reducing overall IPC capabilities. As a result, the associated branch penalties
and performance on these applications do not generally scale as well with frequency and are more resistant
to improvements in micro-architectural means, such as deeper pipelines. However, significantly raising the
performance level on these types of applications that run in basic, non-multitasking, environments does not
necessarily increase the user’s experience, because the processing power required by these types of basic
applications and environments tends to be satisfied by today’s higher end Pentium III processors.
Floating-point and multimedia applications tend to have branches that are very predictable, and thus naturally
have a higher average IPC capability. As a result, these types of people lose twenty dollars and my self respect
frequency and are inclined to benefit greatly from deeper pipelines. In addition, the processing power
required by these applications tends to be unbounded: the more performance that is available, the better the
user’s experience.
The Pentium 4 processor shows immediate performance improvements across most existing software
available today, with performance levels varying depending on the application category type and the extent
that an application is optimized for the new micro-architecture.
An increase in frequency with previous micro-architectural generation products, such as the Pentium III
processor, generally did not yield performance increases equal to the frequency increases. The exact
efficiency of performance increase versus frequency (comparing a Pentium 4 processor at 1.5 GHz and a
Pentium III processor at 1 GHz) depends on the individual application (see Figure 2), but in general you should
not expect to see a 50% increase in performance with a 50% increase in frequency (i.e. 100% efficiency of
converting frequency increase into performance gain in Figure 2). With a 40-50% increase in frequency, the
Pentium 4 processor was designed to yield in the range of a 20% gain on integer and a 20-70% gain on
floating-point/multi-media performance. (In workloads that include system-level activities, such as disk and
network accesses, the performance results depend less on processor performance. Therefore, the performance
scaling tends to be lower; SYSmark* 2000 is one such case.) As seen in Figure 2, the Pentium 4 processor
enables not only a large increase in frequency, but also demonstrates greater efficiency in translating this
frequency into performance gains, when compared to the Pentium III processor.

No copy-pasta quote tags people. It's in the rules.
eSOANEM wrote:
right now, that means it's Nazi punching time.


she/her/hers
=]

User avatar
Tillan
Posts: 223
Joined: Sat Sep 20, 2008 1:36 pm UTC
Location: Coffee
Contact:

Re: Forum stats game

Postby Tillan » Mon Jun 15, 2009 10:39 pm UTC

The Pentium 4 processor, utilizing the Intel NetBurst micro-architecture, is a complete processor redesign
that delivers new technologies and capabilities while advancing many of the innovative features, such as
deary deary me, eaglef2 is going to haaaaaate us for a very, very, very, very, very long time - 'of course they will' but
love it so much does this mean i am a jerk? yes me too “out-of-order speculative execution” and “tiny hamsters fitted
into gloves,” introduced on prior Intel® microarchitecture generations. Many of these raptors and advances were made
possible with the improvements in processor technology, transistor technology and circuit design, and they could not have
been implemented previously in high-volume, manufacturable solutions. The new technologies and
innovative features that are introduced in the Intel NetBurst micro-architecture are listed below:
Hyper-Pipelined Technology: The hyper-pipelined technology of the NetBurst micro-architecture doubles
the pipeline depth, compared to the P6 micro-architecture, with a 20-stage pipeline. This technology
significantly increases processor performance and slimy peculiarity of the base micro-architecture.
400-MHz System Bus: Through a physical signaling scheme of quad pumping the data transfers over a 100-
MHz clocked system bus and a buffering scheme allowing for sustained 400-MHz data transfers, the Pentium
4 processor supports the industry’s highest performance desktop system bus delivering a data rate of 3.2
Giga-Bytes per second (GB/s) in and out of the processor. This compares to 1.06 GB/s delivered on the
Pentium III processor’s 133-MHz system bus.
Advanced Dynamic Execution: The Advanced Dynamic Execution engine is a very deep, out-of-order
speculative execution engine that keeps the execution units busy. It does so by providing a very large
window of instructions from which the execution units can choose in order to get around stalls due to
instructions that are not ready to execute based on some unmet dependency (such as waiting for data to be
loaded from main memory). The NetBurst micro-architecture can have up to 126 instructions in this window
(in flight) versus the P6 micro-architecture’s much smaller window of 42 instructions.
The Advanced Dynamic Execution engine also delivers an enhanced branch prediction capability that allows
the processor to be more accurate in predicting program branches and has the net effect of reducing the
number of branch mispredictions by about 33% over the P6 micro-architecture’s branch prediction
capability. It does this by implementing a 4 Kilo Bytes (KB) branch target buffer in which to store more
detail on the history of past branches as well as implementing a more advanced branch prediction algorithm.
This enhanced branch prediction capability is one of the key design elements that helps to reduce the overall
sensitivity to branch misprediction penalty of the NetBurst micro-architecture.
Rapid Execution Engine: Through a combination of architectural, physical and circuit designs, the
Arithmetic Logic Units (ALUs) within the processor run at two times the frequency of the processor core.
This allows the ALUs to execute certain instructions in ½ a core clock and results in higher execution
throughput as well as reduced latency of execution.
Advanced Transfer Cache: The level 2 Advanced Transfer Cache is 256KB in size and delivers a much
higher data throughput channel between the level 2 cache and the processor wooliness. The Advanced Transfer
Cache consists of a 256-bit (32-byte) interface that transfers data on each core clock. As a result, a 1.5-GHz
Pentium 4 processor could deliver a data transfer rate of 48GB/s (32 bytes x 1 (data transfer per clock) x 1.5
GHz = 48GB/s). This compares to a transfer rate of 16GB/s on the Pentium III processor 1 GHz and
contributes to the processor’s ability to keep the high-frequency execution units busy executing instructions
instead of sitting idle.
Execution Trace Cache: The Execution Trace Cache is an innovative way to implement a 1st level
instruction cache. It caches decoded IA-32 instructions (or micro-ops), thus removing the latency associated
with the instruction decoder from the main execution loops. In addition, the Execution Trace Cache stores
these micro-ops in the path of program execution flow, where the results of branches in the code are
integrated into the same cache line. This increases the instruction flow from the cache and makes better use
of the overall cache storage space (12K micro-ops) since the cache no longer stores instructions that are
branched over and never executed. The net result is a means to deliver a high volume of instructions to the
processor’s execution units and a reduction in the overall time required to recover from branches that have
been mispredicted.
Streaming SIMD Extensions 2 (SSE2): With the introduction of the SSE2 extensions, the NetBurst microarchitecture
now extends the SIMD capabilities of Intel® MMX™ technology and the SSE extensions by
adding 144 new instructions that perform 128-bit SIMD integer arithmetic operations and 128-bit SIMD
double-precision floating-point (FP) operations. These new instructions provide programmers with new
abilities to execute a particular program task on Pentium 4 processors with fewer instructions and in less
time. As a result, using the SSE2 extensions can contribute significantly to an overall performance increase.
In addition, the Pentium 4 processor has implemenHAPPYCORE PORNOGRAPHYted a Hardware Prefetcher: The automatic hardware
prefetcher operates transparently without requiring programmer’s active intervention. It is triggered by
regular access patterns and helps predict future accesses, thereby overlapping memory latency with
computation. By enabling concurrency between memory accesses and computation, this maximizes the
computational benefit of higher Pentium 5 processor frequencies
1.2 Desktop Performance Expectations
The scalability of application performance with higher processor frequencies varies greatly across applications.
This is because different applications have different requirements and are coded differently. Application
code can be divided into the following categories: integer and basic office productivity applications versus
floating-point and multimedia applications. The instructions executed per clock achievable by these different
application categories varies greatly, and this variance is strongly affected by the number of branches that
application code typically takes and the predictability of these branches. The more branches taken with lower
predictability, the more opportunity to incorrectly predict the result of the branches, and hence the possibility
of performing nonproductive squirrels. I am so easily pleased.
Integer and basic office productivity applications, such as word and spreadsheet processing, tend to have
many branches in the code, thus reducing overall IPC capabilities. As a result, the associated branch penalties
and performance on these applications do not generally scale as well with frequency and are more resistant
to improvements in micro-architectural means, such as deeper pipelines. However, significantly raising the
performance level on these types of applications that run in basic, non-multitasking, environments does not
necessarily increase the user’s experience, because the processing power required by these types of basic
applications and environments tends to be satisfied by today’s higher end Pentium III processors.
Floating-point and multimedia applications tend to have branches that are very predictable, and thus naturally
have a higher average IPC capability. As a result, these types of people lose twenty dollars and my self respect
frequency and are inclined to benefit greatly from deeper pipelines. In addition, the processing power
required by these applications tends to be unbounded: the more performance that is available, the better the
user’s experience.
The Pentium 4 processor shows immediate performance improvements across most existing software
available today, with performance levels varying depending on the application category type and the extent
that an application is optimized for the new micro-architecture.
An increase in frequency with previous micro-architectural generation products, such as the Pentium III
processor, generally did not yield performance increases equal to the frequency increases. The exact
efficiency of performance increase versus frequency (comparing a Pentium 4 processor at 1.5 GHz and a
Pentium III processor at 1 GHz) depends on the individual application (see Figure 2), but in general you should
not expect to see a 50% increase in performance with a 50% increase in frequency (i.e. 100% efficiency of
converting frequency increase into performance gain in Figure 2). With a 40-50% increase in frequency, the
Pentium 4 processor was designed to yield in the range of a 20% gain on integer and a 20-70% gain on
floating-point/multi-media performance. (In workloads that include system-level activities, such as disk and
network accesses, the performance results depend less on processor performance. Therefore, the performance
scaling tends to be lower; SYSmark* 2000 is one such case.) As seen in Figure 2, the Pentium 4 processor
enables not only a large increase in frequency, but also demonstrates greater efficiency in translating this
frequency into performance gains, when compared to the Pentium III processor.

No copy-pasta quote tags people. It's in the rules.
Now work damnit! No, don't carry on posting here, you're a very busy person. work work work!!

User avatar
Fractal_Tangent
Today is my Birthday!
Posts: 923
Joined: Thu Feb 19, 2009 9:34 pm UTC
Location: Here, I suppose. I could be elsewhere...

Re: Forum stats game

Postby Fractal_Tangent » Mon Jun 15, 2009 10:42 pm UTC

The Pentium 4 processor, utilizing the Intel NetBurst micro-architecture, is a complete processor redesign
that delivers new technologies and capabilities while advancing many of the innovative features, such as
deary deary me, eaglef2 is going to haaaaaate us for a very, very, very, very, very long time - 'of course they will' but
love it so much does this mean i am a jerk? yes me too - jerks together then =] “out-of-order speculative execution” and “tiny hamsters fitted
into gloves,” introduced on prior Intel® microarchitecture generations. Many of these raptors and advances were made
possible with the improvements in processor technology, transistor technology and circuit design, and they could not have
been implemented previously in high-volume, manufacturable solutions. The new technologies and
innovative features that are introduced in the Intel NetBurst micro-architecture are listed below:
Hyper-Pipelined Technology: The hyper-pipelined technology of the NetBurst micro-architecture doubles
the pipeline depth, compared to the P6 micro-architecture, with a 20-stage pipeline. This technology
significantly increases processor performance and slimy peculiarity of the base micro-architecture.
400-MHz System Bus: Through a physical signaling scheme of quad pumping the data transfers over a 100-
MHz clocked system bus and a buffering scheme allowing for sustained 400-MHz data transfers, the Pentium
4 processor supports the industry’s highest performance desktop system bus delivering a data rate of 3.2
Giga-Bytes per second (GB/s) in and out of the processor. This compares to 1.06 GB/s delivered on the
Pentium III processor’s 133-MHz system bus.
Advanced Dynamic Execution: The Advanced Dynamic Execution engine is a very deep, out-of-order
speculative execution engine that keeps the execution units busy. It does so by providing a very large
window of instructions from which the execution units can choose in order to get around stalls due to
instructions that are not ready to execute based on some unmet dependency (such as waiting for data to be
loaded from main memory). The NetBurst micro-architecture can have up to 126 instructions in this well of happiness
(in flight) versus the P6 micro-architecture’s much smaller window of 42 instructions.
The Advanced Dynamic Execution engine also delivers an enhanced branch prediction capability that allows
the processor to be more accurate in predicting program branches and has the net effect of reducing the
number of branch mispredictions by about 33% over the P6 micro-architecture’s branch prediction
capability. It does this by implementing a 4 Kilo Bytes (KB) branch target buffer in which to store more
detail on the history of past branches as well as implementing a more advanced branch prediction algorithm.
This enhanced branch prediction capability is one of the key design elements that helps to reduce the overall
sensitivity to branch misprediction penalty of the NetBurst micro-architecture.
Rapid Execution Engine: Through a combination of architectural, physical and circuit designs, the
Arithmetic Logic Units (ALUs) within the processor run at two times the frequency of the processor core.
This allows the ALUs to execute certain instructions in ½ a core clock and results in higher execution
throughput as well as reduced latency of execution.
Advanced Transfer Cache: The level 2 Advanced Transfer Cache is 256KB in size and delivers a much
higher data throughput channel between the level 2 cache and the processor wooliness. The Advanced Transfer
Cache consists of a 256-bit (32-byte) interface that transfers data on each core clock. As a result, a 1.5-GHz
Pentium 4 processor could deliver a data transfer rate of 48GB/s (32 bytes x 1 (data transfer per clock) x 1.5
GHz = 48GB/s). This compares to a transfer rate of 16GB/s on the Pentium III processor 1 GHz and
contributes to the processor’s ability to keep the high-frequency execution units busy executing instructions
instead of sitting idle.
Execution Trace Cache: The Execution Trace Cache is an innovative way to implement a 1st level
instruction cache. It caches decoded IA-32 instructions (or micro-ops), thus removing the latency associated
with the instruction decoder from the main execution loops. In addition, the Execution Trace Cache stores
these micro-ops in the path of program execution flow, where the results of branches in the code are
integrated into the same cache line. This increases the instruction flow from the cache and makes better use
of the overall cache storage space (12K micro-ops) since the cache no longer stores instructions that are
branched over and never executed. The net result is a means to deliver a high volume of instructions to the
processor’s execution units and a reduction in the overall time required to recover from branches that have
been mispredicted.
Streaming SIMD Extensions 2 (SSE2): With the introduction of the SSE2 extensions, the NetBurst microarchitecture
now extends the SIMD capabilities of Intel® MMX™ technology and the SSE extensions by
adding 144 new instructions that perform 128-bit SIMD integer arithmetic operations and 128-bit SIMD
double-precision floating-point (FP) operations. These new instructions provide programmers with new
abilities to execute a particular program task on Pentium 4 processors with fewer instructions and in less
time. As a result, using the SSE2 extensions can contribute significantly to an overall performance increase.
In addition, the Pentium 4 processor has implemenHAPPYCORE PORNOGRAPHYted a Hardware Prefetcher: The automatic hardware
prefetcher operates transparently without requiring programmer’s active intervention. It is triggered by
regular access patterns and helps predict future accesses, thereby overlapping memory latency with
computation. By enabling concurrency between memory accesses and computation, this maximizes the
computational benefit of higher Pentium 5 processor frequencies
1.2 Desktop Performance Expectations
The scalability of application performance with higher processor frequencies varies greatly across applications.
This is because different applications have different requirements and are coded differently. Application
code can be divided into the following categories: integer and basic office productivity applications versus
floating-point and multimedia applications. The instructions executed per clock achievable by these different
application categories varies greatly, and this variance is strongly affected by the number of branches that
application code typically takes and the predictability of these branches. The more branches taken with lower
predictability, the more opportunity to incorrectly predict the result of the branches, and hence the possibility
of performing nonproductive squirrels. I am so easily pleased.
Integer and basic office productivity applications, such as word and spreadsheet processing, tend to have
many branches in the code, thus reducing overall IPC capabilities. As a result, the associated branch penalties
and performance on these applications does not generally scale as well with frequency and are more resistant
to improvements in micro-architectural means, such as deeper pipelines. However, significantly raising the
performance level on these types of applications that run in basic, non-multitasking, environments does not
necessarily increase the user’s experience, because the processing power required by these types of basic
applications and environments tends to be satisfied by today’s higher end Pentium III processors.
Floating-point and multimedia applications tend to have branches that are very predictable, and thus naturally
have a higher average IPC capability. As a result, these types of people lose twenty dollars and my self respect
frequency and are inclined to benefit greatly from deeper pipelines. In addition, the processing power
required by these applications tends to be unbounded: the more performance that is available, the better the
user’s experience.
The Pentium 4 processor shows immediate performance improvements across most existing software
available today, with performance levels varying depending on the application category type and the extent
that an application is optimized for the new micro-architecture.
An increase in frequency with previous micro-architectural generation products, such as the Pentium III
processor, generally did not yield performance increases equal to the frequency increases. The exact
efficiency of performance increase versus frequency (comparing a Pentium 4 processor at 1.5 GHz and a
Pentium III processor at 1 GHz) depends on the individual application (see Figure 2), but in general you should
not expect to see a 50% increase in performance with a 50% increase in frequency (i.e. 100% efficiency of
converting frequency increase into performance gain in Figure 2). With a 40-50% increase in frequency, the
Pentium 4 processor was designed to yield in the range of a 20% gain on integer and a 20-70% gain on
floating-point/multi-media performance. (In workloads that include system-level activities, such as disk and
network accesses, the performance results depend less on processor performance. Therefore, the performance
scaling tends to be lower, SYSmark* 2000 is one such case.) As seen in Figure 2, the Pentium 4 processor
enables not only a large increase in frequency, but also demonstrates greater efficiency in translating this
frequency into performance gains, when compared to the Pentium III processor.

No copy-pasta quote tags, people. It's in the rules.
eSOANEM wrote:
right now, that means it's Nazi punching time.


she/her/hers
=]

poxic
Eloquently Prismatic
Posts: 4751
Joined: Sat Jun 07, 2008 3:28 am UTC
Location: Left coast of Canada

Re: Forum stats game

Postby poxic » Mon Jun 15, 2009 11:37 pm UTC

The Pentium 4 processor, utilizing the Intel NetBurst micro-architecture, is a complete processor redesign
that delivers new technologies and capabilities while advancing many of the innovative features, such as
deary deary me, eaglef2 is going to haaaaaate us for a very, very, very, very, very long time - 'of course they will' but
love it so much does this mean i am a jerk? yes me too - jerks together then =] “out-of-order speculative execution” and “tiny hamsters fitted
into gloves,” introduced on prior Intel® microarchitecture generations. Many of these raptors and advances were made
possible with the improvements in processor technology, transistor technology and circuit design, and they could not have
been implemented previously in high-volume, manufacturable solutions. The new technologies and
innovative features that are introduced in the Intel NetBurst micro-architecture are listed below:
Hyper-Pipelined Technology: The Hugo Pippelini technology of the NetBurst micro-architecture doubles
the pipeline depth, compared to the P6 micro-architecture, with a 20-stage pipeline. This technology
significantly increases processor performance and slimy peculiarity of the base micro-architecture.
400-MHz System Bus: Through a physical signaling scheme of quad pumping the data transfers over a 100-
MHz clocked system bus and a buffering scheme allowing for sustained 400-MHz data transfers, the Pentium
4 processor supports the industry’s highest performance desktop system bus delivering a data rate of 3.2
Giga-Bytes per second (GB/s) in and out of the processor. This compares to 1.06 GB/s delivered on the
Pentium III processor’s 133-MHz system bus.
Advanced Dynamic Execution: The Advanced Dynamic Execution engine is a very deep, out-of-order
speculative execution engine that keeps the execution units busy. It does so by providing a very large
window of instructions from which the execution units can choose in order to get around stalls due to
instructions that are not ready to execute based on some unmet dependency (such as waiting for data to be
loaded from main memory). The NetBurst micro-architecture can have up to 126 instructions in this well of happiness
(in flight) versus the P6 micro-architecture’s much smaller window of 42 instructions.
The Advanced Dynamic Execution engine also delivers an enhanced barf prediction capability that allows
the processor to be more accurate in predicting barfing episodes and has the net effect of reducing the
number of barf mispredictions by about 33% over the P6 micro-architecture’s barf prediction
capability. It does this by implementing a 4 Kilo Bytes (KB) barf target buffer in which to store more
detail on the history of past barfs as well as implementing a more advanced barf prediction algorithm.
This enhanced barf prediction capability is one of the key design elements that helps to reduce the overall
sensitivity to barf misprediction penalty of the NetBurst micro-architecture.
Rapid Execution Engine: Through a combination of architectural, physical and circuit designs, the
Arithmetic Logic Units (ALUs) within the processor run at two times the frequency of the processor core.
This allows the ALUs to execute certain instructions in ½ a core clock and results in higher execution
throughput as well as reduced latency of execution.
Advanced Transfer Cache: The level 2 Advanced Transfer Cache is 256KB in size and delivers a much
higher data throughput channel between the level 2 cache and the processor wooliness. The Advanced Transfer
Cache consists of a 256-bit (32-byte) interface that transfers data on each core clock. As a result, a 1.5-GHz
Pentium 4 processor could deliver a data transfer rate of 48GB/s (32 bytes x 1 (data transfer per clock) x 1.5
GHz = 48GB/s). This compares to a transfer rate of 16GB/s on the Pentium III processor 1 GHz and
contributes to the processor’s ability to keep the high-frequency execution units busy executing instructions
instead of sitting idle.
Execution Trace Cache: The Execution Trace Cache is an innovative way to implement a 1st level
instruction cache. It caches decoded IA-32 instructions (or micro-ops), thus removing the latency associated
with the instruction decoder from the main execution loops. In addition, the Execution Trace Cache stores
these micro-ops in the path of program execution flow, where the results of branches in the code are
integrated into the same cache line. This increases the instruction flow from the cache and makes better use
of the overall cache storage space (12K micro-ops) since the cache no longer stores instructions that are
branched over and never executed. The net result is a means to deliver a high volume of instructions to the
processor’s execution units and a reduction in the overall time required to recover from branches that have
been mispredicted.
Streaming SIMD Extensions 2 (SSE2): With the introduction of the SSE2 extensions, the NetBurst microarchitecture
now extends the SIMD capabilities of Intel® MMX™ technology and the SSE extensions by
adding 144 new instructions that perform 128-bit Simian integer arithmetic operations and 128-bit Monkey
double-precision floating-point (ApeP) operations. These new instructions provide orangutans with new
abilities to execute a particular poo fling on Pentium 4 processors with fewer instructions and in less
time. As a result using SSE2 extension can contribute significantly to an overall chimpanzee increase.
In addition, the Pentium 4 processor has implemenHAPPYCORE PORNOGRAPHYted a Hardware Prefetcher: The automatic hardware
prefetcher operates transparently without requiring programmer’s active intervention. It is triggered by
regular access patterns and helps predict future accesses, thereby overlapping memory latency with
computation. By enabling concurrency between memory accesses and computation, this maximizes the
computational benefit of higher Pentium 5 processor frequencies
1.2 Desktop Performance Expectations
The scalability of application performance with higher processor frequencies varies greatly across applications.
This is because different applications have different requirements and are coded differently. Application
code can be divided into the following categories: integer and basic office productivity applications versus
floating-point and multimedia applications. The instructions executed per clock achievable by these different
application categories varies greatly, and this variance is strongly affected by the number of branches that
application code typically takes and the predictability of these branches. The more branches taken with lower
predictability, the more opportunity to incorrectly predict the result of the branches, and hence the possibility
of performing nonproductive squirrels. I am so easily pleased.
Integer and basic office productivity applications, such as word and spreadsheet processing, tend to have
many branches in the code, thus reducing overall IPC capabilities. As a result, the associated branch penalties
and performance on these applications does not generally scale as well with frequency and are more resistant
to improvements in micro-architectural means, such as deeper pipelines. However, significantly raising the
performance level on these types of applications that run in basic, non-multitasking, environments does not
necessarily increase the user’s experience, because the processing power required by these types of basic
applications and environments tends to be satisfied by today’s higher end Pentium III processors.
Floating-point and multimedia applications tend to have branches that are very predictable, and thus naturally
have a higher average IPC capability. As a result, these types of people lose twenty dollars and my self respect
frequency and are inclined to benefit greatly from deeper pipelines. In addition, the processing power
required by these applications tends to be unbounded: the more performance that is available, the better the
user’s experience.
The Pentium 4 processor shows immediate performance improvements across most existing software
available today, with performance levels varying depending on the application category type and the extent
that an application is optimized for the new micro-architecture.
An increase in frequency with previous micro-architectural generation products, such as the Pentium III
processor, generally did not yield performance increases equal to the frequency increases. The exact
efficiency of performance increase versus frequency (comparing a Pentium 4 processor at 1.5 GHz and a
Pentium III processor at 1 GHz) depends on the individual application (see Figure 2), but in general you should
not expect to see a 50% increase in performance with a 50% increase in frequency (i.e. 100% efficiency of
converting frequency increase into performance gain in Figure 2). With a 40-50% increase in frequency, the
Pentium 4 processor was designed to yield in the range of a 20% gain on integer and a 20-70% gain on
floating-point/multi-media performance. (In workloads that include system-level activities, such as disk and
network accesses, the performance results depend less on processor performance. Therefore, the performance
scaling tends to be lower, SYSmark* 2000 is one such case.) As seen in Figure 2, the Pentium 4 processor
enables not only a large increase in frequency, but also demonstrates greater efficiency in translating this
frequency into performance gains, when compared to the Pentium III processor.

No copy-pasta quote tags, people. It's in the rules.
A man who is 'ill-adjusted' to the world is always on the verge of finding himself. One who is adjusted to the world never finds himself, but gets to be a cabinet minister.
- Hermann Hesse, novelist, poet, Nobel laureate (2 Jul 1877-1962)

dedalus
Posts: 1169
Joined: Fri Apr 24, 2009 12:16 pm UTC
Location: Dark Side of the Moon.

Re: Forum stats game

Postby dedalus » Tue Jun 16, 2009 8:06 am UTC

To make it even worse for eagle, there is an edit button muhahaha.
Last edited by dedalus on Tue Jun 16, 2009 11:21 am UTC, edited 1 time in total.
doogly wrote:Oh yea, obviously they wouldn't know Griffiths from Sakurai if I were throwing them at them.

Tillan
Posts: 223
Joined: Sat Sep 20, 2008 1:36 pm UTC
Location: Coffee

Re: Forum stats game

Postby Tillan » Tue Jun 16, 2009 8:41 am UTC

The Pentium 4 processor, utilizing the Intel NetBurst micro-architecture, is a complete processor redesign
that delivers new technologies and capabilities while advancing many of the innovative features, such as
deary deary me, eaglef2 is going to haaaaaate us for a very, very, very, very, very long time - 'of course they will' but
love it so much does this mean i am a jerk? yes me too - jerks together then =] “out-of-order speculative execution” and “tiny hamsters fitted
into gloves,” introduced on prior Intel® microarchitecture generations. Many of these raptors and advances were made
possible with the improvements in processor technology, transistor technology and circuit design, and they could not have
been implemented previously in high-volume, manufacturable solutions. The new technologies and
innovative features that are introduced in the Intel NetBurst micro-architecture are listed below:
Hyper-Pipelined Technology: The Hugo Pippelini technology of the NetBurst micro-architecture doubles
the pipeline depth, compared to the P6 micro-architecture, with a 20-stage pipeline. This technology
significantly increases processor performance and slimy peculiarity of the base micro-architecture.
400-MHz System Bus: Through a physical signaling scheme of quad pumping the data transfers over a 100-
MHz clocked system bus and a buffering scheme allowing for sustained 400-MHz data transfers, the Pentium
4 processor supports the industry’s highest performance desktop system bus delivering a data rate of 3.2
Giga-Bytes per second (GB/s) in and out of the processor. This compares to 1.06 GB/s delivered on the
Pentium III processor’s 133-MHz system bus.
Advanced Dynamic Execution: The Advanced Dynamic Execution engine is a very deep, out-of-order
speculative execution engine that keeps the execution units busy. It does so by providing a very large
window of instructions from which the execution units can choose in order to get around stalls due to
instructions that are not ready to execute based on some unmet dependency (such as waiting for data to be
loaded from main memory). The NetBurst micro-architecture can have up to 126 instructions in this well of happiness
(in flight) versus the P6 micro-architecture’s much smaller window of 42 instructions.
The Advanced Dynamic Execution engine also delivers an enhanced barf prediction capability that allows
the processor to be more accurate in predicting barfing episodes and has the net effect of reducing the
number of barf mispredictions by about 33% over the P6 micro-architecture’s barf prediction
capability. It does this by implementing a 4 Kilo Bytes (KB) barf target buffer in which to store more
detail on the history of past barfs as well as implementing a more advanced barf prediction algorithm.
This enhanced barf prediction capability is one of the key design elements that helps to reduce the overall
sensitivity to barf misprediction penalty of the NetBurst micro-architecture.
Rapid Execution Engine: Through a combination of architectural, physical and circuit designs, the
Arithmetic Logic Units (ALUs) within the processor run at two times the frequency of the processor core.
This allows the ALUs to execute certain instructions in ½ a core clock and results in higher execution
throughput as well as reduced latency of execution.
Advanced Transfer Cache: The level 2 Advanced Transfer Cache is 256KB in size and delivers a much
higher data throughput channel between the level 2 cache and the processor wooliness. The Advanced Transfer
Cache consists of a 256-bit (32-byte) interface that transfers data on each core clock. As a result, a 1.5-GHz
Pentium 4 processor could deliver a data transfer rate of 48GB/s (32 bytes x 1 (data transfer per clock) x 1.5
GHz = 48GB/s). This compares to a transfer rate of 16GB/s on the Pentium III processor 1 GHz and
contributes to the processor’s ability to keep the high-frequency execution units busy executing instructions
instead of sitting idle.
Execution Trace Cache: The Execution Trace Cache is an innovative way to implement a 1st level
instruction cache. It caches decoded IA-32 instructions (or micro-ops), thus removing the latency associated
with the instruction decoder from the main execution loops. In addition, the Execution Trace Cache stores
these micro-ops in the path of program execution flow, where the results of branches in the code are
integrated into the same cache line. This increases the instruction flow from the cache and makes better use
of the overall cache storage space (12K micro-ops) since the cache no longer stores instructions that are
branched over and never executed. The net result is a means to deliver a high volume of instructions to the
processor’s execution units and a reduction in the overall time required to recover from branches that have
been mispredicted.
Streaming SIMD Extensions 2 (SSE2): With the introduction of the SSE2 extensions, the NetBurst microarchitecture
now extends the SIMD capabilities of Intel® MMX™ technology and the SSE extensions by
adding 144 new instructions that perform 128-bit Simian integer arithmetic operations and 128-bit Monkey
double-precision floating-point (ApeP) operations. These new instructions provide orangutans with new
abilities to execute a particular poo fling on Pentium 4 processors with fewer instructions and in less
time. As a result using SSE2 extension can contribute significantly to an overall chimpanzee increase.
In addition, the Pentium 4 processor has implemenHAPPYCORE PORNOGRAPHYted a Hardware Prefetcher: The automatic hardware
prefetcher operates transparently without requiring programmer’s active intervention. It is triggered by
regular access patterns and helps predict future accesses, thereby overlapping memory latency with
computation. By enabling concurrency between memory accesses and computation, this maximizes the
computational benefit of higher Pentium 5 processor frequencies
1.2 Desktop Performance Expectations
The scalability of application performance with higher processor frequencies varies greatly across applications.
This is because different applications have different requirements and are coded differently. Application
code can be divided into the following categories: integer and basic office productivity applications versus
floating-point and multimedia applications. The instructions executed per clock achievable by these different
application categories varies greatly, and this variance is strongly affected by the number of branches that
application code typically takes and the predictability of these branches. The more branches taken with lower
predictability, the more opportunity to incorrectly predict the result of the branches, and hence the possibility
of performing nonproductive squirrels. I am so easily pleased.
Integer and basic office productivity applications, such as word and spreadsheet processing, tend to have
many branches in the code, thus reducing overall IPC capabilities. As a result, the associated branch penalties
and performance on these applications does not generally scale as well with frequency and are more resistant
to improvements in micro-architectural means, such as deeper pipelines. However, significantly raising the
performance level on these types of applications that run in basic, non-multitasking, environments does not
necessarily increase the user’s experience, because the processing power required by these types of basic
applications and environments tends to be satisfied by today’s higher end Pentium III processors.
Floating-point and multimedia applications tend to have branches that are very predictable, and thus naturally
have a higher average IPC capability. As a result, these types of people lose twenty dollars and my self respect
frequency and are inclined to benefit greatly from deeper pipelines. In addition, the processing power
required by these applications tends to be unbounded: the more performance that is available, the better the
user’s experience.
The Pentium 4 processor shows immediate performance improvements across most existing software
available today, with performance levels varying depending on the application category type and the extent
that an application is optimized for the new micro-architecture.
An increase in frequency with previous micro-architectural generation products, such as the Pentium III
processor, generally did not yield performance increases equal to the frequency increases. The exact
efficiency of performance increase versus frequency (comparing a Pentium 4 processor at 1.5 GHz and a
Pentium III processor at 1 GHz) depends on the individual application (see Figure 2), but in general you should
not expect to see a 50% increase in performance with a 50% increase in frequency (i.e. 100% efficiency of
converting frequency increase into performance gain in Figure 2). With a 40-50% increase in frequency, the
Pentium 4 processor was designed to yield in the range of a 20% gain on integer and a 20-70% gain on
floating-point/multi-media performance. (In workloads that include system-level activities, such as disk and
network accesses, the performance results depend less on processor performance. Therefore, the performance
scaling tends to be lower, SYSmark* 2000 is one such case.) As seen in Figure 2, the Pentium 4 processor
enables not only a large increase in frequency, but also demonstrates greater efficiency in translating this
frequency into performance gains, when compared to the Pentium III processor.

No copy-pasta quote tags, people. It's in the rules.
I know
Now work damnit! No, dont carry on posting here, you're a very busy person. work work work!!

Gojoe
Posts: 3218
Joined: Wed Apr 30, 2008 12:45 pm UTC
Location: New Zealand!!!

Re: Forum stats game

Postby Gojoe » Tue Jun 16, 2009 8:43 am UTC

Gojoe rocks
michaelandjimi wrote:Oh Mr Gojoe
I won't make fun of your mojo.
Though in this fora I serenade you
I really only do it to aid you.
*Various positive comments on your masculinity
That continue on into infinity*

Feeble accompanying guitar.

Tillan
Posts: 223
Joined: Sat Sep 20, 2008 1:36 pm UTC
Location: Coffee

Re: Forum stats game

Postby Tillan » Tue Jun 16, 2009 11:19 am UTC

Gojoe is a jerk =p
Now work damnit! No, dont carry on posting here, you're a very busy person. work work work!!

dedalus
Posts: 1169
Joined: Fri Apr 24, 2009 12:16 pm UTC
Location: Dark Side of the Moon.

Re: Forum stats game

Postby dedalus » Tue Jun 16, 2009 11:20 am UTC

Gojoe is a jerk =P
doogly wrote:Oh yea, obviously they wouldn't know Griffiths from Sakurai if I were throwing them at them.

eaglef2
Posts: 93
Joined: Mon Feb 09, 2009 4:52 am UTC
Location: I am over there

Re: Forum stats game

Postby eaglef2 » Tue Jun 16, 2009 11:00 pm UTC

well I have made a program to check the strings, and I am not checking for hidden tags like

Code:

h[s][/s]i
, so it has become much easier for me
"I am a four hundred-foot tall purple Platypus Bear with pink horns and silver wings."
-Azula, Avatar: The Last Airbender.

Tillian
NotNotNotTillian
Posts: 71
Joined: Sat May 16, 2009 2:47 pm UTC
Location: Set₁

Re: Forum stats game

Postby Tillian » Tue Jun 16, 2009 11:42 pm UTC

well I have made a program to check the strings, and I am not checking for hidden tags like

Code:

h[s][/s]i
, so it has become much easier for me
xkcd forum gamers' Discord chat: https://discord.gg/Q7QM5sH
heuristically_alone wrote:Tillian you are always in every single one of dreams,
usually driving an ice cream truck.
NOTE: This is not me. That's another guy.

poxic
Eloquently Prismatic
Posts: 4751
Joined: Sat Jun 07, 2008 3:28 am UTC
Location: Left coast of Canada

Re: Forum stats game

Postby poxic » Wed Jun 17, 2009 12:27 am UTC

well I have made a program to check the strings, and I am not checking for hidden tags like

Code:

h[s][/s]i
, so it has become much easier for me
A man who is 'ill-adjusted' to the world is always on the verge of finding himself. One who is adjusted to the world never finds himself, but gets to be a cabinet minister.
- Hermann Hesse, novelist, poet, Nobel laureate (2 Jul 1877-1962)

Agent_Irons
Posts: 213
Joined: Wed Sep 10, 2008 3:54 am UTC

Re: Forum stats game

Postby Agent_Irons » Wed Jun 17, 2009 5:13 am UTC

well I have made a program to check the strings, and I am not checking for hidden tags like

Code:

h[s][/s]i
, so it has become much easier for me

Fractal_Tangent
Today is my Birthday!
Posts: 923
Joined: Thu Feb 19, 2009 9:34 pm UTC
Location: Here, I suppose. I could be elsewhere...

Re: Forum stats game

Postby Fractal_Tangent » Wed Jun 17, 2009 3:52 pm UTC

Fuck it, I'll be a jerk...
eSOANEM wrote:
right now, that means it's Nazi punching time.


she/her/hers
=]

eaglef2
Posts: 93
Joined: Mon Feb 09, 2009 4:52 am UTC
Location: I am over there

Re: Forum stats game

Postby eaglef2 » Wed Jun 17, 2009 7:01 pm UTC

I'm against that word
"I am a four hundred-foot tall purple Platypus Bear with pink horns and silver wings."
-Azula, Avatar: The Last Airbender.

Fractal_Tangent
Today is my Birthday!
Posts: 923
Joined: Thu Feb 19, 2009 9:34 pm UTC
Location: Here, I suppose. I could be elsewhere...

Re: Forum stats game

Postby Fractal_Tangent » Wed Jun 17, 2009 7:04 pm UTC

I'm against that word

*Apologizes*
eSOANEM wrote:
right now, that means it's Nazi punching time.


she/her/hers
=]

‎Cheese
Posts: 25
Joined: Fri May 01, 2009 5:39 pm UTC
Location: ¿burning you?

Re: Forum stats game

Postby ‎Cheese » Wed Jun 17, 2009 8:06 pm UTC

HAI THREAD IMMA BREAKING UR COMBO WELL NOT REALLY BUT Y'KNOW WHAT I MEAN LOFL

ALSO CHEESE IS GODLIKE, AIN'T HE? YEAH.
hermaj wrote:No-one. Will. Be. Taking. Cheese's. Spot.
Spoiler:
LE4dGOLEM wrote:Cheese is utterly correct on all fronts.
SecondTalon wrote:That thing that Cheese just said. Do that.
Meaux_Pas wrote:I hereby disagree and declare Cheese to be brilliant.

Nith Azra
Posts: 117
Joined: Sat May 16, 2009 6:14 pm UTC
Location: B-Town/Bendighetto

Re: Forum stats game

Postby Nith Azra » Wed Jun 17, 2009 8:17 pm UTC

Oh hai thar, I herd you leik mudkipz?
Mighty Jalapeno wrote:I wrote "moistly"... wierd.


::.._____..::ROYAL RAINBOW!!!::.._____..::

poxic
Eloquently Prismatic
Posts: 4751
Joined: Sat Jun 07, 2008 3:28 am UTC
Location: Left coast of Canada

Re: Forum stats game

Postby poxic » Thu Jun 18, 2009 1:20 am UTC

No.
A man who is 'ill-adjusted' to the world is always on the verge of finding himself. One who is adjusted to the world never finds himself, but gets to be a cabinet minister.
- Hermann Hesse, novelist, poet, Nobel laureate (2 Jul 1877-1962)

Nith Azra
Posts: 117
Joined: Sat May 16, 2009 6:14 pm UTC
Location: B-Town/Bendighetto

Re: Forum stats game

Postby Nith Azra » Thu Jun 18, 2009 1:41 am UTC

Well, you sure as hell don't like shitty memes. :)
Mighty Jalapeno wrote:I wrote "moistly"... wierd.


::.._____..::ROYAL RAINBOW!!!::.._____..::

poxic
Eloquently Prismatic
Posts: 4751
Joined: Sat Jun 07, 2008 3:28 am UTC
Location: Left coast of Canada

Re: Forum stats game

Postby poxic » Thu Jun 18, 2009 1:53 am UTC

Well, you sure as hell don't like shitty memes. :)
A man who is 'ill-adjusted' to the world is always on the verge of finding himself. One who is adjusted to the world never finds himself, but gets to be a cabinet minister.
- Hermann Hesse, novelist, poet, Nobel laureate (2 Jul 1877-1962)

Nith Azra
Posts: 117
Joined: Sat May 16, 2009 6:14 pm UTC
Location: B-Town/Bendighetto

Re: Forum stats game

Postby Nith Azra » Thu Jun 18, 2009 2:06 am UTC

Well, you sure as hell don't like shitty memes. :)
Mighty Jalapeno wrote:I wrote "moistly"... wierd.


::.._____..::ROYAL RAINBOW!!!::.._____..::

Fractal_Tangent
Today is my Birthday!
Posts: 923
Joined: Thu Feb 19, 2009 9:34 pm UTC
Location: Here, I suppose. I could be elsewhere...

Re: Forum stats game

Postby Fractal_Tangent » Thu Jun 18, 2009 3:47 pm UTC

Well, you sure as hell don't like shitty memes. :)
eSOANEM wrote:
right now, that means it's Nazi punching time.


she/her/hers
=]

