C++ Programming Glossary: agner

Why is transposing a matrix of 512x512 much slower than transposing a matrix of 513x513?

http://stackoverflow.com/questions/11413855/why-is-transposing-a-matrix-of-512x512-much-slower-than-transposing-a-matrix-of

The explanation comes from Agner Fog in Optimizing software in C++ and it reduces to how data is.. read memory. I'll try to somewhat follow the example from Agner. Assume each set has 4 lines, each holding 64 bytes. We first.. lines. This is the theory part. Next, the explanation (also Agner, I'm following it closely to avoid making mistakes): Assume a matrix..
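The set-mapping argument in the excerpt above can be sketched in a few lines of C++. This is only an illustration, not Agner Fog's own code: the 64-byte line size comes from the excerpt, while the 32 sets, the 4-byte element size used in the note below, and the names `cache_set_index` and `distinct_sets_in_column` are assumptions chosen for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Line size (64 bytes) follows the excerpt. The number of sets (32) is an
// assumed value for illustration and is not taken from the excerpt.
constexpr std::size_t kLineBytes = 64;
constexpr std::size_t kNumSets = 32;

// Map a byte address to the cache set it would occupy in this model.
constexpr std::size_t cache_set_index(std::uint64_t address) {
    return static_cast<std::size_t>((address / kLineBytes) % kNumSets);
}

// Count how many different sets are touched when walking down one column
// of a row-major matrix whose consecutive rows are row_bytes apart.
inline std::size_t distinct_sets_in_column(std::size_t row_bytes, std::size_t rows) {
    std::set<std::size_t> sets;
    for (std::size_t r = 0; r < rows; ++r) {
        sets.insert(cache_set_index(static_cast<std::uint64_t>(r) * row_bytes));
    }
    return sets.size();
}
```

Under these assumptions, a column walk over a 512-wide matrix of 4-byte elements (rows 2048 bytes apart) lands in a single set, so all 512 rows compete for that set's 4 lines. A 513-wide matrix (rows 2052 bytes apart) spreads the same walk over all 32 sets.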

cpu dispatcher for visual studio for AVX and SSE

http://stackoverflow.com/questions/15406658/cpu-dispatcher-for-visual-studio-for-avx-and-sse

the appropriate code path. I've followed the suggestions by Agner Fog to make a CPU dispatcher http://www.agner.org/optimize/#vectorclass.. this Edit: Okay, I think I isolated the problem. I'm using Agner Fog's vector class and I have defined three source files as.. as long as I don't have another source file with AVX. Agner Fog's manual says: There is no advantage in using the 256 bit..
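A CPU dispatcher of the kind mentioned in this excerpt might look roughly like the sketch below. It is not the vector class library's dispatcher. The names `cpu_has_avx`, `sum_generic`, `sum_avx_placeholder` and `dispatched_sum` are hypothetical, and the "AVX" kernel is deliberately plain scalar code standing in for a routine that would normally live in a separate source file compiled with AVX enabled.

```cpp
#include <cstddef>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>
#include <immintrin.h>
#endif

// Returns true when the CPU reports AVX and the OS saves the YMM state.
inline bool cpu_has_avx() {
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    int regs[4];
    __cpuid(regs, 1);                                  // leaf 1: feature bits
    const bool osxsave = (regs[2] & (1 << 27)) != 0;   // ECX bit 27
    const bool avx = (regs[2] & (1 << 28)) != 0;       // ECX bit 28
    if (!osxsave || !avx) return false;
    return (_xgetbv(0) & 0x6) == 0x6;                  // XMM and YMM state enabled
#elif (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx") != 0;
#else
    return false;                                      // unknown platform: use fallback
#endif
}

// Baseline kernel that runs on any CPU.
inline float sum_generic(const float* data, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}

// Placeholder for an AVX kernel (hypothetical). It is scalar here so the
// sketch compiles without any AVX compiler flag.
inline float sum_avx_placeholder(const float* data, std::size_t n) {
    return sum_generic(data, n);
}

using SumFn = float (*)(const float*, std::size_t);

// Pick the code path once, on first use, then call through the pointer.
inline float dispatched_sum(const float* data, std::size_t n) {
    static const SumFn impl = cpu_has_avx() ? sum_avx_placeholder : sum_generic;
    return impl(data, n);
}
```

Choosing the function pointer once keeps the feature check out of the hot path. Keeping each instruction-set variant in its own translation unit is what lets only that file be built with the AVX option.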

What is “cache-friendly” code?

http://stackoverflow.com/questions/16699247/what-is-cache-friendly-code

about caches, memory hierarchies and proper programming: Agner Fog's page. In his excellent documents you can find detailed..

Why is unsigned integer overflow defined behavior but signed integer overflow isn't?

http://stackoverflow.com/questions/18195715/why-is-unsigned-integer-overflow-defined-behavior-but-signed-integer-overflow-is

this blog post by Ian Lance Taylor or this complaint by Agner Fog and the answers to his bug report..
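The distinction in the question title can be illustrated with a short sketch. Unsigned arithmetic in C++ wraps modulo 2^N, while signed overflow is undefined, so a signed addition has to be range-checked before it is performed. The helper names `wrapping_increment` and `checked_add` are hypothetical and exist only for this example.

```cpp
#include <limits>

// Unsigned arithmetic wraps modulo 2^N, so this is well defined even
// when x is the largest representable value.
inline unsigned wrapping_increment(unsigned x) {
    return x + 1u;
}

// Signed overflow is undefined behavior, so test the operands first and
// only perform the addition when the result is representable.
inline bool checked_add(int a, int b, int& out) {
    if (b > 0 && a > std::numeric_limits<int>::max() - b) return false;
    if (b < 0 && a < std::numeric_limits<int>::min() - b) return false;
    out = a + b;
    return true;
}
```

`checked_add` returns `false` and leaves `out` untouched when the sum would not fit, so the overflowing addition is never executed.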

Performance of built-in types : char vs short vs int vs. float vs. double

http://stackoverflow.com/questions/5069489/performance-of-built-in-types-char-vs-short-vs-int-vs-float-vs-double

about them and non-existent otherwise. Further reading: Agner Fog maintains a nice website with lots of discussion of low..

SSE SSE2 and SSE3 for GNU C++

http://stackoverflow.com/questions/661338/sse-sse2-and-sse3-for-gnu-c

some very nice coverage of intrinsics and vectorization in Agner Fog's optimization PDFs (thanks), although it's a bit spread about..

How to write fast (low level) code? [closed]

http://stackoverflow.com/questions/6852670/how-to-write-fast-low-level-code

Writing High Level (book). Software optimization resources by Agner Fog (five detailed PDF manuals). I'll need a bit of skim time to..

How can adding code to a loop make it faster?

http://stackoverflow.com/questions/688325/how-can-adding-code-to-a-loop-make-it-faster

EDIT: If you want to read up on branch prediction, give Agner Fog's excellent web site a try: http://www.agner.org/optimize/ This..

Using AVX CPU instructions: Poor performance without “/arch:AVX”

http://stackoverflow.com/questions/7839925/using-avx-cpu-instructions-poor-performance-without-archavx

the result of expensive state switching. See page 102 of Agner Fog's manual: http://www.agner.org/optimize/microarchitecture.pdf..

how to achieve 4 FLOPs per cycle

http://stackoverflow.com/questions/8389648/how-to-achieve-4-flops-per-cycle

mul to complete on most modern Intel CPUs (see e.g. Agner Fog's 'Instruction Tables'). Due to pipelining one can get a..
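The pipelining point in this excerpt is commonly exploited by keeping several independent dependency chains in flight, so that a new multiply or add can start before the previous one has finished. The sketch below shows that general idea with four accumulators. It is not the code from the linked answer, the name `dot_four_chains` is hypothetical, and the number of chains that actually pays off depends on the instruction latencies of the target CPU.

```cpp
#include <cstddef>

// Dot product with four independent accumulators. Each accumulator forms
// its own dependency chain, so operations from different chains can
// overlap in the pipeline instead of waiting on one another.
inline float dot_four_chains(const float* a, const float* b, std::size_t n) {
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc0 += a[i] * b[i];
        acc1 += a[i + 1] * b[i + 1];
        acc2 += a[i + 2] * b[i + 2];
        acc3 += a[i + 3] * b[i + 3];
    }
    float total = (acc0 + acc1) + (acc2 + acc3);
    // Handle the remaining elements when n is not a multiple of four.
    for (; i < n; ++i) total += a[i] * b[i];
    return total;
}
```

Splitting the sum changes the order of the floating-point additions, so results can differ in the last bits from a single-accumulator loop.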

Why is one loop so much slower than two loops?

http://stackoverflow.com/questions/8547778/why-is-one-loop-so-much-slower-than-two-loops

going on here... Alignment could still have an effect, as Agner Fog mentions cache bank conflicts. That link is about Sandy..