AMD "INCEPTION" CPU Vulnerability Disclosed

pnutzh4x0r@lemmy.ndlug.org · 11 months ago

PenguinTD@lemmy.ca · 11 months ago

NOTE: this is based on my personal understanding, which might be wrong or outdated for modern hardware, so if any Lemmy user is an expert on this, feel free to correct me.

A source, which I only skimmed before deciding that what I typed might still be relevant: https://en.wikipedia.org/wiki/Branch_predictor#

If there is one thing I took from my CPU architecture course (early 2000s, so about 20 years ago), it is that intuition doesn’t work the way you imagine. A branch predictor is like a counter that you can keep, adjust, or flush. There is a very good reason to execute speculatively when you have the registers and execution units to run instructions in parallel: the CPU starts on the guessed path (my course described it as running both sides) and keeps only the correct result once the “if” check completes. This has been around since before multicore CPUs because the penalty of waiting far outweighs just running as if the “if” didn’t exist and abandoning the result when the guess turns out wrong. A common use of an if block is to guard an expensive calculation, which usually means fetches and potential cache misses that have to go all the way to main memory (that’s why we have prefetch instructions and the like, which the compiler can insert based on your code). If you waited for the check and then hit a cache miss, you would stall for a couple hundred cycles while the data is fetched into the cache and registers that the instructions inside the if block need. Is there a penalty if the predictor guesses wrong? Yes, but it is far cheaper than not running ahead at all.
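To make the "counter" idea concrete, here is a minimal sketch of a 2-bit saturating counter, the textbook scheme described on the Wikipedia page linked above. It is a toy simulation, not how any particular CPU works; the class name, the starting state, and the two branch patterns are my own assumptions for illustration.

```python
class TwoBitPredictor:
    """Toy 2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state=2):
        # Start at "weakly taken"; the starting state is an arbitrary assumption.
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move toward 3 on a taken branch and toward 0 otherwise, saturating at the ends.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes):
    """Replay a sequence of branch outcomes and count the wrong guesses."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# Hypothetical patterns: a mostly-taken branch and a strictly alternating one.
biased = ([True] * 9 + [False]) * 10
alternating = [True, False] * 50

print(count_mispredictions(biased))       # 10 misses out of 100
print(count_mispredictions(alternating))  # 50 misses out of 100
```

The mostly-taken branch is guessed right 90% of the time, while the alternating one defeats this simple counter half the time, which is roughly why real predictors also track history.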

For blocks where the if and else sides are roughly similar, running both is preferred unless scheduling can’t really do it (the compiler decides this for you). If it can’t, you probably need a profiler to check why your CPU stalled and fix your code so the compiler can schedule it better (profile-guided optimization is a thing). The two sides don’t have to carry equal weight. That’s where the predictor kicks in: if the past batch of data all came out true, it favors the true block and eats the penalty when only a few items in the same data set need the longer false block. Modern compilers and CPUs are really good at this, and the predictor plus default optimization flags usually beat hand tuning (like writing your own loop unrolling or asm).
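A back-of-the-envelope way to see the "run both vs. predict" trade-off is a tiny expected-cost model. This is only a sketch: the function names are made up, and the cycle counts and the 15-cycle misprediction penalty are assumed numbers, not measurements from any real CPU.

```python
def branch_cost(p_true, cost_true, cost_false, miss_rate, miss_penalty):
    """Expected cycles when the branch is predicted and only one side runs."""
    useful = p_true * cost_true + (1 - p_true) * cost_false
    return useful + miss_rate * miss_penalty


def run_both_cost(cost_true, cost_false, select_cost=1):
    """Expected cycles when both sides are computed and one result is selected."""
    return cost_true + cost_false + select_cost


# Short, similar sides with an unpredictable condition: running both wins.
print(branch_cost(0.5, 3, 3, miss_rate=0.5, miss_penalty=15))  # 10.5
print(run_both_cost(3, 3))                                     # 7

# Same sides, but the predictor is almost always right: predicting wins.
print(branch_cost(0.5, 3, 3, miss_rate=0.02, miss_penalty=15) < run_both_cost(3, 3))  # True

# Heavy sides: running both roughly doubles the work, so predicting wins
# even when half the guesses are wrong.
print(branch_cost(0.5, 100, 100, miss_rate=0.5, miss_penalty=15) < run_both_cost(100, 100))  # True
```

The model ignores cache effects entirely, so treat it as a way to reason about the direction of the trade-off rather than its size.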

Are there cases where running both isn’t practical? Yes. I think there will always be cases where your program is data heavy, say the true and false sides fetch entirely different blocks of data, and doing both causes more cache misses. Then it is just a game of “eating less penalty”, which is usually out of our reach since the compiler already does it for you. That’s not my specialty at all; I am just your common pleb programmer who relies on compiler -O3.