This is enabled at -O3, but only for some targets; it seems to be off for x86_64.
docs:
Use IRA to evaluate register pressure in loops for decisions to move loop invariants. This option usually results in generation of faster and smaller code on machines with large register files (>= 32 registers), but it can slow the compiler down.
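A minimal C sketch of the kind of loop this decision is about. The function names are hypothetical and chosen for illustration. In `scale_add`, `a * b` and `c * d` do not change across iterations, so loop-invariant motion can compute them once before the loop. Each hoisted value then has to stay in a register for the whole loop, and that is the register-pressure cost IRA is asked to weigh. `scale_add_hoisted` spells out the hoisted form by hand. Whether GCC hoists anything here, and whether `-fira-loop-pressure` changes that choice, depends on the target. Comparing `gcc -O3 -S` output with and without the flag is one way to check.

```c
#include <stddef.h>

/* Hypothetical kernel: a * b and c * d are loop invariants that
   loop-invariant motion may compute once, outside the loop. */
void scale_add(int *restrict out, const int *restrict in, size_t n,
               int a, int b, int c, int d)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * (a * b) + (c * d);
}

/* The same kernel with the invariants hoisted by hand: k and m are
   computed once, but both stay live (occupy registers) for the entire loop. */
void scale_add_hoisted(int *restrict out, const int *restrict in, size_t n,
                       int a, int b, int c, int d)
{
    const int k = a * b;
    const int m = c * d;
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * k + m;
}
```

With many such invariants and few registers, hoisting all of them can force spills inside the loop, so moving every invariant out is not always a win.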
docs:
I'd recommend using this at least for x86/x86-64. I think any out-of-order (OOO) processor with a small or moderate register file that does not use the first insn scheduling pass might benefit from this too.
On SPEC2000 for x86/x86-64 (I use a Haswell processor, -O3 with generic tuning), using the optimization results in smaller code size on average (for floating-point and integer benchmarks in both 32- and 64-bit mode). The improvement is more visible for SPECFP2000. I see the same improvement on x86-64 SPECInt2000, but it might be attributed mostly to the instability of the mcf benchmark. It is about 0.5% in both 32-bit and 64-bit mode. This is understandable, as the optimization has more opportunities to improve the code in longer basic blocks (BBs). Unlike other heuristic optimizations, I don't see any significant performance degradation. It gives practically the same or better performance (a few benchmarks improved by 1% or more, up to 3%).
The single but significant drawback is additional compilation time (4%-6%), as the first insn scheduling pass is quite expensive.
Source of docs: https://gcc.gnu.org/legacy-ml/gcc-patches/2013-11/msg00420.html
docs:
When -fgcse-las is enabled, the global common subexpression elimination pass eliminates redundant loads that come after stores to the same memory location (both partial and full redundancies).
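A minimal C sketch of the two cases the docs mention. The function names are hypothetical. In `store_then_load`, the load of `*p` right after `*p = v` must read back `v`, so it is fully redundant. In `partial_store_then_load`, the store happens only when `flag` is set, so the load is redundant on that path only. `partial_store_then_load_by_hand` shows the result of removing the load on that path while keeping it on the other one. Other GCC passes may already handle simple cases like these, so this sketch illustrates the transformation rather than isolating the effect of the flag.

```c
/* Full redundancy: the load of *p can only see the value just stored. */
int store_then_load(int *p, int v)
{
    *p = v;
    return *p + 1;      /* can be rewritten as v + 1 */
}

/* Partial redundancy: the load is redundant only on the flag != 0 path. */
int partial_store_then_load(int *p, int v, int flag)
{
    if (flag)
        *p = v;
    return *p * 2;
}

/* Hand-written equivalent after eliminating the redundant load:
   memory is read only on the path where the store did not happen. */
int partial_store_then_load_by_hand(int *p, int v, int flag)
{
    int t;
    if (flag) {
        *p = v;
        t = v;          /* value already known, no load needed */
    } else {
        t = *p;         /* load kept only on the path without the store */
    }
    return t * 2;
}
```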
docs:
When -fgcse-sm is enabled, a store motion pass is run after global common subexpression elimination. This pass attempts to move stores out of loops. When used in conjunction with -fgcse-lm, loops containing a load/store sequence can be changed to a load before the loop and a store after the loop.
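A minimal C sketch of the load/store sequence the docs describe. The function names are hypothetical. In `accumulate`, `*total` is loaded and stored on every iteration. `accumulate_by_hand` shows the shape the docs describe for `-fgcse-sm` combined with `-fgcse-lm`: one load before the loop and one store after it. The `restrict` qualifiers are an assumption added to make the transformation legal. Without a guarantee that `a[]` does not overlap `*total`, the compiler could not simply move the load and store out of the loop. Other GCC loop passes may already perform this motion, so the assembly may not change with the flag alone.

```c
#include <stddef.h>

/* *total is loaded and stored on every iteration. restrict promises that
   a[] and *total do not overlap, which is what makes store motion safe. */
void accumulate(long *restrict total, const long *restrict a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        *total += a[i];
}

/* Hand-written equivalent of the transformed loop: the load is moved
   before the loop, the running value lives in a local, and the store
   is moved after the loop. */
void accumulate_by_hand(long *restrict total, const long *restrict a, size_t n)
{
    long t = *total;    /* single load before the loop */
    for (size_t i = 0; i < n; i++)
        t += a[i];
    *total = t;         /* single store after the loop */
}
```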