Modern processors are far faster than main memory: a single access to RAM can take as long as hundreds of simple processor instructions.
Hard disk drives and other external storage devices are slower still.
Many programs end up waiting for a memory access after every few processor instructions.
It's quite inefficient, but simple to code.
Processors contain registers and several tiers of cache, which are much faster than main memory.
When part of a program fits into a cache or uses only registers, it does not have to wait for memory access and runs much faster. Such programs are harder to write, however, and more expensive in programmer time: programmers are paid per hour of work, and that is not a small amount of money either.
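To make the cache point more concrete, here is a minimal sketch of two ways to walk over the same data. The function names and the flat row-major layout are my own assumptions for illustration. The first loop reads memory in the order it is stored, which is the pattern caches reward; the second jumps far ahead on every step. In an interpreted language like Python the timing difference is mostly hidden by interpreter overhead, so treat this as an illustration of the access pattern rather than a benchmark.

```python
from array import array


def make_grid(rows, cols):
    # One contiguous block of 8-byte floats, stored row after row (row-major).
    return array("d", (float(i) for i in range(rows * cols)))


def sum_row_major(grid, rows, cols):
    # Visits elements in the order they sit in memory: index 0, 1, 2, ...
    total = 0.0
    for r in range(rows):
        base = r * cols
        for c in range(cols):
            total += grid[base + c]
    return total


def sum_column_major(grid, rows, cols):
    # Jumps `cols` elements ahead on every step, so consecutive reads
    # land far apart in memory.
    total = 0.0
    for c in range(cols):
        for r in range(rows):
            total += grid[r * cols + c]
    return total
```

Both functions return the same sum; only the order of memory accesses differs. In a compiled language the first version is usually the faster one on large grids.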
There are other optimizations too, for example caching partial computations so that they do not need to be computed again or read again from an external storage device.
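Caching partial computations can be sketched roughly as below. The names `make_cached`, `expensive_square` and `fast_square` are hypothetical, made up for this example, and the "expensive" function is only a stand-in for real work such as a long computation or a read from disk.

```python
def make_cached(compute):
    # Wrap `compute` so that each distinct argument is computed only once.
    cache = {}

    def cached(key):
        if key not in cache:
            # First request for this key: do the real work and remember it.
            cache[key] = compute(key)
        return cache[key]

    return cached


calls = []  # records every time the real computation actually runs


def expensive_square(n):
    # Stand-in for a slow computation or a read from an external device.
    calls.append(n)
    return n * n


fast_square = make_cached(expensive_square)
```

Calling `fast_square(12)` twice runs `expensive_square` only once; the second call is answered from the dictionary. Python's standard library offers `functools.lru_cache` for the same purpose.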
For even more optimizations, please refer to books about 'Algorithms & Data Structures' (see the literature section in the right border of this blog).
Simple programs, such as many web pages, do not need to be blazing fast; they should instead be cheap to make and change.
In these cases programmers should, and do, optimize their own time at the cost of the program's efficiency.
It is then cheaper to upgrade the hardware infrastructure or to buy more computers than to pay many highly skilled programmers for long periods of optimization work.
Often the time needed to make the program is crucial too, for example when a business wants to release a product before the competition.
'Ola AH' Programming Language.
I think that with the 'Ola AH' Programming Language it should be possible to take advantage of the tiered memory pyramid, of multi-threading on standard processors, and of RISC machine architectures to accelerate computations.
RISC stands for 'Reduced Instruction Set Computing'; ARM ('Advanced RISC Machines') is one such architecture. RISC processors are simple and cheap, so very many of them can be put to work in parallel.
The exact details are still to be designed; they will matter once the compiler is implemented.