$NVDA - NVIDIA has shifted the bulk of its AI compute from CUDA cores to tensor cores that implement systolic arrays, which improve compute efficiency by cutting data movement costs. Its architectural changes (such as raising the FP4-to-FP8 ratio from 2x to 3x in B300) show continued optimization leadership in AI chip design.
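A rough way to see why systolic arrays cut data movement is to count operand fetches for an n x n matrix multiply. The sketch below is an idealized model, not a description of NVIDIA's hardware: the function names are hypothetical, and it assumes a scalar design reads both operands from the register file for every multiply-accumulate, while a systolic array loads each matrix element once and then forwards it between neighboring cells.

```python
def operand_fetches_scalar(n: int) -> int:
    """Assumed model: each of the n^3 multiply-accumulates reads two operands
    from the register file."""
    return 2 * n**3


def operand_fetches_systolic(n: int) -> int:
    """Assumed model: each element of both n x n matrices enters the array
    once and is then passed cell to cell without further register reads."""
    return 2 * n**2


n = 16
scalar = operand_fetches_scalar(n)      # 8192
systolic = operand_fetches_systolic(n)  # 512
reuse_factor = scalar / systolic        # 16.0, i.e. equal to n
print(scalar, systolic, reuse_factor)
```

Under these assumptions the saving grows linearly with the array dimension, which is one way to read the data-movement claim above. Accumulator traffic and tiling are ignored.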
$FPGA - FPGAs provide deterministic latency and high parallelism for applications such as high-frequency trading, where predictability is worth more than raw throughput. Although they are roughly 10x less efficient than ASICs, a first-unit cost of about $10k (versus about $30M for an ASIC) and field programmability create a sustainable niche for frequently changing workloads.
$LOWPRECISION - Lower-precision arithmetic (FP4 instead of FP8) yields quadratic area savings because the area of a multiply-accumulate circuit scales as p×q, the product of its operand bit widths, so halving both widths cuts area roughly fourfold. This structural advantage makes low precision increasingly favorable for neural networks and drives its continued adoption in AI accelerators.
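The p×q scaling can be checked with a few lines of arithmetic. This is a minimal sketch that assumes area is proportional to the number of partial-product cells in a p-bit by q-bit multiplier and treats the full format width as the operand width. Real floating-point units multiply only the mantissas and add the exponents, so the numbers are illustrative. The function name is hypothetical.

```python
def multiplier_area(p_bits: int, q_bits: int) -> int:
    """Relative area of a p x q multiplier, counted in partial-product cells
    (assumption: area is proportional to p * q)."""
    return p_bits * q_bits


area_8bit = multiplier_area(8, 8)  # 64 cells
area_4bit = multiplier_area(4, 4)  # 16 cells
area_ratio = area_8bit / area_4bit # 4.0: half the width, a quarter of the area
print(area_8bit, area_4bit, area_ratio)
```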
Bearish:
$CPUARCH - Traditional CPU architecture suffers from a fundamental inefficiency: 7/8 of circuit cost goes to moving data through the register file rather than to actual computation. Branch predictors also consume significant die area, and the speedup they buy comes with nondeterministic latency, a trade that is increasingly less valuable for AI workloads.
$CACHE - CPU caches introduce nondeterministic latency, which is increasingly problematic for modern workloads that need predictable performance. Scratchpad memories (as in TPUs) are more deterministic because they move memory management decisions from hardware to software.
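The determinism gap can be illustrated with a toy model. The sketch below assumes a tiny direct-mapped cache with one-word lines and made-up latencies (1 cycle for a hit, 100 for a miss), and a scratchpad whose working set software has already copied in. All names and numbers are hypothetical, and the cost of the explicit copy into the scratchpad is not modeled.

```python
HIT_CYCLES, MISS_CYCLES = 1, 100  # hypothetical cache latencies
SCRATCHPAD_CYCLES = 1             # hypothetical fixed scratchpad latency


def cache_latencies(addresses, num_lines=4):
    """Direct-mapped cache: the latency of each access depends on which
    addresses were touched before it."""
    lines = [None] * num_lines
    latencies = []
    for addr in addresses:
        idx = addr % num_lines
        if lines[idx] == addr:
            latencies.append(HIT_CYCLES)
        else:
            lines[idx] = addr  # evict whatever was in this line
            latencies.append(MISS_CYCLES)
    return latencies


def scratchpad_latencies(addresses):
    """Software placed the data ahead of time, so every access costs the same."""
    return [SCRATCHPAD_CYCLES for _ in addresses]


trace = [0, 4, 0, 4, 1, 1]  # addresses 0 and 4 collide in the same cache line
print(cache_latencies(trace))       # [100, 100, 100, 100, 100, 1]
print(scratchpad_latencies(trace))  # [1, 1, 1, 1, 1, 1]
```

In this model the same six accesses take anywhere from 1 to 100 cycles each through the cache, depending on history, while the scratchpad timing is known before the program runs.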