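While I was at it, I looked this up: AMD really does have a lot of patents related to CMT (clustered multithreading, the formal name for its "module" design) and the Bulldozer architecture: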
clustered multithreading with 2 integer clusters, each of them having:
2 ALUs, 2 AGUs
one L1 data cache
scheduler, integer register file (IRF), ROB
(see 20080263373*, 20080209173, 7315935)
a trace cache, not to make cheaper decoders but to quickly recover from a mispredicted branch (7197630 and many others)
read port arbitration for a faster IRF (7315935)
shared FPU supporting ADD, MUL, FMAC etc. and 64 or 128 bit max. operand width (20080263373)
FPU may run in full bit or reduced bit modes to save power (20080209185)
32 byte fetch, 4-way decoder, multithreaded by round robin or depending on queue saturation (20080263373, EP1244962)
fine grained power management (token based, 20080263373) for optimal usage of given TDP/ACP
a lot more speculation (data speculation, cache way prediction, see 7024537, 7028166 and many others)
2 loads from L1 D$ per cycle per cluster (7502914)
maybe 2 cycle effective L1 D$ latency instead of 4 thanks to replaying (7502914)
possibly a shared L2 (7502914)
loop detectors (7130991)
dynamically scalable cache architecture to save power by switching off cache portions or levels (20080104324)
AMD's turbo mode (running cores faster if others are less utilized, 7490254, filed 2005/08/02)
Link here:
http://www.amdzone.com/phpbb3/viewtopic.php?f=52&t=136642&sid=7684996c4331959e9b922b0352ad9116
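
To make the "round robin or depending on queue saturation" item in the list above a bit more concrete, here is a rough toy model of how a decoder shared by two threads could pick which thread to serve each cycle. This is only a sketch under my own assumptions, not what the patents actually describe: the class name DecodeArbiter, the queue depth of 16, and the "emptiest queue wins" rule are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DecodeArbiter:
    """Toy model (hypothetical): chooses which thread uses the shared decoder."""
    num_threads: int = 2       # one thread per integer cluster
    queue_capacity: int = 16   # assumed depth of each cluster's queue
    last: int = -1             # thread served on the previous cycle

    def pick_round_robin(self) -> int:
        # Plain round robin: alternate threads regardless of queue state.
        self.last = (self.last + 1) % self.num_threads
        return self.last

    def pick_by_saturation(self, occupancy: List[int]) -> Optional[int]:
        # Visit threads in round-robin order, starting after the last one served.
        order = [(self.last + 1 + i) % self.num_threads
                 for i in range(self.num_threads)]
        # Skip threads whose queue is already full.
        candidates = [t for t in order if occupancy[t] < self.queue_capacity]
        if not candidates:
            return None  # every queue is saturated, so the decoder stalls
        # Serve the emptiest queue; min() keeps the first candidate on a tie,
        # so ties fall back to round-robin order.
        best = min(candidates, key=lambda t: occupancy[t])
        self.last = best
        return best


if __name__ == "__main__":
    arb = DecodeArbiter()
    print([arb.pick_round_robin() for _ in range(4)])
    print(arb.pick_by_saturation([10, 3]))
```

The point of the second policy is simply that a thread whose queue is nearly full gains little from more decoded instructions, so the shared decoder slot goes to the other thread instead.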