Re: [oc] Beyond Transmeta...
----- Original Message -----
Sent: Tuesday, February 11, 2003 1:53 AM
Subject: Re: [oc] Beyond Transmeta...
<snip>
> In practice, you would find it hard to make a multiplier that would fit your
> purpose, also your logic would switch many times, consuming more power than
> standard circuits, and not to speak of multi-phase clock issues.
There is not much that is standard about this circuit, so you invent a
new serial multiplier:
abcd x efgh
Becomes: (set typeface to courier)
abcd(h)+
abcd(g)+
pppppp+
abcd(f)+
ppppppp+
abcd(e)+
pppppppp
Where (x) indicates a conditional pull of the bit stream through an adder
(else a pull of 0's). The product is fully complete in 11 clocks, but is
available for use after 1 clock. Note that the lsb of the product is
immutable after 1 clock, the 2nd lsb is immutable after 2 clocks, and so
on; i.e. each bit of the product is available for additional operations
as it emerges. Therefore, if you were to incorporate the multiply above
into a multiply and accumulate operation (MAC), i.e.
result = (abcd x efgh) + ijkl
then the addition of ijkl can begin after only 1 clock tick of the
bitstream.
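To make the "each bit is final as it emerges" point concrete, here is a rough software sketch of an lsb-first bitstream MAC. It is only a model under my own assumptions: the names (SerialAdder, bitstream_mac, bits_to_int) are made up for illustration, the adders are chained with no pipeline registers between them, and routing is ignored, so it finishes in 8 steps for 4-bit operands and does not reproduce the 11-clock figure above.

```python
def serial_bits(value, width, total):
    """LSB-first bit stream of `value`, zero-padded out to `total` clocks."""
    return [(value >> t) & 1 if t < width else 0 for t in range(total)]


class SerialAdder:
    """1-bit full adder with a carry flip-flop: one sum bit per clock."""

    def __init__(self):
        self.carry = 0

    def clock(self, x, y):
        total = x + y + self.carry
        self.carry = total >> 1
        return total & 1


def bitstream_mac(a, b, c, width=4):
    """Return the bits of a*b + c, LSB first, one bit per simulated clock."""
    clocks = 2 * width  # a*b + c always fits in 2*width bits
    a_bits = serial_bits(a, width, clocks)
    c_bits = serial_bits(c, width, clocks)
    row_adders = [SerialAdder() for _ in range(width)]  # one per multiplier bit
    mac_adder = SerialAdder()                           # adds the ijkl stream
    out = []
    for t in range(clocks):
        s = 0
        for j, adder in enumerate(row_adders):
            # Row j is the multiplicand stream delayed j clocks; it is pulled
            # through the adder only if bit j of the multiplier is set,
            # otherwise 0's are pulled.
            gate = (b >> j) & 1
            row_bit = a_bits[t - j] if (gate and t >= j) else 0
            s = adder.clock(s, row_bit)
        # The accumulate starts on the very first product bit.
        out.append(mac_adder.clock(s, c_bits[t]))
    return out


def bits_to_int(bits):
    """Reassemble an LSB-first bit list into an integer."""
    return sum(bit << t for t, bit in enumerate(bits))


print(bits_to_int(bitstream_mac(0b1101, 0b1011, 0b0111)))  # 13*11 + 7 = 150
```

Each pass through the loop appends one result bit and never revisits earlier ones, which is the property that lets a downstream operation start consuming the stream before the multiply has finished.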
Re: power. Could be much less than conventional means.
The multiply requires 4 1-bit serial adders, each performing 4 additions,
which is 16 1-bit cell operations and no latch operations.
The routing logic is not illustrated above, so that would increase power
consumption.
The traditional multiply would require perhaps 4 4-bit adder operations,
4 4-bit latch operations, and 4 9-bit shift register operations
(additional operations): at least 68 1-bit cell operations. This
indicates bitstream could consume about 1/4 the power of conventional
means (at least for this example).
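The 68 and the 1/4 come out of the following tally. It assumes, as the estimate above does, that an n-bit operation counts as n 1-bit cell operations and that power tracks the operation count; the variable names are just for illustration.

```python
# Bitstream multiply: 4 serial adders, each doing 4 one-bit additions.
bitstream_ops = 4 * 4                      # 16 one-bit cell operations

# Conventional multiply, using the counts quoted above.
adder_ops = 4 * 4                          # 4 operations on a 4-bit adder
latch_ops = 4 * 4                          # 4 operations on a 4-bit latch
shift_ops = 4 * 9                          # 4 operations on a 9-bit shift register
conventional_ops = adder_ops + latch_ops + shift_ops

ratio = bitstream_ops / conventional_ops
print(conventional_ops, round(ratio, 2))   # 68 0.24
```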
Using the assumption that the bitstream can clock at word width times
the parallel implementation, the traditional method computes the MAC in
((4 adds + 4 shift/latch) + add) x 4, or 36, clock times of the bitstream
method. Not as good as the 50x shown earlier.
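The same estimate written as a small function, in case you want to plug in other numbers. The helper name is hypothetical, and extending the formula beyond the 4-bit case worked above is my own assumption, not something established in this thread.

```python
def conventional_mac_clocks(width):
    """Bitstream-clock equivalents for the conventional shift-and-add MAC."""
    # `width` adds plus `width` shift/latch steps, then the accumulate add.
    parallel_steps = (width + width) + 1
    # One parallel clock is assumed to last `width` bitstream clocks.
    return parallel_steps * width


print(conventional_mac_clocks(4))  # 36
```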
Also note, as you go wider in word width the parallel method must slow
down for carry propagation, whereas the bitstream does not.
There are a lot of unknowns here, so don't be so quick to assume anything
about power consumption. A general rule of thumb, though, is that if you
can generate the same result with less work you will consume less power.
> But even when leaving aside the implementation issues, you have will problems
> with loops, function calls and sw model, especially with PLD idea.
Why think in terms of loops and function calls? Go out of the box.
Start with a clean sheet of paper.
> There is also problem of debugging.
Initial debugging would be done through emulation, not unlike what you
do now (synthesis). When the routing is proven, it would be incorporated
into the larger project and tested again.
Jim Dempsey