This lesson is the next and final step before we start to code. It covers how real numbers are encoded and stored in a computer using the IEEE formats for standard (single) and double precision. The normalization procedure is shown step by step and is easy to follow.

**Representation of Real Numbers in a Computer**

Technorati Tags: C++, Programming, IEEE, Normalization, Mantissa, Standard Precision, Bit

Standard precision: 32 bits (4 bytes)

Double precision: 64 bits (8 bytes)

Real Numbers of Standard Precision

Declaration in the C programming language:

**float**

**IEEE** layout (32 bits: 1 sign bit, 8 characteristic bits, 23 mantissa bits):

P | sign bit (P = 1 negative, P = 0 positive) |

Characteristic | binary exponent + 127 (so that a negative exponent never has to be stored) |

Mantissa | normalized (exactly one bit in front of the binary point) |

**Example**: representation of the decimal number 5.75

**5.75**_{10} = **101.11**_{2} * 2^{0} = **1.0111**_{2} * 2^{2}

Because every normalized binary number (except zero) has the form **1**.xxxxx, the leading **1** carries no information. It is therefore not stored in the computer and is referred to as the hidden bit. This frees one extra bit of space, which gives us higher precision.

P = sign = 0 (positive number)

Binary exponent = 2, so K = 2 + 127 = 129 = (1000 0001)_{2}

Mantissa (whole) 1.0111

Mantissa (without hidden bit) 0111

Result: 0 10000001 01110000000000000000000

or 0100 0000 1011 1000 0000 0000 0000 0000

** 4 0 B 8 0 0 0 0** (hexadecimal)

**Examples:**

**2** = 10_{2} * 2^{0} = 1_{2} * 2^{1} = 0100 0000 0000 0000 ... 0000 0000 = 4000 0000 hex

P = 0, K = 1 + 127 = 128 (10000000), M = (1.) 000 0000 ... 0000 0000

**-2** = -10_{2} * 2^{0} = -1_{2} * 2^{1} = 1100 0000 0000 0000 ... 0000 0000 = C000 0000 hex

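Same as 2, but P = 1

**4** = 100_{2} * 2^{0} = 1_{2} * 2^{2} = 0100 0000 1000 0000 ... 0000 0000 = 4080 0000 hex

Same mantissa as 2, BE = 2, K = 2 + 127 = 129 (10000001)

**6** = 110_{2} * 2^{0} = 1.1_{2} * 2^{2} = 0100 0000 1100 0000 ... 0000 0000 = 40C0 0000 hex

P = 0, K = 2 + 127 = 129 (10000001), M = (1.) 100 0000 ... 0000 0000

**1** = 1_{2} * 2^{0} = 0011 1111 1000 0000 ... 0000 0000 = 3F80 0000 hex

K = 0 + 127 = 127 (01111111)

**.75** = 0.11_{2} * 2^{0} = 1.1_{2} * 2^{-1} = 0011 1111 0100 0000 ... 0000 0000 = 3F40 0000 hex

K = -1 + 127 = 126 (01111110)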

Special case: 0

Normalizing the number 0 cannot produce the form 1.xxxxx, so zero gets its own bit pattern:

0 = 0 00000000 0000 ... 0000 (all bits zero), the pattern that would otherwise mean 1.0_{2} * 2^{-127}

Range and precision of real numbers:

For a real number in standard precision, the characteristic (8 bits) lies in the interval [0, 255].

K = 0 is reserved to represent zero

K = 255 is reserved to represent infinity

Since BE = K - 127, the usable binary exponents lie in the interval [-126, 127].

Smallest positive (normalized) number different from zero that can be represented:

1.0_{2} * 2^{-126} ~ 1.175494350822*10^{-38}

and the largest is:

1.11111111111111111111111_{2} * 2^{127} ~ 2^{128} ~ 3.402823669209*10^{38}

Precision: 24 binary digits

2^{24} ~ 10^{x}, so 24 log 2 ~ x log 10, which gives x ~ 24 log 2 ~ 7.224719895936 (log is the base 10 logarithm)

so about the first 7 decimal digits are valid.

Display on the number line:

Numerical error:

Not all bits can be used during a calculation:

**Example**: 0.0001 + 0.99

0.0001_{10} : (1.)10100011011011100010111_{2} * 2^{-14}

0.9900_{10} : (1.)11111010111000010100011_{2} * 2^{-1}

When adding, the binary points must be lined up one underneath the other, so the smaller number is shifted to the exponent of the larger one:

 .000000000000011010001101 * 2^{0} (only 11 of its 24 bits are left!)

+.111111010111000010100011 * 2^{0}

=.111111010111011100110000 * 2^{0} = 0.9900999069214_{10}

**Real numbers in double precision**

Declaration in the C programming language:

**double**

P | sign bit (P = 1 negative, P = 0 positive) |

Characteristic | binary exponent + 1023 (11 bits) |

Mantissa | normalized (52 stored bits + 1 hidden bit) |

Range:

K lies in the interval [0, 2047].

K = 0 is reserved to represent zero

K = 2047 is reserved to represent infinity

BE = K - 1023

BE lies in the interval [-1022, 1023]

Smallest positive (normalized) number different from zero that can be represented:

1.0_{2} * 2^{-1022} ~ 2.225073858507*10^{-308}

and the largest is:

1.1111.....111111_{2} * 2^{1023} ~ 2^{1024} ~ 1.797693134862316*10^{308}

Precision: 53 binary digits

2^{53} ~ 10^{x}, so 53 log 2 ~ x log 10, which gives x ~ 53 log 2 ~ 15.95458977019 (log is the base 10 logarithm)

so nearly the first 16 decimal digits are valid.

There is also:

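**long double** 80 bits (compiler dependent; some compilers, Microsoft Visual C++ among them, store long double as a 64-bit double)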

Characteristic: 15 bits

Binary exponent: Characteristic - 16383

**Real constants**

**double**: 1.   2.34   9e-8   8.345e+25

**float**: 2.f   2.34F   -1.34e5f

**long double**: 1.L   2.34L   -2.5e-37L



Daily Lessons for programming in Visual Studio, using C code.

sorry for delay...reupped lesson 2 with pictures for better understanding (float and double precision display)

I am searching for a site like this, but with materials/tutorials for writing J2SE Java apps. I will be grateful if anyone can help me.

Nice short overview. Thx for that

the polarizer

Guess I'll have to look elsewhere for a programming tutorial. This one assumes the user already understands many concepts.

Well, actually it doesn't. Try following it from Lesson 1 (found on this website) and take a few days for all the tutorials. Go slowly through one lesson at a time and when you're stuck, just post the question, and I'll be more than glad to help you! It was written in a way so you don't need to have any prior programming knowledge.

[quote]

It was written in a way so you don't need to have any prior programming knowledge.

[/quote]

binary exponent, mantissa, real number and normalised are not terms that you often hear down the pub. I agree with chris, this is more of a reference manual for seasoned programmers than a way to introduce basic concepts to a newbie.

spent a couple hours over lessons 1 + 2 and cross ref Wikipedia for definitions and further examples - all hangs together; thanks

I kinda have to agree with the others. I have spent hours looking over lessons 1 & 2 and I don't understand it. You need more definitions. Thanks for the other lessons though, they are good.

Grath

Hi!

I'm looking at a pretty simple problem that involves reading numbers from a file and storing them in a different file with a small amount of processing and in a somewhat different order, but in IEEE standard integer and floating point formats. I'd like to use a compiler that already stores the numbers in the correct formats, so that in most cases I can just copy the bytes, but in a few cases some bit fiddling will still be needed. So the question is, which Windows XP based C and C++ compilers that support the IEEE standard number formats would you recommend?

Most of my programming experience is with DOS based Turbo Pascal, but I did one big application in DOS based C a long time ago, and have intended to upgrade to a Windows based C compiler for years, but until now never had any applications (all small and simple) for which C would be better than my very old versions of Turbo Pascal. So now is the time to upgrade, because the application would be much more convenient to write with a C compiler that already stores the numbers in the proper format. With the Turbo Pascal compiler, I would have to reformat every number, which is very easy for the integers, and not too hard for the floats, but why waste time this way? The switch to C is well overdue.