It is by design that the fragment offset is expressed in multiples of 8 octets. The fragment offset field is only 13 bits wide while the total length field is 16 bits, so each offset unit has to stand for 2^3 = 8 octets. In IP, 8 is a 'magic number', just as 10, 100, 1000, etc. are in base-10 arithmetic. A character is stored in an octet, which is 8 bits.
These days you can take 8 bits = 1 byte = 1 octet. This was not always the case in the past.
In the second octet, 5 bits are wasted, but by design the octet will still be used to store 7.
The 8-octet multiple derives from the 8-bit multiple used for memory storage on routers, because a router is a store-and-forward device (it receives packets in full, examines the header, and forwards them).
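To illustrate the 8-octet unit, here is a minimal Python sketch of how a fragment offset is converted to and from a byte offset. The function names are hypothetical, and the input is assumed to be the 16-bit flags-plus-fragment-offset word of an IPv4 header, already read as an integer.

```python
OFFSET_MASK = 0x1FFF  # low 13 bits hold the fragment offset
OFFSET_UNIT = 8       # each offset unit stands for 8 octets


def fragment_offset_bytes(flags_and_offset: int) -> int:
    """Return the byte offset encoded in the 16-bit flags/offset word."""
    return (flags_and_offset & OFFSET_MASK) * OFFSET_UNIT


def fragment_offset_units(byte_offset: int) -> int:
    """Return the field value for a byte offset (must be a multiple of 8)."""
    if byte_offset % OFFSET_UNIT != 0:
        raise ValueError("fragment data must start on an 8-octet boundary")
    return byte_offset // OFFSET_UNIT


# 0x20B9: "more fragments" flag set, offset field = 0x0B9 = 185 units
print(fragment_offset_bytes(0x20B9))  # 1480
print(fragment_offset_units(1480))    # 185
```

Because only 13 bits are available, a byte offset that is not a multiple of 8 simply cannot be represented, which is why the sketch rejects it.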
A 1500-octet datagram consists of a 20-octet header and 1480 octets of data, so a router whose outgoing link cannot handle a 1500 MTU has to fragment it. Suppose the next link has a maximum of 1000 octets: 20 octets are taken for the header, leaving 980 octets, and 980/8 = 122 remainder 4. The fragment therefore carries 122 x 8 = 976 octets of data, 20 + 976 = 996, and 4 octets are wasted because the remainder is less than 8 octets. Remember, by definition, "The header length is an integer that specifies the length of the segment header measured in 32 bit multiple". If you run a network sniffer that gives you a value of, say, 7 as the header length (this is usually shown in hexadecimal), it actually means 7 x 32/8 = 224/8 = 28 bytes. The role of the header length is to let the receiving computer know where the data portion starts in the transmitted packet, since, for a full-size packet, IP header + TCP header + Data = MTU;
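also, TCP header + Data = Segment, and the Data is at most the maximum segment size; when a datagram is fragmented, every fragment except the last carries a multiple of 8 octets of data.
Finally, IP header + Segment = IP datagram.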
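The arithmetic above can be checked with a short Python sketch. The function names are hypothetical, and a fixed 20-octet IP header (no options) is assumed; it is an illustration of the calculation, not a full fragmentation routine.

```python
def fragment_payload(mtu: int, header_len: int = 20) -> int:
    """Largest data size per fragment: room after the header, rounded down to 8."""
    room = mtu - header_len
    if room < 8:
        raise ValueError("MTU too small to carry any fragment data")
    return room // 8 * 8


def header_length_bytes(field_value: int) -> int:
    """Convert a header length given in 32-bit words to bytes."""
    return field_value * 32 // 8


def split_into_fragments(data_len: int, mtu: int, header_len: int = 20):
    """Return a list of (offset_in_8_octet_units, data_octets) per fragment."""
    step = fragment_payload(mtu, header_len)
    fragments = []
    offset = 0
    while offset < data_len:
        size = min(step, data_len - offset)  # last fragment may be shorter
        fragments.append((offset // 8, size))
        offset += size
    return fragments


print(fragment_payload(1000))            # 976 (980 available, 4 octets unused)
print(header_length_bytes(7))            # 28
print(split_into_fragments(1480, 1000))  # [(0, 976), (122, 504)]
```

Only the last fragment may carry a data size that is not a multiple of 8, since no later fragment needs an offset pointing past it.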