Sunday, February 24, 2019

Computing - Binary, Hex, and Digital Forensics

While job hunting, I'm also working to brush up on/expand my knowledge in certain areas, working towards getting a certification and the like. I was thinking that I should try to put what I know in my own words, to see if I've understood the concepts well enough.

I suppose the best place to start is with binary.

Computing, for most people, is a bit like driving a car. That is, we all know how to put the key in the ignition, start a car, turn the steering wheel (to direct the car), press the accelerator and the brake, select forward or reverse, and so on and so forth... but most people don't necessarily understand what is going on behind the scenes. What actually happens when you press the brake or the accelerator, or turn the wheel?

By the same token, we all know how to turn on a computer (or smartphone), how to surf the internet, how to point and click with a mouse or type with a keyboard... but we don't necessarily understand what the computer is doing behind the scenes to make the magic happen. How does it know what a particular mouse click means? Or how to interpret what we type in a keyboard? How does it know what to do when we type a URL in an internet browser?

The very, very basics come down to binary. Most computers use electricity, of course, so it's all tied to electrical signals. On or off. (You can actually create computers using other means, so long as you have some way of indicating 1 or 0... like punch cards, which can be punched or not punched. It's just that electrical signals can be fast, and we can create electrical circuits that do the calculations with a very small amount of space.)

The computer gets a series of 1s and 0s, and it knows how to interpret them based on computer architecture and the efforts of previous computer scientists to build a framework for interpreting particular sequences of 1s and 0s. In some cases, that series of 1s and 0s might indicate a particular action to take (the machine language of that particular system), or a memory location where the data is stored, or the actual data stored at that location.

Behind the scenes, your computer is processing all sorts of 1s and 0s to store numbers and act on numbers, and a modern processor can work through billions of instructions every second.

The words I type here can be coded in binary. ASCII says that certain strings of 1s and 0s refer to specific letters, so the word 'word' can be coded as 01110111 01101111 01110010 01100100. Again, computers build upon all the work that came before, and things like ASCII were created so we had a consistent way of coding 1s and 0s to indicate letters. (There are other encoding formats, like UTF-8 and UTF-32. Since computers are used for languages that don't use the Roman alphabet, it's important to have codes for all the other possible symbols.)
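To check the 'word' example for myself, here's a minimal Python sketch. The helper name `to_binary` is just something I made up for illustration; it only leans on Python's built-in `str.encode` and `format`.

```python
def to_binary(text: str) -> str:
    """Return the ASCII bits of text, one 8-bit group per character."""
    # encode("ascii") turns each character into its numeric code (0-127);
    # format(..., "08b") writes that number as eight binary digits.
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


print(to_binary("word"))  # 01110111 01101111 01110010 01100100
```

Swapping in a different word shows the pattern: every ASCII letter takes exactly eight bits.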

The machine needs to know how to interpret any particular series of 1s and 0s, whether it indicates a letter, a Chinese character, a number, an instruction, or a memory location.
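Here's a small sketch of that idea, assuming nothing beyond the Python standard library. It takes the same 32 bits used for 'word' and reads them three different ways. The variable names are my own, and real hardware makes this choice through its architecture rather than through a function call.

```python
# The same four bytes (32 bits) that spell 'word' in ASCII.
raw = bytes([0b01110111, 0b01101111, 0b01110010, 0b01100100])

as_text = raw.decode("ascii")           # read as four ASCII letters
as_number = int.from_bytes(raw, "big")  # read as one 32-bit unsigned integer
as_small_numbers = list(raw)            # read as four separate 8-bit numbers

print(as_text)
print(as_number)
print(as_small_numbers)
```

The bits never change between the three lines. Only the rule used to interpret them does.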

We have problems understanding binary, of course, so we have ways of making this more human readable. (Hexadecimal, or base-16, can seriously reduce how much space it takes to write binary. We're most familiar with decimal, of course, but using 'A' for 10, 'B' for 11, and so on until you get 'F' for 15 is convenient. It's basically because four binary digits can be represented by one hexadecimal digit. Since bits are often grouped in sets of 8, or a byte, two hex characters can represent a byte. So 'word', in hex, is 776F7264. 77 is 'w', 6F is 'o', 72 is 'r', and 64 is 'd'. It's useful for a variety of reasons, and very basic machine instructions are often shown in hex instead of binary because it's easier for humans to read... just remember that the computer still sees it as a string of 1s and 0s.)
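A quick sketch of the hex and binary relationship, again in Python. `hex_to_bits` is a name I picked for illustration; `bytes.hex` and `bytes.fromhex` are standard methods.

```python
def hex_to_bits(hex_text: str) -> str:
    """Expand each hex digit into the four bits it stands for."""
    return " ".join(format(int(digit, 16), "04b") for digit in hex_text)


data = "word".encode("ascii")
hex_form = data.hex().upper()  # two hex characters per byte

print(hex_form)                 # 776F7264
print(hex_to_bits(hex_form))    # eight groups of four bits
print(bytes.fromhex(hex_form))  # and back to the original bytes
```

Each hex digit maps to exactly four bits, which is why two of them always describe one byte.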

Digital forensics... well. Every file you have is stored somewhere as a series of 1s and 0s, with some additional information (the header, for example) that helps the computer understand what that sequence of numbers means. The header might have a string of 1s and 0s indicating it's a jpg picture, or a Word document, and in hex that header would be 'FF D8 FF E0 00 10 4A 46 49 46 00 01 01' for a jpg, or 'D0 CF 11 E0 A1 B1 1A E1' for an older .doc file. (The newer .docx format is really a ZIP archive, so it starts with '50 4B 03 04' instead.)
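As a rough sketch of how a tool might use those headers, here's a tiny signature check in Python. The table and the function names (`SIGNATURES`, `identify`, `identify_file`) are hypothetical, and I only match the first three bytes of the jpg header because the fourth byte varies between jpg variants. Real forensic tools carry far larger signature lists.

```python
# Known starting bytes ("magic numbers") and what they usually indicate.
SIGNATURES = {
    bytes.fromhex("FFD8FF"): "JPEG image",
    bytes.fromhex("D0CF11E0A1B11AE1"): "older Office document (.doc and similar)",
}


def identify(header: bytes) -> str:
    """Guess a file type from its first few bytes."""
    for magic, label in SIGNATURES.items():
        if header.startswith(magic):
            return label
    return "unknown"


def identify_file(path: str) -> str:
    """Read the first 16 bytes of a file on disk and identify them."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))


print(identify(bytes.fromhex("FFD8FFE000104A464946000101")))
print(identify(b"just some text"))
```

Note that this looks at the content, not the file name, so renaming a picture to 'notes.txt' wouldn't fool it.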

When you delete a file, the computer doesn't erase all those 1s and 0s. It just changes a marker (as little as a single bit, a bit being one digit that's either 1 or 0) to indicate that that space is now free, and if/when you save something else it *might* decide to save it where the old file used to be. (Actually, anyone who has deleted a file to the trash bin knows there are two levels of 'delete'. The first delete makes a small change that puts it in the trash bin, but you can still 'restore' the file as the computer won't try saving anything where that file exists. When you empty the trash bin and 'permanently' delete it, the computer now considers that space fair game.)

That's part of how digital forensics works... it can find the series of 1s and 0s still in existence. If all that was changed was the bit indicating whether the space was available or not, it can undelete the entire file, so long as it's not been written over already. (There's more to it than that. Sometimes something is saved over part of the previous file's data, so you might lose the beginning portion and still be able to understand what the rest of the file held. And in some cases, if the information was overwritten in a known process, you may be able to reverse the process to recreate what used to be there. I now understand what some of the IT guys in the military were talking about when they digitally 'shred' something... they're basically running a program that repeatedly overwrites the data with 1s and 0s so that it can't be reversed and the data can't be recreated.)
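To convince myself of the delete/undelete idea, I sketched a toy model in Python. Everything here (`ToyDisk`, its methods, the file names) is invented for illustration and is far simpler than a real file system, but it shows the flag being flipped while the data stays put.

```python
class ToyDisk:
    """A toy disk: a list of blocks plus a table of which file owns which blocks."""

    def __init__(self, block_count: int):
        self.blocks = [b""] * block_count
        self.table = {}  # name -> {"blocks": [indexes], "deleted": bool}

    def free_blocks(self):
        # A block is free unless a non-deleted file still owns it.
        in_use = {i for entry in self.table.values()
                  if not entry["deleted"] for i in entry["blocks"]}
        return [i for i in range(len(self.blocks)) if i not in in_use]

    def save(self, name: str, chunks: list):
        free = self.free_blocks()
        if len(chunks) > len(free):
            raise ValueError("not enough free blocks")
        chosen = free[:len(chunks)]
        for index, chunk in zip(chosen, chunks):
            self.blocks[index] = chunk  # old contents are overwritten here
        self.table[name] = {"blocks": chosen, "deleted": False}

    def delete(self, name: str):
        # Only the flag changes; the data blocks are left untouched.
        self.table[name]["deleted"] = True

    def recover(self, name: str) -> bytes:
        # Read whatever currently sits in the blocks the file used to own.
        return b"".join(self.blocks[i] for i in self.table[name]["blocks"])


disk = ToyDisk(4)
disk.save("secret.txt", [b"meet", b" at ", b"noon"])
disk.delete("secret.txt")
print(disk.recover("secret.txt"))  # the whole file is still there

disk.save("new.txt", [b"XXXX"])    # reuses the first freed block
print(disk.recover("secret.txt"))  # beginning lost, the rest still readable
```

A 'shred' in this model would simply be writing junk into every block the deleted file owned, so that `recover` has nothing meaningful left to read.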

Encryption scrambles those 1s and 0s in a systematic way, so that (if you have the key) you can reverse the process and get the original information... but someone without the key can't.
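Here's a deliberately tiny illustration of 'reversible with the key', using XOR in Python. This is a toy I'm using to show the idea, not how real encryption is done (real systems use vetted algorithms such as AES), and the function name `xor_bytes` and the key values are made up.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Flip the bits of data wherever the (repeated) key has a 1."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


message = b"word"
key = b"\x5a\xc3"

scrambled = xor_bytes(message, key)            # 'encrypt'
restored = xor_bytes(scrambled, key)           # same key reverses it
wrong = xor_bytes(scrambled, b"\x00\x01")      # a different key does not

print(scrambled.hex())
print(restored)
print(wrong)
```

XOR works for the demo because applying the same key twice cancels out, which is the 'reverse the process' part in miniature.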

That's a very, very simplified explanation that doesn't even go into what the CPU does, much less how this works with websites and internet connectivity. I'll post more another time.
