Monday, October 30, 2017

What I've Learned in School

The more I learn about computers, the more I realize how much there IS to learn.  And the more I appreciate the hard work of those who've come before me.  It must have taken thousands upon thousands of man hours every year to get where we are today.

This post is just me typing out some of the things I've been piecing together, since I think it will help clarify them for myself.  I don't want to get too bogged down in technical details, so I'll just warn any readers that whole books have been written about each topic here, and this is definitely not an in-depth look at any of it.

The heart of computing is tied to simple electric switches.  Think of a light switch to your living room - it's either on or off.  Yes or no.  0 or 1.  It's all...binary.

All computing comes down to turning millions of tiny little switches either on or off.  Which is why it's so freaking amazing that we have display screens, graphics, video games and more...all from a series of switches turning on and off.

Hardware (circuits, memory cards, etc.) has to do with the physical ways we have of determining which switches are turned on and off: how your mouse or keyboard sends signals telling the computer to flip some 0s and 1s, for instance.  Your computer has logic circuits, for example, which may look at two different locations and decide whether to turn a switch on or off based on what they see.  (For example, if something is true you turn a switch on, and arbitrarily call it '1'.  If it's false you turn the switch off, and arbitrarily call it '0'.  If you only want something to occur if two items are true, you check to see if both are switched on...and only if both are on do you turn on another switch.  x AND y turns on only if both x and y are on.  x OR y turns on if either x or y is on.  x XOR y turns on only if one or the other is on, but not both.  Hence logic circuits can allow us to do complicated things based on whether switches are turned on or off...and can mimic our more formal logic.)
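Here's a small sketch of those three gates in Python, just to make the rules concrete.  It treats each switch as a single bit (1 for on, 0 for off).  The function names are ones I made up for illustration, and real hardware obviously does this with circuits rather than Python.

```python
# Treat each switch as a bit: 1 = on, 0 = off.

def and_gate(x, y):
    # On only if both inputs are on
    return x & y

def or_gate(x, y):
    # On if either input (or both) is on
    return x | y

def xor_gate(x, y):
    # On if exactly one input is on, but not both
    return x ^ y

def truth_table():
    # Every combination of two switches, with the result of each gate
    rows = []
    for x in (0, 1):
        for y in (0, 1):
            rows.append((x, y, and_gate(x, y), or_gate(x, y), xor_gate(x, y)))
    return rows

print("x y AND OR XOR")
for x, y, a, o, e in truth_table():
    print(x, y, " ", a, " ", o, " ", e)
```

The only row where AND and XOR disagree with OR in opposite directions is the last one: with both switches on, AND gives 1 but XOR gives 0.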

But 0s and 1s are really hard to work with, and it's too easy for us to make mistakes.  So we came up with a number of things to help.  We came up with some agreed-upon sets of 0s and 1s to represent letters, for example (like ASCII and Unicode), and a way of telling the machine whether the 0s and 1s indicate numbers or letters.
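As a quick illustration, here's a Python sketch that shows the ASCII bit patterns for a couple of letters, and how the very same bits can be read as either a number or a letter.  The helper name to_bits is my own invention for this example.

```python
def to_bits(text):
    # Encode the text as ASCII bytes, then show each byte as 8 binary digits
    return [format(byte, "08b") for byte in text.encode("ascii")]

print(to_bits("Hi"))  # ['01001000', '01101001']

# The same eight switches can mean different things depending on
# whether we choose to read them as a number or as a letter.
bits = "01000001"
as_number = int(bits, 2)   # read the bits as a base-2 number
as_letter = chr(as_number) # look that number up as a character
print(as_number, as_letter)  # 65 A
```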

We came up with something called Assembly Language, which allows us to tell a machine what to do without having to translate it all into 0s and 1s ourselves (a program called an assembler does it for us).  Assembly Language gets into the nitty-gritty details of your computer, and how it does what it does.  It means you have to tell it every little step of the way.  Like giving extremely detailed instructions on how to drive your car to the store.  You can't just say "drive down the street until you reach the third stop sign, then turn left."  You have to tell it how to put a key into the ignition, how to put the car into reverse, how to step on the gas...all the tiny little details it takes to do this thing we call 'drive a car'.

This is interesting stuff, though, since I think that's where we get a lot of our security exploits.  Oh, assembly language is also slow and tedious to write (though less slow than 0s and 1s), and we have even higher-level programming languages that are easier to use.  We're learning some of them in my computer science courses.  But figuring out how a piece of malware takes over a program depends more on the mechanics going on at the assembly level.

So then...

When you run a program, the operating system sets aside some places in memory for everything the program needs.  You've got some identifying information, the code itself, initialized variables, uninitialized variables that will be set during the running of the program, and a couple of interesting things called the heap and the stack.  These things are stored in specific locations.  Many programming languages require you to identify what sort of variable you are using (integer, character, etc.) ahead of time so that the computer can reserve enough space to store that information when the program runs.
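To get a feel for why the type matters, here's a rough Python sketch.  Python itself doesn't make you declare types, so this borrows the standard library's struct module, which packs values into fixed-size, C-style byte layouts.  The "<" prefix asks for struct's standard sizes; the sizes a particular C compiler actually uses on a particular machine can differ, so treat these numbers as an illustration.

```python
import struct

# "<" selects the struct module's standard sizes (little-endian),
# so the results do not depend on the machine running this.
SIZES = {
    "char (one letter)": struct.calcsize("<c"),
    "short integer": struct.calcsize("<h"),
    "integer": struct.calcsize("<i"),
    "long long integer": struct.calcsize("<q"),
    "double (decimal number)": struct.calcsize("<d"),
}

for name, size in SIZES.items():
    print(f"{name}: {size} bytes")

# Packing the integer 1 always takes the full 4 bytes reserved for it
packed = struct.pack("<i", 1)
print(packed, len(packed))
```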

The heap and stack change size as needed while the program runs.  The stack in particular makes things interesting, since it allows a program to do some pretty complicated things.  See, every time you say "compare x to y, and if x is ____ do _____" it requires some complicated steps.  First, you have to know where x is and where y is.  Then you have to do the comparison. 

So x and y get stored in registers, and there's a special arithmetic logic unit that does the comparison and returns the result.  The computer looks at that result...and often the result requires jumping to a new series of instructions.  There's something called an instruction pointer that helps the computer keep track of where it is.  It stores the memory location of the next instruction.  If the program says to go to another set of instructions, the pointer is updated with the new location, and the program processes the instruction at that location.
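Here's a toy version of that compare-and-jump cycle in Python.  It's a made-up machine with made-up instruction names (CMP, JE, JMP, SET, HALT are loosely inspired by assembly, but this isn't any real instruction set), meant only to show how an instruction pointer moves through a program and gets redirected by a jump.

```python
def run(program, registers):
    ip = 0         # instruction pointer: location of the next instruction
    flag = False   # result of the most recent comparison
    trace = []     # the locations visited, in order
    while ip < len(program):
        trace.append(ip)
        op, *args = program[ip]
        ip += 1    # by default, move on to the following instruction
        if op == "CMP":      # compare two registers, remember the result
            flag = registers[args[0]] == registers[args[1]]
        elif op == "JE":     # jump only if the last comparison was equal
            if flag:
                ip = args[0]
        elif op == "JMP":    # jump unconditionally
            ip = args[0]
        elif op == "SET":    # store a value in a register
            registers[args[0]] = args[1]
        elif op == "HALT":
            break
    return registers, trace

# "compare x to y, and if x equals y set result to 1, otherwise 0"
program = [
    ("CMP", "x", "y"),     # location 0
    ("JE", 4),             # location 1: if equal, jump to location 4
    ("SET", "result", 0),  # location 2: the "not equal" branch
    ("JMP", 5),            # location 3: skip over the "equal" branch
    ("SET", "result", 1),  # location 4: the "equal" branch
    ("HALT",),             # location 5
]

equal_regs, equal_trace = run(program, {"x": 3, "y": 3, "result": None})
print(equal_regs["result"], equal_trace)   # 1 [0, 1, 4, 5]

diff_regs, diff_trace = run(program, {"x": 3, "y": 4, "result": None})
print(diff_regs["result"], diff_trace)     # 0 [0, 1, 2, 3, 5]
```

The trace is the interesting part: when x equals y, locations 2 and 3 are never visited, because the jump changed the instruction pointer.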

I'm still learning a lot about how it all works, but the security flaw that sort of clicked in my head was a buffer overflow.  If I understand what's going on correctly, what happens is that (unless someone is careful with their programming) any time you ask a user for input and the user provides something larger than what the program expected, that information will fill the space set aside for it and spill over into the neighboring memory locations.  'Overflow'.  If you know where you are in the program, and what it's overflowing into, you can overwrite the values already at that location.  If those values happen to control what the program runs next (a return address saved on the stack, for example), you can give the program new instructions.  You can basically tell the instruction pointer to jump to a totally different location where you've coded in malicious instructions (or just overwrite the existing instructions entirely, which will likely cause an error whenever that program is run...so it's kind of obvious.  Better to make a small change pointing the code elsewhere, then return to the original program and run it like it's intended.  Then the user won't notice anything particularly wrong.)
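Here's a simplified simulation of the idea in Python.  Python won't actually let you overflow a buffer this way, so the sketch fakes a chunk of memory as one bytearray: the first 8 bytes are the input buffer, and the byte right after it stands in for a saved "where to jump next" value.  The layout, the sizes, and the function names are all assumptions I made up for illustration; real memory layouts are more complicated.

```python
BUFFER_SIZE = 8          # bytes reserved for the user's input
JUMP_SLOT = BUFFER_SIZE  # the very next byte holds the saved jump target
LEGIT_TARGET = 0x10      # pretend location of the real next instruction

def make_memory():
    # 16 bytes of pretend memory: buffer, then the jump slot, then unused space
    memory = bytearray(16)
    memory[JUMP_SLOT] = LEGIT_TARGET
    return memory

def unsafe_copy(memory, data):
    # No length check: copies however many bytes the user supplied
    for i, byte in enumerate(data):
        memory[i] = byte

def safe_copy(memory, data):
    # Careful version: never writes past the end of the buffer
    for i, byte in enumerate(data[:BUFFER_SIZE]):
        memory[i] = byte

# 8 bytes of filler, plus one extra byte aimed at the jump slot
attack_input = b"A" * BUFFER_SIZE + bytes([0x66])

overflowed = make_memory()
unsafe_copy(overflowed, attack_input)
print(hex(overflowed[JUMP_SLOT]))  # 0x66: the jump target was overwritten

protected = make_memory()
safe_copy(protected, attack_input)
print(hex(protected[JUMP_SLOT]))   # 0x10: the jump target is untouched
```

The only difference between the two copy functions is the length check, which is the "careful with their programming" part.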

There's a lot more to it than that, of course, but it's nice to start making some connections out of all the stuff I'm learning.

