Monday, March 11, 2019
Interesting Evolution
Saturday, March 9, 2019
Computing - Malware, Phishing, Etc.
Your business operates in a gated community, where the front office is the only 'address' the internet has. The front office routes all the messages it receives in accordance with the various rules and protocols it's told to use, forwarding most messages to the fulfillment center managing customer requests and the rest to various employee residences as needed.
Each building - the front office, fulfillment center, and residence - has a robot maid (called Rosie, in honor of the Jetsons). Rosie comes in all sorts of makes and models, and you can add attachments as needed. There's danger, though... as someone familiar with the particular make, model, and/or attachment may be able to program Rosie to act in unexpected ways. She may take pictures of the files in your filing cabinet, or make copies of all your letters and send them on, or set a particular window to receive commands from another location.
I want to discuss that process a little bit further. See, malware has two separate stages. There's the method of infection (the vulnerability that allows someone to install malicious software on your computer) and then there's the payload, which is what you want the malware to do once it's there. The payload has more to do with the attacker's intent, whether it's copying files or encrypting files until you pay a ransom or turning your computer into a 'bot' that checks in periodically for further orders.
Any method of delivering malware can be used for any of those purposes, and more. Different payloads will also give off different indicators that there's malware on your computer. If Rosie is communicating with a command and control center somewhere, there may be logs of that activity and ways to detect it... though you'd have to know how to spot it. (That's part of why I want to play with an IPS/IDS. See what it logs, what sets off an alert, etc.)
But the malware has to get to Rosie first, get through the firewall to her, and she has to read it and be affected by it.
There's a bunch of different methods for doing this, and I by no means claim that I know them all. I will say one of the biggest/best methods is to get someone inside the firewall to connect to you. Phishing attacks, watering hole attacks, and the like.
Phishing attacks, at least when someone is deliberately targeting a business, are not the laughable attempts you'll find in your spam folder. They probably did their research (the video I posted earlier has some great examples of this, though it was done after they were already in the network.)
They may look you up on Facebook, or LinkedIn. They may try to figure out your home address, your pets' names, your children's names. They'll try to guess your passwords and user name. They may go dumpster diving for any information they can use against you.
And when they craft an e-mail, it will be practically indistinguishable from legitimate mail. If they're trying to get you to install something malicious, they'll create their malicious software and find a way of wrapping it in something legitimate. If they say it's an ssh application, they'll attach it to the e-mail and it'll open and act just like an ssh application ought to. Or they'll add a link to an e-mail, but modify the address so that it goes to the computer they're waiting to hack you with.
They can apparently also use macros, and images, and other things. It's part of why they advise you not to let your e-mail give you a preview (since doing so involves opening the e-mail) and not to let it display images.
Or they research your company, and notice that a lot of your employees go to a specific website outside the firewall (to check the news, or stock prices, or sports... or even something work-related.)
So they set up a watering hole attack, using that website to launch an attack when your employees visit it.
Or they drop some infected USB drives in a parking lot, and wait for someone to find one and plug it in.
I'm still learning all this myself, but I've heard a couple of penetration testers say that they have never failed to penetrate the defenses of their target.
Think about that for a second.
Anyways, to bring this back to my analogy... let's say that a malicious letter was somehow wrapped up in a legitimate letter. It gets to your home through one method or another, and it goes straight to Rosie (unless it's caught by an anti-virus program before she can read it). If she hasn't been updated or modified since the vulnerability was found, then the payload is delivered and she follows whatever she's instructed to do.
Once Rosie's affected, she can also be used to send messages to her neighbors, affecting them as well.
Eh... I'm not too happy with this post, I feel like it didn't quite go where I wanted it to. But it's getting late, and I've got stuff to do tomorrow. Not sure how much I'll post next week.
Friday, March 8, 2019
Computing - Server Side Cyber Threats
Though really, it's probably more like 'defending a gated community'.
Picture your typical business as a gated community, with a front office (where mail is received and sorted), various different homes (where employees do their work), and a fulfillment center (surrounded by another wall) that receives and responds to client requests.
I am drastically oversimplifying this, since businesses vary greatly in how they configure their internal network - perhaps putting all the HR people in one walled off section, all the R&D people in another. Or having one building for receiving requests, and sending valid requests on to another building for fulfillment. That's not even getting into how a large company with multiple work locations would be configured, but I'm trying to keep it simple.
So then. Each house, the fulfillment center, and the front office all have their own 'Rosie' to receive and send messages. Rosie comes in all sorts of different makes and models, and if you know her model well you may even know of some 'tricks' that will get her to act in unexpected ways. You might send her a letter with keywords that make her start taking pictures of rooms in the house, or to open up a new window or door for a specific type of message traffic, or initiate some messages of her own.
The postal service delivers all mail to the front office in part because they don't actually have any addresses for anyone inside. It's sort of like how a large business might need someone to sort mail to a specific building or room, though if I'm going to maintain my previous analogies I'll just say that the robot maid in the front office assigns certain windows or doors to a specific home or building inside the community, and knows that any letters delivered to that location (i.e. door number 1470) go to house number 42, in a process called NAT, or Network Address Translation. This developed in part because we were running out of IP addresses, and needed a way for computers to re-use some numbers. It's part of why if you check the IP address for your computer it's most likely to start with 192.168.1.X.
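To make the door-to-house idea a bit more concrete, here is a minimal sketch in Java. It assumes the front office keeps a simple lookup table; the class name, method names, and the 192.168.1.42 address are all made up for illustration, and a real NAT device tracks a good deal more than this.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the front office's lookup table (a NAT-style mapping).
class FrontOfficeNat {
    // Maps an outside-facing "door" (port) to an inside "house" (private address).
    private final Map<Integer, String> doorToHouse = new HashMap<>();

    // Records that mail arriving at this door belongs to this house.
    void assign(int door, String house) {
        doorToHouse.put(door, house);
    }

    // Returns the inside address for a door, or null if no house is assigned to it.
    String route(int door) {
        return doorToHouse.get(door);
    }

    public static void main(String[] args) {
        FrontOfficeNat nat = new FrontOfficeNat();
        nat.assign(1470, "192.168.1.42"); // door 1470 -> house 42
        System.out.println(nat.route(1470)); // prints 192.168.1.42
        System.out.println(nat.route(9999)); // prints null (nobody is expecting mail there)
    }
}
```

The point of the sketch is the second lookup: a letter sent to a door nobody inside has claimed simply has nowhere to go.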
Anyways. What this means is that anyone outside the business has no real idea how to reach any of the homes inside the business unless they are directly responding to someone's messages. There's what's called an 'attack surface' of publicly facing connections - generally the front office and fulfillment center (since all the customers need to be able to reach the fulfillment center in order to make their requests.) Individual employees do offer up a target as well, but it's mostly because they're using their computer to access sites outside their place of work - gotta check those ESPN scores, you know? - and get responses in turn. (Okay, I'm halfway joking about ESPN. There are often legitimate work reasons to access outside websites... but you can communicate with everyone inside the gated community without going through the front office.)
All right, so what happens next? Well... it depends a bit on what you're trying to do, tbh. This is also, btw, where I feel my inexperience shows the most... I'm trying to put into words my understanding of the process, but I may get it wrong. Feel free to comment with any corrections.
Let's start with one type of Denial-of-Service attack, and its close cousin, a Distributed-Denial-of-Service attack. There you are, at the front office, when suddenly the mailman starts delivering bags and bags of mail. Much like the scene from Miracle on 34th Street, it overwhelms the front office and they wind up throwing away a lot of mail, including legitimate letters.
If it's a straight up DoS attack, you might notice that all the return addresses are the same, which makes it easy to tell Rosie to just throw them all away.
If it's a DDoS attack, the return addresses are generally different. Generally speaking, someone sent letters to a bunch of different robot maids, telling them to send letters to your address. It's called a botnet, and all the different return addresses indicate different robots sending letters as ordered.
In certain kinds of DDoS attacks, all the letters share the same basic form, such as the color and/or size of a specific type of message (like an ICMP flood). There are still some things the front office can do to help filter it out, and they can also reach out to their nearest post office for help managing the flood of traffic, but it generally requires a bit more management than simply telling Rosie to throw out all the letters from x.x.x.x address.
This sort of attack will disrupt your business, but it's not really getting inside your community.
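Here is a rough sketch, in Java, of the "throw away everything from one return address" idea. The class, the method names, and the limit of 2 are assumptions for illustration only, and the addresses come from ranges reserved for documentation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: count letters per return address and block the noisy ones.
class MailFlood {
    // Counts how many letters arrived from each return address.
    static Map<String, Integer> countBySender(List<String> returnAddresses) {
        Map<String, Integer> counts = new HashMap<>();
        for (String addr : returnAddresses) {
            counts.merge(addr, 1, Integer::sum);
        }
        return counts;
    }

    // Made-up rule: a sender is blocked once it has sent more than `limit` letters.
    static boolean shouldBlock(Map<String, Integer> counts, String sender, int limit) {
        return counts.getOrDefault(sender, 0) > limit;
    }

    public static void main(String[] args) {
        List<String> letters = Arrays.asList(
                "203.0.113.5", "203.0.113.5", "203.0.113.5", "198.51.100.7");
        Map<String, Integer> counts = countBySender(letters);
        System.out.println(shouldBlock(counts, "203.0.113.5", 2));  // true: 3 letters is over the limit
        System.out.println(shouldBlock(counts, "198.51.100.7", 2)); // false: only 1 letter
    }
}
```

This works for a plain DoS, where one address sends everything. In a DDoS each robot maid may send only a handful of letters, so no single sender crosses the limit, which is why that case takes more work to manage.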
If you want to gain access to the business's database there are probably two broad avenues of attack (I've never heard anyone put it like this, though, so maybe not). Basically you can do what we discussed earlier, where you manipulate the input at the company's website meant for use by a legitimate customer, or you can try to take over the 'Rosie' that manages that particular building.
If you attack through the website, your message traffic looks superficially legitimate and will probably pass through the front office no problem. Then it gets forwarded for processing at the fulfillment center, and its success depends on what sort of processes occur there. You're not really taking over any computers, you're just submitting a request in such a way that the fulfillment center will respond with information you're not supposed to have.
The other option (and the same sort of option you'd probably use to target the R&D section, or some other part of the business) is to try to reach a Rosie inside. Taking over the computer, or taking over 'a' computer and then pivoting to take over neighboring computers until you finally get the access you want.
This option can be used for all sorts of things, whether it's turning every Rosie into a member of a botnet, or getting Rosie to change information in the filing cabinet, or asking Rosie to take pictures of all the files and send them out to another location, and so on and so forth. You might even just tell Rosie to do nothing, to periodically check a particular window for messages.
But getting Rosie on your side... well, I'll get to that another day.
Tuesday, March 5, 2019
Cybersecurity, a Video
Came across this video from the NSA on Advanced Persistent Threats, and it fits in nicely with what I've been posting about.
https://youtu.be/bDJb8WOJYdA
I really liked what he said about reputation trackers, because I was thinking about how hard it is for the average user to really know whether a site has put the effort in to secure itself.
Take sites that put login or credit card information in the head of the request (i.e. like putting that info on the outside of an envelope).
The average user won't know that. At best, they know to look for the 'https' and/or lock on the browser. They don't know how to look at the message traffic or site code in order to tell what's really secure or not.
There's a million different ways to write a program. Some are better than others, but if it gives you the functionality you want how can you really tell?
It takes extra lines of code to validate user input and make sure nothing fishy is going on. It takes knowledge about which methods are more secure than others. But you won't necessarily see which companies put that effort in.
A reputation tracker might be useful for that.
Computing - Broad Overview of Potential Insecurities
Anyways. I wanted to pull together some of what I've covered, and how it relates to computer security. This is going to be a mile wide and an inch deep. For everything I touch on there are measures, and counter measures (and counter-counter measures), so if something piques your interest definitely look into it on a deeper level.
I used the postal service analogy before, and I'm going to stick with it (and maybe even expand upon it when I get to the server side of things).
Picture the many, many ways a handwritten letter could be insecure.
Someone might be reading over your shoulder while you write it. Or they may enter your office and look at it before you mail it off. Or they may intercept it anywhere along the way and open it.
You can take countermeasures to protect your information of course - like writing the letter in code, for example - but our ability to 'sniff' traffic makes it ridiculously easy to read at least the mailing information on the outside of your envelope.
And certain websites may be so ignorant (or lazy) that they put sensitive information - like your username, password, or credit card information - right there on the outer envelope.
Wireshark is a tool that passively collects information on all the letters passing through an area. You can filter and search through that traffic for all sorts of information...
And btw, there are all sorts of tools that allow you to sniff traffic, whether over a physical wire or off the wifi. Such tools are not necessarily bad in and of themselves, it's just that they can definitely be used for malicious purposes.
Even if sensitive websites (like banks, or online retailers) use https, and have that little green lock on the top indicating all the messages are transmitted in some sort of cipher, if you use the same password on every other site all it takes is one insecure site and a hacker can find out what other sites you've visited and try using your password there.
That's not even including efforts to misdirect your mail to a fake website, or add malicious software to an ad on a legitimate website, or cross-site scripting (which I don't fully understand yet, but hope to get into in more detail later.)
Which means that there are multiple cases where your information can be stolen, well before you've even reached any particular website. Talking about what happens there is going to take way more work, so I'll get into that next time.
Monday, March 4, 2019
Computing - User Input, Web Applications, and Security
I've also been reading a rather excellent book on how hackers access information through a business's particular web site - The Web Application Hacker's Handbook, which explains so much about why/how security is so complicated these days.
The issue comes back, yet again, to user input. See, since computers use 1's and 0's for everything, the only way they know whether a sequence of 1's and 0's is supposed to be a number, or a letter, or a location, or an instruction is because of the context.
Most computer science programs will focus on teaching their students at least one programming language, in part to teach you programming logic. There are some slight differences (i.e. object-oriented programming and whatnot), but the basics are fairly similar.
It's the syntax that changes. Different languages have different ways of telling the computer when an instruction ends. And so in some languages we use a ';' to indicate the end of one line of instruction, so the computer knows when to stop. And if you go to your web browser and select 'web developer' or somesuch from the viewing options, you'll see the code for the webpage you are viewing. It will probably have something like <head> and </head> or <body> and </body> to indicate which text is part of the head, and should be formatted as indicated elsewhere for headers, and so on for the body. Note the closing '/' to indicate the end of a section.
It may seem overly technical to anyone not in computers, but bear with me.
A very common 'first program' in any language is to print "Hello, World!" to your screen. In Java the particular line of code would be -
System.out.println("Hello, World!");
Running a program that includes that line will give you 'Hello, World!'. But let's say you want to add another line, "How are you?" You could enter
System.out.println("Hello, World! How are you?");
And the result would look something like "Hello, World! How are you?"
But what if you want the second half to print on a new line (entering a carriage return, in old typing terminology)?
There's code for doing that, but the computer reads everything within the quotation marks as letters and prints accordingly. So you need an escape character, something to tell the computer "Hold up, wait. This needs to be processed differently."
In Java, you can use the '\' as the escape character, so if you said
System.out.println("Hello, World!\nHow are you?");
It would print something like:
Hello, World!
How are you?
Escape characters are actually kind of important, because that's how a hacker can tell the computer to process their input as commands rather than simple text.
I've only just begun to read the book on web hacking, but the first few chapters easily conveyed just how difficult it is to validate user input.
For example, a hacker might try typing <script> to do something (presumably start a script? Looks like the type of code you see in web pages, like those <head> bits, or xml code).
A business may code their application to remove all <script> instances from user input. But what if they type in <scr<script>ipt>?
Now if you take out the <script>, it collapses the rest of the line into another <script>.
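The collapsing trick is easy to try out. Below is a small Java sketch; the class and method names are made up, and it only looks at the exact lowercase text <script>, so treat it as an illustration of the problem and not as a real filter.

```java
// Hypothetical sketch of a filter that strips "<script>" from user input.
class ScriptFilter {
    // Naive filter: removes every "<script>" it finds, in a single pass.
    static String stripOnce(String input) {
        return input.replace("<script>", "");
    }

    // Slightly more careful: keep stripping until the text stops changing.
    static String stripUntilStable(String input) {
        String previous;
        String current = input;
        do {
            previous = current;
            current = current.replace("<script>", "");
        } while (!current.equals(previous));
        return current;
    }

    public static void main(String[] args) {
        String sneaky = "<scr<script>ipt>";
        System.out.println(stripOnce(sneaky));        // prints <script>
        System.out.println(stripUntilStable(sneaky)); // prints an empty line
    }
}
```

Even the repeated version only handles this one spelling. Different capitalization or an encoded form of the same characters would still slip past it, which is part of why input validation is so hard to get right.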
To add to the confusion, recall that every character can be represented by numbers (i.e. ASCII coding format)... so the hex number 27 can be read as an apostrophe, or 25 can be read as a %, 3c can be <, etc.
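As a quick illustration of characters hiding behind numbers, here is a Java sketch that turns hex codes back into characters. The names are made up, and the %XX decoder is a simplified, hand-rolled one that assumes well-formed input (a malformed code would throw an exception).

```java
// Hypothetical sketch: turning hex codes back into the characters they stand for.
class HexDecode {
    // Turns a two-digit hex code such as "3c" into its character.
    static char fromHex(String hex) {
        return (char) Integer.parseInt(hex, 16);
    }

    // Replaces each %XX sequence with its character; everything else is copied as-is.
    static String percentDecode(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                out.append(fromHex(s.substring(i + 1, i + 3)));
                i += 2; // skip the two hex digits just consumed
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fromHex("27"));                 // prints '
        System.out.println(fromHex("25"));                 // prints %
        System.out.println(percentDecode("%3cscript%3e")); // prints <script>
    }
}
```

A filter that looks for the literal text <script> would see nothing suspicious in %3cscript%3e, yet anything that decodes the input afterwards ends up with the real thing.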
If you know what language is being used, and how the input is being read... to include the removal of certain characters... you can come up with a particular line of code that will get through all the filters and do something unintended.
Apparently, many businesses use multiple different tools to create their web applications. And if those tools use different programming languages, and react to different escape characters, then there's no one-size-fits-all way of checking user input for hacking attempts.
Saturday, March 2, 2019
Relevant Article
Friday, March 1, 2019
Computing - the Internet, Cont.
So I'm going to talk about databases for a bit. It's relevant, I promise, and I'll try to turn it into a personal story of sorts.
When you grow up with technology, but aren't a techie - in my case, at least - you just sort of absorb odds and ends as you come across them. Databases have been like that for me, a topic that started impinging on my awareness about a decade ago.
It's not that I didn't know the term, or have some general sense of what it meant, but when a colleague of mine decided our reports would be easier to work with in a database rather than the excel files we'd been using, I started learning more about when and why they're important.
And the key to understanding the most common type of database is to understand relationships. That is, if you wanted to create an excel file for customer orders you would probably have to enter in the customer name and address for every single time that the customer placed an order.
If it's in a database, on the other hand, you can have a customer table and an order table, and establish a relationship between the two (i.e. a customer id number). In the customer table you only have to enter the address once, but because it's associated with the corresponding customer id in the order table you can easily create a table showing all the orders that customer has placed. Including the mailing address.
Businesses use these all. the. time. That way you can have a table showing all your employees, and a separate table showing all the wages paid, as well as a table for all customers, and a table for all the orders they placed, and a table for all the bills charged to the customer... and businesses can run queries against that database to get the information they need. Like a history of all the orders a particular customer placed, or an aggregate of all orders from a particular region, or all the customers who owe the company money.
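To show the customer/order relationship without an actual database, here is a toy Java sketch. Everything in it (the class, the Order type, the sample address and items) is made up for illustration; a real database would do this matching for you with a query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: two "tables" linked by a customer id.
class CustomerOrders {
    // One row of the order table: which customer ordered which item.
    static class Order {
        final int customerId;
        final String item;

        Order(int customerId, String item) {
            this.customerId = customerId;
            this.item = item;
        }
    }

    // Pairs each of one customer's orders with the single address stored for that customer.
    static List<String> ordersFor(int customerId, Map<Integer, String> addresses, List<Order> orders) {
        List<String> rows = new ArrayList<>();
        for (Order o : orders) {
            if (o.customerId == customerId) {
                rows.add(o.item + " -> " + addresses.get(customerId));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<Integer, String> addresses = new HashMap<>();
        addresses.put(1, "12 Elm St"); // the address is entered once
        List<Order> orders = Arrays.asList(
                new Order(1, "teapot"), new Order(2, "lamp"), new Order(1, "mug"));
        System.out.println(ordersFor(1, addresses, orders)); // prints [teapot -> 12 Elm St, mug -> 12 Elm St]
    }
}
```

The address lives in one place, so if the customer moves you change it once and every order report picks up the new one.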
I've worked more with Microsoft Access than the big enterprise databases (Oracle, MySQL, SQLServer, etc.), but most have a similar structure for querying the data - Structured Query Language (hence the 'SQL' in the databases above).
SQL statements are actually fairly obvious when you keep them simple, though they can grow to be horrendously complicated. A basic query might look like this:
SELECT * FROM Users WHERE Name = 'foo' AND Pass = 'bar'
It's basically saying you want to select everything from the Users table where the name is 'foo' and the password is 'bar'. (The * is a wildcard; it means select every column for the rows that match the criteria.)
So why is this important?
Well, websites these days are interactive. You don't just download a page and read what it says. You login - to check your bank account, or the news for a site you've subscribed to, to check your friends on Facebook, or post to Instagram, check your employee benefits, request time off, or any number of things online - and that login is generally used to find your specific information in a database.
When you go to a website, well... to use the postal service analogy, that site is getting thousands and thousands of letters a day, and it is passing your particular information to a warehouse (the database) that processes your request and sends back the information you wanted.
The website and database together have to a) keep track of your mail vs. all the other users' mail, so you don't have to send your username and password every time you send a letter and b) make sure that you only see your information, it doesn't send you someone else's response or let someone else see yours.
That means businesses operating over the internet generally have a way of tracking your session (that's where cookies come into play, though it's not the only way of doing so) and some sort of system for authenticating you as the user and granting access to the information you are authorized to see.
And most of those have to deal with the exact same problem I listed when describing the buffer overflow - user input.
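Here is a small Java sketch of how user input can cause trouble for a login query like the one above. It assumes the site builds its SQL by gluing the typed-in name and password into the text; the class and method names are made up, and nothing here talks to a real database.

```java
// Hypothetical sketch: building a login query by pasting user input into SQL text.
class LoginQuery {
    // Unsafe on purpose: the input goes straight into the query, quotes and all.
    static String buildQuery(String name, String pass) {
        return "SELECT * FROM Users WHERE Name = '" + name + "' AND Pass = '" + pass + "'";
    }

    public static void main(String[] args) {
        // What the programmer expected:
        System.out.println(buildQuery("foo", "bar"));
        // prints SELECT * FROM Users WHERE Name = 'foo' AND Pass = 'bar'

        // What someone who knows SQL might type as a user name:
        System.out.println(buildQuery("foo' -- ", "anything"));
        // prints SELECT * FROM Users WHERE Name = 'foo' -- ' AND Pass = 'anything'
    }
}
```

In many SQL dialects a double dash starts a comment, so in the second query the database would ignore the password check entirely. The usual defense in Java is to hand the input over separately from the SQL text, for example with a JDBC PreparedStatement and ? placeholders, so the input is always treated as data and never as part of the command.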
Thursday, February 28, 2019
Computing - the Internet, Cont.
A couple of points need to be made here - we remember names better than numbers, but computers run on numbers. Binary, to be exact.
When we put 'www.google.com' into a browser, we are sending a request to whatever computer hosts the file for that site, and asking them to send us the file.
However, it's a bit like putting 'New York, NY' on the letter... and then the post office asks us for the full and complete zip code, to include the final four digits (the zip might be 10004-1007). In computing, that's actually the IP address... which is really just a fancy way of making all the 1s and 0s a little more human readable. (254.171.170.75 is just a fancy way of saying 11111110101010111010101001001011 by breaking the binary into groups of eight and converting to decimal. Since 11111111 is 255 in decimal, each grouping can store a number between 0 and 255. They started running out of numbers as the internet boomed, and came up with a few workarounds to help extend the system for a little while, but ultimately came up with IPv6 - or Internet Protocol version 6 - as a better solution. IPv6 numbers are in hexadecimal, so you'll see something like FE80:CD00:0000:0CDE:1257:0000:211E:729C, which is again a fancy way of organizing a string of 1's and 0's.)
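If you want to check the grouping yourself, here is a short Java sketch that converts a dotted address into its 32 binary digits. The class and method names are made up, and it assumes each of the four parts is a number from 0 to 255.

```java
// Hypothetical sketch: writing an IPv4 address out as 32 binary digits.
class IpBinary {
    // Converts dotted-decimal text such as "254.171.170.75" into binary, eight digits per group.
    static String toBinary(String dotted) {
        StringBuilder bits = new StringBuilder();
        for (String part : dotted.split("\\.")) {
            int octet = Integer.parseInt(part); // assumed to be 0-255
            String b = Integer.toBinaryString(octet);
            // Left-pad with zeros so every group is exactly eight digits long.
            for (int i = b.length(); i < 8; i++) {
                bits.append('0');
            }
            bits.append(b);
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        System.out.println(toBinary("254.171.170.75")); // prints 11111110101010111010101001001011
        System.out.println(toBinary("192.168.1.1"));    // prints 11000000101010000000000100000001
    }
}
```

The padding step matters: 75 on its own is 1001011, only seven digits, and without the leading zero the groups would no longer line up.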
So the computer needs a number, and the address you entered is not it. So the computer has to have a way of looking up what number is associated with that address, sort of like my zip code finder above.
In computers, the entire process relies on what's called the Domain Name System, or DNS. Most of what you can find online likes to describe DNS as something like a telephone book, but I used zip code for a reason.
Let's say you go to the USPS zip code finder and enter an address in New York, NY... but someone's hacked that site, and it gives you the zip code 92101-1007. The post office routes it off the zip code, so that letter is going to San Diego, CA. You may have intended to send it to New York, but there's this thing called 'DNS poisoning' where hackers will give your computer an address that isn't the one you intended. If they're really good, they'll even make the site look exactly like the site you expected. It's just, well, it might record your login information or trick you into downloading something malicious.
So you entered an address in your browser, and your computer looks up the number associated with that address. It then sends out the letter(s) to your destination, asking for the web page. Your destination receives your request and serves a response, sending a stream of letters back to you.
That's a very rough idea of what's going on, and I'll dig into it a bit more...
Another time. I've been messing around creating VMs (virtual machines) this morning (CentOS installed fine, but why oh why does my Ubuntu say there was an unresolvable error? Ugh! And I need to find out if I have any old Windows software lying around. Even though I'm just creating some VMs to practice my skills with, virtual machines are computers in their own right. They use the keyboard, mouse, CPU, display screen, etc. that the physical computer does. See, your CPU is amazingly powerful... and if you're not using it, it sits idle. Once you've downloaded that webpage and sit there reading it, there's not really a lot for your CPU to do. Some genius realized that you can make computing more efficient if you kept that CPU humming... so putting multiple virtual computers on one physical computer was one way of doing that - and provided some additional layers of security and whatnot. Anyways, I'm not really intending to use the VM as a Windows XP or Windows 7 machine, but it could be used that way and the software is proprietary.)
Anyways, I don't really feel like delving further into internet basics.
Wednesday, February 27, 2019
Computing - the Internet
Anyways. Today I wanted to talk a bit about the internet. The usual place to start is with a rather technical discussion, but I'm trying to translate these concepts into everyday language, so I'm going to try something different. Understand that I am vastly oversimplifying some things, and skipping over a lot of detail here.
With that out of the way, let's talk about the internet.
Actually, let's start with something most people are more comfortable with - tracking a package.
Whether it's USPS, DHL, UPS, or Amazon, every carrier will show the progress of a package as it moves from its source to its destination. I think most people understand what is going on here without thinking about it too much, either.
That is, the carrier isn't going to send your mail directly from one place to another. There's too much mail going to too many places and if you need that to happen, you'd hire a courier to do the job.
Instead, the post office (and other such carriers) regularly carry mail to nearby facilities, and will route everything they receive to the facility best suited for sending that mail on to its final destination. As you can see in the picture above, the package goes from Chicago to Indianapolis before going to Fort Wayne. Indianapolis is a bit further south than Fort Wayne, so routing it this way added a bit more time than sending it directly to Fort Wayne, but there probably isn't enough mail going straight to Fort Wayne to warrant a direct connection... so Indy was the closest place along the way (and much closer than sending the mail to St. Louis!).
It's all done for the same reason airlines have hubs, and it's difficult to get a flight straight to your destination unless you live at one of those hubs, and are going straight to another hub.
The internet is a bit like that, as well. When you get on an internet browser and put in a location, the page you're looking for is not floating out there in the digital aether. It's sitting on a computer somewhere. (Okay, so it might be sitting on a virtual machine, with twenty other virtual machines... but those are all sitting on a physical machine somewhere. Ultimately, everything you do on the internet is tied to a real, physical place.)
If you want to go see funny cats, and you put https://icanhas.cheezburger.com/lolcats in the browser, that /lolcats indicates a subdirectory where the file is stored, much like you might find a picture by going to C:\Users\Yourname\Pictures.
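Java's built-in java.net.URI class can pull an address like that apart, which makes the pieces easier to see. The wrapper class and method names below are made up; only URI.create, getScheme, getHost, and getPath come from the standard library.

```java
import java.net.URI;

// Hypothetical sketch: splitting a web address into its parts.
class UrlParts {
    // The protocol in front of the "://".
    static String scheme(String url) {
        return URI.create(url).getScheme();
    }

    // The name of the computer (server) being asked.
    static String host(String url) {
        return URI.create(url).getHost();
    }

    // The location being requested on that server.
    static String path(String url) {
        return URI.create(url).getPath();
    }

    public static void main(String[] args) {
        String url = "https://icanhas.cheezburger.com/lolcats";
        System.out.println(scheme(url)); // prints https
        System.out.println(host(url));   // prints icanhas.cheezburger.com
        System.out.println(path(url));   // prints /lolcats
    }
}
```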
Your internet browser will send a request for the webpage to the site (called a 'server', since it serves webpages to whoever asks) and that site will send you the information you requested. The browser knows how to read the result and present it to us with all the fonts, colors, and pictures the creator intended.
Although the internet works a bit like the postal service, there are also (numerous) key differences. For one thing, a post office generally stays open for years. Decades even. Carrier routes are somewhat stable, and any significant changes tend to be advertised well in advance.
In contrast, routers may appear and disappear on the internet in the blink of an eye. Someone may add a router in one location, and another might break down and go off the net. So the routers, or 'sorting facilities' of the post office, regularly send out messages to their neighbors going "Are you still there?"
And whenever a new router comes online, it sends out messages to all its neighbors saying "Tadah! I AM HERE!"
The sorting facilities constantly have to update their routing information to take into account any sudden loss of connection, and they have to sense whether their neighbor is getting flooded with mail so they can lighten the load a bit and route things to a different facility.
There are other differences as well, such as size limits. If you wanted to send a letter, or mail a package, most of the time you can do it all at once. And when you send it, you can send it off and forget about it, relying on the carrier to get it where it needs to be. (Like UDP, an internet protocol that fires and forgets.)
That won't work for certain things, though I'm editing to say video is UDP, because waiting for confirmation slows it down too much. Web pages and file transfers are TCP, since you want to make sure everything arrives.
So you need mechanisms for keeping track of what order the letters were sent, you need to verify that all the letters were actually received correctly (and that any missing letters are resent), and the letters ought to be sealed so that you know if someone got in and tampered with them.
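The numbering and resending part can be sketched in a few lines of Java. This is a toy model with made-up names: it numbers whole letters starting from 1, whereas real TCP numbers the bytes and handles far more, and it says nothing about sealing the letters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: putting numbered letters back in order and spotting gaps.
class LetterStream {
    // Letters received so far, keyed by sequence number (a TreeMap keeps them sorted).
    private final TreeMap<Integer, String> received = new TreeMap<>();

    // Stores a letter under its sequence number, whatever order it arrives in.
    void receive(int seq, String text) {
        received.put(seq, text);
    }

    // Sequence numbers from 1 to total that have not arrived and need to be resent.
    List<Integer> missing(int total) {
        List<Integer> gaps = new ArrayList<>();
        for (int seq = 1; seq <= total; seq++) {
            if (!received.containsKey(seq)) {
                gaps.add(seq);
            }
        }
        return gaps;
    }

    // Joins the letters in sequence order, regardless of arrival order.
    String assemble() {
        StringBuilder sb = new StringBuilder();
        for (String text : received.values()) {
            sb.append(text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        LetterStream stream = new LetterStream();
        stream.receive(3, "cats");
        stream.receive(1, "funny ");
        System.out.println(stream.missing(3)); // prints [2]
        stream.receive(2, "little ");          // the resent letter shows up
        System.out.println(stream.missing(3)); // prints []
        System.out.println(stream.assemble()); // prints funny little cats
    }
}
```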
And so on, and so forth. You can get into the nitty gritty details on how this is managed if you want, there are plenty of resources out there.
Monday, February 25, 2019
Computing - Hacking, Buffer Overflows, Etc
Hacking, I've come to learn, doesn't necessarily mean anything malicious. It's more about a mindset... about knowing how the system works, and how to manipulate it in order to make it do what you want. Even if what you want was not the original intent.
But what does that mean, really? Well, early programs sort of took it on faith that people would use them the way they were intended... so most programmers didn't code ways of catching unexpected input.
A lot of programs are interactive - they ask us for our user names, or birthdate, or what our favorite fruit is, or what command we want to give to a character in a computer game.
And here's the thing - the programmer doesn't necessarily know what we're going to type. If it's asking for our favorite fruit, it could be 'apple', or 'banana', or something more obscure like 'jackfruit' or 'starfruit'. The program has to be prepared for whatever we type in...
And sometimes, well... people make mistakes. Typos. They might be asked for their favorite fruit, and instead hit the 'ENTER' key without typing anything. Or accidentally type ap4le. Or bannnana. Or 42.
And sometimes, someone who knows the system very well can deliberately type in something else. For example, the program might have reserved enough space for whatever they consider appropriate for a fruit. Maybe there's enough space to type jackfruit_____ (giving extra room, just in case.)
But what happens if someone types in jackfruit_____gotomyvirus?
It's possible for everything past the buffer, everything past the space reserved for that user input, to overwrite whatever happened to be at that location. It's still all in binary, of course. So it's as though the computer had said 01100101 and now says 01001011. But if that location was, oh, let's say instructions for the next step of the program... and the new instructions basically tell it to go to another location and run a bit of code that does something malicious (i.e. 'go to my virus'), then congratulations. Your program has a vulnerability that has now been exploited and used to launch a computer virus.
Computers only do what they are told to do, no more and no less. If it thinks a program is telling it to go to a location, it doesn't know that the command was altered... it just follows the instruction and goes to that location. And if that location includes a list of other commands, commands telling it to start a remote trojan that takes over your computer, or secretly record everything you type, or run a program mining for cryptocurrency, or check for orders from another computer and send multiple information requests to a designated website... well, it's just doing what the instructions said.
There are countermeasures, of course. Secure coding practices that check user input for things like this, and/or cut off anything that goes beyond the allotted space.
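Here's a toy simulation of the jackfruit example. Python itself won't let you overflow a buffer like this, so I'm faking memory as one long row of bytes: 14 bytes reserved for the fruit, followed directly by a pretend 'next instruction'. The names and the layout are all made up for illustration. Real overflows happen in languages like C and overwrite things like return addresses, not readable text.

```python
BUFFER_SIZE = 14  # room for "jackfruit" plus some padding


def fresh_memory():
    # 14 bytes reserved for the fruit, followed by the "next instruction".
    return bytearray(b"_" * BUFFER_SIZE + b"goto_step_2")


def unsafe_store(memory, user_input):
    # No length check: extra bytes spill into whatever comes next.
    data = user_input.encode("ascii")
    memory[0:len(data)] = data
    return memory


def safe_store(memory, user_input):
    # Cut the input off at the end of the reserved space.
    data = user_input.encode("ascii")[:BUFFER_SIZE]
    memory[0:len(data)] = data
    return memory


def next_instruction(memory):
    return memory[BUFFER_SIZE:].decode("ascii")


attack = "jackfruit_____gotomyvirus"
print(next_instruction(unsafe_store(fresh_memory(), attack)))  # gotomyvirus
print(next_instruction(safe_store(fresh_memory(), attack)))    # goto_step_2
```

The only difference between the two versions is one slice that throws away anything past the reserved space. That's the "cut off anything that goes beyond" countermeasure.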
This also is just one example of a vulnerability, to give readers a sense of how it works.
(It's also just one area of this vast realm we call 'cybersecurity' that I'm interested in. Not so much finding those vulnerabilities and exploiting them - though there are people who make a living finding such things - so much as looking at a virus and figuring out how it does what it does. I think I'd have to set up a virtual machine, get some code I suspect is malicious, and then watch/monitor the actions it takes when it runs. I'd probably need something that breaks down those actions into machine language, and have to understand that language well enough to go looking for where the instructions are altered. Figure out where something is entered to overwrite what was supposed to be in the program and put in new instructions. And it'll look different for the various vulnerabilities, too. Plus there's an ongoing cat and mouse game to this. That is, anti-virus programs can identify sequences of 1's and 0's that indicate something malicious is going on, so hackers sometimes encrypt the virus or alter it on a regular basis so that it looks like something different.
I haven't gotten down into the weeds of this yet. Tbh I'm hesitant to start up my own 'sandbox' and risk infecting my systems if I do something stupid. I'd sort of prefer to know someone already doing this kind of work, and learn from them as much as possible.
I like the idea, though, and may just jump in and start messing around with it at some point. If I break something, well, I'll just have to learn how to fix it afterwards, right? It's just... there are tons of other interesting things to explore, as well!)
Sunday, February 24, 2019
Computing - Binary, Hex, and Digital Forensics
I suppose the best place to start is with binary.
Computing, for most people, is a bit like driving a car. That is, we all know how to put the key in the ignition, start a car, turn the steering wheel (to direct the car), press the accelerator and the brake, select forward or reverse, and so on and so forth... but most people don't necessarily understand what is going on behind the scenes. What happens when you press the brakes, or accelerator, or turn the wheel.
By the same token, we all know how to turn on a computer (or smartphone), how to surf the internet, how to point and click with a mouse or type with a keyboard... but we don't necessarily understand what the computer is doing behind the scenes to make the magic happen. How does it know what a particular mouse click means? Or how to interpret what we type in a keyboard? How does it know what to do when we type a URL in an internet browser?
The very, very basics come down to binary. Most computers use electricity, of course, so it's all tied to electrical signals. On or off. (You can actually create computers using other means, so long as you have some way of indicating 1 or 0... like punch cards, which can be punched or not punched. It's just that electrical signals can be fast, and we can create electrical circuits that do the calculations with a very small amount of space.)
The computer gets a series of 1s and 0s, and it knows how to interpret them based on computer architecture and the efforts of previous computer scientists to build a framework for interpreting particular sequences of 1s and 0s. In some cases, that series of 1s and 0s might indicate a particular action to take (the machine language of that particular system), or a memory location where the data is stored, or the actual data stored at that location.
Behind the scenes, your computer is processing all sorts of 1s and 0s to store numbers and act on numbers, and a modern processor can handle billions of instructions every second.
The words I type here can be coded in binary (ASCII says that certain strings of 1 and 0 refer to specific letters. So the word 'word' can be coded as 01110111 01101111 01110010 01100100.) Again, computers build upon all the work that came before, and things like ASCII were created so we had a consistent way of coding 1s and 0s to indicate letters. (There are other encoding formats, like UTF-8 and UTF-32. Since computers are used for languages that don't use the Roman alphabet, it's important to have codes for all the other possible symbols.)
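If you want to see this for yourself, here's a quick sketch that turns text into its ASCII 1s and 0s and back again. The function names `to_binary` and `from_binary` are just ones I picked.

```python
def to_binary(text):
    # Each ASCII character becomes one byte, shown as eight 1s and 0s.
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


def from_binary(bits):
    # Read each group of eight digits back into a byte, then into a letter.
    return bytes(int(group, 2) for group in bits.split()).decode("ascii")


print(to_binary("word"))  # 01110111 01101111 01110010 01100100
print(from_binary("01110111 01101111 01110010 01100100"))  # word

# Other encodings use different numbers of bytes for the same character.
print(len("中".encode("utf-8")))      # 3
print(len("中".encode("utf-32-be")))  # 4
```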
The machine needs to know how to interpret any particular series of 1's and 0's, whether it indicates a letter, or Chinese character, or number, or instruction, or memory location.
We have problems understanding binary, of course, so we have ways of making this more human readable. (Hexadecimal, or base-16, can seriously reduce how much space it takes to write binary. We're most familiar with decimal, of course, but using 'A' for 10, 'B' for 11, and so on until you get 'F' for 15 is convenient. It's basically because four binary digits can be represented by one hexadecimal digit. Since bits are usually grouped in sets of 8, or a byte, two hex characters can represent a byte. So 'word', in hex, is 776F7264. 77 is 'w', 6F is 'o', 72 is 'r', and 64 is 'd'. It's useful for a variety of reasons, and very basic machine instructions are often shown in hex instead of binary because it's easier for humans to understand... just remember that the computer still sees it as a string of 1s and 0s.)
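The four-bits-per-hex-digit trick is easy to check by hand, or with a few lines of code. This is just a sketch with a function name I made up, and it assumes the number of bits is a multiple of four.

```python
def binary_to_hex(bits):
    bits = bits.replace(" ", "")
    # Every group of four bits maps to exactly one hex digit.
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(format(int(group, 2), "X") for group in groups)


print(binary_to_hex("01110111 01101111 01110010 01100100"))  # 776F7264
print(bytes.fromhex("776F7264").decode("ascii"))              # word
```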
Digital forensics... well. Every file you have is stored somewhere as a series of 1s and 0s, with some additional information (the header, for example) that helps the computer understand what that sequence of numbers means. The header might have a string of 1s and 0s indicating it's a jpg picture, or a Word document, and in hex that header would be 'FF D8 FF E0 00 10 4A 46 49 46 00 01 01' for jpg, or 'D0 CF 11 E0 A1 B1 1A E1' for an older .doc file. (Newer .docx files are really ZIP archives, so they start with '50 4B 03 04' instead.)
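Here's a rough sketch of how a tool might use those headers to guess what a file is, no matter what the file extension claims. The `SIGNATURES` table and the `identify` function are my own, and the table only holds a few entries. Real forensic tools know hundreds of signatures.

```python
SIGNATURES = {
    bytes.fromhex("FFD8FF"): "JPEG image",
    bytes.fromhex("D0CF11E0A1B11AE1"): "older Office document (.doc and similar)",
    bytes.fromhex("504B0304"): "ZIP archive (which includes .docx)",
}


def identify(first_bytes):
    # Compare the start of the file against each known signature.
    for magic, kind in SIGNATURES.items():
        if first_bytes.startswith(magic):
            return kind
    return "unknown"


jpg_header = bytes.fromhex("FF D8 FF E0 00 10 4A 46 49 46 00 01 01")
print(identify(jpg_header))        # JPEG image
print(identify(b"just some text")) # unknown
```

To try it on a real file, you'd read the first few bytes with something like `open(path, "rb").read(16)` and pass them in.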
When you delete a file, the computer doesn't erase all those 1s and 0s. It just changes a bit (a bit being one digit that's either 1 or 0) to indicate that that space is now free, and if/when you save something else it *might* decide to save it where the old file used to be. (Actually, anyone who has deleted a file to the trash bin knows there's two levels of 'delete'. The first delete makes a small change that puts it in the trash bin, but you can still 'restore' the file as the computer won't try saving anything where that file exists. When you empty the trash bin and 'permanently' delete it, the computer now considers that space fair game.)
That's part of how digital forensics works... it can find the series of 1s and 0s still in existence. If all that was changed was the bit indicating whether the space was available or not, it can undelete the entire file. So long as it's not been written over already. (There's more to it than that. Sometimes something is saved over part of the previous file's data, so you might lose the beginning portion and still be able to understand what the rest of the file held. And if the information was overwritten in a known process, you can reverse the process to recreate what used to be there. I now understand what some of the IT guys in the military were talking about when they digitally 'shred' something... they're basically running a program that repeatedly overwrites the data with 1's and 0's so that it can't be reversed and the data can't be recreated.)
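Here's a toy model of that, with a pretend disk made of numbered slots. Each slot has its data plus an 'in use' flag. The `ToyDisk` class is entirely made up. Real file systems are far more complicated, but the idea of deleting by flipping a flag is the same.

```python
class ToyDisk:
    """Each slot holds data plus a flag saying whether the slot is in use."""

    def __init__(self, slots):
        self.data = [b""] * slots
        self.in_use = [False] * slots

    def save(self, content):
        # Write into the first slot marked free, replacing whatever was there.
        # (This sketch assumes at least one slot is free.)
        slot = self.in_use.index(False)
        self.data[slot] = content
        self.in_use[slot] = True
        return slot

    def delete(self, slot):
        # Only the flag changes; the bytes stay where they were.
        self.in_use[slot] = False

    def recover(self, slot):
        # A forensic tool reads the leftover bytes regardless of the flag.
        return self.data[slot]

    def shred(self, slot):
        # Overwrite the bytes themselves so nothing is left to recover.
        self.data[slot] = b"\x00" * len(self.data[slot])
        self.in_use[slot] = False


disk = ToyDisk(slots=2)
letter = disk.save(b"secret letter")
disk.delete(letter)
print(disk.recover(letter))  # b'secret letter' is still sitting there
disk.save(b"grocery list")   # reuses the freed slot
print(disk.recover(letter))  # b'grocery list', so the old data is gone
```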
Encryption scrambles those 1's and 0's in a systematic way, so that (if you have the key) you can reverse the process and get the original information... but someone without the key can't.
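The simplest way I know to show the reversible-with-a-key idea is XOR, which flips bits according to a key and flips them back when you apply the same key again. To be clear, this is a toy and not something to protect real data with. Real encryption uses carefully designed algorithms like AES. The function name and the key here are made up.

```python
from itertools import cycle


def xor_with_key(data, key):
    # XOR each byte with the matching key byte, repeating the key as needed.
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


secret = xor_with_key(b"meet at noon", b"rosie")
print(secret)                          # scrambled bytes
print(xor_with_key(secret, b"rosie"))  # b'meet at noon'
print(xor_with_key(secret, b"wrong"))  # still scrambled
```

Running the same function twice with the same key gets you back where you started, which is the whole trick.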
That's a very, very simplified explanation that doesn't even go into what the CPU does, much less how this works with websites and internet connectivity. I'll post more another time.
Sunday, February 11, 2018
CS Cracks Me Up
"I'm doing homework and came across something called a *-property. Pronounced star property.
Why? Because nobody could come up with a name. It was written that way in a report so it'd be easy to find and update once they came up with a name, but they never did.
This field is sometimes so... goofy. They make jokes all the time, like naming the programming language C++ because it means "add one to C" in programming terms, or rather it's a way of saying "one better than C."
I've always kind of enjoyed how humanity seeps out in unexpected ways (like how telegraphers develop a distinct style and personality even when transmitting something as straightforward as Morse code), but I somehow still don't expect it to essentially permeate my textbooks like this.
Tuesday, June 20, 2017
School Update
Saturday, December 10, 2016
Further Updates
I've decided to go back to school, for a second bachelor's in Computer Science. I've already been accepted to the online program at U of I, Springfield. I've applied for my Post-9/11 GI Bill benefits, though I don't think I'll get 100% (my commitment from ROTC doesn't count towards time in service for this, so when they calculate the percentage it'll be based on the year or so after I finished that).
I feel - good. Yet terrified. But good.
That is, it feels like the right decision. I've got some more details to work out (I want to study Finance as well. UIS has a finance minor, but not entirely online. I might be able to transfer to U of I, Champaign Urbana later. Maybe. I've got options and possibilities.)
It feels right... but holy hell, what have I just gotten myself into?!? I'm going to quit a rather decent paying job (with benefits!) to study something that will require calculus (which I haven't done in over twenty years) in a field where I - a woman pushing 40 - will definitely not be the norm. And what will this do to my finances?!? I think I can find freelance work or a part time job to fill in the gaps, but I'm worried that three months from now I'll be kicking myself for doing something stupid.
So. Yeah.
Scary. But exciting.