Monday, March 4, 2019

Computing - User Input, Web Applications, and Security

I found a marvelous source for creating the virtual environment I want, so I am happily configuring my virtual network environment. Or not so happily, as the case may be. Since software tends to update almost as soon as any guides are created, there's always a bit of fun to be had in figuring out how to apply the instructions to your actual system. (I got pfSense installed, and it can ping out, but I can't seem to access the website for configuring it from my laptop. Google isn't getting me the answer I want, as I keep getting hits for similar-but-not-quite-the-same problems. I wonder if the same would hold true if I created a VM on the virtual network?)

I've also been reading a rather excellent book on how hackers access information through a business's particular web site - The Web Application Hacker's Handbook, which explains so much about why/how security is so complicated these days.

The issue comes back, yet again, to user input. See, since computers use 1's and 0's for everything, the only way they know whether a sequence of 1's and 0's is supposed to be a number, or a letter, or a location, or an instruction is from the context.
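To make that concrete, here's a minimal sketch in Java (the class and method names are just ones I made up for illustration). It takes a single pattern of eight bits and reads it two ways: once as a number, once as a letter. The bits never change, only the context we read them in.

```java
// Hypothetical demo class: the same bits, read in two different contexts.
class BitsInContext {

    // Read the bit pattern as a plain number.
    static int asNumber(byte bits) {
        return bits;
    }

    // Read the very same bit pattern as an ASCII character.
    static char asLetter(byte bits) {
        return (char) bits;
    }

    public static void main(String[] args) {
        byte bits = 0b01000001; // one fixed pattern of 1's and 0's

        System.out.println(asNumber(bits)); // prints 65
        System.out.println(asLetter(bits)); // prints A
    }
}
```

Nothing in the byte itself says "I am a letter". The program decides that by how it chooses to treat the value.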

Most computer science programs will focus on teaching their students at least one programming language, in part to teach them programming logic. There are some differences between languages (e.g. object-oriented programming and whatnot), but the basics are fairly similar.

It's the syntax that changes. Different languages have different ways of telling the computer when an instruction ends. In some languages we use a ';' to mark the end of an instruction, so the computer knows when to stop. And if you go to your web browser and select 'web developer' or somesuch from the viewing options, you'll see the code for the webpage you are viewing. It will probably have something like <head> and </head> or <body> and </body> to indicate which text is part of the head (information about the page, such as its title) and which is part of the body (the content you actually see). Note the '/' in the closing tag, which indicates the end of a section.

It may seem overly technical to anyone not in computers, but bear with me.

A very common 'first program' in any language is to print "Hello, World!" to your screen. In Java, the particular line of code would be:


System.out.println("Hello, World!");


Running a program that includes that line will give you 'Hello, World!'. But let's say you want to add another sentence, "How are you?" You could enter


System.out.println("Hello, World! How are you?");


And the result would look something like "Hello, World! How are you?"

But what if you want the second half to print on a new line (entering a carriage return, in old typing terminology)?

There's code for doing that, but the computer reads everything within the quotation marks as letters and prints accordingly. So you need an escape character, something to tell the computer "Hold up, wait. This needs to be processed differently."

In Java, the backslash '\' is the escape character, so if you said


System.out.println("Hello, World!\nHow are you?");


It would print something like:

Hello, World!
How are you?

Escape characters are actually kind of important, because that's how a hacker can tell the computer to process their input as commands rather than simple text.

I've only just begun to read the book on web hacking, but the first few chapters easily conveyed just how difficult it is to validate user input.

For example, a hacker might try typing <script> to do something. (It's the HTML tag that tells a browser to treat what follows as a script to run, the same type of code you see in web pages, like those <head> bits, or XML code.)

A business may code its application to remove all <script> instances from user input. But what if the hacker types in <scr<script>ipt>?

Now if the filter takes out the inner <script>, the leftover pieces, <scr and ipt>, collapse into another <script>.
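Here's a rough sketch of that trick in Java. To be clear, this is not how any particular product filters input. The class and both methods are hypothetical, and I'm assuming the simplest possible filter: one that deletes the literal text <script> wherever it finds it.

```java
// Hypothetical demo of a filter that removes "<script>" from user input.
class NaiveFilterDemo {

    // Assumed naive filter: delete every literal "<script>" in a single pass.
    static String stripOnce(String input) {
        return input.replace("<script>", "");
    }

    // A slightly more careful version: keep filtering until nothing changes.
    static String stripUntilStable(String input) {
        String previous;
        String current = input;
        do {
            previous = current;
            current = stripOnce(previous);
        } while (!current.equals(previous));
        return current;
    }

    public static void main(String[] args) {
        String attack = "<scr<script>ipt>";

        // One pass removes the inner tag, and the leftovers join up again.
        System.out.println(stripOnce(attack)); // prints <script>

        // Repeating the pass removes the rebuilt tag too, leaving an empty string.
        System.out.println(stripUntilStable(attack).isEmpty()); // prints true
    }
}
```

Even the repeated version only catches this one exact spelling. Because the match is case-sensitive, something like <ScRiPt> would sail straight through, which gives a taste of why validating input gets complicated so quickly.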

To add to the confusion, recall that every character can be represented by numbers (e.g. in the ASCII encoding)... so the hex number 27 can be read as an apostrophe, or 25 can be read as a %, 3c can be <, etc.
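A small Java sketch shows the idea, with the usual caveats: the class and method names are made up, and the "filter" is a stand-in I'm assuming for illustration. It turns hex numbers back into characters, then shows how a check for '<' can miss the same character when it arrives in percent-encoded form (the %3C style used in web addresses), decoded here with the standard java.net.URLDecoder.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo: the same character hiding behind a hex number.
class HexEncodingDemo {

    // Read a hex number (such as "27") as the character it stands for.
    static char fromHex(String hex) {
        return (char) Integer.parseInt(hex, 16);
    }

    // Decode percent-encoded text such as "%3Cscript%3E".
    static String decodeUrl(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java runtime, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Assumed stand-in filter: flags input only if it contains a literal '<'.
    static boolean looksSuspicious(String input) {
        return input.contains("<");
    }

    public static void main(String[] args) {
        System.out.println(fromHex("27")); // prints '
        System.out.println(fromHex("25")); // prints %
        System.out.println(fromHex("3c")); // prints <

        String encoded = "%3Cscript%3E";
        System.out.println(looksSuspicious(encoded));            // prints false
        System.out.println(decodeUrl(encoded));                  // prints <script>
        System.out.println(looksSuspicious(decodeUrl(encoded))); // prints true
    }
}
```

So whether the filter catches anything depends on whether it looks at the input before or after the decoding happens.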

If you know what language is being used, and how the input is being read... to include the removal of certain characters... you can come up with a particular line of code that will get through all the filters and do something unintended.

Apparently, many businesses use multiple different tools to create their web applications. And if those tools use different programming languages, and react to different escape characters, then there's no one-size-fits-all way of checking user input for hacking attempts.
