avoiding coding "representation errors"

Posted: Sun Oct 21, 2018 10:23 pm UTC
by >-)
a class of programming bugs i often commit involves mistaking the "representation" of a variable -- such as adding degrees and radians, treating a coordinate in the camera-frame as a coordinate in the world-frame, or forgetting to swap the channels of an image from BGR to RGB.

this can be alleviated with a strong type system, if i carefully create a type for each unit or representation that a coordinate/image might have. but that approach seems pretty heavy-handed: i'd need to define a bunch of wrapper classes which do nothing besides check that the types of the arguments match.

i'm not sure if using a type system to do this is the right approach since i've never seen it in real world code. so what is the solution?
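
For concreteness, a minimal sketch of the "wrapper class per representation" idea described above, in Python. The class names `Degrees` and `Radians` and the method `to_radians` are made up for illustration, and the check happens at runtime rather than at compile time.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Radians:
    value: float

    def __add__(self, other: "Radians") -> "Radians":
        # Only Radians + Radians is allowed; anything else is rejected.
        if not isinstance(other, Radians):
            return NotImplemented
        return Radians(self.value + other.value)


@dataclass(frozen=True)
class Degrees:
    value: float

    def __add__(self, other: "Degrees") -> "Degrees":
        if not isinstance(other, Degrees):
            return NotImplemented
        return Degrees(self.value + other.value)

    def to_radians(self) -> Radians:
        # Conversion is explicit, never automatic.
        return Radians(math.radians(self.value))


heading = Degrees(90.0)
offset = Radians(math.pi / 2)

total = heading.to_radians() + offset  # fine: both operands are Radians

try:
    heading + offset  # Degrees + Radians
except TypeError as err:
    print("rejected:", err)
```

The wrappers really do nothing except refuse mismatched operands, which is exactly the overhead the post is worried about.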

Re: avoiding coding "representation errors"

Posted: Sun Oct 21, 2018 10:55 pm UTC
by ucim
Try incorporating the unit in the variable name: degrees_from_north, inches_to_target... stuff like that. It won't keep the computer from making a mistake, but it will hint to the programmer to not do so.

Jose

Re: avoiding coding "representation errors"

Posted: Mon Oct 22, 2018 12:22 pm UTC
by Xanthir
I also often name functions that *take* a particular value and convert it into another value to encode that information; `fooFromRadians()`, for example, is much harder to accidentally pass degrees to. ^_^
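
A small sketch of the naming approach from the last two posts, in Python. The names `sine_from_degrees`, `bearing_degrees` and `bearing_radians` are hypothetical; nothing here is enforced by the interpreter, the names only make a mismatch visible to the reader.

```python
import math


def sine_from_degrees(angle_degrees: float) -> float:
    # The unit is part of both the function name and the parameter name.
    return math.sin(math.radians(angle_degrees))


bearing_degrees = 30.0
bearing_radians = math.radians(bearing_degrees)

ok = sine_from_degrees(bearing_degrees)

# This call would still run, but the mismatch between "_from_degrees"
# and "_radians" is visible at the call site:
#   sine_from_degrees(bearing_radians)
```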

Re: avoiding coding "representation errors"

Posted: Mon Oct 22, 2018 1:24 pm UTC
by Soupspoon
It happens to the best of them.

I usually fall down on 'dimensionless' misreferencing, like misidentifying 'nth character' with 'element n' in the zero-indexed array of characters. Or trying to work out if whatever version of spreadsheet-like MID(string,startcharacter, length) formula matches the substr(EXPRESSION,OFFSET, LEN) of something more code-like in nature that I'm more used to. And whether I need to do things off-by-one to splice out something identified by a FIND(string,match) position return of a marker (maybe multicharacter; and maybe post-splice, pre-splice or even intended to be part of the splicing grab, so already there's a LEN(match) included or not).
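
To make the off-by-one juggling concrete, here is a sketch in Python that wraps zero-based slicing behind one-based helpers. `mid` and `find_1based` are hypothetical stand-ins for the spreadsheet-style functions mentioned above, and the argument order follows the post rather than any particular spreadsheet.

```python
def mid(text: str, start_char: int, length: int) -> str:
    """Spreadsheet-style MID: start_char counts from 1."""
    if start_char < 1:
        raise ValueError("start_char is 1-based and must be >= 1")
    offset = start_char - 1  # convert to a zero-based offset exactly once
    return text[offset:offset + length]


def find_1based(text: str, match: str) -> int:
    """Spreadsheet-style FIND: 1-based position of match in text."""
    return text.index(match) + 1


line = "key=>value"
marker = "=>"
marker_pos = find_1based(line, marker)

before = mid(line, 1, marker_pos - 1)                    # marker excluded
grabbed = mid(line, marker_pos, len(marker))             # the marker itself
after = mid(line, marker_pos + len(marker), len(line))   # everything past it
```

Keeping the "minus one" in a single place means the call sites only ever deal with one convention.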

But, as already said, variable names (variations upon Hungarian notation, even if not actually type-different) are one way. Going full-blown and creating child types in an object-oriented structure that reveal the 'correct' value through autoconversion when used in a sibling-type context is perhaps another method, if you have the time and inclination to create and validate all the cross-links. But beware of rounding errors creeping in as it cross-converts, especially when the conversion is between an RGB of a grey tone and an HSL with a technically undefined hue.
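
The grey-tone case can be reproduced with Python's standard `colorsys` module (which uses HLS ordering rather than HSL). This is only an illustration of the lossy round trip, not a recommendation for any particular colour library.

```python
import colorsys

# Start from an HLS value with some hue but zero saturation (a grey).
original_hue = 0.3
r, g, b = colorsys.hls_to_rgb(original_hue, 0.5, 0.0)

# Converting back cannot recover the hue: a grey has none,
# so the module reports 0.0 instead.
recovered_hue, lightness, saturation = colorsys.rgb_to_hls(r, g, b)

print(original_hue, "->", recovered_hue)
```

Any scheme that silently converts back and forth would lose the original hue here without any error being raised.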

Re: avoiding coding "representation errors"

Posted: Wed Oct 24, 2018 5:09 pm UTC
by >-)
yes, automatically converting representations is definitely a bad idea, especially since the conversion process is often not "lossless" as you gave an example of. (also it's usually impossible to do automatically, as you can't convert a camera frame coordinate to a world frame coordinate without knowing the camera pose)
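
A sketch of why the camera-to-world conversion cannot be automatic, in Python and in 2D for brevity. `CameraPoint`, `WorldPoint` and `CameraPose` are hypothetical names; the point is that the conversion has to live on the pose, because the pose is the missing information.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CameraPoint:
    x: float
    y: float


@dataclass(frozen=True)
class WorldPoint:
    x: float
    y: float


@dataclass(frozen=True)
class CameraPose:
    """Position and heading of the camera, expressed in the world frame."""
    x: float
    y: float
    heading_radians: float

    def to_world(self, p: CameraPoint) -> WorldPoint:
        # Rotate by the camera heading, then translate by its position.
        c = math.cos(self.heading_radians)
        s = math.sin(self.heading_radians)
        return WorldPoint(self.x + c * p.x - s * p.y,
                          self.y + s * p.x + c * p.y)


pose = CameraPose(x=10.0, y=5.0, heading_radians=math.pi / 2)
seen = CameraPoint(1.0, 0.0)
in_world = pose.to_world(seen)
```

There is deliberately no `CameraPoint.to_world()` without arguments: a camera-frame point alone does not carry enough information.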

hungarian notation seems to be the right answer. as joel spolsky points out (https://www.joelonsoftware.com/2005/05/ ... ook-wrong/) the original intended usage of hungarian notation is exactly to solve this problem, NOT to prefix variables with their type just for the sake of doing so.

Re: avoiding coding "representation errors"

Posted: Thu Oct 25, 2018 6:58 pm UTC
by elasto
>-) wrote:hungarian notation seems to be the right answer. as joel spolsky points out (https://www.joelonsoftware.com/2005/05/ ... ook-wrong/) the original intended usage of hungarian notation is exactly to solve this problem, NOT to prefix variables with their type just for the sake of doing so.

That's a really interesting bit of history, thanks for pointing to that. Explains an awful lot too.

Re: avoiding coding "representation errors"

Posted: Fri Oct 26, 2018 3:02 pm UTC
by Tub
On the topic of "making wrong code look wrong", the two things that look wrong when I read the article are:
* using string concatenation to generate html
* cleanup code outside of a destructor or finally block
Clean code is a rather subjective and debatable thing, but in this case I'd rather fix the root cause.

If you have strict requirements (like preventing XSS attacks, where a single issue is fatal), you need to use safe APIs. Naming conventions will not help unless you lint for them (but then it's usually easier to write a safe API than a lint rule). If you're writing guidance software for a $1B space probe, invest the time in a type system that requires explicit units and conversions.

If it's just about making code more readable, and spending a bit less time debugging, variable and function names often benefit from a bit more verbosity. There are more options than just using hungarian prefixes, and going strictly 100% hungarian is often more of a maintenance burden than actual help. Find a balance that works for you, and be more verbose on identifiers with a huge scope.

Re: avoiding coding "representation errors"

Posted: Fri Oct 26, 2018 3:16 pm UTC
by ucim
Tub wrote:* using string concatenation to generate html
Because $start+$middle+$end could contain a $middle that accidentally closes a tag in $start? What would you recommend? There are times where components are self-contained:
$middle =<tags> stuff </tags>
but there are times when the "stuff" is computed separately from the tags surrounding it (formatting depends on one thing, content depends on another). Then what?

Jose

Re: avoiding coding "representation errors"

Posted: Fri Oct 26, 2018 10:40 pm UTC
by Tub
ucim wrote:Because $start+$middle+$end could contain a $middle that accidentally closes a tag in $start?

Creating html via string concatenation is bad for the same reason that creating database queries with string concatenation is bad. One leads to XSS attacks, the other to sql injection. The same problem exists for xml, json, filenames, command lines, urls and any other textual formats that require escaping.

Escaping a string once and calling it "safe" is simply wrong, because different parts of a html document have different escaping rules.
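
As a rough illustration of "different parts have different rules", here is the same value escaped for four contexts using only Python's standard library. The variable names are made up, and the script-context line is one common convention (JSON plus escaping `<`), not a complete defence.

```python
import html
import json
from urllib.parse import quote

user_value = 'Tom & "Jerry" </script>'

# Element text: &, < and > must be escaped; quotes may stay.
as_text = html.escape(user_value, quote=False)

# Quoted attribute value: quotes must be escaped as well.
as_attr = html.escape(user_value, quote=True)

# URL component: percent-encoding, a different scheme altogether.
as_url_part = quote(user_value, safe="")

# Inline script string: JSON quoting, plus "<" so "</script>" cannot appear.
as_js = json.dumps(user_value).replace("<", "\\u003c")

for label, value in [("text", as_text), ("attr", as_attr),
                     ("url", as_url_part), ("js", as_js)]:
    print(label, value)
```

All four results differ, so a single "already escaped" flag on a string cannot say which of these contexts it is safe for.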

For database queries, most databases provide a templated API, like db.query('SELECT * FROM foo WHERE a = ? AND b = ?', 42, "bar").
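
The `db.query(...)` call above is generic; as one concrete instance, Python's built-in `sqlite3` module accepts `?` placeholders in the same way. The table `foo` and its columns are taken from the example query, and the hostile string is invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (a INTEGER, b TEXT)")
conn.execute("INSERT INTO foo VALUES (?, ?)", (42, "bar"))

query = "SELECT a, b FROM foo WHERE a = ? AND b = ?"

# Values travel separately from the SQL text, so no escaping is needed.
rows_ok = conn.execute(query, (42, "bar")).fetchall()

# A classic injection attempt is just treated as an odd string value.
hostile = "bar' OR '1'='1"
rows_hostile = conn.execute(query, (42, hostile)).fetchall()

print(rows_ok, rows_hostile)
conn.close()
```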

For HTML, there are tons of template engines of varying quality and features. A good one will parse and understand the html structure, reject invalid templates at compile time, and choose the proper escaping based on where your values are inserted.

Re: avoiding coding "representation errors"

Posted: Sat Oct 27, 2018 12:50 am UTC
by Flumble
>-) wrote:this can be alleviated with a strong type system, if i carefully create a type for each unit or representation that a coordinate/image might have. but that approach seems pretty heavy-handed: i'd need to define a bunch of wrapper classes which do nothing besides check that the types of the arguments match.

i'm not sure if using a type system to do this is the right approach since i've never seen it in real world code. so what is the solution?

The bulk of "real world" code is written in languages that have shitty type systems that require a lot of clutter and wrapper code if you want to add more semantics to your types, so you barely see it. (IIRC C++'s time library, std::chrono, has very specific duration and time-point types, but that's one of few cases.) I do think it is the right approach, but it requires a language with a decent type system and good inference/little duplication so you don't have too much bloat.
I think languages like Agda have a type system that is expressive enough (and a syntax terse enough) to cram nearly all your semantics in the type level, so you don't need any Hungarian naming for your variables. And you still get your compiler to complain (rather than a runtime error or a silently crashing orbiter) when it fails to deduce relativeImpulse = lockheedImpulse-referenceImpulse, because lockheedImpulse is in lbf¹s¹ whereas referenceImpulse is in N¹s¹. Or it may complain about fruits = apples+oranges unless apples (in AppleCount) and oranges (in OrangeCount) can be autoconverted to a FruitCount type. Or it may even complain about inLeftHand = inBothHands-inRightHand because they're all FruitCount←NonnegativeNumber, while a-b may clearly be negative unless you can assure the type system that a≥b.

It sounds very rude to eat an apple with your right hand while holding 0 apples in total.
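
Python cannot do the compile-time version described above, but a runtime approximation of the impulse example might look like the sketch below. The `Impulse` class and its unit strings are hypothetical; the conversion factor is the standard definition of the pound-force in newtons.

```python
from dataclasses import dataclass

NEWTONS_PER_POUND_FORCE = 4.4482216152605  # 1 lbf expressed in newtons


@dataclass(frozen=True)
class Impulse:
    value: float
    unit: str  # "N*s" or "lbf*s" in this sketch

    def __sub__(self, other: "Impulse") -> "Impulse":
        if not isinstance(other, Impulse):
            return NotImplemented
        if self.unit != other.unit:
            raise TypeError(f"cannot subtract {other.unit} from {self.unit}")
        return Impulse(self.value - other.value, self.unit)

    def in_newton_seconds(self) -> "Impulse":
        # Explicit conversion; nothing is converted behind the caller's back.
        if self.unit == "N*s":
            return self
        if self.unit == "lbf*s":
            return Impulse(self.value * NEWTONS_PER_POUND_FORCE, "N*s")
        raise ValueError(f"unknown unit: {self.unit}")


lockheed_impulse = Impulse(10.0, "lbf*s")
reference_impulse = Impulse(40.0, "N*s")

try:
    lockheed_impulse - reference_impulse
except TypeError as err:
    print("rejected:", err)

relative_impulse = lockheed_impulse.in_newton_seconds() - reference_impulse
```

The difference from the Agda-style approach is when the complaint arrives: here it is an exception on the first run that hits the line, not a compiler error.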

Re: avoiding coding "representation errors"

Posted: Sat Oct 27, 2018 3:51 am UTC
by ucim
Tub wrote:
ucim wrote:Because $start+$middle+$end could contain a $middle that accidentally closes a tag in $start?
Creating html via string concatenation is bad for the same reason that creating database queries with string concatenation is bad...
But surely you should be able to do:
$rawhtml = $safestart+$safemiddle+$safeend;
$safehtml = clean($rawhtml);

assuming you have a clean() function that does the proper escaping for the kind of HTML you need at that point, no?

Jose

Re: avoiding coding "representation errors"

Posted: Sat Oct 27, 2018 6:49 am UTC
by phlip
ucim wrote:
Tub wrote:But surely you should be able to do:
$rawhtml = $safestart+$safemiddle+$safeend;
$safehtml = clean($rawhtml);

assuming you have a clean() function that does the proper escaping for the kind of HTML you need at that point, no?

But how does your clean() function tell the difference between a good tag that you want to be there, vs a bad tag that was injected from user input that you've accidentally forgotten to escape properly?
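
The question can be made concrete with a few lines of Python, assuming `clean()` is nothing more than `html.escape` (the thread never defines it, so that is a guess).

```python
import html


def clean(text: str) -> str:
    # Assumed behaviour: plain HTML escaping of the whole string.
    return html.escape(text)


user_input = "<script>alert(1)</script>"

# Concatenate first, clean afterwards: the wanted <b> tag gets escaped
# along with the injected <script> tag, because clean() cannot tell them apart.
raw_html = "<b>" + user_input + "</b>"
cleaned_whole = clean(raw_html)

# Clean the untrusted piece before it is mixed with trusted markup.
cleaned_at_source = "<b>" + clean(user_input) + "</b>"

print(cleaned_whole)
print(cleaned_at_source)
```

Once trusted and untrusted text share one string, the information about which is which is gone.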

Re: avoiding coding "representation errors"

Posted: Sat Oct 27, 2018 2:59 pm UTC
by ucim
phlip wrote:But how does your clean() function tell the difference between a good tag that you want to be there, vs a bad tag that was injected from user input that you've accidentally forgotten to escape properly?
It doesn't, universally. But in some restricted use cases it might. In the case where user-supplied HTML is simply not permitted (and all user input is cleaned before further processing) then it should be ok. All tags would be programmer-supplied (perhaps based on user hints, such as bbcode).

Yes, the programmer could use logic that is too convoluted, but even template engines have to put it together somewhere, and that's going to be string concatenation too.

Yes?

btw, you have a quote fail. String concatenation bug? :)

Jose

Re: avoiding coding "representation errors"

Posted: Sat Oct 27, 2018 3:04 pm UTC
by Flumble
ucim wrote:
Tub wrote:
ucim wrote:Because $start+$middle+$end could contain a $middle that accidentally closes a tag in $start?
Creating html via string concatenation is bad for the same reason that creating database queries with string concatenation is bad...
But surely you should be able to do:
$rawhtml = $safestart+$safemiddle+$safeend;
$safehtml = clean($rawhtml);

assuming you have a clean() function that does the proper escaping for the kind of HTML you need at that point, no?

Surely $safestart+$safemiddle+$safeend = $safehtml because concatenation is a safe operation, right?
But even if all escaping is done right and all concatenation is safe (and type-correct), HTML is a tree structure, so gluing an html to another html as if it's a sequence is fundamentally wrong.
Alright, you'll need an asText function somewhere that converts an HTML node to text using string concatenation. But that's the one place where you disable your error reporting or write all the type annotations and conversions so you can build "HTML text" (which is, of course, fundamentally different from other text and other representations of HTML).

Re: avoiding coding "representation errors"

Posted: Sat Oct 27, 2018 7:03 pm UTC
by ucim
Flumble wrote:HTML is a tree structure, so gluing an html to another html as if it's a sequence is fundamentally wrong.
Well, gluing HTML to HTML doesn't mean you get good HTML at the end ("enclosing HTML in HTML" is probably a better way to abstract it out, but enclosing employs concatenation anyway). That's why I specified beginning, middle, and end, none of which is necessarily (complete) HTML.

e.g.
$opentags = '<b><i>';
$text = 'Hello world';
$closetags = '</b></i>';
$output = $opentags+$text+$closetags;


It's still on you to not mess up the tag order (as I did here), because that is what is fundamentally wrong. But that's a different kind of bug, and not the fault of concatenation.

and...
$opentags = '<b><i>';
$userinput = getuserinput();
$text = clean($userinput);
$closetags = '</i></b>';
$output = $opentags+$text+$closetags;


should work fine, no?

In this simple example, it might be better to do:
$taglist = array('b', 'i');
$tags = makeHTMLtags ($taglist);
$opentags = $tags['open']...


but I'm not convinced that it's the concatenation itself that is the issue. Like an MP3 file, you can't glue pieces arbitrarily. The pieces have to be assembled correctly. But I'm not sure that is amenable to a simple "just use HH (HTML Hungarian) notation" solution.

Jose

Re: avoiding coding "representation errors"

Posted: Sun Oct 28, 2018 7:47 pm UTC
by elasto
Obviously you're just trying to give a simple example here, but surely you wouldn't be constructing it that way.

Off the top of my head, and assuming we are going for a really simple use-case here, wouldn't a better approach be something like:

var rawUserInput = GetUserInput();
var cleanUserInput = Clean(rawUserInput);
var cleanFormattedUserInput = cleanUserInput.ApplyItalics().ApplyBold();


That way you can't mess up the tag order in the way you suggested..?
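
A possible shape for such an API, sketched in Python. `HtmlFragment`, `from_text`, `apply_italics` and `apply_bold` are hypothetical names mirroring the snippet above; the only way to get text into a fragment in this sketch is through escaping.

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class HtmlFragment:
    markup: str  # by convention: already escaped, tags always balanced

    @staticmethod
    def from_text(raw_text: str) -> "HtmlFragment":
        # Untrusted text enters only through this escaping constructor.
        return HtmlFragment(html.escape(raw_text))

    def _wrap(self, tag: str) -> "HtmlFragment":
        # Opening and closing tags are emitted together, so they cannot
        # end up in the wrong order.
        return HtmlFragment(f"<{tag}>{self.markup}</{tag}>")

    def apply_italics(self) -> "HtmlFragment":
        return self._wrap("i")

    def apply_bold(self) -> "HtmlFragment":
        return self._wrap("b")


raw_user_input = "Hello <world>"
formatted = HtmlFragment.from_text(raw_user_input).apply_italics().apply_bold()
print(formatted.markup)
```

Python will not stop anyone from calling `HtmlFragment("<unescaped>")` directly, so the guarantee is a convention here rather than something the type system enforces.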

Re: avoiding coding "representation errors"

Posted: Sun Oct 28, 2018 9:40 pm UTC
by ucim
elasto wrote:var cleanFormattedUserInput = cleanUserInput.ApplyItalics().ApplyBold();
Sure, but when you write the ApplyBold() method, aren't you going to do something like
$bolded = $openboldtag.$this.$closeboldtag;? (Please forgive my mangling together of php, OOP, and C++) :)

The point was that string concatenation isn't (or is it?) a bad way to create HTML. Ultimately, how else would you do it?

Yes, you can't stick things together willy nilly, but you do have to stick things together! The thing is to try to keep the pieces simple, so that it's harder to unknowingly stick the wrong things together. Sometimes though, there is a complicated piece of HTML for which you could either (here be dragons!) brute force it, or write a ch*rpton of fiddly little functions you'll only use once. And while it's tempting to say just write the functions to keep things clean, once those functions are written, it will be tempting to re-use them (nothing is ever used just once), so they will need to be made robust under all sorts of use cases you'll never use them for, in case you do use them for one of those other use cases you didn't think of.

Whenever you put a wrapper on something, it needs to be a good wrapper.

Jose

Re: avoiding coding "representation errors"

Posted: Mon Oct 29, 2018 10:45 am UTC
by Tub
Jose, if you really wish to make the point that string concatenation is fine for html creation, please start by implementing that clean() function that you keep using, such that it produces correct and safe html in any case. Multiple people have explained to you why that's problematic, but you've ignored them. Maybe you'll need to implement it to understand.

Next you'll need to show how you're going to test your html for mismatched tags and other invalid constructs before deployment.

Those are the basic requirements for any API: safe, correct and verifiable. Anything else is not suitable beyond small personal projects.

Once you've shown that it's a usable API, we can discuss whether it's a good API, and whether it's better than the alternatives. Is the code easily readable? Is the html structure you generate easily discernible from your code? Is your development environment capable of syntax highlighting your html, possibly highlighting invalid syntax as you type it?
You keep asking about alternatives, but you keep ignoring the answers. Go and research a few templating systems; most of them fare much better than the approaches you've posted here.

Re: avoiding coding "representation errors"

Posted: Mon Oct 29, 2018 2:09 pm UTC
by Sizik
The real solution is to let go of the assumption that HTML should be stored and manipulated as plain strings, and only convert it to a string at the very end when the page is complete.
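
One way to try this out without any third-party library is Python's `xml.etree.ElementTree`, which can serialize a tree with `method="html"`. This is only a sketch of the "tree first, string last" idea; a real project would more likely use a template engine or an HTML-specific library.

```python
import xml.etree.ElementTree as ET

# Build the structure as a tree; nesting is expressed by parent/child links,
# so mismatched closing tags cannot be written in the first place.
bold = ET.Element("b")
italic = ET.SubElement(bold, "i")

# Text is attached as data, not spliced into markup.
italic.text = "Hello <world> & friends"

# Convert to a string once, at the very end; escaping happens here.
page_fragment = ET.tostring(bold, encoding="unicode", method="html")
print(page_fragment)
```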

Re: avoiding coding "representation errors"

Posted: Mon Oct 29, 2018 5:22 pm UTC
by Tyndmyr
ucim wrote:
Tub wrote:
ucim wrote:Because $start+$middle+$end could contain a $middle that accidentally closes a tag in $start?
Creating html via string concatenation is bad for the same reason that creating database queries with string concatenation is bad...
But surely you should be able to do:
$rawhtml = $safestart+$safemiddle+$safeend;
$safehtml = clean($rawhtml);

assuming you have a clean() function that does the proper escaping for the kind of HTML you need at that point, no?

Jose


You can. However, in practice, complexity tends to build up. So long as complexity stays sufficiently low, it's fairly easy to keep track of, but if you scale upward, it'll eventually become less obvious what you're doing, and troubleshooting can be annoying.

There are some cases where you have to deal with complex strings in relationship to html (Struts webapps can be one of them if you get creative with the framework), but generally, you want to avoid it as much as possible so you don't need to return to your html generation as your app grows.

Also, as implied when Tub was talking about attacks, if anything you're creating is coming from user input, you need to properly sanitize that. Yeah, you could roll your own security, but given that it's a pretty generic problem, you're usually better served by using existing methods. Same goes with string concatenation. Yeah, it's a string at the end, but that doesn't mean you benefit from treating it as a string at all other times. Your content will determine what exactly is most handy, but a tree structure is pretty normal. Likewise, if you're dealing with XML, you're largely going to use an existing parser rather than doing string concatenation on your own, unless you have to deal with very unusual input, in which case you're basically rolling your own parser to deal with probably-non-standard data that somehow needs to be cleaned and used. That's an edge case, though.

So, there's a little fuzziness where circumstances exist in which "bad" practices are needed, but as a general rule of thumb, it's definitely worth following existing standards. The existence of edge cases isn't a good reason to ignore standard practices in most cases.

Re: avoiding coding "representation errors"

Posted: Mon Dec 10, 2018 11:40 pm UTC
by Jplus
>-) wrote:a class of programming bugs i often commit involves mistaking the "representation" of a variable -- such as adding degrees and radians, treating a coordinate in the camera-frame as a coordinate in the world-frame, or forgetting to swap the channels of an image from BGR to RGB.

this can be alleviated with a strong type system, if i carefully create a type for each unit or representation that a coordinate/image might have. but that approach seems pretty heavy-handed: i'd need to define a bunch of wrapper classes which do nothing besides check that the types of the arguments match.

i'm not sure if using a type system to do this is the right approach since i've never seen it in real world code. so what is the solution?

I do think that a better type system is the solution. The approach that you described is sometimes referred to as "type-rich programming". As far as I know, this has been explored the most in C++, with libraries like Boost.Units. Bjarne Stroustrup also mentioned it in an article (see the "Compute Less" section). People sometimes take this effort in C++ despite the fact that it takes some cumbersome struct-wrapping, as you and Flumble already mentioned.

Haskell has newtype, which should make this a bit easier, but I'm not aware of similar efforts in the Haskell world.
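
Not Haskell, but as a rough analogue: Python's `typing.NewType` gives a similarly cheap way to declare distinct types over the same underlying representation. The names below are hypothetical, and the distinction is only checked by a static type checker such as mypy; at runtime the values are plain floats.

```python
import math
from typing import NewType

Degrees = NewType("Degrees", float)
Radians = NewType("Radians", float)


def to_radians(angle: Degrees) -> Radians:
    return Radians(math.radians(angle))


def sine(angle: Radians) -> float:
    return math.sin(angle)


heading = Degrees(30.0)
result = sine(to_radians(heading))

# A static checker is expected to reject the call below, because a Degrees
# value is passed where Radians is required. Python itself would run it.
#   sine(heading)
```

The declarations are one line each, which addresses the "bunch of wrapper classes" complaint, at the cost of needing an external checker.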

Type-rich is probably only feasible in a few rare languages with very powerful type systems, like C++ and Haskell. Languages like Agda (also mentioned by Flumble) and Coq have stronger type systems, but they're purely academic.

I'm strongly in favor of type-rich. I'm always a bit disappointed when people introduce yet another new language with a static type system that isn't Turing-complete, for example Nim and Swift. I just did a web search, though, and found that nowadays more languages have Turing-complete type systems than I thought (particularly Rust and TypeScript), so that made me a bit happier. Maybe Nim or Swift has crossed the barrier in the meantime, too.

In languages where rich typing is not an option, I do agree that Hungarian notation (as originally intended, also mentioned before) is the best alternative. You don't get compile-time checks, but at least wrong code looks wrong.