My Unix CLI manifesto, aka why PowerShell is the bees knees


Re: My Unix CLI manifesto, aka why PowerShell is the bees knees

Postby tetsujin » Thu Aug 30, 2012 9:37 pm UTC

tomCar wrote:
tetsujin wrote:(On encapsulating binary payloads in CSV:)

1: the receiving program doesn't know it's binary data, let alone what it's supposed to represent.
2: By ASCII-encoding the payload you're inflating its size a lot. (Doubling it for a HEX encoding, or increasing its size by about 1/3 in the case of base-64)
3: If you use an encoding like base-64, then you lose a valuable property of a bytestream: the byte boundaries of the source data no longer fall on byte boundaries in the encoding.

(paraphrased for brevity: in all cases, JSON presents the same issues)


1: True. But JSON and XML give you pre-defined mechanisms that are useful for addressing that. In CSV you'd have to invent those mechanisms, too - mechanisms which all tools working with the shell serialization format would need to respect to some degree. It just seems to me that there's not much point building features on top of CSV just to get to the point where you can establish conventions for type-tagging things, when you could just use JSON which isn't that much more complicated, and which is pretty well accepted among the community already.
2 and 3: Agreed, it's a problem with any text encoding. This is one of the reasons why most of my efforts have been set to defining a bytestream format for my shell.
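Just to put rough numbers on point 2 - a quick Python sketch (the 3000-byte payload is arbitrary, purely to show the ratios):

Code: Select all

# arbitrary binary payload, just to illustrate the encoding overhead
import base64, os

payload = os.urandom(3000)
print(len(payload))                    # 3000 bytes raw
print(len(payload.hex()))              # 6000 - hex doubles the size
print(len(base64.b64encode(payload)))  # 4000 - base-64 adds about a third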

In any case you need to define the convention used to tag data. But in CSV you'd have to define the syntax for that mechanism, as well. This is the bit that I think is particularly undesirable. I don't think it makes sense to re-invent a mechanism you'd get for free with JSON, especially given that JSON is compact, popular, well-defined, well-supported, simple, robust, lends itself well to the definition of extensible and maintainable formats, etc.
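For instance, a type tag in JSON needs nothing beyond an agreed key name - the "type" and "encoding" keys below are just one possible convention, not anything standardized:

Code: Select all

{"type": "file-chunk", "encoding": "base64", "data": "3q2+7w=="}

The syntax for carrying the tag alongside the payload comes for free; only the convention itself has to be agreed on.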

(Obviously, inventing a whole new format for my shell design has problems of its own. I expect this will prove an obstacle to adoption of my shell - as much as Unix guys are likely to resist any "common interchange format" in the shell based on misguided adherence to the letter of "The Unix Philosophy" - I think they are much less likely to resist JSON as the basis for such an environment. It's simple, it's text-based, it's fairly popular, and XML makes it look very nice and compact by comparison. So I do have to think about whether it really makes sense to stick with this plan, of the bytestream format being the primary format - I think text-based JSON will have to play some role in my shell ultimately.)

I don't agree that nested structures aren't "useful enough" - and your criterion there, "useful enough to bother", is skewed by the fact that you're starting with a format that doesn't naturally lend itself to that kind of functionality.

For instance, what's "useful enough to bother" using nested structures in JSON? It's trivial. You just use 'em. Anything that processes JSON will understand that it's a nested structure, and any JSON parser will be able to tell you exactly what the correct (decoded) payload for each field is.

You're missing the point, the point was that no utility uses them and that I don't believe nested structures would make life easier. In other words, show me how you'd use nested structures to simplify the usage of some utility. Then we can discuss whether or not nested structures are useful.


I can't believe they wouldn't be useful. We're dealing with a set of limitations so ingrained into the operation of a traditional Unix shell that users of these shells can hardly see beyond them. If we can't deal with a particular problem within the limitations of the shell, then we treat it as a problem that shouldn't be addressed within the shell - instead of treating it as a flaw in the shell, we treat it as a limit of the scope of a shell.

So why should the "scope" of the shell be wider than it is? Well, that one is harder for me to explain. Basically, it's just what I want. I want a shell that is more powerful. I guess you could call it a more POWERful... shell. But Microsoft is icky, so I don't want their icky shell, and I don't want to copy their icky (but, admittedly, well thought-out) design decisions. But look at the kinds of things Powershell can do - for instance, you can chain two "commandlets" together and they can pass .NET objects back and forth without even serializing them. That's pretty damn cool, and there's a lot you can do with that. Application scripting? Sure, why not?

But the Powershell approach is a harder sell in the Unix environment than in Windows - to pass "live objects" between processes without copying or serializing them, the two processes would have to share (at least some) memory space, and common ideas of how objects are represented in memory. .NET solves this with its virtual machine system - which would be a very hard sell to Unix users, I think. "Don't use that version of Python that you're used to using and which is the most up-to-date, use this version I forked to use my special VM to represent all its objects, which may or may not be compatible with all the libraries and stuff you're used to using." So there has to be serialization - it's the only way to get this kind of functionality safely. But I still want to be able to address the kinds of things Powershell is capable of addressing. I want the shell to be able to handle "large" problems, problems we presently consider "outside the scope" of the shell. For large problems, nested structures are an essential organizational tool. For small problems, not really "essential" at all - as you rightly point out, in simple cases it's not a big deal to do without nested structures. But they can still make the organization of data clearer and more natural, better suited to what the data represents, even in simple cases.

So for example, let's say I'm processing a list of records, and attaching some additional data to a record as it makes its way through a pipeline. Now, I could stick that extra data into the record itself - invent a new field for it (named or unnamed, whatever) - But that approach has some inconveniences as well as some legitimate problems. Like "naming". If it's a named field, the new field has to be given a name that's not already in that record. If it's a field referenced by index, then it has to be an unused index. I could insert it as the first field - but then I'd have to remember that the indices of all the other fields have now changed. I could add it as the last field - which would probably be less problematic overall, but I'd have to remember at what index I put it. Either way, the complexity of the whole issue grows significantly as more and more pieces of data are added on. None of them can conflict with each other, none of them can conflict with the stuff that's already in these records, and if adding a new one impacts how I reference the record's true fields, or other fields I've added, I have to remember that, too. And, presumably, at some later step in the procedure, I may want to undo all this, get the original record back.

Encapsulating that record makes the whole deal a lot easier. The whole, original record is set aside, encapsulated within one field of the new record. Let's say that original record is at field 0 of the new record. Data that I add to the record goes into field 1, 2, 3, etc. I'm numbering them here, but they could as easily be named. Any time I want to cast off the extra junk and access the original record, I just read field 0. New fields (if named) don't have to be kept distinct from existing names in the structure, and if they're indexed, they start from 1 instead of "however many fields the record has" - and they don't complicate the process of accessing the original record fields.
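In JSON terms the round trip might look something like this (the field names are made up for the example):

Code: Select all

import json

original = {"filename": "derp", "size": 20124}

# wrap the untouched record in one field, hang the new data off other fields
wrapped = {"record": original, "checksum": "1a2b3c", "stage": "after-filter"}
line = json.dumps(wrapped)

# later in the pipeline: peel off the extra junk and get the original back
restored = json.loads(line)["record"]
assert restored == original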

Again, the nested encapsulation isn't necessary - in the same sense that the current Unix tools do fine without it, just as they "do fine" without a truly common serialization format - but it makes it easier to express and access the data, and easier for the user to see where the boundary lies between the encapsulated record and the extra data that's being attached to it.

This could be an issue. I don't intend to use relations in the first place, I was merely suggesting that they can accomplish the same goals.
However, I will make note that there's no reason a filter program couldn't know how to remove residual records. After all, a referenced record probably belongs to a different "table" and if we're removing the field referencing that table then it shouldn't be that hard to find the table and remove it as well. If we're referencing something from the same table... well we probably don't want to remove the record anyways (as it probably has useful information.)


If relational-linking were part of the format, such that a program reading the stream would know that a certain field was a "link" and knew that certain records should only be retained when there's another record linking to it, then that approach could be annoying to support but it would work. Otherwise, I'd say that functionality should rightly be considered outside the scope of a filter program (whose job is to test a field and either pass the whole record or drop it - kind of analogous to grep) - but it would be difficult to implement correctly without it being part of that filter program if the filter program can't at least tell what's a "child record" and what isn't.

I have to believe nested structures are useful because we use them all the time in other programming languages, in our filesystems, in our documents... It's an idea that clearly works. You have to figure, also, that one common scenario for nested structures will be that you're actually just encoding and passing over the stream a piece of data that was created somewhere else, data which was originally represented as some sort of nested structure. If the streaming format supports those data structure concepts, then the translation process is pretty straightforward, and if two different people were to guess how that structure would be translated, it's likely they'd arrive at the same answer, and be unsurprised to find it's the same answer the computer came up with.

You're begging the question here. Assuming that we have a nested structure to deal with before we've decided to represent something as a nested structure.


I don't believe I am "begging the question" - though I am not thoroughly, intuitively familiar with the classic Logical Fallacies and had to look it up to remember what it means. I have said things along the lines of "if the shell supported structures better, people would find a use for them", or "nested structures aren't a useful thing in the shell presently because they're too hard to use" - that could be considered "begging the question", I think, if it were my only argument in favor of the feature. But I've also explained my reasoning. First, they're a feature found in virtually every programming language in common use. Would this be the case if they weren't useful? This raises the question of the "scope of the shell" again - and I suppose it is valid to say that maybe the scope of the shell doesn't need to be greater than it is - that's just not the kind of shell I want to create. And second, if you're working with nested structures in other environments, then having support for nested structures in the shell environment's interchange format means that you can directly and naturally stream those structures out, and read them into a script or another environment.

Personally, I've never seen a nested structure in a document


An HTML document is a whole hierarchy of nested data structures. A paragraph may contain a table, which contains rows, which contain cells, which may contain other paragraphs.

A less technical form of "document" will still contain nesting: a book has chapters, which themselves may have sections, which themselves have paragraphs, images, etc. This hierarchy is often reflected in the structure of the file in which the document is encoded.

Programming languages are a little different though. They can (and usually do) define their structures before actually creating/using them. This means that if you want to do something with a structure you don't have to look at all of it, you can just look at the 1 component you want. This is not the case on the shell. The entire data structure is going to get written to your terminal and if you have to wade through a nested structure on the terminal... well good luck (I don't envy you.) In this case, references work better since you don't have everything and the kitchen sink in one place.


If the shell is given a more sophisticated data model for its serialization, of course that more-sophisticated data model will be supported in the syntax and in the value display as well. So it would be possible to do things like store a structure into a variable, read a field out of it - or, yes, print it to the screen, hopefully in a way that it's actually usable. Ultimately I think neither EvanED nor I is thinking of a traditional shell-to-terminal interaction here. Once the other pieces are in place we'll probably go to keyboard-controlled GUIs (or terminals with GUI elements). On more traditional "terminal" displays we'll just have to do what we can. It's a problem for sure, but that doesn't mean it's one to dodge - it's just something we have to think about and try to address.

Look at it this way: when you have a nested structure in an interactive Python session, and you type its name in the REPL, what do you get? You get a printed description of that object - whenever possible, this is in a format that if you copy/pasted it back into the REPL you'd get another object with the same value. Yes, you can have a structure so huge that it's unwieldy - in which case you can do things like look at what fields it has or what array index values it will accept, things like that. You can dig in more gradually, narrow the scope to find a small-enough piece that you can work with the data. You're still working with potentially huge data, you're still within the confines of the terminal, but you have the tools to deal with it.
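A short interactive session along those lines:

Code: Select all

>>> pkg = {"name": "demo", "deps": [{"name": "libfoo", "version": "1.2"}]}
>>> pkg
{'name': 'demo', 'deps': [{'name': 'libfoo', 'version': '1.2'}]}
>>> list(pkg.keys())
['name', 'deps']
>>> pkg["deps"][0]["version"]
'1.2'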

CSV is extensible. If I wanted to I could add another field, it would just have to come last


There are all kinds of scenarios where this just isn't adequate. We've seen plenty of them already in the classic UNIX tools, most of which do use some kind of line-based, "simple" delimited-field format.

A very basic example is, what happens to the format after a lot of these additions/removals are performed? You wind up with a bunch of unused fields kept around as vestigial place-holders, and a bunch of new fields tacked on to the end.

Because ",,,,Hi" is so much worse than "{"field":"Hi"}" which is assuming that I would even have those fields stick around. If it's unused then it can be safely deleted in most cases.


It is, actually. Someone who looks at that data without some kind of reference manual handy has no way of knowing what the fifth field is supposed to represent, or why there are four empty fields preceding it. Delete or repurpose the old, "unused" fields and suddenly everyone's scripts break. Keep them around and you have an ever-growing collection of cruft. It's less of an issue with CSV than it is with the current "ad hoc" style of Unix tool output (in which programs write out data that is intended to be both script-readable and human-readable when printed on a terminal) but it's the sort of thing that does become a real problem as a tool ages.

Or what if two different people, working on diverging implementations of the same program, both add a new field to that stream format? Naturally, they'll both add the new field on the end, and in both cases it'll be the Nth field. This is the sort of thing you might get from, say, different implementations of "ls -l" or "ps". Scripts working on the output of those utilities won't be portable because that Nth field has different meaning depending on which version of that utility is installed.

Not my problem. They should have thought it through better. Which is, by the way, what I want to encourage. I don't want people treating their data structures lightly. I don't want people doing something simply because they could.


See, that's one of the sorts of problems I actually want to try to solve. As it stands, it's a pretty significant problem: it's one of the reasons people don't like to rely on shell scripts. Starting over with a fresh set of tools helps for a while, until divergent derivative versions start creeping in again... But we can attack the problem on a more fundamental level, and I think that's a worthy and important goal. Naming the fields doesn't solve the problem entirely, but it helps. It becomes easier and more reliable for scripts to answer questions like "where is the field I want in this version of this tool's output?" or "is that field present at all?" - and it becomes easier for users to look at the data and figure out how to access the field they need.
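For instance, with named fields a consuming script can probe for what's there - a rough sketch, assuming one JSON record per line and a hypothetical "owner" field:

Code: Select all

import json, sys

for line in sys.stdin:
    rec = json.loads(line)
    # works whether or not this version of the tool emits an "owner" field;
    # with positional fields we'd have to know which column it landed in
    print(rec["name"], rec.get("owner", "<unknown>"))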

It's really not that hard to get together and agree to a single order. No more so than it is to get people to not use the same fields.


Ah, but it is. This is because the field indices are always allocated in order. Anyone who defines a new field is going to take the "next one". Field names, on the other hand, are a sparsely populated space. You can establish conventions (such as DNS naming) that will prevent conflicts.
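For instance, a record could carry vendor-specific additions under DNS-style prefixes with no risk of collision (the names here are invented for the example):

Code: Select all

{"name": "derp", "size": 20124, "com.example.backup/last-run": "2012-08-29", "org.other.scanner/score": 3}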

Again, look at ps. If you look at the fields in terms of how the columns are labeled, it's easy to tell what is what. If you look at them just in terms of their order, it's much harder. For a script to look at the output of ps (cross-platform) and know which field is which is pretty difficult. "ps" on cygwin, for instance, has nine fields, one of which is unlabeled, and two of which can contain whitespace. If it's easy for people to coordinate decisions like this, then it seems like we haven't had a lot of luck with that so far. :)

a prefix like "perm" might mean "this field tells you who can read/write/execute this file" - while the field itself defines some permission scheme beyond the scope of classic Unix - like access control lists or whatever. Which, incidentally, would be another case where nested structures would be useful. :)

Lists are best represented as lists... not nested structures :P


Yeah, and ACL is a list. Each element within the list is a structure. And where do you find an ACL? Well, it's something that belongs to a file, which is represented as a record in a file list. So you've got a list of file records, each one containing an ACL (list of security info records). Depending on whether you consider the lists themselves a "level of nesting", that's either two or four levels deep.

See, there are two kinds of useful: useful because I like it and useful because it gets stuff that needs doing done. I care only about the latter. So if I can't see what would require a feature to be doable... I can't in good conscience include that feature (no definite pro but minor definite con = no such feature.)


At a certain point, the two become one and the same.

I mean, it would be entirely possible for us to do all our programming in assembly. After all, that's what it all comes down to in the end - machine code. We don't do that, because all those more elaborate structures on top of assembly are useful. They make it easier to do the things we want to do.

Generally, this gap in "how easy" it is to get something done in low-level vs. high-level language is so great that people won't even consider tackling complex problems in low-level languages. They just flat-out won't go there.

So a feature like named fields may seem unnecessary, and in the strictest sense this is true. But field names make the programs easier to work with, and easier to maintain in the long run. Once you're knee-deep in other forms of complication, having something like that to make your life a little easier can be practically essential.

I suppose it's good to consider regional variations...but I really don't care. It's not that much trouble to use decimal points to indicate decimal place. I've worked in both systems... and it just isn't a big deal to get users to switch. Of course, with my tools... the user could easily switch to using colons as field separators and neatly side-step the issue.


Well, you'd have to communicate to each program in the chain (via env. variable, perhaps) that you've switched delimiters. I agree it's not much of an issue, personally I wouldn't even go there in the serialization format itself. Serialization formats should be rigidly defined, and consistent across regions, so there's no guesswork required to interpret them. It's the UI - things like the shell syntax and value display mechanism, which should (perhaps) honor regional variations.


Anyway, I hope you don't feel like I'm being too harsh on your approach. Bear in mind that, for me, discussions like this are also an opportunity to test the value of my own ideas. Like, my whole approach, my whole design, is based on a certain set of assumptions. Things like the importance of robust, reliable, simple encapsulation, the practical and conceptual value of extending that encapsulation to any context (nesting), the value of a type system that's supported in the shell and in the serialization format, and the value of supporting binary data, and supporting it well. Justifying these ideas, being able to advocate them, is very important. I am, after all, planning to write a shell that is different enough from traditional Unix shells that many of the people who are open to the idea of running a command shell may find it distasteful. If I'm going to go that far beyond people's comfort zone, I need to be able to justify why I'm doing it, particularly if I want people to use the shell.

Likewise, every now and then an idea I like has to be cast off. I've gone through a couple of these in this thread, actually (though they were ideas I'd already cast off, but forgotten I'd done so...) If I find I can't justify something, then it comes under scrutiny.
---GEC
I want to create a truly new command-line shell for Unix.
Anybody want to place bets on whether I ever get any code written?


Re: My Unix CLI manifesto, aka why PowerShell is the bees knees

Postby tomCar » Thu Aug 30, 2012 11:30 pm UTC

@tetsujin
I hope you don't mind if some of the wall of text disappears. :D
I don't really feel you're being harsh. I'm doing much the same as you have/are. Testing ideas on others... especially, since I "can't see the forest for the trees" quite frequently (which is why I asked to be shown examples, I couldn't think of any because I am/was thinking at too low a level.)

A lot of our disagreement is really on how far we want to deviate from the "old school." I don't really want to change things too much (because I feel they mostly work, but are redundant/bloated/confused and could use some clean up.) Whereas you (seem to) have much greater needs of the shell and would prefer something with vastly more flexibility and power. This is reflected in the fact that you're redefining the shell and I just want all the tools to speak the same language. As a result, I lean more towards a non-flexible format that's mostly similar to formats already used and you want a very flexible format that allows all sorts of magic. 8-)

On the binary format thing... JSON provides you the exact same mechanism for indicating type as CSV does: fields. JSON allows you to name them (making it potentially clearer to a new-comer,) but the mechanism is still the same (and still requires a known convention.) XML additionally provides attributes... but attributes are just un-nestable fields (so everything applies to XML too.)

The real advantage of names is that they make order/location unimportant (eventually, you're going to learn what's what, so the benefit of telling you that is only temporary.) Which does give your format flexibility, which is certainly desirable for a shell pipeline... but as a common language of a (fixed) set of tools... well that flexibility really doesn't come into play. If someone does decide to re-implement my tools and rearrange things then I expect them to take the time to educate their users. And if someone rearranges fields while working on a script... then likewise, they should keep track of that (I understand that you want to make this easier... but it's not a problem I care about.)

As far as documents... HTML isn't really the document (though HTML is nested, as it uses "XML".) The document is the words, sentences, etc. which are linearly arranged (no nesting.) While you may have a word in a sentence in a paragraph... these are just different levels of organization that emerge from the flat, linear arrangement of characters.

ACLs are a single dimension (a list) and file records a second. CSV is 2d (records+fields.) Mind you, if there's more information in the record it could become a bit much...but that's true if you nest as well.

p.s. To explain how you begged the question. In short form, our dialog went like this:
Question: Are nested structures necessary?
Answer: Yes, to represent nested structures.
See, your answer assumes that the question is true (that nesting is necessary for those structures.)

p.p.s. you may notice I skipped your bit on expanding the scope of the shell... that's simply because I tend to prefer d.s.l.'s to "do-it-all" languages. Which is a discussion worth its own thread.


Re: My Unix CLI manifesto, aka why PowerShell is the bees knees

Postby EvanED » Fri Aug 31, 2012 3:05 am UTC

tomCar wrote:
You've shown how to do it. Tetsujin (I suspect) and I don't think that you've shown how to do it well.

So you find that the method that you're using doesn't work well? That's a little confusing. 'Cause if you didn't notice my example was patterned after how metadata is introduced in JSON/XML. There is of course, your idea of using type headers. I'm not certain how well it would work (since I'd need to add some way of detecting them,) but it could work.

I think that comment may have been borne out of a misunderstanding. I was referring to how you add extra information (add a new column at the end) or remove information (leave it blank), neither of which we like.

Whereas I don't really think it's difficult to keep in mind the record structure that you're using.

So when writing in C, you refer to the field you want by its offset instead of its name? Or at least wouldn't mind doing so? :-)

Records may not be as self-descriptive as an object, but they are fairly intuitive if you've read the manual (see the output of ls -l.)

Reducing the role of "RTFM" is one (of several) goals of this project; IMO that is a major hindrance to learning and a major disadvantage vs a GUI. With a GUI, you can learn by experimentation -- you can play around, look through the menus, try stuff out, see what it does. (At least if you're not scared of breaking stuff, which you usually shouldn't be.) I consider this to be the single best way to learn to use a computer. I didn't have to RTFM about PowerPoint 2007 when I installed it despite the fact that it had an entirely new UI; you just look at it and it's fairly evident what it does. With a GUI, it's only when you need to start doing more advanced things that you have to go to the manual. This mode of operation is almost entirely impossible at the command line; there's basically no way that you'll guess what the arguments to ps mean, for instance.

Hell, I've been using Linux in some capacity for basically a decade now, and I don't even know what the arguments mean -- I know that ps tells me a minimal set of information about the processes related to this shell, I know that ps aux shows me lots of info about everything, and I know that there are SysV and BSD versions of the command and GNU's tries to cover both, and that's literally all I know about the ps arguments. And what's more, I couldn't tell you at all what the columns are in the output of ps aux -- I know that PID is first, and the command is last, and that's all without looking. (*looks* See, I even got that wrong: PID is column 2.) I look at the headers. I'm sure I'm not the only one, or it wouldn't even print them. (This serves to argue both why "RTFM" is a bad answer and why I want names. I have, at some point, RTFM for ps. But I use that command relatively infrequently, and nearly never to look at something besides getting the PID for some process by its name. So there's basically no reinforcement for me to actually learn the information, and so it doesn't get learned. Now, I could make flashcards or something like that and actively try to memorize the man pages, but wouldn't it be better to not need to do that in the first place and instead spend that time being, I dunno, productive?)

I don't claim that adding names to things solves the "play around with stuff" problem, but I think it helps a little, because it will increase (in some cases, I'd argue dramatically) the space of what you can do without RTFMing.


For someone learning the shell as a "first language" I could see named fields as being hugely beneficial... but I'm not inclined to tailor my format for a single group.

As I argued above, it's not just beneficial to beginners. Look at my struct example again -- you don't see the Linux people accessing fields by offset instead of name even though they all have lots of experience programming in C. They use field names because it's just better. I feel the same applies to the shell objects.
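A rough sketch of the difference, with made-up data:

Code: Select all

import csv, io, json

# positional: the caller has to remember that column 1 holds the year
row = next(csv.reader(io.StringIO("foot.txt,2012,8,29,22,21")))
year_by_index = int(row[1])

# named: reads like the struct-member access above
rec = json.loads('{"name": "foot.txt", "mtime": {"year": 2012, "month": 8}}')
year_by_name = rec["mtime"]["year"]

assert year_by_index == year_by_name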

Just leave it blank then. It's not expensive at all, to have blank fields in CSV.

Code: Select all

info,,more info

If having extra commas isn't your thing... it should be possible to identify rows with fewer fields and treat them differently (of course, CSV doesn't handle multiple optional fields as well as JSON would, but I don't know if that's a common enough scenario that I need to worry about it.)

It's not expensive in terms of bytes, but we thoroughly disagree on how conceptually ugly it is; I think that it's very very ugly to have around those extra fields that you're not using any more. To me, it's just asking for trouble.

And it's not just multiple optional fields you'd have to handle (you say CSV doesn't handle them "as well as JSON", I say it doesn't handle them at all without some sort of markers in your data anyway), but alternative fields as well.

Personally, I see nothing wrong with the second form. However, your later flat examples look fine in CSV.

Code: Select all

$ ls
bar.txt,2009,8,29,22,21
foot.txt,2012,8,29,22,21
$ ls | select where .2 ">=" 2010
foot.txt,2012,8,29,22,21

Now what happens when you want something to have a list of files in it?

Of course, this ignores the fact that ls wouldn't actually output just names and dates... but whatever, it works for discussion. I didn't make it clear, but I intend to include pattern matching (with several predefined patterns, plus wild card notation.) So I'll probably have YYYY/MM/DD.hh:mm:ss as my default date/time format.

How do you compare two dates then? Special case the comparison?

And how would you deal with INI or plain text config files? It's nice to get things "for free," but there are lots of other config file formats that neither of us will be able to easily operate on.

Write a converter. The point is not necessarily that it will make operating on those structures trivial, but at least you can do it: JSON will let me do INI->JSON or even XML->JSON. Going JSON->CSV might as well be impossible.
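A flat INI file converts in a few lines - a sketch, assuming each section simply becomes a nested object:

Code: Select all

import configparser, json, sys

# read INI from stdin, emit one JSON object with a sub-object per section
cp = configparser.ConfigParser()
cp.read_string(sys.stdin.read())
print(json.dumps({name: dict(cp[name]) for name in cp.sections()}, indent=2))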


Re: My Unix CLI manifesto, aka why PowerShell is the bees knees

Postby tomCar » Fri Aug 31, 2012 6:01 am UTC

@EvanED
I think your C example is probably the most convincing argument that you have (and it is convincing.) I'm still not sure that you can't learn what goes where, but I suppose accessing fields by name is more natural...

Headers provide names. There is an issue with identifying headers though... there are several conventions that could be used (including forced headers.)

Some other points:
  • Dates can be string compared in YYYY/MM/DD format.
  • I've mostly been leaving fields blank because I wasn't using headers. Using headers I can safely re-organize fields (and omit them.)
  • You may not know ps' output off the top of your head, but chances are if I sat you down with it you'd figure out what column corresponds to what. This is all that matters (why would you need to know about ps if you weren't using it?)
  • No tool just randomly outputs different object types. There never will be a case where all records are not of the same type. (in response to your file list and alternate fields thing...but I believe I simply don't know what you're talking about.)
  • Converting from JSON to CSV isn't that hard. Field names -> header. Each object -> record. If you have heterogeneous types then you sort by type and separate by headers (or better yet separate into files to form proper tables.)
  • On a note about JSON. I feel it's ugly. It doesn't look like something a person would use. It's got quite a bit more noise than CSV, and not a substantially larger feature set. Of course, it is easy for a computer to parse, which makes it suitable for a pipeline, but as a human displayed format...not so much.


Re: My Unix CLI manifesto, aka why PowerShell is the bees knees

Postby tetsujin » Fri Aug 31, 2012 8:55 am UTC

Indeed, I don't mind if you pare down some of the "wall of text". I was struggling to edit it down while still addressing what I wanted to address. That process takes a lot of time, though, so eventually I just had to send the message and be done with it. :)

The real advantage of names is that they make order/location unimportant (eventually, you're going to learn what's what, so the benefit of telling you that is only temporary.) Which does give your format flexibility, which is certainly desirable for a shell pipeline... but as a common language of a (fixed) set of tools... well that flexibility really doesn't come into play.


The set of tools isn't fixed. Anyone can write a new program and add it to the shell environment. Anyone can write a shell script that writes out a data stream. And my hope is that a time will come when people will actually want their program to work really well in my shell. At the beginning that's an unrealistic goal, but in the mean time people are still going to use the tools I provide to create data streams of their own, or consume a data stream output by another tool - so there should be some variety there in terms of how people format their data.

And there are the script portability/reliability issues I mentioned earlier - named fields don't solve the problem entirely, but they really do help.

p.s. To explain how you begged the question. In short form, our dialog went like this:
Question: Are nested structures necessary?
Answer: Yes, to represent nested structures.
See, your answer assumes that the question is true (that nesting is necessary for those structures.)


The answer does not assume the question is true: The nested structures I was referring to already exist. So "to represent nested structures" is to break down a major barrier in effectively communicating with other programming environments. That is (one reason) why nested structures are a valuable thing to have in the shell.

I gave other examples as well: ACLs in a file listing, stream-processed records with additional data tacked on, etc.

As far as documents... HTML isn't really the document


What? Yes it is! Or at least the computerized representation of one - which is the only kind of document the shell's going to be messing with anyway. :)

ACLs are a single dimension (a list) and file records a second. CSV is 2d (records+fields.) Mind you, if there's more information in the record it could become a bit much...but that's true if you nest as well.


You can't use the second dimension (columns) for ACL data, because it's already used for other stuff.

I mean, let's say this is the output of an "ls"-style program:

Code: Select all

filename,size,ACLs
derp,20124,nobody can access this!,"Except for tetsujin, cause he's awesome","I guess if Wil Wheaton wants to read it he can take a look too, but read-only"


So, first problem: "ACLs" is the third field in the header, but it actually accounts for third, fourth, and fifth fields in this example.
Second problem, what happens if you need to add a fourth field? You can't add it at the end, because the ACL data simply extends to the end of the record. You could insert it before "ACLs" (the fieldname header means you can shift fields around - as long as everybody reading the stream is looking up fields by name instead of position) - But then, what if you need a second list of something? There's no way to keep the two lists distinct without encapsulating each list into a single field.

tomCar wrote:You may not know ps' output off the top of your head, but chances are if I sat you down with it you'd figure out what column corresponds to what. This is all that matters (why would you need to know about ps if you weren't using it?)


When you want to write a script and have it work on multiple platforms, you need to be able to access those same fields in the same way. It's not a question of whether the user can sit down, type "ps", and understand its output - it's a question of whether a script, once written, can do this. It's a lot easier to get that kind of stability in the long term if you access fields by name.

p.s.s. you may notice I skipped your bit on expanding the scope of the shell... that's simply because I tend to prefer d.s.l.'s to "do-it-all" languages. Which is a discussion worth it's own thread.


The shell is the environment where everything gets connected together. It's like a big solderless breadboard into which all the other components get plugged.

So to me, it's entirely appropriate for the shell to be a "do it all" language - in the sense that the shell environment should be a great place for working with other tools that know how to do all the actual work.

For instance: audio processing? Totally beyond the scope of the shell. But you've got tools on your system that can do things with audio files - add effects, change formats, etc. So at a basic level, audio data is something you can pass around in pipes (indeed, it already is. It's just not type-tagged, so if you mistakenly passed audio data to a program that can't deal with it, or tried binding it to a variable, you wouldn't get a helpful error, but rather just a big mess.) At a somewhat more advanced level, the shell could answer questions like, "what kind of operations can I perform on this data?" or "how do I convert this to another type of file?" - or, if you really don't care how it's done, you could just ask the shell to do it, and the shell will pick a good tool for the job.

In this example, the shell isn't delving into the realm of audio processing - that job is still left to tools that are made specifically for that job. The shell is just acting as a facilitator, a go-between. To do that, it needs to have a concept of data types (one which can extend beyond the types the shell itself can directly handle)

No tool just randomly outputs different object types. There never will be a case where all records are not of the same type.


I wouldn't use the word "never" quite so freely. :)

In object oriented languages there's the whole notion of polymorphism, objects which share a root type and interface but different specifics in their behavior and implementation. So something analogous could reasonably pop up in a data stream - not "objects" necessarily but records which at some level can be treated the same, but which also differ in some important ways.

A program could parse program code and write out the code in structural form - in which case you don't just get a list of "statements" - you'd get some built-in constructs (like conditionals or loops), some function calls (with varying numbers and types of arguments), some data declarations, and so on.

If you connected to a vector graphics program via a scripting interface to look at the elements that had been drawn into the current document, they would not be all of the same type. You could see shapes or lines, defined by varying numbers of points - some of which might have a graphical texture applied to them while others would just have color.

If you had a program whose job was to dig through /lost+found after a disk failure and try to triage everything, then certain fields would likely be common to all file types, while others would be particular to individual file types. Media files, for instance, could have a "run length" field and "bitrate" field - which would make it easy to tell the difference between an MP3 song and an MP3 audiobook chapter.

If a data stream includes mostly homogeneous records, there is still the possibility that the tool writer will want to signal some unusual or exceptional case, like an error, as a record in that stream. While it's possible that the exceptional-case record could simply be fit into the same structure, I think it's likely that a different record structure would be better suited to handling such a case.

If you used a tool somewhat like "grep" to search a bunch of files for records containing a certain string of characters, the records returned by that tool won't necessarily be of the same type. Each record type will be whatever it was in the file it came from.

Although I agree that most of the time, you probably want the records in a stream to all share the same format (after all, if they're not of the same format, why are they in the same stream, being processed by the same tool?), I think there are also cases where it's useful for the format of those records to diverge a bit.

Converting from JSON to CSV isn't that hard. Field names -> header. Each object -> record. If you have heterogeneous types then you sort by type and separate by headers (or better yet separate into files to form proper tables.)


CSV can't directly express the full structure of JSON - nesting is the most serious omission.
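Even a modest record breaks the flattening recipe - for example (fields invented for illustration):

Code: Select all

{"filename": "derp",
 "acl":    [{"user": "tetsujin", "access": "rw"}],
 "xattrs": [{"key": "origin", "value": "camera"}]}

There's no single CSV row that keeps those two lists distinct without inventing extra conventions on top of the format.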

On a note about JSON. I feel it's ugly. It doesn't look like something a person would use. It's got quite a bit more noise than CSV, and not a substantially larger feature set. Of course, it is easy for a computer to parse, which makes it suitable for a pipeline, but as a human displayed format...not so much.


I don't think the idea is for it to be a "human displayed format" at all. One of the reasons for going to a shell-standard serialization format is to get away from the Unix tradition of "value output that doubles as display output". The idea is that, generally, the user's not going to look at the encoding of the data, but rather a representation of the data's meaning - in much the same way as a debugger attempts to provide a useful representation of the variable values being watched, rather than the raw bytes in memory.
---GEC
I want to create a truly new command-line shell for Unix.
Anybody want to place bets on whether I ever get any code written?


Re: My Unix CLI manifesto, aka why PowerShell is the bees knees

Postby tomCar » Fri Aug 31, 2012 10:30 pm UTC

If you aren't using JSON on the user side...then it works out fine. Though you do have to ask yourself what format you actually want to display in and whether or not you want that done implicitly. If I recall correctly, the latter was decided against, though I don't remember if anyone decided on a user side format. However, using 2+ formats is directly opposed to my goal of unification.

There's been no mention of why nested structures would be useful in the core tools. 3rd parties are not my responsibility. Existing structures are also none of my concern. (It very much is begging the question to assume they exist when the question was always about what my tools should do. I will admit that I assumed you'd understand "utility" as referring to a "built in," considering that's what this thread is about. :wink: )

CSV doesn't prohibit names. So any script will still be portable (if they reference names.) It really isn't even an issue to add this feature. After all, I was already referencing columns; it's just a matter of aliasing them using the header. (quick aside about conversion, nesting can be handled with references.)

In regards to ACLs: Listing them does work fine. You're just doing it wrong. :P

Code: Select all

filename,size,access-everyone,access-tetsujin,access-wil-wheaton
derp,20124,no,yes,read-only

You can now feel free to rearrange and add/remove any fields you want.

HTML is just the encoding. If you change the document to use markdown it doesn't suddenly become a different document. Besides, HTML isn't even in the problem space of the core tools.


Re: My Unix CLI manifesto, aka why PowerShell is the bees knees

Postby EvanED » Sat Sep 01, 2012 12:23 am UTC

tomCar wrote:@EvanED
I think your C example is probably the most convincing argument that you have (and it is convincing.) I'm still not sure that you can't learn what goes where, but I suppose accessing fields by name is more natural...

I'm not saying you can't learn it. I'm saying that learning the order of your most-frequently-used utilities will take longer (another example: I tried listing from memory the columns of ls -l, something I run many times a day, and I didn't get it exactly right: I missed the link count field) and you probably won't learn the columns of commands you use infrequently, at least if you're anything like me.

Dates can be string compared in YYYY/MM/DD format

Well, in the common case, yes. It becomes harder or more annoying if you want to do something like "find me things created in any July".
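Roughly:

Code: Select all

dates = ["2009/08/29", "2012/07/04", "2011/07/15"]

# zero-padded YYYY/MM/DD strings do sort correctly as plain strings...
recent = [d for d in dates if d >= "2010/01/01"]

# ...but "anything from any July" means digging the month back out
july = [d for d in dates if d.split("/")[1] == "07"]

print(recent)   # ['2012/07/04', '2011/07/15']
print(july)     # ['2012/07/04', '2011/07/15']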

I've mostly been leaving fields blank because I wasn't using headers. Using headers I can safely re-organize fields (and omit them.)

Absolutely... but now you've reduced one of the benefits of CSV over JSON, and introduced a number of other problems. (E.g. what's the convention? Are they in a separate line? If so, then individual lines no longer stand on their own.)

You may not know ps' output off the top of your head, but chances are if I sat you down with it you'd figure out what column corresponds to what. This is all that matters (why would you need to know about ps if you weren't using it?)

I have two objections. First, everything is not always entirely clear. For instance, without looking at the headers, it may not be much better than a 50/50 shot to guess which column is MEM% and which is CPU%. I guessed correctly, but I was only maybe 70% sure. If I was actually using it for something, I'd have had to go look at a key. Second, running the command and looking at the output is just a less annoying version of going to the man page -- it's something that I think it'd be great if we could eliminate. When building up a longer pipeline, I have to spend a fair bit of time testing it after many of the stages to make sure I'm pulling out the right fields and such. Names don't solve that problem, but I think that by making it easier to learn, they will reduce it. They can also sometimes improve the error messages; for instance, if I say sort --key=field1 and there's no field1, it can tell me "there are field2, field3, and field4", which means that I no longer have to sigh, go cut out some pipeline stages at the end, run it, see what's wrong, fix it, then add in the missing stages -- I can just fix it.

No tool just randomly outputs different object types. There never will be a case where all records are not of the same type. (in response to your file list and alternate fields thing...but I believe I simply don't know what you're talking about.)

Hah, you're just not imaginative enough. Even something as simple as stat might want to do this (Unix ls does too, but I'm breaking the -l functionality into stat so my ls "equivalent" won't). As a weak example, take the mode bits. It's conceivable to me that instead of outputting just some integer where clients can mask out bits and such, it may be better to have stat separate out some or all of the bits into their own fields. But not all mode flags always apply -- for instance, the setuid flag only applies to executable files. So it would make sense for stat to omit that field from directories and perhaps even non-executable files.

For a better example, both ls -l and stat output the target of a symbolic link. But there's a field that definitely doesn't apply 99.96% of the time. (99.96% statistic taken from a real experiment with the contents of my home directory on the computer I'm typing this on: 49454 objects, 22 of which are symlinks.) Should that field have to sit around for the 2247/2248 objects that aren't symbolic links? I emphatically say "no"!
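So the stream might look something like this - field names invented, just to show records whose shapes differ:

Code: Select all

{"name": "script.sh", "type": "file", "mode": "rwxr-xr-x", "setuid": false}
{"name": "docs", "type": "dir", "mode": "rwxr-xr-x"}
{"name": "latest", "type": "symlink", "target": "docs/v2"}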

Converting from JSON to CSV isn't that hard. Field names -> header. Each object -> record. If you have heterogeneous types then you sort by type and separate by headers (or better yet separate into files to form proper tables.)

But now you've just made things much more complicated, as you need a way to name the objects (a key) and an individual line no longer completely describes an object, because you need all of its components as well. (You'd have been better off arguing that you can flatten them. That's merely "ugly" as opposed to "major problems" :-).)

On a note about JSON. I feel it's ugly. It doesn't look like something a person would use. It's got quite a bit more noise than CSV, and not a substantially larger feature set. Of course, it is easy for a computer to parse, which makes it suitable for a pipeline, but as a human displayed format...not so much.

Tetsujin's comment was exactly correct: the JSON output isn't meant for human consumption.

Put it this way: in an ideal world (for some definition), there wouldn't even be a serialized format: the actual objects would just be passed from one program to the next in the pipeline. However, that requires imposing way more than I'm willing to in terms of what languages and platforms could be used by the shell utilities. So I need some serialized form of objects; JSON seems like a good mix between being powerful enough to represent actual PL objects with nesting and at least a small notion of types and still being human readable, while still meeting my other desiderata like named fields.

tomCar wrote:If you aren't using JSON on the user side...then it works out fine. Though you do have to ask yourself what format you actually want to display in and whether or not you want that done implicitly. If I recall correctly, the latter was decided against, though I don't remember if anyone decided on a user side format. However, using 2+ formats is directly opposed to my goal of unification.

The "only" reason I think implicit is bad is because the only way I know to do it in current shells is by autodetecting whether the output is going to a TTY or not, and my personal view is that causes more problems than it solves; I don't even like it for determining whether to produce colored output or not.

However, longer-term plans are to replace the shell (and even the terminal!), at which point the shell will be able to tack the extra output step on there unless explicitly requested not to.

(Actually now that I think about it more, maybe you could do it in some of the PRE_CMD hooks or whatever, but that sounds ugly if it's even possible.)


Re: My Unix CLI manifesto, aka why PowerShell is the bees knees

Postby tetsujin » Sat Sep 01, 2012 4:59 am UTC

tomCar wrote:If you aren't using JSON on the user side...then it works out fine. Though you do have to ask yourself what format you actually want to display in and whether or not you want that done implicitly. If I recall correctly, the latter was decided against, though I don't remember if anyone decided on a user side format. However, using 2+ formats is directly opposed to my goal of unification.


It's not a second format, it's just a more visual way of representing the same data structure. My plan is that this will be automatic (basically, there will be a distinction, probably a segregation, between programs designed to work with the shell's data model and more "traditional" programs that make no assumptions about the shell). If the shell is running something that outputs a data stream, then it'll show a nice representation of the data stream instead of feeding raw data to the terminal.

As for the specific method for how I'll display values - for now it'll probably be the same syntax one could use at the shell to type those structures back in - except with an effort made to lay out the data nicely.

There's been no mention of why nested structures would be useful in the core tools. 3rd parties are not my responsibility. Existing structures are also none of my concern. (It very much is begging the question to assume they exist when the question was always about what my tools should do. I will admit that I assumed you'd understand "utility" as referring to a "built in," considering that's what this thread is about. :wink: )


I'm not "assuming" anything. These structures exist. Supporting features in the serialization format adequate to represent the kind of data elsewhere kind of boils down to a question of how frustrating (or easy) do you want it to be to work with that data in the shell? To me the answer is "very easy".

To say you don't care about other people's programs, or other people's data, when designing a shell is kind of a closed-minded view. The whole point of a shell is to run other people's programs, hook them together, and use them to work with different sorts of data. The whole point of designing a new shell is to do all that better.

In regards to ACLs: Listing them does work fine. You're just doing it wrong. :P

Code: Select all

filename,size,access-everyone,access-tetsujin,access-wil-wheaton
derp,20124,no,yes,read-only

You can now feel free to rearrange and add/remove any fields you want.


Add a second file to the list and problems (scalability issues, etc.) start to appear:

Code: Select all

filename,size,access-everyone,access-tetsujin,access-wil-wheaton,access-matt-smith,access-amber-benson
derp,20124,no,yes,read-only,,
quack,90210,no,,,read-only,yes


These two files define ACLs for entirely different sets of users - so the header has to have "access" fields for every user who's going to appear in an ACL. Either that, or each record gets its own header, and records in the list don't necessarily have the same structure.

If having the header field as a superset of all the users who'll appear in an ACL doesn't seem like a big deal... Well, suppose you're doing this for every file on the system (like the output of a "find" command, but with ACL data included in the output.) You'd have to run the whole "find" job, buffer the whole result to get your field names, and then you could start streaming it out. It may not even be adequate to include every user in /etc/passwd in the header - the filesystem could be a network filesystem, or a FS from another OS - in which case the set of users may not be the same. (In that case, presumably, you'd identify the user by whatever means the filesystem uses to identify them - a numeric UID or a textual username - whatever.)
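Compare a nested version, where each record carries only its own ACL and can be streamed out as soon as it's ready (field names invented for the example):

Code: Select all

{"filename": "derp", "size": 20124, "acl": [{"user": "tetsujin", "access": "yes"}, {"user": "wil-wheaton", "access": "read-only"}]}
{"filename": "quack", "size": 90210, "acl": [{"user": "matt-smith", "access": "read-only"}, {"user": "amber-benson", "access": "yes"}]}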

But to me the larger issue here is that while, technically speaking, you can encode anything in CSV, it may not be a good choice. Often in programming, it's important to get data into a format that's easy to work with for what you're trying to do. The more "creative" you have to get to put that data into that format, the harder it will be to work with it. And the farther you stray from the way that format was designed to be used, the less support you'll get from your toolset.

HTML is just the encoding. If you change the document to use markdown it doesn't suddenly become a different document. Besides, HTML isn't even in the problem space of the core tools.


1: There's no reason HTML shouldn't be in the problem space of the core tools. Traditionally speaking, they are text-processing tools. More specifically, they're designed to operate on simple data structures encoded in text. HTML is a rather more complicated data structure - it would be hard to write the core tools in a way that would let them operate directly on that data (without simply specializing them to operate on XML-style encodings). But if you have one tool that parses the document into a structure the shell and its tools understand, and another tool to translate it back, then working with that data is relatively easy and can be done with generic tools (there's a sketch of what I mean just after this list).
2: Historically speaking, working with HTML in some capacity has been within the scope of the shell. For instance, CGI scripts, or programs that fetch an HTML document and pick bits of raw data out of it - written as shell scripts.
3: While it's true that HTML is "just" the encoding - the encoding is what you'd be working with if you wanted to use a script to alter that document. At some level you'd have to cope with the low-level aspects of the encoding (the specific syntax, etc.) - and at some level you'd have to cope with the high-level aspects (the ideas that govern how documents work within the HTML language.) There is no means of operating on that document without dealing with some kind of encoding.
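
Here's the sketch I mentioned under point 1. To be clear, "html-to-json", "set-field", and "json-to-html" are hypothetical names I'm making up for the example - none of them exist - the point is only the shape of the pipeline: decode into a structure the generic tools understand, operate on it, re-encode.

Code: Select all

# Hypothetical round trip - none of these commands exist yet:
html-to-json < page.html \
    | set-field head.title "New title" \
    | json-to-html > page-new.html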


One example I like to use to justify a lot of the stuff in my shell design is to think about the question, "What would it take to deal effectively with a format like XML in the shell?" That is to say, I don't want to deal specifically with XML - rather XML is acting as a representative for all the other encodings in the world that I might want to deal with in the shell. I want to think about what it would take to start with a format at that level of complication, and work with it in the shell.

In most programming languages you'd probably do it one of two ways: either you'd call a routine that would parse the whole file and store it as a data structure, or else you'd create some kind of "reader" object (that retains some state between calls, so it can remember things like the rough structure of the file and at what point in the file we are) - and interact with that, reading in or skipping over bits of data as desired.

The former approach wouldn't work in a shell like "bash" - its variables aren't really up to the job of storing that kind of data structure... You could get around that by storing an encoded representation of the data... But you've already got an encoded representation of the data, so that doesn't gain you much. You'd face similar problems passing the "parsed" data over a pipeline - there's just no common format that can represent what XML can represent, but which the tools already understand. Thus, the Unix methodology of solving the problem of "parsing XML" with a small, single-purpose tool more or less fails with this approach. You can write a tool that will parse that XML and write out another representation of it, but that other representation will be as complicated as XML, and no more likely to be supported by the tool on the receiving end of the pipe.

Going the other route, having a reader that the other code interacts with (obtaining or skipping over certain fields, all that) - there are ways to do it. For instance, you could implement your XML reader command such that when it opens a file, it forks a background process which does all the real work. Each individual invocation of the reader command would then communicate with this background process, dispatching additional commands and receiving additional results. (The background process is a kind of optimization - it retains information about the file that's been previously obtained, so successive calls don't need to go through those initial parsing steps again. A slightly more straightforward approach would be to run the XML reader as a coprocess - coprocesses would require a little less work to implement, as long as the shell supports 'em - but then you get back into encoding issues again - responses from a coprocess are typically, but not necessarily, single-line responses. So if you pull a piece of XML data that spans lines, you have to either encode it onto a single line, or encode it in a way that tells you how many lines are part of the response to the query you just made.)
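
Just to make the coprocess variant concrete, here's a minimal sketch in bash 4's coproc syntax, assuming a hypothetical "xml-reader" command that answers each query with a single line (the command name and its query syntax are invented for the example):

Code: Select all

#!/bin/bash
# Start the reader as a coprocess; it keeps its parse state between queries.
coproc XMLRD { xml-reader --serve config.xml; }

# Send one query and read back the (single-line) response.
echo "get /config/network/hostname" >&"${XMLRD[1]}"
IFS= read -r hostname <&"${XMLRD[0]}"
echo "hostname is: $hostname"

# Tell the reader to shut down and wait for it to exit.
echo "quit" >&"${XMLRD[1]}"
wait "$XMLRD_PID"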

This works a little better, at least in the case where your shell script is interacting with the tool. This isn't the sort of thing that you could hand to another program and say, "OK, data's parsed, operate on this." It's also a bit more complicated to write a program this way - you need to set up named pipes to communicate with the background process, remember its PID so you can tell if it's died, maybe make it time out or something if it's not used for a while and the caller has neglected to shut it down.

If the shell supports a data format that can express the structure of the source file, then the first option is pretty easy. The core tools should be equipped to deal with all the format's organizational mechanisms, which means that flat-out transcoding the file lets you work with it, using these tools, in a way that reflects its original structure. The second option is still difficult - and that has led me to think about how to improve coprocesses and other ways of dealing with the issue.
---GEC
I want to create a truly new command-line shell for Unix.
Anybody want to place bets on whether I ever get any code written?

tomCar
Posts: 2
Joined: Tue Aug 14, 2012 6:14 pm UTC

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby tomCar » Sat Sep 01, 2012 6:26 am UTC

I'm a little tired of having my session expire while I'm working so...I'm going to pull out your posts, organize, and think about them a bit before getting back to you. However, I did notice something that can be addressed quickly: what's a good "pretty print" format? (as I don't believe that's been addressed.)

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI
Contact:

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby EvanED » Sat Sep 01, 2012 6:52 am UTC

Right now I have a tabular printer. There are some examples of my stuff in the spoiler of this post; the format is, uh, "inspired by" what you see with PowerShell.

In the somewhat short term, I also want to add a stanza printer, along with some logic to automatically choose the better of those (as if the width of each line is greater than the column width of the screen, tabular format becomes essentially completely unreadable). The stanza format (corresponding to some of the last tabular example in my link above) would look something like

Code: Select all

in-directory: .
kind:         directory
name:         coreutils
permissions:  509
size:         4096

in-directory: .
kind:         regular file
name:         display-table
permissions:  509
size:         159

in-directory: .
kind:         regular file
name:         list-directory
permissions:  509
size:         148

(I didn't fix most of the display problems even though I hand-wrote this example; the ordering and display of the permissions bits and size need to change.)

In the longer term, I think the terminal ought to be replaced with something which can display actual tables and will allow smoother exploration of the data that has been output. (There are some other people working sorta in this direction now, e.g. this, though no one has done quite what I want.)

tomCar
Posts: 2
Joined: Tue Aug 14, 2012 6:14 pm UTC

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby tomCar » Sun Sep 02, 2012 3:59 am UTC

So you're limiting your rewrite to xterm replacements? Or is your new terminal going to figure out what to do based on the raw text output?

I ask because you're going to have to keep the old tools around if you intend to require X, which makes your tools useless for my purposes (as I'm frequently not running X.)

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI
Contact:

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby EvanED » Sun Sep 02, 2012 5:22 am UTC

A terminal replacement is a long time away; some of the common utilities come first, then a new shell, and only then a terminal... and progress is slow.

The utilities won't need to change at all, as they'll still use JSON as the interchange format. The shell may change a little to take advantage of the terminal when it can, but mainly the only thing that'll need to be done is to add another output backend that will output tables or whatever in a way the terminal will be able to interpret. The existing ones that output to text will certainly stick around.

Now, all that said: I will need all this to work over SSH, but aside from that, if there's a conflict between "I can do this awesome thing" and "but it'll need X (perhaps on the local side of an SSH tunnel)", then X will win out. (It'll also have to work in Windows. :-))

tomCar
Posts: 2
Joined: Tue Aug 14, 2012 6:14 pm UTC

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby tomCar » Sun Sep 02, 2012 5:28 am UTC

Well, so long as textual display tools stick around I guess everything works out... after all, I'm certainly not the only person who will require a graphics-less environment.

How do you intend to handle nested objects in your display tools? Flatten them? As a single field or multiple?

EDIT: It really doesn't take that long for most of the tools... awk is probably the hardest, but that's simply because it's got its own language. I mean, here's the status page of my project (minus formatting changes, as I'm waiting to either finish up our dialog or agree to disagree.)
Done:
ascii
basename
cal
cat
cleanname
cmp
date
echo
hashsum
ls
mkdir
pbd
pwd
read
rm
seq
sleep
tail
tee
touch
unicode

In Progress:
split

TODO:
awk
bc
dc
dd
diff
du
ed
factor
fmt
fortune
freq
getflags
grep
hoc
join
listen1
look
mk
mtime
primes
rc
sam
sed
sort
split
ssam
strings
test
tr
troff
uniq
wc
yacc

man pages

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI
Contact:

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby EvanED » Sun Sep 02, 2012 6:01 am UTC

Not sure of what the tabular view will do yet. The stanza view will display it more as a tree, probably collapsing subcomponents small enough for one line into that line. (Possibly nesting that can't be nicely rendered in a single cell will be something that triggers the stanza view by default.) The eventual terminal will have a more interactive view, where you can contract/expand subcomponents.

As for "It really doesn't take that long for most of the tools... awk is probably the hardest, but that's simply because it's got it's own language," yes, that's true, but I'm also not working on it all that much; I've got other projects too (and laziness) that compete for my free time. The shell will also be... a bit of an undertaking; I'd guess more than all the utilities.

User avatar
tetsujin
Posts: 426
Joined: Thu Nov 15, 2007 8:34 pm UTC
Location: Massachusetts
Contact:

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby tetsujin » Sun Sep 02, 2012 7:24 pm UTC

About the whole value display thing, and the requirement of a graphical display for the more sophisticated styles of value display, and all the questions about whether the shell has to adjust its behavior depending on what kind of terminal it's in...

First, one can do quite a bit with a straight-up terminal. Consider the sorts of UIs you can create with curses - that's what I want (ultimately) when the shell is running inside a terminal. In this model, the shell can have a pretty sophisticated interactive value display, including things like "drilling down" into hierarchies and expanding/collapsing the display of fields

For the shell really to take charge of the terminal like this, it will have to be careful about how it interacts with the programs it runs - after all, a program you run (even in the background, in some cases) may open /dev/tty and write control codes to it. This could interfere with the shell's value display (as values are displayed as soon as they're available on the job's stdout pipe) - to prevent programs' terminal I/O from interfering with that of the shell, the shell needs to include terminal encapsulation functionality, like that of "screen".

Of course, sometimes people won't want to run in this sort of mode - they'll want shell output that can easily be written to a logfile or whatever - so the shell will also need a more basic I/O mode that delays value output display until the job is done - and lets all programs in the job share the same TTY as the shell instead of providing virtual terminals

I have given some thought about what environments I want to be able to run this shell in (and run it well) - things like, I SSH into my home machine from my phone a lot... Or I can ssh into my router - it's a very minimal system but I'd like to be able to run my shell on it

So even in The Distant Future, when I have written not just the shell but also a cool alternate GUI terminal for it, there will still have to be a couple of fallback modes.

The idea with the GUI terminal is that it would be a separate thing the shell can interact with, rather than part of the shell itself. The terminal would usually run on the "local" system while the shell might run remotely. When a job writes out a value stream to stdout, the shell would send that value stream over the net link - rather than the display of it. This also means the locally-running "terminal" could take on responsibilities like handling command history and the basic entry of commands - stuff that doesn't actually need to go over the network can be handled locally. It also means the terminal can support features like passing data back and forth over the network link - like an ssh client that supports scp, except that the shell running on the remote end would be involved in the process as well, so you could target not only files but shell variables and other data within the shell. The terminal could then exchange this data with programs or shells on the local machine, or establish pipelines over the network between local and remote programs.
---GEC
I want to create a truly new command-line shell for Unix.
Anybody want to place bets on whether I ever get any code written?

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI
Contact:

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby EvanED » Sun Sep 02, 2012 10:31 pm UTC

So I've thought about the curses idea, and while I think in some ways it'd be neat, I don't think it's something I'll want to do. There are some things that I want the "terminal" to do that are just fundamentally not possible with today's terminals (aside from a couple small projects), such as display pictures and render blocks of output that look like English text using a proportional font. (Or maybe... any text. I'm not sure that'd go over well though.) The other problem is if you start making it act like screen in the sense that the curses thing becomes a proxy between the program being run and the actual terminal, you break the terminal's ability to scroll; I consider this unacceptable (and don't use screen largely because of it). I get annoyed enough when I am using a text-based editor and run into that problem. :-)

User avatar
tetsujin
Posts: 426
Joined: Thu Nov 15, 2007 8:34 pm UTC
Location: Massachusetts
Contact:

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby tetsujin » Sun Sep 02, 2012 11:40 pm UTC

EvanED wrote:So I've thought about the curses idea, and while I think in some ways it'd be neat, I don't think it's something I'll want to do. There are some things that I want the "terminal" to do that are just fundamentally not possible with today's terminals (aside from a couple small projects), such as display pictures and render blocks of output that look like English text using a proportional font. (Or maybe... any text. I'm not sure that'd go over well though.) The other problem is if you start making it act like screen in the sense that the curses thing becomes a proxy between the program being run and the actual terminal, you break the terminal's ability to scroll; I consider this unacceptable (and don't use screen largely because of it). I get annoyed enough when I am using a text-based editor and run into that problem. :-)


It does have some potentially awkward issues. But I think it's the best available middle ground between a traditional shell session (which limits what the shell can do with the terminal a bit) and a GUI (which may not be ready for a while)

I agree on all points: there are things I want to do with the shell that go beyond what you can get from a terminal window. Basically all the stuff you said, plus a few miscellaneous others like progress bars and so on. I'm all in favor of proportional fonts; I figure they're fair game for any output coming from a "shell-native" program.

As for scrolling - I agree, it sucks to lose that. The only consolation I can find there is that the shell could then implement its own scrollback buffer to make up for it. That makes the situation a little better - but it does have certain implications, like the scrolling feature is implemented at the far end of the user's ssh connection instead of locally, things like that.

So I don't know, really. Could be, like you say, it's better to just skip it, focus on the GUI when I'm ready to do that. But it'd be cool to have a little richer environment in the shell even when I'm sshing from my phone or whatever.
Last edited by tetsujin on Mon Sep 03, 2012 1:11 am UTC, edited 1 time in total.
---GEC
I want to create a truly new command-line shell for Unix.
Anybody want to place bets on whether I ever get any code written?

troyp
Posts: 557
Joined: Thu May 22, 2008 9:20 pm UTC
Location: Lismore, NSW

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby troyp » Mon Sep 03, 2012 12:07 am UTC

I think good display is very important for administrative tasks. It's basically the only reason I use a graphical file manager despite their inevitable inadequacies. (Also, drag and drop, etc, but there are probably curses fms that support those.) Something like GTK has a definite edge over curses if you want nice displays (also, integration with the rest of the system).

You know, I share you guys' frustration with the archaic nature of unix shells in general, but sometimes the things that bother me are quite simple. I'd love to have a terminal that's integrated into a graphical fm. There are graphical fms that have a terminal frame in their window (like the KDE KParts terminal), but they always act just like a "spawned" terminal. They start in the right directory (and in some cases you can send in other info using custom actions), but then they're just an independent terminal. You can't even change the directory in the fm pane from the terminal, let alone do stuff like view a list of filenames output from a command as files in the fm pane*.

To me, making a shell an accessory to the fm has it backwards. The graphical pane should be a display (and preferably, drag-and-drop interface) to the shell.

Of course, these capabilities will be part of EvanED's graphical shell, but that's a more ambitious project. There's no reason this sort of integration shouldn't be part of file managers now, afaict. One day, if I ever get around to it, I might try to hack it into one of the lightweight fms (but they're probably all written in C (groan)).

* although some orthodox fms look like they may have at least some of this. I need to try them (I don't know why I don't use them now, I usually have two panes open anyway)

tomCar
Posts: 2
Joined: Tue Aug 14, 2012 6:14 pm UTC

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby tomCar » Mon Sep 03, 2012 3:40 am UTC

@troyp
Lol, I'm actually moving away from graphical file managers altogether. I find they just don't offer me much of anything. Though I do miss the ability to copy/move things via ctrl-c+ctrl-v/d'n'd...

What you're describing sounds like a command oriented fm... which could be interesting. Have a pane for the directory's contents and (let's say) an expanding command section (displays contents of last command with all others in a scroll buffer.)

@EvanED
I use tmux specifically for scrolling... so it seems weird that screen would break your scrolling.

@all
Curses could work to provide an X-less version.

Personally, I don't think the shell should do any rendering. If a program wants it can do so for itself (as mplayer does.) I also think that text progress bars work fine as is.

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI
Contact:

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby EvanED » Mon Sep 03, 2012 4:06 am UTC

tetsujin wrote:It does have some potentially awkward issues. But I think it's the best available middle ground between a traditional shell session (which limits what the shell can do with the terminal a bit) and a GUI (which may not be ready for a while)

On the first point, I definitely get the motivation. But I feel like for me, I would use the curses version almost never once the X version was done, and it would be a substantial enough effort that I'd rather just put that time into the X front end instead.

As for scrolling - I agree, it sucks to lose that. The only consolation I can find there is that the shell could then implement its own scrollback buffer to make up for it. That makes the situation a little better - but it does have certain implications, like the scrolling feature is implemented at the far end of the user's ssh connection instead of locally, things like that.

tomCar wrote:@EvanED
I use tmux specifically for scrolling... so it seems weird that screen would break your scrolling.

So the reason I said that was because I was under the impression that both screen and tmux would break both the actual terminal's scroll bar as well as the mouse wheel. This was my conclusion from several years ago, but it occurred to me that things may have changed or I might have tested it badly at the time, so I just gave it another attempt. From my quick test, both are true in the default install. Somehow the terminal figures out that the scroll bar is useless so it removes it. The mouse wheel also doesn't scroll -- however, it does bring up command history. This latter thing makes me hopeful that you could configure the scroll wheel to scroll. Assuming you can, that takes away much of my objection. I'll have to reevaluate my lack of use of those programs. :-)

Though I do miss the ability to copy/move things via ctrl-c+ctrl-v/d'n'd...

This gives me an awesome idea... copy/cut/paste command-line programs. Currently if you want to move stuff from directory A to directory B you have to specify the path from A to B or B to A in the cp command. But imagine you have one terminal open in A and you just type copy * and then you have one terminal open in B and you type paste and it copies stuff over. I sort of view this as being like some of the difference between being able to gradually stage files in Git's index vs in Subversion just having to put everything in the one svn commit command.

If anyone writes this then let me know, because I now want it. :-) If not, I'll add it to my to-do list.

Edit: actually, you could even use "the" clipboard as the interchange between the commands -- just put the file names on the clipboard. With some caveats, this means that copy is even already written... just run echo * | xsel and have paste read from whichever clipboard that writes to! (In actuality I think you need something much more robust to spaces and other weird characters, so maybe you have to run some find command with -print0 or something.)

Personally, I don't think the shell should do any rendering. If a program wants it can do so for itself (as mplayer does.)

I think a lot of things would benefit from being able to render stuff right in the shell. I wouldn't want something like a movie to play in it, but I do think it'd be pretty cool if ls could show you a small thumbnail of each image or something like that. Imagine if every text command you ran now opened in a new window. Personally I think that'd get really annoying really quickly -- but that's the current situation for everything that you don't want to appear as monospace text.

tomCar
Posts: 2
Joined: Tue Aug 14, 2012 6:14 pm UTC

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby tomCar » Mon Sep 03, 2012 4:36 am UTC

tomCar wrote:@EvanED
I use tmux specifically for scrolling... so it seems weird that screen would break your scrolling.

So the reason I said that was because I was under the impression that both screen and tmux would break both the actual terminal's scroll bar as well as the mouse wheel. This was my conclusion from several years ago, but it occurred to me that things may have changed or I might have tested it badly at the time, so I just gave it another attempt. From my quick test, both are true in the default install. Somehow the terminal figures out that the scroll bar is useless so it removes it. The mouse wheel also doesn't scroll -- however, it does bring up command history. This latter thing makes me hopeful that you could configure the scroll wheel to scroll. Assuming you can, that takes away much of my objection. I'll have to reevaluate my lack of use of those programs. :-)

tmux apparently lets you override its scrolling behavior (according to a quick google) to allow you to use the mouse.
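
For reference, the option I saw mentioned - the exact name depends on your tmux version, so check the man page rather than taking this as gospel:

Code: Select all

# in ~/.tmux.conf
set -g mouse on          # newer tmux (2.1 and later)
# setw -g mode-mouse on  # older tmux versions
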
Though I do miss the ability to copy/move things via ctrl-c+ctrl-v/d'n'd...

This gives me an awesome idea... copy/cut/paste command-line programs. Currently if you want to move stuff from directory A to directory B you have to specify the path from A to B or B to A in the cp command. But imagine you have one terminal open in A and you just type copy * and then you have one terminal open in B and you type paste and it copies stuff over. I sort of view this as being like some of the difference between being able to gradually stage files in Git's index vs in Subversion just having to put everything in the one svn commit command.

If anyone writes this then let me know, because I now want it. :-) If not, I'll add it to my to-do list.

Edit: actually, you could even use "the" clipboard as the interchange between the commands -- just put the file names on the clipboard. With some caveats, this means that copy is even already written... just run echo * | xsel and have paste read from whichever clipboard that writes to! (In actuality I think you need something much more robust to spaces and other weird characters, so maybe you have to run some find command with -print0 or something.)

One issue with that is there's still a lot of typing. Further, I'm not sure that single-terminal copy/paste will be more efficient than cp (considering three commands are required: copy; cd; paste.) Of course, if you already have two terminals open and in the correct locations it will be faster (though I'm not sure how often that happens.) Maybe if we have copy assume that the current directory is the target directory (no name change) and just specify the source files.
Personally, I don't think the shell should do any rendering. If a program wants it can do so for itself (as mplayer does.)

I think a lot of things would benefit from being able to render stuff right in the shell. I wouldn't want something like a movie to play in it, but I do think it'd be pretty cool if ls could show you a small thumbnail of each image or something like that. Imagine if every text command you ran now opened in a new window. Personally I think that'd get really annoying really quickly -- but that's the current situation for everything that you don't want to appear as monospace text.

Cool yes, but how practical? At least for me, most of my time isn't spent working with pictures (limited to a single invocation of feh to set my background.) I do think that programs should render in the terminal, but that they should do it themselves. Put differently, the shell shouldn't try to figure out if "some/file" is a render-able file. You could implement a command (say as a dbus interface) that would tell the shell to render an image, but I think having a library that provides generic in-terminal rendering might be a better option (or at least more flexible; possibly sdl or something already fills this role.)

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI
Contact:

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby EvanED » Mon Sep 03, 2012 4:58 am UTC

tomCar wrote:tmux apparently lets you override its scrolling behavior (according to a quick google) to allow you to use the mouse.

This is pretty unsurprising given that it already recognizes mouse input. (I had forgotten that console programs can do that in Linux.) It does make me think I'll probably try it out and see what I think. I'm not happy about the loss of the scroll bar, but I can probably get over that.

Further, I'm not sure that single terminal copy/paste will be more efficient than cp (considering there's 3 commands required copy; cd; paste.) Of course, if you already have two terminals open and in the correct locations it will be faster (which I'm not sure how often that happens.)

My initial thoughts are that it happens pretty often in my workflows, which is why I'm excited, but maybe it's worth paying explicit attention to. That may even be worth some modification of the workflow to accommodate. Another use case that is nice is if you want to copy multiple files that can't be easily globbed -- I think you could say copy file.txt; copy bar.txt; copy foo.txt more easily than some of the alternatives; this is what I was trying to get at with the comparison to Git's index. (Aside: copy is a bad name.)

Some additional thoughts: actually my simplistic assumption of echo * | xsel doesn't really work, as it won't know the source directory. You need to put absolute paths (or a structure that specifies the source directory) on the clipboard.

Maybe if we have copy assume that the current directory is the target directory (no name change) and just specify the source files.

I've thought about the cp "improvement" you mention, and I've never been able to decide what I think of it. On one hand I tend to be in favor of explicitness as a rule (and so the idea that you have to specify . appeals), but OTOH I also omit it by accident a lot and get annoyed at it. And ln doesn't require it -- if you just say ln /foo/bar/baz, it will make a link called baz in the current directory. MS-DOS & Windows also don't require the explicit target on copy/move. I think both tend to work fine. (I think a bigger improvement would be if cp assumed the --recursive flag. I should add that as an alias...)
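
For what it's worth, the alias itself is a one-liner (bash syntax; whether defaulting to recursive copies is actually a good idea is a separate question):

Code: Select all

# Make cp recursive by default; bash won't re-expand the alias recursively.
alias cp='cp --recursive'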


I do think that programs should render in the terminal, but that they should do it themselves. Put differently, the shell shouldn't try to figure out if "some/file" is a render-able file. You could implement a command (say as a dbus interface) that would tell the shell to render an image, but I think having a library that provides generic in-terminal rendering might be a better option (or at least more flexible; possibly sdl or something already fills this role.)

Yeah. I'm not really sure exactly what I want here, but "display this image" should definitely be the responsibility of a program rather than shell heuristics. I don't know details of how that winds up though.

I'm inclined to think that, for all its faults, HTML has a lot going for it, and that a terminal which uses a somewhat restricted subset of HTML for rendering - with programs whose purpose is to produce human-readable output (including the implicit "display this JSON stuff nicely" step the shell performs) generating HTML - would work reasonably well. Of course the terminal would also have to accept non-HTML stuff, and it should also provide an easy way to say "render the output from the last command as plain text instead of HTML" and stuff like that. Then what happens is just that programs can output some <img> tag. Maybe it's not as general as you might imagine, but it is probably general enough.

But most of the ideas I have about how the terminal itself should behave are by far the least well-formed of any of the stages (utilities/shell/terminal). So I'm not sure what will happen here.

User avatar
tetsujin
Posts: 426
Joined: Thu Nov 15, 2007 8:34 pm UTC
Location: Massachusetts
Contact:

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby tetsujin » Mon Sep 03, 2012 5:25 am UTC

tomCar wrote:@troyp
Lol, I'm actually moving away from graphical file managers altogether. I find they just don't offer me much of anything. Though I do miss the ability to copy/move things via ctrl-c+ctrl-v/d'n'd...


On a slightly wacky note, there was this thing I found a while back called (IIRC) "adventure shell" - a command shell that behaved kind of like a text adventure game...

Anyway, the whole thing struck me as kind of silly, except that it had this concept of an "inventory" - you could "pick up" and "drop" files. It's basically cut/paste of files, so the idea may be worth implementing in a less-silly kind of shell design. :)
---GEC
I want to create a truly new command-line shell for Unix.
Anybody want to place bets on whether I ever get any code written?

tomCar
Posts: 2
Joined: Tue Aug 14, 2012 6:14 pm UTC

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby tomCar » Mon Sep 03, 2012 5:51 am UTC

tetsujin wrote:
tomCar wrote:@troyp
Lol, I'm actually moving away from graphical file managers altogether. I find they just don't offer me much of anything. Though I do miss the ability to copy/move things via ctrl-c+ctrl-v/d'n'd...


On a slightly wacky note, there was this thing I found a while back called (IIRC) "adventure shell" - a command shell that behaved kind of like a text adventure game...

Anyway, the whole thing struck me as kind of silly, except that it had this concept of an "inventory" - you could "pick up" and "drop" files. It's basically cut/paste of files, so the idea may be worth implementing in a less-silly kind of shell design. :)

I dunno, I'm totally willing to use a silly shell. Computing could use the smiles. (of course, the shell would still have to be easy to use.)

That said, I doubt that pick up/drop is quite as quick as what you can do on most graphical fms. Part of the problem is that the shell simply doesn't know what's in the current directory so it has no means of easily drilling down (aliases can be made for cd .., or popd to improve navigation in those directions.)

A cmdfm could solve these issues... providing me with a pure keyboard fm* while not losing the things I like about graphical fms.

*admittedly, I could probably achieve this with one of the "standard" fms; it would just take learning all of the shortcuts and changing my behavior. Though I still would have to open up a terminal to run commands...

User avatar
phlip
Restorer of Worlds
Posts: 7573
Joined: Sat Sep 23, 2006 3:56 am UTC
Location: Australia
Contact:

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby phlip » Mon Sep 03, 2012 6:05 am UTC

tetsujin wrote:On a slightly wacky note, there was this thing I found a while back called (IIRC) "adventure shell" - a command shell that behaved kind of like a text adventure game...

Code: Select all

$ cd /proc
You are in a maze of twisty little passages, all alike.

Code: Select all

enum ಠ_ಠ {°□°╰=1, °Д°╰, ಠ益ಠ╰};
void ┻━┻︵​╰(ಠ_ಠ ⚠) {exit((int)⚠);}
[he/him/his]

troyp
Posts: 557
Joined: Thu May 22, 2008 9:20 pm UTC
Location: Lismore, NSW

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby troyp » Mon Sep 03, 2012 7:28 am UTC

EvanED wrote:So the reason I said that was because I was under the impression that both screen and tmux would break both the actual terminal's scroll bar as well as the mouse wheel. This was my conclusion from several years ago, but it occurred to me that things may have changed or I might have tested it badly at the time, so I just gave it another attempt. From my quick test, both are true in the default install. Somehow the terminal figures out that the scroll bar is useless so it removes it. The mouse wheel also doesn't scroll -- however, it does bring up command history. This latter thing makes me hopeful that you could configure the scroll wheel to scroll. Assuming you can, that takes away much of my objection. I'll have to reevaluate my lack of use of those programs. :-)

Curses can certainly handle scrolling as well as mouse button input, so it's up to the program what to do with it. Usually the terminal's scrollbar is disabled because that's not part of the curses interface, so if you use it, you'll actually scroll the entire curses "screen" out of your viewing area (you can actually do this under certain terminal emulators/conditions, but I can't remember which). Some programs do let you scroll with the wheel, others move the cursor and scroll when it hits the top/bottom.

Whether scrolling works can depend on the particular terminal emulator, though. For instance less/man mouse scrolling works in gnome-terminal and konsole, but not in many lightweight terminals like urxvt and rox-term (mind you, I'm not sure whether less uses curses or is built from scratch).

This gives me an awesome idea... copy/cut/paste command-line programs. Currently if you want to move stuff from directory A to directory B you have to specify the path from A to B or B to A in the cp command. But imagine you have one terminal open in A and you just type copy * and then you have one terminal open in B and you type paste and it copies stuff over. I sort of view this as being like some of the difference between being able to gradually stage files in Git's index vs in Subversion just having to put everything in the one svn commit command.

If anyone writes this then let me know, because I now want it. :-) If not, I'll add it to my to-do list.

Edit: actually, you could even use "the" clipboard as the interchange between the commands -- just put the file names on the clipboard. With some caveats, this means that copy is even already written... just run echo * | xsel and have paste read from whichever clipboard that writes to! (In actuality I think you need something much more robust to spaces and other weird characters, so maybe you have to run some find command with -print0 or something.)

That is a good idea. I'm tempted to do this. The basic functionality would be very simple, especially for copy and paste. I'm thinking of using separate files /var/copy-clipboard and /var/cut-clipboard to implement copy and cut, with only one existing (or maybe being nonempty) at a time so paste has all the information it needs to complete the operation. Is that reasonable? Or would a more structured "clipboard file" be better (allowing more complex behaviour)?

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI
Contact:

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby EvanED » Mon Sep 03, 2012 5:48 pm UTC

troyp wrote:Whether scrolling works can depend on the particular terminal emulator, though. For instance less/man mouse scrolling works in gnome-terminal and konsole, but not in many lightweight terminals like urxvt and rox-term (mind you, I'm not sure whether less uses curses or is built from scratch).

Ah. I'll have to check out rox-term's behavior; I actually want to switch to using that. I'm also skeptical that the SSH client I use on Windows will behave properly; I am pretty sure that one is going to be of the "scroll the curses window off screen" variety.

That is a good idea. I'm tempted to do this. The basic functionality would be very simple, especially for copy and paste. I'm thinking of using separate files /var/copy-clipboard and /var/cut-clipboard to implement copy and cut, with only one existing (or maybe being nonempty) at a time so paste has all the information it needs to complete the operation. Is that reasonable? Or would a more structured "clipboard file" be better (allowing more complex behaviour)?

So despite my later caveat, I still like the idea of using the actual clipboard, because then you could paste the list of files to be copied into other programs or vice versa. (E.g. if you have a list of files you want to copy, you could just select that list, copy it to the clipboard using the normal means, then paste using the special command.) I'm not 100% sure that's the way to go because of a couple complications, but the idea is at least attractive. Presumably I'd go with line-separated file names and just forgo dealing with filenames containing newlines.

This would mean you'd actually have to switch the copy/cut logic around to the paste side, and make a paste --copy (default behavior) and paste --move.

Some other things to think about though: if you copy, edit the file, then paste, do you want the version before the edit or after? If you want it before, you'll have to copy or move it to a staging area, and my "list of files on the clipboard" idea won't work.
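
To make the clipboard variant concrete, here's a rough sketch - the function names, the use of the CLIPBOARD selection, and the flags are all placeholders, it assumes line-separated absolute paths (so filenames with newlines are out, per the above), and it ignores the staging-area question entirely:

Code: Select all

# "copy": put absolute paths on the X clipboard, one per line.
copy() {
    readlink -f -- "$@" | xsel --clipboard --input
}

# "paste_files": copy (default) or move the listed files into the current
# directory - the copy/move decision lives on the paste side, as described.
paste_files() {
    local cmd=(cp -a)
    [ "$1" = "--move" ] && cmd=(mv)
    xsel --clipboard --output | while IFS= read -r f; do
        "${cmd[@]}" -- "$f" .
    done
}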

troyp
Posts: 557
Joined: Thu May 22, 2008 9:20 pm UTC
Location: Lismore, NSW

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby troyp » Mon Sep 03, 2012 7:16 pm UTC

troyp wrote:(mind you, I'm not sure whether less uses curses or is built from scratch)

So I just remembered something. I'm pretty sure less doesn't use curses.
<off-topic>
Spoiler:
I know because sometimes when the scrollwheel doesn't work in less, it still works for the terminal window, same as sometimes happens with curses apps. But with less, if you scroll back, you don't see the curses screen move to reveal the session beforehand. Rather you see the entire history of contiguous sections of the document you've looked at with less. i.e. less just prints whatever it wants to show to the terminal screen (which I guess is the simplest way to do it). So if you scroll down, it prints normally, scrolling off screen, but as soon as you jump or scroll back, it reprints the screen, leaving anything above cut short. When you use a terminal with this behaviour, all those "leftovers" remain after you quit less (which can be annoying).


EvanED wrote:So despite my later caveat, I still like the idea of using the actual clipboard, because then you could paste the list of files to be copied into other programs or vice versa. (E.g. if you have a list of files you want to copy, you could just select that list, copy it to the clipboard using the normal means, then paste using the special command.) I'm not 100% sure that's the way to go because of a couple complications, but the idea is at least attractive.

I like that idea, although I occasionally might want to keep something in the X clipboard while I make a command-line administration detour. Really, I was thinking of terminal-only because I have no idea how the X clipboard (or X in general) works. I guess it's probably not that hard, though. I could probably implement it using xsel and/or xclip without having to deal with X directly --- and it turns out that xclip comes with programs xclip-{copy,cut,paste}file, which do basically this. Preliminary observations: (1) cut seems to delete the file before pasting; (2) not sure how to paste graphically (I think xclip uses the "middle-button" selection, so maybe they don't actually integrate). I guess if there's no way to use these with a fm, something could be done with xsel. That can use all 3 selections.

Presumably I'd go with line-separated file names and just forgo dealing with filenames containing newlines.

This would mean you'd actually have to switch the copy/cut logic around to the paste side, and make a paste --copy (default behavior) and paste --move.

Some other things to think about though: if you copy, edit the file, then paste, do you want the version before the edit or after? If you want it before, you'll have to copy or move it to a staging area, and my "list of files on the clipboard" idea won't work.

This was my first thought, except with a text file rather than X selection and with two clipboard files rather than two pastes (so paste could determine which was in use). You could still have newlines if they were quoted, though - just use the format the shell accepts. But then I thought, maybe I should use a more expressive format (since I'd inevitably run afoul of the limitations of a list of filenames). It's tempting to make the cut and copy commands actually generate the code that will execute the operation.
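
Something like this, maybe - very rough, and the clipboard file path and function names are just placeholders (this also sidesteps the X clipboard entirely):

Code: Select all

# "copy" writes one command per file, leaving the destination as a variable
# to be filled in at paste time; a "cut" would emit mv instead of cp.
copy() {
    for f in "$@"; do
        printf 'cp -a -- %q "$DEST"\n' "$(readlink -f -- "$f")"
    done > "$HOME/.file-clipboard"
}

# "paste_here" runs the generated commands with the current directory as
# the destination.
paste_here() {
    DEST=$PWD bash "$HOME/.file-clipboard"
}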

Anyway, you could check out the xclip commands. They're one implementation of this idea, although maybe not an ideal one. My biggest concern is that it looks like if something goes wrong in a cut op, you'll lose your file. Which is really not acceptable. I'm sure there are clipboard managers or whatever that will keep a backup of the selections on disk, which would remove the risk, but that seems like too much overhead.

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI
Contact:

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby EvanED » Mon Sep 03, 2012 7:26 pm UTC

troyp wrote:I like that idea, although I occasionally might want to keep something in the X clipboard while I make a command line administration detour. Really, I was thinking of terminal-only because I have no idea how the X clipboard (or X in general) works. I guess it's probably not that hard, though.

Oh, that's another drawback that I should have mentioned but forgot about: the clipboards need X to be available. So if you rely on them, they won't work when at a virtual terminal or over SSH without -X or -Y.

Anyway, you could check out the xclip commands. They're one implementation of this idea, although maybe not an ideal one. My biggest concern is that it looks like if something goes wrong in a cut op, you'll lose your file. Which is really not acceptable. I'm sure there are clipboard managers or whatever that will keep a backup of the selections on disk, which would remove the risk, but that seems like too much overhead.

I'll take a look.

troyp
Posts: 557
Joined: Thu May 22, 2008 9:20 pm UTC
Location: Lismore, NSW

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby troyp » Mon Sep 03, 2012 9:29 pm UTC

re cut/paste:
Spoiler:
I can't work out how cut/paste works in graphical fms. It seems to be inconsistent in how/whether it uses the clipboard. Trying a few at random:
* emelfm doesn't seem to use the clipboard
* In Dolphin, copy/cutting places "file:///path/to/file" in the clipboard. Pasting to a text application doesn't complete a cut op; "forging" the selection and pasting into dolphin doesn't cause a paste.
* In PCManFM, copy/cutting places "path/to/file" in the clipboard. Pasting to text app doesn't complete the cut; but forging a file name does allow pasting into the fm.

I'm not sure how easy integration is going to be. Most graphical fms allow cut/pasting and drag/drop between themselves, but it's not entirely consistent (sometimes it only seems to work within the given fm) and I'm not sure what mechanism(s) are being used for communication.

One thing worth noting is that once we have working "terminal only" commands, you could integrate them quite easily into specific fms using custom actions (which is not as nice as seamless cut/pasting, but still works). As long as the fm allows them *glares at Dolphin*.

re: xclip:
xclip can use the other selections, but xclip-cut et al only use the primary. I looked at the source for those scripts. They basically copy the contents of the file to the primary selection. They handle copying a whole tree by just tar-ing it first. This is why the cut works improperly. Also, cutting a directory just ends up copying because the rm fails (which could be fixed with rm -r or, preferably, rm -ri). I don't understand everything in the scripts (esp re xclip), but it probably wouldn't be too hard to semi-blindly modify them to work better.

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI
Contact:

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby EvanED » Mon Sep 03, 2012 9:45 pm UTC

troyp wrote:I can't work out how cut/paste works in graphical fms. It seems to be inconsistent in how/whether it uses the clipboard. Trying a few at random:

FWIW, Windows doesn't seem to use the clipboard either. (I thought it at least worked like Dolphin -- you could paste the file names into a program.)

xclip can use the other selections, but xclip-cut et al only use the primary. I looked at the source for those scripts. They basically copy the contents of the file to the primary selection. They handle copying a whole tree by just tar-ing it first. This is why the cut works improperly. Also, cutting a directory just ends up copying because the rm fails (which could be fixed with rm -r or, preferably, rm -ri). I don't understand everything in the scripts (esp re xclip), but it probably wouldn't be too hard to semi-blindly modify them to work better.

Ah. Assuming your description of how it works is correct (read the contents into the clipboard), I pretty thoroughly disagree with their entire approach. :-) I'd rather just start from scratch.

troyp
Posts: 557
Joined: Thu May 22, 2008 9:20 pm UTC
Location: Lismore, NSW

Re: My Unix CLI manifesto, aka why PowerShell is the bees kn

Postby troyp » Mon Sep 03, 2012 10:05 pm UTC

EvanED wrote:Ah. Assuming your description of how it works is correct (read the contents into the clipboard), I pretty thoroughly disagree with their entire approach. :-) I'd rather just start from scratch.

Yeah, I definitely don't think it's the "right way". (To be fair, of course, these are just short scripts meant to take advantage of xclip's capabilities.)

