Free form operators for ruby

!!! -> <- -- +++ -=- */* !! ~~~ <=..< ....

Have you ever wanted to define your own operators in ruby?
Have you ever wanted to play around with experimental hash or range semantics, but wanted a concise syntax for the constructors?
Are you crazy enough to install a compiler patch from someone WHO OPENLY ADMITS TO NOT KNOWING C? (For those of you who need everything quantified, that's a metric craziness index of about 0.6 Why.)

If you answered yes to all of the above, have I got a patch for you.

If you can compile from source, you can download the beta and try it yourself!

Update July 2008

For some reason I volunteered to talk about this patch at FOSCON 08. The slides are available in either Open Office format or PDF.

Known bugs in this version:

NONE! (If you find one, please let me know!)

Change history:

I've fixed the one bug I found in the pre-alpha version.
"goo"+%w{ a b c }.join(',') should warn on '+%' (and interpolate a space)

FIXED in 0.2
+= and -= give a parse error under some circumstances. This was introduced in the late-night fix to the alpha bug.

FIXED in ver 0.3
proc {|x,*| ...} appears to be a valid way of accepting multiple parameters and throwing away all but the first. It fails to parse with the patch (which I'm tempted to count as a bug fix).

FIXED in ver 0.3
"bool_exp &&! bool_exp" does not warn (it is, after all, the recommended way of spacing-delimiting the '&&!' operator). Unfortunately, this form is used in lib/scanf.rb, to mean what I would write as "bool_exp && !bool_exp", along with '||!'
I think the proper fix is to include "def ||!(other); self || !other; end" etc. as part of the patch.
But I'm going to sleep on it.

After sleeping on it, I decided to change the logic to parse it as it had been before but generate a warning (in the spirit of the annoying but helpful ones about using () for future versions).

FIXED in ver 0.3
"x =<<-EOL" to introduce a here-doc (e.g. in rexml/encodings/UTF-8.rb)

FIXED in ver 0.3
Exceptionally odd behavior with "x ? y : z" in DOS format files when the '?' or ':' is at the end of the line (so directly followed by \r)

FIXED in ver 0.3
"x ||=/regexp/"

BELIEVED FIXED in ver 0.3
Also in ver 0.3: much improved & more helpful warnings when potential operators are split for backwards compatibility. It now passes 'make test' with appropriate warnings but no failures.
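
The constructs listed in this change history double as a small regression checklist. What follows is only a sketch of such a checklist, not part of the patch: it exercises each construct on a stock (unpatched) interpreter, where all of them already parse. The variable names are illustrative. The stated goal is that a patched interpreter produces the same values, possibly with warnings.

```ruby
# Constructs that tripped earlier versions of the patch. All of them run on
# unpatched Ruby; the variable names are illustrative only.

# '+%' : binary plus followed by a %w literal
joined = "goo"+%w{ a b c }.join(',')

# '+=-' : must still be read as '+=' followed by a negative number
count = 5
count+=-1

# '&&!' : stock Ruby reads this as  a && !b
a = true
b = false
and_not = a &&! b

# '|x,*|' : take the first block argument and discard the rest
first_only = proc { |x, *| x }.call(1, 2, 3)

# '=<<-' : assignment immediately followed by a here-doc
text =<<-EOL
  hello
  EOL

# '||=/' : conditional assignment immediately followed by a regexp literal
pattern = nil
pattern ||=/ab+c/

p joined, count, and_not, first_only, text, pattern
```

Running the same file under both interpreters and comparing the output is a quick way to spot a compatibility regression.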

Thanks everyone, for all the support & constructive criticism (and for the almost complete absence of snide remarks about this sloppy web page).
And thanks especially for the bug reports!

Let me know if you try it out & what you think. Bug reports are especially welcome. E-mail me at "#{me.first_name}@#{me.domain}" unless you are a bot with low low mortgage rates or pictures of people I don't know testing out the results of their online drug purchases.

It's getting better as we pound on it. I'd like anyone who's interested to test it out. Specifically, I'm interested in finding any:

      * Bugs/crashes--I'm aiming for none, of course

      * Incompatibility with existing code--I'd like this to break absolutely nothing. Anything that worked in unpatched ruby should work the same way with the patch

      * Shortfalls in addressing the extensibility goals (i.e., does this just get us to the next roadblock, or would it be possible to implement something like Neo_range or Uber_hash with all of the properties of the built-in classes?). I suspect that we're not there yet, but I'm not sure where the next hurdle is. Compile-time constructors, probably, but what else?

      * Timing/speed issues--in theory my code should be slightly slower than the original, but I'm not seeing it. It could be that C programmers just habitually write for speed rather than clarity, but it is at least as likely that I'm missing something.

Update: I randomly generated a 51000 line program and the patch added 0.04 seconds to the compile time (15.300 seconds vs. 15.340 seconds). I would call this inconsequential.
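
A measurement along these lines could be repeated with something like the sketch below. It is not the generator used for the figures above (that one isn't shown here): the statement shape, the seed and the helper name random_program are all assumptions, and RubyVM::InstructionSequence is specific to the C implementation of Ruby 1.9 and later.

```ruby
require "benchmark"

# Build a synthetic program of `line_count` simple assignments.
# The statement shape is an arbitrary choice for illustration.
def random_program(line_count, rng = Random.new(42))
  operators = %w[+ - *]
  lines = Array.new(line_count) do |i|
    "v#{i} = #{rng.rand(1000)} #{operators.sample(random: rng)} #{rng.rand(1000)}"
  end
  lines.join("\n")
end

source = random_program(1_000)

# Time parsing and compilation only; the generated program is never executed.
seconds = Benchmark.realtime do
  RubyVM::InstructionSequence.compile(source)
end

puts format("compiled %d lines in %.4f s", source.lines.size, seconds)
```

Running the same script under the patched and unpatched binaries, several times each, gives a comparison of the kind quoted above.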

-----------------

I'm modifying parse.y to extend the idea of tOP_ASGN (+=, -=, etc.) so that all combinations of operator characters become user-redefinable methods, as <=> is at present.
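
The general idea can be illustrated with a toy lexer. This is a sketch in Ruby, not the actual change to parse.y: every run of operator characters becomes one generic token that carries the operator text as its value, so a grammar needs a single rule rather than one per operator. The token names :tOP, :tINTEGER and :tIDENTIFIER and the exact character set are assumptions made for the example.

```ruby
require "strscan"

# Toy lexer: each run of operator characters becomes one generic :tOP token
# whose value is the operator text. Token names are made up for this sketch.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      next
    elsif (text = scanner.scan(/\d+/))
      tokens << [:tINTEGER, text]
    elsif (text = scanner.scan(/[A-Za-z_]\w*/))
      tokens << [:tIDENTIFIER, text]
    elsif (text = scanner.scan(/[+\-*\/=<>.&^%!~|]+/))
      tokens << [:tOP, text]
    else
      raise ArgumentError, "unexpected character at offset #{scanner.pos}"
    end
  end
  tokens
end

p tokenize("1-->5")   # [[:tINTEGER, "1"], [:tOP, "-->"], [:tINTEGER, "5"]]
p tokenize("x+=-1")   # [[:tIDENTIFIER, "x"], [:tOP, "+=-"], [:tINTEGER, "1"]]
```

The second call shows why a greedy scan alone is not enough: existing code expects '+=' followed by '-1', so a backwards-compatible lexer has to split such runs, which this toy version does not attempt.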

As mentioned previously, I've been working on a patch that would let you write things like:

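# Requires the patched interpreter; unpatched Ruby rejects 'def -->'.
class Pair
    attr_accessor :l, :r
    def initialize(l, r)
        @l, @r = l, r
    end
end

class Object
    # Any otherwise-unused run of operator characters can be a method name.
    def -->(other)
        Pair.new(self, other)
    end
end

print 1-->5, "\n"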

This is the beta release of that patch. Basically, any sequence of "operator characters" that isn't otherwise used is now user definable, though you are warned to use spaces where this would be ambiguous (e.g. x+=-1).
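
For comparison, unpatched Ruby only lets you define methods for its fixed set of operator tokens. A rough stock-Ruby approximation of the example, borrowing the existing '>>' token (an arbitrary choice for illustration) and rebuilding Pair as a Struct to keep it short, might look like this:

```ruby
# Stock-Ruby approximation of the '-->' example, borrowing the built-in '>>'
# token. Runs without the patch.
Pair = Struct.new(:l, :r)

class Object
  # Fallback for any object whose class does not define its own '>>'.
  def >>(other)
    Pair.new(self, other)
  end
end

pair = "one" >> 5
puts pair.inspect

# Integer already defines '>>' (bit shift), so the built-in meaning wins there.
shifted = 1 >> 5
puts shifted   # 0
```

The limitation is visible in the last lines: reusing an existing token means competing with whatever that token already means for built-in classes, which is the problem a fresh operator such as '-->' avoids.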

In this version, the operators are always binary, non-associative, and mid-precedence. I can see how to let the users set the precedence, but cannot figure out what "scope" the precedence declarations should have. I can NOT see how it could depend on the class of the recipient, which would be in some ways ideal and in others hideous.
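
The difficulty with class-dependent precedence is that the parser groups an expression before any receiver object exists. Stock Ruby already behaves this way for its built-in operators, as the small sketch below shows. Expr is a hypothetical class that just records how it was combined.

```ruby
# Precedence belongs to the operator token, not to the receiver's class:
# the parser groups the expression before any method is looked up.
class Expr
  attr_reader :text

  def initialize(text)
    @text = text
  end

  def +(other)
    Expr.new("(#{text} + #{other.text})")
  end

  def *(other)
    Expr.new("(#{text} * #{other.text})")
  end
end

a, b, c = %w[a b c].map { |name| Expr.new(name) }

sum_first     = (a + b * c).text
product_first = (a * b + c).text

puts sum_first       # (a + (b * c))
puts product_first   # ((a * b) + c)
```

Even though Expr defines both methods itself, '*' still binds tighter than '+', because that was decided when the source was parsed.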

Related posts from ruby-talk

> Bill Guindon wrote:
> |Personally, I'd like it to remember the order you added things in, use
> |that ordering when iterating through the hash, and _optionally_ use it
> |for compares.

> Matz wrote:
>
> Hmm, I thought hash order would be ignored always for compares.

I would think it would have to be, if you don't want to break the semantics
of Hash.

> Matz wrote:
>
> Optional use of order is an interesting idea, although I want better
> looking API than
>
> | {1=>2, 3=>4}.ordered != {3=>4, 1=>2}.ordered

If you make inspect (for example) respect the insertion order then you
have

    {1=>2, 3=>4}.inspect != {3=>4, 1=>2}.inspect

which is at least a little less obtrusive.

It may, however, be better to have an additional class (OrderedHash, for
want of a much needed better name), in which order was respected for
both comparison and iteration.

My main concern is that just making regular hashes respect insertion
order for iteration but not comparison isn't exactly "least surprise";
so far as I know, the iteration order of hashes presently is arbitrary
but not random.  In other words, for any two collections (arrays,
hashes, whatever) it is presently true that:

    a == b

implies

    a.collect { |x| x } == b.collect { |x| x }

but this would no longer be true.

Of course, this line of argument would be weakened if the implication
isn't true at present for hashes.
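
For reference, Ruby 1.9 and later did make Hash iterate in insertion order while == continues to ignore order, so the implication can be checked directly on a current interpreter (this describes later releases, not the interpreter under discussion in this thread):

```ruby
a = { 1 => 2, 3 => 4 }
b = { 3 => 4, 1 => 2 }

equal          = (a == b)
same_iteration = a.collect { |x| x } == b.collect { |x| x }

puts "a == b:         #{equal}"            # true
puts "same iteration: #{same_iteration}"   # false
```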

My preferred answer is to make => an operator that takes two objects and
returns a hash in any context (assuming of course that the precedence
issues this raises could be resolved).  That would let people define
their own extension to hash that worked however they wanted.

Or (even better in my book) make anything that matches

      /[:+\-=<>|*&^%?~!._]{2,6}/

or some such available as an operator (and thus, the vast majority of
them would be up for grabs for tasks such as this).  Precedence &
associativity could be assigned by some convention (all equal, longer
things lower, or...?) or (though I'm not sure how it could be
implemented) user defined.

It would require a trick sort of like what is used for tOP_ASGN to keep
the parser from exploding (over 8000000 new operators?  Yikes!) but I
think it might work.
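
The size of that operator space is easy to check. The sketch below reads the character class as its sixteen literal characters (with the hyphen escaped so it is not taken as a range) and counts every string of length 2 through 6. The constant names are illustrative.

```ruby
# The sixteen characters from the character class above, read literally.
OPERATOR_CHARS = %w[: + - = < > | * & ^ % ? ~ ! . _].freeze

# The same class as a regexp, with the hyphen escaped and anchors added so
# that the whole string must be an operator.
OPERATOR = /\A[:+\-=<>|*&^%?~!._]{2,6}\z/

# Number of distinct strings of length 2 through 6 over those characters.
candidates = (2..6).sum { |length| OPERATOR_CHARS.size**length }

puts candidates                # 17895680
puts OPERATOR.match?("-->")    # true
puts OPERATOR.match?("+")      # false (too short)
```

That is 17,895,680 candidates before removing the handful Ruby already uses, comfortably "over 8000000" and far too many to list as individual grammar tokens.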

To Hal: would a notation like

     ordered_hash(1-->2,3-->4,...)

or

     1-->2 | 3-->4 | ....

suit you (the latter assuming the precedence could be controlled or the
conventions were fortuitous)?
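
Without the patch, the first notation can be approximated by building the pairs with an ordinary constructor. The sketch below is hypothetical throughout (Pair, OrderedHash and ordered_hash are illustrative names, and lookup is a linear scan for brevity). It shows a container in which order is respected for both iteration and comparison.

```ruby
Pair = Struct.new(:key, :value)

# Hypothetical order-respecting hash: iteration and comparison both follow
# insertion order. Lookup is a linear scan to keep the sketch short.
class OrderedHash
  include Enumerable

  def initialize(pairs)
    @pairs = pairs
  end

  def [](key)
    pair = @pairs.find { |candidate| candidate.key == key }
    pair && pair.value
  end

  def each(&block)
    @pairs.each(&block)
  end

  def ==(other)
    other.is_a?(OrderedHash) && to_a == other.to_a
  end
end

def ordered_hash(*pairs)
  OrderedHash.new(pairs)
end

a = ordered_hash(Pair.new(1, 2), Pair.new(3, 4))
b = ordered_hash(Pair.new(3, 4), Pair.new(1, 2))

puts a[3]                   # 4
puts a == b                 # false: same pairs, different order
puts a.map(&:key).inspect   # [1, 3]
```

With a user-definable '-->' returning a Pair, the call would read ordered_hash(1-->2, 3-->4) as proposed above.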

The ideas I'm (slowly) playing with for ranges:

     Extend the Range so that either or both ends can be
         inclusive, exclusive or unbounded (i.e., open, closed, or
         infinite)

     Define construction operators '<..<', '<=..<', '<..<=' and '<=..<='
     Likewise '<.._', '<=.._', '_..<=' and '_..<'
         (the last two being unary prefixes)
     Keep '..' as an alias for '<=..<='
     Keep '...' as an alias for '<=..<'
     Define construction operator '..+' for the start/length-1 case
     Define construction operator '..<+' for the start/length case

     Add Range#by(step)

     Defining a related class for "disordered" ranges like "2..-1" which
          are handy but semantically disjoint from pure ranges. I'm
          thinking something that water would roll off the back of in
          a duck typing world, but that would raise reasonable error
          messages in preference to producing unexpected behaviour.
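
The first item in this list can be sketched in stock Ruby without any new operators. Interval and its keyword arguments are hypothetical names chosen for the example, with nil standing for an unbounded end. The comments show which proposed constructor each call would correspond to.

```ruby
# Hypothetical interval whose ends can each be closed, open, or unbounded
# (nil). Names and keyword arguments are illustrative, not from the patch.
class Interval
  def initialize(low, high, low_closed: true, high_closed: true)
    @low = low
    @high = high
    @low_closed = low_closed
    @high_closed = high_closed
  end

  def include?(value)
    above_low  = @low.nil?  || (@low_closed  ? value >= @low  : value > @low)
    below_high = @high.nil? || (@high_closed ? value <= @high : value < @high)
    above_low && below_high
  end
end

open_both = Interval.new(1, 5, low_closed: false, high_closed: false)  # 1 <..< 5
half_open = Interval.new(1, 5, high_closed: false)                     # 1 <=..< 5
unbounded = Interval.new(1, nil, low_closed: false)                    # 1 <.._

puts open_both.include?(1)        # false
puts half_open.include?(1)        # true
puts unbounded.include?(10**6)    # true

# Stock Ruby's Range#step covers much of what Range#by(step) would offer.
puts (1..10).step(3).to_a.inspect   # [1, 4, 7, 10]
```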

Typically, the versions of ruby I produce in these experiments are
killed by angry villages before they can show their essential
kindheartedness. But I still hope.

> On Sep 26, 2004, at 11:54 AM, Markus wrote:
>
> >      Perhaps. I keep feeling that there is an "Ah ha!" lurking in
> > here somewhere--if we just look at things on the right way, we could
> > (for 2.0) get nearly full backward compatibility, cleaner semantics,
> > and nice route for expanded expressiveness. I'll post more if the
> > idea still seems reasonable after I think on it for a day or so...
>
> If Association is a subclass of Array or Values, then it should be
> possible to splat it. Here's an extremely bare-bones version
> demonstrating the possible behavior:

     This looks to be very much along the lines I have been thinking,
except that I am totally ignorant of the class Values. Can you give me
a little background?

> It's a neat idea, but I would wonder how practical it would be; I get
> the feeling our nice fast hash lookups would be ruined... But since I
> don't know the internals...

     I'm thinking it could be done in a way that was almost pure sugar.
The key would be finding a way to decompose the current syntax into
cogent chunks that, when combined in the usual ways would have the usual
meanings, but could also be meaningfully combined in _new_ ways.

     For example (WARNING: this should be a 2.0-at-the-earliest change
and in any case I'm still in the process of thinking it out):

      * Define a class Association < Array with the methods key & value
        (perhaps as synonyms for first and last), and Association#hash
        returns self.key.hash.
      * Open up the set of user-definable operators to include anything
        that matches /[+\-*\/=<>.&^%!~]+/ or what have you.
      * Make Object#=>(other) return Association.new(self,other)
      * Make a hash work on anything that responds to "hash" (in other
        words, anything) by storing the object under its hash.
      * Make { v1,v2,v3...} build a hash, analogous to the way in which
        [ v1,v2,v3...] builds an array. Note:
              * This doesn't depend on the v's being constructed by =>,
                but if they are the semantics would be the same as
                always.
              * The semantics of { k1,v1, k2,v2, k3,v3,... } would
                change; it would produce something more like a set, with
                the k's & v's as elements.
              * The semantics of [ k1=>v1, k2=>v2...] would change;
                instead of producing an array containing a single hash,
                it would produce an (ordered) array of associations.
              * The semantics of implicit hash arguments might change
              * The semantics of Hash#to_a perhaps ought to be changed
      * Let people implement their own functionality with these building
        blocks (e.g., all the flavours of "ordered" hashes discussed a
        few weeks ago should be trivial).
      * Deprecate some of the functions that have been added to array to
        support sets, associations, etc. and could now be better
        implemented with these tools.
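
     The first bullet is the only one that can be tried on an unpatched interpreter. A minimal sketch, assuming the key/value accessors are simple wrappers around first and last, could be:

```ruby
# Sketch of the first bullet: a two-element Array that hashes by its key.
class Association < Array
  def initialize(key, value)
    super([key, value])
  end

  def key
    first
  end

  def value
    last
  end

  # Hash by the key alone, as proposed above.
  def hash
    key.hash
  end
end

assoc = Association.new(:colour, "red")

# Because it is an Array, it can be splatted into two variables.
k, v = *assoc

puts assoc.key.inspect              # :colour
puts assoc.value                    # red
puts assoc.hash == :colour.hash     # true
```

     This does not change how the stock Hash class stores such an object. The remaining bullets (a definable =>, the new meaning of { v1,v2,v3... }) need parser support and are not attempted here.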

     I'd call this a taste of my grand vision if I was more convinced
that it wasn't a whiff of my temporary hallucination. I'm in the
process of trying to puzzle out: 1) what does it break, 2) what would
break it (e.g. cause it to exhibit counter-intuitive behavior), and 3)
could it in fact be implemented.
     Any thoughts/comments/questions/criticisms are welcome.

> > There has been some discussion going on whether Ruby's Hash should be
> > sorted by default on this list recently.
> >
>
> I would vote in favor of it.

     I'd vote against, for the following reasons.

     1. Every time it comes up the thread goes quite a while before the
        advocates realize that they aren't all assuming the same sorting
        order. Some are thinking "sorted by key (obviously)" while
        others are thinking "sorted by order of insertion (obviously)"
        and still others are thinking "sorted by value" or "sorted by
        some arbitrary key so long as it's consistent," etc.
     2. Many of the proposals are not particularly well defined when you
        consider issues like modification and alternate means of
        construction.
     3. Hash, like String, Array, etc. is a fairly well established data
        structure. Although it might for some uses be nice to have a
        string where the characters were "automatically" sorted into
        alphabetical order, this isn't what anyone familiar with strings
        would expect.
     4. It may well impose a significant performance penalty
     5. It may well break existing code
     6. It is easy enough to produce the desired effect(s) by other
        means.
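
     Points 1 and 6 can be seen side by side in a few lines of stock Ruby. The data is made up for the example. Each "obvious" ordering gives a different answer, and each is a one-liner:

```ruby
ages = { "carol" => 35, "alice" => 41, "bob" => 29 }

by_key       = ages.sort_by { |name, _age| name }.map(&:first)
by_value     = ages.sort_by { |_name, age| age }.map(&:first)
by_insertion = ages.keys   # insertion order on Ruby 1.9 and later

puts by_key.inspect         # ["alice", "bob", "carol"]
puts by_value.inspect       # ["bob", "carol", "alice"]
puts by_insertion.inspect   # ["carol", "alice", "bob"]
```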

     Instead, I support the extension of => to act as a general operator
in all contexts or (much better, IMHO) the addition of many more user
definable/overridable operators that would let people add new classes
(with sweet syntax) that worked the way they wanted (see
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/111627 for
more details on this idea). If any of these became popular, it could
climb its way into the language after being field tested.

On Thu, 2004-09-23 at 05:04, Florian Gross wrote:

> Yup, you can use this as a case that checks all options:
>
> def greedy_case(obj, cases)
>    cases.find_all do |condition, action|
>      condition === obj
>    end.inject(Hash.new) do |hash, (condition, action)|
>      hash[condition] = action.call; hash
>    end
> end
>
> def test(x)
>    greedy_case(x,
>      1 .. 2 => lambda { puts "In A" },
>      1 .. 3 => lambda { puts "In B" },
>      2 .. 4 => lambda { puts "In C" })
> end
>
> test(2)

Nice. That's the best example of writing your own control structure
I've seen in quite a while. *smile* And it even has the
non-deterministic aspect that Dijkstra was so fond of.