A Simple Explanation of Monads

Here’s my shot at a simple explanation of what a monad is in the computer world. First, I have to explain what a monoid is.

A monoid is (roughly) a group of functions that take a parameter of a certain type and return a result of that type.

That’s useful because it allows easy “chaining” or “composition” of functions. Any fluent interface in Java is a monoid:

MyThing result = myThing.spin()
                        .treadle()
                        .flip()
                        .invert()
                        .aggregate();

Note that the input to each function is a MyThing, and the result is a MyThing.

A caution: strictly speaking, a monoid needs a few extra things. It needs an “identity” function that just returns its parameter, and a binary function that takes two items of the type and returns a new instance of the type. That binary function must be associative, so you can regroup the evaluation:

a.do(b).do(c) == a.do(b.do(c))
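
Here’s a small sketch of those pieces (just an illustration using java.util.function, not tied to any particular MyThing): String-to-String functions under composition form a monoid, with the identity function as the identity element.

import java.util.function.Function;

public class FunctionMonoid {
    public static void main(String[] args) {
        // the "elements" are functions of String -> String
        Function<String, String> trim = String::trim;
        Function<String, String> upper = String::toUpperCase;
        Function<String, String> exclaim = s -> s + "!";

        // the identity: a function that just returns its parameter
        Function<String, String> identity = Function.identity();

        // the binary operation is composition: combining two String -> String
        // functions yields another String -> String function
        String input = "  hello  ";

        // associativity: grouping the compositions differently gives the same result
        String left = trim.andThen(upper).andThen(exclaim).apply(input);
        String right = trim.andThen(upper.andThen(exclaim)).apply(input);
        assert left.equals(right); // both are "HELLO!"

        // identity law: composing with the identity changes nothing
        assert trim.andThen(identity).apply(input).equals(trim.apply(input));
    }
}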

A monad takes the idea of a monoid one step further: instead of mapping a class onto itself, it takes a container type that accepts any underlying type, and lifts that into a monoid space. So it’s a monoid using a container type:

A monad is (roughly) a group of functions that take a container type as an argument and return a result of the container type.

But note that the underlying type of the container may change. So you could map Container<String> to Container<Integer>. But the functions would still line up.

MyContainer<Integer> myContainer = new MyContainer<Integer>(1);
MyContainer<String> result = myContainer.<Integer>rehash()
                                        .fondue()
                                        .serializeToString()
                                        .prefix()
                                        .onomatopoeize();

So that you are able to “chain” or “compose” the functions and get a result of any arbitrary type! That’s a pretty strong claim for a statically-typed language like Java.

(Note that in Haskell, the signatures of the functions would be a little different. They would be functions of (Int, MyContainer) or (String, MyContainer), and it would be the job of the container to appropriately apply the function to its contents. But it’s hard to make a direct comparison from Java to Haskell, because Haskell works much more directly with function composition. In Haskell, the monad actually composes over the type constructor, rather than over instances of the type.)

But the core idea behind the monad is to create a monoid on a container type, in order to get the benefits of chaining and composition across different underlying types.
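
As a minimal sketch of what that machinery looks like in Java (Box and its methods are made-up names for illustration, not a standard class):

import java.util.function.Function;

// a toy container type with a flatMap/bind that lets the underlying type change
public class Box<T> {
    private final T value;

    private Box(T value) { this.value = value; }

    /** unit: lift a plain value into the container */
    public static <T> Box<T> of(T value) { return new Box<T>(value); }

    /** bind: apply a function that returns another Box, possibly of a different underlying type */
    public <R> Box<R> flatMap(Function<T, Box<R>> f) { return f.apply(value); }

    public T get() { return value; }
}

With that, Box.of(1).flatMap(n -> Box.of("n=" + n)) chains a Box<Integer> into a Box<String>, and the compiler keeps the types lined up the whole way.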

Filed under computer architecture, computer OOD, design patterns

Entropy and Learning to Play the Game

I’m speculating with this post — I would really love to hear your opinions and thoughts!

John Cook has an interesting blog post on calculating entropy: http://www.johndcook.com/blog/2013/08/17/calculating-entropy/

Entropy is a curious thing. There are different variations on it. There is the original thermodynamic version of entropy, which is a measure of the heat loss of a system as it does useful work. There is the statistical mechanics interpretation of entropy, which is the amount of information needed to express the state of a physical system. And there is the information theory (i.e. computing) version of entropy, which is the amount of information needed to specify the state of an information system.

There is disagreement on whether the different versions are fundamentally related to each other, but a common term keeps popping up: “n log(n)”

John Cook’s post looks at entropy based on a set of probabilities that add up to 1.0, where each probability comes from a count x_i out of a total of N = Σ x_i. The equation he comes up with is:

H = log2(N) - (1/N) · Σ x_i log2(x_i)

Now, if we pretend for a moment that all the probabilities are equally likely, then the right-hand term goes away, and we’re left with “log N”. That’s the upper bound on the amount of entropy. If we could pin the probabilities down, we could reduce that, but that’s good enough for the moment.

If the calculation is done with log base 2, then the result is the number of “bits” of entropy. That means representing the space with the familiar binary series of 0s and 1s. If you use the natural log, then the result is the number of “nats”. I’m not sure what representation that corresponds to. Maybe frequencies or wavelengths?
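
Just to spell out the arithmetic behind the “bits” and “nats” figures in the tables below, here’s a tiny sketch of the upper-bound calculation for a space of N equally likely outcomes:

public class EntropyBound {
    public static void main(String[] args) {
        double n = 255168; // e.g. the number of naive Tic-Tac-Toe games (see below)

        double bits = Math.log(n) / Math.log(2); // log base 2 -> bits
        double nats = Math.log(n);               // natural log -> nats

        System.out.printf("%.1f bits, %.1f nats%n", bits, nats); // 18.0 bits, 12.4 nats
    }
}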

Taking that, we can track the progress of Machine Learning as it applies to an area that people are interested in: games.

Let’s start with Tic Tac Toe.

Tic Tac Toe is a fun game if you’re young, but it doesn’t take long before we consider it to be “solved”. That is, two players who know what they’re doing will always play to a draw. Every time.

So let’s consider Tic Tac Toe as a Machine Learning problem:

  • If we look at the number of possible Tic-Tac-Toe games, we find about 255,000 games. That represents an entropy of 18 bits, or 12 nats. That means you could represent every Tic-Tac-Toe game with 3 bytes apiece if you really scrunched it down.
  • But we can reduce that even more by observing that the board is symmetric: there are 8 ways to rotate or reflect it. That means there are really only about 32,000 possible games. That’s 15 bits, or 10.4 nats.
  • However, my source tells me that of those, only 1145 are games that someone who knows what they’re doing will play. All the others result in a loss. So an expert Tic Tac Toe player — a player who will never lose — only has to remember 1145 games. That’s 10 bits, or 7 nats.

Clearly a computer can handle that much information. So at 7 nats, we consider the game to be “trivial” or “solved”. It’s not even interesting to play.

Next consider backgammon.

Backgammon introduces the roll of dice, so instead of games I put in the number of board positions. If a computer knew the value of each board position, then it could roll the dice, evaluate each possible move, and then pick the one that was most likely to result in a win.
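
Here’s a sketch of that evaluation loop, with a made-up Position class and made-up win probabilities just to show the shape of it:

import java.util.Arrays;
import java.util.List;

public class PickBestMove {
    /** a hypothetical board position with an estimated probability of winning from it */
    static class Position {
        final String description;
        final double winProbability;
        Position(String description, double winProbability) {
            this.description = description;
            this.winProbability = winProbability;
        }
    }

    public static void main(String[] args) {
        // positions reachable from the current board, given the dice that were rolled
        List<Position> candidates = Arrays.asList(
                new Position("hit the blot on the 5 point", 0.61),
                new Position("make the bar point", 0.58),
                new Position("run the back checkers", 0.47));

        // pick the move whose resulting position is most likely to win
        Position best = candidates.get(0);
        for (Position p : candidates) {
            if (p.winProbability > best.winProbability) {
                best = p;
            }
        }
        System.out.println("Best move: " + best.description);
    }
}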

  • If we play Backgammon naively, by just memorizing board positions, it is significantly more complicated, requiring 66 bits, or 46 nats to master.

But computer backgammon programs have all but solved backgammon. Except for the throw of the dice, they can beat expert players more often than not. So humans still consider backgammon a challenging game, but computer players can master the 46 nats required to come out ahead.

How about Chess?

There is the Shannon number, which is an estimate of the number of possible board positions. And, with some estimates of the number of possible moves and length of games, you can estimate the number of chess games.

  • The space of possible Chess games comes out to be 398 bits, or 276 nats.

The best humans have not beaten the best chess programs since 2006. So even at 276 nats, computers can master the space of chess games better than human intelligence.

Jeopardy, anyone?

What about a game show? Recently Watson competed on Jeopardy, and beat its human opponents. I’m not sure how many bits or nats Jeopardy represents, but it has to be pretty big. The contestants are given a study book that contains all the answers, so at least a computer player must be able to handle the entropy contained in that book.

The game of Life

So the entropy contained in a game progresses from “trivial” to “interesting” to “captivating”. Captivating in the sense that people will spend their whole lives mastering that game. And people will play on Jeopardy, hoping their ability to learn and remember will outpace their opponents. But in all these areas, machine learning has surpassed human performance.

What’s next? Clearly at some point, machine learning isn’t strong enough to master the entropy of the given space. Shakespeare wrote Hamlet, but writing a compelling tragedy is more than a computer can muster. Even a solid romance novel, for that matter. Or a pop song that climbs the charts.

But given how rapidly storage capacity and processing power are growing, how long will it be before a big processor *can* handle the nats required to write a sonnet?

At what point does an intelligence move from “trivial” to “interesting” to “captivating” to “self-deterministic”? That is, how many nats does a machine need to command before its thinking is so complex that we declare that it has free will?

Space                     Size        bits    nats
Tic Tac Toe – expert      1145        10.2    7.0
Tic Tac Toe – symmetric   31896       15.0    10.4
Tic Tac Toe – naive       255168      18.0    12.4
Backgammon – naive        1.10E+20    66.6    46.1
Chess – naive             1.00E+120   398.6   276.3
Jeopardy                  ?           ?       ?
Hamlet                    ?           ?       ?

Here’s a comparison with byte streams to see how much data you can pack into them.

Space               Size         bits    nats
1 byte              256          8.0     5.5
4 bytes – 32 bits   4294967296   32.0    22.2
8 bytes – 64 bits   1.84E+19     64.0    44.4
16 bytes            3.40E+38     128.0   88.7
32 bytes            1.16E+77     256.0   177.4
64 bytes            1.34E+154    512.0   354.9

Source of game numbers:

Tic Tac Toe: http://www.mathrec.org/old/2002jan/solutions.html
Backgammon: http://www.bkgm.com/articles/Levner/BruteForceBackgammon/
Chess: http://en.wikipedia.org/wiki/Shannon_number

Filed under machine learning, opinionizing

Wherefore Art Thou Architecture?

How should one define “architecture” as opposed to “engineering”?

I’ve always seen the seniority of an engineer as a question of how big a problem they can solve on their own.

Roughly, to convey the idea:

You can give someone new to programming small, contained tasks with lots of explicit instructions about how the task needs to integrate with other pieces.

A mid-level dev is someone who can take a description of some portion of an application, and make it work within the context of that application.

A senior dev can build a medium-sized application from scratch, within the technical constraints of a shop.

A more senior dev can do that, and make technology choices along the way about what technologies to use to make it work well.

…but those aren’t hard and fast rules. And some people come out the gate as “senior” IMHO, even if they have to spend some time with a different title.

What the architect is asking you to do is view the problem even more generally than that. If you had to put together a number of applications to make the system work:

  1. What applications and services will you need?
  2. What pieces interact with customers, and which interact with each other?
  3. How will they communicate?
  4. Where will they store data?
  5. Where are the risks of failures?
  6. How will you provide reliability?
  7. How will you provide security?

So, in a sense, technical architecture is like building architecture. It’s the layout, or the plan. It shows what’s needed for the various parts, how they hold together, and, just as importantly, why.

(BTW, I’ve had a similar growth curve explained to me for architects, ranging from architecting a family of related applications or a set of very complex features, to setting technical direction for a group, to making strategic technical decisions for an entire organization.)

That said, I think most engineers at all levels have to do some “architecting” as well. It’s not a bright line. It sounds like if you focus on the Big Picture first and don’t get hung up on the implementation details, you’ll be more in line with what he’s looking for. BTW, the ability to see the Big Picture as well as the Little Details is a huge asset in this industry, so this sounds like a great opportunity.

…Here’s an analogy. Let’s say you’re asked to create a Magic Black Box. As an engineer, you’re expected to obsess about the inner workings of the Magic Black Box. But as an architect, your focus is different. You might peek into the box out of curiosity, but you’re expected to obsess about everything around all the Magic Black Boxes.

Similar to that analogy, you might think about the architecture role as viewing the whole system as the magic box. So if you take a Gigantic Glass Box and put the customers, the client apps, the firewall, the service tier, the database, and even the devops guys inside, then as architect you’re expected to obsess about how to make that huge system box work well.

Filed under computer architecture, computer career, opinionizing

JSON the Magical Decoupling Tool

JSON/HTTP is a really good decoupling mechanism for communication between applications. The rapid industry adoption of JSON/HTTP interfaces really speaks well about how people view the usefulness of that model, and I’ll throw out a couple of suggestions that will make it even more loosely coupled.

Enforce a MUST IGNORE rule.

That is, when parsing the JSON (client or server), the app MUST IGNORE any fields it doesn’t recognize.

XML went in with the idea that the app MUST UNDERSTAND each field or else the document was invalid. But that created problems with versioning, because with almost any change, clients needed to upgrade every time the server did. Even adding an informational field broke the spec. With MUST IGNORE, the server can add new fields at any time, as long as it doesn’t remove or change the meaning of other fields (see below). Existing clients can just ignore the new fields. Rather, they MUST IGNORE the new fields.
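
For example, with the Jackson JSON library a client can opt into MUST IGNORE explicitly (the Widget class and its fields here are made up for illustration):

import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

public class LenientClient {
    /** a response class that only declares the fields this client cares about */
    public static class Widget {
        public String id;
        public String name;
    }

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        // MUST IGNORE: don't blow up when the server adds fields we don't know about
        mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);

        // the server has added "color" since this client was written; we just ignore it
        String json = "{\"id\":\"42\",\"name\":\"sprocket\",\"color\":\"blue\"}";
        Widget w = mapper.readValue(json, Widget.class);
        System.out.println(w.id + " " + w.name);
    }
}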

A search on MUST IGNORE and MUST UNDERSTAND will reveal lots of good articles that talk about that.

Minimize breaking changes.

A “breaking change” is a change that will break existing clients. That is, removing a field the clients use. Or changing the meaning of a field (e.g. changing an amount field from dollars to yen). That is, something that invalidates a client’s assumptions about the data it’s currently using.

With a breaking change, every client needs to make a change to support the new semantics or stop relying on missing fields. So don’t do that unless it’s necessary.

In the extreme you would never make a breaking change. That is, have full backward-compatibility for every release. That may or may not be realistic, and it may require carrying along baggage from early versions, but it will spare a lot of churn for the clients.

How many APIs?

Because of the ubiquity of JSON parsers, maintaining a single API regardless of client will make life a lot easier. Sometimes it doesn’t work out, because a client can only understand XML or some proprietary protocol, but starting simple and adding complexity only when it’s needed makes life easier.

Authorization

Just as a tangent, OAuth 2 is a really good bet for a well-thought-out, standardized security protocol for a JSON/HTTP API. You could sit down and design something simpler, depending on what compromises are OK. But OAuth is a good fleshed-out protocol that has undergone years of industry scrutiny, so they’ve had lots of time to work out the kinks. And standard libraries are readily available for both client and server. I used an OAuth plugin for Django on one project and it worked out really well.

Filed under computer architecture, OAuth, opinionizing, REST

Using Double Underscores in Python, and the Black Death

OK maybe that title is sensationalistic.

A lot of people erroneously use double underscores to simulate “private” members, because a double-underscore prefix invokes name mangling and makes those members harder to reference outside the class. However, it does not actually make them inaccessible.

Most of the time, it mainly adds a road bump to unit testing.

Really, the double-underscore mangling mechanism exists to hide those members from subclasses, so that a subclass doesn’t clobber the values inadvertently. Name mangling isn’t intended to hide the member from other programmers. The mangling scheme is simple, and referencing the variables anyway is easy.

Single underscore is the common convention for internal members. That’s saying, we’re all adults here, and although you can see it, this variable is intended for internal use. If you reference it, there’s no guarantee it will still be there in future versions.

The PEP 8 doc talks about that, and says use of double underscores for variables and functions should really be rare.

Filed under opinionizing, design patterns, python

Wherefore Monads?

I’ve been looking at and playing with monads in the last week or so, and I think I’m getting pretty close to understanding them. From what I’ve seen, here’s why it’s a valuable design pattern.

If you’re familiar with the GoF patterns, monads are like the Decorator pattern and Builder pattern put together, on steroids, bitten by a radioactive badger.

  • monads decorate some core type with additional properties without changing the core type. For example, a monad might “lift” String and add values like “isWellFormed”, “isProfanity” or “isPalindrome” etc.
  • similarly, monads allow conglomerating a simple type into a collection type
  • monads allow late binding of functions into this higher-order space
  • monads allow mixing arbitrary functions and arguments with an arbitrary data type, in the higher-order space
  • monads allow blending pure, stateless functions with an impure, stateful base, so you can keep track of where the trouble is

A familiar example of a monad in Java is List. It takes some core class, like String, and “lifts” it into the monad space of List, adding information about the list. Then it binds new functions into that space like get(), add(), isEmpty(), etc.

On a small scale, suppose there’s a code snippet you apply all over the place, like checking to see if a value is null. You can incorporate the null check into a little monad like Maybe — or better yet, return a Maybe that knows it’s null, avoiding null altogether. Then you can build or “lift” a value into the monad, and eliminate the repetitive code.
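
Java 8’s Optional is essentially that Maybe. A minimal sketch of the idea:

import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class MaybeExample {
    public static void main(String[] args) {
        Map<String, String> config = new HashMap<String, String>();
        config.put("name", "zoom");

        // lift the possibly-null lookup into Optional, then chain on it
        String present = Optional.ofNullable(config.get("name"))
                                 .map(String::toUpperCase)
                                 .orElse("MISSING");

        // a missing key never needs an explicit null check and never throws an NPE
        String missing = Optional.ofNullable(config.get("nope"))
                                 .map(String::toUpperCase)
                                 .orElse("MISSING");

        System.out.println(present + " " + missing); // prints "ZOOM MISSING"
    }
}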

On a large scale, imagine that instead of writing a program, you just wrote a big Builder (as the GoF pattern), and the build() method at the end spat out whatever answer the program was supposed to produce. And that you could add new methods to your ProgramBuilder without recompiling the original code. That’s why monads are a powerful design model.

Filed under utility, opinionizing, design patterns

Monadic Downcast

lol that’s kind of a cool title.

Downcasts are always a pain in the butt in an OO language. If you’ve designed your code correctly, you should never need to do a downcast. But sometimes you get stuck where you have to.

An upcast is the good kind of object cast, for example casting an ArrayList up to a List so you can handle it more generically. That’s a good practice in OO, something you should aim for.

A downcast goes the other way, for example saying, “I know this Object is really a BigInteger, so I’m going to cast it down the class hierarchy to a more specific type.” The problem is, the compiler can’t guarantee that an Object is a BigInteger, so you’re taking a chance that an Integer or String or whatever slipped in at runtime. You’re risking a Class Cast Exception.

The classic downcast situation is parsing a JSON structure into a nested Map hierarchy, and then pulling values out of it. Each Object in the map could be a String, a List, or another Map. Sometimes, rather than encapsulating the JSON structure as a class structure, it’s more convenient to work with it directly as a nested Map structure.

Anyway…

C# has a nice “as” operator that will perform a cast, and return null if the cast isn’t valid. I always really liked that operator. So, inspired by that and my latest investigation of Monads, I came up with this utility.

It’s a monadic downcast utility class. The idea is that you instantiate one with the class that you want to cast to, and then invoke that instance to perform the down-cast.

It’s monadic because:

  • It “lifts” any arbitrary class into a monadic casting space.
  • It also provides a bind operation that allows you to perform arbitrary functions on the data, without recompiling the original code.

I’ve taken shortcuts with convention to make the code shorter. Also, note the @SuppressWarnings that suppresses the compiler warning you about the downcast you’re making.

The usage is to instantiate an As object, and then use its of() method to perform the cast. The result has an “isA” property (here it’s just a public field, for brevity) that tells you whether the cast is possible, and a “present” property if the object is not null. The “get” property returns the object as the type that you want.

Example:

// use a prototype for a custom class
As<Thing> asThing = As.proto(Thing.class);
As<Thing> t = asThing.of(o);
if (t.isA) {
    t.get.doSomething();
}

// or use one of the convenience prototypes:
As<Map<String, Object>> a = As.MAP.of(source);
if (a.isA) {
    a.get.put("zoom", "pop");
}

// or do a type conversion
As<String> s = As.STRING.of("123");
assert s.isA;

As<Integer> i = s.bind(TO_INTEGER);
assert i.isA;
assert i.get == 123;

// parse int, returning As.no() when the string isn't a number
static final As.Function<String, As<Integer>> TO_INTEGER = new As.Function<String, As<Integer>>() {
    public As<Integer> apply(String b) {
        try {
            return As.INTEGER.of(Integer.parseInt(b));
        }
        catch (NumberFormatException e) {
            return As.no();
        }
    }
};

And here’s the implementation.

import java.util.List;
import java.util.Map;

public class As<A> {

    // info we're tracking
    public final A get;
    public final boolean isA;
    public final Class<?> clazz;

    // some utility prototypes
    public static final As<Map<String, Object>> MAP = As.proto(Map.class);
    public static final As<List<Object>> LIST = As.proto(List.class);
    public static final As<String> STRING = As.proto(String.class);
    public static final As<Integer> INTEGER = As.proto(Integer.class);

    /** create a prototype */
    public static <A> As<A> proto(Class<?> clazz) {
        return new As<A>(clazz, null, false);
    }

    /** an As of a certain type that holds nothing */
    public static <X> As<X> no() {
        return new As<X>(null, null, false);
    }

    /** generic function interface -- note no monadic restrictions here */
    public interface Function<A, B> {
        B apply(A a);
    }

    /** internal constructor */
    As(Class<?> clazz, A get, boolean isA) {
        this.get = get;
        this.isA = isA;
        this.clazz = clazz;
    }

    /** of === unit ... lift a value into monadic space */
    @SuppressWarnings("unchecked")
    public As<A> of(Object a) {
        if (clazz == null) {
            return no();
        }

        boolean is = clazz.isInstance(a);
        As<A> that = new As<A>(clazz, is ? (A)a : null, is);
        return that;
    }

    /** bind :: m a -> (a -> m b) -> m b */
    public <B> As<B> bind(Function<A, As<B>> f) {
        if (! isA) {
            // I'd like to return a prototype, but can't because of type erasure
            return As.no();
        }

        return f.apply(get);
    }

    /** a nice to string */
    public String toString() {
        StringBuilder builder = new StringBuilder();
        builder.append("{As");

        builder.append(" " + get);
        builder.append(" isA=");
        builder.append(isA);

        builder.append(" }");
        return builder.toString();
    }
}

And here’s a sample, using the classic case of traversing a generic map hierarchy to pull out data.

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Sample0 {

    public static void main(String[] args) {
        Map<String, Object> source = sample();

        assert !As.MAP.of(null).isA;

        assert !As.MAP.of("booya").isA;

        As<Map<String, Object>> c = As.MAP.of(source.get("ma"));
        if (c.isA) {
            c.get.put("po", "op");
            System.err.println(c);

            As<String> d = As.STRING.of(c.get.get("li"));
            assert !d.isA;

            As<String> e = As.STRING.of(c.get.get("str"));
            if (e.isA) {
                System.err.println("String: " + e.get.substring(0, 1));
            }

            As<List<Object>> f = As.LIST.of(c.get.get("li"));
            if (f.isA) {
                System.err.println("List: " + f.get.get(0));
            }
        }
    }

    private static Map<String, Object> sample() {
        Map<String, Object> source = new HashMap<String, Object>();
        Map<String, Object> hoppop = new HashMap<String, Object>();
        List<String> zing = Arrays.asList("1", "2", "3");
        String booya = "booya";
        String zoopup = "zoopup";

        source.put("str", booya);
        source.put("li", zing);
        source.put("ma", hoppop);
        hoppop.put("str", zoopup);
        hoppop.put("li", zing);
        return source;
    }
}

BTW, here is a demonstration that the implementation meets the monadic rules (the three monad laws). Maybe I’ll post more about that later.

public class Rules {

    /** int to string */
    static final As.Function<Integer, As<String>> f = new As.Function<Integer, As<String>>() {
        public As<String> apply(Integer b) {
            return As.STRING.of("" + b);
        }
    };

    /** string to int */
    static final As.Function<String, As<Integer>> g = new As.Function<String, As<Integer>>() {
        public As<Integer> apply(String b) {
            return As.INTEGER.of(Integer.parseInt(b));
        }
    };

    /** unit int */
    static final As.Function<Integer, As<Integer>> u = new As.Function<Integer, As<Integer>>() {
        public As<Integer> apply(Integer b) {
            return As.INTEGER.of(b);
        }
    };

    /** the composed function for the associativity test: \x -> f x >>= g */
    static final As.Function<Integer, As<Integer>> r = new As.Function<Integer, As<Integer>>() {
        public As<Integer> apply(Integer b) {
            return f.apply(b).bind(g);
        }
    };

    /** test the monadic rules */
    public static void main(String[] args) {

        As<Integer> m = As.INTEGER.of(1);

        leftIdentity(m);
        rightIdentity(m);
        associativity(m);
    }

    /** Left identity: return a >>= f ≡ f a */
    private static void leftIdentity(As<Integer> m) {
        As<String> left = m.bind(f);
        As<String> right = f.apply(1);
        System.err.println(left + " " + right);
        assert left.toString().equals(right.toString());
    }

    /** Right identity: m >>= return ≡ m */
    private static void rightIdentity(As<Integer> m) {
        As<Integer> left = m.bind(u);
        As<Integer> right = m;
        System.err.println(left + " " + right);
        assert left.toString().equals(right.toString());
    }

    /** Associativity: (m >>= f) >>= g ≡ m >>= (\x -> f x >>= g) */
    private static void associativity(As<Integer> m) {
        As<Integer> left = m.bind(f).bind(g);
        As<Integer> right = m.bind(r);
        System.err.println(left + " " + right);
        assert left.toString().equals(right.toString());
    }

}

Filed under utility, design patterns, computer algorithms, computer OOD