Sifting in Scala

Here’s a fun little class that allows “shifting” a data type, breaking it up into pieces that are included and excluding by some “sifting” criteria.


/**
* companion object has factory methods for ok an not ok results
*/
object InclExcl {

/**
* start with one that's good and none bad
*/
def ok[T](a: T): InclExcl[T] = InclExcl(Some(a), Set())

/**
* end with no good one and one that's bad
*/
def notOk[T](b: T): InclExcl[T] = InclExcl(None, Set(b))
}

/**
* holds an object that's "included" by some critera, and a set of objects
* we've "excluded"
*
*

allows functions to filter and partition what's in and what's out
*
* @param in the date range included
* @param out the date ranges excluded
*/
final case class InclExcl[T](in: Option[T], out: Set[T]) {

/**
* sift or refine into in/out using the supplied function
*
*

The function may split the included object and return a set of
* excluded objects, which we aggregate with everything else we've excluded
*
* @param f function to apply
* @return a new incl excl split by the filter function
*/
def sift(f: T => InclExcl[T]): InclExcl[T] =
in match {
case None =>
// if nothing to split, done!
this
case Some(a) =>
// split the thing
val b = f(a)
InclExcl(b.in, b.out ++ out)
}
}

For example, the Sieve of Eratosthenes could look like


range(0, Math.sqrt(1000) + 1).foreach { n =>
InclExcl.ok(range(0, 1000)).sift(isDivisibleBy(n))
}

.. just add “isDivisibleBy” ūüėČ

This will sift the “included” set by whether or not it’s divisible by the next number, so when it’s done, “incl” will include all the primes and “excl” will contain all the non-primes.

(Note: code example is wrong and riddled with syntax errors)

Advertisements

Leave a comment

Filed under Uncategorized

Can There be Monads in Java?

I put together a presentation on creating monads in Java.

https://docs.google.com/presentation/d/1-eVPi6lsgnHrEj6G91DgbXRkSZknCJ4evESdYccmNQE/edit?usp=sharing

Some highlights:

+ Monad: Container that allows chaining operations, preserving semantics

+ Normally a function maps from one type to another, but the semantics may change. With a monad, operations have the same result type, so multiple operations preserve semantics.

+ Safer way to enforce semantics than as conventions, and relying on out-of-channel means such as using a lead pipe.

+ The monad pattern is all about preserving semantics across types. So first you have to decide on some semantics. This is particularly important for this pattern, otherwise you’re just creating extra work for yourself.

Leave a comment

Filed under Uncategorized

Add Transform Semantics to Arbitrary Objects!

We can use parametric polymorphism to create containers that add common semantics to different types. Here’s an example of defining “data transform” semantics to objects. The idea is that you lift an object of some type into the transform space. Operations on that transform object keep it in the space of transforms. If at some point the data transform is invalid, subsequent operations will simply pass the invalid transform forward. But it still stays in the transform space.

In the end, you can check the “isValid()” property, and see if your chain of operations was valid. If not, you can get the “invalidReason()” as a String. Otherwise, you can “get()” the final object, which was the result of your operations.

Transform defines the operations allowed on a transform container.

GenericTransform defines a basic container that can hold an arbitrary type

EmptyTransform is the starting point for specific transform types, because it defines a transform on nothing at all, which is always invalid.

StringTransform is the shell of a string transform class specific to String. You can add functions to StringTransform, as long as they return a “Transform” type. That ensures that the methods are chainable, and operations stay in the Transform space.

public interface Transform<A> {

    /**
     * @return true if the object is still valid.
     */
    boolean isValid();

    /**
     * @return the reason we're invalid
     */
    String invalidReason();

    /**
     * Make sure the object is valid before you get it.
     *
     * @return the underlying value.
     */
    A get();

    /**
     * Return true if the data is invalid, and we should stop processing,
     * this returns true. The logic should return an invalid instance as
     * the answer.
     *
     * <p>
     * (Really just sugar for ! isInvalid)
     *
     * <p>
     * Nicer than throwing a hard stop with "halt"
     *
     * @return true if we should skip the following logic
     */
    boolean skipIfInvalid();

    /**
     * If the transformation is to a different underlying type,
     * we can't perform that transformation, so halt -- should
     * involve throwing a RuntimeException.
     *
     * Note that is only a safeguard method; the running code should
     * always check isValid to avoid hitting calls to this method.
     */
    void haltIfInvalid();

    /**
     * Support bind semantics for arbitrary extensibility
     *
     * <p>
     * Note that unit is just the constructor, although we don't
     * support function composition on functions like in Haskell
     *
     * <p>
     * Note that the fist argument to bind (Transform<A> ma) is just this
     */
    <B> Transform<B> bind(Function<A, Transform<B>> f);
}

The Empty transform is the starting point for all others, because it defines operations on a transform with no object.

public abstract class EmptyTransform<A> implements Transform<A> {

    private final boolean valid;

    private final String invalidReason;

    protected EmptyTransform(boolean valid, String invalidReason) {
        this.valid = valid;
        this.invalidReason = (valid) ? "Valid" : invalidReason;
    }

    @Override
    public boolean isValid() {
        return valid;
    }

    @Override
    public String invalidReason() {
        return invalidReason;
    }

    @Override
    public boolean skipIfInvalid() {
        return ! valid;
    }

    @Override
    public void haltIfInvalid() {
        if (! isValid()) {
            throw new InvalidResourceException(invalidReason);
        }
    }

    @Override
    public A get() {
        // normally we'd call haltIfInvalid, but since the EmptyResource
        // is always invalid, fail. Subclasses must override this method
        // or else.
        throw new InvalidResourceException(invalidReason);
    }

    public static <B> Transform<B> invalidBecause(String invalidReason) {
        // In this space, all invalid instances look alike ūüėČ
        return (Transform<B>)new EmptyTransform<B>(false, invalidReason) { };
    }

    @Override
    public <B> Transform<B> bind(Function<A, Transform<B>> f) {
        return EmptyTransform.invalidBecause("Empty transform");
    }
}

The GenericTransform is just a holder for some arbitrary type, but doesn’t define any operations for a specific type.

public class GenericTransform<A> extends EmptyTransform<A> {

    final A o;

    public static <A> GenericTransform<A> invalidBecause(String reason) {
        return new GenericTransform<A>(false, reason);
    }

    public static <A> GenericTransform<A> of(A a) {
        return new GenericTransform<>(a, (a == null) ? Object.class : a.getClass());
    }

    public static <A> GenericTransform<A> of(A a, Class<?>; clazz) {
        return new GenericTransform<>(a, clazz);
    }

    private GenericTransform(boolean valid, String reason) {
        super(false, reason);
        this.o = null;
    }

    @SuppressWarnings("unchecked")
    private GenericTransform(Object o, Class<?> clazz) {
        super(clazz.isInstance(o), "Null or wrongly typed " + clazz.getSimpleName());
        this.o = isValid() ? (A)clazz.cast(o) : null;
    }

    public A get() {
        haltIfInvalid();
        return o;
    }

    public String toString() {
        return Objects.toStringHelper(this)
                .add("valid", isValid())
                .add("invalidReason", invalidReason())
                .add("object", o)
                .toString();
    }

    @Override
    public <B> Transform<B> bind(Function<A, Transform<B>> f) {

        // Guard: stop if we're invalid
        if (skipIfInvalid()) {
            return EmptyTransform.because(invalidReason());
        }

        return f.apply(o);
    }
}

The StringTransform is an example of where you would add operations for a specific type.

public class StringTransform extends EmptyTransform<String> {

    private static final String NO_STRING = "";

    private final String string;

    public static StringTransform invalidBecause(String reason) {
        return new StringTransform(false, reason);
    }

    private StringTransform(boolean valid, String reason) {
        super(false, reason);
        this.string = null;
    }

    public StringTransform(Object string) {
        super(string != null, "String is null");
        this.string = isValid() ? string.toString() : NO_STRING;
    }

    public String get() {
        haltIfInvalid();
        return string;
    }

    @Override
    public <B> Transform<B> bind(Function<String, Transform<B>> f) {

        // Guard: stop if we're invalid
        if (skipIfInvalid()) {
            return anyInvalid();
        }

        return f.apply(string);
    }
}

Leave a comment

Filed under Uncategorized

Static Java!

Java (and probably C#) certainly supports the¬†programming style where everything is a static method. At least, I work with people who’s first instinct is, “I can make that a static method!” But there are some subtle costs that catch up with you over time.

1) Java is an Object Oriented language. And it drives the functional guys nuts, but really it holds up pretty well. The idea behind OO is bundling functionality with state to have small units of data and functionality that preserve their semantics by hiding state and exposing only the functions that make sense in that context.

By moving to a class with only static methods, you’re breaking the “state” part of the equation. But the state still has to live somewhere. So what I’ve seen over time is that a class with all static methods begins to have more and more complicated parameter lists, because the state is moving from the class and into the function calls.

After you make a class with all static methods, run through and just survey how many of those methods have a single, common parameter. It’s a hint that that parameter should either be the containing class for those functions, or else that parameter should be an attribute of an instance.

2) The rules for OO are pretty well understood. After a while, you can look at a class design and see if it meets criteria like SOLID. And after lots of practice unit testing, you develop a good sense of what makes a class “right-sized” and “coherent”. But there aren’t good rules for a class with all static methods, and no real reason why you shouldn’t just bundle up everything in there. The class is open in your editor, so what the heck? Just add your new method there. After a while, your application turns into a number of competing “God Objects”, each trying to dominate the world. Again, refactoring them into smaller units is highly subjective and hard to tell if you’ve got it right.

3) Interfaces are one of the most powerful features of Java. Class inheritance has proven to be problematic, but programming with interfaces remains one of the most powerful tricks of the language. (ditto C#) All-static classes can’t put themselves into that model.

4) It slams the door on important OO techniques that you can’t take advantage of. So you may work for years with only a hammer in your toolbox, without realizing how much easier things would have been if you’d had a screwdriver, too.

4.5) It creates the hardest, most unbreakable compile-time dependencies possible. So, for example if you have FileSystem.saveFile() then there is no way of changing that, short of faking out your JVM at run time. Which means that every class that references your static function class has a hard, compile-time dependency on that specific implementation, which makes extension almost impossible, and complicates testing tremendously. You can test the static class in isolation, but it becomes very difficult to test the classes that refer to that class in isolation.

5) You’ll drive your co-workers crazy. Most of the professionals I work with take their code seriously, and pay attention to at least some level of design principles. Setting aside the core intent of a language will have them pulling their hair out because they’ll be constantly refactoring the code.

When I’m in a language, I always try to use a language well. So, for example, when I’m in Java, I use good OO design, because then I’m really leveraging the language for what it does. When I’m in Python, I mix module-level functions with occasional classes — I could only write classes in Python, but then I think I wouldn’t be using the language for what it’s good at.

Another tactic is to use a language badly, and them complain about all the problems its causing. But that applies to pretty much any technology.

The key feature of Java is managing complexity in small, testable units that hang together so they’re easy to understand. Java emphasizes clear interface definitions independent of the implementation — which is a huge benefit. That’s why it (and other similar OO languages) remain so widely used. For all the verbosity and ritualism, when I’m done with a big Java app, I always feel like the ideas are more cleanly separated in code than my projects in a more dynamic language.

It’s a hard one, though. I’ve seen people get the “all static” bug, and it’s kind of hard to talk them out of it. But I’ve seen them have a big sense of relief when they get over it.

Leave a comment

Filed under Uncategorized

My 15 Minutes is Over

Looks like my 15 minutes is over. Some guys in Sweden got in an argument about OAuth and REST, and someone who agreed with me posted a link to one of my blog posts to support his argument. But the furor has died down, and my hits have dropped to lower than where they were to before that.

So I guess that’s the end of my 15 minutes of fame.

I think there needs to be a corollary to the “15 minutes of fame” that says don’t expect your 15 minutes of fame to be contiguous: you’re more likely to experience milliseconds of fame scattered over your adult life, adding up to 15 minutes.

 

Leave a comment

Filed under opinionizing

Genetic vs. Exhaustive

Genetic algorithms are basically a guided trial-and-error methodology. The only advantage I can think of for a GA over a exhaustive search is that since GA optimizes a fitness function in steps, you¬†might¬†get to an optimal solution faster, because the GA will favor solutions that are incrementally better. An exhaustive search that’s guaranteed to find a solution might go the long way around.

  • If “computation resources” means CPU but not memory, then the GA gene pool¬†might¬†have a smaller memory footprint, requiring less memory.
  • On the other hand, because of hill climbing, a GA might get stuck on a local maximum & any mutations might not be sufficient to shake it loose.
  • Also, the search time for GA grows exponentially with the size of the input, so an exhaustive search may end up being faster after all, depending on the size of the space you’re searching, and if you can constrain the size of the space by excluding possibilities.

Lately I’ve been thinking about GAs in terms of “entropy per second” and measuring the progress of my GA as a measure of how many distinct possibilities it runs through per second. Then I can compare that to the total entropy of the problem space. If a brute force search can beat that entropy per second with parallelization or fast processors, then a GA’s “best score” isn’t any better than a discovered “best score”.

But I’ve just been noodling that over; I haven’t actually instrumented a GA that way yet.

(Entropy is ln(N) for N possible states, or ln(N) - sum(n * ln(n) for all n) / N where n is the possible ways to achieve one result out of N possible outcomes.)

Leave a comment

Filed under computer algorithms, machine learning, opinionizing

Simple Explaination of Monad

Here’s my shot at a simple explanation of what a monad is in the computer world. First, I have to explain what a monoid is.

A monoid is (roughly)  a group of functions who take a parameter of a certain type and return a result of that type.

That’s useful because it allows easy “chaining” or “composition” of functions. Any fluent interface in java is a monoid:

MyThing result = myThing.spin()
                        .treadle()
                        .flip()
                        .invert()
                        .aggregate();

Note that the input to each function is a MyThing, and the result is a MyThing.

A caution; strictly speaking, a monoid needs a few extra things. It needs an “identity” function that just returns the parameter, it needs a binary function that takes two items of the type and returns a new instance of the type, and those binary functions must be associative, so you can change the order of evaluation:

a.do(b).do(c) == a.do(b.do(c))

A monad takes the idea of monoid one step further; instead of mapping a class onto itself, it takes a container type which accepts any underlying type, and lifts that into a monoid space. So it’s a monoid using a container type:

A monad is (roughly) a group of functions who take a container type as an argument and return a result of the container type.

But note that the underlying type of the container may change. So you could map Container<String> to Container<Integer>. But the functions would still line up.

MyContainer<Int> myContainer = new MyContainer(1);
MyContainer<String> result = myContainer.<Int>rehash()
                                        .fondue()
                                        .serializeToString()
                                        .prefix()
                                        .onomatapeize();

So that you are able to “chain” or “compose” the functions and get a result of any arbitrary type! That’s a pretty strong claim for a statically-typed language like Java.

(Note that in Haskell, the signatures of the functions would be a little different. They would be functions of (Int, MyContainer) or (String, MyContainer), and it would be the job of the container to appropriately apply the function to it’s contents. But it’s hard to make a direct comparison from Java to Haskell, because Haskell works much more directly with function composition. In Haskell, the monad is actually composing the type constructors, rather than with instances of the type.)

But the core idea behind the monad is to create a monoid on a container type, in order to get the benefits of chaining and composition across different underlying types.

Leave a comment

Filed under computer architecture, computer OOD, design patterns