I don’t even know if the program I am building deserves to be called a grammar checker.  Still, it has to be called something, so I will refer to it as a probabilistic grammar checker.

My basic premises are these:

  1. It is impossible to precisely parse natural languages, due to grammatical ambiguities
  2. Simple algorithms can often provide results that are superior to more complex algorithms
  3. Simple algorithms are easier to analyze
  4. Effective analysis can bring significant insights on how to improve an algorithm

Based on these premises, I am building a brain-dead grammar checker.  As I test it, I will see its failings, and from those failings will be able to analyze its weaknesses.  With a knowledge of its weaknesses, I will be able to either enhance the algorithm in minor ways, or discover what issues exist in my underlying assumptions.

I plan to build a coloring system similar to the ones commonly found in word processors, except that it will highlight the entire text along a gradient, to allow for more effective analysis.  The highlighting will reflect how far observed usage departs from “standard” usage.  Visual output is a lot easier to analyze and debug than purely numeric output.
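To make the gradient idea concrete, here is a minimal sketch.  It assumes that “standard” usage is estimated from word-pair (bigram) counts in a reference text, which is only one possible choice of statistic and not necessarily the one the checker will end up using.  The names `train`, `score_tokens` and `to_color` are made up for illustration.

```python
from collections import Counter


def train(reference_text):
    """Count bigrams and their left-hand contexts in a reference text.

    Assumption: whitespace tokenization and bigram statistics stand in
    for whatever model of "standard" usage is actually chosen.
    """
    tokens = reference_text.lower().split()
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    context_counts = Counter(tokens[:-1])
    return bigram_counts, context_counts


def score_tokens(text, bigram_counts, context_counts):
    """Pair each token with an estimate of P(token | previous token).

    The first token has no context, so it is given a neutral 1.0.
    """
    tokens = text.lower().split()
    scored = []
    for i, tok in enumerate(tokens):
        if i == 0:
            scored.append((tok, 1.0))
            continue
        prev = tokens[i - 1]
        seen = context_counts[prev]
        prob = bigram_counts[(prev, tok)] / seen if seen else 0.0
        scored.append((tok, prob))
    return scored


def to_color(score):
    """Map 0.0 (unusual) to red and 1.0 (usual) to green, as a hex color."""
    score = max(0.0, min(1.0, score))
    red = round(255 * (1 - score))
    green = round(255 * score)
    return f"#{red:02x}{green:02x}00"


if __name__ == "__main__":
    bigrams, contexts = train("the cat sat on the mat")
    for token, score in score_tokens("the cat the", bigrams, contexts):
        print(token, score, to_color(score))
```

Because every token gets a color, not just the ones below some error threshold, the output shows where the model is mildly surprised as well as where it is badly surprised.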

Because of its probabilistic approach, I hope that it will have a few unusual strengths.  Foremost in my mind is that it should work on most grammars (I am most familiar with Latin and Germanic grammars, so I would hate to speculate on performance with grammars like Chinese or Finnish).

I also have to further examine my chunking algorithms.  Currently, I am using punctuation as my primary boundary markers.  I am interested in seeing how effective an approach based on rough prosodic boundaries might be.  Generally, I would rather stay away from actual grammatical analysis as much as possible.  I fear that even a cursory consideration of its benefits might open Pandora’s box.

Unfortunately, chunking algorithms of any sort can undermine the generality of my approach.
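As a rough illustration of punctuation-based chunking, the sketch below splits text wherever it finds a run of punctuation marks.  The set of boundary characters and the function name `chunk` are assumptions made for the example, not a description of the actual implementation.

```python
import re

# Assumption: these marks act as chunk boundaries. The set is tuned for
# English-style punctuation and would need changing for other languages.
BOUNDARY = re.compile(r"[.,;:!?()]+")


def chunk(text):
    """Split text at punctuation and return each chunk as a list of words."""
    pieces = BOUNDARY.split(text)
    return [piece.split() for piece in pieces if piece.strip()]


if __name__ == "__main__":
    for words in chunk("When it rains, we stay in; otherwise we walk."):
        print(words)
```

Even this tiny version shows the generality problem: the boundary set is a language-specific decision baked into the code.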

Once I have something working and have run some tests, I will be able to see very clearly how wrong my hypotheses were, and I may throw the entire thing out, chalking it up as a learning experience.
