statistics

Counting Words in a Markdown File

I wanted to add an estimate of words and characters in wiki pages for a program I am writing, CWiki. There was some guidance on this topic (in Python) here. Of course, the original Markdown.pl was a good source for matching regular expressions too. What follows is a copy of the “seed page” “About the Word Count” from the CWiki wiki program: Begin Quote Up at the top of each page, below the author and creation/modification dates, there is a line showing the number of characters in the page along with an estimate of the number of words in the page.
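As a rough illustration of the idea (not the code CWiki actually uses), a word estimate can be made by stripping the most common Markdown markup with regular expressions and counting what remains. The function name `estimate_counts`, the particular patterns, and the choice to skip code spans entirely are all assumptions made for this sketch. The character count here is simply the length of the raw text.

```python
import re


def estimate_counts(markdown):
    """Return (estimated_words, characters) for a Markdown string.

    Hypothetical helper: the patterns cover only common Markdown
    constructs, so the word count is an estimate, not an exact figure.
    """
    # Assumption: code is not prose, so fenced blocks and inline spans are dropped.
    text = re.sub(r"```.*?```", " ", markdown, flags=re.DOTALL)
    text = re.sub(r"`[^`]*`", " ", text)
    # Keep the visible text of images and links, discard the URL part.
    text = re.sub(r"!\[([^\]]*)\]\([^)]*\)", r"\1", text)
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)
    # Drop raw HTML tags.
    text = re.sub(r"<[^>]+>", " ", text)
    # Drop heading, blockquote, and list markers at the start of a line.
    text = re.sub(r"^[ \t]{0,3}(#{1,6}|>|[-*+]|\d+\.)[ \t]+", "", text,
                  flags=re.MULTILINE)
    # Drop emphasis and strikethrough markers.
    text = re.sub(r"[*_~]+", "", text)
    # A word is a run of word characters, optionally joined by ' or -.
    words = re.findall(r"\w+(?:['’-]\w+)*", text)
    return len(words), len(markdown)


sample = "# Title\n\nSome *bold* text with a [link](http://example.com).\n"
word_count, char_count = estimate_counts(sample)
print(word_count, char_count)
```

The order of the substitutions matters: code is removed first so that markup-like characters inside it are not misread as links or emphasis.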

A New Version of the Confidence Interval Program

Recently, I wrote about updating an old program that did the Sign Test. Well, I have lots of old programs that could stand a bit of refreshing. Another of the simple ones calculates the confidence interval around the proportion of successes in a series of Bernoulli trials. I wrote about it way back in 2011. The original was written in Java and Swing many years ago. It is still available in a repository on Bitbucket.

An Updated Sign Test Program

Long ago, I wrote a post about a small program to calculate the probabilities of a sign test. A lot has happened since then. The sign test is still useful to me on occasion, but the application framework used to write the original program is now unsupported. Also, the original program used Java’s Swing framework for the GUI. The new official GUI framework for Java is JavaFX. So I’ve updated the program a bit.

Wilcoxon Matched Pairs

At my previous employer, our goal was to stain tissue samples such that a pathologist could examine them microscopically and easily make unambiguous diagnoses of disease state. Experiments typically involved getting subjective judgments from pathologists about which samples were “better” in some way. How do you do statistics on that type of result?

The Sign Test

Sometimes weakness is a strength. That certainly seems to be the case for the lowly sign test. It is about the simplest statistical significance test imaginable. But if it tells you something is important, it probably is. Usually when you hear people talk about the “power” of a statistical test, they are referring to the ability of the test to detect a significant difference when one exists. For example, Student’s t test is a favorite and very powerful test for differences in means when you have data meeting the underlying assumptions of the test.
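To give a feel for how simple the test is, here is a minimal sketch of a two-sided exact sign test. It is not taken from the program described in these posts; the function name `sign_test_p_value` and the convention that ties are discarded before counting are assumptions. Under the null hypothesis each non-tied pair is equally likely to be a "plus" or a "minus", so the p-value is just a binomial tail probability with p = 0.5.

```python
from math import comb


def sign_test_p_value(n_plus, n_minus):
    """Two-sided exact sign test p-value.

    n_plus and n_minus are the counts of positive and negative
    differences; ties are assumed to have been dropped already.
    """
    n = n_plus + n_minus
    if n == 0:
        return 1.0  # no informative pairs, so no evidence either way
    k = min(n_plus, n_minus)
    # Probability of seeing k or fewer of the rarer sign with a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the one-sided tail for a two-sided test, capped at 1.
    return min(1.0, 2 * tail)


# Example: 9 of 10 judgments favor the new method.
print(sign_test_p_value(9, 1))
```

Only the direction of each difference is used, never its size, which is why the test makes so few assumptions and also why it has relatively little power.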

Binomial Confidence Intervals -- BinomConf

Way back in my career there was a need to calculate binomial confidence intervals on experiments with very large numbers of trials (thousands to tens of thousands). The statistics packages of the time couldn’t seem to handle such large numbers of trials.
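For illustration only, one well-known way to get an interval that stays cheap to compute at any number of trials is the Wilson score interval. This is a swapped-in technique for the sake of an example, not necessarily the method BinomConf uses; the function name `wilson_interval` and the default `z = 1.96` (roughly a 95% interval) are assumptions.

```python
from math import sqrt


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    z is the standard normal quantile for the desired confidence
    level; 1.96 corresponds to roughly 95%.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p_hat = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p_hat + z2 / (2 * trials)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / trials
                          + z2 / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Example: 5,000 successes in 10,000 trials.
low, high = wilson_interval(5000, 10000)
print(low, high)
```

The formula uses only a handful of arithmetic operations, so tens of thousands of trials pose no difficulty, unlike approaches that need large factorials or binomial coefficients.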