# kmeans

k-means clustering algorithm implementation written in Go

## What It Does

[k-means clustering](https://en.wikipedia.org/wiki/K-means_clustering) partitions
a multi-dimensional data set into `k` clusters, where each data point belongs
to the cluster with the nearest mean, which serves as the prototype of that cluster.

![kmeans animation](https://github.com/muesli/kmeans/blob/master/kmeans.gif)

## Example

```go
package main

import (
	"fmt"
	"log"
	"math/rand"

	"github.com/muesli/kmeans"
)

func main() {
	// Set up a random two-dimensional data set (float64 values between 0.0 and 1.0)
	var d kmeans.Points
	for x := 0; x < 1024; x++ {
		d = append(d, kmeans.Point{
			rand.Float64(),
			rand.Float64(),
		})
	}

	// Partition the data points into 16 clusters
	km := kmeans.New()
	clusters, err := km.Partition(d, 16)
	if err != nil {
		log.Fatal(err)
	}

	for _, c := range clusters {
		fmt.Printf("Centered at x: %.2f y: %.2f\n", c.Center[0]*255.0, c.Center[1]*255.0)
		fmt.Printf("Matching data points: %+v\n\n", c.Points)
	}
}
```

## Complexity

If `k` (the number of clusters) and `d` (the number of dimensions) are fixed,
the problem can be solved exactly in time O(n<sup>dk+1</sup>), where `n` is the
number of entities to be clustered.

The running time of the algorithm is O(nkdi), where `n` is the number of
`d`-dimensional vectors, `k` the number of clusters and `i` the number of
iterations needed until convergence. On data that does have a clustering
structure, the number of iterations until convergence is often small, and
results only improve slightly after the first dozen iterations. The algorithm
is therefore often considered to be of "linear" complexity in practice,
although it is in the worst case superpolynomial when performed until
convergence.
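
To make the O(nkdi) bound more concrete, here is a minimal, self-contained sketch of a single iteration of the standard (Lloyd's) heuristic. It is an illustration only and is not taken from this package's source: the functions `assign` and `update` and the use of plain `[][]float64` slices are assumptions made for the example. The three nested loops in `assign` account for the `n`, `k` and `d` factors; repeating the iteration `i` times gives the total.

```go
package main

import (
	"fmt"
	"math"
)

// assign performs the assignment step of one iteration: for each of the
// n points it scans all k centers and computes a d-dimensional squared
// distance, which is where the O(nkd) cost per iteration comes from.
// (Hypothetical helper for illustration, not part of the kmeans package.)
func assign(points, centers [][]float64) []int {
	labels := make([]int, len(points))
	for i, p := range points { // n points
		best := math.Inf(1)
		for j, c := range centers { // k centers
			var dist float64
			for x := range p { // d dimensions
				diff := p[x] - c[x]
				dist += diff * diff
			}
			if dist < best {
				best, labels[i] = dist, j
			}
		}
	}
	return labels
}

// update recomputes each center as the mean of the points assigned to it.
// Centers without any assigned points are left unchanged.
func update(points [][]float64, labels []int, centers [][]float64) {
	d := len(centers[0])
	sums := make([][]float64, len(centers))
	counts := make([]int, len(centers))
	for j := range sums {
		sums[j] = make([]float64, d)
	}
	for i, p := range points {
		counts[labels[i]]++
		for x := range p {
			sums[labels[i]][x] += p[x]
		}
	}
	for j := range centers {
		if counts[j] == 0 {
			continue
		}
		for x := range centers[j] {
			centers[j][x] = sums[j][x] / float64(counts[j])
		}
	}
}

func main() {
	points := [][]float64{{0, 0}, {0, 1}, {10, 10}, {10, 11}}
	centers := [][]float64{{0, 0}, {10, 10}}
	labels := assign(points, centers)
	update(points, labels, centers)
	fmt.Println(labels, centers) // [0 0 1 1] [[0 0.5] [10 10.5]]
}
```

The update step costs only O(nd), so the assignment step dominates the per-iteration cost.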

## Options

You can greatly reduce the running time by raising the delta threshold. With
the following options, the algorithm finishes as soon as fewer than 5% of the
data points changed their cluster assignment in the last iteration:

```go
km, err := kmeans.NewWithOptions(0.05, nil)
```

The default setting for the delta threshold is 0.01 (1%).
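
As a rough illustration of what such a stopping rule amounts to, the sketch below compares the cluster labels of two consecutive iterations and stops once the share of changed labels drops below the threshold. This is an assumption about the general technique, not the package's actual implementation; `changedFraction` and `converged` are hypothetical names introduced for the example.

```go
package main

import "fmt"

// changedFraction returns the share of points whose cluster label differs
// between two consecutive iterations. Both slices are assumed to have the
// same length. (Hypothetical helper, not part of the kmeans package.)
func changedFraction(prev, curr []int) float64 {
	if len(curr) == 0 {
		return 0
	}
	changed := 0
	for i := range curr {
		if prev[i] != curr[i] {
			changed++
		}
	}
	return float64(changed) / float64(len(curr))
}

// converged reports whether fewer than deltaThreshold (as a fraction, e.g.
// 0.05 for 5%) of the points changed their cluster assignment.
func converged(prev, curr []int, deltaThreshold float64) bool {
	return changedFraction(prev, curr) < deltaThreshold
}

func main() {
	prev := []int{0, 0, 1, 1, 2, 2, 0, 1, 2, 0}
	curr := []int{0, 0, 1, 1, 2, 2, 0, 1, 2, 1} // one of ten points moved

	fmt.Println(converged(prev, curr, 0.05)) // false: 10% changed
	fmt.Println(converged(prev, curr, 0.20)) // true
}
```

A larger threshold trades some clustering quality for fewer iterations, since late iterations usually move only a handful of points.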

If you are working with two-dimensional data sets, kmeans can generate
beautiful graphs (like the one above) for each iteration of the algorithm:

```go
km, err := kmeans.NewWithOptions(0.01, kmeans.SimplePlotter{})
```

Careful: this will generate PNGs in your current working directory.

You can write your own plotters by implementing the `kmeans.Plotter` interface.

## Development

[![GoDoc](https://godoc.org/github.com/golang/gddo?status.svg)](https://godoc.org/github.com/muesli/kmeans)
[![Build Status](https://travis-ci.org/muesli/kmeans.svg?branch=master)](https://travis-ci.org/muesli/kmeans)
[![Coverage Status](https://coveralls.io/repos/github/muesli/kmeans/badge.svg?branch=master)](https://coveralls.io/github/muesli/kmeans?branch=master)
[![Go ReportCard](http://goreportcard.com/badge/muesli/kmeans)](http://goreportcard.com/report/muesli/kmeans)