A Linux command line to find and remove duplicates with sort and uniq

Every day it’s the same thing... how could I be efficient without Unix tools? Here are some examples that save my life every day (ok, not every day, but at least once a week...). The objective is to manipulate a text file: sort the lines, then find and extract the duplicate lines.

The example is based on a very simple data file, but it works the same way with something more complicated… ;-)

So let’s begin with the data file (the file to work on). Note that “0000” appears twice:

cat data.txt
0000
1111
4444
2222
0000
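
If you want to follow along at home, printf is a quick way to create the file (any text editor works just as well):

printf '0000\n1111\n4444\n2222\n0000\n' > data.txt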

Now let’s sort the lines:

cat data.txt | sort
0000
0000
1111
2222
4444
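
By the way, sort compares lines as plain text by default, which is fine for this data. If the file contained actual numbers of different lengths (9 and 10, say), you would add -n to get a numeric sort:

cat data.txt | sort -n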

Now let’s extract the unique lines (each distinct line appears only once):

cat data.txt | sort | uniq
0000
1111
2222
4444
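
Note that sort can deduplicate on its own with the -u flag, which gives the same result with one process less:

sort -u data.txt
0000
1111
2222
4444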

Now let’s find the duplicated lines:

cat data.txt | sort | uniq -d
0000
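
Two more variants worth knowing: uniq -c prefixes each line with its number of occurrences, and uniq -u keeps only the lines that appear exactly once:

cat data.txt | sort | uniq -c
      2 0000
      1 1111
      1 2222
      1 4444

cat data.txt | sort | uniq -u
1111
2222
4444

Finally, to actually remove the duplicates as promised in the title, write the deduplicated output back to a file. The -o option of sort is handy here because, unlike a shell redirection, it allows the output file to be the same as the input file:

sort -u data.txt -o data.txt
cat data.txt
0000
1111
2222
4444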