Harry_The_Bustard

Identifying Duplicated Media Files


My wife’s Music folder is a mess: duplicated files (a mix of mp3 and m4a) are spread across several directories, some with slightly different names, some inside (several) iTunes folders and some (actually most) not. Now, whilst there are several applications out there that seem to be able to find these, and delete them if need be, I thought I would look for a script or similar that would find them and produce results (ideally stored in a file) that she or I could go through to tidy up. (She’s not concerned with the various iTunes library structures, as she’ll create a single one once the duplicate files have been identified and removed, though she may well keep some of the existing structures, e.g. folders by artist and then by album.)

Having had a good look around, I found a set of piped Unix commands here, but they don’t work; curiously, they have been reproduced on other sites and still don’t work despite modification there. I had a crack at debugging them (despite a rudimentary grasp of Unix) but failed at the last hurdle, i.e. the last command, and it’s help with that I’m after.


To debug it, I created a folder with three files in it (File 1.m4a / File 2.m4a / File 3.m4a), then copied two of them (File 1.m4a / File 2.m4a) to a sub-folder, where I renamed one of them (File 2.m4a -> File 2 Renamed Copy.m4a) and added another file (File 4.m4a), with the following result…


/Volumes/My Data Partition/Scratch

File 1.m4a
File 2.m4a
File 3.m4a

/Volumes/My Data Partition/Scratch/xxx

File 1.m4a
File 2 Renamed Copy.m4a
File 4.m4a


I changed to the top-level Scratch folder with…


cd /Volumes/My\ Data\ Partition/Scratch


and then ran the commands in this…


find . -size +3M ! -type d -exec cksum {} ";" | sort | tee /tmp/all.txt | cut -f 1,2 -d ' ' | uniq -d | grep -hif - /tmp/all.txt > dup.txt
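For readability, here is a sketch of the same pipeline spread over several lines with a comment per stage, assuming a POSIX shell with the standard find, cksum, sort, cut, uniq and grep. The function name `find_dups` is hypothetical, and the sketch makes a few assumptions of its own: it keeps its intermediate lists in temporary files from mktemp instead of /tmp/all.txt, it saves the duplicated pairs to a file rather than relying on `grep -f -` to read patterns from standard input (not every grep may accept `-` there), and it uses `grep -F` so the pairs are matched as plain text rather than as regular expressions.

```shell
#!/bin/sh
# Hypothetical helper: report regular files over 3 MB under a directory
# (default: the current one) that share a cksum checksum and size.
# Each output line is "checksum size path", with identical files adjacent.
find_dups() {
  dir=${1:-.}
  all=$(mktemp) || return 1
  keys=$(mktemp) || { rm -f "$all"; return 1; }

  # Checksum, size and path for each file ("{} +" batches paths per cksum
  # call), then a byte-order sort so identical checksums end up adjacent.
  find "$dir" -type f -size +3M -exec cksum {} + | LC_ALL=C sort > "$all"

  # Keep just "checksum size" and write one copy of each repeated pair.
  cut -d ' ' -f 1,2 "$all" | uniq -d > "$keys"

  # Print every full line whose pair was repeated (fixed-string match).
  grep -F -f "$keys" "$all"

  rm -f "$all" "$keys"
}
```

For example, `find_dups "/Volumes/My Data Partition/Scratch" > dup.txt`. A matching checksum and size is strong evidence of a duplicate rather than proof, so a byte-for-byte `cmp` of a pair before deleting does no harm.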


…one at a time as shown here…


find (Find all files in and below current directory with...)


. (start the search in the current directory, i.e. Scratch)


-size +3M (with size greater than 3 MB)


! -type d (exclude directories)


-exec cksum {} ";" (run cksum on each file found, which prints the file's checksum, its size in bytes and its path)


Example Outcome...


3485438841 4181457 ./File 1.m4a
1721974798 4154234 ./File 2.m4a
3661829629 1102103 ./File 3.m4a
3485438841 4181457 ./xxx/File 1.m4a
1721974798 4154234 ./xxx/File 2 Renamed Copy.m4a
4235682258 2565825 ./xxx/File 4.m4a
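As a sketch of this first stage on its own, assuming a POSIX find and cksum (the function name `checksum_big_files` is hypothetical): `-type f` keeps regular files only, a little stricter than `! -type d`, which also lets through symlinks and other non-directories, and ending `-exec` with `{} +` instead of `{} ";"` passes many paths to each cksum call rather than starting one process per file. Note too that with `-size +3M` in force, File 3.m4a (1102103 bytes) and File 4.m4a (2565825 bytes) are under 3 MB and would not be listed, so the outcome above looks as though it was produced without the size test.

```shell
#!/bin/sh
# Hypothetical helper: checksum, size in bytes and path of every regular
# file larger than 3 MB under the given directory (default: current one).
checksum_big_files() {
  # -type f: regular files only; "{} +" hands many paths to each cksum run
  find "${1:-.}" -type f -size +3M -exec cksum {} +
}
```

Run from inside Scratch, `checksum_big_files` (or `checksum_big_files .`) prints one "checksum size path" line per qualifying file.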


| sort (Sort the results into checksum order, so that identical files end up on adjacent lines)


Example Outcome...


1721974798 4154234 ./File 2.m4a
1721974798 4154234 ./xxx/File 2 Renamed Copy.m4a
3485438841 4181457 ./File 1.m4a
3485438841 4181457 ./xxx/File 1.m4a
3661829629 1102103 ./File 3.m4a
4235682258 2565825 ./xxx/File 4.m4a
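A small demonstration of why the sort matters, reusing the six cksum lines from the find step (`sample_lines` is just a hypothetical helper that prints them in the order find produced them). sort compares whole lines as text, so because the checksum comes first, identical files land next to each other; the order is textual rather than numeric, but only the adjacency matters to the later `uniq -d`. Running sort with `LC_ALL=C` is a precaution so the comparison is plain byte order whatever the locale.

```shell
#!/bin/sh
# Hypothetical helper: the six cksum lines from the find step,
# in the order find produced them.
sample_lines() {
  printf '%s\n' \
    '3485438841 4181457 ./File 1.m4a' \
    '1721974798 4154234 ./File 2.m4a' \
    '3661829629 1102103 ./File 3.m4a' \
    '3485438841 4181457 ./xxx/File 1.m4a' \
    '1721974798 4154234 ./xxx/File 2 Renamed Copy.m4a' \
    '4235682258 2565825 ./xxx/File 4.m4a'
}

# Whole-line, byte-order sort: lines with the same checksum become neighbours.
sample_lines | LC_ALL=C sort
```

This prints the same six lines, in the same order, as the sorted outcome above.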


| tee /tmp/all.txt (Write the output stream - i.e. the sorted set of results - to a file called all.txt in the /tmp folder, while also passing it on unchanged to the next command.)


Example Outcome...


(Exactly as previous.)


| cut -f 1,2 -d ' ' (Extract the first two space-separated fields from each line in the input stream - e.g. 1721974798 4154234 - and place them in the output stream.)


Example Outcome...


1721974798 4154234
1721974798 4154234
3485438841 4181457
3485438841 4181457
3661829629 1102103
4235682258 2565825
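cut splits at every single space, so a path such as ./xxx/File 2 Renamed Copy.m4a spans several fields. Because cksum puts the checksum and size first, `-f 1,2` is still safe, and `-f 3-` (field 3 to the end of the line) gives back the whole path, which is one way to leave the checksum and size out of the final list. A quick check on one of the sorted lines:

```shell
#!/bin/sh
# One sorted cksum line; the path itself contains spaces.
line='1721974798 4154234 ./xxx/File 2 Renamed Copy.m4a'

# Fields 1 and 2 (checksum and size), as in the pipeline
printf '%s\n' "$line" | cut -d ' ' -f 1,2    # 1721974798 4154234

# Field 3 onwards: everything after the second space, i.e. the whole path
printf '%s\n' "$line" | cut -d ' ' -f 3-     # ./xxx/File 2 Renamed Copy.m4a
```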


| uniq -d (Compare each line - i.e. the checksum and size of each file - with the line before it, and write one copy of each repeated line to the output stream.)


Example Outcome...


1721974798 4154234
3485438841 4181457
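uniq -d compares each line only with its neighbour, which is why the sort earlier in the pipeline is essential. The sketch below (with a hypothetical `unsorted_keys` helper holding the checksum/size pairs in the order find returned them) shows `uniq -d` finding nothing in the unsorted pairs and the two repeated pairs once they are sorted:

```shell
#!/bin/sh
# Hypothetical helper: checksum/size pairs in find's (unsorted) order.
unsorted_keys() {
  printf '%s\n' \
    '3485438841 4181457' \
    '1721974798 4154234' \
    '3661829629 1102103' \
    '3485438841 4181457' \
    '1721974798 4154234' \
    '4235682258 2565825'
}

# uniq only compares neighbouring lines, so without sorting it prints nothing...
unsorted_keys | uniq -d

# ...but after sorting it prints one copy of each repeated pair.
unsorted_keys | LC_ALL=C sort | uniq -d
```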


| grep -hif /tmp/all.txt > dup.txt (Take each line of the input stream - i.e. the checksums and file sizes - as a pattern and write every line of all.txt that matches it to dup.txt.)


…which I hoped would give…


1721974798 4154234 ./File 2.m4a
1721974798 4154234 ./xxx/File 2 Renamed Copy.m4a
3485438841 4181457 ./File 1.m4a
3485438841 4181457 ./xxx/File 1.m4a


…in dup.txt, but the file is empty. It would also be nice if dup.txt left out the checksum and file size for ease of reading, though I recognise they are handy for verifying the results.
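Comparing the step-by-step grep with the original one-liner, the `-` after `-f` has gone missing. In the one-liner, `grep -hif - /tmp/all.txt` takes its patterns (the repeated checksum/size pairs) from standard input and searches /tmp/all.txt; in the step-by-step version, `grep -hif /tmp/all.txt` takes every full line of all.txt as a pattern and searches the piped pairs instead, and since no bare pair contains a whole all.txt line, nothing matches and dup.txt comes out empty. The sketch below reproduces both directions on the sample data from this post, kept in temporary files rather than /tmp/all.txt; as assumptions of the sketch, it saves the pairs to a file instead of relying on `-f -`, adds `-F` so the pairs are matched as plain text, and ends with `cut -d ' ' -f 3-` to list only the paths.

```shell
#!/bin/sh
# Sample data: the sorted cksum lines from above (stand-in for /tmp/all.txt).
all=$(mktemp)
keys=$(mktemp)
trap 'rm -f "$all" "$keys"' EXIT
printf '%s\n' \
  '1721974798 4154234 ./File 2.m4a' \
  '1721974798 4154234 ./xxx/File 2 Renamed Copy.m4a' \
  '3485438841 4181457 ./File 1.m4a' \
  '3485438841 4181457 ./xxx/File 1.m4a' \
  '3661829629 1102103 ./File 3.m4a' \
  '4235682258 2565825 ./xxx/File 4.m4a' > "$all"

# The repeated "checksum size" pairs, as produced by the cut and uniq -d steps.
cut -d ' ' -f 1,2 "$all" | uniq -d > "$keys"

# Step-by-step direction: all.txt lines are the patterns, the pairs are searched.
grep -F -f "$all" "$keys" || echo '(no matches)'

# Intended direction: the pairs are the patterns, all.txt is searched.
grep -F -f "$keys" "$all"

# Same again, keeping only the paths (field 3 to the end of each line).
grep -F -f "$keys" "$all" | cut -d ' ' -f 3-
```

The first grep reports no matches, the second prints the four lines hoped for above, and the last prints just ./File 2.m4a, ./xxx/File 2 Renamed Copy.m4a, ./File 1.m4a and ./xxx/File 1.m4a.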


[end]

Edited by Harry_The_Bustard