Harry_The_Bustard
Posted April 10, 2015 (edited)

My wife’s Music folder is a mess: duplicated files (a mix of mp3 and m4a) in several directories, some with slightly different names, some inside (several) iTunes folders and some (actually most) not.

Now whilst there are several applications out there that seem to be able to find these - and delete them if need be - I thought to seek a script or similar which would find them and store the results (ideally in a file) for her or me to go through and tidy up. (She’s not concerned with the various iTunes library structures as she’ll create a single one when the duplicate files have been identified and removed - though she may well keep some of the existing structures - e.g. folders by artist and then by album.)

Having had a good look around I found a set of nested Unix commands here but they don’t work - curiously, they are reproduced on other sites and still don’t work despite modification there. I had a crack at debugging them (despite a rudimentary grasp of Unix) but failed at the last hurdle - i.e. the last command - and it’s help with that I’m after.

The debugging process involved creating a folder with three files in it (File 1.m4a / File 2.m4a / File 3.m4a), then copying two of them (File 1.m4a / File 2.m4a) to a sub-folder, where I renamed one of them (File 2.m4a -> File 2 Renamed Copy.m4a) and added another file (File 4.m4a), with the following result…

/Volumes/My Data Partition/Scratch
File 1.m4a
File 2.m4a
File 3.m4a

/Volumes/My Data Partition/Scratch/xxx
File 1.m4a
File 2 Renamed Copy.m4a
File 4.m4a

I moved to the higher directory so…

cd ../../Volumes/My\ Data\ Partition/Scratch

and then ran the commands in this…

find . -size +3M ! -type d -exec cksum {} ";" | sort | tee /tmp/all.txt | cut -f 1,2 -d ' ' | uniq -d | grep -hif - /tmp/all.txt > dup.txt

…one at a time as shown here…

find
(Find all files in and below current directory with...)

.
{I don’t know what this does.}

-size +3M
(with size greater than 3 MB)

! -type d
(exclude directory names)

-exec cksum {} ";"
(Return Checksum)

Example Outcome...
3485438841 4181457 ./File 1.m4a
1721974798 4154234 ./File 2.m4a
3661829629 1102103 ./File 3.m4a
3485438841 4181457 ./xxx/File 1.m4a
1721974798 4154234 ./xxx/File 2 Renamed Copy.m4a
4235682258 2565825 ./xxx/File 4.m4a

| sort
(Sort the results in checksum order)

Example Outcome...
1721974798 4154234 ./File 2.m4a
1721974798 4154234 ./xxx/File 2 Renamed Copy.m4a
3485438841 4181457 ./File 1.m4a
3485438841 4181457 ./xxx/File 1.m4a
3661829629 1102103 ./File 3.m4a
4235682258 2565825 ./xxx/File 4.m4a

| tee /tmp/all.txt
(Write the output stream - i.e. sorted set of results - to a file called all.txt in folder tmp.)

Example Outcome...
(Exactly as previous.)

| cut -f 1,2 -d ' '
(Extract the first two elements from each line in the input stream - e.g. 1721974798 4154234 - and place them in the output stream.)

Example Outcome...
1721974798 4154234
1721974798 4154234
3485438841 4181457
3485438841 4181457
3661829629 1102103
4235682258 2565825

| uniq -d
(Compare each line - i.e. the checksum for and size of each file - in the input stream with the next and if they are the same then write both lines to the output stream.)

Example Outcome...
1721974798 4154234
3485438841 4181457

| grep -hif /tmp/all.txt > dup.txt
(Read each line of the input stream - i.e. the checksums and file sizes - and if they pattern match some or all of the contents of all.txt then write the line to dup.txt.)

…which I would hope would show…

1721974798 4154234 ./File 2.m4a
1721974798 4154234 ./xxx/File 2 Renamed Copy.m4a
3485438841 4181457 ./File 1.m4a
3485438841 4181457 ./xxx/File 1.m4a

…in the file dup.txt, but it’s empty.

It would be nice too if dup.txt excluded the file size and checksum for ease of reading - though I recognise such would be handy for verifying the results.
[end]

Edited April 10, 2015 by Harry_The_Bustard
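A possible reading of the empty dup.txt, offered tentatively: in the step-by-step version the final command is written as grep -hif /tmp/all.txt, without the "-" that follows -f in the one-line version. Written that way, grep would take all.txt as its list of patterns and the piped checksum/size pairs as the text to search, and no short pair can contain a full line from all.txt. The sketch below avoids the question altogether by finishing the checksum list first and then filtering it with awk instead of grep. It is only an illustration under stated assumptions: the function name find_dupes is hypothetical, the -size +3M filter is left out so that small test files work, and file names containing newlines are not handled.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): print the cksum lines of
# all files under a directory that share a checksum and size with another file.
find_dupes() {
    dir=${1:-.}
    list=$(mktemp "${TMPDIR:-/tmp}/dupes.XXXXXX") || return 1

    # Pass 1: checksum every regular file. cksum prints "<crc> <bytes> <path>".
    # Sorting groups identical checksum/size pairs on adjacent lines.
    find "$dir" -type f -exec cksum {} + | sort > "$list"

    # Pass 2: read the finished list twice. The first read counts each
    # "<crc> <bytes>" key; the second prints lines whose key occurred
    # more than once.
    awk 'NR == FNR { seen[$1 " " $2]++; next } seen[$1 " " $2] > 1' "$list" "$list"

    rm -f "$list"
}
```

Run from the Scratch folder, find_dupes . > dup.txt should keep the checksum and size columns for verification, while find_dupes . | cut -d ' ' -f 3- > dup.txt should leave only the paths for easier reading.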