Group all the records by the smiles
column, count the number of occurrences for each unique SMILES string, and then filter the results to show only those strings that appear more than once…
SELECT
smiles,
COUNT(*) AS number_of_duplicates
FROM
master_chems
GROUP BY
smiles
HAVING
COUNT(*) > 1
ORDER BY
number_of_duplicates DESC;
Show how many records in total exist in your table and then subtracts the number of unique SMILES strings. The result is the total number of rows that are duplicates of another row…
SELECT
COUNT(*) - COUNT(DISTINCT smiles) AS total_duplicate_records
FROM
master_chems;