Welcome to Molecule Benchmarks Documentation
Molecule Benchmarks is a comprehensive benchmark suite for evaluating generative models for molecules. This package provides standardized metrics and evaluation protocols for assessing the quality of molecular generation models in drug discovery and cheminformatics.
Features
Comprehensive Metrics: Validity, uniqueness, novelty, diversity, and similarity metrics
Standard Benchmarks: Implements metrics from Moses, GuacaMol, and FCD papers
Easy Integration: Simple interface for integrating with any generative model
Direct SMILES Evaluation: Benchmark pre-generated SMILES lists without implementing a model interface
Multiple Datasets: Built-in support for QM9, Moses, and GuacaMol datasets
Efficient Computation: Optimized for large-scale evaluation with multiprocessing support
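To make the first group of metrics concrete, here is a minimal sketch of how valid, unique, and novel fractions are commonly defined. It is not the package's implementation: `basic_fractions` and `toy_is_valid` are hypothetical names, the validity check is a placeholder for a real SMILES parser, and the denominators used (unique over valid, novel over unique) are one common convention that may differ from what this package reports. The sketch also assumes the strings are already canonical, so that string equality stands in for molecular identity.

```python
from typing import Callable, Iterable, Optional


def basic_fractions(
    generated: Iterable[Optional[str]],
    training: Iterable[str],
    is_valid: Callable[[str], bool],
) -> dict:
    """Valid, unique, and novel fractions for a list of SMILES strings.

    Assumption: inputs are canonical SMILES, so equal strings mean
    equal molecules. A real pipeline would canonicalize first.
    """
    generated = list(generated)
    # Valid: entries that are present and accepted by the validity check.
    valid = [s for s in generated if s is not None and is_valid(s)]
    # Unique: distinct molecules among the valid ones.
    unique = set(valid)
    # Novel: unique molecules that do not appear in the training set.
    novel = unique - set(training)
    return {
        "valid": len(valid) / len(generated) if generated else 0.0,
        "unique": len(unique) / len(valid) if valid else 0.0,
        "novel": len(novel) / len(unique) if unique else 0.0,
    }


def toy_is_valid(smiles: str) -> bool:
    """Placeholder check (hypothetical): non-empty, balanced parentheses."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(smiles) and depth == 0


fractions = basic_fractions(
    generated=["CCO", "CCO", "CC(=O)O", "c1ccccc1", "CC(=O"],
    training=["CCO"],
    is_valid=toy_is_valid,
)
print(fractions)
```

In this toy input, four of the five strings pass the placeholder check, three of those four are distinct, and two of the three distinct molecules are absent from the training set.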
Quick Start
Installation
pip install molecule-benchmarks
Basic Usage
from molecule_benchmarks import Benchmarker, SmilesDataset
# Load a dataset
dataset = SmilesDataset.load_qm9_dataset(subset_size=10000)
# Initialize benchmarker
benchmarker = Benchmarker(
    dataset=dataset,
    num_samples_to_generate=10000,
    device="cpu",  # or "cuda" for GPU
)
# Your generated SMILES
generated_smiles = [
    "CCO",  # Ethanol
    "CC(=O)O",  # Acetic acid
    "c1ccccc1",  # Benzene
    # ... more molecules
]
# Run benchmarks
results = benchmarker.benchmark(generated_smiles)
print(results)
Table of Contents
- Installation
- Quick Start Guide
- Examples
- API Reference
  - canonicalize_smiles_without_stereochemistry()
  - canonicalize_smiles_list()
  - remove_duplicates()
  - canonicalize()
  - canonicalize_list()
  - smiles_to_rdkit_mol()
  - split_charged_mol()
  - initialise_neutralisation_reactions()
  - neutralise_charges()
  - filter_and_canonicalize()
  - calculate_internal_pairwise_similarities()
  - calculate_pairwise_similarities()
  - get_fingerprints_from_smileslist()
  - get_fingerprints()
  - get_mols()
  - highest_tanimoto_precalc_fps()
  - continuous_kldiv()
  - discrete_kldiv()
  - calculate_pc_descriptors()
  - parse_molecular_formula()
  - is_valid_smiles()
  - filter_valid_smiles()
  - download_with_cache()
  - available_cpu_count()
  - mapper()
  - get_mol()
  - average_agg_tanimoto()
  - fingerprints()
  - fingerprint()
  - fraction_passes_filters()
  - get_filters()
  - mol_passes_filters()
  - internal_diversity()
  - compute_scaffolds()
  - get_n_rings()
  - compute_scaffold()
  - cos_similarity()
  - fragmenter()
  - compute_fragments()
  - main()
- Datasets
- Valid, Unique, and Novel Fraction
- Valid Fraction
- Unique Fraction
- Valid and Unique Fraction
- Novel Fraction
- Unique Fraction at 1000
- Unique Fraction at 10000
- Fraction Passing Moses Filters
- SNN Score (Similarity to Nearest Neighbor)
- Internal Diversity (IntDiv)
- Scaffold Similarity
- Fragment Similarity
- KL Divergence Score
- FCD Score (Fréchet ChemNet Distance)
- Quality Assessment
- Model Comparison
- Trade-offs
- Statistical Significance
- Conditional Metrics
- Pharmacophore Metrics
- Scaffold Metrics
- Comprehensive Evaluation
- Context-Aware Interpretation
- Reporting Guidelines
- Contributing
- Changelog