How to mirror CPAN
There are several ways to mirror CPAN depending upon what you want to achieve.
How do I create a private or offline mirror?
minicpan from CPAN::Mini is the best tool for this. Also look at CPAN::Mini::Inject, which allows you to add your own modules to your private mirror.
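As a rough illustration, minicpan reads its settings from ~/.minicpanrc by default, where "local" names the mirror directory and "remote" names the upstream mirror. The sketch below writes such a file; the helper name write_minicpanrc and the example paths are assumptions made for this example, not part of CPAN::Mini.

```shell
# Sketch: write a minimal minicpan configuration file.
# write_minicpanrc is a hypothetical helper; the paths passed to it are examples.
write_minicpanrc() {
    rcfile=$1      # where to write the config, e.g. "$HOME/.minicpanrc"
    local_dir=$2   # directory that will hold the private mirror
    remote_url=$3  # upstream CPAN mirror to copy from

    {
        printf 'local: %s\n' "$local_dir"
        printf 'remote: %s\n' "$remote_url"
    } > "$rcfile"
}

# Example (not run here):
#   write_minicpanrc "$HOME/.minicpanrc" "$HOME/minicpan/" "http://www.cpan.org/"
#   minicpan
```

With the file in place, running minicpan with no arguments should pick up both settings.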
Requirements for a full / public mirror
- Good internet connectivity
- Around 1GB of storage space for just the current modules.
- Around 32GB of storage space for the full mirror.
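Before starting a first sync, it can help to confirm that the target filesystem has room for the sizes listed above. The following is a minimal sketch using POSIX df; the helper name has_space is hypothetical, and /project/CPAN is simply the example path used elsewhere on this page.

```shell
# Sketch: check that a directory's filesystem has enough free space.
# has_space is a hypothetical helper; the second argument is in gigabytes.
has_space() {
    dir=$1
    need_gb=$2
    # df -P prints one POSIX-format line per filesystem; -k reports 1024-byte blocks.
    # Column 4 of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    need_kb=$((need_gb * 1024 * 1024))
    [ "$avail_kb" -ge "$need_kb" ]
}

# Example (not run here):
#   has_space /project/CPAN 32 || echo "not enough room for a full mirror" >&2
```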
It's highly recommended that you also subscribe to the announcements-only cpan-mirrors mailing list by emailing cpan-mirrors-subscribe at perl.org.
CPAN::Mini provides you with a minimal mirror of CPAN (the latest version of all modules). This makes working offline easy, and it is the best tool if you are running a private mirror.
New: rrr-client allows instant mirroring, and should be used on official public mirrors where possible. See instant mirroring instructions.
rsync is the best tool if you need to mirror the whole of CPAN or if you are providing a public mirror. Rsync Instructions.
Only use FTP if these other methods are absolutely impossible. Never mirror with HTTP - you will end up with a million duplicate files taking up tens of gigabytes.
Which CPAN Mirror should I use?
You can find your nearest rsync enabled site on http://www.cpan.org/SITES.html, or use mirrors.json especially if you are building a tool which lets the user select a mirror.
You can also sync from cpan-rsync.perl.org::CPAN (the "tier 1 mirrors"), though you might currently get better performance from a "local" mirror.
Please limit syncing to once or twice a day. For more frequent updates, please see Instant mirroring.
On Unix systems
/usr/bin/rsync -av --delete cpan-rsync.perl.org::CPAN /project/CPAN/
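The command above can be wrapped in a small function that refuses to run when the destination directory is missing, for instance because the disk holding the mirror is not mounted. This is only a sketch: mirror_cpan and the RSYNC override variable are illustrative names, not part of any official tooling.

```shell
# Sketch: run the mirror command only when the destination already exists.
# mirror_cpan and the RSYNC override are hypothetical names for this example.
RSYNC=${RSYNC:-/usr/bin/rsync}

mirror_cpan() {
    dest=$1
    if [ ! -d "$dest" ]; then
        echo "mirror_cpan: destination '$dest' does not exist" >&2
        return 1
    fi
    # Same options as the plain command; ${dest%/}/ normalises to one trailing slash.
    "$RSYNC" -av --delete cpan-rsync.perl.org::CPAN "${dest%/}/"
}

# Example (not run here):
#   mirror_cpan /project/CPAN
```

Keeping the rsync path in a variable makes it easy to point the function at a different binary, or at a stub when trying the wrapper out.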
Using 'crontab', you can make rsync run once a day, for example:
40 4 * * * sleep $(expr $RANDOM \% 7200); /usr/bin/rsync -a --delete cpan-rsync.perl.org::CPAN /project/CPAN/
The "sleep $(...);" statement makes the command delay up to 2 hours before running rsync; the advantage of this is that you (and everybody else) won't access the mirror at the same time.
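Note that $RANDOM is a shell extension (bash, ksh, zsh) rather than a POSIX feature, and cron usually runs commands with /bin/sh, which on some systems does not provide it. A possible alternative, sketched below, reads two random bytes from /dev/urandom instead. The helper name random_delay and the wrapper script path in the comments are assumptions for this example.

```shell
# Sketch: pick a random delay without relying on $RANDOM.
# random_delay is a hypothetical helper; its argument is the upper bound in seconds.
random_delay() {
    max=$1
    # Read 2 bytes as one unsigned decimal number (0..65535) and strip od's padding.
    # Because of that range, max should not exceed 65536.
    n=$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')
    echo $((n % max))
}

# Example crontab entry calling a wrapper script (the path is an assumption):
#   40 4 * * * /usr/local/bin/cpan-mirror.sh
# where the script begins with:
#   sleep "$(random_delay 7200)"
```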
Unless you are mirroring to an SSD, you might get timeouts with --delete-after when many symlinks are being purged. Using --delete will work properly.
If you have a problem with permissions (files are created with mode -rw-------), set umask in your cronjob:
40 4 * * * umask 022 ; sleep ... ; /usr/bin/rsync ...
The umask 022 allows rsync to set proper permissions for files and directories.
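To see the effect of umask on newly created files, the following sketch creates an empty file under two different masks and prints the resulting mode. The helper name mode_with_umask is hypothetical, and it only demonstrates the shell's file creation behaviour, not rsync itself.

```shell
# Sketch: show the mode a new file receives under a given umask.
# mode_with_umask is a hypothetical helper used only for this demonstration.
mode_with_umask() {
    mask=$1
    dir=$(mktemp -d)
    # Run in a subshell so the caller's own umask is left untouched.
    ( umask "$mask"; : > "$dir/example.txt" )
    # The first 10 characters of ls -l are the file type and permission bits.
    ls -l "$dir/example.txt" | cut -c1-10
    rm -r "$dir"
}

mode_with_umask 077   # prints -rw------- (the symptom described above)
mode_with_umask 022   # prints -rw-r--r--
```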
On Windows systems
"C:\Program Files\Rsync\rsync" -av --delete cpan-rsync.perl.org::CPAN /project/CPAN/
Using the 'AT' tool, you can schedule rsync to run daily, for example:
AT 20:00 /every:M,T,W,Th,F,S,Su "C:\Program Files\Rsync\rsync -a --delete cpan-rsync.perl.org::CPAN /project/CPAN/"
How do I create a public mirror?
- We are not currently adding new public mirrors.
"Instant mirroring" keeps your CPAN mirror up-to-date by continuously tracking the CPAN master and picking up changes a short time (minutes) after they occur.
Instant mirroring is used for all Tier 1 mirrors (so cpan-rsync.perl.org stays in sync across mirrors).
To use "instant mirroring", you need a special client: "rrr-client" or "iim".
"rrr-client" is part of the File::Rsync::Mirror::Recent (also known as rrr) package; it is the official client, used on the CPAN master to get updates from PAUSE: the true heart and soul of "all things perl". See the setup guide for more details.
"iim" is an alternative to "rrr-client"; basically it does the same thing, but it is more efficient (on start-up) and has some features that may be helpful to CPAN mirror operators.