.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

===========================================
Fast & Portable DES encryption & decryption
===========================================

.. note::

   Below is the original README file from the descore.shar package,
   converted to ReST format.

------------------------------------------------------------------------------

des - fast & portable DES encryption & decryption.

Copyright |copy| 1992  Dana L. How

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU Library General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Library General Public License for more details.

You should have received a copy of the GNU Library General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

Author's address: how@isl.stanford.edu

.. README,v 1.15 1992/05/20 00:25:32 how E

==>> To compile after untarring/unsharring, just ``make`` <<==

This package was designed with the following goals:

1.	Highest possible encryption/decryption PERFORMANCE.
2.	PORTABILITY to any byte-addressable host with a 32-bit unsigned C type.
3.	Plug-compatible replacement for KERBEROS's low-level routines.

This second release includes a number of performance enhancements for
register-starved machines.  My discussions with Richard Outerbridge,
71755.204@compuserve.com, sparked a number of these enhancements.

To more rapidly understand the code in this package, inspect desSmallFips.i
(created by typing ``make``) BEFORE you tackle desCode.h.  The latter is set
up in a parameterized fashion so it can easily be modified by speed-daemon
hackers in pursuit of that last microsecond.  You will find it more
illuminating to inspect one specific implementation,
and then move on to the common abstract skeleton with this one in mind.


performance comparison to other available des code which i could
compile on a SPARCStation 1 (cc -O4, gcc -O2):

this code (byte-order independent):

  - 30us per encryption (options: 64k tables, no IP/FP)
  - 33us per encryption (options: 64k tables, FIPS standard bit ordering)
  - 45us per encryption (options:  2k tables, no IP/FP)
  - 48us per encryption (options:  2k tables, FIPS standard bit ordering)
  - 275us to set a new key (uses 1k of key tables)

	this has the quickest encryption/decryption routines i've seen.
	since i was interested in fast des filters rather than crypt(3)
	and password cracking, i haven't really bothered yet to speed up
	the key setting routine. also, i have no interest in re-implementing
	all the other junk in the mit kerberos des library, so i've just
	provided my routines with little stub interfaces so they can be
	used as drop-in replacements with mit's code or any of the mit-
	compatible packages below. (note that the first two timings above
	are highly variable because of cache effects).

kerberos des replacement from australia (version 1.95):

  - 53us per encryption (uses 2k of tables)
  - 96us to set a new key (uses 2.25k of key tables)

	so despite the author's inclusion of some of the performance
	improvements i had suggested to him, this package's
	encryption/decryption is still slower on the sparc and 68000.
	more specifically, 19-40% slower on the 68020 and 11-35% slower
	on the sparc,  depending on the compiler;
	in full gory detail (ALT_ECB is a libdes variant):

	===========  =========  ============  ===========  ============  =========
	compiler     machine    desCore (us)  libdes (us)  ALT_ECB (us)  slower by
	===========  =========  ============  ===========  ============  =========
	gcc 2.1 -O2  Sun 3/110  304           369.5        461.8         22%
	cc -O1       Sun 3/110  336           436.6        399.3         19%
	cc -O2       Sun 3/110  360           532.4        505.1         40%
	cc -O4       Sun 3/110  365           532.3        505.3         38%
	gcc 2.1 -O2  Sun 4/50   48            53.4         57.5          11%
	cc -O2       Sun 4/50   48            64.6         64.7          35%
	cc -O4       Sun 4/50   48            64.7         64.9          35%
	===========  =========  ============  ===========  ============  =========

	(my time measurements are not as accurate as his).
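
As a rough cross-check of the table, the ``slower by`` column appears to
compare desCore against the faster of libdes and ALT_ECB in each row, rounded
to the nearest percent.  That reading is inferred from the numbers rather than
stated in the README, and the helper name below is hypothetical.

.. code-block:: c

   /*
    * Hypothetical helper: percentage by which the faster of the two
    * competing timings exceeds the desCore timing, rounded to nearest.
    * All arguments are microseconds per encryption and must be positive.
    */
   static long slower_by_percent(double descore_us, double libdes_us,
                                 double alt_ecb_us)
   {
           double best = libdes_us < alt_ecb_us ? libdes_us : alt_ecb_us;

           /* adding 0.5 before truncating rounds a positive value */
           return (long)((best / descore_us - 1.0) * 100.0 + 0.5);
   }

For the first row, ``slower_by_percent(304, 369.5, 461.8)`` gives 22, which
matches the table; the other rows agree under the same assumption.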

   the comments in my first release of desCore on version 1.92:

   - 68us per encryption (uses 2k of tables)
   - 96us to set a new key (uses 2.25k of key tables)

	this is a very nice package which implements the most important
	of the optimizations which i did in my encryption routines.
	it's a bit weak on common low-level optimizations which is why
	it's 39%-106% slower.  because he was interested in fast crypt(3) and
	password-cracking applications,  he also used the same ideas to
	speed up the key-setting routines with impressive results.
	(at some point i may do the same in my package).  he also implements
	the rest of the mit des library.

	(code from eay@psych.psy.uq.oz.au via comp.sources.misc)

fast crypt(3) package from denmark:

	the des routine here is buried inside a loop to do the
	crypt function and i didn't feel like ripping it out and measuring
	performance. his code takes 26 sparc instructions to compute one
	des iteration; above, Quick (64k) takes 21 and Small (2k) takes 37.
	he claims to use 280k of tables but the iteration calculation seems
	to use only 128k.  his tables and code are machine independent.

	(code from glad@daimi.aau.dk via alt.sources or comp.sources.misc)

swedish reimplementation of Kerberos des library:

  - 108us per encryption (uses 34k worth of tables)
  - 134us to set a new key (uses 32k of key tables to get this speed!)

	the tables used seem to be machine-independent;
	he seems to have included a lot of special case code
	so that, e.g., ``long`` loads can be used instead of 4 ``char`` loads
	when the machine's architecture allows it.

	(code obtained from chalmers.se:pub/des)

crack 3.3c package from england:

	as in crypt above, the des routine is buried in a loop. it's
	also very modified for crypt.  his iteration code uses 16k
	of tables and appears to be slow.

	(code obtained from aem@aber.ac.uk via alt.sources or comp.sources.misc)

``highly optimized`` and tweaked Kerberos/Athena code (byte-order dependent):

  - 165us per encryption (uses 6k worth of tables)
  - 478us to set a new key (uses <1k of key tables)

	so despite the comments in this code, it was possible to get
	faster code AND smaller tables, as well as making the tables
	machine-independent.
	(code obtained from prep.ai.mit.edu)

UC Berkeley code (depends on machine-endedness):

  - 226us per encryption
  - 10848us to set a new key

	table sizes are unclear, but they don't look very small
	(code obtained from wuarchive.wustl.edu)


motivation and history
======================

a while ago i wanted some des routines and the routines documented on sun's
man pages either didn't exist or dumped core.  i had heard of kerberos,
and knew that it used des,  so i figured i'd use its routines.  but once
i got it and looked at the code,  it really set off a lot of pet peeves -
it was too convoluted, the code had been written without taking
advantage of the regular structure of operations such as IP, E, and FP
(i.e. the author didn't sit down and think before coding),
it was excessively slow,  the author had attempted to clarify the code
by adding MORE statements to make the data movement more ``consistent``
instead of simplifying his implementation and cutting down on all data
movement (in particular, his use of L1, R1, L2, R2), and it was full of
idiotic ``tweaks`` for particular machines which failed to deliver significant
speedups but which did obfuscate everything.  so i took the test data
from his verification program and rewrote everything else.

a while later i ran across the great crypt(3) package mentioned above.
the fact that this guy was computing 2 sboxes per table lookup rather
than one (and using a MUCH larger table in the process) emboldened me to
do the same - it was a trivial change from which i had been scared away
by the larger table size.  in his case he didn't realize you don't need to keep
the working data in TWO forms, one for easy use of half the sboxes in
indexing, the other for easy use of the other half; instead you can keep
it in the form for the first half and use a simple rotate to get the other
half.  this means i have (almost) half the data manipulation and half
the table size.  in fairness though he might be encoding something particular
to crypt(3) in his tables - i didn't check.

i'm glad that i implemented it the way i did, because this C version is
portable (the ifdef's are performance enhancements) and it is faster
than versions hand-written in assembly for the sparc!


porting notes
=============

one thing i did not want to do was write an enormous mess
which depended on endedness and other machine quirks,
and which necessarily produced different code and different lookup tables
for different machines.  see the kerberos code for an example
of what i didn't want to do; all their endedness-specific ``optimizations``
obfuscate the code and in the end were slower than a simpler machine
independent approach.  however, there are always some portability
considerations of some kind, and i have included some options
for varying numbers of register variables.
perhaps some will still regard the result as a mess!

1) i assume everything is byte addressable, although i don't actually
   depend on the byte order, and that bytes are 8 bits.
   i assume word pointers can be freely cast to and from char pointers.
   note that 99% of C programs make these assumptions.
   i always use unsigned char's if the high bit could be set.
2) the typedef ``word`` means a 32 bit unsigned integral type.
   if ``unsigned long`` is not 32 bits, change the typedef in desCore.h.
   i assume sizeof(word) == 4 EVERYWHERE.

the (worst-case) cost of my NOT doing endedness-specific optimizations
in the data loading and storing code surrounding the key iterations
is less than 12%.  also, there is the added benefit that
the input and output work areas do not need to be word-aligned.
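
A minimal sketch of what byte-order-independent loading and storing can look
like is shown here.  It only illustrates the principle described above; the
real loading code in desCode.h also rearranges bits and is not reproduced.
``load_word`` and ``store_word`` are hypothetical names, and ``uint32_t`` is
assumed as a stand-in for the 32-bit ``word`` typedef.

.. code-block:: c

   #include <stdint.h>

   /* assumption: stands in for the typedef ``word`` from desCore.h */
   typedef uint32_t word;

   /*
    * Build a word from four bytes, most significant byte first.
    * Only byte accesses are used, so the result is the same on any
    * host byte order and p does not need to be word-aligned.
    */
   static word load_word(const unsigned char *p)
   {
           return ((word)p[0] << 24) | ((word)p[1] << 16) |
                  ((word)p[2] << 8) | (word)p[3];
   }

   /* Inverse of load_word(): write w out most significant byte first. */
   static void store_word(unsigned char *p, word w)
   {
           p[0] = (unsigned char)(w >> 24);
           p[1] = (unsigned char)(w >> 16);
           p[2] = (unsigned char)(w >> 8);
           p[3] = (unsigned char)w;
   }

The price is a few shifts and ORs per block instead of a single word access,
which is the kind of overhead the worst-case figure above refers to.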


OPTIONAL performance optimizations
==================================

1) you should define one of ``i386``, ``vax``, ``mc68000``, or ``sparc``,
   whichever one is closest to the capabilities of your machine.
   see the start of desCode.h to see exactly what this selection implies.
   note that if you select the wrong one, the des code will still work;
   these are just performance tweaks.
2) for those with functional ``asm`` keywords: you should change the
   ROR and ROL macros to use machine rotate instructions if you have them.
   this will save 2 instructions and a temporary per use,
   or about 32 to 40 instructions per en/decryption.

   note that gcc is smart enough to translate the ROL/R macros into
   machine rotates!
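
For illustration, a rotate synthesized from two shifts and an OR looks roughly
like the sketch below.  ``rotl`` and ``rotr`` are hypothetical names, not the
ROL and ROR macros from desCode.h, whose exact form is not shown in this
README.  A compiler that recognizes the idiom can emit a single rotate
instruction; otherwise it costs two shifts, an OR and a temporary.

.. code-block:: c

   #include <stdint.h>

   /* Rotate x left by n bits; n must be in the range 1..31. */
   static uint32_t rotl(uint32_t x, unsigned n)
   {
           return (x << n) | (x >> (32 - n));
   }

   /* Rotate x right by n bits; n must be in the range 1..31. */
   static uint32_t rotr(uint32_t x, unsigned n)
   {
           return (x >> n) | (x << (32 - n));
   }

The range restriction on ``n`` avoids a shift by 32, which is undefined
behaviour in C for a 32-bit operand.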

these optimizations are all rather persnickety, yet with them you should
be able to get performance equal to assembly-coding, except that:

1) with the lack of a bit rotate operator in C, rotates have to be synthesized
   from shifts.  so access to ``asm`` will speed things up if your machine
   has rotates, as explained above in (2) (not necessary if you use gcc).
2) if your machine has fewer than 12 32-bit registers i doubt your compiler will
   generate good code.

   ``i386`` tries to configure the code for a 386 by only declaring 3 registers
   (it appears that gcc can use ebx, esi and edi to hold register variables).
   however, if you like assembly coding, the 386 does have 7 32-bit registers,
   and if you use ALL of them, use ``scaled by 8`` address modes with displacement
   and other tricks, you can get reasonable routines for DesQuickCore... with
   about 250 instructions apiece.  For DesSmall... it will help to rearrange
   des_keymap, i.e., now the sbox # is the high part of the index and
   the 6 bits of data is the low part; it helps to exchange these.

   since i have no way to conveniently test it i have not provided my
   shoehorned 386 version.  note that with this release of desCore, gcc is able
   to put everything in registers(!), and generate about 370 instructions apiece
   for the DesQuickCore... routines!

coding notes
============

the en/decryption routines each use 6 necessary register variables,
with 4 being actively used at once during the inner iterations.
if you don't have 4 register variables get a new machine.
up to 8 more registers are used to hold constants in some configurations.

i assume that the use of a constant is more expensive than using a register:

a) additionally, i have tried to put the larger constants in registers.
   registering priority was by the following:

	- anything more than 12 bits (bad for RISC and CISC)
	- greater than 127 in value (can't use movq or byte immediate on CISC)
	- 9-127 (may not be able to use CISC shift immediate or add/sub quick),
	- 1-8 were never registered, being the cheapest constants.

b) the compiler may be too stupid to realize table and table+256 should
   be assigned to different constant registers and instead repetitively
   do the arithmetic, so i assign these to explicit ``m`` register variables
   when possible and helpful.

i assume that indexing is cheaper or equivalent to auto increment/decrement,
where the index is 7 bits unsigned or smaller.
this assumption is reversed for 68k and vax.

i assume that addresses can be cheaply formed from two registers,
or from a register and a small constant.
for the 68000, the ``two registers and small offset`` form is used sparingly.
all index scaling is done explicitly - no hidden shifts by log2(sizeof).

the code is written so that even a dumb compiler
should never need more than one hidden temporary,
increasing the chance that everything will fit in the registers.
KEEP THIS MORE SUBTLE POINT IN MIND IF YOU REWRITE ANYTHING.

(actually, there are some code fragments now which do require two temps,
but fixing it would either break the structure of the macros or
require declaring another temporary).


special efficient data format
==============================

bits are manipulated in this arrangement most of the time (S7 S5 S3 S1)::

	003130292827xxxx242322212019xxxx161514131211xxxx080706050403xxxx

(the x bits are still there, i'm just emphasizing where the S boxes are).
bits are rotated left 4 when computing S6 S4 S2 S0::

	282726252423xxxx201918171615xxxx121110090807xxxx040302010031xxxx

the rightmost two bits are usually cleared so the lower byte can be used
as an index into an sbox mapping table. the next two x'd bits are set
to various values to access different parts of the tables.
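
The layout can be sketched in C as follows.  Each pair of digits in the
diagrams above is one bit number, so every byte of the word carries one 6-bit
S-box input in its top six bits and two ``x`` bits at the bottom.  The function
names are hypothetical, and the sketch leaves out the key mixing and the table
lookups that the real macros in desCode.h perform.

.. code-block:: c

   #include <stdint.h>

   /* Rotate left by 4, moving the other four S-box inputs into place. */
   static uint32_t rotl4(uint32_t w)
   {
           return (w << 4) | (w >> 28);
   }

   /*
    * Extract the four 6-bit fields held in the top six bits of each
    * byte of w, most significant byte first.  The mask clears the two
    * low (x) bits of every byte.
    */
   static void sbox_inputs(uint32_t w, unsigned out[4])
   {
           uint32_t m = w & 0xFCFCFCFCu;

           out[0] = (m >> 26) & 0x3F;
           out[1] = (m >> 18) & 0x3F;
           out[2] = (m >> 10) & 0x3F;
           out[3] = (m >> 2) & 0x3F;
   }

Calling ``sbox_inputs(w, a)`` yields the inputs for the first group of S-boxes
and ``sbox_inputs(rotl4(w), b)`` those for the second group, so one working
copy of the data serves all eight.  A byte with its two low bits cleared is a
multiple of 4, which presumably lets it act directly as a byte offset into a
table of 32-bit entries.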


how to use the routines
=======================

datatypes:
	pointer to 8 byte area of type DesData
	used to hold keys and input/output blocks to des.

	pointer to 128 byte area of type DesKeys
	used to hold full 768-bit key.
	must be long-aligned.

DesQuickInit()
	call this before using any other routine with ``Quick`` in its name.
	it generates the special 64k table these routines need.
DesQuickDone()
	frees this table

DesMethod(m, k)
	m points to a 128byte block, k points to an 8 byte des key
	which must have odd parity (or -1 is returned) and which must
	not be a (semi-)weak key (or -2 is returned).
	normally DesMethod() returns 0.

	m is filled in from k so that when one of the routines below
	is called with m, the routine will act like standard des
	en/decryption with the key k. if you use DesMethod,
	you supply a standard 56bit key; however, if you fill in
	m yourself, you will get a 768bit key - but then it won't
	be standard.  it's 768bits not 1024 because the least significant
	two bits of each byte are not used.  note that these two bits
	will be set to magic constants which speed up the encryption/decryption
	on some machines.  and yes, each byte controls
	a specific sbox during a specific iteration.

	you really shouldn't use the 768bit format directly;  i should
	provide a routine that converts 128 6-bit bytes (specified in
	S-box mapping order or something) into the right format for you.
	this would entail some byte concatenation and rotation.
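
To avoid the -1 return from DesMethod(), a caller can check the parity of a
key beforehand.  The sketch below is a standalone illustration of the standard
DES rule that each of the 8 key bytes must contain an odd number of set bits;
it is not the check used inside DesMethod(), and the function name is
hypothetical.

.. code-block:: c

   /*
    * Return 1 if every one of the 8 key bytes has an odd number of
    * set bits, 0 otherwise.
    */
   static int des_key_has_odd_parity(const unsigned char key[8])
   {
           int i;

           for (i = 0; i < 8; i++) {
                   unsigned b = key[i];

                   /* fold the byte so that bit 0 holds the XOR of all 8 bits */
                   b ^= b >> 4;
                   b ^= b >> 2;
                   b ^= b >> 1;
                   if ((b & 1u) == 0)
                           return 0;
           }
           return 1;
   }

The key ``01 23 45 67 89 AB CD EF`` passes this check, while an all-zero key
does not.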

Des{Small|Quick}{Fips|Core}{Encrypt|Decrypt}(d, m, s)
	performs des on the 8 bytes at s into the 8 bytes at
	d. (d,s: ``char *``).

	uses m as a 768bit key as explained above.

	the Encrypt|Decrypt choice is obvious.

	Fips|Core determines whether a completely standard FIPS initial
	and final permutation is done; if not, then the data is loaded
	and stored in a nonstandard bit order (FIPS w/o IP/FP).

	Fips slows down Quick by 10%, Small by 9%.

	Small|Quick determines whether you use the normal routine
	or the crazy quick one which gobbles up 64k more of memory.
	Small is 50% slower than Quick, but Quick needs 32 times as much
	memory.  Quick is included for programs that do nothing but DES,
	e.g., encryption filters, etc.


Getting it to compile on your machine
=====================================

there are no machine-dependencies in the code (see porting),
except perhaps the ``now()`` macro in desTest.c.
ALL generated tables are machine independent.
you should edit the Makefile with the appropriate optimization flags
for your compiler (MAX optimization).


Speeding up kerberos (and/or its des library)
=============================================

note that i have included a kerberos-compatible interface in desUtil.c
through the functions des_key_sched() and des_ecb_encrypt().
to use these with kerberos or kerberos-compatible code put desCore.a
ahead of the kerberos-compatible library on your linker's command line.
you should not need to #include desCore.h;  just include the header
file provided with the kerberos library.

Other uses
==========

the macros in desCode.h would be very useful for putting inline des
functions in more complicated encryption routines.