Software
See also: Electrical Engineering.
Computer science
Algorithms and data structures
https://en.wikipedia.org/wiki/Template:Data_structures_and_algorithms
See also Optimization
Sorting
- mergesort O(n) space
- bucket and radix sort are O(n) time and space
- quicksort O(log n) space
Hashing
- Resolving hash collisions:
- chaining = linked list, O(1+f) if simple uniform hashing
- open addressing using a probe sequence e.g. linear probing
- O(1/(1-f)) if uniform hashing (all m! seqs equally likely)
Binary search tree: O(log d), can become imbalanced e.g. sorted inputs
- left < right node
- in order, preorder, postorder (left, right, root)
Heap: root > children
B-tree is a sorted, balanced search tree allowing many children per node. More shallow and thus faster than a binary search tree. Common in filesystems and databases.
- balancing property: all leaves appear on the same level. Insert into a full node by splitting it into two nodes and inserting the new key into the parent, splitting further as necessary.
- sorted property: inner nodes contain keys which partition child values.
- root, internal, and leaf nodes.
- inner nodes exclude the root and leaves.
- has at least m/2 children
- contains keys so that child subtrees have values between the keys.
- root has at least two children unless it is a leaf
- all leaves on the same level
Red-black tree
- constrain colors so that height is approx balanced
- red node has two black children
- root and every leaf (nil) is black
- every path from node to leaves contain same number of black nodes
Algorithmic paradigms
- Recursion
- Divide and conquer: break the problem into easier subproblems
- Master theorem solves many recurrence relations.
- Optimal substructure: the optimal solution can be constructed from optimal solutions of its subproblems.
- Greedy algorithms take the locally optimal choice each time, without considering the overall consequences.
- Dynamic programming
A stack is last in first out (LIFO).
A queue is first in first out (FIFO). Efficiently implemented using a circular buffer.
Search
Strings
- 1960. A trie stores a set of strings or ordered lists. Each node stores a character or list element.
- A suffix tree is a trie of all possible suffixes of a text.
- Longest common substring
- Longest repeated substring
- Longest palindromic substring
- String search
- 1977. Boyer-Moore is standard. Match on the tail of the pattern, and jump multiple characters on mismatch. Faster as pattern length increases.
- 1970. Knuth-Morris-Pratt (KMP): first linear-time algorithm, matching each character only once. When a mismatch occurs, the next possible match can be many characters later, based on the precomputed partial match table.
- 1987. Rabin-Karp uses a rolling hash to filter out positions of the text that cannot match the pattern. Basis for plagiarism detection.
- Text editing
- A piece table stores a document as a list of spans referencing the immutable original buffer or an append-only add buffer.
- A gap buffer stores data before the cursor and data after the cursor.
- A rope is a binary tree where each leaf holds a span of the string, and each node holds the length of its left subtree. log n string editing.
- https://en.wikipedia.org/wiki/Template:Strings
Mental calculation
Multiplication algorithm
- long multiplication is O(n^2).
- Equivalent methods: grid method and lattice multiplication.
- Often implemented as shift-and-add in binary.
- 1960. Karatsuba algorithm uses three multiplications instead of four to multiply two two-digit numbers: O(n^log_2(3)).
- Toom-Cook multiplication: splitting into more parts brings the exponent close to 1, but the constant factor becomes impractical
- 1968. Schönhage-Strassen algorithm: O(n log n log log n) using a Fourier transform over a modulus.
- 2019. O(n log n) algorithm by van der Hoeven.
- 1999. Kronecker substitution reduces polynomial multiplication to integer multiplication. We can determine the coefficients of a polynomial by evaluating it at a single value x which is larger than any possible coefficient.
- Horner’s method evaluates a polynomial of degree n with n multiplications and n additions.
- Binary multiplier
- Method of complements: half the range encodes the negative numbers.
- https://en.wikipedia.org/wiki/Divided_differences
Computational geometry
- See also: spatial analysis.
- Convex hull
- Smallest enclosing box
- Nearest neighbor search using space partioning via a spatial index
- 1984. R-tree (“rectangle”): group bounding boxes. R*-tree reinserts points to optimize query performance.
- Closest pair of points problem: discretize to a grid. For each point, search the adjacent 3^d grid points.
- Bounding sphere
- Sweep line algorithm
- Boolean operations on polygons or clipping
- Delaunay triangulation of a convex hull into triangles whose circumcircles do not contain any of the points, maximizing the size of the smallest angle.
- Constructive solid geometry
- https://en.wikipedia.org/wiki/Geometry_processing
https://en.wikipedia.org/wiki/Interval_tree
A zipper is a “derivative” of a data type, representing it as a sequence of basic operations. Useful for traversing and functional editing of data structures.
roaringbitmap.org
https://en.wikipedia.org/wiki/Shunting_yard_algorithm
https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm
https://en.wikipedia.org/wiki/Forward-backward_algorithm
https://en.wikipedia.org/wiki/Horner%27s_method#See_also
https://en.wikipedia.org/wiki/List_of_algorithms
Graphs
See also graph theory.
https://en.wikipedia.org/wiki/Template:Graph_traversal_algorithms
Representations
- Adjacency matrix: O(V^2), fast adjacency lookup
- Adjacency list: sparse graphs O(E), fast neighbor lookup
dfs: tree, back, forward, cross edges
topological sort
- iteratively remove nodes with 0 indegree, using counter
- reverse postvisit order
Minimum spanning tree MST
- Greedy algorithms use the cut property
- Kruskal’s algorithm: greedy add lightest edge (maintains a forest w/ UF)
- Prim’s algorithm: start with arbitrary node and grow a tree greedily
- Steiner tree problem: minimum tree that contains all terminals.
Dijkstra’s single-source shortest paths, without negative edges:
- repeatedly get the nearest node in queue and improve distance estimates
- O(V^2) using an array with O(n) find_min
- O(ElogV) using binary heap and O(V log V) with Fibonacci heap
- Note that the heap of (priority, key) has to support decreasing the priority of a node; can be done in two ways:
- just add the new priority, and track deleted nodes
- track the location of each key
Negative edges (no neg cycles): Bellman-Ford: update all edges V times
Dag shortest paths: update all edges in source linearized order = O(E)
All-pairs shortest paths: Floyd-Warshall, DP on set of allowed intermediate nodes, O(V^3). Also Johnson’s alg.
https://en.wikipedia.org/wiki/Graph_rewriting
Complexity
Models of computation
Combinatorial logic < finite-state-machine < pushdown automaton < Turing machine
https://en.wikipedia.org/wiki/State-transition_table
https://en.wikipedia.org/wiki/Automata_theory
P contains decision problems solvable in polynomial time.
NP contains problems are verifiable in polynomial time = solvable in polynomial time by a nondeterministic Turing machine, e.g. LP, SAT, factoring.
NP-hard: every other NP problem is reducible to it, e.g. halting problem.
NP-complete: NP and NP-hard, e.g. SAT, max clique, knapsack, independent set (IS), set cover, integer linear programming (ILP), TSP.
Fixed parameter tractable (FPT) such as fixed parameter linear: O(n), but often exponential in a kernel parameter.
polynomial-time approximation scheme (PTAS) produces a solution with 1 + ε of optimal.
See also graph theory
Other complexity classes
- bounded-error probabilistic polynomial time (BPP) are decision problems solvable by a probabilistic Turing machine in polynomial time with accuracy strictly above random.
- PSPACE is solvable using polynomial space.
- Halting problem
- 1994. PPA (Polynomial Parity Argument): find a second vertex of odd degree given an implicit graph, which is a polynomial time algorithm for listing the neighbors of any vertex. Unknown whether PPA = NP.
Method of conditional probabilities: choose random actions so that the conditional probability of failure, given the current state, is less than 1. Can approximate the probability using a conditional expectation of the number of bad events, or a pessimistic estimator.
Probabilistically checkable proof
PCP[r(n),q(n)]
can be verified in polynomial time using r(n) random bits and reading q(n) proof bits. A verifier should always accept correct proofs and reject incorrect proofs most of the time.
- Zero-knowledge proof can be verified without giving the verifier any knowledge of the solution.
- Lagrange polynomial is the unique polynomial of degree at most n-1 that interpolates n points.
- Shamir’s secret sharing uses a finite field. Represent the secret as the polynomial intercept, and randomly choose n-1 coefficients. The shares are n points on the polynomial.
PCP theorem: PCP[O(log n), O(1)] = NP
. NP is defined as PCP[0, poly(n)]
.
PCP[poly(n), poly(n)] = NEXP
- SAT, IS, set cover are hard to approximate.
1948. Claude Shannon.
Codes can be fixed-length or variable length.
Compression
- dictionary coding (LZ)
- Burrows–Wheeler transform and run-length encoding (bzip2): sort all rotations, read off last column
- Huffman code (DEFLATE): optimal prefix code is a full binary tree, cost of a symbol is its depth in the tree. Start with n characters, merge two least frequent chars together.
- https://en.wikipedia.org/wiki/Template:Compression_methods
In a prefix code, no codeword is a prefix of another codeword, so it is uniquely decodable without markers.
- Huffman coding is an optimal prefix code.
- Package-merge algorithm: O(nL) greedy algorithm to find an optimal Huffman code of size n where no code is longer than L.
- Golomb coding is an optimal prefix code for a geometric distribution. It uses a parameter M to split an input into a unary coded quotient and a binary encoded remainder. Rice coding uses a power of two for M.
1949. Complementary sequences have 0 autocorrelation.
1953. A Barker code has an ideal autocorrelation. Used as a synchronizing pattern.
A self-synchronizing code can recover from bit slips like insertions or deletions. No overlapping part of two codewords is a valid codeword.
Noisy-channel coding theorem defines the Shannon limit assuming Gaussian noise. Entropy coding approaches the Shannon limit.
Arithmetic coding represents the message in a single arbitrary-precision fraction, q between 0 inclusive and 1 exclusive.
- Asymmetric numeral systems encodes a message into a single natural number.
Burrows-Wheeler transform: sort all rotations and read off the last column and the index of the original string. Decoding: sort the characters for the first column, then sort the pairs to derive the second column, and so on. Linear time complexity with a suffix array. Combine with run-length encoding in bzip2.
Adaptive coding updates the data model.
Gray code or reflected binary code. An ordering of binary strings such that two successive values differ in only one bit. Equivalently viewed as a traversal of the hypercube with edges between vertices with one bit difference.
A universal code is a near-optimal prefix code over the natural numbers. For any true probability distribution p(n) that decreases monotonically in n, the expected length is within a constant factor of optimal.
- Elias gamma code encodes n as log_2(n) zero bits followed by the binary representation of n.
- Zigzag code encodes integers to naturals. The least significant bit encodes the sign bit, so odd is positive.
- Fibonacci coding (base Fibonacci) is self-synchronizing. “11” only appears at the end of each codeword.
Error correction
- In a linear code, any linear combination of codewords is also a codeword. Rowspace of a generator matrix.
- Block code: Hamming code and Golay code are perfect.
- Convolutional code
- Cyclic redundancy check (CRC) based on cyclic codes, where a circular shift of a codeword gives another codeword.
- Linear-feedback shift register (LFSR): input bit is an xor of its previous state.
- Berlekamp-Massey algorithm finds the shortest LFSR that outputs a given sequence.
- LFSR cannot represent the zero vector. A maximum length sequence generates every binary sequence except the zero vector.
- Maximum length sequence (MLS) generates every binary sequence.
- Reed-Solomon error correction
- Berlekamp-Welch algorithm efficiently corrects errors.
- Low-density parity-check code (LDPC) are capacity approaching for a symmetric memoryless channel.
- Turbo code (1993) approach the Shannon limit.
- Parity bit detects bit flips.
- Sphere-packing: received codewords decode to the closest codeword by Hamming distance. Hamming bound.
- de Bruijn sequence B(k, n): cyclic sequence of length k^n on an alphabet of size k where every possible length-n string occurs exactly once as a substring.
Cryptography
Symmetric encryption needs a shared key.
- Unauthenticated encryption is vulnerable to:
- Replay attacks: intercept a valid ciphertext and resend it later to gain unauthorized access.
- Padding oracle attacks analyze decryption errors to recover original plaintext.
- AEAD: Authenticated Encryption with Associated Data. AD is not encrypted.
- AES-GCM: Galois/Counter Mode
- Counter (CTR) Mode: encrypt each block with a counter so messages are never encrypted identically.
- Galois Field (GF) multiplication generates an authentication tag.
- CCM: Counter with CBC-MAC. Authenticate-then-encrypt.
- message authentication codes (MAC) ensure integrity.
- AES 128-bit block cipher. Advanced Encryption Standard or Rijndael.
- Cipher Block Chaining (CBC) for block ciphers.
- XOR each plaintext block with the previous ciphertext block before encrypting.
- The first block uses an Initialization Vector (IV), a random value the size of a block, to ensure different ciphertexts for repeated plaintexts.
- Replaces Electronic Codebook (ECB) where identical plaintexts result in identical ciphertexts.
- 3DES (Triple Data Encryption Standard) is slower.
- RC4 (Rivest Cipher 4) is broken.
- Mutual authentication: the user and server verify each other’s identity.
Asymmetric encryption: messages encrypted with a public key can only be decrypted by the recipient’s private key.
- Diffie-Hellman (DH) key exchange.
- Ephemeral Diffie-Hellman (DHE) or Elliptic Curve Diffie-Hellman (ECDH) are more secure.
- RSA (Rivest Shamir Adleman) is slower.
- Wiener’s attack computes the private key d for small d.
- Authentication via digital signatures.
- Public Key Infrastructure (PKI) and Certificate Authority (CA).
- Confusion: each bit of the ciphertext should depend on several parts of the key.
- Diffusion or avalanche effect: a bit flip in the plaintext should change half the ciphertext.
Block cipher
- P-box permutes bits. Linear.
- S-box substitutes inputs and is nonlinear.
- A bent function is maximally nonlinear Boolean function. It is furthest by truth table Hamming distance from any linear function.
- A substitution-permutation network combines S and P boxes.
- Feistel cipher is invertible and symmetric. A round function takes a data block and a subkey and returns a data-sized output. Each round runs the round function on half the data and xors it with the other half. The round function does not need to be invertible.
Cryptanalysis
- Linear cryptanalysis approximates a cipher with linear equations like xor of ciphertext and plaintext bits = xor of key bits.
- The bias is the probability that a linear Boolean equation is true. A good approximation has high bias.
- A partial key is the set of key bits in the linear equation.
- Prefer the partial key which satisfies the highest or lowest number of plaintext-ciphertext pairs, and attack the remaining bits with brute force.
- Piling-up lemma: the bias of an xor of independent variables is at least the input bias.
- Walsh functions are an orthonormal basis. A Hadamard matrix contains orthogonal rows with entries of 1 or -1.
- The Hadamard transform or Walsh transform decomposes a Boolean function into Walsh functions. Its square is the power spectrum. The maximum Walsh coefficient is the function linearity. The number of 0 coefficients is the resiliency or correlation immunity.
- Differential cryptanalysis analyses differentials: Δy = S(x ⊕ Δx) ⊕ S(x)through the network of transformation.
- The autocorrelation of a Boolean function is the Hamming weight of the derivative in a direction. The absolute indicator is the maximum autocorrelation coefficient. A bent function has all 0 correlation coefficients.
- The Wiener-Khinchin theorem states that the autocorrelation and the power spectrum are a Walsh transform pair.
- Data Encryption Standard (DES) was designed to be resistant to differential cryptanalysis.
- https://en.wikipedia.org/wiki/Template:Cryptography_block
- https://en.wikipedia.org/wiki/Template:Cryptography_navbox
Classic ciphers
- One-time pad
- Substitution cipher
- Transposition cipher reorders characters. Detectable by frequency count. Breakable using anagrams or n-gram frequencies.
- Fractionation divides each plaintext symbol into multiple ciphertext symbols.
- Columnar: write the message in rows of fixed length, then send the columns in a fixed order based on a keyword.
Lattice problems underlie lattice cryptography.
- Shortest vector problem (SVP): given a basis, find the shortest nonzero vector.
Grammar
A formal grammar specifies syntax (what strings are valid).
- Alphabet
- Non-terminal symbols represent parts of the language structure
- Terminal symbols represent actual characters of the language
- Production rules specify how non-terminal symbols can be replaced by other symbols
- Start symbol
- Categorial grammar assumes that syntactic constituents combine as functions and arguments
- A formal language is the set of strings generated by a grammar.
- Formal semantics studies the meaning, interpretation, or denotation of language structures.
- https://en.wikipedia.org/wiki/Context-free_grammar#Derivations_and_syntax_trees
formal grammar
Context-free grammar (CFG) generates a context-free language (CFL).
- defined by Backus–Naur form or extended Backus–Naur form (EBNF)
- deterministic context-free grammars for deterministic context-free languages. Can be parsed in linear time with deterministic pushdown automata (PDA)
- Dyck language of all properly matched parentheses S -> SS | (S) | ε, where ε is the empty string.
- Linear grammar has no rule with more than one nonterminal on the right-hand side
- Regular grammars describe regular languages: finite automata and regular expressions.
Definite clause grammar (DCG): https://en.wikipedia.org/wiki/Definite_clause_grammar
String rewriting system
- Semi-Thue system does not distinguish between terminal and non-terminal symbols.
- word problem: whether two expressions are equivalent with respect to a set of rewriting identities
- a normal form theorem maps every element in an equivalence class to a single normal form representation, making it possible to compare via syntactic equality.
- L-system is a parallel rewriting system
String format specifiers: %d integer, %f float, %s string
Pattern matching
- In SQL, % means “any number of characters”
- Regular expression
URL encoding
https://en.wikipedia.org/wiki/Formal_language
Parsing
Compilers
Fortran did not have pointer aliasing. Thus, in a function that writes to output and repeatedly accesses input values, the inputs are guaranteed not to change and can be cached in registers. Whereas in C, the input and output pointers could be aliased and thus overhead logic is needed to reread the input variables after each write.
Static analysis
- Pointer analysis computes possible pointer targets.
- Steensgaard’s algorithm tracks equality constraints.
- Typestate analysis and refinement types associate state information to enforce constraints such as call open() before close().
- Shape analysis analyzes common dynamically allocated linked data structures to detect null pointer dereferences, memory leaks, out-of-bounds accesses, race conditions.
- Escape analysis determines whether a pointer allocated in a subroutine can escape.
- Symbolic execution tracks which inputs reach which branches. Maintains constraints for each branch.
- LibCST: concrete syntax tree
- https://redbaron.readthedocs.io
- Decompilation and reverse engineering
- https://en.wikipedia.org/wiki/Radare2
Data-flow analysis
- Control-flow graph
- Cyclomatic complexity
- Code coverage
- Basic block is a code sequence with no branches in or out.
- Extended basic block: subsequent basic blocks have one predecessor basic block which is in the collection.
- A reaching definition is an earlier definition which can reach a point in the code.
- An available expression
- Use-define chain models data flow between definitions and uses of a variable
- Live-variable analysis: a variable is live if it may be read before its next write.
- Copy propagation: replace targets of direct assignments with their values
- Upwards exposed uses occur before a variable is redefined, so we can replace the uses with the variable value.
- Dataflow programming defines an explicit DAG of operations.
Compiler
https://en.wikipedia.org/wiki/Template:Compiler_optimizations
bootstrapping: https://github.com/fosslinux/live-bootstrap/blob/master/parts.rst
Interpreter
https://en.wikipedia.org/wiki/Bytecode
https://en.wikipedia.org/wiki/Virtual_machine
https://en.wikipedia.org/wiki/P-code_machine
https://en.wikipedia.org/wiki/Java_bytecode
- Java class files interpreted by the Java Virtual Machine (JVM)
Type theory
Polymorphism dispatches implementations based on object type
- Static vs. dynamic polymorphism: faster vs. more flexible (duck typing, polymorphic library functions)
- Ad hoc function overloading. Multiple dispatch uses argument types
- Generic data types for containers
- C++ objects are C structs with a vtable pointer for virtual (inherited) functions which provides runtime polymorphism for references
- Design Patterns: Elements of Reusable Object-Oriented Software (1994)
Curry-Howard correspondence: constructive proofs are typed programs. Showing an example of a proposition is equivalent to showing an element of a type.
- A proof of A -> B is a function A -> B.
- A proof of A ∧ B is a pair (A, B).
- We can represent true as a unit type and false as an empty type.
- Natural deduction is typed lambda calculus.
- disjunction A ∨ B: sum type
- Universal quantification ∀: dependent product type
- existential quantification ∃: dependent sum type
- hypotheses: free variables
- modus ponens: application
- implication introduction: abstraction
- Proof normalization is program evaluation. Strong normalization property means that every proof term reduces in a finite number of steps into a normal form.
- A Hilbert-style axiomatic system maps to a type system for combinatory logic, which builds up functions without quantified variables.
- Few inference rules, often just modus ponens
- Simpler structure, but longer, less intuitive proofs.
Type systems
- Most programming languages use extensional types.
- Extensional type theory focuses on values. Types are identical if they have the same elements, and objects are equal if their corresponding fields have the same values. It is easier to check for extensional equality.
- Intensional type theory considers definition or meaning. It can express formal reasoning about programs.
- Generic programming allows abstract algorithms that can operate on arbitrary data types, which are specified as a parameter upon use.
- Girard’s paradox. A predicative type system that permits a type to be its own member, such as System U, is inconsistent.
- 1972. Intuitionistic type theory is an alternative foundation of mathematics. By Per Martin-Löf.
- https://en.wikipedia.org/wiki/Template:Type_systems
A monad is a monoid in the category of endofunctors of some fixed category. Endofunctor maps a category to itself.
- Monads allow a clean way to handle side effects and exceptions. Can be chained using do notation.
A monoid M has two morphisms:
- unit transformation η: I → M
- constructor which embeds values into the monad context
- multiply μ: M ⊗ M → M flattens nested contexts
>>=
in Haskell, flatMap() in Scala.
- Either[Error,A].flatMap(f) calls f: A->Either[Error,B] only if the either is not an error.
- Option[A].flatMap(f)->Option[B] calls f: A->Option[B] only if Option[A] is nonempty.
- Lists are monads. Option is a list with at most one element.
- A free monoid on a set is the monoid whose elements are all the finite sequences or strings from that set, with string concatenation as the monoid operation. The free semigroup contains all the strings except the identity, the empty string.
- For a set of generators Σ, a monoid is defined by a presentation as the quotient of the free monoid generated by Σ over Σ.
Monads are associative with identity. For f: V -> Monad[V],
- Left identity: Monad(v) >>= f == f(v)
- Right identity:
monad >>= Monad(_) = monad
Theorem proving
Boolean satisfiability problem: is there an interpretation (assignment of variables) that satisfies a Boolean formula (evaluates to true).
- Equivalent to propositional logic.
- Normal form: disjunctive normal form is an OR of ANDs (minterms).
- Conjunctive normal form (CNF) is an AND of ORs (maxterms).
- CNF can be computed using term rewriting.
- Davis-Putnam algorithm: resolution-based decision procedure for SAT.
- DPLL algorithm: backtracking search for CNF-SAT.
- Boolean functions.
- A binary decision tree has one level per input. The tree can be reduced, merging isomorphic subgraphs, into a binary decision diagram (BDD).
- Functional completeness or expressive adequacy: set of operators (logic gates) that can express any truth table.
- Universal gates: nand, nor.
- (and, not) is complete, while (and, or) cannot express not.
- A Karnaugh map visualizes the output values for each input in a 2D grid. We split half the input variables for the rows and half for the columns. Each row represents one possible assignment of the row variables and rows are listed in Gray code order.
- A minterm is a rectangle of 1’s in the map and can wrap around the edges of the grid.
Version space learning for binary classification
- Hypothesis space is a parameterized list
- Candidate elimination produces a most specific boundary and a most general boundary.
A rough set approximates blurry concepts
- lower approximation: elements that definitely belong in the set
- upper approximation: elements that possibly belong in the set.
- boundary region in between.
Pattern matching
https://en.wikipedia.org/wiki/Forward_chaining from facts
https://en.wikipedia.org/wiki/Backward_chaining from goals
Rete algorithm https://news.ycombinator.com/item?id=40480242
Logic programming and constraint programming
Automated theorem proving and proof assistants
- Interactive theorem proving
- Automated proof checking
- Herbrand’s theorem reduces first-order logic to propositional logic.
- Prenex normal form: string of quantifiers and bound variables (the prefix) followed by a quantifier-free part (the matrix).
- Vampire theorem prover for first-order logic.
- Lean
- Coq and Agda use intensional types.
- 2005. Georges Gonthier formalizes the four color theorem in Coq.
- 2012. Georges Gonthier formalizes the odd order theorem in Coq. It says that every finite group of odd order is solvable, i.e. can be constructed from abelian groups using extensions.
- 2014. Kepler conjecture: no close-packing of equal spheres has greater density than hexagonal packing. Flyspeck project formalizes the proof in Isabelle and HOL Light.
- 2024. New Foundations proved consistent in Lean.
- Robbins Conjecture was solved by automated theorem proving.
Modeling
Testing
- Seven Principles of software testing
- Agile testing quadrants
Systems
Computer architecture
- A motherboard holds the CPU, memory, and connectors for I/O peripherals.
- The system bus connects CPU and memory, including control, address, and data lines.
- Memory hierarchy and caching
https://en.wikipedia.org/wiki/Patch_(computing)
Processors
Instruction pipelining
- Superscalar processors execute multiple instructions per clock cycle using multiple execution units. Instruction-level parallelism.
- Instruction fetch, decode, execute, memory access, register write back.
- Out-of-order execution
- Register renaming maps logical to physical registers. Eliminates false data dependencies due to register reuse.
- Tomasulo’s algorithm uses reservation stations and a common data bus.
- memory barrier enforces an ordering constraint on memory operations.
https://en.wikipedia.org/wiki/Superscalar_processor
A microarchitecture is a processor design that implements an ISA.
https://en.wikipedia.org/wiki/Register_file
- Half adder adds two bits and outputs sum S = A ⊕ B and carry C = A ⋅ B.
- Full adder computes S = A ⊕ B ⊕ C_in and C_out = (A ⋅ B) + (C_in ⋅ (A ⊕ B)).
- Ripple-carry adder chains the carry bits. Simple but slow, since it needs to wait 2 gates for each carry bit.
Arithmetic logic unit (ALU)
Interrupt or trap
- Processor has an interrupt mask register to allow selective masking (disabling) of hardware interrupts.
- Non-maskable interrupts cannot be ignored under any circumstances.
- Device sends an interrupt request (IRQ)
- general protection fault (GPF): memory or privilege error.
- Processor saves thread state so it can resume the current thread
- Processor executes the interrupt handler or service routine (ISR).
- Interrupt vector table or interrupt descriptor table maps IRQ to ISR memory address.
- Replaces polling or polled I/O.
Signal
- SIGCHLD: child process exited.
- Processes cannot intercept SIGKILL and SIGSTOP (which allows resumption).
Watchdog timer triggers a reboot
- During normal operation, the computer regularly restarts the watchdog timer.
Instruction set architecture (ISA) defines machine code or bytecode.
- binary encoding for instructions, data types, registers, memory, and I/O.
- complex instruction set computer (CISC)
- 1985. i386 or IA-32 for the 80386 microprocessor. 32-bit registers and arithmetic.
- 1989. i486 for the 80486.
- 1993. i586 for the Pentium P5.
- 1995. i686 for the Pentium Pro P6.
- reduced instruction set computer (RISC) leads to a simpler processor. Usually instructions have the same length.
- Iron law of processor performance documents the tradeoff between minimizing instructions/program (CISC) and cycles/instruction (RISC).
- Addressing mode: immediate address, PC-relative offset, and register indirect
- MIPS
x86
Instruction encoding is variable length.
- REX prefix (0x48) enables an 8-byte operand and r8-15.
- opcode
- ModR/M byte: Mod, reg, r/m
- Arguments can be a register (r), memory address (m), or an immediate (byte literal).
- Little-endian byte order (least significant byte first).
General-purpose registers, 16-bit. Can also access low and high half separately (al, ah).
- 0. ax: accumulator, 1st return register
- 1. cx: counter register for loops and shift/rotate
- 2. dx: data results from arithmetic and I/O.
- 3. bx: base pointer
- 4. sp: stack pointer
- 5. bp: pointer to base of previous stack frame
- 6. si: source index pointer for string/buffer operations
- 7. di: destination index pointer for stream operations
- 32-bit (extended) DWORD registers are eax, etc.
- Low bits shared with the 16-bit registers.
- 64-bit: rax, etc and r8-15.
- rbx, rsp, rbp are saved across function calls.
- Stack frame preserved as: ax, bx, cx, sp, bp, di, si, dx
Other registers
- ip: instruction pointer
- Control register CR0
- FLAGS register (16-bit), EFLAGS, and RFLAGS. A status register, flag register, or condition code register (CCR) contains flag bits.
- ZF (6, zero flag), CF (0, carry flag), OF (11, overflow flag), TF (8, debugging trap), IF (9, interrupt)
Instructions
- Control flow
- near jumps go to an address in the same code segment.
- ret: near 0xC3 and far 0xCB
- optional retn operand to pop n bytes from the stack.
- jmp 0xEB imm16, with an optional displacement operand relative to next instruction
- jnz 0x0F 85 imm8
- call 0xE8 imm64: push return address on the stack
- Data
- push (0x50+r) and pop (0x58+r) register r onto the stack.
- xchg eax, reg (0x90+r). NOP is 0x90.
- mov r32 imm32 0xB8
- mov eax, r: r32 r/m8 (0x8B r).
- Arithmetic operands look like opcode, modr/m, imm8
- ModR/M is C0 for eax, C2 for edx
- mod=11 (direct register), reg = 11, r/m = 00
- xor 0x33, add 0x04, and 0x21, and 0x83, imul 0x0FAF
- x86 Bit manipulation instruction set
- count ones or Hamming weight (popcnt): 48 0F0FB8 C0.
- leading zeros count (lzcnt)
- CPUID (0x0FA2) for supported instructions. Uses the EAX register as a parameter.
- Security: AES (2008), SHA (2013), RDRAND (2012), SGX Software Guard Extensions for secure enclaves (2015)
- x86 SYSENTER/SYSEXIT for syscalls from ring 3 to ring 0. Replaces call gates, which were slower.
- x86-64: 64-bit word size for registers, arithmetic, and addressing.
- Intel ISA reference
Assembly consists of alphabetical machine instructions.
https://en.wikipedia.org/wiki/Firmware
https://en.wikipedia.org/wiki/Embedded_system
https://en.wikipedia.org/wiki/System_on_module
https://en.wikipedia.org/wiki/Single-board_computer
Operating systems
- 1993. Windows NT.
- 2001. Windows XP.
A process is a program in execution with its own program address space.
- Code. The stack pointer esp points to the next instruction to execute.
- Static data: declared outside any function.
- Stack. Local variables, grows down. Stack frame includes return address, arguments, and local variables.
- Heap. Dynamic data via malloc, needed if we create objects that live beyond function return.
- Process state: created -> admission -> waiting in run queue -> running. Can also be blocked on I/O or shared resources.
Early function calling did not use a stack and could not support recursive function calls. Every local variable was assigned a static address at compile time. The previous function would pass arguments in registers or global variables, then jump to the start of the function. To return, the function would jump to the link register holding the return address. The branch with link instruction automatically stores the next address in the link register. The branch to subroutine bsr instruction stores the return address in the first word of the subroutine, and begins execution and the next word. To return, the subroutine executes an indirect jump to the address stored at its first word.
Machine representations
- Two’s complement.
- Floats are stored with sign, mantissa, exponent
- overflow and catastrophic cancellation
Memory management
- A page table stores the mapping from virtual to physical addresses.
- Two-level page translation: Control register CR3 holds a page directory with 1024 4-byte page table addresses. Each page table is 1024 4-byte page addresses.
- 1993. Page Size Extension (PSE) allows 4 MiB pages, which reduce TLB misses. Page directory has a page size flag.
- 1995. Physical Address Extension (PAE) defines page tables with 64-bit addresses. Three-level page translation where CR3 points to a page directory pointer table, since tables only have 512 entries. 2 MiB pages.
- Virtual memory
- An address space is a range of addresses to store data.
- Translation lookaside buffer (TLB) caches page table lookups. Often 12 bits, 1 clock cycle for a hit, 1% hit rate. A TLB miss requires multiple memory accesses, costing 100 clock cycles.
- Process isolation: prevents one process from writing to the address space of another process.
- Memory segmentation: access by segment and offset.
- Demand paging: copy page into physical memory only upon a page fault, reducing context switching time.
- Copy-on-write (COW): create new pages only on write.
- Memory errors: memory leak, double free, use after free, buffer overflow.
- A segmentation fault is a memory access violation, e.g. dereferencing a null pointer. Can debug with gdb or valgrind.
- Real mode: All CPUs start in real mode for backward compatability with the Intel 8086. 1MB 20-bit segmented address space with direct access to I/O addresses. No memory protection, multitasking, or code privilege levels.
- The 8086 had 640KB of conventional memory, reserving the upper memory area for BIOS ROM and I/O.
- Protected mode (Intel 386) enables virtual memory. 32-bit.
- Protection ring for privilege levels: ring 0 kernel, ring 1-2 device drives, ring 3 applications.
- Global Descriptor Table: 8-byte entries of memory segments and their access privileges.
- Long mode allows 64-bit instructions and registers. 34-bit programs are executed in compatability mode.
- System Management Mode (Intel 386, 1986, ring -2) suspends the OS and runs a firmware
Memory access
- Direct memory access (DMA) allows memory operations without CPU involvement.
- CPU initiates an operation using the DMA controller memory address register, byte count register, and control registers. The controller sends an interrupt when it finishes the task.
- Bus mastering allows peripherals to become a bus master, completely bypassing the CPU.
- Programmed I/O: every data transfer is initiated by a CPU instruction.
- Memory-mapped I/O allows I/O via memory addresses. mmap syscall.
- Processing in memory (PIM) allows data operations on memory without having to load data to CPU registers.
- A framebuffer is a memory buffer to drive a video display.
Virtualization
- x86 virtualization emulates protected mode.
- 2005. Intel virtualization (VT-x) and AMD-V hardware support.
- A hypervisor runs virtual machines (VM) and executes instructions on native hardware.
- Type I native run on bare metal. MS Hyper-V.
- Type-2 or hosted run in another OS.
- Kernel-based Virtual Machine (KVM) allows the kernel to function as a hypervisor.
- OS-level virtualization allows multiple isolated user space instances, called containers or jails.
- Namespaces limit visible resources for a process: mnt, pid, net, ipc, etc.
- Docker
- Flatpak and Ubuntu Snap runs applications in isolation, requiring explicit permissions for files, network, devices, recording. Snap packages bundle dependencies, while Flatpak uses system libraries. Flatpak uses a sandbox for full isolation, while Snap uses confinement.
- An emulator typically interprets instructions.
- https://en.wikipedia.org/wiki/Timeline_of_virtualization_development
- https://en.wikipedia.org/wiki/Template:Virtualization_software
Object files contains object code and symbol table.
- ELF: Executable and Linkable Format
- .text machine code
- .data static data
- Symbol table: function labels, and .data variables for other programs
- Relocation table: external labels and static data we needed addresses for
- Mach-O object file
- PE: Portable Executable
- Shared library
- executable files with a symbol table can be used as a shared library.
- statically link at compile time
- Windows dynamic-link library (DLL)
Execution
- Linker combines object .o programs and libraries, resolving references into absolute addresses.
- Static linking copies the entire library.
- Dynamic linking defers symbol resolution until program execution.
- Relocation: assign load addresses for position-dependent code and data, and adjust code and data to reflect the assigned addresses.
- PC-relative addresses never relocate.
- position-independent code or position-independent executable (PIE) is machine code can run from any memory address. Must use relative addressing.
- Loader memory map executable and libraries.
execve()
initializes registers and sp and jumps to the _start
entry point.
- Inter-process communication (IPC)
- Socket sends data over a network interface
- OS message queue
- Anonymous pipe or standard streams (standard input and standard output)
- Named pipe is treated like a file
- Shared memory or memory-mapped file.
Kernel is the core of an OS.
- Process scheduler is usually preemptive (can pause tasks) rather than cooperative. Long-term admit scheduler. Medium-term scheduler for swapping processes. Short-term scheduler or CPU scheduler acts on every time slice.
- FIFO, priority, shortest job next (SJN), fixed-priority pre-emptive scheduling, round-robin scheduling (cycles a fixed time per process), multilevel queue (interactive vs. batch).
- https://en.wikipedia.org/wiki/Work_stealing
- Processor affinity: avoid resuming jobs on different CPUs.
- Memory manager, IPC manager.
- Device I/O or hardware I/O
- Linux represents hardware with block devices with buffered IO unless the raw driver is used.
- FreeBSD uses raw character devices, which provide direct, unbuffered I/O. They argue that caching reorders write operations, making it hard to reliably recover disk data structures or report the specific write operation that encountered a write error.
- Power management
- Sleep mode or suspend to RAM: suspends the CPU, but requires RAM power draw.
- Hibernation or suspend to disk: turns off power completely. Requires a few seconds to turn back on.
- https://en.wikipedia.org/wiki/Wake-on-LAN
- Virtual files
- kernfs handles file operations, directory structures, and attributes
- sysfs uses kernfs to provide device information and writable device attributes mounted at /sys.
- tmpfs uses volatile RAM which discards data on reboot.
- sysctl exposes kernel attributes. Exported by procfs mounted at /proc/sys.
- ioctl syscall handles device-specific operations. Win32 API DeviceIoControl.
- Kernel space for kernel, kernel extensions, and device drivers.
- Windows svchost.exe runs shared service processes for DLLs.
- System call interface between user and kernel
- 1988. POSIX: Portable Operating System Interface
John Hennessy and David Patterson’s Computer Organization and Design
https://news.ycombinator.com/item?id=22116914
https://en.wikipedia.org/wiki/Template:Firmware_and_booting
- https://en.wikipedia.org/wiki/Windows_NT_startup_process
- 1981. BIOS (Basic Input/Output System) firmware stored in ROM. Loads partition table into memory and runs the bootloader. Video BIOS initializes the graphics card.
- Power On Self Test (POST)
- Partition layouts
- Master boot record (MBR)
- Unified Extensible Firmware Interface (UEFI)
- GUID Partition Table (GPT)
- Preboot Execution Environment (PXE): Windows network boot via Trivial File Transfer Protocol (TFTP) over UDP.
Mobile devices
Android
- Hardware Abstraction Layer (HAL) proxies access to device drivers.
https://github.com/seemoo-lab/openhaystack
https://makezine.com/projects/find-my-diy-airtag-tracker/
Samsung Find and Offline Finding protocol
https://www.usenix.org/conference/usenixsecurity24/presentation/yu-tingfeng
Tile
pet GPS trackers
Fleet managment SIM tracks location
SIM + phone that replies to text with location data.
User space
User space or userland for the rest of the OS (application software, UX).
- init daemon bootstraps user space. First process, pid 1.
- Ancestor of all other processes and adopts orphaned processes.
- systemd handles device management, login management, network connection management, and event logging.
- macOS launchd.
- System daemons
- polkitd allows non-privileged processes to communicate with privileged ones
- sshd, smbd (Server Message Block and Samba networking)
- udevd manages device nodes or pseudo-device in /dev.
- /dev/null, /dev/zero, /dev/full, /dev/random
- process standard file streams: /dev/stdin, /dev/stdout, /dev/stderr
- terminal /dev/tty
- process file descriptors /dev/fd/n.
- /dev/bus/usb/: direct USB access
- Virtual Function I/O: Direct hardware access to PCI devices
- Window manager: X window system or X11, Wayland
- Graphics: Mesa, GTK, Qt
- SDL often used for games. Video, audio, input devices, threads, networking
- C standard library: fopen, execv, malloc, memcpy, localtime, pthread_create
- glibc aims to be fast, musl aims to be lightweight, bionic for Android.
Environment variables
- PATH specifies directories to look for executables.
File systems
- Storage virtualization
- logical volume management
- An inode contains file attributes which can be accessed via the stat syscall. Attributes include mode, ino, dev device ID, uid owner, gid, size in bytes, ctim, mtim, atim.
- open, read, dup, write, and close syscalls. splice syscall moves data between a file descriptor and a pipe by remapping pages.
- Extended file attributes (xattr) in user, trusted, security, or system namespace.
- File signatures and FourCC.
- Linux:
- 1984. Network File System (NFS)
- 2001. ext3 journaling filesystem
- 2006. ext4.
- 2005. ZFS
- 2009. Btrfs.
- Windows
- 1977. File Allocation Table (FAT).
- 1993. New Technology File System (NTFS) journaling filesystem.
- Mac
- 1985. HFS: Hierarchical File System.
- 1998. HFS+ journaling filesystem.
- 2017. APFS: Apple File System.
1983. Richard Stallman: GNU Project
https://en.wikipedia.org/wiki/Systems_management
Databases
https://en.wikipedia.org/wiki/Database
Relational database management system (RDMS) based on relational algebra
- https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model#See_also
- Projection picks a subset of the columns.
- Star schema: central fact table with foreign keys into dimension tables with descriptive attributes for query filtering and query result set labeling.
- Foreign key of a fact table points to a key in the dimension table.
- Foreign keys attributization: we should get full objects where we can condition on “other_table.property”.
- Snowflake schema
- Wide table
- Entity–attribute–value model or a narrow table
- A natural key is derived from business data. A surrogate key has no meaning outside the database.
- Page compression can reduce space.
The rowstore data format stores rows in a B-tree. Most common and optimal on real-time table seeks for specific rows.
- online transaction processing (OLTP)
- B-tree key is a 64-bit signed integer rowid. B-tree value is the byte array of all the column values.
- SQLite
- every value has a serial type varint, which is one byte for nulls and numbers
- INTEGER PRIMARY KEY aliases with rowid, so that column is saved as a null value.
- SQL Server
- A clustered index sorts data in the B-tree by PRIMARY KEY. A UNIQUE constraint creates a nonclustered index like a heap, which stores a pointer to the data.
- If there is no unique index, a 4-byte uniqueifier is added automatically.
A columnar data format is optimized for OLAP.
Databases
Parallel processing
Amdahl’s law: latency speedup is limited by the portion that is not parallelizable.
Flynn’s taxonomy on instruction vs. data streams
- SISD is simplest
- SIMD: vector or array processor
- SSE Streaming SIMD Extensions, 128-bit xmm0-15 registers (Pentium III, 1999)
- AVX Advanced Vector Extensions, 256-bit ymm registers (Sandy Bridge, 2011)
- AVX-512: zmm0-31 registers
- MIMD: multiple cores
Dining philosophers
Concurrency control
Consensus and synchronization
- Non-blocking algorithm: failure of one thread
- Wait-free guarantees thread-level progress–each operation completes in a bounded number of steps.
- Lock-free guarantees system-level progress and allows individual threads to starve.
- 1985. FLP impossibility: a deterministic algorithm for achieving consensus is impossible.
- Open consensus group allows a Sybil attack via sockpuppets.
- Paxos consensus protocol.
- Google Chubby distributed lock service and nameserver.
- Raft
- Redis Redlock distributed lock.
Multiprocessing uses multiple CPUs or cores.
- Context switching requires loading the new process control block with registers, stack pointer, program counter, process ID, process state, page tables.
- fork creates a child process which is initially a copy of the process.
- spawn is the DOS/Windows model which is less powerful but doesn’t virtual memory.
- A fork bomb is an infinite loop.
- ulimit is the maximum number of processes that a single user may own.
- cgroups (control groups) isolates resource usage of a collection of processes.
- clone creates threads by sharing the address space and other resources.
- exec overlays an executable file. Replaces the previous executable and destroys its address space, creating a new stack and heap. The process ID (PID) does not change.
- exit ends a process.
- wait suspends the calling process until the child process returns.
- kill command sends a SIGTERM or SIGKILL signal to its target. The signaling process must have the same owner or be a superuser.
- Symmetric multiprocessing (SMP): multiple identical cores.
- DOS Program Segment Prefix contains program state and command line arguments.
Multithreading
- A process can have multiple kernel threads with shared memory address space and file handle resources. Lower IPC overhead and no TLB flush vs. multiprocessing.
- Unix thread control block contains stack pointer, program counter, thread state, register values, and a pointer to its process control block.
- Windows Thread Information Block (TIB) or Thread Environment Block (TEB).
- In Python the GIL prevents more than one thread from running at the same time.
- User threads are scheduled in userspace.
- Green threads are scheduled by a runtime library or virtual machine.
- Cooperative multitasking (async/await, yield) has one thread with an event loop where fibers or tasks must yield control.
- Coroutines like generators at the language level.
Atomic instructions
- Consensus number is the maximum number of processes that can reach wait-free consensus.
- Compare-and-swap (CAS) has infinite consensus number.
- A test-and-set instruction allows a memory-based lock. Consensus number 2.
- Shared registers and locks have consensus number 1. Stacks and queues have consensus number 2.
- https://en.wikipedia.org/wiki/Software_transactional_memory
Distributed computing
- Byzantine failure includes adversarial signals.
- https://en.wikipedia.org/wiki/MapReduce
- Dropbox sync does a 3-way merge. The merge base (synced tree) distinguishes local delete vs. remote add. Synced when there are no local or remote diffs.
- Unique ID per node, so a move is an attribute update and atomic filesystem operation.
- Single control thread, with dedicated threads for network, file I/O, and workers for hashing.
https://en.wikipedia.org/wiki/Template:Parallel_computing
Security
Foundational
Human level
Hardware
Process level
https://en.wikipedia.org/wiki/Improper_input_validation
Authentication, authorization, and accounting (AAA)
- prove identity, grant permissions, and log audit trails.
- Improper privilege management
- An access-control list (ACL) specifies which users, groups, or processes can read, write, or execute a resource.
- Implementations
- /etc/passwd maps usernames to uids. Default world-readable.
- /etc/shadow contains password hashes and is only root readable to prevent hash cracking.
- LDAP (Lightweight Directory Access Protocol) provides directory services such as passwords over IP.
- 1987. Windows NT LAN manager (NTLM) is a weak authentication provider.
- https://en.wikipedia.org/wiki/User_Account_Control
- 1988. Kerberos
- 1991. RADIUS (Remote Authentication Dial-In User Service)
- Identity driven networking (IDN)
- Google BeyondCorp: use fine-grained, user-specific, application-level AAA instead of trusting the network.
- chroot jail prevents the process from accessing files outside a designated directory.
- seccomp allows a process to make a one-way transition into a sandbox which limits syscalls to read() and write() on already-open file descriptors. seccomp-bpf allows filtering syscalls using Berkeley Packet Filter rules. Used to sandbox Chrome rendering.
- File permissions
- Set via chmod, chown, chgrp.
- Special mode flags: setuid and setgid flags allow users to run an executable with the file system permissions of the executable’s owner. For example, ping requires sending control packets on a network interface.
- sticky bit. Normally, directory write permission grants write permissions on all contained files. /tmp usually has the sticky bit so users cannot touch each other’s files.
- su and sudo run programs with the privileges of another user.
- Discretionary Access Control (DAC): based on permissions which users can set. Linux default.
- Mandatory access control (MAC): centralized based on labels
- SELinux (Security-Enhanced Linux)
- AppArmor can restrict per-program network, socket, and file access.
- Role-based access control
- Attribute-based access control
- https://en.wikipedia.org/wiki/Template:Object-capability_security
- https://en.wikipedia.org/wiki/Two_factor_authentication
- Security keys eliminate phishing attacks.
Web security
- Cross-site request forgery (CSRF) tricks a user into submitting a malicious request.
- Example:
- User logs into bank.
- User loads an untrusted site.
- Site does POST bank.com/transfer, and the browser sends bank cookies.
- Origin: header allows sites to check where a request is coming from.
- Same-origin policy restricts malicious websites from reading secure data like emails and bank information from other websites where the user is authenticated.
- An origin is a scheme (protocol), hostname (domain), and port.
- Cross-origin resource sharing (CORS) allows whitelisted requests from websites on other origins.
- A CORS preflight is a CORS OPTIONS /resource/foo request that sends Access-Control-Request-{Method,Headers} and Origin.
- Server should respond with Access-Control-Allow-{Origin,Methods,Headers,Credentials}
- If the server denies
- Form and other simple requests will be sent directly
- There is a carveout for simple requests like form, which will send cookies by default.
- Requests can be made with credentials (cookies and HTTP Authentication). A server must explicitly respond with allow credentials: true.
- https://en.wikipedia.org/wiki/Cross-site_request_forgery
- https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy#cross-origin_network_access
https://en.wikipedia.org/wiki/Security_engineering
Networking
Network topology
OSI seven layer model
- Physical layer
- Data link layer
- Packet forwarding
- Broadcast to all receivers.
- Multicast to a specified group of receivers. Copied only if recipient is in the domain.
- Anycast to any one out of a group of nodes.
- A bridge connects two segments to form a single network. A switch uses packet switching to forward data by MAC address.
- Medium access control (MAC): ARP/MAC address
- A gateway connects multiple networks using more than one protocol.
- local area network (LAN) connects computers with a switch.
- A virtual local area network (VLAN) is a software-defined broadcast domain.
- Logical link control (LLC): frame sync and error checking.
- 802.3 Ethernet
- CSMA/CD: carrier-sense multiple access with collision detection.
- Ethernet bridges only forward well-formed Ethernet packets, isolating collisions and packet errors. Improves performance by associating addresses to network segments, and allows mixing of speeds to allow phased-in upgrades.
- 802.11 Wi-Fi is a wireless LAN. FHSS or DSSS.
- Network layer:
- IP: internet protocol. Connectionless or stateless.
- Network byte order is big endian (the first byte is most significant).
- A router forwards packets between different IP networks using the network address and its routing table or routing policy.
- Border Gateway Protocol (BGP) exchanges routing and reachability information among autonomous systems (AS) on the Internet.
- The default gateway serves as the forwarding host or router when no other route specification matches the destination IP address of a packet.
- Transport layer
- TCP: transmission control protocol provides reliable, ordered streams. Connection-oriented communication.
- UDP: user datagram protocol provides lower latency and higher throughput. Connectionless communication protocol.
- Session layer: HTTP
- TLS (Transport Layer Security) used in HTTPS and SMTPS, replacing SSL.
- Let’s Encrypt provides free HTTPS certificates.
- Presentation layer: HTTP
- Application layer: HTTPS, FTP, SMTP
NAT: network address translation. Router modifies IP headers to map IP addresses.
DHCP: Dynamic Host Configuration Protocol server assigns IP addresses over UDP.
STUN: session traversal utilities for NAT. A STUN server helps clients behind routers form peer-to-peer (P2P) connections by providing their private IP address and port number.
TURN: traversal using relays around NAT
ICE: interactive connectivity establishment selects the best communication option between devices.
Proxy server forwards requests.
Reverse proxy: forwards client requests to multiple web servers: Apache, Nginx, HAProxy, Squid.
https://en.wikipedia.org/wiki/End-to-end_principle
https://en.wikipedia.org/wiki/Regional_Internet_registry
https://en.wikipedia.org/wiki/Template:Internet
nVidia GPUs
CUDA
Threads in the same block execute in the same core and can be synchronized
SXM: Server PCI Express Module is a faster socket.
NVLink allows pooling VRAM between GPUs (only 2 for consumer cards). Version 5 has 2 TB/s bandwidth (compare PCIe Gen 5 at 128 GB/s). Replaces SLI. Cross-fire is the AMD equivalent.
Kepler (2012): 28 nm. K10 K80
- K40: 12B RAM, 4 fp16 TFLOPS. 7B transistors, 235W TDP.
Maxwell (2014): 20 nm.
Pascal (2016) P100: 16GB, 30 TFLOPS. 16 nm, 15B transistors, 610mm2, 300W. NVLink 1 is 20 GB/s bidirectional.
Volta (2018) V100. 32GB, 100 TFLOPS. 12 nm, 21B transistors, 815mm2.
- server only. Turing consumer line has RT cores for real-time raytracing.
- 80 SMs, each with 64 FP32 CUDA cores, 32 FP64 CUDA cores, 8 Tensor cores. Tensor core is 64 fused multiply-add FMA/clock = 128 flop/per clock.
- NVLink 2: 300 GB/s or 6x25 GB/s bidirectional. Cache coherent with CPU.
Ampere (2020) includes A100: 80 GB, 312 TFLOPS, 400 W. 7 nm. $8k.
- NVLink 3 600GB/s, up to 8 linked GPUs.
- 1555 GB/s HBM2e bandwidth
- 3090 24GB. Has NVLink.
Hopper (server) H100 and Lovelace consumer (2022). 1000 TFLOPS, 700W. 4 nm, 80B transistors. 3000 GB/s HBM. NVLink 4 900 GB/s. server only.
- H100 NVL has 188 GB, 2000 TFLOPS.
- SXM version integrates GPU and memory. PCIe version is half-power.
- RTX 4090: 24GB, 512 tensor cores, 16,384 CUDA cores, 76B transistors, 160 TFLOPS, 1008 GB/s, 450W. No NVLink. $2k.
- A6000? No NVLink
- Sparsity doubles performance. Transformer Engines tune fp8 vs. fp16 as needed.
- 132 streaming multiprocessors (SM), each with four quadrants, each with a tensor core and 512 vector registers, each holding 32 4-byte words.
- Tensor Memory Accelerator (TMA) asynchronously fetches subtiles into global and shared memory.
- a warp is 32 threads in one quadrant
- a warp group is 128 threads in a streaming multiprocessor
- warp group matrix multiply accumulate (wgmma) instructions asynchronously launch a matrix multiply from shared memory or registers. They can do other work in parallel.
- memory coalescing: parallel threads running the same instruction access consecutive locations of global memory, allowing optimal use of global memory bandwidth. The parallel requests are coalesced into a single memory access.
- a bank conflict is when the same memory bank is asked to provide multiple different pieces of memory, which leads to requests being serialized.
- swizzle patterns arrange memory to optimize coalescing and avoid conflicts.
- Lovelace consumer line.
Blackwell B100. 192 GB, 2 PFLOPS. 1000W. 2 dies x 104B transistors/die. 4nm.
PTX: Parallel Thread Execution
https://hazyresearch.stanford.edu/blog/2024-05-12-tk
https://en.wikipedia.org/wiki/OpenCL compute API
Programming languages
https://en.wikipedia.org/wiki/Applications_architecture
https://en.wikipedia.org/wiki/Client%E2%80%93server_architecture
- Multitier architecture
- Model-view-controller (MVC)
Imperative programming: carry out tasks in order.
Object-oriented programming: based on instances and classes, which provide encapsulation and inheritance. A class is an interface and a template for creating objects.
Functional programming: functions can be passed as arguments and return values. Natural way to express transformations on data. Minimizes mutable state (pure functions) making it easier to reason about, debug, and rerun code.
- Fold, reduce, accumulate, or catamorphism analyzes a recursive data structure with a combining operation
- 1975. Lisp and Scheme by Guy L. Steele and Gerald Jay Sussman in the Lambda Papers.
Dispatch
Declarative programming like SQL and Prolog
https://en.wikipedia.org/wiki/Domain-specific_language (DSL)
https://en.wikipedia.org/wiki/Template:Programming_paradigms_navbox
https://en.wikipedia.org/wiki/Evaluation_strategy
- Lazy evaluation
- Binding
- Pass by value means copying object data (default)
- Pass by reference: void pbr(std::string& s); // call and use normally
- Pass by pointer
Python and Java: variables are bindings and not pointers. There’s no way to directly modify memory locations and variable reassignment doesn’t affect the underlying object. Function arguments don’t copy so mutations will change the object. “Pass references by value”.
void pass_by_pointer(std::string* s);
std::string s = "a";
pass_by_pointer(&s, n);
s->insert(0, "b"); // or (*s).insert
Garbage collection frees memory used by inaccessible objects
- Methods:
- Reference counting doesn’t handle cycles
- Tracing handles the full accessibility graph
- Stop-the-world mark/sweep: mark live objects then sweep all memory
- Tri-color: gray set contains objects reachable from root, white set other objects, black set are objects reachable from root without pointers to white set objects, can GC white set when gray set is empty
- Conservative GCs assume any bit pattern can be a pointer to an allocated object; precise GCs know all objects exactly
- Generational GC improves performance by collecting the young region more frequently
- GC pauses: stop-the-world, incremental (short pauses), concurrent
https://en.wikipedia.org/wiki/Product-family_engineering
https://en.wikipedia.org/wiki/Component-based_software_engineering
https://en.wikipedia.org/wiki/Anti-pattern
Data
Metadata
Transactional data
Hierarchical data
Unstructured data
Business intelligence (BI)
https://en.wikipedia.org/wiki/Data_integration
https://en.wikipedia.org/wiki/Cluster_analysis
Embedding and clustering
Visualization
- UI elements: line weight, color, style.
- One dimensional distributions.
- Pie chart: percentages.
- Histogram: counts.
- Box plot: summary statistics.
- Violin plot
- Two dimensional
- Categorical vs. continuous
- Categorical vs. categorical variables: two-way frequency tables, mosaic plots, and segmented bar charts
- Euler diagram shows only relevant relationships between sets.
- Venn diagram shows all 2^n relationships between sets.
- Continuous vs. continuous
- Scatter plot
- Line plot
- Contour plot
- A thematic map visualizes geographic patterns.
- A choropleth map uses color to represent some variable over space, such as population density or income.
- Vega https://news.ycombinator.com/item?id=41328749
- Specific
Graphics
Gamut is the set of colors that can be accurately represented by a device.
- The line of purples represents the spectral colors. It is the bottom edge of a chromaticity diagram.
- CIE 1931 color space: spectral sensitivity curves conevrt objective color to tristimulus values in LMS color space. LMS describes long, medium, and short-wavelength human color receptor response.
- sRGB is the default colorspace which is fairly limited. Colors are encoded in the range 0-1. These are gamma-expanded to linear light values, then multiplied by a matrix encoding CRT color primaries that define the gamut.
- YCbCr encodes luminance, blue difference, and red difference.
https://en.wikipedia.org/wiki/Digital_image_processing
- Unsharp masking increases contrast by subtracting a blurred version of the image.
Graphics pipeline
- Vertex shader and vertex pipeline transforms 3D to 2D points.
- transform local global viewport coordinates
- Object coordinates -> model & camera transformation -> camera coordinates
- Hull shader, tesselation, domain shader
- Geometry shader: Lighting
- perspective projection: 3D to 2D, z-buffer
- Window-viewport transform to device coordinates
- Clipping disables rendering outside the view volume (frustum).
- Hidden-surface determination
- backface culling removes polygons facing away from the camera.
- Triangle rasterization
- Texture filtering
- Pixel shading computes local light levels
- Phong shading interpolates surface normals across rasterized polygons.
- Phong reflection models local illumination as diffuse reflection, specular reflection highlights, and ambient light.
- Z-test, alpha blending, anti-aliasing
- https://en.wikipedia.org/wiki/Geometry_pipelines
- https://en.wikipedia.org/wiki/Template:Graphics_Processing_Unit
- https://en.wikipedia.org/wiki/Perspective_(graphical)
Rendering
- Gamma
- Rendering equation: radiance leaving a point = emitted + reflected radiance.
- Ray tracing
- Path tracing is unbiased.
- bidirectional path tracing.
- volumetric path tracing considers the light scattering.
- Metropolis light transport.
- Photon mapping is biased.
- Diffuse reflection
- Radiosity is a finite element method to solve the rendering equation.
- color buffer
- Z-buffer
- stencil buffer limits the area of rendering (stenciling).
- Shadow mapping tests whether a point is visible from a light source using z buffers.
- Shadow volume uses shadow geometry.
Photogrammetry: measurement through photography.
https://en.wikipedia.org/wiki/Photogrammetry
https://en.wikipedia.org/wiki/Collinearity_equation
https://en.wikipedia.org/wiki/Bundle_adjustment
https://en.wikipedia.org/wiki/3D_reconstruction
https://github.com/leandromoreira/digital_video_introduction
Collision detection GJK
https://news.ycombinator.com/item?id=40660761
Web
Languages
- HTML
- CSS
- JavaScript
- TypeScript is a typed language that compiles to JS and adds syntactic sugar.
UX
https://ux.stackexchange.com/questions/456/when-should-i-use-a-select-box-instead-of-radio-buttons
UI elements
Menu bar or Application Bar at the top
Panels such as tools and properties
Rectangle select
1968. The Mother of All Demos by Douglas Engelbart: computer mouse, hypertext, and graphical user interfaces.
https://en.wikipedia.org/wiki/Mode_(user_interface)
https://en.wikipedia.org/wiki/Internet_service_provider
https://en.wikipedia.org/wiki/Session_ID
https://en.wikipedia.org/wiki/HTTP_cookie
Artificial intelligence
- 1950. Claude Shannon builds the Theseus mouse to solve mazes.
- 1952. MIT and Rockefeller Foundation conference on machine translation.
- 1954. Georgetown-IBM experiment: machine translation of 60 Russian sentences to English using six grammar rules and 250 words and stems on punch cards on an IBM 701 mainframe.
- 1956. Dartmouth workshop founds AI. John McCarthy, Marvin Minsky, Julian Bigelow, D.M. Mackay, Ray Solomonoff, Nathaniel Rochester, Claude Shannon.
- Automata theory
- Cybernetics by Norbert Wiener emphasises analogue feedback.
- 1957. Perceptron by Frank Rosenblatt.
- 1969. Perceptrons book by Marvin Minsky and Seymour Papert.
- Perceptron cannot solve the XOR problem.
- 1964. IBM Automatic Language Translator for the US Air Force performs word-for-word translation. Stores 150,000 words as black or clear rectangles on a 16" plastic disk spinning at 2,400 RPM or 30Mbit/s.
- 1966. ALPAC report causes the first AI winter. It argued that machine translation was 20% slower and 30% less understandable.
- 1973. Lighthill report cuts AI funding in Britain.
- 1978. Xcon or R1 expert system for configuring VAX computer systems.
- 1979. Blackboard system: common knowledge base successively refined by many specialist knowledge sources.
- 1982. Japan Fifth Generation Computer Systems emphasizes parallel processing and concurrent logic programming.
- 1983. DARPA Strategic Computing Initiative by the Department of Defense invests $1B in general AI.
- 1987. Jack Schwarz cuts AI funding.
- 2004. DARPA Grand Challenge for autonomous vehicles and 2007 DARPA Urban Challenge.
- https://en.wikipedia.org/wiki/History_of_machine_translation
Information science
https://en.wikipedia.org/wiki/Symbolic_artificial_intelligence
https://en.wikipedia.org/wiki/Structure_mining
https://en.wikipedia.org/wiki/History_of_artificial_intelligence
Decision tree
Search
Planning
Knowledge representation and reasoning (KRR)
https://en.wikipedia.org/wiki/Ontology_(information_science)
- Individual, instance, token, or object
- Class, collection, set, concept, or type
- Attribute, aspect, property, feature, characteristic, or parameter
- An event is a change in attribute
- Attribute-value system
- rows, records, or dependent variables.
- columns, fields, dimensions, or independent variables.
- cell, value, or state.
- RDF
- Relation
- https://en.wikipedia.org/wiki/Hierarchy#Subsumptive_containment_hierarchy
- https://en.wikipedia.org/wiki/Blazegraph triple store
- https://en.wikipedia.org/wiki/Web_Ontology_Language
- https://en.wikipedia.org/wiki/SPARQL query language
- Concept map or mind map
- Argument map and issue map
- https://en.wikipedia.org/wiki/Knowledge_representation_and_reasoning
- https://en.wikipedia.org/wiki/Semantic_network
- https://en.wikipedia.org/wiki/Open_Mind_Common_Sense
- https://en.wikipedia.org/wiki/Knowledge-based_systems
- https://en.wikipedia.org/wiki/Record_linkage
- https://en.wikipedia.org/wiki/MIT_Media_Lab
- https://en.wikipedia.org/wiki/BRMS
- https://en.wikipedia.org/wiki/Cognitive_systems
- https://en.wikipedia.org/wiki/Intelligent_agents
- https://en.wikipedia.org/wiki/OpenCog
- Document management
- Notion supports tables with list and gallery views
- Airtable
- Excel
- OneNote and Evernote support written notes
2012. Wikidata
Ontology or taxonomy
- Ontology is for item typing, by subclass relationships.
- Category relationships are different: map
- Entity schema: instance of human should have certain properties.
- Entity type Q29934218
- class P2308 relation P2309
- Wikidata ontology seems too nested
- entity Q35120 is the ontology root
- observable entity > phenomenon > state Q3505845 > condition Q813912
- mental state > emotion Q9415
- sadness
- happiness Q8 > grief
- collective entity > system
- conceptual system > knowledge system > science
- exact science > formal science: mathematics, logic, statistics, computer science
- Other ontologies: DOLCE, BFO, SUMO, Lemon, RDA
- First items: universe Q1 earth 2 life 3 death 4 human 5
- Africa 15 Canada 16 Japan 17 South America 18 cheating 19 Norway 20
- https://www.wikidata.org/wiki/Wikidata:WikiProject_Ontology
- https://www.wikidata.org/wiki/Wikidata:WikiProject_Ontology/Top-level_ontology_list
Properties
- https://www.wikidata.org/wiki/Help:Properties
- https://www.wikidata.org/wiki/Wikidata:List_of_properties
- title 1476 language 407 image P18 video 10
- father 22 mother 25 spouse 26 child 40 sibling 3373 pet 1429
- DOB 569 sex 21 ethnicity 172 religion 140 citizenship 27 pronoun 6553 political affiliation 102 ideology 1142
- occupation 106 creator 170 notable work 800 role 2878 member 463 participant 1344 military branch 241
- award received 166 depicted by 1299 influenced by 737 social media followers 8687
- languages 1412 birthplace 19 culture 2596
- work location 937
- eye color 1340 blood type 1853 handedness 552 medical condition 1050
- height 2048 mass 2067
- official website 856
- caused by 828 causes 1542
- genre 136
- Meta
- Qualifier: point in time P585 start 580 end 582 nature of statement 5102 deprecated?
- cites work 2860 is most used, with 290M usages.
- instance of 31: person, place, thing, concept
- main Wikimedia category 910
- subclass of 279
- facet of 1269
- usage instruction 2559
- item of this property 1629
- stability of property value 2668
- language code
- https://en.wikipedia.org/wiki/Wikipedia:Wikidata includes #statements
Authority ID properties
- https://en.wikipedia.org/wiki/Wikipedia:Authority_control
- Google Knowledge Graph P2671
- OpenAlex 10283
- Microsoft Academic Graph (shut down) 6366
- ISNI 213
- VIAF 214
- WordNet 8814
- BabelNet 2581
- WorldCat FAST Linked Data 2163
- topics
- Quora 3417
- GitHub 9100
- ScienceDirect 10376
- Semantic Scholar 6611
- Specialized
- Social
- Facebook 2013
- Twitter 6552
- YouTube 2397
- LinkedIn 6634
- Medium 3899
- Gene Ontology 686
- ChemSpider 661
- PubChem 662
- MusicBrainz artist 434, work 435, album 436, label 966, area 982, place 1004
- ORCID 496
- genealogics.org 1819
- Geni.com 2600 (150M records)
- WikiTree 2949 (22M records)
- National authorities
- Library of Congress Control Number LCCN 244
- German National Library GND 227
- Bibliothèque nationale de France 268
- Japan National Diet Library NDL 349
- National Library of Israel 8189
- Norway Authority File NORAF 1015
- Bibliothèque et Archives nationales du Québec 3280
- Encyclopedia Britannica 1417
- Data Category Registry ISOCAT 2263
- Getty Art & Architecture Thesaurus 1014
- Encyclopedia of China 10565
- number of records 4876
Property constraint P2302
- constraint status P2316
- mandatory constraint Q21502408
- suggestion constraint Q62026391
- exception to constraint P2303
- inverse constraint Q21510855
- single-value Q19474404
- one-of Q21510859
- distinct-values Q21502410
- conflicts-with Q21502838
- format Q21502404
- allowed qualifiers Q21510851
- allowed-entity-types Q52004125
- property scope Q53869507
- subject type Q21503250
https://query.wikidata.org
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples
# note that ?p is a wd:, while ?pd is a wdt:
# wdt: does not have readable labels or any properties at all
SELECT ?p ?pLabel ?o ?oLabel {
wd:Q31 ?pd ?o .
?p wikibase:directClaim ?pd .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
LIMIT 40
# directClaim allows removing common filters:
# schema:description, schema:version, schema:dateModified
# FILTER(!STRSTARTS(STR(?property), "http://schema"))
# FILTER(!STRSTARTS(STR(?property), "http://wikiba.se/")) # wikibase:statements
# FILTER (?property NOT IN (rdfs:label, skos:altLabel))
defaultView:BubbleChart
Cloud
Infrastructure as a Service (IaaS) includes virtual private servers (VPS).
Platform as a Service (PaaS)
Software as a Service (SaaS).
Smaller providers: Alibaba Cloud, IBM Cloud, Oracle Cloud, Rackspace, DigitalOcean, Salesforce
Amazon Web Services (AWS)
Largest and most mature.
Compute
- EC2: Elastic Compute Cloud: virtual servers. Auto Scaling groups. Availability Zones (AZ).
- Systems Manager (SSM) Agent provides SSH access without exposing an SSH port.
- AWS Firecracker runs lightweight microVMs.
- Lightsail: simple app deployment with virtual servers, storage, databases, and networking.
- Lambda: run code without managing servers.
Storage
- S3: Simple Storage Service object store. Objects cannot be edited once created.
- EFS: Elastic File System.
- EBS: Elastic Block Store.
- FSX: File Storage for Windows and Linux.
Database
- RDS: Relational Database Service: managed database
- DynamoDB: NoSQL database with under 10 ms latency.
- Aurora: global scale enterprise database. On-demand replica scaling within minutes.
Networking
- Virtual Private Cloud (VPC): isolated virtual networks behind an internet gateway. NAT gateway for outbound traffic from a private subnet.
- Application Load Balancer
- ElastiCache in-memory key-value store, such as user sessions. Memcached or Redis.
- AWS Certificate Manager automatically rotates SSL certificates.
- CloudFront CDN
- Route 53: DNS service.
Analytics
- Redshift: data warehouse service
- Kinesis: streaming data processing
- Athena: SQL query service for data in S3
Management
- CloudTrail: governance, compliance, and operational auditing
- CloudWatch: performance metrics.
- IAM: Identity and Access Management.
- WAF: Web Application Firewall
- Inspector: application vulnerability scanner.
- Secrets Manager rotates application secrets.
- Systems Manager Parameter Store manages shared storage endpoints, database, and cache configs.
CI/CD
- CodeCommit git repos.
- CodeBuild with tests and packaging.
- CodeDeploy zero-downtime blue-green deployments.
- CodePipeline automates build and deploy.
Microsoft Azure
Second-largest.
Vertex AI (ML), AI Platform Notebooks
Compute Engine (VM), Kubernetes Engine (containers), App Engine, Functions (serverless)
Cloud Storage, Cloud SQL, Cloud Spanner (relational), BigQuery warehouse, Cloud Bigtable (NoSQL), Dataflow (Apache Beam processing), Pub/Sub message delivery
Cloud VPN (connect on-prem and cloud), Load Balancing, CDN, DNS
Monitoring, Logging, IAM, Armor (DDoS, etc protection), Key Management Service, Security Command Center
History
First computers
- 200 BC. Antikythera mechanism is the first known analog computer, using bronze epicyclic gears to predict moon precession.
- 1642. Adding machine by Blaise Pascal and in 1623 by Wilhelm Schickard.
- 1694. Stepped reckoner by Leibniz uses a Leibniz wheel.
- 1820. Arithmometer by Thomas de Colmar is the first mass-produced calculating machine.
- 1837. Charles Babbage: Analytical engine is the first general purpose computer.
- 1820. Difference engine mechanical calculator tabulates functions using the method of divided differences.
- 1872. Tide-predicting machine by Sir William Thomson uses harmonic analysis.
- 1881. Tide-predicting machine by William Ferrel in the US.
- 1877. AT&T or “Ma Bell” dominated telecom for 100 years and employed over 1 million people.
- 1881. Bell acquires Western Electric as its main manufacturer. Western is Elisha Gray’s business, its main competitor.
- 1885. AT&T (American Telephone and Telegraph Company) builds the telephone network. In 1899 it absorbs the telephone patent held by Bell (est. 1877).
- 1925. Bell Labs founded.
- 1984. Bell breaks up into 7 Baby Bells or Regional Bell Operating Companies (RBOCs) after the US v. AT&T (1982) antitrust suit.
- 1890. Tabulating machines or unit record equipment like IBM 407 are an early electromechanical computers that processed punched card data using metal gears. By Herman Hollerith, whose company eventually becomes IBM.
- 1904. Vacuum tube by Sir John Ambrose Fleming is the first generation computer. Programmed via front panel switches.
- 1928. Differential analyser by Vannevar Bush solves arbitrary diferential equations.
- 1876. James Thomson describes a general-purpose differential analyzer.
- 1912. Argo Clock battleship fire-control system by Arthur Pollen.
- 1938. Oslo Analyzer is the largest computer for four years.
- 1940. George Stibitz develops the Complex Number Calculator at Bell Labs. First remote computer, over teletype.
- 1944. Harvard Mark I is the first programmable computer, using 3,500 multipole relays. By Howard Aiken.
- Used by John von Neumann for Manhattan Project implosion computations.
- 50’ long drive shaft synchronized the calculating units, which were 8’ tall and 2" deep. 5 tons.
- 60 sets of 24 switches for manual data entry.
- 72 calculating units operating on one 23-digit decimal number each.
- 24-channel punched tape for instructions. Add 0.3 s, multiply 6 s, divide 15 s, and log 60 s.
- 500 miles of wire, 3 million connections, 35,000 contacts, 2,225 counters, 1,464 tentpole switches.
- 1945. ENIAC is the first programmable electronic computer. By John Mauchly and J. Presper Eckert.
- Used by John von Neumann to calculate hydrogen bomb reactions.
- 1947. Association for Computing Machinery (ACM).
- 1947. Point-contact transistor by John Bardeen, Walter Brattain, and William Shockley at Bell Labs. 1956 Nobel Prize. Second generation of computer and assembly programming.
- 1948. Curta is a portable mechanical calculator nicknamed the pepper grinder or math grenade.
- 1949. Magnetic-core memory.
- 1951. UNIVAC I helps CBS predict an Eisenhower landslide. Developed for businesses by the inventors of the ENIAC.
- 1957. Optical amplifier and laser by Gordon Gould.
- 1957. Casio all-electric pocket calculator.
- 1958. IBM 7000 series mainframe computer and 1400 series midrange computer.
- 1959. Grace Hopper invents COBOL and English words for programming.
Mainframes
- 1959. Integrated circuits by Robert Noyce at Fairchild Semiconductor. Third generation of computer.
- 1962. LINC is a 12-bit minicomputer.
- 1964. MOS SRAM. 1966 DRAM. 1987 commercial NAND flash memory by Toshiba.
- 1964. Multics is an influential time-sharing operating system for mainframe computers.
- 1965. PDP-8 12-bit computer. 50,000 units sold.
- 1965. IBM System/360 mainframe.
- 1968. Go To Statement Considered Harmful by Edsger Dijkstra pushes for structured programming.
- 1969. Multics is an influential time-sharing operating system.
- 1969. ARPANET.
- 1970. PDP-11, 16-bit computer.
- 1970. Warez scene of software cracking on bulletin board systems (BBS). Release groups upload to topsite FTP servers. The pre release is the first, while all later releases for the same material are duplicates or dupes. .nfo files describe the content and the group.
- 1973. C and Unix by Ken Thompson and Dennis Ritchie. 1983 Turing Award.
- Brian Kernighan and Dennis Ritchie write the The C Programming Language book.
- Unix pipelines proposed by Douglas McIlroy.
- 1973. Xerox PARC introduces the Alto computer with GUI, mouse, WYSIWYG editor, and multitasking.
- 1977. VAX (Virtual Address eXtension) ISA.
- 1980. CompuServe popularizes online message forums and games.
- 1980. Demoscene.
- 1982. NEC (Nippon Electric Company) PC-98 32-bit computers dominate Japan.
- 1983. MIT Project Athena creates the X window system and Kerberos.
- 1983. GNU Project founded by Richard Stallman.
- 1985. C++ by Bjarne Stroustrup.
- 1991. Linux kernel by Linus Torvalds.
- 1992. Plan 9 from Bell Labs by Rob Pike and Ken Thompson
- 1994. The UNIX-HATERS Handbook republishes an essay that the “Worse is better” Berkeley west coast approach is more successful than the Right Thing or MIT east coast approach.
- 1982. Alan Perlis, Epigrams on Programming
- 54: Beware of the Turing tar-pit in which everything is possible but nothing of interest is easy.
- 1993. Code Complete by Steve McConnell
- 1999. The Cathedral and the Bazaar by Eric S. Raymond
- Linus’s law: given enough eyeballs, all bugs are shallow
Internet
- 1989. World Wide Web by Tim Berners-Lee at CERN via dial-up.
- 1990. EFnet (“Eris Free Net”) replaces A-net (Anarchy Net) which had a spammy server at eris.berkeley.edu. Undernet forks off EFnet in 1992, and DALnet forks off Undernet in 1994. European IRCnet forks off a EFnet in 1996.
- 1993. Mosaic web browser with inline images by Marc Andreessen.
- 1994. Netscape founded by Marc Andreessen. Jamie Zawinski develops the Netscape Navigator and later runs the DNA Lounge nightclub in SF. Brendan Eich creates JavaScript.
- 1995. Java applets. Newgrounds.
- 1997. AOL or America Online provides dial-up internet and pioneers AOL Instant Messanger (AIM). AOL buys ICQ, the first instant messenger. Warner buys AOL in 2001 and Verizon buys AOL in 2015, merging it with Yahoo.
- 1998. PayPal founded by Peter Thiel, Max Levchin. It merges with x.com founded by Elon Musk and is acquired by eBay (26B).
- 1999. Something Awful forums and Albino Blacksheep.
- 2001. Internet Explorer wins the browser wars as the default browser on with Windows. Netscape releases Mozilla Firefox. Microsoft antitrust lawsuit allows PC manufacturers to sell non-Microsoft software.
- YTMND (You’re the Man Now, Dog) shares fad web pages with looping sound.
- 2001. BitTorrent peer-to-peer file sharing. Uses a distributed hash table.
- 2002. Tor anonymous onion routing project.
- 2003. Gaia Online anime forums.
- 2004. WHATWG develops the HTML and DOM standard.
- 2005 iGoogle is a personal start page with Ajax widgets. Ends in 2013.
- 2009. https://en.wikipedia.org/wiki/Omegle
- 2010. https://en.wikipedia.org/wiki/Chatroulette
Misc
Recover Python code
https://news.ycombinator.com/item?id=13847465
C# GUI programming: https://www.unterminated.com/random-fun/how-to-copy-a-file-from-a-30-year-old-laptop
github.com/maybe-finance/maybe shows bank accounts and credit cards.
https://en.wikipedia.org/wiki/Information_retrieval
GoogleContentWarehouse client publishes ranking features.
- badClicks, goodClicks, lastLongestClicks and unsquashedClicks
- Change history
- Twiddlers
- isElectionAuthority and isCovidLocalAuthority.
- smallPersonalSite
- bylineDate, syntacticDate, semanticDate
- titlematchScore
- RegistrationInfo
https://news.ycombinator.com/item?id=40505708
GQL: Graph Query Language
https://hal.science/hal-04094449
https://arxiv.org/abs/2112.06217
https://en.wikipedia.org/wiki/Template:Software_development_process
https://en.wikipedia.org/wiki/Scheduling_(computing)#See_also
https://en.wikipedia.org/wiki/Template:C_standard_library
https://en.wikipedia.org/wiki/Template:Memory_management
https://en.wikipedia.org/wiki/Template:Internet_protocol_suite
https://en.wikipedia.org/wiki/Template:Linux_kernel
https://en.wikipedia.org/wiki/Template:Unix_commands
https://en.wikipedia.org/wiki/List_of_Microsoft_Windows_components
https://en.wikipedia.org/wiki/Template:Processor_technologies
https://en.wikipedia.org/wiki/Template:Computer_laws
https://en.wikipedia.org/wiki/Template:Design_patterns
https://en.wikipedia.org/wiki/Bell_Labs