Skip to content

Instantly share code, notes, and snippets.

@fanfeilong
Last active March 12, 2020 17:10
Show Gist options
  • Save fanfeilong/9974180 to your computer and use it in GitHub Desktop.
Save fanfeilong/9974180 to your computer and use it in GitHub Desktop.
具体编程语言重要概念笔记

编程语言笔记

scheme笔记


语言基础
  • atom: null?, zero?, number?, quote?
  • list: car,cdr,cons
  • control: cond else, let, define
  • s expression = atom or list
  • lambda expression
重要概念
  • recursion
  • tail recursion
  • continuation
  • call with current continuation
  • continuation passing style (CPS)
  • CPS transform
  • lazy calculation
  • collection
  • closure
  • coroutine
  • lambda recursion
  • Y combinator
  • close variable, free variable

c 语言


c++ 语言


  • basic
  • class
    • default functions
    • function override
    • const
  • template
    • function template
    • class template
    • specification
    • partial specification
    • concept lite
    • template template
    • variant template arguments
    • trait
  • stl
    • functor
    • allocator
    • container
    • algorithm
    • iterator
    • adaptor
  • c++ 98/03/11/14
  • More C++ Idioms

csharp 笔记


核心概念
  • lambda expressions, delegate, event, Action<T> and Func<T>
  • anonymous types
  • closure
  • extension methods, extension attributes
  • implicity typed local variable
  • object and collection initializer
  • query expressions
  • expression tree
  • linq to object,linq to xml, linq to sql, linq to anything
  • LINQ
  • Task<T>, async/await
  • IEnumrable<T>, IComparable<T>, IQueyable<T>, IObserverable<T>
  • dynamic

Java


Basic

Haskell

ErLang

  • 轻量多进程是一种分布式开发的可行构架,强隔离性。
  • 基于消息传递的分布式消息传递机制,一定程度上解决了对低粒度可见锁的依赖,但逻辑上的“锁”不属于他要解决的问题
  • 多进程间的并发执行,单进程内的顺序执行。多进程设定在分布式机器上,带来伸缩性;单进程的顺序性带来开发的便利性。
  • 基于监督树的速错机制(And和Or节点),多层次定义一个系统的错误粒度,如果干复杂的事出错,就干点简单的事
  • 基于Behavior的模块化插件开发方式。将软件开发的不同抽象层分离出来,核心人员写Behavior,应用开发写具体的实现。
  • 提出“乖函数”的概念。
  • 利用尾递归技术,实现一定程度的热更新。

Program Practice

  • Values
    • Communication(Readable)
    • Simplicity
    • Flexibility
  • Principles
    • Local Consquences
    • Minimize Repetition
    • Logic and Data Together
    • Symmetry
    • Declarative Expression
    • Rate of Change
  • Cost of software
    • Cost of Develop
    • Cose of Maintain
      • Cost of Understand
      • Cost of Change
      • Cost of Test
      • Cost of Deploy

Y Combinator 简析


#####什么是Combinator

  1. 首先必须是个lambda表达式
  2. 其次所有的变量都必须是lambda表达式的参数,也就是没有自由变量

例子

(lambda (x) x) //是
(lambda (x) y) //不是,有自由变量y
(lambda (x) (lambda (y) x)) //是
(lambda (x) (lambda (y) (x y))) //是
(x (lambda (y) y)) //不是,非lambda表达式
((lambda (x) x) y) //不是,非lambda表达式

#####如何用lambda做递归

  1. 传入自己(待递归的lambda表达式)的拷贝。
  2. 延迟(传入自己拷贝后的lambda表达式在调用自己拷贝处的)求值,以免整个lambda还没被使用就死递归在无限求值上。

所以这样得到的lambda递归会具有如下形式

(define (part-factorial self)
    ((lambda (f)
       (lambda (n)
         (if (= n 0)
             1
             (* n (f (- n 1)))))) //此处f是(self,self),如果不延迟求值,会无限递归
     (self self)))

(define factorial (part-factorial part-factorial))

#####剥离Y Combinator,以便复用

(define (Y f)
    ((lambda (x) (x x))       //这个是对下面那个lamba的(x x)
     (lambda (x) (f (x x))))) //这个是对目标lambda的(x x)

这种剥离只是为了复用,如果你不想复用,每次都像上面的factorial一样写那个结构也可以,理解这点才是重点。这就跟C语言的宏一样,很多用宏做的东西,你展开后看很简单,用宏只是在理解的基础上的复用。学习算法和数据结构也是一样,泛型容器和算法只是在你理解基础上的复用,但你要会手写才真的知道其中关键点。复用只是为了减少代码量,手写结构,并理解才是基础。

#####Y Combinator可以写成更直观:

(define (Y f)
    ((lambda (y) (y y))       //这个是对下面那个lamba的(y y)
     (lambda (x) (f (x x))))) //这个是对目标lambda的(x x)

全部都是x的那个版本,只是为了装代数符号的逼格而把上面的小y也写成x绕晕你。

#####Y Combinator刚好符合函数不动点性质

(Y f) = (f (Y f))

Monad


描述

In pure functional programming languages you aren't allowed to cause side effects. The only way a function can 'interact' with the outside world is to return something. So if you have a function that needs to return a number, but has a certain side effect, the only way to do this is to return the number and return some data 'containing' the side effect. But that's a pain in the ass - you have to write all of your code to return multiple values, and you have to write plumbing code to pass these side-effect containing values around. What monads can do is give a uniform API to side-effect containing data so you can just concentrate on the number that you want to return, and have the side-effect carried along automatically in the background. What's neat about monads is the same API can be used for many different problems, not just handling side-effects.

探讨

基于集合论的结构:群、环、域、模、拓扑空间、算子、单子等。唯一目的就是寻找这样的集合,其元素在给定几条公理化操作下,保持某些不变的性质。这样的集合结构上可以建立公理、推导引理、推导定理,使得只要是这种结构的集合,就可以使用这些公理、引理、定理证明其拥有怎样的性质和结论。

单子,正是这样一种结构,只是其刚好被这帮搞lisp的人用在了以函数为元素的集合上,并且这样的一种结构刚好能用解决函数级别的抽象封装问题。想清楚这点,就不至于陷入细节而不知其所以然。

参考资料


  1. Monads in C
  2. Monads in C++
  3. Introduction to Category Theory
  4. [Monoids, Functors, Applicatives, and Monads: 10 Main Ideas] (http://monadmadness.wordpress.com/2015/01/02/monoids-functors-applicatives-and-monads-10-main-ideas/)
  5. a page about call/cc
  6. stackoverflow:what is call/cc
  7. douban:call/cc
  8. wesdyer:cps.net
  9. schemewiki:call/cc
  10. Introduction to Y Combinator
@fanfeilong
Copy link
Author

关于特性的思考

  • 语言特性并不是越多越好
  • 并不是掌握语言特性多,或者知道细节多就牛逼,和牛逼没关系
  • 只是如果要经常用,不得不做这些索引,以便快速查询,免得浪费人生,仅此而已

@fanfeilong
Copy link
Author

C++98的问题
C++11虽然出来了,但国内很多公司都不允许使用,从实际情况来看,似乎能掌握好C++11标准的新手程序员也是参差不齐。只用C++98的问题在于,需要自己关心的细节太多。最显著的是静态函数和成员函数之间的不能自如切换,其次是闭包的缺失,智能指针就更不说了。很多人说用boost库,boost库那一坨代码看着就想吐,根本没有让人用到欲望,还不如手工做一些简洁的封装。

@fanfeilong
Copy link
Author

关于私有继承的第一个规则正如你现在所看到的:和公有继承相反,如果两个类之间的继承关系为私有,编译器一般不会将派生类对象(如Student)转换成基类对象(如Person)。这就是上面的代码中为对象s调用dance会失败的原因。第二个规则是,从私有基类继承而来的成员都成为了派生类的私有成员,即使它们在>基类中是保护或公有成员。行为特征就这些。

这为我们引出了私有继承的含义:私有继承意味着 "用...来实现"。如果使类D私有继承于类B,这样做是因为你想利用类B中已经存在的某些代码,而不是因为类型B的对象和类型D的对象之间有什么概念上的关系。因而,私有继承纯粹是一种实现技术。

评价
私有继承简直是鼓励错误的代码设计,废品。

@fanfeilong
Copy link
Author

Well, the compiler is pretty explicit: class1 is a class template, so it needs template parameters. You don't have any here:

typedef typename class2<typename class1::Type1> Type3;
          //                           ^ here!

You need something of the form

typedef class2<typename class1<T>::Type1> Type3;

where T is probably T2. Note there is no need for the first typename.
As in your previous question,the typedefs should be public.

nested-template-typedef-type-definition

@fanfeilong
Copy link
Author

如何在构造函数里初始化C++数组成员

参考initializing-a-member-array-in-constructor-initializer

@fanfeilong
Copy link
Author

C/C++里,函数返回固定大小的数组的做法?

  • 通过固定大小数组参数返回:
void GetInternalIp(uint8_t internalIp[4]){
    for (int i=0;i<4;i++){
        internalIp[i] = m_internalIp[i];
    }
}
  • 通过把固定大小数组放在结构体里,函数直接返回结构体
struct result{
    uint8_t ip[4];
}
struct result GetInternalIp(){
    struct result r;
    for (int i=0;i<4;i++){
        r.ip[i] = m_internalIp[i];
    }
    return r;
}
  • 指针参数+长度参数
void GetInternalIp(uint8_t* pArray,size_t arrayLength,size_t* pWritedLength)

@fanfeilong
Copy link
Author

C++里返回string的最佳做法是什么?

参考:What is the best way to return string in C++?

It's a good question and the fact that you're asking it shows that you're paying attention to your code. However, the good news is that in this particular case, there's an easy way out.
The first, clean method is the correct way of doing it. The compiler will eliminate unnecessary copies, in most cases (usually where it makes sense).

@fanfeilong
Copy link
Author

C++如何校验一个模板类是某个类的子类

Following an example from Stroustrup:

template<class Test, class Base>
struct AssertSameOrDerivedFrom {
  AssertSameOrDerivedFrom() { &constraints; }
public:
  static void constraints() {
    Test *pd = 0;
    Base *pb = pd;
  }
};
template<class T>
struct YourClass {
  YourClass() {
    AssertSameOrDerivedFrom<T, CBaseClass>();
  }
};

In C++0x, this becomes:

template<class T>
struct YourClass {
  static_assert(std::is_base_of<CBaseClass, T>::value);
};

@fanfeilong
Copy link
Author

@fanfeilong
Copy link
Author

常量以k开头命名的原因

"c" was the tag for type "char", so it couldn't also be used for "const"; so "k" was chosen, since that's the first letter of "konstant" in German, and is widely used for constants in mathematics.

http://stackoverflow.com/questions/5016622/where-does-the-k-prefix-for-constants-come-from

It's a historical oddity, still common practice among teams who like to blindly apply coding standards that they don't understand.

Long ago, most commercial programming languages were weakly typed; automatic type checking, which we take for granted now, was still mostly an academic topic. This meant that is was easy to write code with category errors; it would compile and run, but go wrong in ways that were hard to diagnose. To reduce these errors, a chap called Simonyi suggested that you begin each variable name with a tag to indicate its (conceptual) type, making it easier to spot when they were misused. Since he was Hungarian, the practise became known as "Hungarian notation".

Some time later, as typed languages (particularly C) became more popular, some idiots heard that this was a good idea, but didn't understand its purpose. They proposed adding redundant tags to each variable, to indicate its declared type. The only use for them is to make it easier to check the type of a variable; unless someone has changed the type and forgotten to update the type, in which case they are actively harmful.

The second (useless) form was easier to describe and enforce, so it was blindly adopted by many, many teams; decades later, you still see it used, and even advocated, from time to time.

"c" was the tag for type "char", so it couldn't also be used for "const"; so "k" was chosen, since that's the first letter of "konstant" in German, and is widely used for constants in mathematics.

@fanfeilong
Copy link
Author

匈牙利命名法

评价:

  • 匈牙利命名法这种编码方式并不见得就带来更好的阅读信息。以Google的Chromium为例,其命名风格就完全抛弃了匈牙利命名法。至今还在用匈牙利命名法风格的,都是品味和惯性问题。实际上很多工程上的老古董都是不思上进的结果,那些被这些老古董虐待过的,并以此为荣,觉得懂这些老古董才是高级货的,都是没见过世面的,以为非此路不可。可惜了,思维的局限限制了进步。

@fanfeilong
Copy link
Author

C的数组下标是指针偏移的缩写

With C arrays, why is it the case that a[5] == 5[a] ?
The C standard defines the [] operator as follows:
a[b] == *(a + b)
Therefore a[5] will evaluate to:
*(a + 5)
and 5[a] will evaluate to
*(5 + a)
and from elementary school math we know those are equal.

This is the direct artifact of arrays behaving as pointers, "a" is a memory address. "a[5]" is the value that's 5 elements further from "a". The address of this element is "a + 5". This is equal to offset "a" from "5" elements at the beginning of the address space (5 + a).

评价
指针计算是最基本的计算,其他的各种操作都是建立在指针计算的基础上重新定义的计算,这样就有了可推导的性质。

@fanfeilong
Copy link
Author

位操作

How do you set, clear and toggle a single bit in C/C++?

& 
0 0 0
1 0 0
0 1 0
1 1 1
|
0 0 0
0 1 1
1 0 1
1 1 1
^
0 0 0
0 1 1
1 0 1
1 1 0

=>

Setting a bit:
number|=(1<<n)

Clearing a bit:
number&=~(1<<n)

Toggling a bit:
number^=1<<n

Checking a bit:
bit=number&(1<<n)

评价:
高级编程语言完全可以把对位的操作封装成普通函数操作,比如

bit.set(number,i)
bit.clear(number,i)
bit.toggle(number,i)
bit.get(number,i)

然后通过编译器或者解释器的优化来达到同样的性能。这样就可以去掉这样的语法噪音
发明&,|,^完全是增加语法噪音,估计是被数学的符号发明症所影响,问题是数学的符号一般
都只是用来做逻辑上的推理和解释,并不需要被天天写和运行。

@fanfeilong
Copy link
Author

多个提前返回时的资源释放

xxx x = new xxx();
bool quit=false;
do{
  if(...){
    quit=true;
    break;  
  }

  if(...){
    quit=true;
    break;  
  }
}while(false);
if(quit){
  delete x;
  return;
}
//other code

@fanfeilong
Copy link
Author

转轴

很多时候,一个代码有好多个分支,写代码的时候需要使用if else if esle if else之类的繁琐的分支语句,我们可以通过Enum+Translate+SwitchCase的方式让代码更扁平化点,我称之为转轴

typedef enum tagXXXState{
  XXXState_1,
  XXXState_2
}XXXState;
XXXState TranslateState(condition){
  if(condition){
    return XXXState_1;
  }else{
    return XXXState_2;
  }
}
void Func(condition){
  switch(TranslateState(condition)){
    case XXXState_1:break;
    case XXXState_2:break;
    default:break;
  }
}

此处实例的分支只有2个,也没有嵌套,自然是看上去多余的,不过如果分支很多,又有很多个嵌套,则使用这种转轴代码会把嵌套的if else转换成扁平的switch-case代码。

@fanfeilong
Copy link
Author

明确加锁范围

加锁的范围要明确,利用C++的RAII,使用自动锁,同时最小化局部范围

{ 
  AutoThreadLock lock(m_lock); 
  //Data
}

@fanfeilong
Copy link
Author

内存对象的大小一律用size_t

内存对象的大小,位置,全部用size_t表示

@fanfeilong
Copy link
Author

malloc, calloc, realloc and free

函数原型

维基百科-C Dynamic memmory allocation

void *malloc(size_t size);
void * calloc(size_t number, size_t size);
void * realloc(void *ptr, size_t size);
void free(void *ptr);
DESCRIPTION
  • The malloc() function allocates size bytes of uninitialized memory. The allocated space is suitably aligned (after possible pointer coercion) for storage of any type of object.
  • The calloc() function allocates space for number objects, each size bytes in length. The result is identical to calling malloc() with an argument of ``number * size'', with the exception that the allocated memory is explicitly initialized to zero bytes.
  • The realloc() function changes the size of the previously allocated memory referenced by ptr to size bytes. The contents of the memory are unchanged up to the lesser of the new and old sizes. If the new size is larger, the contents of the newly allocated portion of the memory are undefined. Upon success, the memory referenced by ptr is freed and a pointer to the newly allocated memory is returned. Note that realloc() and reallocf() may move the memory allocation, resulting in a different return value than ptr. If ptr is NULL, the realloc() function behaves identically to malloc() for the specified size.
  • The free() function causes the allocated memory referenced by ptr to be made available for future allocations. If ptr is NULL, no action occurs.
内存分配原理

猛击这个页面

free(NULL)是合法的

Yes, it's safe. The language reference for the c language (and, I believe, c++ language by extension) indicate that no action will be taken.
This is OK in the sense that you can safely (re)free a pointer variable without worrying that it's already been deleted.
That's the language taking care of things just so you don't have to do stuff like
if (x) free(x);
You can just free(x) without worry.

malloc(-1)

此时,由于malloc的参数是size_t,所以-1被转成SIZE_MAX..

malloc的时候,core dump

Note that a crash in malloc or free is usually indicative of a bug which has previously corrupted the heap.

malloc超过128k会发生什么

稍微大一点(通常是超过128k)的内存是通过mmap来分配的,小的内存是在heap上通过brk分配的
1、brk是将数据段(.data)的最高地址指针_edata往高地址推;
2、mmap是在进程的虚拟地址空间中(堆和栈中间,称为文件映射区域的地方)找一块空闲的虚拟内存。

MALLOC_CHECK_环境变量

Recent versions of Linux libc (later than 5.4.23) and GNU libc (2.x)
include a malloc implementation which is tunable via environment vari-
ables. When MALLOC_CHECK_ is set, a special (less efficient) implemen-
tation is used which is designed to be tolerant against simple errors,
such as double calls of free() with the same argument, or overruns of a
single byte (off-by-one bugs). Not all such errors can be protected
against, however, and memory leaks can result. If MALLOC_CHECK_ is set
to 0, any detected heap corruption is silently ignored; if set to 1, a
diagnostic is printed on stderr; if set to 2, abort() is called immedi-
ately. This can be useful because otherwise a crash may happen much
later, and the true cause for the problem is then very hard to track
down.

malloc & HeapAlloc

  • malloc-vs-heapalloc
  • malloc依赖于CRT,是C标准。HeapAlloc是Windows API,Windows其他语言亦可使用。
  • malloc在Windows上可能基于HeapAlloc,在Linux下可能基于sbrk和mmap
  • Win16时代,有GlobleAlloc,LocalAlloc
  • Win32时代,有HeapAlloc,可以指定在哪个Heap上分配内存,每个进程有一个default Heap,而malloc不能指定堆
  • malloc是模块可见度,在一个模块内分配的只能在同一个模块内free

VisualAlloc

  • VisualAlloc可以用来声明对某块虚拟内存的占用,或者提交
  • 被提交过的虚拟内存可以再次被提交

Comparing Memory Allocation Methods

The following is a brief comparison of the various memory allocation methods:

  • CoTaskMemAlloc
  • GlobalAlloc
  • HeapAlloc
  • LocalAlloc
  • malloc
  • new
  • VirtualAlloc

Although the GlobalAlloc, LocalAlloc, and HeapAlloc functions ultimately allocate memory from the same heap, each provides a slightly different set of functionality. For example, HeapAlloc can be instructed to raise an exception if memory could not be allocated, a capability not available with LocalAlloc. LocalAlloc supports allocation of handles which permit the underlying memory to be moved by a reallocation without changing the handle value, a capability not available with HeapAlloc.
Starting with 32-bit Windows, GlobalAlloc and LocalAlloc are implemented as wrapper functions that call HeapAlloc using a handle to the process's default heap. Therefore, GlobalAlloc and LocalAlloc have greater overhead than HeapAlloc.
Because the different heap allocators provide distinctive functionality by using different mechanisms, you must free memory with the correct function. For example, memory allocated with HeapAlloc must be freed with HeapFree and not LocalFree or GlobalFree. Memory allocated with GlobalAlloc or LocalAlloc must be queried, validated, and released with the corresponding global or local function.
The VirtualAlloc function allows you to specify additional options for memory allocation. However, its allocations use a page granularity, so using VirtualAlloc can result in higher memory usage.
The malloc function has the disadvantage of being run-time dependent. The new operator has the disadvantage of being compiler dependent and language dependent.
The CoTaskMemAlloc function has the advantage of working well in either C, C++, or Visual Basic. It is also the only way to share memory in a COM-based application, since MIDL uses CoTaskMemAlloc and CoTaskMemFree to marshal memory.

COM内存管理规则

  • The lifetime management of pointers to interfaces is always accomplished through the AddRef and Release methods found on every COM interface. (See "Reference-Counting Rules" below.)
  • The following rules apply to parameters to interface member functions, including the return value, that are not passed "by-value":
    • For in parameters, the caller should allocate and free the memory.
    • The out parameters must be allocated by the callee and freed by the caller using the standard COM memory allocator.
    • The in-out parameters are initially allocated by the caller, then freed and re-allocated by the callee if necessary. As with out parameters, the caller is responsible for freeing the final returned value. The standard COM memory allocator must be used.
  • If a function returns a failure code, then in general the caller has no way to clean up the out or in-out parameters. This leads to a few additional rules:
    • In error returns, out parameters must always be reliably set to a value that will be cleaned up without any action on the caller's part.
    • Further, it is the case that all out pointer parameters (including pointer members of a caller-allocate callee-fill structure) must explicitly be set to NULL. The most straightforward way to ensure this is (in part) to set these values to NULL on function entry.
    • In error returns, all in-out parameters must either be left alone by the callee (and thus remaining at the value to which it was initialized by the caller; if the caller didn't initialize it, then it's an out parameter, not an in-out parameter) or be explicitly set as in the out parameter error return case.

COM引用计数规则

  • Rule 1: AddRef must be called for every new copy of an interface pointer, and Release called for every destruction of an interface pointer, except where subsequent rules explicitly permit otherwise.
    The following rules call out common nonexceptions to Rule 1.
    • Rule 1a: In-out-parameters to functions. The caller must AddRef the actual parameter, since it will be Released by the callee when the out-value is stored on top of it.
    • Rule 1b: Fetching a global variable. The local copy of the interface pointer fetched from an existing copy of the pointer in a global variable must be independently reference counted, because called functions might destroy the copy in the global while the local copy is still alive.
    • Rule 1c: New pointers synthesized out of "thin air." A function that synthesizes an interface pointer using special internal knowledge, rather than obtaining it from some other source, must do an initial AddRef on the newly synthesized pointer. Important examples of such routines include instance creation routines, implementations of IUnknown::QueryInterface, and so on.
    • Rule 1d: Returning a copy of an internally stored pointer. After the pointer has been returned, the callee has no idea how its lifetime relates to that of the internally stored copy of the pointer. Thus, the callee must call AddRef on the pointer copy before returning it.
  • Rule 2: Special knowledge on the part of a piece of code of the relationships of the beginnings and the endings of the lifetimes of two or more copies of an interface pointer can allow AddRef/Release pairs to be omitted.
    • From a COM client's perspective, reference-counting is always a per-interface concept. Clients should never assume that an object uses the same reference count for all interfaces.
    • The return values of AddRef and Release should not be relied upon, and should be used only for debugging purposes.
    • Pointer stability; see details in the OLE Help file under "Reference-Counting Rules," subsection "Stabilizing the this Pointer and Keeping it Valid."

@fanfeilong
Copy link
Author

MessageLoop

@fanfeilong
Copy link
Author

AsyncIO

  • AsyncIOOperation
    • Cancel
  • AsyncIOOperationManager
  • IOObject
    • AsyncRead,Return Operation to Cancel
    • AsyncWrite,Return Operation to Cancel

@fanfeilong
Copy link
Author

Lock & Reentry

  • Lock
    • Lock, check threadid, enable or disable reentry
    • Unlock, check threadid, enable or disable reentry

@fanfeilong
Copy link
Author

StateMachine

  • define states
  • function dowork by switch states
  • in branch methods or events
    • check state
    • do work
    • change state
    • reentry dowork function

@fanfeilong
Copy link
Author

RefBase

  • MultiThread
    • AddRef with atomic increase
    • Release with atomic decease
      check reference count to delete
  • SingleThread
    • AddRef with normal increase
    • Release with normal decease
      check refcount to delete

@fanfeilong
Copy link
Author

AsyncEvent

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment