Sunny's Tech Blog

과제4에 대한 수업

Recursive structures

스크린샷 2023-06-10 오후 3.39.44.png

ML에서는 exp으로 recursive tructure을 선언해서 abstract tree를 만들었었음.
eval 함수는 Expression을 인자로 받아서 integer로 리턴했었음

→ 이걸 과제에 힌트가 되도록 조금씩 바꿔볼거임

Change how we do this

exp를 받아서 int를 리턴했었는데 지금은 exp → exp로 해보자
- general propose에는 multiple kind of result가 있을 수 있음. int, clousre, bool 등등 타입을 eval에서도 받을 수 있게 expression type으로 받을 수있도록 하자!
- any exp가 아니라 value expression을 받아서 더이상 처리가 되지 않은 타입으로 리턴

//old_eval.cc 
//+ string함수랑 is(Functions for check variants.)함수 추가됨, toString함수도 추가됨
// eval: expr -> int
**int** eval(Expr e) {
    return std::visit(overload {
        [&](Const& i) { return i.val;},
        [&](box<struct Add>& a) {
          int i1 = eval(a->e1);
          int i2 = eval(a->e2);
          return i1+i2;
        },
        [&](box<struct Multiply>& m) {
          int e1 = eval(m->e1);
	        int e2 = eval(m->e2);
					return e1*e2;
        }, 
      }, e);

// eval: expr -> expr
Expr eval(Expr e) {
    return std::visit(overload {
        [&](Const& i) { return e;},
        [&](box<struct Add>& a) {
          Expr e1 = eval(a->e1);
          Expr e2 = eval(a->e2);
          if (is<Const>(e1) && is<Const>(e2)) { // 무조건 constant여야 하기 때문에 is로 체크한 후에 get해서 가져옴
            Const i1 = std::get<Const>(e1);
            Const i2 = std::get<Const>(e2);
            Expr res(Const(i1.val+i2.val));
            return res;
          } else {
            throw std::runtime_error("Unexpected types for sub-expressions of Add");
          }
        },
        [&](box<struct Multiply>& m) {
          Expr e1 = eval(m->e1);
          Expr e2 = eval(m->e2);
          if (is<Const>(e1) && is<Const>(e2)) {
            Const i1 = std::get<Const>(e1);
            Const i2 = std::get<Const>(e2);
            Expr res(Const(i1.val*i2.val));
            return res;
          } else {
            throw std::runtime_error("Unexpected types for sub-expressions of Multiply");
          }
        }, 
      }, e);
}

new_eval
- base case시 return entire expression ex) Const(17)
- recursive case
  - check variant: is함수로 체크
  - extract data: get으로 정확한 타입으로 가져옴
  - return Expr: Expr을 만들어서 리턴

Using new eval

Expr e = Add(Const(-3), Const(2));
Expr res = eval(e);
Const ires = std::get<Const>(res);
std::cout << "Add(Const(-3), Const(2)):" << std::string(ires) << std::endl;

Typical workflow

compile, evaluate가 어떻게 되는지 한번 보쟈

스크린샷 2023-06-10 오후 3.48.28.png

parsing
- expression이 grammer에 문제가 있다면 파싱 에러가 날 거임
type checking
- 파싱이 된다면 그다음에는 syntax tree를 만들거임
- 타입체킹을 해야하는데 만약에 에러가 난다면(+인데 bool을 보내면?? ) 에러가 날거임
rest of implementation
- 타입에러가 안되면 execution이 될 거임, ast를 받아서 result를 만들기위해 실행할 것

Interpreter or compiler

rest of implementation은 AST(abstract syntax tree)를 받아서 result를 만들기위해 프로그램을 실행할 것
PL B를 구현하는 접근방식
- PL B는 target language, 위의 예시는 ML이 될 거임, 과제에서는 MUPL
- interpreter(evaluator, executor) in A
  - evaluate하기 위해서 interpreter를 만들어야하는데 얘는 다른 언어 A로 작성되어야 함
  - B를 받아서 B를 생성
    - new eval은 expression을 받아서 const expression을 리턴
- compiler(translator) in A to C
  - == compiler에 c언어(머신러닝 언어, 어셈블리, 바이트코드)로 바꿈
  - translation must preserve meaning(equivalence)
Language A
- metaLanguage
- implement compiler or interpretor
- A와 B를 잘 구별해야 함

Reality more complicated

evaluation(interpreter) and translation(compiler)는 option이지만 다부분 모두 가지고 있음
- 현실에서 compiler, interpreter은 구별하기 어려움. 왜냐면 같이 합쳐져 있는 경우가 있기 때문

ex)

java 컴파일러: 자바 코드→ 바이트코드
바이트코드를 virtual machine(interpretor)을 이용해서 만듦
바이트코드가 자주 실행된다면 자바 컴파일러는 그 부분을 바이트코드로 바꿔서 실행됨

ex)

칩도 interpreter for binary
fetch opcode and execute in hardware … 같은 동작을 하기 때문

Sermon

interpreter vs compiler vs combination은 particular language implementation이지 language definition가 아님

~~compiled language. interpreted language 이런 건 없다는 말임~~

rest of implemnation을 interpreter로 해보자!

Skipping parsing

PL A에서 PL B를 구현할 때 parsing을 생략
- B programmer가 PL A에서 AST를 바로 쓴다는 소리
- embed B programs as tree in A

스크린샷 2023-06-10 오후 4.03.35.png

PL B(arithmetic language)이고 A는 c++인데 expr object를 c++ 생성자로 만들었음
파싱을 스킵하고 Parse tree는 A에서 바로 만들었다라고 가정

→ 사실 B에서 ast에서 만들지만 A(c++)에서 바로 생성되게!

What we know

B를 c++ struct를 이용해서 abstract syntax language를 정의
B program을 바로 c++ 생성자를 이용해서 c++로 작성할 거임
arithmetic 언어라고 부르기 보다는 b를 MUPL이라고 부를 거임
- make up programming language
중요한 점
- interpreter는 주어진 input이 legal AST for B라고 가정
- interpreter는 recursive result가 맞는 type인지 꼭 체크해야 함 아니라면 exception

Legal ASTs

AST의 syntax는 c++ compiler가 확인해줄 것
eval function에 illegal AST가 올 수 없다고 가정

스크린샷 2023-06-10 오후 4.09.22.png

Interpreter result

interpreter의 결과는 value가 되어야 함
- evaluate itself인 게 value라고 했으니까 더 evaluate될 게 없거나 evaluate itself인 애 여야 함
- MUPL에서는 int, pair, closures, aunit이 value

Example

ast node에 두개의 struct인 Bool, ifThenElse 추가해보자

Q. legal AST라고 가정했을 때 evalauation 시 어떤 걸 체크?

A. ex) add boolean

detect를 무조건해서 error message를 보여줘야 함
특정한 값이 필요할 때 recursive result를 체크

//Bool이랑 ifThenElse가 ast node에 추가됨
Expr eval(Expr e) {
    return std::visit(overload {
        [&](Const& i) { return e;},
        [&](Bool& i) { return e;}, // bool은 value니까 그냥 자기를 리턴
        [&](box<struct Add>& a) {
          Expr e1 = eval(a->e1);
          Expr e2 = eval(a->e2);
          if (is<Const>(e1) && is<Const>(e2)) {
            Const i1 = std::get<Const>(e1);
            Const i2 = std::get<Const>(e2);
            Expr res(Const(i1.val+i2.val));
            return res;
          } else {
            throw std::runtime_error("Unexpected types for sub-expressions of Add");
          }
        },
        [&](box<struct Multiply>& m) {
          Expr e1 = eval(m->e1);
          Expr e2 = eval(m->e2);
          if (is<Const>(e1) && is<Const>(e2)) {
            Const i1 = std::get<Const>(e1);
            Const i2 = std::get<Const>(e2);
            Expr res(Const(i1.val*i2.val));
            return res;
          } else {
            throw std::runtime_error("Unexpected types for sub-expressions of Multiply");
          }
        }, 
        [&](box<struct IfThenElse>& i) {
          Expr e1 = eval(i->e1);
          if (is<Bool>(e1)) { // bool인지 체크 아니라면 exception
            Bool b = std::get<Bool>(e1);
            if (b.val) {
                Expr e2 = eval(i->e2); // true일 때만 e2를 evaluate
                return e2;
            } else {
                Expr e3 = eval(i->e3); // false면 e3을 evaluate
                return e3;
            }
          } else {
            throw std::runtime_error("Unexpected types for condition of IfThenElse");
          }
        },
      }, e);
}

Dealing with varaible

이 때까지의 interpreter은 variable가 없었음
- ex) let exprewssion, function with argument 등등 같은 것들
- MUPL은 let, function, var 등등 다 가지고 있음
environment
- c++ string을 variable(key)에 value(Expr)를 map
- eval 시 environment를 받음
  - environment를 인자로 보냄
  - 대부분 subexpression은 same enviroment이지만 let expression은 body는 larger environment에서 evaluate해야함
  - variable expression은 environment에서 variable을 Look up
eval
- empty environment를 생성
- eval_under_env recursive function을 부르고 이 때 인자로 env를 전달

스크린샷 2023-06-10 오후 4.31.19.png

ex)

function이나 clousre이 들어갈 수 있는데 closure은 value지만 function은 value가 아니기에 clousure로 리턴해야 함
foo(a)라면 얘는 fun(”foo”, a, code)이렇게 되고 얘는 closure을 리턴
call(e1, e2)라면 e1는 클로저로 리턴하고 extend envionment in arugent mapping하고나서 closure body를 under the enviroment

The best part

과제에서 가장 흥미로운 부분이 first-class clousre을 구현하고 closure을 실행하는 거래

closure은 lexical scope을 지원할 예정

Higher order function

closure을 구현하고 얘를 인자로 보낼려면 env과 코드를 저장해야 함
- string과 expr으로 mapping된 env를 가짐
- 실제 function 도 가지고 있음
function을 evaluate
- function은 value가 아니므로 closure을 리턴
- function과 current enviroment(evaluate to closure)을 가진 closure을 리턴

Function calls

call(e1,e2)

과제에는 call라는 structure 존재 contain two subexpression

eval이 call을 만나면 current environment에서 e1을 evaluate해서 closure을 받아야함
- e1이 closure에서 evaluate되지 않으면 std runtime error를 발생시켜야함
current enviroment에서 e2를 evaluate
- 과제에서의 모든 함수는 싱글 인자(e2)를 받고 있음
body를 evalaute 시 closure’s enviroment에서 evalaute
- function argument name과 arugment value를 map
- for recursion을 위해 function name과 closure을 map
lexical scope으로 계산해줄 것

Expr clos = eval_under_env(call->funExpr, env); // closure
            Expr argument = eval_under_env(call->actual, env); //인자

            if (is<Closure>(clos)) {
                box<struct Closure> closure = std::get<box<struct Closure>>(clos);
                Fun f = closure-> f;
                std::map<string, Expr> funEnv = makeNewEnvFrom(closure->env);
                if (f.funName != "") {
                    funEnv.insert_or_assign(f.funName, closure);
                }   
                funEnv.insert_or_assign(f.argName, argument);

                Expr result = eval_under_env(f.body, funEnv);
                return result;
            } else {
                throw std::runtime_error("Unexpected types for sub-expressions of Call");
            }

is that expensive?

eval이 fun을 보면 closure을 만들고 그 때 env를 만들어야 하는데 이게 비쌀 수 있겠지만 MUPL같은 거는 문제가 안되겠지만 real programming language에는 어떨까?

build closure
- 시간 거의 들지 않음
- two field만 가지기 때문
space to sore closure
- might be large
- closure 만드는데는 시간은 적게 들지만 space는 많이 필요해서 런타임 넘는 경우도 존재

→ 실질적으로는 closure 생성 시 다 저장하지 않고 free variable만 저장함

free variable
- not defined in function but use in function
- variable that occur not counting shadowed uses of the same varaible name
- should be capture

스크린샷 2023-06-10 오후 4.50.34.png

Computing free variables

Q. interpreter가 closure을 만들 때마다 code body를 분석해야 하는 건가?

A. no evaluation 전에 free varaible을 찾아서 store!

런타임에 뭐가 free variable이고 아닌지는 모든 코드를 분석해야하기 때문에 어려운 편
컴파일 타임에 컴파일러가 computation free varaible해서 store하는 거는 어렵지 않은 편

→ naive store entire environment approach보다 building a closure은 더 많은 시간이 걸리지만 space는 적게 씀

free variable 수에 비례, 다른 optimization도 가능

Compiling higher-order functions

Q. 과제에서는 eval함수를 interpretator로 하게 될 건데 lexical scope, closure을 compiler based language에서 한다면 컴파일은 low level code로 바꿀 때 lexical scope이나 closure을 지원하지 않기 때문에 이걸 어떻게 유지할까?

인터프리터는 keep the env in map하면 되지만 컴파일러의 경우, 어셈블리 같은 low level code는 클로져가 없기에 env를 rely할 수 없음

compile function은 extra explicit argument인 enviroment를 받는 regular function을 생성
- 모든 free variable은 enviroment argument를 이용해서 lookup해서 사용

// ML 코드
fun foo(x) = x+y

//어셈블리가 c++인 걸로 가정
// lexical scope, closure semantic을 이렇게 만족
int foo(int x, std::map<std::string, Value> **env**) { 
	return x + convertToInt( **env.loopup("y")**);
}

Macros, updated

우리가 구현해야하는 언어의 접근 방식
- implement PL B in PL A
- skip parsing by writing PL B program directly in terms of PL A constructor
- interpreter written in A recursively evaluate

→ 이제 할거는 macro로 language의 syntax를 확장시키고 사용해볼 거야!

What is a macro

#define ADD(a,b) (a+b)
int main() {
int a = 10;
int b = 20;
return ADD(a,b);
}

macro definition
- how to transform some new syntax into different syntax in the source language
- 아주 간단한 프로그래밍 언어
- c++에서의 매크로에서는 premitive임 not complicate semantic
- one syntax → another macro syntax in c code
  - Add(a,b)를 c++ syntax인 (a+b)로 변경
macro도 syntatic sugar라고 볼 수 있음(code가 바뀌어서 evaluate되니까!)
Macro system
- macro를 정의하는 programming language임
- 매크로 개념을 알아야 과제를 할 수 있지만 디테일까지는 알 필요는 없대
- syntax는 c와 아주 비슷함
- syntax → target source code
macro expansion
- process of rewriting the syntax for each macro use
- compile도 되기 전에 바뀜!!

#define ADD(a,b) (a+b)
int main() {
int a = 10;
int b = 20;
return ADD(a,b);
}

int main() {
int a = 10;
int b = 20;
return (a+b);
}

cf. 매크로 vs 인라인? 인라인이 대부분 무조건 좋대, 인라인으로 못하는 거만 매크로로 하는 게 좋대, 그거만의 syntax가 존재한다고 하네에.

ADD(10,20) * 3 ⇒ 10+20*3이런 실수가 매크로에서 일어나니까 인라인이 좋대

실행 중간에 로그를 찍는 것도 매크로만 할 수 있대

Put it together

PL A(c++) function을 PL B abstract syntax를 생성하는 PL B macro라고 볼 수 있음!!
AST MUPL을 바로 C++에서 하기에 macro는 extend syntax of language and generate target language source code(MUPL) when compile before

c++ function → MUPL language → define MUPL function

⇒ macros처럼 동작하는 거 아닐까? 런하기 전에 apply되니까!!

way to create MUPL by calling constructor AST node인데 이렇게 생각하면 c++ function은 return MUPL language이고 use defining MUPL program

PL B program은 macro를 사용
interpreter나 struct definition은 바뀌는 게 없음
programiming ideiom

//macro인데 addThree = MUPL ast를 만드는데 1+2+3을 하는 ast를 리턴
Expr addThree(int a, int b, int c) {
	Expr e = Add(Const(a), Add(Const(b), Const(c))); // c++함수지만 MUPL로 바꿔줌
	return e;
}

int main() {
	Expr e4 = addThree(1, 2, 3);
	res = eval (4);
	ires = std::get<Const>(res);
	std::cout << toString(e4) <<": " << std::string(ires)<< std::endl; // const(6)
};
/*
Add(Const(1), Add(Const(2), Const(3))): Const(6)
*/

//addFromTo정의시 recursive를 쓰지 않는다면??
Expr addFromTo(int from, int to) {
	Add add(from,0);
	add.e2 = Add(from+1, 0);
	add = add.e2;
	add.e2 = Add(from+2, 0);
	...
}

//recursive를 쓴다면??
Expr addFromTo(int from, int to) {
    if(to==from+1) {
        return Add(Const(from), Const(to));
    } else if(to>from+1) {
        return Add(Const(from), addFromTo(from+1, to));
    } else { // from >= to
        throw std::runtime_error("from >= to");
    }
}

int main() {
	Expr e5 = addFromTo(0,10);
	res = eval (e5);
	ires = std::get<Const>(res);
	std::cout << toString(e5) <<": " << std::string(ires)<< std::endl; //const 55가 될 것
}
/*
Add(Const(0), Add(Const(1), Add(Const(2), Add(Const(3), Add(Const(4), Add(Const(5), 
Add(Const(6), Add(Const(7), Add(Const(8), Add(Const(9), Const(10))))))))))): Const(55)
*/

Hygiene issue(cleaness)

#define RUN_FOREVER(code) while (0) {\
int abc;\
// do somthing with the varaible\
code;\
}

hygiene issue
- shadowing variable when using local variable in macros
- 변수 이름에 PREFIX_abc라고 붙여서 헷갈리지 않도록 하는 방법이 있지만 그래도 어렵대!!(그래도 변수이름이 같을 수 있을 거임)
  
  → 인라인을 쓰자!
- 하이제닉이려면 충돌 시 매크로 시스템이 변수이름을 automatic으로 바꿔줘야 함
- c++은 위처럼 변수 이름이 충돌나기 때문에 macro가 hygienic이 아님
- gcc는 하이제닉이래