This is a new class and there is no textbook. We will post relevant reading material. Each assignment will be released one week before it is due. The assignments are:
- HW1: Bandits and Contextual Bandits
- HW2: Value and Policy Iteration
- HW3: Deep Q-Learning
- HW4: Policy Gradients
- HW5: Reward Function Design
- HW6: Sim-to-Real
- HW7: Learning from Demonstrations
- HW8: Model-based Learning
- HW9: Offline Reinforcement Learning (Optional)
We'll be using Gradescope for problem set submission and grading. The login code for this class will be posted on Piazza. Please create an account and add yourself to this class using that code only if you are taking this class for credit.
Each problem set is weighted equally. Grading will rely on review of the submitted code and writeup, and more details will be provided when assignments are released. Each assignment is due one week after its release. Late submissions will be penalized 10% for every 24 hours past the deadline.
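To make the late penalty concrete, here is a minimal sketch of one way the arithmetic could work. It rests on assumptions the policy above does not spell out: that any started 24-hour period counts as a full period, that the 10% is taken from the score you earned, and that the penalty is capped at 100%. The function names are hypothetical and the course staff's actual calculation may differ.

```python
import math


def late_penalty_fraction(hours_late: float, rate_per_day: float = 0.10) -> float:
    """Fraction of the score deducted for a late submission.

    Assumption: every started 24-hour period after the deadline costs
    `rate_per_day`, and the total deduction never exceeds 100%.
    """
    if hours_late <= 0:
        return 0.0  # on time or early: no penalty
    periods_late = math.ceil(hours_late / 24)  # round partial periods up (assumption)
    return min(1.0, periods_late * rate_per_day)


def penalized_score(raw_score: float, hours_late: float) -> float:
    """Score after the late penalty, assuming it scales the earned score."""
    return raw_score * (1.0 - late_penalty_fraction(hours_late))


if __name__ == "__main__":
    # A submission 30 hours late falls in the second 24-hour period.
    print(late_penalty_fraction(30))
    print(penalized_score(90, 30))
```

Under these assumptions, a submission that is 30 hours late loses about 20% of its earned score, and a submission ten or more days late receives no credit.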
Collaboration is encouraged, but the work you submit for assignments is expected to be entirely your own. That is, the writing and code must be yours, and you must fully understand everything that you hand in. Discussing the details of how to solve a problem is fine, but you must write the solution yourself. To avoid plagiarizing, you shouldn't be looking at someone else's solution while you write down your own. If you collaborated significantly (use your own discretion for "significantly") on a problem, list the people you collaborated with next to your solution.
The final project will be your opportunity to explore some of the topics introduced in the course more deeply.
- Your project should be related to the course.
- You are welcome to work on the project in a team (3 people at most).
Project deliverables: a midterm report and a final report.