Syllabus for 06-642 Data Science and Machine Learning in Chemical Engineering

John Kitchin Spring 2024

Read the Syllabus

Course details

Instructor: Prof. John Kitchin (jkitchin@andrew.cmu.edu)

Teaching assistants:

Office hours by appointment.

Course location

Class will be on Monday and Wednesday, 2-3:50pm via Zoom (Find the links on Canvas). Attendance is expected.

We will not be using a textbook. The course notes and materials will be provided through this site, with links to them in Canvas.

Each class will be organized in a Canvas module. You should go to the Module of the day. Everything you need should be there.

Please consider using NameCoach to upload a recording of how your name is pronounced, and to indicate your pronouns.

Use Discord for questions outside of class.

Course description

This class will examine topics related to data science and machine learning in chemical engineering. This may include topics in data visualization and modeling, differentiable programming, and the use of data and models to design experiments. The course will emphasize computational implementations of these topics using Python, with applications in chemical engineering. Students will need to be comfortable with scientific programming using Python. Students who have taken 06-623 and/or 06-625 should have the skills needed in this class.

Course objectives

After completing this course you will be able to use Python to solve problems involving:

Reading data from files and urls for analysis

Getting data is a foundational skill in data science. Data comes in many formats and from many sources including files and internet services. We will learn about some of these and how to use them.

Basic use of Python, numpy, scipy and matplotlib in data analysis

Basic analysis like filtering, regression, etc. is possible with basic Python. We will learn about the limitations of this approach and why libraries like Pandas and scikit-learn exist.

Use of Pandas for data analysis and visualization

Pandas is a multipurpose library for interacting with tabular data. It has many features to streamline data science tasks.

Use of scikit-learn for regression analysis and machine learning

scikit-learn is a library with many interfaces for model building, ranging from linear regression to neural networks, Gaussian Process Regression, and decision tree/random forest models. We will learn how to apply this library to applications of interest.
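To make the objective about basic analysis with plain Python more concrete, here is a minimal sketch of filtering and a straight-line least-squares fit using only the standard library. The data values and variable names are made up for illustration; they are not course data, and this is not necessarily how the course notes approach it.

```python
# Made-up (x, y) measurements for illustration only; None marks a missing value.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, None), (3.0, 7.0), (4.0, 9.0)]

# Filtering: keep only the rows that have a recorded y value.
clean = [(x, y) for x, y in data if y is not None]

# Regression: ordinary least squares for y = slope * x + intercept.
n = len(clean)
mean_x = sum(x for x, _ in clean) / n
mean_y = sum(y for _, y in clean) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in clean)
sxx = sum((x - mean_x) ** 2 for x, _ in clean)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(slope, intercept)  # 2.0 1.0
```

Even this small case needs hand-written bookkeeping for missing values and sums, which hints at why libraries such as Pandas and scikit-learn exist.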

Grading

Your grade in this class will be based on assignments, participation and a project. Assignments will typically be a single problem, and several of them will be assigned each week.

Each assignment will receive two grades, one for the technical work, and one for the presentation. The technical grade will account for 2/3 of the grade and the presentation will be the other 1/3.

Each lecture will have a participation component. This will usually be an assignment in Canvas, and may include reflection, quizzes, etc. These will be due within 24 hours of the lecture date.

We will have a project. The project will have a written proposal and a report at the end of the mini.

We are not planning to have any exams. Instead, the problems we would normally have in exams will be spread out over the semester.

Category        Weight
Assignments     30%
Participation   20%
Project         50%

Grading criteria

You are transitioning into a young professional at this point. That means assignments are done professionally too. In addition to the technical correctness of your work, we will also be assessing the professionalism with which it is presented. Each assignment will show the rubric it will be graded with at the top of the file.

There will be a straight scale (no curve), so you will always know exactly what your grade is. Each problem will be graded considering the approach used, the correctness of the answer, the neatness and quality of presentation, etc. Each category of the rubric will be given a letter grade that indicates your level of performance in that category.

“A” work has the following characteristics: The correct approach is used and the problem is set up correctly. The work is not over-simplified and it is easy to see it is done correctly. Any assumptions made were stated and justified. The answers are correct or only the most trivial errors are present, and were identified by the student. All of the correct units were used. The presentation is complete, clear, logical, neat and in order. Error analysis was performed if appropriate. Any figures used have properly labeled axes with units, and a legend if there is more than one curve. Essentially everything that should have been done was done and done correctly. This is the kind of work an employer wants their employees to do, and the kind of work you will be promoted for doing. You should be proud of this work.

“B” work is deficient in one or more of the properties of “A” work. It might be basically right, but essential details are missing such as units, or the presentation is sloppy. You will get by with this kind of work, but you should not expect to be praised for it.

“C” quality work is deficient in more than two of the properties of “A” work. You would probably not get fired for this kind of work, but you may be notified you need to improve and you should not expect any kind of promotion. This is the bare minimum of expected performance.

“D” work is not considered acceptable performance. Repeat offenses could lead to the loss of your job.

“R” work is totally unacceptable performance. You will be fired.

The letter grades will map onto these grade multipliers:

Grade   Multiplier
A       1.0
A/B     0.9
B       0.8
B/C     0.7
C       0.6
D       0.4
R       0

That means if an assignment is worth 10 points, and you get a B for the technical work and C for the presentation you will earn the following points (rounded for clarity):

Technical: 2/3 * 10 * 0.8 = 5.33

Presentation: 1/3 * 10 * 0.6 = 2

For a total of 7.33 points. That is between a B and a B/C for this assignment.
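The same arithmetic can be written as a short Python sketch. The function and variable names here are hypothetical; the multipliers and the 2/3 and 1/3 split are the ones stated in this syllabus.

```python
# Grade multipliers as listed in this syllabus.
MULTIPLIERS = {"A": 1.0, "A/B": 0.9, "B": 0.8, "B/C": 0.7,
               "C": 0.6, "D": 0.4, "R": 0.0}


def assignment_points(points_possible, technical, presentation):
    """Points earned: 2/3 from the technical grade, 1/3 from the presentation grade."""
    tech = 2 / 3 * points_possible * MULTIPLIERS[technical]
    pres = 1 / 3 * points_possible * MULTIPLIERS[presentation]
    return tech + pres


# The worked example above: 10 points, B technical, C presentation.
total = assignment_points(10, "B", "C")
print(round(total, 2))  # 7.33
```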

At the end of the semester your final grade will be determined by the total points earned weighted by each category divided by the total possible points.

If you earn 90% or more of the points you will get an A. If you earn more than 70% of the points, but less than 90% you will get a B. If you earn more than 50% of the points, but less than 70% you will get a C.
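As a rough sketch of how the final grade could be computed, the code below assumes that "weighted by each category" means each category's fraction of points earned is multiplied by its weight (30%, 20%, 50%) and the results are summed. That interpretation, the function names, and the point totals are assumptions for illustration. The syllabus does not say what happens at exactly 70% or 50%, or below 50%, so the sketch returns None in those cases rather than guessing.

```python
# Category weights from the grading table in this syllabus.
WEIGHTS = {"assignments": 0.30, "participation": 0.20, "project": 0.50}


def weighted_fraction(earned, possible):
    """Sum of (category weight) * (fraction of points earned in that category)."""
    return sum(WEIGHTS[c] * earned[c] / possible[c] for c in WEIGHTS)


def letter_grade(fraction):
    """Map a weighted fraction to a letter using only the cutoffs stated above."""
    if fraction >= 0.90:
        return "A"
    if 0.70 < fraction < 0.90:
        return "B"
    if 0.50 < fraction < 0.70:
        return "C"
    return None  # not specified in the syllabus


# Hypothetical point totals, for illustration only.
earned = {"assignments": 85, "participation": 40, "project": 90}
possible = {"assignments": 100, "participation": 50, "project": 100}

frac = weighted_fraction(earned, possible)  # 0.255 + 0.16 + 0.45
print(letter_grade(frac))  # B
```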

Late policy

The deadlines on assignments are highly recommended times to turn assignments in by, because all of us need a schedule to do our work on, and if you don’t follow them you will fall behind. I am not going to worry if you turn them in late by a few minutes or even hours. If you turn your assignments in late, Canvas will mark them as late, but there is no penalty for that. At least up until we grade your assignment you can even turn in your assignment again.

Once the solutions are posted, which usually happens after we grade them, then we will not accept late submissions. So there is some flexibility in being late, but it is not infinitely flexible, and the longer you wait, the more likely it is the solution will be posted and your assignment will get a zero for not being turned in.

Of course, if there are extenuating circumstances like an illness, or emergency, you should reach out to me to find a solution, but otherwise, just do the assignments and turn them in when you finish them.

Take care of yourself

This semester is going to be a challenge. We are still working through a pandemic among other stressful issues in the country. These are stressful times. I will do my best to keep this class from adding unnecessarily to that.

Do your best to maintain a healthy lifestyle this semester by eating well, exercising, avoiding drugs and alcohol, getting enough sleep and taking some time to relax. This will help you achieve your goals and cope with stress.

All of us benefit from support during times of struggle. You are not alone. There are many helpful resources available on campus and an important part of the college experience is learning how to ask for help. Asking for support sooner rather than later is often helpful.

If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help: call 412-268-2922 and visit their website at http://www.cmu.edu/counseling/. Consider reaching out to a friend, faculty or family member you trust for help getting connected to the support that can help.

If you or someone you know is feeling suicidal or in danger of self-harm, call someone immediately, day or night:

CaPS: 412-268-2922

Resolve Crisis Network: 888-796-8226

If the situation is life threatening, call the police:

       On campus: CMU Police: 412-268-2323

       Off campus: 911

If you have questions about this or your coursework, please let me know.

Diversity, equity and inclusion at CMU

We are diverse in many ways, and this diversity is fundamental to building and maintaining an equitable and inclusive campus community. Diversity can refer to multiple ways that we identify ourselves, including but not limited to race, color, national origin, language, sex, disability, age, sexual orientation, gender identity, religion, creed, ancestry, belief, veteran status, or genetic information. Each of these diverse identities shapes the perspectives our students, faculty, and staff bring to our campus. We at CMU will work to promote diversity, equity, and inclusion not only because diversity fuels excellence and innovation, but because we want to pursue justice. We acknowledge our imperfections while we also fully commit to the work, inside and outside of our classrooms, of building and sustaining a campus community that increasingly embraces these core values. Each of us is responsible for creating a safer, more inclusive environment.

Unfortunately, incidents of bias or discrimination do occur, whether intentional or unintentional. They contribute to creating an unwelcoming environment for individuals and groups at the university. Therefore, the university encourages anyone who experiences or observes unfair or hostile treatment on the basis of identity to speak out for justice and support, within the moment of the incident or after the incident has passed. Anyone can share these experiences using the following resources:

  • If you are comfortable with it, you may share them with me.

  • Center for Student Diversity and Inclusion: csdi@andrew.cmu.edu, (412) 268-2150

  • Report-It online anonymous reporting platform: www.reportit.net (username: tartans password: plaid)

All reports will be documented and deliberated to determine if there should be any following actions. Regardless of incident type, the university will use all shared experiences to transform our campus climate to be more equitable and just.

Academic honesty

All work is expected to be your original work. You may work with class members to solve the homework problems, but you must turn in your own solutions. It is cheating to turn in someone else’s work as your own. If you use code from the internet or the course notes, you should note this in your solution. Duplicated assignments (e.g. two students who turn in the same work) will receive zeros and a warning. Repeat offenses will be reported as academic dishonesty.

When in doubt, review this website: https://www.cmu.edu/policies/student-and-student-life/academic-integrity.html, and ask if anything is unclear before you get in trouble.

Here are some examples of acceptable collaboration (adapted from https://www.cmu.edu/teaching/designteach/syllabus/checklist/integritypolicy.html) :

  • Clarifying ambiguities or vague points in class handouts, textbooks, or lectures.

  • Discussing or explaining the general class material.

  • Providing assistance with Python, in using Jupyter notebooks, or with editing, debugging, and Python tools.

  • Discussing the code that we give out on the assignment.

  • Discussing the assignments to better understand them.

  • Getting help from anyone concerning programming issues which are clearly more general than the specific assignment (e.g., what does a particular error message mean?).

Here are some examples of unacceptable collaborations/activities. As a general rule, if you do not understand what you are handing in, you are probably cheating. If you have given somebody the answer, you are probably cheating. In order to help you draw the line, here are some examples of clear cases of cheating:

  • Copying (program or assignment) files from another person or source, including retyping their files, changing variable names, copying code without explicit citation from previously published works, etc.

  • Allowing someone else to copy your code or written assignment, either in draft or final form.

  • Getting help from someone whom you do not acknowledge on your solution.

  • Copying from another student during an exam, quiz, or midterm. This includes receiving exam-related information from a student who has already taken the exam.

  • Writing, using, or submitting a program that attempts to alter or erase grading information or otherwise compromise security.

  • Inappropriately obtaining course information from instructors and TAs.

  • Looking at someone else’s files containing draft solutions, even if the file permissions are incorrectly set to allow it.

  • Receiving help from students who have taken the course in previous years.

  • Lying to course staff.

  • Copying on quizzes or exams.

  • Reading the current solution (handed out) if you will be handing in the current assignment late.

  • Signing someone into the attendance sheet.

  • Taking quizzes/exams somewhere other than the designated location without prior authorization.

  • Not informing course staff of academic integrity violations is a form of an integrity violation.

Please note that in accord with the university’s policy you must acknowledge any collaboration or assistance that you receive on work that is to be graded. When you turn in a homework assignment, please include a section that says either:

  1. “I worked alone on this assignment.”, or

  2. “I worked with __________ on this assignment.” and/or

  3. “I received assistance from _________ on this assignment.”

Accommodations

If you have a disability and have an accommodations letter from the Disability Resources office, I encourage you to discuss your accommodations and needs with me as early in the semester as possible. I will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the Office of Disability Resources, I encourage you to contact them at access@andrew.cmu.edu.

Religious holidays

We will accommodate religious holidays when possible. If your work will be affected by a religious holiday, you must inform Professor Kitchin as early as possible to work out an accommodation in advance.

Resources

Student Academic Success Center (SASC): SASC focuses on creating spaces for students to engage in their coursework and approach learning through a variety of group and individual options.