
Intro

  • We rely on causal relationships to make things happen.
  • Why did something fail?
  • Observation: X% of users churned last month.
  • Causation: because of Y (e.g., pricing, features).
  • Action: change Y to prevent churn.
  • Problems with A/B testing?
    • If treatment users get a new feature and interact with control users who do not have it, the treatment spills over to the control group.
    • Marketplaces (Uber, DoorDash, Airbnb): resources are shared, so the treatment can affect control users by changing supply and demand in the market.
    • Because of such interference between the control and treatment groups, it is hard to make causal claims about the treatment effect.
  • Problems with observational data?

    • Selection bias:
      • Observed difference: engagement of exposed users – engagement of unexposed users
      • Causal effect (what we want): engagement of exposed users – engagement of the same users had they not been exposed
      • Selection bias (observation – causation): engagement of exposed users had they not been exposed – engagement of unexposed users
        • Positive bias: the treated group would be better off anyway (creates illusory causation).
        • Negative bias: the treated group is worse off to begin with (hides true causation).
    • Example:
      • FB launched a new AI system to help tackle harmful content.
      • Metric to consider: the impact of exposure to harmful content on user engagement.
      • Measure each user's exposure and compare it with engagement metrics such as user retention.
      • Sources of selection bias:
        • Exposure may differ from user to user.
        • Engagement levels might differ based on user profile.
  • How can we make causal inferences from observational data with selection bias? Alternative methods:
    • Regression
    • Matching
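
The selection-bias decomposition above can be checked numerically. The sketch below is a hypothetical simulation: the confounder interest, the assumed true effect of -0.5, and the exposure mechanism are all made up for illustration. Because both potential outcomes are simulated for every user, we can compute the causal effect on exposed users directly and confirm that

\[ observed\ difference = causal\ effect + selection\ bias \]

Here exposed users would have been more engaged anyway, so the bias is positive and the naive comparison overstates the effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical confounder: each user's baseline interest in the platform.
interest = rng.normal(size=n)

# Potential outcomes for every user: engagement if not exposed (y0) and if exposed (y1).
y0 = 10 + 2 * interest + rng.normal(size=n)
y1 = y0 - 0.5  # assumed true effect of exposure: -0.5 for everyone

# Selection: more interested users are more likely to be exposed.
exposed = rng.random(n) < 1 / (1 + np.exp(-interest))

observed_diff = y1[exposed].mean() - y0[~exposed].mean()  # what the data shows
att = (y1[exposed] - y0[exposed]).mean()                  # causal effect on exposed users
bias = y0[exposed].mean() - y0[~exposed].mean()           # selection bias

print(f"observed difference: {observed_diff:.2f}")
print(f"causal effect (ATT): {att:.2f}")
print(f"selection bias:      {bias:.2f}")
```

In real observational data only y1 for exposed users and y0 for unexposed users are visible, which is why the bias term cannot be computed directly.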

Regression:

  • Model: lm(engagement ~ exposure + age + country + education + other fields, data = harmful)
  • Make sure to include the relevant controls.
  • Interpret the partial slope: how much being exposed to harmful content affects engagement when the control variables are held constant (at any values).
  • Only adjusts correctly for confounders whose relationship with engagement is linear, as specified in the model.
  • Visualize: partial regression plot
    • Confounders: extraneous variables whose presence affects the variables being studied, so that the results do not reflect the actual relationship between them.
    • Question: if confounders are held at constant values, how do we choose those values?
    • Short answer: it doesn't matter, so don't manually assign constant values to confounders; the fitted model accounts for their influence. In a linear model without interaction terms, the partial slope is the same at every value of the controls.
    • \[\widehat{Engagement}_0 = \hat\beta_0 + \hat\beta_1 \cdot 0 + \hat\beta_2 \cdot controls \quad (exposure = 0)\]
    • \[\widehat{Engagement}_1 = \hat\beta_0 + \hat\beta_1 \cdot 1 + \hat\beta_2 \cdot controls \quad (exposure = 1)\]
    • The term $$\hat\beta_2 \cdot controls$$ is identical in both equations, so it cancels in the difference (it might as well be 0):
    • \[\hat\beta_1 = \widehat{Engagement}_1 - \widehat{Engagement}_0\]
    • What are a few pitfalls of using regression?
    • Omitted-variable bias: relevant variables left out
      • Solution: understand all relevant factors and how the variables are connected:
          1. Functional form of the connections. For example, linear regression can only control for confounders that have a linear relationship with the outcome; for unknown or other kinds of relationships, linear regression does not do the job.
          2. Causal structure defining how variables are connected, for example a Directed Acyclic Graph (DAG):
            • nodes: random variables
            • edges: (causal) relationships
            [Figure: causal inference using regression]
  • Mistake 1: controlling for mediators (variables in the middle of a causal chain):
    • Controlling for the number of posts shared with you ⇾ social circle size no longer appears to influence exposure to harmful content.
    • Intuition: users with many friends get tons of shared posts, so they really do have a higher risk of exposure to harmful content; that path runs through shared posts, so controlling for them hides it.
    • If we control for mediators, we can block the very relationships that generate the data.
  • Mistake 2: controlling for colliders (a common effect of two other variables):
    • Controlling for time spent late at night ⇾ “spurious correlation” between popularity and insomnia.
    • Example: users who are very engaged late at night are likely either popular (they have lots of friends to talk to) or insomniac (they cannot fall asleep). If we look only at such users, or control for late-night time, popularity and insomnia appear correlated in the data even though no such correlation exists in reality.
    • It is hard to know which variables we should control for, let alone how they are connected, so using regression blindly can be really dangerous.
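
A rough Python/NumPy analogue of the lm(engagement ~ exposure + age + ...) model above, on simulated data. Everything here is an assumption for illustration: age is the only confounder, its effect is linear, and the true effect of exposure is -2. The regression without controls is biased; adding age as a control recovers roughly the true effect, and the final loop shows that the predicted difference between exposure = 1 and exposure = 0 equals \hat\beta_1 whatever value the control is held at.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical data: age is the only confounder and drives both exposure and engagement.
age = rng.uniform(18, 65, size=n)
p_exposed = 1 / (1 + np.exp(-(age - 40) / 10))  # older users are more often exposed
exposure = (rng.random(n) < p_exposed).astype(float)
true_effect = -2.0  # assumed causal effect of exposure on engagement
engagement = 50 - 0.3 * age + true_effect * exposure + rng.normal(scale=3, size=n)


def ols(y, *columns):
    """Least-squares coefficients of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


naive = ols(engagement, exposure)          # engagement ~ exposure
adjusted = ols(engagement, exposure, age)  # engagement ~ exposure + age
print(f"naive exposure coefficient:    {naive[1]:.2f}")
print(f"adjusted exposure coefficient: {adjusted[1]:.2f}")

# Predicted engagement at exposure = 0 and 1 with the control held at the same value:
# the difference is b1 at every age (up to floating-point rounding).
b0, b1, b2 = adjusted
for age_value in (20.0, 40.0, 60.0):
    e0 = b0 + b1 * 0 + b2 * age_value
    e1 = b0 + b1 * 1 + b2 * age_value
    print(f"age {age_value:.0f}: e1 - e0 = {e1 - e0:.4f}, b1 = {b1:.4f}")
```

This only works because the simulated age effect really is linear; with a non-linear age effect, the single age term would leave residual confounding.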

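Both mistakes can be reproduced in a small simulation. The DAGs are hypothetical (social circle ⇾ shared posts ⇾ exposure for the mediator; popularity ⇾ late-night time ⇽ insomnia for the collider) and all coefficients are made up. Controlling for shared posts pushes the social-circle coefficient to about zero even though its total effect is about 1, and restricting to the heaviest late-night users makes two independent variables look correlated.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000


def slope_on_first(y, *columns):
    """OLS coefficient on the first given column (an intercept is added)."""
    X = np.column_stack([np.ones(len(y)), *columns])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]


# Mediator: social circle size -> shared posts -> exposure (hypothetical chain).
social_circle = rng.normal(size=n)
shared_posts = 2 * social_circle + rng.normal(size=n)
exposure = 0.5 * shared_posts + rng.normal(size=n)

total = slope_on_first(exposure, social_circle)                  # total effect, about 1
blocked = slope_on_first(exposure, social_circle, shared_posts)  # path blocked, about 0
print(f"social circle effect, no controls:           {total:.2f}")
print(f"social circle effect, controlling mediator:  {blocked:.2f}")

# Collider: popularity -> late-night time <- insomnia (independent by construction).
popularity = rng.normal(size=n)
insomnia = rng.normal(size=n)
late_night = popularity + insomnia + rng.normal(size=n)

overall = np.corrcoef(popularity, insomnia)[0, 1]
heavy = late_night > np.quantile(late_night, 0.8)  # condition on the collider
within = np.corrcoef(popularity[heavy], insomnia[heavy])[0, 1]
print(f"corr(popularity, insomnia), all users:       {overall:.2f}")
print(f"corr(popularity, insomnia), late-night users: {within:.2f}")
```

Selecting the top 20% of late-night users is one way of "controlling for" the collider; adding late_night as a regression control has a similar distorting effect.
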
Matching:

  • Matching can handle any functional form (an advantage over linear regression).
  • How?
    • Short answer: for each treated unit (an exposed user), we find one or several untreated units (unexposed users) with matching characteristics.
    • Assumption: if matched users differ only in whether they were treated, and not otherwise, then we can attribute the difference in their outcomes (here, engagement) to the treatment and not to other things. That is the main idea of matching.
    • How do we match untreated (unexposed) and treated (exposed) users?
      • Exact Matching:

        • Given two datasets of exposed and unexposed users, match users that have identical values of categorical characteristics, e.g., age, location, and education level.

        • Disadvantages:
          • Curse of dimensionality: every additional characteristic multiplies the number of combinations to match on, so exact matches quickly become rare.
          • Lack of common support: with less data, it is hard to find enough exact matches. This can be really problematic for companies with smaller datasets.
    • Is there a method that overcomes these disadvantages?
      • Propensity Score Matching:
        • Idea: instead of matching users directly on their characteristics, we build a model that takes those characteristics as input and predicts the probability of each user being exposed to harmful content. This predicted probability is the user's propensity score. We match exposed and unexposed users with similar scores and then compare their engagement levels to see whether there is any difference that can be attributed to the treatment.
          • Challenge 1:
            • We need a good model that accurately predicts the propensity to receive treatment.
          • Challenge 2:
            • We need a good algorithm that can quickly find users with similar propensity scores.
            • Design choices: greedy vs. optimal, 1-to-1 vs. 1-to-many, caliper radius, etc.
          • Both challenges are demanding; if we cannot meet them, we cannot find comparable groups and cannot draw valid conclusions.
  • Another selection-bias example: say you are designing an ad campaign for HelloFresh and want to know whether people who click on the ad are more likely to buy food because of the ad.
  • Because of selection bias, it can be difficult to compare conversion rates between users who click the ad and users who do not.
  • Intuition: just like the news feed, ads are personalized, so users shown this ad may be more interested in cooking and more likely to buy food from you with or without clicking on it.
  • To get a better estimate, we can use propensity score matching.
  • First, we use user characteristics to predict propensity. Here the prediction is more complex, because we first need to predict how likely a user is to be shown the ad.
  • Second, given that the user was shown the ad, we need to predict how likely they are to click on it.
  • We can then compare the conversion rates of clickers and non-clickers with similar propensity scores. There are many use cases like this.
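
A minimal sketch of exact matching in plain Python, on a handful of hypothetical users (the fields and values are made up). Exposed users are grouped with unexposed users that share every covariate; each exposed user is compared with the mean engagement of its matched unexposed users, and exposed users with no exact twin are counted as unmatched, which is the lack-of-common-support problem in miniature.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical users: (age_group, country, education, exposed, engagement)
users = [
    ("18-24", "US", "college", True, 4.0),
    ("18-24", "US", "college", False, 5.0),
    ("18-24", "US", "college", False, 6.0),
    ("25-34", "DE", "high_school", True, 3.0),
    ("25-34", "DE", "high_school", False, 3.5),
    ("35-44", "IN", "college", True, 7.0),  # no unexposed twin, so it stays unmatched
]


def exact_match_att(rows):
    """Average effect on exposed users, using exact matches on all covariates."""
    cells = defaultdict(lambda: {True: [], False: []})
    for *covariates, exposed, engagement in rows:
        cells[tuple(covariates)][exposed].append(engagement)

    diffs, unmatched = [], 0
    for groups in cells.values():
        treated, control = groups[True], groups[False]
        if not treated:
            continue
        if not control:
            unmatched += len(treated)  # lack of common support
            continue
        control_mean = mean(control)
        diffs.extend(y - control_mean for y in treated)
    return mean(diffs), unmatched


att, unmatched = exact_match_att(users)
print(f"effect on matched exposed users: {att:.2f}, unmatched exposed users: {unmatched}")
```

On this toy data two exposed users are matched (differences of -1.5 and -0.5, so the estimate is -1.0) and one is dropped. With more covariates, more cells end up like the third one.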

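A sketch of propensity score matching with NumPy on simulated data. Assumptions, none of which come from the notes: two standardized covariates drive both exposure and engagement, the true effect of exposure is -1, the propensity model is a plain logistic regression fitted by Newton-Raphson, and matching is greedy 1-to-1 nearest-neighbour without replacement on the logit of the propensity score, with a caliper of 0.2 standard deviations of that logit (a commonly used rule of thumb). For the HelloFresh case, the same two steps would apply, but the propensity model would have to capture both being shown the ad and clicking on it.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3_000

# Hypothetical covariates (e.g. standardized age and activity level).
X = rng.normal(size=(n, 2))
# Assumed exposure mechanism: depends on the same covariates as engagement.
logit_true = -1.0 + 0.8 * X[:, 0] + 0.6 * X[:, 1]
exposed = rng.random(n) < 1 / (1 + np.exp(-logit_true))
true_effect = -1.0  # assumed effect of exposure on engagement
engagement = 5 + 2 * X[:, 0] + 1.5 * X[:, 1] + true_effect * exposed + rng.normal(size=n)


def fit_logistic(A, t, steps=25):
    """Logistic regression by Newton-Raphson; A must include an intercept column."""
    w = np.zeros(A.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-A @ w))
        grad = A.T @ (t - p)
        hess = A.T @ (A * (p * (1 - p))[:, None])
        w += np.linalg.solve(hess, grad)
    return w


def greedy_match(score, treated, caliper):
    """Greedy 1-to-1 nearest-neighbour matching without replacement, within a caliper."""
    controls = list(np.flatnonzero(~treated))
    pairs = []
    for i in np.flatnonzero(treated):
        if not controls:
            break
        dist = np.abs(score[controls] - score[i])
        j = int(np.argmin(dist))
        if dist[j] <= caliper:  # otherwise leave this treated user unmatched
            pairs.append((i, controls.pop(j)))
    return pairs


# Step 1: model the propensity to be exposed; match on its logit (linear predictor).
A = np.column_stack([np.ones(n), X])
logit_ps = A @ fit_logistic(A, exposed.astype(float))
# Step 2: match within a caliper of 0.2 * SD of the logit.
pairs = greedy_match(logit_ps, exposed, caliper=0.2 * logit_ps.std())
t_idx, c_idx = map(np.array, zip(*pairs))

naive = engagement[exposed].mean() - engagement[~exposed].mean()
matched = (engagement[t_idx] - engagement[c_idx]).mean()
print(f"naive difference:   {naive:.2f}")
print(f"matched difference: {matched:.2f} ({len(pairs)} pairs)")
```

Greedy matching is fast but order-dependent; optimal matching minimises the total distance across all pairs at a higher computational cost, which is the trade-off listed under Challenge 2.
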
Impact of COVID-19 on the economy (how to measure it)