Randomize labels for experiment using Python
Purpose
The purpose of this script is to:
- randomize a list of participants into treatment and control groups
- create a label for each participant that can be affixed to his or her experimental materials
- create a master table that links each participant to his or her assignment
Rationale
The script may be used for any randomized control trial (RCT) in which participants are known ahead of time, who may be nested within groups, and who have observable and known characteristics upon which further stratification is required.
This script assumes that treatment and control group members will receive different materials but are unaware of the difference, that is, the materials themselves will not indicate experimental condition. It’s important, therefore, that each participant gets the correct materials, especially if participants take part in the experiment concurrently. As part of its randomization routine, this script automatically creates labels that can be affixed to the proper materials ahead of time.
Supplementary file requirements
- Requires this code in the same directory saved as
pdflabels.py
. If not found, the script automatically downloads and saves the file. - Requires a
*.csv
file with participant names and any information required if sampling should be blocked within groups (e.g., classroom id, student gender, student race or ethnicity)
To Use
Initialize
In terminal (works on OS X…not tested in other systems), navigate to the script directory and type:
./randomizelabel.py
or, if you want to set the Python interpreter manually:
python<3> randomizelabel.py
Note that this script requires Python 3.x.
Choose task
-----------------------
What do you want to do?
-----------------------
( 1 ) Randomize and make labels
( 2 ) Generate labels from prior randomization
CHOICE:
If you’ve already randomized a roster and simply want to reprint the labels, choose the second option (see instructions below).
(1) Randomize and make labels
Locate *.csv
file
You will be prompted for the location of the *.csv
file. The script will first search the local directory for all *.csv
files and list them:
------------------------------------------------------------
Which CSV file contains the names of those to be randomized?
------------------------------------------------------------
( 1 ) fakeclasslist.csv
( 2 ) File not in this directory
CHOICE:
If you place the names file in the same directory, you can just choose it from here. If you don’t, you should select the number for File not in this directory
. You will then be prompted with:
Please give path to CSV file:
You can give the full or relative paths. For example, each of the below should work:
/Users/<username>/randomizelabel/fakeclasslist.csv
~/randomizelabel/fakeclasslist.csv
./randomizelabel/fakeclasslist.csv
Set seed
Give integer seed of at least 6 digits:
This seed is saved as seed.txt
in the working directory. If you lose all assignment files, but have the roster and seed, you should be able to reproduce the same assignments.
Choose primary unit of randomization
---------------------------------------------
Which column contains the randomization unit?
---------------------------------------------
( 1 ) classid
( 2 ) id
( 3 ) name
( 4 ) gender
( 5 ) racecat
CHOICE:
NB: Randomization unit column cannot contain duplicate values.
Decide if you want to block randomize
------------------------------------
Should random assignment be blocked?
------------------------------------
( 1 ) Yes
( 2 ) No
CHOICE:
If you choose yes
then:
----------------------------------------
On which column(s) do you wish to block?
----------------------------------------
( 1 ) classid
( 2 ) id
( 3 ) name
( 4 ) gender
( 5 ) racecat
You may choose more than one category. Separate multiple choices with a space.
NB: You cannot block on the primary randomization unit.
Check your options
To make that you get what you are expecting, the program will give you some descriptive information about your randomization choices. For example, if you chose to randomize on id
, group on classid
, and stratify across gender
and racecat
, you will see the following:
================================================================================
For the randomization unit: id
................................................................................
Number of unique values = 400
================================================================================
================================================================================
For the grouping category: classid
................................................................................
Number of unique values = 17
Unique values:
ENGL101.01
ENGL101.02
ENGL101.03
ENGL101.04
ENGL101.05
ENGL101.06
ENGL101.07
ENGL101.08
ENGL101.09
ENGL101.10
ENGL101.11
ENGL101.12
ENGL101.13
ENGL101.14
ENGL101.15
ENGL101.16
ENGL101.17
================================================================================
================================================================================
For the stratification category: gender
................................................................................
Number of unique values = 2
Unique values:
Female
Male
================================================================================
================================================================================
For the stratification category: racecat
................................................................................
Number of unique values = 3
Unique values:
1
2
3
================================================================================
Decide the number of treatment groups
-------------------------------------------------
How many treatment conditions, excluding control?
-------------------------------------------------
( 1 ) 1
( 2 ) 2
( 3 ) 3
( 4 ) 4
( 5 ) 5
CHOICE:
Choose the type of labels
--------------------------
Which labels will you use?
--------------------------
( 1 ) Apli-01277
( 2 ) Avery-3422
( 3 ) Avery-5160
( 4 ) Avery-5161
( 5 ) Avery-5162
( 6 ) Avery-5163
( 7 ) Avery-5164
( 8 ) Avery-8600
( 9 ) Avery-L7163
CHOICE:
Choose what you want on the labels
---------------------------------------
What do you want on the printed labels?
---------------------------------------
( 1 ) classid
( 2 ) id
( 3 ) name
( 4 ) gender
( 5 ) racecat
CHOICE:
Separate multiple options with a space keeping in mind that the order matters. For example, 3 2 1
, would gives labels that showed:
<name>
<id>
<classid>
Output
Two primary files are placed in the working directory:
assignment.csv
assignmentlabels_*.pdf
sheets with the labels
assignment.csv
Is a long file that contains, the randomization column and the treatment condition. Example:
id | assign |
---|---|
q4NSkKLNNags | C |
NRIL0Ewhq8A5 | T |
UXCFYfIM6JGn | T |
MMNjGO4CtvlL | T |
5Pe8c9rHidi8 | C |
For merging purposes, it’s probably a good idea to randomize using a uniquely identifiable variable.
assignmentlabels_*.csv
There will be one *.pdf
for the labels for each experimental group. If you only have one treatment and one control, the you will have two files:
assignmentlabels_T.csv
assignmentlabels_C.csv
If you have, for example, two treatment groups and one control, you will have:
assignmentlabels_T1.csv
assignmentlabels_T2.csv
assignemntlabels_C.csv
The labels themselves will not indicate experimental group status (for obvious reasons) so this printing scheme will mitigate mix ups. The number of pages for each group will depend on the types of labels choosen.
(2) Generate labels from prior randomization
If you have already randomized your roster and want to reprint the labels, choose the second option from the first prompt. You will be asked:
-----------------------------------
Which CSV file contains the roster?
-----------------------------------
( 1 ) assignment.csv
( 2 ) fakeclasslist.csv
( 3 ) File not in this directory
CHOICE:
which should be the original roster file, and,
---------------------------------------
Which CSV file contains the assignment?
---------------------------------------
( 1 ) assignment.csv
( 2 ) fakeclasslist.csv
( 3 ) File not in this directory
CHOICE:
which should be the assigment.csv
file generated the first time. These two files will be merged on the randomization column. After these steps, you will once again be asked to choose the types of labels and what you want printed on them.
Adjustments to labels
When reprinting labels, you will able to adjust the printing placement of the labels as well as the font size.
-----------------------------------------------------
Do you want to adjust label margins and/or font size?
-----------------------------------------------------
( 1 ) Yes
( 2 ) No
If you say yes, you will get the following options:
Please enter horizontal adjustment (negative number for left):
Please enter vertical adjustment (negative number for up):
Please enter font size from 7 to 15 (default is 11):
Horizontal/vertical adjustments are additive. A positive number moves the labels to the right and down. Negative numbers are reverse the direction. Units are in millimeters.
Font size must be one of the following options: 7, 8, 9, 10, 11, 12, 13, 14, 15
.
Acknowledgements
- Originators and contributors to PyFPDF
- List of random names and Mark Heckmann at ryouready for helping me generate my fake class data